Implementation details

Multi-Tenant, Scalable, and Highly Available On-Prem Data Platform

Implementation details – Data Integration and Quality

The client has opted for Talend Data Management Platform, which includes the following capabilities:

Design and Productivity Tools (Studio)

Talend Studio is software that you download and install to visually create and test Jobs. Studio features include:

  • Control and orchestrate data flows and data integrations with master Jobs
  • Map, aggregate, sort, enrich and merge data
  • Team collaboration with shared repository
  • Continuous integration 
  • Audit, Job compare, impact analysis, testing, debugging and tuning
  • Metadata bridge for metadata import/export and centralized metadata management
  • Distant run and parallelization
  • Dynamic schema, re-usable Joblets and reference projects
  • Wizards and interactive data viewer
  • Versioning
  • Export and execute standalone Jobs in runtime environments
  • Automatic documentation
  • Controlled patch management

Studio Connectors

Talend Studio includes connectors in the following categories for Job creation:

  • RDBMS, Streaming Message Queues, Cloud DB, Cloud Storage, SaaS / Business, Big Data, DB for Analytics

Full list of components:

https://www.talendforge.org/components/index.php?version=255&edition=8&showAll=1

Management and Monitoring for Jobs

Talend Administration Center is software for managing Talend applications and components, together with the administrative features and configurations that surround them. Its features include:

  • Ability to manage or view users, permissions, projects, execution engines
  • Real-time statistics to track down rejected records and failed executions
  • Design and schedule plans to chain or parallelize tasks including error recovery
  • Time and event-based scheduler for tasks and plans
  • Job execution logs are collected and can be viewed
  • Audit logs are stored in files for reference and compliance
  • High availability, load balancing and failover for task and plan executions
  • Engine clusters for Jobs
  • Single Sign-On (SSO) integration with several SSO providers
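
The idea of a plan that chains tasks, runs some of them in parallel and triggers error recovery can be illustrated with a minimal Python sketch. This is not Talend Administration Center code or its API. The function `run_plan`, its `steps` structure and the `on_error` callback are hypothetical names chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(steps, on_error=None):
    """Run a hypothetical plan (illustration only, not a Talend API).

    `steps` is a list of steps executed one after another (chaining).
    Each step is a list of zero-argument callables executed in parallel.
    If any task in a step raises, `on_error` is called with the exception
    and the remaining steps are skipped.
    Returns (results_of_completed_steps, succeeded).
    """
    results = []
    with ThreadPoolExecutor() as pool:
        for step in steps:
            # Submit every task of the current step so they run concurrently.
            futures = [pool.submit(task) for task in step]
            try:
                # Wait for all tasks; re-raises the first task exception.
                results.append([future.result() for future in futures])
            except Exception as exc:
                if on_error is not None:
                    on_error(exc)  # error-recovery hook
                return results, False
    return results, True


if __name__ == "__main__":
    def extract():
        return "extracted"

    def load_a():
        return "loaded A"

    def load_b():
        return "loaded B"

    # One extract task, then two load tasks in parallel.
    print(run_plan([[extract], [load_a, load_b]], on_error=print))
```

In this sketch a failed step stops the plan after the recovery hook has run; a real scheduler would typically offer further options, such as retries or alternative branches.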

Data Quality

Talend Data Management Platform includes data quality features to profile, cleanse and mask data. Data quality features include:

  • Data profiling and analytics with graphical charts and drilldown data
  • Data privacy with masking and encryption
  • Automated data standardization, cleansing and rules enforcement
  • Data quality data mart containing the analyses and reports executed in Talend Studio
  • Semantic discovery with automatic detection of patterns
  • Data sampling
  • Enrichment, harmonization, fuzzy matching and de-duplication
  • Pattern library
  • Advanced Data Profiling:
    • Fraud pattern detection using Benford's Law
    • Advanced statistics with indicator thresholds
    • Column set analysis 
    • Advanced matching analysis 
    • Time column correlation analysis
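
To make the Benford's Law item above concrete: the law states that in many naturally occurring numeric data sets the leading digit d appears with probability log10(1 + 1/d), so a strong deviation from that distribution can flag values worth reviewing. The following is a minimal Python sketch of that check, not Talend's implementation. The function names are hypothetical, the input is assumed to contain only finite numbers, and the chi-square threshold in the comments is a common rule of thumb.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a finite number, or None for zero."""
    # Scientific notation puts the first significant digit in position 0.
    digit = int(f"{abs(value):.15e}"[0])
    return digit if digit != 0 else None


def benford_deviation(values):
    """Return (observed frequencies, chi-square statistic) for `values`."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    expected = benford_expected()
    # Pearson chi-square statistic against the Benford distribution.
    chi2 = sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items()
    )
    observed = {d: counts.get(d, 0) / n for d in range(1, 10)}
    return observed, chi2


if __name__ == "__main__":
    # With 8 degrees of freedom, a chi-square value above roughly 15.51
    # indicates a deviation at the 5% significance level.
    powers_of_two = [2 ** k for k in range(1, 200)]
    evenly_spread = list(range(100, 1000))
    print(benford_deviation(powers_of_two)[1])
    print(benford_deviation(evenly_spread)[1])
```

A large statistic is only a prompt for investigation: data with a narrow value range (for example, prices capped at a fixed limit) does not follow Benford's Law even when it is entirely legitimate.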

Implementation details – Data Catalog

The client has opted for Qlik Talend Data Catalog Advanced Edition, which includes the following capabilities:

  • Faceted search, data sampling, semantic discovery, categorizing and auto-profiling
  • Social curation with data tagging, comments, review, promotion, certification
  • Data relationship discovery and certification
  • Automatic discovery of the data lake and other data stores