Harvester™ accelerates information integration and data sharing by automating the discovery and mapping process.
Every data integration initiative – whether it supports better decision making, a merger/acquisition, regulatory compliance, or other business need – first requires the ability to understand what data is available and how data relates between sources. This process must occur before an organization can effectively transfer, merge, and integrate the disparate data sources that are required to help manage various business activities.
Though critical, the process of understanding (or "discovering") data and determining where relationships exist across sources (often referred to as "mapping" data relationships) is costly and time consuming, because analysts must review data sources and fields one by one. The process is harder still if an organization lacks subject matter expertise or documentation for its data sources.
Sypherlink Harvester automates and accelerates the traditionally manual discovery and mapping processes, significantly reducing the time, effort and cost required, while increasing the quality of data involved. In addition, Harvester provides a flexible, shareable platform which can be leveraged across various data integration initiatives, helping organizations move away from one-off work to embrace organizational best practices.
Automated Data and Metadata Discovery
Harvester automatically discovers key information, including metadata, database statistics, sample data and data profiling metrics from a variety of data sources, including relational, non-relational and non-traditional data sources. This information is leveraged by Harvester to determine where relationships exist across systems.
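As a rough illustration of what data profiling metrics for a single column might look like, the minimal sketch below computes a few common statistics. The function name `profile_column` and the chosen metrics are assumptions made for this example; they do not describe Harvester's actual discovery output.

```python
def profile_column(values):
    """Return simple profiling metrics for one column (hypothetical metric set)."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),                    # total rows examined
        "null_count": len(values) - len(non_null),   # rows with no value
        "distinct_count": len(set(non_null)),        # unique non-null values
        "sample": non_null[:3],                      # first few values as sample data
    }


profile = profile_column(["Smith", None, "Jones", "Smith"])
```

Metrics like these give a matching step something to compare even when column names carry little meaning.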
Harvester is made up of two components: Harvester Analyzer, which performs the data analysis and identifies matches among data sources, and Harvester Relationship Manager, which is used to review and validate the automated analysis results.
Harvester Analyzer: Automated Many-to-Many Analysis
Harvester Analyzer uses unique, patented heuristics-matching technology to speed the analysis of relationships across multiple data sources. This analysis is especially valuable in projects where there is limited information or domain expertise available about data sources and their contents. The software examines both the database field content and the field metadata, such as field names and field attributes. This analysis can be performed on multiple systems simultaneously, further accelerating the previously tedious and time-consuming mapping process. Users can define which data sources are the sources and which is the target.
Harvester Analyzer performs two types of heuristics analysis against the data within the defined sources:
- Name Matching – Automatically identifies similar or matching column names. Name Matching is beneficial in handling abbreviations or misspellings in schema names. For example, the column names "LastName" and "LName" would be considered a Name Match.
- Data Matching – Automatically identifies similar or matching content between columns. For example, the value "Smith" appearing across two columns would be considered a Data Match.
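The two heuristics above can be approximated with a short sketch. This is not Harvester's patented matching technology. It substitutes two generic techniques, string similarity from Python's standard `difflib` module for names and Jaccard overlap of distinct values for data, and the function names `name_similarity` and `data_overlap` are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Score two column names from 0 to 1, ignoring case and separators."""
    def normalize(s: str) -> str:
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def data_overlap(col_a, col_b) -> float:
    """Jaccard overlap between the distinct values of two columns."""
    set_a, set_b = set(col_a), set(col_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# "LastName" and "LName" score far higher than an unrelated pair of names.
close = name_similarity("LastName", "LName")
far = name_similarity("LastName", "ZipCode")

# "Smith" appears in both columns, so the overlap is non-zero.
shared = data_overlap(["Smith", "Jones"], ["Smith", "Lee"])
```

A real matcher would also need to handle synonyms, data types and value formats, which simple string and set comparisons do not capture.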
Users can refine the automated results by turning individual heuristics on or off, or by weighting their importance based on the specific project and types of data sources.
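One plausible way to model this kind of control is a weighted average in which a weight of zero switches a heuristic off. The sketch below assumes that model. The function `combined_score` and the heuristic keys `"name"` and `"data"` are illustrative, not part of the product.

```python
def combined_score(scores: dict, weights: dict) -> float:
    """Weighted average of heuristic scores; a weight of 0 turns a heuristic off."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights.get(name, 0.0) for name, score in scores.items())
    return weighted / total_weight


scores = {"name": 0.8, "data": 0.4}

# Both heuristics on, equally weighted.
both_on = combined_score(scores, {"name": 1.0, "data": 1.0})

# Data Matching off, for example when only the schema is accessible.
name_only = combined_score(scores, {"name": 1.0, "data": 0.0})
```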
For example, in the early stages of a business intelligence / data warehouse initiative, the "Name Matching" heuristic would be especially useful when determining the design of the data warehouse. Because the target warehouse would not yet be populated, Harvester would be run against the existing data sources to determine the best data for populating the warehouse.
This user-controlled functionality is also helpful when only the data source schema is accessible, not the data itself, because the software can then analyze just the names of the columns and tables.
The results from Harvester Analyzer are used to compute a confidence score for each candidate match, indicating the likelihood that a relationship exists between fields. This information is presented to the user for quick review and validation.
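To show how scored candidates might be ordered for review, the sketch below sorts field pairs by confidence and drops those below a cutoff. The tuple layout, the field names and the 0.5 threshold are all assumptions for this example. The source does not describe how Harvester computes or presents its scores.

```python
def rank_candidates(scored_pairs, review_threshold=0.5):
    """Keep (source, target, score) tuples at or above the threshold, highest score first."""
    kept = [pair for pair in scored_pairs if pair[2] >= review_threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical candidate relationships with pre-computed confidence scores.
candidates = [
    ("crm.Surname", "dw.LastName", 0.64),
    ("crm.Zip", "dw.LastName", 0.10),
    ("crm.LName", "dw.LastName", 0.82),
]

for source, target, score in rank_candidates(candidates):
    print(f"{source} -> {target}: {score:.0%} confidence")
```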
Harvester Relationship Manager: An Intuitive Mapping Interface
Harvester Relationship Manager is a powerful platform for allowing data architects and analysts to review, approve and manage the results of the automated data discovery and mapping process.
The intuitive, point-and-click interface allows users to view database relationships, both direct and indirect, along with the underlying source data. Detailed progress displays let users track the mapping process from start to finish and simplify the mapping of exceptions found during the automated discovery and mapping process.
A Powerful Data Management Platform
Harvester is a powerful and flexible platform for supporting the ongoing need to manage relationships across critical data assets. Harvester provides central management and administration for all of the application's analyses, generated relationships, and both automated and manual mappings. This information can be easily accessed via reports, enabling the user to track a project's overall progress.
The software also enables project collaboration, providing the ability to partition and assign individual mapping and ETL tasks based on specific schemas/tables in designated sources and targets. Once complete, these individual tasks can then be imported back into the master project.
Accelerates ETL and Complementary Data Management Tools
Harvester speeds the implementation of popular Extract, Transform and Load (ETL), design and modeling, and metadata management tools, and feeds Harvester Integrator for data integration. For example, the mapping relationships created and validated within Harvester can be used to generate a definition of ETL instructions. This allows the analyst to design and develop ETL constructs earlier in the overall integration process. Harvester also provides input and output stream objects, joiner and database lookup objects, a sequence generator object, a local constants object, and extensible function objects. Once created in Harvester, these objects are then seamlessly passed downstream to third-party ETL tools for execution.
In addition, Harvester can automatically generate an ETL specification document, which is a detailed spreadsheet used by ETL architects for the review and approval of ETL plans.
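As a simple picture of such a specification, the sketch below writes validated mappings to CSV, one row per source-to-target mapping, using Python's standard `csv` module. The column layout, the function name `write_etl_spec` and the sample mapping are hypothetical. The actual format of the document Harvester generates is not described here.

```python
import csv
import io

# Hypothetical column layout for the specification spreadsheet.
SPEC_COLUMNS = ["source_table", "source_column", "target_table", "target_column", "transform"]


def write_etl_spec(mappings, out):
    """Write a header row, then one row per validated mapping."""
    writer = csv.writer(out)
    writer.writerow(SPEC_COLUMNS)
    for mapping in mappings:
        writer.writerow([mapping.get(column, "") for column in SPEC_COLUMNS])


validated = [
    {
        "source_table": "crm.customers",
        "source_column": "LName",
        "target_table": "dw.dim_customer",
        "target_column": "LastName",
        "transform": "TRIM",
    },
]

buffer = io.StringIO()
write_etl_spec(validated, buffer)
spec_text = buffer.getvalue()
```

Writing to a file opened with `newline=""` instead of a `StringIO` buffer produces a file that spreadsheet tools can open directly.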
Harvester’s seamless integration into the data management process is simple and transparent to the user. Regardless of the combination of tools, Harvester accelerates the implementation; reduces risk, cost and resources; and maximizes the value of development and reengineering efforts.
Although many applications exist to facilitate other steps in the integration chain, such as data profiling, cleansing, quality and movement, none provide the unique capabilities that Sypherlink Harvester offers to drive cost out of the process.