Integration Of Data Sources In Data Mining

After data cleaning steps in data mining. You need integration of data sources in data mining. It means combining disparate data sources into a single schematic structure.

Advertisements

There are two major type of integration – schema integration and data integration.

Schema integration forms an integrated schematic structure from the disparate data sources.

The data integration is about cleaning and merging data from different sources.

Schema Integration

Consider the following schemata:

Cars (serial No, model, color, stereo, glass-tint,...)
and
Auto(serienNr, modelle, farbe)

There are two tables and one in German. With this type of data there are few challenges listed below.

Advertisements
  • Naming difference
  • Structural difference
  • Data type difference
  • Missing fields
  • Semantic differences

Generic Architecture of Integrator

The generic architecture of an integrator has components such as extractors, mediator, data sources.

Figure 1 - Generic Architecture of an Integrator
Figure 1 – Generic Architecture of an Integrator

Extractor maps into a standard schema set. The extractors are responsible for transferring data from operational database to data warehouse.

Mediators constructs to overall schema. It is the one responsible for data integration from heterogeneous data sources.

Wrapper / Extractor

Creates a common view across all data sources. It bridges the differences in the naming, type and schema structure. Wrappers do not physically extract data from the data sources, but, send a local query to the target database. The result from this local query is converted into a relational form.

Mediator / Constructor

The mediator constructs an integrated schematic structure and performs data integration and populate the data warehouse.

Tools for Data Cleaning and Integration – dfPower

  • It is from Dataflux corporation.
  • has a de-duplication engine.
  • analyze data based on values and no of occurrences.
  • does not support detection of semantic duplicates based on user specified rules
  • permits duplicates to be grouped or merged.

Website: http://support.sas.com/software/products/dfdmstudioserver/index.html

Advertisements

Ads Blocker Image Powered by Code Help Pro

Ads Blocker Detected!!!

We have detected that you are using extensions to block ads. Please support us by disabling these ads blocker.