We now discuss the data mining architecture and its components. We will learn about the functionality of each component and its role in the data mining system.
These are the components found in a typical data mining system. In some systems the components are integrated into one; however, each still performs a distinct function at a different stage of working with the data sets.
Databases and Data Warehouse
The data source is one or more databases, data warehouses, the World Wide Web, or other information repositories, on which data cleaning and data integration are performed.
The information is stored on a database server or data warehouse server, which is responsible for fetching the relevant data based on the user’s data mining request.
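The fetching step can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory table as a stand-in for the database or warehouse server; the `sales` table, its columns, and the `fetch_relevant` helper are hypothetical names chosen only for this example.

```python
import sqlite3

# Hypothetical in-memory table standing in for a database or warehouse server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "laptop", 1200.0),
        ("north", "phone", 650.0),
        ("south", "laptop", 1150.0),
        ("south", "tablet", 400.0),
    ],
)


def fetch_relevant(connection, region):
    """Return only the rows relevant to a mining request for one region."""
    cursor = connection.execute(
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY product",
        (region,),
    )
    return cursor.fetchall()


north_rows = fetch_relevant(conn, "north")
print(north_rows)
```

Only the rows matching the request reach the later components, so the mining step does not have to scan the whole repository.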
Domain Knowledge Base
The domain knowledge base guides the search or evaluates the interestingness of the patterns in the result of a data mining query. It includes knowledge such as concept hierarchies, which capture different types of hierarchical relationships in the data by organizing attributes and attribute values into different levels of abstraction, for example a schema hierarchy.
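As a rough illustration of a concept hierarchy, the sketch below generalizes city-level records to higher levels of abstraction. The city, state, and country mappings and the `generalize` and `roll_up` helpers are assumptions made up for this example, not part of any particular system.

```python
from collections import Counter

# Hypothetical location hierarchy: city -> state -> country.
CITY_TO_STATE = {"Chicago": "Illinois", "Springfield": "Illinois", "Austin": "Texas"}
STATE_TO_COUNTRY = {"Illinois": "USA", "Texas": "USA"}


def generalize(city, level):
    """Map a city to its ancestor at the requested level of abstraction."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


def roll_up(records, level):
    """Sum the amounts after replacing each city by its ancestor at `level`."""
    totals = Counter()
    for city, amount in records:
        totals[generalize(city, level)] += amount
    return dict(totals)


sales = [("Chicago", 100), ("Springfield", 40), ("Austin", 70)]
by_state = roll_up(sales, "state")
by_country = roll_up(sales, "country")
print(by_state)
print(by_country)
```

The same records produce different summaries depending on the level chosen, which is how a hierarchy lets patterns be mined at several levels of abstraction.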
User beliefs are also used in evaluating the patterns. A set of beliefs is compared with the result, for example to assess how unexpected a pattern is.
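One simple way to picture unexpectedness is to compare what the user expects with what the data shows. The sketch below assumes beliefs are stored as an expected confidence per rule and treats a rule as unexpected when the observed value deviates by more than a tolerance. The rules, numbers, and the `unexpected_patterns` helper are all hypothetical.

```python
# Hypothetical user beliefs: the confidence the user expects for each rule.
beliefs = {("bread", "butter"): 0.80, ("diapers", "beer"): 0.10}

# Hypothetical mined results: the confidence actually observed in the data.
observed = {("bread", "butter"): 0.78, ("diapers", "beer"): 0.65}


def unexpected_patterns(beliefs, observed, tolerance=0.2):
    """Return rules whose observed confidence deviates from the belief
    by more than `tolerance`, mapped to the size of the deviation."""
    surprising = {}
    for rule, seen in observed.items():
        expected = beliefs.get(rule)
        if expected is not None and abs(seen - expected) > tolerance:
            surprising[rule] = round(seen - expected, 2)
    return surprising


surprises = unexpected_patterns(beliefs, observed)
print(surprises)
```

Here only the second rule is reported, because the first one matches the user's belief closely enough to be unsurprising.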
Data Mining Engine
Each domain has specific problems that you want to solve, for which you need to run data mining tasks. This process is not easy: you need knowledge of the domain and must identify which tasks are suitable for solving those problems.
The data mining engine performs the data mining tasks using a set of functional modules. The tasks include
- correlation analysis
- cluster analysis
- outlier analysis
- evolution analysis
Tools such as KIRA can guide you through identifying the data mining tasks that suit your domain.
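The idea of an engine built from functional modules can be sketched as a small dispatcher. The example assumes each task is a plain function registered under a name. The `MODULES` registry, the `run_task` helper, and the two simplified modules (Pearson correlation and z-score outlier detection) are illustrative choices, not the design of any specific data mining system.

```python
import statistics


def correlation_analysis(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def outlier_analysis(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical registry mapping task names to functional modules.
MODULES = {
    "correlation": correlation_analysis,
    "outlier": outlier_analysis,
}


def run_task(task, *args, **kwargs):
    """Dispatch a mining request to the module registered for `task`."""
    if task not in MODULES:
        raise ValueError(f"no module registered for task: {task}")
    return MODULES[task](*args, **kwargs)


r = run_task("correlation", [1, 2, 3, 4], [2, 4, 6, 8])
outliers = run_task("outlier", [10, 11, 9, 10, 12, 10, 50])
print(r, outliers)
```

Keeping each task behind a common entry point means new modules, such as cluster or evolution analysis, could be added to the registry without changing the callers.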
Pattern Evaluation Module
The pattern evaluation module interacts with the data mining modules and focuses the search on only the interesting patterns. It filters the discovered patterns using an interestingness measure, keeping only those that pass.
The pattern evaluation module is sometimes integrated into the data mining modules depending on the mining method you are using.
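A minimal sketch of such filtering, assuming the discovered patterns are association rules and that support and confidence thresholds serve as the interestingness measure. The `Rule` class, the sample rules, and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float      # fraction of transactions containing both sides
    confidence: float   # estimated P(consequent | antecedent)


def filter_interesting(rules, min_support, min_confidence):
    """Keep only the rules that meet both interestingness thresholds."""
    return [
        rule
        for rule in rules
        if rule.support >= min_support and rule.confidence >= min_confidence
    ]


# Hypothetical patterns produced by the mining step.
discovered = [
    Rule("bread", "butter", support=0.30, confidence=0.75),
    Rule("milk", "eggs", support=0.05, confidence=0.90),
    Rule("tea", "sugar", support=0.25, confidence=0.40),
]

interesting = filter_interesting(discovered, min_support=0.10, min_confidence=0.60)
for rule in interesting:
    print(f"{rule.antecedent} -> {rule.consequent}")
```

When the module is integrated into the mining method itself, the same thresholds would be applied during the search rather than afterwards, so that uninteresting candidates are pruned early.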
User Interface
The user interface is a separate module that communicates between the users and the data mining system. It allows users to
- specify a data mining query
- provide information to guide the search
- perform exploratory data mining on intermediate results
- browse and visualize the database and data warehouse in different forms.
From a data warehouse perspective, data mining can be viewed as an advanced stage of OLAP (online analytical processing). A data mining system should therefore handle large amounts of data; a system that does not is more appropriately called a machine learning system that uses AI, a statistical data analysis tool, or an experimental system.
You have probably heard about Big Data. The term only refers to huge amounts of data, which it characterizes using the 5 V’s: volume, variety, veracity, velocity, and value.
Data mining, in contrast, works on large data and extracts interesting information from it.