Data Mining: Another Tool To Increase Productivity In Manufacturing?
On-line Analytical Processing & On-line Transactional Processing
What is the difference between OLTP, OLAP and Data Mining?
On-line Transactional Processing (OLTP) relies on complete transactional data. Therefore, this style of query and reporting must use the operational or business system as its source of data. The most obvious reason for separating the operational (transactional) data from the analysis data has always been the degradation of the operational systems' response time. Operational systems often depend on high performance and quick response times, and the loss of efficiency and the associated costs incurred through the poor performance of such systems can be easily calculated or measured [13]. For the casual user of a reporting tool an additional 10-second delay would be negligible, yet if extrapolated across a transactional system processing hundreds or thousands of individual operations the effect would be devastating.
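The extrapolation above can be made concrete with a small calculation. This is only a sketch: the function name `added_delay_hours` and the workload of 5,000 transactions are assumptions chosen for illustration and do not come from any measured system.

```python
def added_delay_hours(operations: int, delay_seconds: float) -> float:
    """Total extra waiting time, in hours, if every operation is slowed by delay_seconds."""
    return operations * delay_seconds / 3600.0


# Hypothetical workloads; none of these volumes are taken from a real system.
single_report = added_delay_hours(1, 10)     # one casual reporting query
busy_system = added_delay_hours(5_000, 10)   # 5,000 transactions, each 10 s slower

print(f"one report: {single_report * 3600:.0f} s of extra waiting")
print(f"5,000 transactions: {busy_system:.1f} h of extra waiting")
```

Under these assumed figures, the same 10-second penalty that one analyst would barely notice adds up to many hours of accumulated waiting on the transactional side.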
On-line Analytical Processing (OLAP) is a descriptive querying tool with which analysts verify a hypothesis. Typically, OLAP analysis uses predefined, summarised or aggregated data, such as 'multidimensional cubes', whereas Data Mining requires detailed data at a high level of granularity, heavily denormalised and analysed at the individual record level. Regardless of how the questions are formulated, the results returned by OLAP applications are purely factual. For example, the number of blue shoes sold in March in Paris was 123.
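As a rough illustration of a factual, OLAP-style answer, the sketch below pre-aggregates a handful of sale records into a tiny 'cube' keyed by colour, month and city, then reads one cell back. The records, the field names and the helper `units_sold` are all invented for the example; a real OLAP server would use far richer dimensions and storage.

```python
from collections import Counter

# Made-up detail records (one per sale); purely illustrative.
sales = [
    {"colour": "blue", "product": "shoes", "month": "March", "city": "Paris"},
    {"colour": "blue", "product": "shoes", "month": "March", "city": "Paris"},
    {"colour": "red", "product": "shoes", "month": "March", "city": "Paris"},
    {"colour": "blue", "product": "shoes", "month": "April", "city": "Lyon"},
]

# Pre-aggregate into a tiny "cube": one count per (colour, month, city) cell.
cube = Counter((s["colour"], s["month"], s["city"]) for s in sales)


def units_sold(colour: str, month: str, city: str) -> int:
    """Look up a single cell of the cube; an empty cell counts as zero."""
    return cube[(colour, month, city)]


print(units_sold("blue", "March", "Paris"))
```

The query simply reports what is already in the data. It verifies or refutes a question the analyst has posed, but it does not suggest anything the analyst did not ask about.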
Data Mining, on the other hand, is a form of discovery-driven analysis. The use of artificial intelligence and statistical techniques allows the model to make predictions or estimates about the outcomes of future events. "Data Mining techniques are used to find interesting, often complex, and previously unknown patterns in data."[12] For example: how many blue shoes should be ordered for Europe next year?
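To contrast a prediction with a factual lookup, the sketch below fits a straight line to a few years of sales and extrapolates one year ahead. An ordinary least-squares trend is used here only as a simple stand-in for the statistical techniques mentioned above; it is not a method prescribed by the cited sources, and the yearly figures and the name `fit_line` are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly unit sales of blue shoes in Europe (years numbered 1 to 4).
years = [1, 2, 3, 4]
units = [100, 120, 140, 160]

slope, intercept = fit_line(years, units)
forecast = slope * 5 + intercept  # estimate for year 5, which is not in the data

print(f"estimated units for next year: {forecast:.0f}")
```

Unlike the OLAP answer, the result is an estimate about an event that has not happened yet, and its quality depends on how well the chosen model reflects the detail data.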
Hand et al., in their study of On-line Analytical Mining (OLAM), claim "based on our analysis, there is no fundamental difference between the data cube required for OLAP and that for OLAM, although OLAM analysis may often involve the analysis of more dimensions with finer granularities, or involve the discovery-driven exploration of multi-feature aggregations on the data cube."[23]. This affirms that the underlying data source for OLAP and Data Mining need not differ, despite the fact that many Data Warehouses are unsuitable for both.
Problems for Data Mining using a Data Warehouse
"Even though data mining in the detail data may account for a very small percentage of the data warehouse activity, the most useful data analysis might be done in the detail data." [13], it appears that Microsoft have conceded that a Data Warehouse may have problems satisfying the Data Mining requirements. The granularity of the data warehouse schema will have been optimised for OLAP use. With data summarised to a weekly or greater level, most Data Mining techniques would not be valid. If we looked at the Data Warehouse, we may notice the subject hierarchy of the structure worked at a regional level and therefore would be an inadequate data source to investigate customer behaviour in depth. Most commonly, the Data Warehouse has been constructed to fulfil a specific requirement rather than a generic data repository for global analysis [12,23]. Later in this paper, I sight the resource intensive nature of the gathering, cleaning and loading of the data into the Data Warehouse as a major limiting factor when contemplating a Data Warehouse project. With this in mind, the data sets created tend to become 'leaner' and narrowly focused. The data structure is formulated from the reporting requirements as set out by the business, this invariably produces a limited focus.
Within Data Warehouses the data is generally pre-processed prior to loading. Applying standard business rules gives the data a high level of consistency, yet the consolidated views may be too narrow. Because Data Mining algorithms cannot give a comprehensive analysis if the data is over-processed, the Data Warehouse may be a false grail.
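The granularity problem described in this section can be sketched as follows. The detail records, region names and variable names are invented for illustration. The weekly-by-region summary stands in for a warehouse table optimised for reporting, and the repeat-customer question stands in for a customer-behaviour analysis.

```python
from collections import defaultdict

# Made-up detail records: (customer_id, region, week, amount).
detail = [
    ("c1", "North", 1, 20.0),
    ("c1", "North", 1, 35.0),
    ("c2", "North", 1, 15.0),
    ("c3", "South", 1, 40.0),
    ("c1", "North", 2, 25.0),
]

# Warehouse-style summary: one total per (region, week); the customer is dropped.
weekly_by_region = defaultdict(float)
for customer, region, week, amount in detail:
    weekly_by_region[(region, week)] += amount

# A customer-level question that needs the detail rows: who bought more than once?
purchases = defaultdict(int)
for customer, *_ in detail:
    purchases[customer] += 1
repeat_customers = sorted(c for c, n in purchases.items() if n > 1)

print(dict(weekly_by_region))
print(repeat_customers)
```

The summary table still answers the reporting question it was built for, but the customer identifier no longer exists in it, so the repeat-purchase question cannot be recovered from the summarised data alone.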