Data Mining

=Introduction=

Data mining is a popular term for the work of programs that search through data warehouses, which are essentially large volumes of related data, for patterns that may represent hidden or obscure market trends. In general practice, data is gathered from customers when they purchase items, fill out user response forms, respond to advertising, and so on. More recently, unscrupulous marketers have resorted to strong-handed practices such as spyware and tracking cookies, which let businesses see where users go online, what they look at, and for how long. This data is put into several separate databases, which may or may not be used in general day-to-day activities; the customer purchase database and the orders database are examples.

Information from these databases is culled (usually at specified intervals by automated warehousing software) into a data warehouse, which records specific statistical information about the data the databases hold, cross-referenced across sources; for a sales database, this might be average purchases per customer or customers by region. In this model the raw data itself is not put into the warehouse, and the warehouse retains none of the operational functionality of the source databases; warehouses are designed specifically to display trends in this information. An IT professional then uses data mining software to view this information in a myriad of different ways, cross-referencing data from multiple sources. Data mining software usually allows the data to be graphed or put into various standard reports; most packages support regular expression (RegEx) searches, and more complex systems may also include AI-based cross-referencing and data analysis.
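As a rough illustration of the summarizing step described above, the following minimal Python sketch rolls a handful of sales records up into the two statistics the text mentions: average purchase per customer and customers by region. The record layout, the sample values, and the function names are assumptions made for this example only; real warehousing software is far more elaborate.

```python
from collections import defaultdict

# Hypothetical operational sales records: (customer, region, purchase amount).
SALES = [
    ("alice", "east", 20.0),
    ("alice", "east", 40.0),
    ("bob", "west", 10.0),
    ("carol", "east", 30.0),
]


def average_purchase_per_customer(sales):
    """Return a dict mapping each customer to their mean purchase amount."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer, _region, amount in sales:
        totals[customer] += amount
        counts[customer] += 1
    return {customer: totals[customer] / counts[customer] for customer in totals}


def customers_by_region(sales):
    """Return a dict mapping each region to its number of distinct customers."""
    seen = defaultdict(set)
    for customer, region, _amount in sales:
        seen[region].add(customer)
    return {region: len(names) for region, names in seen.items()}
```

Only the two summary dictionaries would be kept in the warehouse in this sketch; the individual sales rows stay in the operational database.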

Data mining can be traced back to the 1960s, when it was first called statistical analysis. Its history is best described as the combined evolution of classical statistics, artificial intelligence (AI), and machine learning. Classical statistics contributed concepts such as regression analysis, standard deviation, and cluster analysis, all of which are used to study data and data relationships and are therefore building blocks of today’s data mining. Artificial intelligence is the attempt to apply human-like reasoning to statistical problems. AI had little influence on data mining until improvements in computer processing power made it practical. The last discipline to have an impact on data mining is machine learning, which can be viewed as a combination of classical statistics and AI.

With the right approach, data mining can provide valuable information that helps a company in many ways. That raises two questions: what approaches are there, and which is the right one? Several analysis techniques can be used. Data mining today has moved beyond its original basis in classical statistics and has become more insightful. Some of the more common approaches include rule-based analysis, neural networks, genetic algorithms, the nearest neighbor method, and decision trees. Whichever technique is chosen, every approach to data mining consists of the same 5 basic elements:
 * 1) Extracting data into the data warehouse
 * 2) Storing and managing the data
 * 3) Giving business analysts access to the data
 * 4) Analyzing the data
 * 5) Presenting the data in a useful manner


|| **Technology** || **Advantages** || **Limitations** ||
|| Rule-based analysis || Good for data that is "complete", with relationships that can be modeled via if...then rules or decision trees. Rules are readable. || A large number of rules is difficult to understand. Data may not have strong rule-based relationships. ||
|| Neural networks || Good for data with non-linear relationships. Can work well if data is missing some values. || Cannot explain the relationships found, although some leading-edge tools attempt to create explanations of the decisions. Requires non-numeric data to be converted to numeric values. ||
|| K-nearest-neighbor || Good for discovering clusters; can use an entire data source rather than requiring a sample for training. || Requires a large amount of memory (this technology is also called memory-based reasoning). May be overly sensitive to closely matching records. ||
|| Genetic algorithms || Good for forecasting problems involving data with non-linear relationships. Can work well if data is missing some values. || Cannot explain the relationships found, although some leading-edge tools attempt to create explanations of the decisions. Requires non-numeric data to be converted to numeric values. ||
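To make one of the techniques in the table concrete, here is a minimal sketch of k-nearest-neighbor classification in plain Python. The customer features, the segment labels, and the name `knn_predict` are hypothetical and chosen only for illustration; this is not taken from any particular data mining product.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (features, label) pairs. The whole list is kept in
    memory and scanned for every query, which is the memory cost the table
    mentions for this technique.
    """
    by_distance = sorted(training, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _features, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical customers: (visits per month, average basket in dollars) -> segment.
TRAINING = [
    ((1.0, 10.0), "occasional"),
    ((2.0, 15.0), "occasional"),
    ((1.5, 12.0), "occasional"),
    ((8.0, 60.0), "loyal"),
    ((9.0, 70.0), "loyal"),
    ((7.5, 65.0), "loyal"),
]
```

A call such as `knn_predict(TRAINING, (8.5, 66.0))` looks up the three closest known customers and returns the segment most of them belong to. Note that the features must already be numeric, and that features on very different scales would normally be rescaled first so that one of them does not dominate the distance.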



=Purpose=


**Data Mining:** The analysis of data to establish relationships and identify patterns. The patterns can be used to:
 * Gain insight into aspects of the organization's operations
 * Predict outcomes for future situations
 * Help corporate analysts suggest marketing strategies, for example promotions in retail stores

Data mining can also help with:

 * Profiling customers
 * Targeting direct mailings
 * Managing credit risks
 * Detecting telecommunications fraud
 * Detecting credit card fraud

Data mining technology helps businesses by giving them the ability to find predictive information within large databases automatically. Questions that used to require extensive analysis can be answered quickly and directly from the data. Data mining also enables the automated discovery of previously unknown patterns (e.g. hidden patterns within databases). One example is the detection of fraudulent credit card transactions.
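The fraud example can be illustrated with a deliberately simple stand-in: flagging transactions whose amount lies far from a cardholder's usual spending. The z-score rule, the threshold of 3 standard deviations, the sample amounts, and the name `flag_unusual` are all assumptions for this sketch; real fraud detection systems rely on much richer models and many more signals.

```python
import statistics


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` sample standard
    deviations away from the mean of `history` (needs at least two values)."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # All past amounts are identical, so any different amount stands out.
        return [amount for amount in new_amounts if amount != mean]
    return [
        amount for amount in new_amounts
        if abs(amount - mean) / spread > threshold
    ]


# Hypothetical past card transactions for one customer, in dollars.
HISTORY = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0, 24.0, 21.0]
```

With this history, ordinary purchases of around 20 to 30 dollars pass unflagged, while a single 950 dollar charge would be returned for review.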

=Data Mining Components=

A crucial component of data mining is the data warehouse. Many businesses perform thousands of transactions a day and record them in operational databases. An operational database is temporary in the sense that it stores day-to-day operational data. This data is then transferred to a data warehouse, whose purpose is to collect and store vast amounts of information about business processes. That information is useful for making strategic business decisions, but getting at it requires data mining agents that extract it from the warehouse.

**Basic Elements in Data Mining:** There are 5 basic elements in data mining that are crucial; they are listed in the Introduction.

=Data Mining Software=

http://www.bitpipe.com/tlist/Data-Mining-Software.html



=References=


 * Class notes of CCT 260 by Elizabeth Little-john
 * Data mining image: http://www.datapult.com/Data_Mining.htm
 * Data mining image: http://techpubs.sgi.com/library/dynaweb_docs/0650/SGI_EndUser/books/MineSetNT_T/sgi_html/figures/dataprocess.gif