The purpose of this report is to examine the different types of mining and provide insight into their applicable use and relevance. It also provides a number of notable uses for each type of mining along with some personal reflection from the author on the implications of mining on privacy and ethics.
Turban, Sharda and Delen (2011) define data mining as “a term used to describe discovering or mining knowledge from large amounts of data.” This definition is more or less consistent among published works but may vary slightly with the context of the subject being examined. For example, Mena (1999), in a relatively early discussion of data mining for web sites, referred to it as “inductive data analysis.” Tan (1999), building on the work of Hearst (1997) and writing in the context of the challenges of text mining, referred to data mining as “knowledge discovery from textual databases” when presenting a framework for text mining.
Regardless of the definition, application or emerging technologies, at the core of data mining is data: stored in any location, in any format, and accessible by commercial off-the-shelf or custom-developed solutions that allow for the discovery and analysis of that data. If you collect enough data, for long enough, someone, somewhere will want to know about it. This report discusses the pros, cons and use of three specific types of mining: data, text and web mining.
3 DATA MINING
As previously discussed, there are many similar definitions; however, Gartner (2013) quite succinctly defines data mining as “the process of discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques.”
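To make the idea of “discovering meaningful correlations” concrete, the following is a minimal sketch in Python and not a description of any particular commercial tool. The function names (pearson, correlated_pairs), the 0.8 threshold and the toy dataset are all assumptions made purely for illustration. The sketch scans every pair of numeric columns in a small table and reports those whose Pearson correlation is strong.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlated_pairs(table, threshold=0.8):
    """Return (column_a, column_b, r) for every column pair with |r| >= threshold.

    `table` maps a column name to a list of numbers. The threshold is an
    arbitrary illustrative cut-off, not a recommended value.
    """
    names = sorted(table)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                results.append((a, b, r))
    return results


# Toy data invented for this example; it does not come from any real repository.
records = {
    "ad_spend": [10, 20, 30, 40, 50],
    "units_sold": [12, 24, 33, 46, 55],
    "store_temp": [21, 19, 22, 20, 21],
}

for a, b, r in correlated_pairs(records):
    print(f"{a} and {b} are strongly correlated (r = {r:.3f})")
```

On this toy table only the ad_spend and units_sold columns pass the threshold. Real data mining tools apply far more sophisticated statistical and pattern recognition techniques across much larger repositories, and a strong correlation alone does not establish cause and effect.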
3.2 POSITIVES AND NEGATIVES
Perhaps one of the most positive aspects of data mining is its flexibility when it comes to providing insights into data. Large datasets can provide an opportunity for new and innovative ways of surfacing data. A good example of this is the framework described in Schultz, Eskin, Zadok, and Stolfo (2001), which was used to detect malicious executables by looking for patterns in data sets. From a business perspective, pattern analysis can be used to make faster, more informed business decisions.
It does, however, take time to gather and structure data and to put in place the infrastructure components needed to extract the benefits of data mining. Whilst there are many commercially available tools to assist in the process, technical skill is still required to complete the extraction, transformation and loading tasks. Non-technical staff need to be trained to extract and interpret the data correctly, so as not to misinform or negatively affect critical and everyday business decisions.
3.3 RELEVANCE TO BUSINESS INTELLIGENCE
Typically, commercial and custom-built business intelligence applications use data extracted from a data warehouse. For data mining to be relevant to business intelligence, the business must understand that it is an enabling technology that requires ongoing maintenance and support to remain a viable decision-making tool. A well-built and well-managed data mining capability can provide ongoing support for answering key business questions such as “what are future sales likely to be if event x happens?” Without the underlying data mining capability, business intelligence cannot be produced for strategic use within an organisation.
3.4 USE AND APPLICATION
Significant benefits have been gained for both small and large businesses in the area of Customer Relationship Management. No longer is the financial capacity of an organisation a major benefit when it comes to understanding where the potential value lies in your customer base. The ability to…