
Vol. 7 No. 11, 2016 Text Mining: Techniques, Applications ...

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 11, 2016

Text Mining: Techniques, Applications and Issues

Ramzan Talib, Muhammad Kashif Hanif, Shaeela Ayesha, and Fakeeha Fatima
Department of Computer Science, Government College University, Faisalabad, Pakistan

Abstract: Rapid progress in digital data acquisition techniques has led to a huge volume of data. More than 80 percent of today's data is unstructured or semi-structured. Discovering appropriate patterns and trends to analyze text documents in such massive volumes of data is a big issue. Text mining is a process of extracting interesting and non-trivial patterns from huge amounts of text documents. There exist different techniques and tools to mine text and discover valuable information for future prediction and decision-making processes.


The selection of the right text mining technique helps to enhance speed and decrease the time and effort required to extract valuable information. This paper briefly discusses and analyzes text mining techniques and their applications in diverse fields of life. Moreover, the issues in the field of text mining that affect the accuracy and relevance of results are identified.

Keywords: Classification; Knowledge Discovery; Applications; Information Extraction

I. INTRODUCTION

The size of data is increasing at exponential rates day by day. Almost all types of institutions, organizations, and business industries are storing their data electronically. A huge amount of text is flowing over the internet in the form of digital libraries, repositories, and other textual information such as blogs, social media networks, and e-mails [1].

It is a challenging task to determine appropriate patterns and trends to extract valuable knowledge from this large volume of data [2]. Traditional data mining tools are incapable of handling textual data, since extracting information from it requires considerable time and effort. Text mining is a process of extracting interesting and significant patterns to explore knowledge from textual data sources [3]. Text mining is a multi-disciplinary field based on information retrieval, data mining, machine learning, statistics, and computational linguistics [3]. Figure 1 shows the Venn diagram of text mining and its interaction with other fields. Several text mining techniques, such as summarization, classification, and clustering, can be applied to extract information. Text mining deals with natural language text stored in semi-structured and unstructured formats [4].

Text mining techniques are continuously applied in industry, academia, web applications, the internet, and other fields [5]. Application areas such as search engines, customer relationship management systems, e-mail filtering, product suggestion analysis, fraud detection, and social media analytics use text mining for opinion mining, feature extraction, and sentiment, predictive, and trend analysis [6]. The generic text mining process performs the following steps (Figure 2):

- Collecting unstructured data from different sources, available in different file formats such as plain text, web pages, and PDF files.
- Performing pre-processing and cleansing operations to detect and remove anomalies. The cleansing process makes sure to capture the real essence of the available text: it removes stop words, applies stemming (the process of identifying the root of a word), and indexes the data [7].
- Applying processing and controlling operations to audit and further clean the data set by automatic processing.
- Implementing pattern analysis by a Management Information System (MIS).
- Using the information processed in the above steps to extract valuable and relevant information for effective and timely decision making and trend analysis [8].

Fig. 1. Venn diagram of text mining interaction with other fields [4]

Fig. 2. Text mining process [5]

Extracting valuable information from a corpus of different documents is a tedious and tiresome task. Selecting an appropriate technique for mining text reduces the time and effort needed to find relevant patterns for analysis and decision making. The objective of this paper is to analyze different text mining techniques that help to perform text analytics effectively and efficiently on large amounts of data.
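The pre-processing step described above (stop-word removal, stemming, and indexing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of [7]: the stop-word list and suffix rules are toy examples chosen for this sketch (a real system would use a full stop-word list and a proper stemmer such as Porter's), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Toy stop-word list (an assumption; real systems use much larger lists).
STOP_WORDS = {"a", "an", "and", "are", "from", "in", "is", "of", "on", "the", "to"}

# Toy suffix rules, tried in order (a crude stand-in for a real stemmer such as Porter's).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words like "process" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: stemmed term -> ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return dict(index)
```

For example, `preprocess("Extracting patterns from the documents")` yields `["extract", "pattern", "document"]`, and indexing `{"d1": "Text documents", "d2": "Extracting text"}` maps the term `"text"` to both document ids. The inverted index is what lets later steps look up candidate documents by term without rescanning the whole corpus.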

Moreover, the issues that arise during the text mining process are discussed. The paper is organized as follows. Previous work is discussed in Section II. In Section III, different text mining techniques are explained. Section IV presents the application areas of text mining techniques. In Section V, issues and challenges in the text mining field are highlighted. Section VI concludes the paper.

II. REVIEW OF LITERATURE

[5] described gathering, extracting, pre-processing, text transformation, feature extraction, pattern selection, and evaluation as the steps of the text mining process. In addition, widely used text mining techniques such as clustering, categorization, and decision tree categorization, and their application in diverse fields, are surveyed.

[8] highlighted the issues in text mining applications and techniques. They discussed that, with traditional mining tools and techniques, dealing with unstructured text is more difficult than dealing with structured or tabular data. They showed applications of the text mining process in bioinformatics, business intelligence, and national security systems. Natural language processing and entity recognition techniques have reduced the issues that occur during the text mining process; however, some issues still need attention. [9] explored the MEDLINE biomedical database by integrating a framework for named entity recognition, text classification, hypothesis generation and testing, relationship and synonym extraction, and abbreviation extraction.
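Abbreviation extraction, one component of the framework described for [9], can be illustrated with a simplified version of the well-known right-to-left character-matching heuristic of Schwartz and Hearst (2003). This is not the method used in [9]; it is a sketch that assumes short forms appear in parentheses after their long form, and the regular expression, word window, and function names are assumptions made for illustration.

```python
import re
from typing import Optional


def find_long_form(short: str, candidate: str) -> Optional[str]:
    """Match the short form's characters right-to-left inside the candidate text."""
    s_idx = len(short) - 1
    l_idx = len(candidate) - 1
    while s_idx >= 0:
        ch = short[s_idx].lower()
        if not ch.isalnum():
            s_idx -= 1  # ignore punctuation such as '-' in the short form
            continue
        # Walk left until the character matches; the first short-form
        # character must also be at the start of a word.
        while l_idx >= 0 and (
            candidate[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None  # no valid long form in the window
        l_idx -= 1
        s_idx -= 1
    # Extend back to the beginning of the word where matching stopped.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


def extract_abbreviations(sentence: str) -> dict[str, str]:
    """Return {short form: long form} for every 'long form (SF)' pattern found."""
    pairs: dict[str, str] = {}
    for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", sentence):
        short = match.group(1)
        words = sentence[: match.start()].split()
        # Search window: at most min(|SF| + 5, 2 * |SF|) words before the parenthesis.
        max_words = min(len(short) + 5, 2 * len(short))
        candidate = " ".join(words[-max_words:])
        long_form = find_long_form(short, candidate)
        if long_form:
            pairs[short] = long_form
    return pairs
```

On the sentence "The text was mined with named entity recognition (NER) and support vector machines (SVM).", this sketch pairs NER with "named entity recognition" and SVM with "support vector machines". Matching from the right keeps the long form as short as possible, which avoids swallowing unrelated leading words such as "mined with".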

This new framework helps to eliminate unnecessary details and extract valuable information. [10] analyzed text using text mining patterns and showed that term-based approaches cannot properly handle synonymy and polysemy. Moreover, a prototype model was designed to specify patterns by assigning them weights according to their distribution. This approach helps to enhance the efficiency of the text mining process. [11] presented a crime detection system using text mining tools, in which a relation discovery algorithm was designed to correlate terms with abbreviations. [12] presented a top-down and bottom-up approach for a web-based text mining process. To combine similar text documents, they applied the k-means clustering technique for bottom-up partitioning.
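The k-means partitioning mentioned for [12] can be sketched as plain Lloyd's k-means over document vectors. This is a minimal sketch, not the setup of [12]: documents are represented here as L2-normalized term-frequency vectors, distances are squared Euclidean, and the first k documents seed the centroids so the result is deterministic. All of these choices, and the function names, are assumptions.

```python
import math
from collections import Counter


def vectorize(docs: list[str]) -> list[list[float]]:
    """Turn each document into an L2-normalized term-frequency vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for empty docs
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means; seeding with the first k vectors is an assumption made for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With the documents "stock market prices", "football match score", "market prices fall", and "football score today", `kmeans(vectorize(docs), 2)` groups the two market documents together and the two football documents together. Normalizing the vectors first means long and short documents on the same topic end up close to each other.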

To measure the similarity between documents, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm was used to find information regarding specific subjects. [13] gave an overview of the applications, tools, and issues that arise in mining text. They discussed that documents may be structured, semi-structured, or unstructured, and that extracting useful information from them is a tiresome task. They presented a generic framework for concept-based mining, which can be visualized as text refinement and knowledge distillation. The intermediate form of entity representation used for mining depends on the specific domain. [14] presented innovative and efficient pattern discovery techniques. They used pattern evolving and discovering techniques to enhance the effectiveness of discovering relevant and appropriate information.
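The TF-IDF similarity measure mentioned above can be illustrated with a short sketch. It uses one common weighting variant, tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t, together with cosine similarity between the weight vectors. The variant and the function names are assumptions for illustration; other definitions (for example smoothed or log-scaled tf) are also widely used.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For the tokenized documents "text mining extracts patterns", "data mining extracts patterns", and "football match report", the first two share weighted terms and get a positive cosine similarity, while the first and third share none and score 0.0. A term that appears in every document receives weight log(1) = 0, which is how IDF discounts uninformative words.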

They performed BM25 and support vector machine based filtering on Reuters Corpus Volume 1 and Text REtrieval Conference (TREC) data to estimate the effectiveness of the suggested technique. [15] performed various classification experiments using multi-word features on text. They proposed a hand-crafted method to extract multi-word features from the data set. To classify and extract multi-word text, they divided the text into linear and nonlinear polynomial forms in a support vector machine, which improved the effectiveness of the extracted text.

III. TEXT MINING TECHNIQUES

Various text mining techniques are available for analyzing text patterns and their mining process [16]. Figure 3 shows the Venn diagram of the interrelationship among text mining techniques and their core functionality.
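The BM25 filtering used by [14] relies on the Okapi BM25 ranking function, which scores a document D for a query Q as the sum over query terms q of IDF(q) · f(q, D)(k1 + 1) / (f(q, D) + k1(1 − b + b·|D| / avgdl)). The sketch below implements that formula; it is not the filtering setup of [14]. The parameters k1 = 1.2 and b = 0.75 are common defaults chosen here as assumptions, and the IDF uses the non-negative variant ln((N − n(q) + 0.5) / (n(q) + 0.5) + 1).

```python
import math
from collections import Counter


def bm25_scores(
    query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue  # query term absent: contributes nothing
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

For the query "text mining" over the documents ["text", "mining", "patterns"], ["pattern", "discovery", "text", "text"], and ["football", "match"], the first document scores highest because it contains both terms, the second scores lower despite repeating "text" (the k1 term saturates repeated occurrences), and the third scores 0.0.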

