Text mining, sometimes referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
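The three stages described above (structuring the text, deriving patterns, evaluating the output) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the sample documents, and the choice of TF-IDF weighting with cosine similarity are not taken from the article; they are one common way to turn text into structured data and look for similarity patterns.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to", "on"}

def tokenize(text):
    """Structure the input: lowercase, split on non-letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Derive a pattern: cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Made-up sample documents.
docs = [
    "Gene expression in the cell nucleus",
    "Protein and gene interaction in the cell",
    "Stock markets and interest rates",
]
vectors = tf_idf(docs)
sim_bio = cosine(vectors[0], vectors[1])    # two biology-related texts
sim_mixed = cosine(vectors[0], vectors[2])  # biology text vs. finance text
print(sim_bio > sim_mixed)
```

The final comparison stands in for the evaluation step: the two biology-related texts share weighted terms, while the finance text shares none with the first document. A real system would use a proper parser, a larger stop-word list, and a clustering or classification algorithm on top of the vectors.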
History

Labor-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to advance swiftly during the past decade. Text mining is an interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (common estimates say over 80%[1]) is currently stored as text, text mining is believed to have high commercial potential. Increasing interest is being paid to multilingual data mining: the ability to gain information across languages and cluster similar items from different linguistic sources according to their meaning.

Applications

Recently, text mining has received attention in many areas.

Security applications

Many text mining software packages are marketed towards security applications, particularly analysis of plain text sources such as Internet news.

Biomedical applications

A range of text mining applications in the biomedical literature has been described.[2] One example is PubGene, which combines biomedical text mining with network visualization as an Internet service.[3] Another example, which uses ontologies with text mining, is GoPubMed.org.[4] Semantic similarity has also been used by text-mining systems, namely GOAnnotator.[5]

Software and applications

Research and development departments of major companies, including IBM and Microsoft, are researching text mining techniques and developing programs to further automate the mining and analysis processes. Text mining software is also being researched by different companies working in the area of search and indexing in general as a way to improve their results.
Online media applications

Text mining is being used by large media companies, such as the Tribune Company, to disambiguate information and to provide readers with better search experiences, which in turn increases site "stickiness" and revenue. Additionally, on the back end, editors benefit from being able to share, associate and package news across properties, significantly increasing opportunities to monetize content.

Marketing applications

Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management. Coussement and Van den Poel (2008) apply it to improve predictive analytics models for customer churn (customer attrition).[6]

Sentiment analysis

Sentiment analysis may, for example, involve analysis of movie reviews to estimate how favorable a review is for a movie.[7] Such an analysis may require a labeled data set or labeling of the affectivity of words. A resource for affectivity of words has been made for WordNet.[8]

Academic applications

Text mining is important to publishers who hold large databases of information requiring indexing for retrieval. This is particularly true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and NIH's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access. Academic institutions have also become involved in the text mining initiative: the National Centre for Text Mining (NaCTeM) is the first publicly funded text mining centre in the world. NaCTeM is operated by the University of Manchester [1] in close collaboration with the Tsujii Lab, University of Tokyo.
NaCTeM provides customised tools and research facilities and offers advice to the academic community. It is funded by the Joint Information Systems Committee (JISC) and two of the UK Research Councils (EPSRC & BBSRC). Its initial focus was text mining in the biological and biomedical sciences; research has since expanded into the social sciences. In the United States, the School of Information at the University of California, Berkeley is developing a program called BioText to assist bioscience researchers in text mining and analysis.

Notable software and applications

A large number of companies provide commercial computer programs.
Notable open-source software and applications
Implications

Until recently, websites most often used text-based lexical searches; in other words, users could find documents only by the words that happened to occur in them. Text mining may allow searches to be directly answered by the semantic web; users may be able to search for content based on its meaning and context, rather than just by a specific word. Additionally, text mining software can be used to build large dossiers of information about specific people and events. For example, by using software that extracts specific facts about businesses and individuals from news reports, large datasets can be built to facilitate social network analysis or counter-intelligence. In effect, the text mining software may act in a capacity similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis. Text mining is also used in some email spam filters as a way of determining the characteristics of messages that are likely to be advertisements or other unwanted material.
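As an illustration of the spam-filter use mentioned above, the following is a minimal sketch of a word-based naive Bayes classifier. The article does not say which technique spam filters use; naive Bayes is an assumption chosen here because it is a common and simple option. The class name `NaiveBayesFilter`, the labels "spam" and "ham", and the training messages are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    """Hypothetical two-class word-count model with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record the words of one labeled message."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text, label):
        """Log of P(label) times the product of P(word | label)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the product.
            log_prob += math.log((self.word_counts[label][word] + 1) / (total + len(vocab)))
        return log_prob

    def classify(self, text):
        """Return the label with the higher log score."""
        return max(("spam", "ham"), key=lambda label: self.score(text, label))

# Made-up training messages; both labels must be trained before scoring.
filt = NaiveBayesFilter()
filt.train("buy cheap pills now", "spam")
filt.train("cheap offer buy now", "spam")
filt.train("meeting agenda for monday", "ham")
filt.train("please review the project report", "ham")

label_a = filt.classify("cheap pills offer")
label_b = filt.classify("project meeting report")
print(label_a, label_b)
```

The model learns which words are characteristic of each class from labeled examples, which matches the idea of "determining the characteristics of messages" in the text. Production filters typically combine many more signals than word counts.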