BIOMEDICAL TEXT MINING
Text mining, also called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. Such information is typically obtained by identifying patterns and trends, for example through statistical pattern learning. Text mining usually involves structuring the input text (typically by parsing it, adding some derived linguistic features, removing others, and inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
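The three stages described above (structuring the text, deriving patterns, interpreting the output) can be illustrated with a minimal sketch. The corpus, the stop-word list, and the function names structure, derive_patterns, and interpret are all hypothetical, chosen only for illustration; a real system would use a proper parser and a statistical model rather than simple document counts.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the sentences are made up for illustration.
DOCS = [
    "BRCA1 mutations are associated with breast cancer.",
    "TP53 mutations are found in many cancer types.",
    "Aspirin reduces the risk of heart attack.",
]

# A tiny, hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"are", "with", "in", "the", "of", "many", "is", "a"}


def structure(text):
    """Stage 1: parse raw text into lowercase tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def derive_patterns(docs):
    """Stage 2: count in how many documents each term appears."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(structure(doc)))
    return doc_freq


def interpret(doc_freq, min_docs=2):
    """Stage 3: keep only the terms that recur across documents."""
    return sorted(t for t, n in doc_freq.items() if n >= min_docs)


recurring_terms = interpret(derive_patterns(DOCS))
print(recurring_terms)  # ['cancer', 'mutations']
```

Here the "pattern" is simply a term that recurs in at least two documents; the min_docs threshold is an arbitrary choice for this toy example.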
Text mining often uses "stemming", which reduces each word to its root form. Words that share a root stem are treated as the same term, so related items can be grouped together and the number of distinct categories reduced.
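As a rough illustration of how stemming reduces the number of categories, the sketch below strips a few common English suffixes and groups words by the resulting stem. The suffix list and the functions naive_stem and group_by_stem are assumptions made for this example; this is a deliberately crude stemmer, not the Porter algorithm or any particular tool's method.

```python
from collections import defaultdict

# Suffixes tried in order. This toy list is an assumption for illustration
# and is much cruder than an established stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the sorted list of surface forms that share it."""
    groups = defaultdict(set)
    for w in words:
        groups[naive_stem(w)].add(w.lower())
    return {stem: sorted(forms) for stem, forms in groups.items()}


words = ["bind", "binds", "binding", "inhibit", "inhibited", "inhibits"]
groups = group_by_stem(words)
print(groups)
# {'bind': ['bind', 'binding', 'binds'],
#  'inhibit': ['inhibit', 'inhibited', 'inhibits']}
```

Six surface forms collapse into two categories. A rule set this simple will mis-stem many words (for example, it would not map "gene" and "genes" to the same stem), which is one reason practical systems rely on more careful stemmers or lemmatizers.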
Biomedical text mining (also known as BioNLP) refers to text mining applied to texts and literature of the biomedical and molecular biology domain. It is a relatively recent research field at the intersection of natural language processing, bioinformatics, medical informatics and computational linguistics.
Interest in text mining and information extraction strategies for the biomedical and molecular biology literature is growing because of the increasing number of electronically available publications stored in databases such as PubMed.
Some tools used for biomedical text mining are listed below:
KLEIO - an advanced information retrieval system providing knowledge enriched searching for biomedicine.
FACTA+ - a MEDLINE search engine for finding associations between biomedical concepts. The FACTA+ Visualizer aids intuitive understanding of FACTA+ search results by displaying them graphically.
U-Compare - an integrated text mining/natural language processing system based on the UIMA Framework, with an emphasis on components for biomedical text mining.
TerMine - a term management system that identifies key terms in biomedical and other text types.
MEDIE - an intelligent search engine to retrieve biomedical correlations from MEDLINE, based on indexing by natural language processing and text mining techniques.
AcroMine - an acronym dictionary which can be used to find distinct expanded forms of acronyms from MEDLINE.
AcroMine Disambiguator - disambiguates abbreviations in biomedical text with their correct full forms.
GENIA tagger - analyses biomedical text and outputs base forms, part-of-speech tags, chunk tags, and named entity tags.
NEMine - recognizes gene/protein names in text.
Yeast MetaboliNER - recognizes yeast metabolite names in text.
Smart Dictionary Lookup - machine learning-based gene/protein name lookup.
Chilibot - a tool for finding relationships between genes or gene products.