
Extracting Meaning from Text with OpenCalais R3

Published text that is converted into a formal structure can be summarized and combined with other text to provide new insights.


A big challenge companies face today is that most information, both online and archived, exists only as published text with no formal structure suitable for synthesis. Once information is in a formal structure, it can be summarized, used to help locate meaningful text, and combined with other text to provide new insights. This article shows how to convert unstructured written text into structured data using OpenCalais, a public, general-purpose text-extraction service that combines statistical and grammatical analysis to extract meaning. OpenCalais is not the only solution available for extracting meaning from text, but it is the only publicly available web service.

Information Extraction
The simplest way to categorize a document or paragraph is to use word associations. For example, if the words "earnings" and "acquired" both appear in a document, it is likely a document about business finances. Furthermore, if the word "Reuters" appears mostly in business finance documents, then other documents containing this word are also likely to be about business finances. This technique is called statistical analysis and is commonly used for document categorization. Statistical analysis is one of the techniques OpenCalais uses to categorize documents and identify what the text refers to.
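
To make the word-association idea concrete, here is a minimal sketch in Python of a naive Bayes style word-count classifier. It is not how OpenCalais is implemented; it only illustrates the general technique described above. The function names (tokenize, train, classify), the two category labels, and the four training sentences are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count how often each word appears under each category label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label whose word statistics best match the text."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Start from the prior: how common this category is overall.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for this sketch.
TRAINING_DOCS = [
    ("business", "Quarterly earnings rose after the company acquired a rival, Reuters reported"),
    ("business", "Reuters says earnings fell as the firm acquired new debt"),
    ("sports", "The team won the match after a late goal"),
    ("sports", "The coach praised the team after the final match"),
]

word_counts, doc_counts = train(TRAINING_DOCS)

if __name__ == "__main__":
    print(classify("Reuters reported strong earnings", word_counts, doc_counts))
```

With this toy data, a sentence containing "Reuters" and "earnings" scores higher for the business category because those words occur only in the business training sentences. A real categorizer would need far more training text and a more careful treatment of common words such as "the".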

