Automate Metadata Extraction for Corporate Search and Mashups

Learn how to extract document semantics with Apache UIMA. 


There are some exciting developments in automated metadata extraction, with implications for better semantic search and corporate mashups. Advanced open source tools created by linguists to recognize the meaning of words in documents are now becoming an order of magnitude more cost effective to use. The arrival of the Apache Unstructured Information Management Architecture (UIMA, pronounced "you-ee-ma") framework makes these tools accessible to non-programmers. Adding semantically precise metadata to documents opens the door to new semantic web applications, including better document search and document mashups.

Web Search Drives Expectations

Many people marvel at the power and precision of web search engines like Google. But if you use the Google search engine to find a Microsoft Word document inside a corporate web site, you might get less than stellar results. There is a simple reason for this: most internal documents don't have the rich web linking that public web sites have. Search engines such as Google use the number of links that point to a document to help rank the search results. Without those links, the documents are unlikely to rank high enough to be found.
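
The effect of missing links can be illustrated with a deliberately simplified sketch. It assumes ranking is based on nothing but the count of inbound links, which is far cruder than what any real search engine does. The function name, document names, and link data are all hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(matches, links):
    """Order matching documents by how many links point at them.

    matches: documents that matched the query, in arbitrary order.
    links:   (source, target) pairs describing who links to whom.
    """
    inbound = Counter(target for _source, target in links)
    # sorted() is stable, so documents with equal counts keep their input order.
    return sorted(matches, key=lambda doc: inbound[doc], reverse=True)


# Hypothetical link data: only the public pages are linked from elsewhere.
links = [
    ("blog.example.org", "public/whitepaper.html"),
    ("news.example.com", "public/whitepaper.html"),
    ("forum.example.net", "public/faq.html"),
]
matches = [
    "intranet/budget.doc",
    "public/faq.html",
    "intranet/plan.doc",
    "public/whitepaper.html",
]

ranked = rank_by_inbound_links(matches, links)
for doc in ranked:
    print(doc)
```

In this toy data the two intranet documents have no inbound links, so they tie at the bottom and the link signal cannot tell them apart. That is the gap that metadata extracted from the documents themselves is meant to fill.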
