IBM Open Sources WebFountain (UIMA) - Unstructured Text Analysis software

It sounds intriguing. If only I could really get it. IBM has developed software called UIMA, for Unstructured Information Management Architecture, that can be used to analyze unstructured information. It can perhaps help perform trend analysis across documents, determine the theme and gist of documents, and allow fuzzy searches on unstructured documents.

UIMA stands for the Unstructured Information Management Architecture. It is an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components.

IBM is making UIMA available as free, open source software to provide a common foundation for industry and academia to collaborate and accelerate the world-wide development of technologies critical for discovering the vital knowledge present in the fastest-growing sources of information today. IBM has empowered its products and services with UIMA, creating a channel for third-party vendors to deploy their text and multi-modal analytics in larger integrated solutions.

The premier product platform that exposes the UIMA interfaces to the customer is WebSphere Information Integrator OmniFind Edition. To try out the UIMA software framework, download the free UIMA Software Development Kit (SDK) from IBM’s alphaWorks site: http://www.alphaworks.ibm.com/tech/uima.

The UIMA homepage is here: http://www.research.ibm.com/UIMA/

Here is an article that describes UIMA and the (theory of) development of a UIMA-based application: Building an example application with the Unstructured Information Management Architecture. Again, it sounds very exciting. But this article is not very accessible.

What is the Unstructured Information Management Architecture (UIMA) SDK?

Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies. IBM’s UIMA is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines. The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications. The framework is not specific to any IDE or platform.
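To make the plug-in idea a bit more concrete, here is a minimal sketch in Python. It is not the UIMA API (the SDK itself is Java); the names `Document`, `tokenize`, `count_terms` and `run_pipeline` are hypothetical and only illustrate how independently written components might share one analysis result and be composed by a small run-time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Shared container that every component reads from and writes to (hypothetical)."""
    text: str
    results: Dict[str, object] = field(default_factory=dict)


# A component is any callable that takes a Document and enriches it in place.
Component = Callable[[Document], None]


def tokenize(doc: Document) -> None:
    # Naive whitespace tokenizer; a real component would be far more careful.
    doc.results["tokens"] = doc.text.lower().split()


def count_terms(doc: Document) -> None:
    # Depends on the output of an earlier component ("tokens").
    counts: Dict[str, int] = {}
    for token in doc.results["tokens"]:
        counts[token] = counts.get(token, 0) + 1
    doc.results["term_counts"] = counts


def run_pipeline(components: List[Component], text: str) -> Document:
    # The "run-time": it only knows the component interface, not the components.
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline([tokenize, count_terms], "open source open platform")
print(doc.results["term_counts"])
```

The point of the sketch is that `run_pipeline` never needs to change when a new component is added, which is roughly what "plug in and run" suggests.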

This technology, the UIMA SDK (Software Development Kit), is an all-Java™ implementation of the UIMA framework, and it supports the implementation, description, composition, and deployment of UIMA components and applications. It also supports the developer with an Eclipse-based development environment that includes a set of tools and utilities for using UIMA.

One large, but not the only, application area of text analysis is improving text search. By detecting important terms and topics within documents, semantic search engines provide the capability to search for concepts and relationships instead of keywords. UIMA processing occurs through a series of modules called analysis engines. The result of analysis is an assignment of semantics to the elements of unstructured data, for example, the indication that the phrase “Washington” refers to a person’s name or that it refers to a place. UIMA supports the rendering of these results in conventional structures (for example, relational databases or search engine indices), where the content of the original unstructured information may efficiently be accessed according to its inferred semantics.
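As I understand it, the "Washington" example could look roughly like the Python sketch below. It is my own illustration and not UIMA code: the `Annotation` class, the cue-word lists and the `build_index` function are all assumptions, and a real analysis engine would use much better disambiguation than looking at the preceding word.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """A labelled span of text (hypothetical structure, not a UIMA type)."""
    begin: int
    end: int
    label: str  # "Person" or "Place"


# Toy cue words, assumed purely for illustration.
PERSON_CUES = {"president", "general", "mr."}
PLACE_CUES = {"in", "to", "from"}


def annotate_washington(text: str) -> List[Annotation]:
    """Label each 'Washington' by the word directly before it."""
    annotations = []
    for match in re.finditer(r"Washington", text):
        preceding = text[:match.start()].split()
        prev = preceding[-1].lower() if preceding else ""
        if prev in PERSON_CUES:
            label = "Person"
        elif prev in PLACE_CUES:
            label = "Place"
        else:
            continue  # leave ambiguous mentions unannotated
        annotations.append(Annotation(match.start(), match.end(), label))
    return annotations


def build_index(docs: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Map (label, covered text) to the ids of documents containing it."""
    index: Dict[Tuple[str, str], List[str]] = {}
    for doc_id, text in docs.items():
        for ann in annotate_washington(text):
            key = (ann.label, text[ann.begin:ann.end])
            ids = index.setdefault(key, [])
            if doc_id not in ids:
                ids.append(doc_id)
    return index


docs = {
    "d1": "President Washington crossed the Delaware.",
    "d2": "The team flew to Washington on Monday.",
}
index = build_index(docs)
print(index[("Person", "Washington")])  # ['d1']
print(index[("Place", "Washington")])   # ['d2']
```

A keyword search for "Washington" would return both documents; looking up the pair of label and text returns only the one with the intended meaning, which is the kind of access "according to its inferred semantics" the description refers to.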

UIMA is specifically designed to support the developer in the creation, integration, deployment, and sharing of components across platforms and among dispersed teams with different skills working to develop advanced analytics.

See the UIMA SDK User’s Guide and Reference (342-page PDF): http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf

It seems to me that there is at least some similarity between some of the UIMA functionality and what Oracle Text offers. I have to add, though, that we have not been able to get Oracle Text’s ABOUT operator to work. According to the documentation, that operator should be able to perform thematic searches and even summary creation.
