Oracle Text

The Dutch Oracle Usergroup (OgH) organized a session on the (text) searching capabilities of Oracle technology. The presentation was given by Wouter van de Weghe of Oracle and consisted of three parts: an explanation of Oracle Text, a discussion of UltraSearch and an introduction to Enterprise Search.

Oracle Text
Wouter gave a good explanation of Oracle Text, formerly known as ConText and interMedia Text (and probably other names as well). He stressed that Oracle Text is a technology, not a product: it is a standard part of the database and can be used from a range of technologies and products, such as PL/SQL and Java. Oracle Text indexes text and offers smart ways to access the indexed data. It can index different file types (plain text, PDF, mail, Word, XML, etc.) in many locations (inside and outside the database and on the web) and in different languages, and it offers presentation options such as highlighting and summaries. Search options include phrases, full matching, fuzzy searching, proximity searching, about, stemming, themes, case (in)sensitivity, diacritic options, thesaurus options, language-specific features and more.
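To make the case-insensitivity and diacritic options a bit more concrete, here is a minimal Python sketch of the general idea: both the search term and the text are reduced to a common form before they are compared. This is only an illustration of the concept, not how Oracle Text implements it; the function names `normalize` and `matches` are made up for this example.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip diacritics and fold case, so 'Café' and 'cafe' compare equal."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def matches(term: str, document: str) -> bool:
    """Return True if the normalized term occurs as a whole word in the document."""
    # Hypothetical helper: splits on whitespace only, no punctuation handling.
    return normalize(term) in normalize(document).split()
```

With this, `matches("CAFE", "Het café is open")` is true even though the case and the accent differ. A real text index would apply the normalization once, at indexing time, instead of on every query.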
The indexing process consists of several steps, such as filtering, structuring and removing stop words (words like and, the, for). Some of these steps are performed by external tools from third parties. The result is a so-called ‘inverted index’ (stored in the dr$ tables), with the references kept in a binary format. To prevent fragmentation of the index, it is recommended to schedule the indexing process rather than update the index on every change.
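The idea of an inverted index with stop-word removal can be sketched in a few lines of Python. This is a conceptual toy and says nothing about the actual layout of the dr$ tables or the binary format Oracle uses; the stop-word list, the sample documents and all function names are assumptions made for the illustration.

```python
from collections import defaultdict

# Illustrative stop-word list, taken from the examples in the text.
STOP_WORDS = {"and", "the", "for"}


def tokenize(text):
    """Lowercase the text, trim surrounding punctuation and drop stop words."""
    words = (w.strip(".,;:!?()'\"").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_inverted_index(documents):
    """Map every remaining word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every non-stop word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Oracle Text indexes documents for the database.",
    2: "UltraSearch is a web-based application.",
    3: "The database and the application server.",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "oracle database")` then only has to intersect two small sets of document ids instead of scanning every document, which is what makes this structure attractive for text search.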

Oracle UltraSearch
UltraSearch is a web-based application built on Oracle Text. It is included with the database, the application server or the collaboration suite.

Oracle Enterprise Search
Oracle Enterprise Search will be the successor to UltraSearch and will be available later this year as a stand-alone product. With Enterprise Search, Oracle aims to ‘become the Google of the intranet’. It consists of five parts: the crawler, the server, the query UI, administration and the federator. It will index websites, text documents, any database accessible through JDBC, other data sources such as Documentum and Lotus Notes, email and much more. It comes with web services and integrates with Portal. Wouter demonstrated that OTN has just been upgraded to Enterprise Search.

Summary
Oracle Text offers very advanced technology for searching and indexing text. It can be used quite easily in PL/SQL and Java applications. With UltraSearch (and later Enterprise Search), Oracle offers an out-of-the-box web-based search application that can also be integrated into other applications. Be aware that searching can be a complex issue, so don’t let all the possibilities overwhelm you. Using only basic features, such as case-insensitive searching or searching without diacritics, can be very useful, will boost performance and may result in a clearer application design.