Using Lucene with Spring Introduction to Spring Modules

While preparing a Lucene presentation and workshop I came
across Spring Modules. It’s a small set of libraries that does things the
Spring way. It isn’t included in Spring itself because it doesn’t belong to
Spring’s core business. I won’t cover everything Spring Modules can do;
I’ll focus only on the Lucene library. Nevertheless, it’s worthwhile to take a
peek at Spring Modules yourself.

Lucene is a very fast and easy-to-use Java-based search
engine. Spring Modules (SM) hides some implementation details and lets you
configure the search engine in XML (as you’re used to in Spring).
Before you get started

SM can be downloaded here
(go to Project Tools, then Documents & Files, and pick the version with dependencies).
For this article I used version 0.4. Since the Spring community is very active,
it’s likely that newer versions will be released soon.

Creating the directory

In Lucene you can use a file-based directory (FSDirectory) or an in-memory directory,
called a RAMDirectory. The beauty of Spring is that you can change the
directory without changing your code. Let’s take a look at the bean
definitions for both directories.


<bean id="ramDirectory"
            class="org.springmodules.lucene.index.support.RAMDirectoryFactoryBean" />

<bean id="fsDirectory"
            class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
            <property name="location" value="file:///C:/temp/index/"/>
            <property name="create" value="true"/>
</bean>

<alias name="ramDirectory" alias="indexDirectory" />

The first definition is the RAMDirectory; nothing special
here. The second bean has some options. You have to set the location of the
index (this is a file system directory) and, with the create property, whether
you want to build a full index or an incremental index.

Finally I included an alias. With this alias you can easily
switch between indexes: indexDirectory is referenced in at least two places,
and you don’t want to change both of them every time you swap
indexes.
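The effect of the alias can be pictured as a name lookup with one level of indirection. The sketch below is plain Java and does not use Spring at all: ToyRegistry, its methods and the string values standing in for directories are all hypothetical, chosen only to show that code asking for indexDirectory never needs to know which concrete bean is behind it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a bean container; this is NOT Spring's real API.
class ToyRegistry {
    private final Map<String, Object> beans = new HashMap<>();
    private final Map<String, String> aliases = new HashMap<>();

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    // Point an alias at an existing bean name (replaces any earlier target).
    void alias(String name, String alias) {
        aliases.put(alias, name);
    }

    // Resolve one level of aliasing, then look up the bean.
    Object get(String name) {
        String target = aliases.containsKey(name) ? aliases.get(name) : name;
        Object bean = beans.get(target);
        if (bean == null) {
            throw new IllegalArgumentException("No bean named " + name);
        }
        return bean;
    }
}

class AliasSketch {
    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        // Strings stand in for the two directory beans.
        registry.register("ramDirectory", "in-memory index");
        registry.register("fsDirectory", "file-based index");

        registry.alias("ramDirectory", "indexDirectory");
        System.out.println(registry.get("indexDirectory"));

        // Swapping indexes touches only the alias, not the callers.
        registry.alias("fsDirectory", "indexDirectory");
        System.out.println(registry.get("indexDirectory"));
    }
}
```

In the real configuration the equivalent of the second alias call is changing the name attribute of the alias element from ramDirectory to fsDirectory.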

Creating the index

Our first step is to convert our domain objects to Lucene
Documents. We do this by implementing the DocumentCreator interface. This interface
declares the createDocument()
method, which returns a Lucene Document. Just convert your domain object in this
method and use the field names you prefer. In my test application I have an
Employee object that implements DocumentCreator; my createDocument() method looks
like this:


public Document createDocument() {
    Document doc = new Document();

    doc.add(new Field("name", getName(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("department", getDepartment(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("location", getLocation(), Field.Store.YES, Field.Index.TOKENIZED));

    return doc;
}

Now we have to define an index factory:


<bean id="indexFactory"
            class="org.springmodules.lucene.index.support.SimpleIndexFactoryBean">

            <property name="directory">
                        <ref bean="indexDirectory" />
            </property>
            <property name="analyzer">
                        <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
            </property>
</bean>

We can point our indexing class to the indexFactory:


<bean id="indexAccessor" class="nl.amis.spring.LuceneIndexer">
            <property name="indexFactory">
                        <ref local="indexFactory" />
            </property>
</bean>

The indexing class extends LuceneIndexSupport. This class
gives us the getTemplate() method. Through the template it returns you can do
everything with your index that was possible without using SM.


public class LuceneIndexer extends LuceneIndexSupport {

    public void index() {
        List<Employee> list = new ArrayList<Employee>(); // populate list here

        for (Employee e : list) {
            getTemplate().addDocument(e);
        }
    }
}

As you can see, we can pass the Employee object to the
addDocument method because Employee implements the DocumentCreator interface.
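To make the pattern explicit, here is a minimal sketch in plain Java with no Lucene or SM on the classpath. All types (SketchDocument, SketchDocumentCreator, SketchIndexTemplate, SketchEmployee) are hypothetical stand-ins and the sample data is made up; the point is only that the template depends on the interface, so any object that can describe itself as a document can be indexed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Lucene Document: a bag of named string fields.
class SketchDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Stand-in for the DocumentCreator idea: an object that can describe itself.
interface SketchDocumentCreator {
    SketchDocument createDocument();
}

// Stand-in for the template: it only knows the interface, not the employee class.
class SketchIndexTemplate {
    private final List<SketchDocument> documents = new ArrayList<>();

    void addDocument(SketchDocumentCreator creator) {
        documents.add(creator.createDocument());
    }

    List<SketchDocument> documents() {
        return documents;
    }
}

// Hypothetical domain object.
class SketchEmployee implements SketchDocumentCreator {
    private final String name;
    private final String location;

    SketchEmployee(String name, String location) {
        this.name = name;
        this.location = location;
    }

    @Override
    public SketchDocument createDocument() {
        SketchDocument doc = new SketchDocument();
        doc.add("name", name);
        doc.add("location", location);
        return doc;
    }
}

class IndexingSketch {
    public static void main(String[] args) {
        SketchIndexTemplate template = new SketchIndexTemplate();
        template.addDocument(new SketchEmployee("Alice", "Seattle")); // made-up data
        System.out.println(template.documents().get(0).get("location"));
    }
}
```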

All you have to do now is run the index() method and your
index is created.

Searching your index

The final step is making a searcher for your index. First we
have to write a converter that turns Lucene Documents back into domain
objects. Converting is done by implementing the HitExtractor interface. In your class you’ll
have to implement the mapHit(int id, Document document, float score) method. It
is declared to return an Object. When you’re using Java 1.5
you can declare a more specific return type (covariant return types) and I
strongly advise you to do so.
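For readers who haven’t used covariant return types, the following sketch shows the language feature in isolation. ToyHitExtractor, ToyPerson and ToyPersonExtractor are hypothetical stand-ins (the real interface takes a Lucene Document, which is replaced here by a plain String so the example runs without Lucene).

```java
// Stand-in for a hit-mapping callback whose declared return type is Object.
interface ToyHitExtractor {
    Object mapHit(int id, String storedName, float score);
}

// Hypothetical domain object.
class ToyPerson {
    final String name;

    ToyPerson(String name) {
        this.name = name;
    }
}

class ToyPersonExtractor implements ToyHitExtractor {
    // Covariant return: ToyPerson is narrower than Object, which has been
    // a legal way to implement the interface method since Java 5.
    @Override
    public ToyPerson mapHit(int id, String storedName, float score) {
        return new ToyPerson(storedName);
    }
}

class CovarianceSketch {
    public static void main(String[] args) {
        ToyPersonExtractor extractor = new ToyPersonExtractor();

        // Through the concrete type, no cast is needed.
        ToyPerson person = extractor.mapHit(0, "Alice", 1.0f);
        System.out.println(person.name);

        // Through the interface, callers still only see Object.
        ToyHitExtractor general = extractor;
        Object result = general.mapHit(1, "Bob", 0.5f);
        System.out.println(result instanceof ToyPerson);
    }
}
```

The benefit is limited to code that holds the concrete extractor type; anything that works through the interface still receives an Object and has to cast.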

My class is called EmployeeHitExtractor:


public Employee mapHit(int id, Document document, float score) {

    Employee e = new Employee();

    e.setDepartment(document.get("department"));
    e.setLocation(document.get("location"));
    e.setName(document.get("name"));

    return e;
}

Now we can create the searcher. First we have to add two
beans to our Spring configuration:


<bean id="searcherFactory"
            class="org.springmodules.lucene.search.factory.SimpleSearcherFactory">

            <property name="directory">
                        <ref bean="indexDirectory" />
            </property>
</bean>

<bean id="indexSearcher"
            class="nl.amis.spring.LuceneIndexSearcher">

            <property name="searcherFactory">
                        <ref local="searcherFactory" />
            </property>
            <property name="analyzer">
                        <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
            </property>
</bean>

The beans speak for themselves. Now we have to create the
LuceneIndexSearcher class. It extends LuceneSearchSupport. That class provides
a getter and setter for the Analyzer and the searcher factory and provides us with a
LuceneSearchTemplate (accessible via getTemplate()).

Add a getDocs() method to the class:


public int getDocs() throws ParseException {

    LuceneSearchTemplate template = getTemplate();

    QueryParser parser = new QueryParser("location", getAnalyzer());

    Query query = parser.parse("seattle");

    HitExtractor hitExtractor = new EmployeeHitExtractor();

    List list = template.search(query, hitExtractor);

    for (int i = 0; i < list.size(); i++) {
        System.out.println(list.get(i));
    }

    return list.size();
}

 

This method is not very useful, but it demonstrates the idea. First
we parse the query with the Analyzer and then get a list of results (which are
immediately converted to a list of Employees).
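The two steps (analyze the query, then map every hit through a callback) can be sketched without Lucene. Everything in the block below is a hypothetical stand-in: ToySearch.analyze only approximates what a simple analyzer does (split at non-letters, lower-case the pieces), and the linear scan replaces Lucene’s real inverted index purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ToySearch {
    // Callback that turns a matching stored value into a result object.
    interface Mapper<T> {
        T map(int id, String stored);
    }

    // Rough approximation of a simple analyzer: split at non-letters, lower-case.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Linear scan: a value matches when it contains every query token.
    static <T> List<T> search(List<String> stored, String query, Mapper<T> mapper) {
        List<String> queryTokens = analyze(query);
        List<T> results = new ArrayList<>();
        if (queryTokens.isEmpty()) {
            return results; // an empty query matches nothing
        }
        for (int id = 0; id < stored.size(); id++) {
            if (analyze(stored.get(id)).containsAll(queryTokens)) {
                results.add(mapper.map(id, stored.get(id)));
            }
        }
        return results;
    }
}

class SearchSketch {
    public static void main(String[] args) {
        // Made-up location values.
        List<String> locations = Arrays.asList("Seattle", "New York", "seattle (HQ)");
        List<String> hits = ToySearch.search(locations, "SEATTLE",
                (id, stored) -> id + ":" + stored);
        for (String hit : hits) {
            System.out.println(hit);
        }
    }
}
```

Because the same analysis runs over both the stored text and the query, differences in case or punctuation don’t prevent a match, which is why the same analyzer class is configured for both the index factory and the searcher.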

Conclusion

The Lucene library for SM is very powerful and forces us to
do Lucene things in a more structured way. I also think it’s a good thing to have
the definitions for the directory, indexer and searcher in a single XML file. Of
course we could do everything manually, but it’s great that someone already did this
for us.

I didn’t mention the DatabaseIndexer interface. The
documentation doesn’t seem to be 100% accurate and it assumes you’re using
JDBC. In my opinion plain JDBC is a bad choice for object mapping (even in Spring);
you should at least use iBatis.
Working with an object mapper also makes it easier to implement the
DocumentCreator interface.

There is a manual in both PDF and HTML. You can find it by
going to the main page of Spring Modules and scrolling down to the header “Usage
Instructions”.

Sources

http://lucene.apache.org

http://springmodules.dev.java.net

 

Download the project source files here

I used Maven2 to build the project. You might have to install Lucene 1.9.1 (or higher), Spring Modules 0.4 (or higher) and the Oracle JDBC driver manually. I used the HR schema, so you need to have that one too. It is, however, easy to change the database or the schema. Feel free to ask for help if it isn’t working.
