While preparing a Lucene presentation and workshop I came
across Spring Modules. It’s a small set of libraries that does things the
Spring way. It isn’t included in Spring because it doesn’t belong to the
‘core business’ of Spring. I’m not going to cover everything Spring
Modules can do; I’ll focus only on the Lucene
library. Nevertheless, it’s worthwhile to take a peek at Spring Modules
yourself.

Lucene is a very fast and easy-to-use Java-based search
engine. Spring Modules (SM) hides some implementation details and lets you
configure the search engine in XML (like you’re used to in Spring).

Before you get started

SM can be downloaded here
(Project Tools, Documents & Files and pick the version with dependencies).
For this article I used version 0.4. Since the Spring community is very active,
it’s likely that newer versions will be released soon.

Creating the directory

In Lucene you can use a file-based directory (FSDirectory) or an in-memory directory,
called a RAMDirectory. The beauty of Spring is that you can change the
directory without changing your code. Let’s take a look at the bean
definitions for both kinds of directory.


<bean id="ramDirectory"
            class="org.springmodules.lucene.index.support.RAMDirectoryFactoryBean" />

<bean id="fsDirectory"
            class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
            <property name="location" value="file:///C:/temp/index" />
            <property name="create" value="true"/>
</bean>

<alias name="ramDirectory" alias="indexDirectory" />

The first definition is the RAMDirectory, nothing special
here. The second bean has some options. You have to set the location of the
index (this is a file system directory) and whether you want to build a full
index from scratch or update an existing index incrementally.

Finally I included an alias. With this alias you can easily
switch between indexes: indexDirectory is referenced in at least two places,
and you don’t want to edit both of them every time you swap between
indexes.

Creating the index

Our first step is to convert our domain objects to Lucene
Documents. We do this by implementing the DocumentCreator interface. This interface
declares the createDocument()
method, which returns a Lucene Document. Just convert your domain object in this
method and use the field names you prefer. In my test application I have an
Employee object that implements DocumentCreator; my createDocument() method looks
like this:


public Document createDocument() {
    Document doc = new Document();

    doc.add(new Field("name", getName(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("department", getDepartment(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("location", getLocation(), Field.Store.YES, Field.Index.TOKENIZED));

    return doc;
}

Now we have to define an index factory:


<bean id="indexFactory"
            class="org.springmodules.lucene.index.support.SimpleIndexFactoryBean">

            <property name="directory">
                        <ref bean="indexDirectory" />
            </property>
            <property name="analyzer">
                        <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
            </property>
</bean>

We can point our indexing class to the indexFactory:


<bean id="indexAccessor" class="nl.amis.spring.LuceneIndexer">
            <property name="indexFactory">
                        <ref local="indexFactory" />
            </property>
</bean>

The indexing class extends LuceneIndexSupport. This class
gives us the getTemplate() method. With this method you can do everything with
your index that was possible without using SM.


public class LuceneIndexer extends LuceneIndexSupport {

    public void index() {
        List<Employee> list = new ArrayList<Employee>(); // populate the list here

        for (Employee e : list) {
            getTemplate().addDocument(e);
        }
    }
}

As you can see we can pass the Employee object to the
addDocument method because Employee implemented the DocumentCreator interface.

All you have to do now is run the index() method and your
index is created.
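
To make the flow above a little more concrete, here is a toy sketch in plain Java of what happens conceptually when a domain object is handed to addDocument: the object describes itself as a set of named fields, and the index records which terms occur in which document. This is not Lucene or Spring Modules code. ToyDocumentCreator, ToyEmployee and ToyIndex are hypothetical stand-ins invented for this illustration, and a real Lucene index is far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for DocumentCreator: a "document" is reduced to a
// map of field name -> field value.
interface ToyDocumentCreator {
    Map<String, String> createDocument();
}

// Hypothetical domain object that knows how to describe itself as a document.
class ToyEmployee implements ToyDocumentCreator {
    private final String name;
    private final String location;

    ToyEmployee(String name, String location) {
        this.name = name;
        this.location = location;
    }

    public Map<String, String> createDocument() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("location", location);
        return doc;
    }
}

// Toy inverted index: maps "field:term" to the ids of the documents
// that contain that term in that field.
class ToyIndex {
    private final List<Map<String, String>> documents = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Same shape as getTemplate().addDocument(creator): the index asks the
    // creator for a document instead of knowing the domain class itself.
    void addDocument(ToyDocumentCreator creator) {
        Map<String, String> doc = creator.createDocument();
        int id = documents.size();
        documents.add(doc);
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String lowered = field.getValue().toLowerCase(Locale.ROOT);
            // Split the value into terms at every run of non-letter characters.
            for (String term : lowered.split("[^\\p{L}]+")) {
                if (term.isEmpty()) {
                    continue;
                }
                String key = field.getKey() + ":" + term;
                List<Integer> ids = postings.get(key);
                if (ids == null) {
                    ids = new ArrayList<>();
                    postings.put(key, ids);
                }
                if (!ids.contains(id)) {
                    ids.add(id);
                }
            }
        }
    }

    // Returns the ids of all documents whose field contains the given term.
    List<Integer> search(String field, String term) {
        List<Integer> ids = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<Integer>() : new ArrayList<>(ids);
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(new ToyEmployee("Alice", "Seattle"));
        index.addDocument(new ToyEmployee("Bob", "New York"));
        index.addDocument(new ToyEmployee("Carol", "Seattle"));
        System.out.println(index.search("location", "seattle")); // prints [0, 2]
    }
}
```

The point of the callback-style interface is that the index never needs to know about Employee; any class that can produce a document can be indexed.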

Searching your index

The final step is building a searcher for your index. First we
have to write a converter that turns Lucene Documents back into domain
objects. Converting is done by implementing the HitExtractor interface and its
mapHit(int id, Document document, float score) method. The interface declares
a return type of Object, but when you’re using Java 1.5 or later
you can declare a more specific return type (a covariant return type) and I
strongly advise you to do so.

My class is called EmployeeHitExtractor:


public Employee mapHit(int id, Document document, float score) {

    Employee e = new Employee();

    e.setDepartment(document.get("department"));
    e.setLocation(document.get("location"));
    e.setName(document.get("name"));

    return e;
}
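
The covariant return type used in mapHit can be seen in isolation in the following plain-Java sketch. ToyHitExtractor, ToyPerson and ToyPersonExtractor are hypothetical names made up for this example (the String parameter stands in for the Lucene Document); the only thing it is meant to show is that an implementation may narrow Object to a concrete type, which saves callers a cast.

```java
// Hypothetical stand-in for HitExtractor: the interface only promises Object.
interface ToyHitExtractor {
    Object mapHit(int id, String storedName, float score);
}

// Hypothetical domain class.
class ToyPerson {
    final String name;

    ToyPerson(String name) {
        this.name = name;
    }
}

// The implementation narrows the return type from Object to ToyPerson.
// This is legal from Java 1.5 onwards (covariant return types).
class ToyPersonExtractor implements ToyHitExtractor {
    @Override
    public ToyPerson mapHit(int id, String storedName, float score) {
        return new ToyPerson(storedName);
    }
}

class CovariantReturnDemo {
    public static void main(String[] args) {
        ToyPersonExtractor typed = new ToyPersonExtractor();

        // Through the concrete type no cast is needed.
        ToyPerson person = typed.mapHit(0, "Alice", 1.0f);

        // Through the interface the static type is still Object.
        ToyHitExtractor general = typed;
        Object hit = general.mapHit(1, "Bob", 0.5f);

        System.out.println(person.name + " " + (hit instanceof ToyPerson));
    }
}
```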

Now we can create the searcher. First we have to add two
beans to our Spring configuration:


<bean id="searcherFactory"
            class="org.springmodules.lucene.search.factory.SimpleSearcherFactory">

            <property name="directory">
                        <ref bean="indexDirectory" />
            </property>
</bean>


<bean id="indexSearcher"
            class="nl.amis.spring.LuceneIndexSearcher">

            <property name="searcherFactory">
                        <ref local="searcherFactory" />
            </property>
            <property name="analyzer">
                        <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
            </property>
</bean>

The beans speak for themselves. Now we have to create the
LuceneIndexSearcher class. It extends LuceneSearchSupport. That class provides
a getter and setter for the Analyzer and the SearcherFactory and provides us with a
LuceneSearchTemplate (accessible via getTemplate()).

Add a getDocs() method to the class:


public int getDocs() throws ParseException {

    LuceneSearchTemplate template = getTemplate();

    QueryParser parser = new QueryParser("location", getAnalyzer());

    Query query = parser.parse("seattle");

    HitExtractor hitExtractor = new EmployeeHitExtractor();

    List list = template.search(query, hitExtractor);

    for (int i = 0; i < list.size(); i++) {
        System.out.println(list.get(i));
    }

    return list.size();
}

This method is not very useful but it demonstrates the idea. First
we parse the query with the Analyzer and then get a list of results (which is
immediately converted to a list of Employees by the HitExtractor).
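
Both the index factory and the searcher above are configured with SimpleAnalyzer, which breaks text at non-letter characters and lowercases it. The sketch below approximates that behaviour in plain Java to show why a query for seattle can match a stored value such as Seattle. It is an illustration only, not Lucene’s implementation: SimpleAnalyzerSketch is a hypothetical name, and details such as supplementary characters and maximum token length are ignored.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleAnalyzerSketch {

    // Approximation of what SimpleAnalyzer does: emit a token for every
    // maximal run of letters, converted to lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Seattle, WA 98101")); // prints [seattle, wa]
        System.out.println(tokenize("seattle"));           // prints [seattle]
    }
}
```

Because the same analysis is applied at index time and at query time, both sides end up with the same term. Configuring different analyzers for the indexFactory and the indexSearcher beans is a common reason for searches that unexpectedly return nothing.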

Conclusion

The Lucene library for SM is very powerful and pushes us to
structure our Lucene code better. I also think it’s a good thing to have the
definitions for the directory, indexer and searcher in a single XML file. Of
course we can do everything manually, but it’s great that someone already did this
for us.

I didn’t mention the DatabaseIndexer interface. The
documentation doesn’t seem to be 100% accurate and it assumes you’re using
JDBC. In my opinion plain JDBC is a poor fit for object mapping (even in Spring);
you should at least use iBatis.
Working with an object mapper also makes it easier to implement the
DocumentCreator interface.

There is a manual in both PDF and HTML. You can find it by
going to the main page of Spring Modules and scrolling down to the header “Usage
Instructions”.

Sources

http://lucene.apache.org

http://springmodules.dev.java.net

Download the project source files here

I used Maven2 to build the project. You might have to install Lucene 1.9.1 (or higher), Spring Modules 0.4 (or higher) and the Oracle JDBC driver manually. I used the HR schema, so you need that one too. It is, however, easy to change the database or the schema. Feel free to ask for help if it isn’t working.

7 Comments

infochannel February 27, 2010

hi thank you good example
How to Index Microsoft Format Documents (Word, Excel, Powerpoint) – Lucene
http://kalanir.blogspot.com/2008/08/how-to-index-microsoft-format-documents.html
Arnaud Jeansen August 31, 2007

I don’t know whether the API changed between the 0.4 and the 0.8 version (or if this is a typo from the start), but I spent a few hours trying to use Lucene with this configuration with no success.
For an FS implementation, you need to put the “create true” property node on the indexFactory bean, not on the FSDirectory one.
Manohar August 18, 2006

Sir,
I need your help to configure Apache Lucene to search Word Documents,PDF and XML.
Thanking You
Jeroen van Wilgenburg July 3, 2006

I don’t have much experience with indexing those files and don’t have any examples lying around. You can go to http://www.zilverline.org, download the application and browse the source. This application is a google desktop search made with Lucene and can index the files you mentioned.
Vishakh Cherian July 3, 2006

Sir,
I need your hep to configure Apache Lucene to search Word Documents,PDF and XML.
Thanking You
Vishakh Cherian
Jeroen van Wilgenburg June 29, 2006

Thanks for the tip. I heard the name and know it does something with Lucene and a database, but I’ll give it a try.
Rolf June 26, 2006

you should give Compass a try.
This is a springified framework that works on top of lucene
http://www.opensymphony.com/compass/

Using Lucene with Spring Introduction to Spring Modules

About The Author

Jeroen van Wilgenburg
