While preparing a Lucene presentation and workshop I came
across Spring Modules. It’s a small set of libraries that do things the
Spring way. It isn’t included in Spring itself because it doesn’t belong to
Spring’s core business. I won’t cover everything Spring Modules can do;
I’ll focus only on the Lucene
library. Nevertheless, it’s worthwhile to take a look at Spring Modules
yourself.
Lucene is a very fast and easy-to-use Java-based search
engine. Spring Modules (SM) hides some implementation details and lets you
configure the search engine in XML (as you’re used to in Spring).
Before you get started
SM can be downloaded here
(Project Tools, Documents & Files and pick the version with dependencies).
For this article I used the 0.4 version. Since the Spring community is very active
it’s likely that newer versions will be released soon.
Creating the directory
In Lucene you can use a file-based directory (FSDirectory) or an in-memory directory
(RAMDirectory). The beauty of Spring is that you can change the
directory without changing your code. Let’s take a look at the bean
definitions for both directories.
<bean id="ramDirectory" class="org.springmodules.lucene.index.support.RAMDirectoryFactoryBean" />
<bean id="fsDirectory" class="org.springmodules.lucene.index.support.FSDirectoryFactoryBean">
  <property name="location" value="file:///C:/temp/index" />
  <property name="create" value="true" />
</bean>
<alias name="ramDirectory" alias="indexDirectory" />
The first definition is the RAMDirectory; nothing special
here. The second bean has some options: you have to set the location of the
index (a file system directory) and whether you want to do a full index
(create set to true) or an incremental index.
Finally I included an alias. With this alias you can easily
switch between indexes: indexDirectory is referenced in at least two places,
and you don’t want to edit both of them every time you swap between
indexes.
Creating the index
Our first step is to convert our domain objects to Lucene
Documents. We do this by implementing the DocumentCreator interface, which
declares a createDocument()
method that returns a Lucene Document. Just convert your domain object in this
method and use the field names you prefer. In my test application I have an
Employee object that implements DocumentCreator; my createDocument() method looks
like this:
public Document createDocument() {
    Document doc = new Document();
    doc.add(new Field("name", getName(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("department", getDepartment(), Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("location", getLocation(), Field.Store.YES, Field.Index.TOKENIZED));
    return doc;
}
Now we have to define an index factory:
<bean id="indexFactory"
      class="org.springmodules.lucene.index.support.SimpleIndexFactoryBean">
  <property name="directory">
    <ref bean="indexDirectory" />
  </property>
  <property name="analyzer">
    <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
  </property>
</bean>
We can point our indexing class to the indexFactory:
<bean id="indexAccessor" class="nl.amis.spring.LuceneIndexer">
  <property name="indexFactory">
    <ref local="indexFactory" />
  </property>
</bean>
The indexing class extends LuceneIndexSupport. This class
gives us the getTemplate() method. With this method you can do everything with
your index that was possible without using SM.
public class LuceneIndexer extends LuceneIndexSupport {
    public void index() {
        List<Employee> list = new ArrayList<Employee>(); // populate list here
        for (Employee e : list) {
            // Employee implements DocumentCreator, so it can be passed directly
            getTemplate().addDocument(e);
        }
    }
}
As you can see we can pass the Employee object to the
addDocument method because Employee implements the DocumentCreator interface.
All you have to do now is run the index() method and your
index is created.
Searching your index
The final step is making a searcher for your index. First we
have to write a converter that turns Lucene Documents back into domain
objects. Converting is done through the HitExtractor interface: in your class you
implement the mapHit(int id, Document document, float score) method. It
is declared to return an Object, but when you’re using Java 1.5
you can declare a more specific return type (covariant return types), and I
strongly advise you to do so.
My class is called EmployeeHitExtractor:
public Employee mapHit(int id, Document document, float score) {
    Employee e = new Employee();
    e.setDepartment(document.get("department"));
    e.setLocation(document.get("location"));
    e.setName(document.get("name"));
    return e;
}
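To see the covariant return in isolation, here is a minimal plain-Java sketch. It does not use Spring Modules or Lucene: HitMapper is a hypothetical stand-in for the HitExtractor interface, a Map stands in for the Lucene Document, and the Employee class and the sample values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the extractor interface; a Map replaces Lucene's Document.
interface HitMapper {
    Object mapHit(int id, Map<String, String> fields, float score);
}

// Minimal stand-in for the domain object used in this article.
class Employee {
    private String name;
    private String location;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getLocation() { return location; }
    public void setLocation(String location) { this.location = location; }
}

class EmployeeMapper implements HitMapper {
    // Covariant return: Employee is narrower than the Object declared in HitMapper.
    public Employee mapHit(int id, Map<String, String> fields, float score) {
        Employee e = new Employee();
        e.setName(fields.get("name"));
        e.setLocation(fields.get("location"));
        return e;
    }
}

class CovariantReturnDemo {
    public static Employee firstHit() {
        Map<String, String> fields = new HashMap<String, String>();
        fields.put("name", "King");          // made-up sample values
        fields.put("location", "Seattle");
        // No cast is needed because the static type here is EmployeeMapper.
        return new EmployeeMapper().mapHit(0, fields, 1.0f);
    }
}
```

The benefit only shows when the caller holds the concrete mapper type; code that sees just the interface still gets an Object back and has to cast.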
Now we can create the searcher. First we have to add two
beans to our Spring configuration:
<bean id="searcherFactory"
      class="org.springmodules.lucene.search.factory.SimpleSearcherFactory">
  <property name="directory">
    <ref bean="indexDirectory" />
  </property>
</bean>
<bean id="indexSearcher"
      class="nl.amis.spring.LuceneIndexSearcher">
  <property name="searcherFactory">
    <ref local="searcherFactory" />
  </property>
  <property name="analyzer">
    <bean class="org.apache.lucene.analysis.SimpleAnalyzer" />
  </property>
</bean>
The beans speak for themselves. Now we have to create the
LuceneIndexSearcher class. It extends LuceneSearchSupport. That class provides
getters and setters for the analyzer and the searcher factory and provides us with a
LuceneSearchTemplate (accessible via getTemplate()).
Add a getDocs() method to the class:
public int getDocs() throws ParseException {
    LuceneSearchTemplate template = getTemplate();
    QueryParser parser = new QueryParser("location", getAnalyzer());
    Query query = parser.parse("seattle");
    HitExtractor hitExtractor = new EmployeeHitExtractor();
    List list = template.search(query, hitExtractor);
    for (int i = 0; i < list.size(); i++) {
        System.out.println(list.get(i));
    }
    return list.size();
}
This method is not very useful, but it demonstrates the idea. First
we parse the query with the analyzer, then we get a list of results (which is
immediately converted to a list of Employees).
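Both the indexFactory and the indexSearcher bean use a SimpleAnalyzer, which is why the lower-case query seattle can find a location stored as Seattle. To my understanding SimpleAnalyzer splits text at non-letter characters and lower-cases the resulting tokens. The sketch below only approximates that behaviour in plain Java; it is not Lucene’s implementation, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleAnalyzerSketch {
    // Approximation of SimpleAnalyzer: split at non-letters, lower-case every token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Simplified matching: a field value matches when it shares at least one token with the query.
    public static boolean matches(String fieldValue, String queryText) {
        List<String> fieldTokens = tokenize(fieldValue);
        for (String queryToken : tokenize(queryText)) {
            if (fieldTokens.contains(queryToken)) {
                return true;
            }
        }
        return false;
    }
}
```

With these helpers, matches("Seattle", "seattle") is true because both sides end up as the same lower-case token. That is the reason to configure the same analyzer for indexing and for searching.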
Conclusion
The Lucene library for SM is very powerful and forces us to
work with Lucene in a more structured way. I also think it’s a good thing to have
the definitions for the directory, indexer and searcher in a single XML file. Of
course we can do everything manually, but it’s great that someone already did this
for us.
I didn’t mention the DatabaseIndexer interface. The
documentation doesn’t seem to be 100% accurate and it assumes you’re using
JDBC. In my opinion JDBC is a bad fit for object mapping (even in Spring);
you should at least use iBatis.
Working with an object mapper also makes it easier to implement the
DocumentCreator interface.
There is a manual in both PDF and HTML. You can find it by
going to the main page of Spring Modules and scrolling down to the header “Usage
Instructions”.
Sources
http://springmodules.dev.java.net
Download the project source files here
I used Maven2 to build the project. You might have to install Lucene 1.9.1 (or higher), Spring Modules 0.4 (or higher) and the Oracle JDBC driver manually. I used the HR schema, so you need to have that one too. It is, however, easy to change the database or the schema. Feel free to ask for help if it isn’t working.
hi thank you good example
How to Index Microsoft Format Documents (Word, Excel, Powerpoint) – Lucene
http://kalanir.blogspot.com/2008/08/how-to-index-microsoft-format-documents.html
I don’t know whether the API changed between the 0.4 and the 0.8 version (or if this is a typo from the start), but I spent a few hours trying to use Lucene with this configuration with no success.
For an FS implementation, you need to put the “create true” property node on the indexFactory bean, not on the FSDirectory one.
Sir,
I need your help to configure Apache Lucene to search Word Documents,PDF and XML.
Thanking You
I don’t have much experience with indexing those files and don’t have any examples lying around. You can go to http://www.zilverline.org, download the application and browse the source. This application is a google desktop search made with Lucene and can index the files you mentioned.
Thanks for the tip. I heard the name and know it does something with Lucene and a database, but I’ll give it a try.
you should give Compass a try.
This is a springified framework that works on top of lucene
http://www.opensymphony.com/compass/