Creating and writing documents to an index
Creating queries with the Lucene QueryParser

The recipes on getting Lucene and setting up a Lucene Java project serve as a guide to getting started with Lucene; they cover downloading and setting up Lucene in detail. All the recipes that follow introduce basic Lucene functionality and do not require in-depth knowledge to understand. We will learn how to create an index and add documents to it. We will practice deleting documents and searching documents to locate information. The Creating fields section of this chapter introduces Lucene's way of handling information. Then, we will learn how to formulate search queries. At the end of the chapter, we will show how to retrieve search results from Lucene. By completing this chapter, you should gain enough knowledge to set up Lucene and have a good grasp of Lucene's concepts of indexing and searching information.

"Title": "Europe stocks tumble on political fears, PMI data",
"Content": "LONDON (MarketWatch)-European stock markets tumbled to a three-month low on Monday, driven by steep losses for banks and resource firms after weak purchasing-managers index [...] At the same time, political tensions in France and the Netherlands fueled fears of further euro-zone turmoil",

"Title": "Dow Rises, Gains 1.5% on Week",
"Content": "Solid quarterly results from consumer-oriented stocks including AMZN +15.75% overshadowed data on slowing economic growth, pushing benchmarks to their biggest [...]

For each news item, we have a title, publishing date, content, and link, which are the typical constituents of a news article. We will treat each news item as a document and add it to our news data store. The act of adding documents to the data store is called indexing, and the data store itself is called an index. Once the index is created, you can query it to locate documents by search terms; this is referred to as searching the index.

So, how does Lucene maintain an index, and how is the index used for searching? Consider looking for a certain subject in a book. Let's say you are interested in Object-Oriented Programming (OOP) and want to learn more about inheritance. You get a book on OOP and start looking for the relevant information about inheritance. You can start at the beginning of the book and read until you land on the inheritance topic. If the topic is at the end of the book, it will certainly take a while to reach. As you may notice, this is not a very efficient way to locate information.

Stop words are common words such as 'a', 'am', and 'is'. Which words count as stop words depends entirely on the language.
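As a rough illustration of stop-word filtering, the following plain-Java sketch lowercases a sentence, splits it on non-alphanumeric characters, and drops any word found in a small stop-word set. It does not use Lucene: the class name StopWordSketch, the method removeStopWords, and the stop-word list are all made up for this example, and a real analyzer does considerably more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical, tiny English stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "am", "is", "the", "on", "to"));

    /** Splits text into lowercase tokens and removes the stop words. */
    static List<String> removeStopWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // split() can yield an empty first token when text starts with punctuation
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [europe, stocks, tumble, political, fears]
        System.out.println(removeStopWords("Europe stocks tumble on political fears"));
    }
}
```

Because the list is language-specific, the same code would need a different set of words for text in another language.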
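    // RAMDirectory keeps the index in memory; it is deprecated in newer
    // Lucene releases in favor of ByteBuffersDirectory.
    Directory memoryIndex = new RAMDirectory();
    StandardAnalyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
    IndexWriter writer = new IndexWriter(memoryIndex, indexWriterConfig);
    Document document = new Document();
    // Third argument: Field.Store.YES also stores the value, Field.Store.NO
    // only indexes it. YES is assumed here.
    document.add(new TextField("title", title, Field.Store.YES));
    document.add(new TextField("body", body, Field.Store.YES));
    writer.addDocument(document);
    writer.close();

Here, we create a document with two TextField instances and add it to the index using the IndexWriter. The third argument of the TextField constructor indicates whether the value of the field is also to be stored.

Analyzers are used to split the data or text into chunks and then filter out the stop words from them.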
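A Lucene index is, at its core, an inverted index: it maps each term to the documents that contain it, so a lookup does not have to scan every document. The sketch below shows that idea in plain Java rather than with Lucene itself. The class InvertedIndexSketch and its addDocument and search methods are hypothetical names chosen for this illustration, and the sketch ignores analysis, scoring, and persistence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexing: split the text into lowercase terms and record the document id under each term. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Searching: one map lookup instead of a scan over every document. */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Europe stocks tumble on political fears, PMI data"); // id 0
        index.addDocument("Dow Rises, Gains 1.5% on Week");                      // id 1
        System.out.println(index.search("stocks")); // prints [0]
        System.out.println(index.search("on"));     // prints [0, 1]
    }
}
```

The cost of a search here depends on the number of matching documents, not on the size of the collection, which is the property that makes querying an index faster than reading every document from start to finish.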