Month: September 2011

Non-range queries on numeric fields with Lucene 2.9 and later


As you might know from following the ongoing development of the Lucene search engine, true indexing of numeric values has been possible since version 2.9. The actual trick that implements numeric values in a string-based engine, as opposed to a regular relational database for example, is widely discussed elsewhere, e.g. (SearchNumericalFields). In short, the Lucene developers encode any signed number as a series of string-encoded buckets of decreasing precision, which is what makes greater-than, greater-than-or-equal and similar range queries possible. As a result you have new “numeric” classes at your disposal, namely NumericRangeQuery and NumericRangeFilter for querying and filtering, respectively. Also note that the class SortField has since then supported field types that are encoded numerics.
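To make the bucket idea a bit more tangible, here is a minimal plain-Java sketch. It is a simplified illustration and not Lucene's actual term format: the class SortableLongSketch and its methods encode, decode and encodeBucket are made-up names, and the hex layout is an assumption chosen for readability. It shows two things: flipping the sign bit makes string order agree with numeric order, and dropping low bits yields coarser buckets that neighbouring values share.

```java
// Simplified illustration only; this is NOT Lucene's real term encoding.
final class SortableLongSketch {

    // Flip the sign bit so negative values sort before positive ones when
    // the 64 bits are read as unsigned, then render fixed-width hex.
    static String encode(long value) {
        long sortableBits = value ^ Long.MIN_VALUE;
        return String.format("%016x", sortableBits);
    }

    // Inverse of encode(): parse the hex digits and flip the sign bit back.
    static long decode(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }

    // A coarser "bucket" term: drop the lowest `shift` bits and prefix the
    // term with the shift so buckets of different precision never collide.
    static String encodeBucket(long value, int shift) {
        long sortableBits = (value ^ Long.MIN_VALUE) >>> shift;
        return String.format("%02d:%016x", shift, sortableBits);
    }

    public static void main(String[] args) {
        System.out.println(encode(-5) + " sorts before " + encode(3));
        System.out.println("bucket of 167 at shift 4: " + encodeBucket(167, 4));
        System.out.println("bucket of 168 at shift 4: " + encodeBucket(168, 4));
    }
}
```

With such an encoding, a range query can cover large stretches of the value space with a handful of coarse bucket terms, while an exact match only ever needs the full-precision term (shift 0 in this sketch).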

So far, this is a huge step forward in providing professional means for semi-structured searching and sorting. However, what seldom gets attention is the question of how to implement equality or in-list searches (or filtering) on numeric values, because launching a classic TermQuery or Filter with the plain number as its term text on a numeric field will simply not work. Something quite commonly seen is to just employ a range query/filter with the same inclusive upper and lower bound to achieve equality logic, like so:

Filter filter = NumericRangeFilter.newIntRange("someField", 2, 2, true, true);

I do not know whether this performs well under all circumstances, but I do know that Lucene contains some clever rewrite code to take care of a clumsy construct like this.

Unfortunately, replacing equality logic with the line above does not give you a comparable approach for in-list searches (or filtering), as you probably know them from TermsFilter. To reproduce in-list logic with TermsFilter you have to encode the numeric search tokens into the internal format yourself. This is where the class NumericUtils comes in. It provides several static methods that produce encoded strings for the family of common numeric data types, like so:

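import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermsFilter;
import org.apache.lucene.util.NumericUtils;
long[] mdtIds = {168, 167, 151};
TermsFilter orListFilter = new TermsFilter(); // from the contrib "queries" module
for (long mdtId : mdtIds) {
  // Encode each value at full precision so it matches the indexed term.
  orListFilter.addTerm(
    new Term(IndexField.IMP_MANDANT.name(), NumericUtils.longToPrefixCoded(mdtId)));
}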
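Why the plain number fails while the encoded term hits can be shown with a toy inverted index in plain Java. This is only a sketch under assumptions: InListSketch, its methods and the hex encoding are hypothetical stand-ins, not Lucene classes and not Lucene's real term format. The point is merely that an in-list filter is a union of exact term lookups, so the lookup terms must be encoded exactly like the indexed ones.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of an in-list filter; names and encoding are illustrative only.
final class InListSketch {

    // Stand-in for the index-time numeric encoding (not Lucene's format).
    static String encode(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Toy inverted index: encoded term -> ids of the documents holding it.
    static Map<String, Set<Integer>> index(long[] fieldValueByDoc) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < fieldValueByDoc.length; docId++) {
            String term = encode(fieldValueByDoc[docId]);
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return postings;
    }

    // In-list filter: union of the postings of every requested term.
    static Set<Integer> matchAny(Map<String, Set<Integer>> postings, List<String> terms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = index(new long[] {168, 42, 151, 167, 7});
        // Plain decimal strings never equal an encoded term.
        System.out.println(matchAny(postings, Arrays.asList("168", "167", "151")));
        // Encoded terms match the documents holding those values.
        System.out.println(matchAny(postings,
                Arrays.asList(encode(168), encode(167), encode(151))));
    }
}
```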

Someone may argue that using NumericUtils directly is not a recommended use of the Lucene library. He/she may be right … or not ;-). Let’s see.

PS: I’m using query and filter more or less as synonyms throughout the article. In fact, a Lucene filter may just be seen as a query without scoring. This also becomes obvious if one takes a look at classes like ConstantScoreQuery or QueryWrapperFilter.

have fun