Using the DataImportHandler XPathEntityProcessor on a Database Resultset Column

The Solr documentation for XPathEntityProcessor introduces a spezialization subtype of EntityProcessor that is primarily depicted to process data (to be) imported from xml/http-datasources (for example, Usage with XML/HTTP Datasource). However, using XPathEntityProcessor on a FieldReaderDataSource instead on the original URLDataSource or !HttpDataSource (search for FieldReaderDataSource in Uploading Structured Data Store Data with the Data Import Handler) enables reading xml instances contained in columns delivered from database requests through SqlEntityProcessor.
Bewildered out of words and meanings…? Don’t worry, the following will give you a living example of how to craft the xml from an Oracle database easily and what to do on the Solr side to map the information datums into indexing fields. To me, this is really a nice example of how to employ xml in a true sense of a defined (well-forming, encoding) data exchange layer, hiding most if not all of the implementation details of xml processing on the database and on the search-engine. Note however, that this great time-to-market, through xml processing technically, always comes at a certain extra cost such that the xml-instances shall not become to large for this solution pattern. I will also use xml attributes for small size values instead of tags in the xml generation as one step of optimization.


Solr 2 oracle date indexing timezone handling and probable issues

Apache Solr provides a field definition type for datetime values called solr.TrieDateField (TrieDateField) that is based on an efficient compare-/sort-representation. Being an extension-/derived-class to the well known solr.DateField (DateField) up to Solr 4.x, solr.TrieDateField does replace solr.DateField for Solr releases > 5.0 . Using one or the other date field preassumes one important convention: to handle any value passed around or processed within Solr as UTC (Coordinated Universal Time) or zulu time (Z appended) such that all that timezone-detection- and timezone-math-hassle can be avoided. Solr thus exclusively allows values given to the DataImportHandler as defined in ISO 8601, “1995-12-31T23:59:59Z” as an example. However, iff you do not pass the value as a string but as a database date or timestamp w/o tz datatype within an sql select statement to DataImportHandler, secondary Solr-side processing may have to be taken into account.