Month: June 2012

Resource allocation when importing a yago2 ontology into an oracle semantic database

This is a short review of the resources nessecary when importing a full yago2 ontology, which is about 200 mio triples, into an oracle semantic database. Some information and snippets about the way to execute the import is given but this is not the main focus of the article. I’m still on a fresh, not otherwise loaded 11gR2.0.1 database.

The way to execute the import mainly followed the instructions given with the rdf demos, that is using sqlldr and bulkload.ctl to populate a staging table as proposed in ORACLE_HOME\md\demo\network\rdf_demos and afterwards employ sem_apis.bulk_load_from_staging_table() to actually load the data into the sematic net. bulkload.ctl has in fact not being changed anyway, the yago2 data being supplied in nt triples formatting like this:

<Embeth_Davidtz> <> <Army_of_Darkness> .

and the staging table:

create table yago2_stage (
  RDF$STC_sub varchar2(4000) not null,
  RDF$STC_pred varchar2(4000) not null,
  RDF$STC_obj varchar2(4000) not null,
  RDF$STC_sub_ext varchar2(64),
  RDF$STC_pred_ext varchar2(64),
  RDF$STC_obj_ext varchar2(64),
  RDF$STC_canon_ext varchar2(64)
) compress;

and the sql loader call:

sqlldr userid=lucene/*** control=yago2.ctl data=yago2_1.nt direct=true skip=0 load=95000000 discardmax=10000 bad=yago2.bad discard=yago2.dis log=yago2.log errors=0


NLS quirks when loading triples into oracle semantic models

Something really strange happened during my latest tests against the bulk loading feature of oracle semantic technology. I was working through the demos, supplied with the oracle 11gr2 examples media, that is $ORACLE_HOME/md/demo/network/rdf_demos/bulkload{n}.* and other files, as the following error was raised when executing the load from the staging table. Obviously, oracle seems to complain about an invalid dateTime format, but, however, the more I rechecked the actual object value of the triple against the XMLSchema standard, in particular ISO 8601 for date and time datatypes, the more I learned that the given value was as correct as can be (note by the way, that there are in fact triples with the rdf demos that are erroneous by intent to debug the code in bulkload.ctl).

begin sem_apis.bulk_load_from_staging_table('yago2', 'lucene', 'yago2_stage'); end;
Line 117: ORA-13199: During LBV:ORA-13199: Element Parse Error: Invalid date/time value [debug info: GCVN-timestamp: 2030-10-20T12:10:00Z] (value: "2030-10-20T12:10:00Z"^^<>) ORA-06512: in "MDSYS.MD", Zeile 1723
ORA-06512: in "MDSYS.MDERR", Zeile 17
ORA-06512: in "MDSYS.SDO_RDF", Zeile 9
ORA-06512: in "MDSYS.SDO_RDF_INTERNAL", Zeile 768

In order to minimize the problem space, I executed a simple insert statement using SDO_RDF_TRIPLE_S but only achieved the same result. Searching the net for comparable use cases I more and more sensed finding myself caught in an NLS-related problem. To prove this idea I executed another simple insert statement carrying an actual object value of XMLSchema#double as follows:

INSERT INTO yago2_data VALUES (null, SDO_RDF_TRIPLE_S ('yago2', '<>',
  '<>', '"1.2"^^<>'))
Line 65: ORA-55303: SDO_RDF_TRIPLE_S-Konstruktor nicht erfolgreich: BNode-non-reuse case: SQLERRM=ORA-55328: Versuch, den literalen Wert "1.2"^^<> einzufügen, war nicht erfolgreich
ORA-06512: in "MDSYS.MD", Zeile 1723
ORA-06512: in "MDSYS.MDERR", Zeile 17
ORA-06512: in "MDSYS.SDO_RDF_TRIPLE_S", Zeile 64

Another ora-code but essentially the same outcome, uuhh? Ok, since my database sessions usually suppose a german NLS environment, I tried to replace the number separator from “.” to “,” and finally succeeded.

So I ask myself, what the hell is this? The interpretation of XMLSchema datatypes being dependent on the nls settings of the session environment. Hey guys @ oracle, did you ever read the XMLSchema standard (#!%&§$)?

Well, the appropriate settings that will resolve this loading quirk go like this:


have fun (nevertheless)!