This is a short review of the resources necessary when importing the full YAGO2 ontology, which is about 200 million triples, into an Oracle semantic database. Some information and snippets about how to execute the import are given, but this is not the main focus of the article. I'm still on a fresh, not otherwise loaded 11gR2.0.1 database.
The import mainly followed the instructions given with the RDF demos, that is, using sqlldr and bulkload.ctl to populate a staging table as proposed in ORACLE_HOME\md\demo\network\rdf_demos, and afterwards employing sem_apis.bulk_load_from_staging_table() to actually load the data into the semantic net. bulkload.ctl did not in fact need to be changed at all, since the YAGO2 data is supplied in N-Triples (nt) format like this:
<Embeth_Davidtz> <http://yago-knowledge.org/resource/actedIn> <Army_of_Darkness> .
and the staging table:
create table yago2_stage (
  RDF$STC_sub       varchar2(4000) not null,
  RDF$STC_pred      varchar2(4000) not null,
  RDF$STC_obj       varchar2(4000) not null,
  RDF$STC_sub_ext   varchar2(64),
  RDF$STC_pred_ext  varchar2(64),
  RDF$STC_obj_ext   varchar2(64),
  RDF$STC_canon_ext varchar2(64)
) compress;
and the sql loader call:
sqlldr userid=lucene/*** control=yago2.ctl data=yago2_1.nt direct=true skip=0 load=95000000 discardmax=10000 bad=yago2.bad discard=yago2.dis log=yago2.log errors=0
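Once the staging table is populated, the bulk load step mentioned above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the application table yago2_data and the model name yago2 are hypothetical names chosen for illustration, a semantic network is assumed to exist already, and the load is assumed to run as the lucene user that owns the staging table.

```sql
-- Hypothetical application table and model; these names are assumptions.
create table yago2_data (id number, triple sdo_rdf_triple_s);
execute sem_apis.create_sem_model('yago2', 'yago2_data', 'triple');

-- MDSYS needs to read the staging table and write to the application table.
grant select on yago2_stage to mdsys;
grant insert on yago2_data to mdsys;

-- Load the staged triples into the (assumed) model.
begin
  sem_apis.bulk_load_from_staging_table(
    model_name  => 'yago2',
    table_owner => 'lucene',
    table_name  => 'yago2_stage'
  );
end;
/
```

The optional flags argument of the procedure is left out here on purpose; which settings are worthwhile depends on the hardware and on the database version.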