semantic tech

SENSE – Intelligent Storage and Exploration of large Document Sets

This is the next public presentation of the SENSE – Intelligent Storage and Exploration of large Document Sets project, this time in a more general manner at the poster session of INFORMATIK 2013 – 43. Jahrestagung der Gesellschaft für Informatik, to be held on Tuesday, September 17, 2013, in Koblenz, Germany. The poster neither aims nor is able to show any details, but will hopefully draw many people into inspiring discussions about the scalability of large multimedia management systems over time. I’m looking forward to meeting you there! This is the preliminary reference:

Wehner, P. and Krüger, R.: SENSE – Intelligent Storage and Exploration of large Document Sets. Poster Session of the 43. Jahrestagung der Gesellschaft für Informatik (INFORMATIK 2013), Koblenz, Germany, Springer, 2013

Do note that the SENSE project is funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IS11025A and constitutes part of the “KMUInnovativ: IKT” initiative.

Have fun. P

P.S. I’m also eagerly looking forward to diving into the “Tag der Informatik” on Wednesday, September 18, 2013.

SENSE – Semantic-guided Communication & Composition in a Widget/Dashboard Environment

The second research project (alongside Topic/S) that I’ve been heading since January 2012, namely SENSE – Intelligent Storage and Exploration of large Document Sets, likewise marks its first accepted scientific paper, at the 5th International Workshop on Lightweight Integration on the Web at ICWE’13 – 13th International Conference on Web Engineering, to be held July 8–12, 2013, in Aalborg, Denmark. The paper is entitled “Semantic-guided Communication & Composition in a Widget/Dashboard Environment” and summarizes the work in progress on just the user interface layer of the ongoing research project. This is the preliminary reference:

Wehner, P. and Krüger, R.: Semantic-guided Communication & Composition in a Widget/Dashboard Environment. Proceedings of the 5th International Workshop on Lightweight Integration on the Web (ComposableWeb), 13th International Conference on Web Engineering (ICWE’13), Aalborg, Denmark, Springer, 2013

Paper / Slides

Do note that the SENSE project is funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IS11025A and constitutes part of the “KMUInnovativ: IKT” initiative.

Have fun. P

Topic/S – Towards Topics-based, Semantics-assisted News Search

A research project that I’ve been heading since January 2012, namely Topic/S – Automatic Extraction and Search of Topics and Trends in Editorial Data (NewsRoom), marks its first accepted scientific paper, at WIMS’13 – International Conference on Web Intelligence, Mining and Semantics, to be held June 12–14, 2013, in Madrid, Spain. The paper is entitled “Towards Topics-based, Semantics-assisted News Search” and summarizes the work in progress of the ongoing research project. This is the full reference:

Voigt, M., Aleythe, M. and Wehner, P.: Towards Topics-based, Semantics-assisted News Search. Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (WIMS’13), ACM, 2013

Do note that the work within the Topic/S project is funded by the European Social Fund / Free State of Saxony, contract no. 99457/2677.

Have fun. P

Update: The slides are now available via SlideShare.

Do not filter for dedicated access in (Oracle) SPARQL

Lately, I had to resort to good old autotrace when attempting to optimize a SPARQL query in Oracle’s semantic technology stack. The point was that the still-growing graph of data had turned some queries that were once fast into lame old rusty cars (another good old story as well).

So, what sort of query am I talking about? Not much of a monster. Stripped down for simplicity, just imagine a graph of newspaper articles that relate to named entities via nodes representing a named entity recognition match, each offering a hook to hang up additional match information, say the weight of the match and so on. The query features sort of an entry point (which it actually wasn’t!), the URI of the article here. That entry point is just applied as a filter expression to the pattern nodes (?art_1 = article, ?sim_1 = sem item match, ?sitm = sem item). Executing the following simplified stuff originally took around 8 seconds of run time, cough, cough.

select art_1, sim_1, sitm
from table(
  sem_match('{
    ?art_1 tpcs:hasSemItemMatch ?sim_1 . ?sim_1 tpcs:hasSemItem ?sitm .
    filter ( (?art_1 = <http://www.topic-s.de/topics-facts/id/article/926791705>) ) }',
  sem_models('topics'), null,
  sem_aliases(sem_alias('tpcs','http://www.topic-s.de/topics-schema#')), null));
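One obvious rewrite, which the post title hints at, is to drop the FILTER and bind the article URI directly in the triple pattern, so the constant subject can drive index access instead of being applied as a late filter. This is a sketch of that idea; whether the full post proceeds exactly this way is an assumption on my part:

```sql
select sim_1, sitm
from table(
  sem_match('{
    -- constant subject instead of ?art_1 plus FILTER
    <http://www.topic-s.de/topics-facts/id/article/926791705>
      tpcs:hasSemItemMatch ?sim_1 . ?sim_1 tpcs:hasSemItem ?sitm . }',
  sem_models('topics'), null,
  sem_aliases(sem_alias('tpcs','http://www.topic-s.de/topics-schema#')), null));
```

Since the subject is now a constant, ?art_1 disappears from the select list as well.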

(more…)

Resource allocation when importing a yago2 ontology into an Oracle semantic database

This is a short review of the resources necessary when importing a full yago2 ontology, which amounts to about 200 million triples, into an Oracle semantic database. Some information and snippets about the way to execute the import are given, but this is not the main focus of the article. I’m still on a fresh, not otherwise loaded 11gR2 (11.2.0.1) database.

The way to execute the import mainly followed the instructions given with the RDF demos: use sqlldr and bulkload.ctl to populate a staging table as proposed in ORACLE_HOME\md\demo\network\rdf_demos, and afterwards employ sem_apis.bulk_load_from_staging_table() to actually load the data into the semantic net. bulkload.ctl has in fact not been changed at all, the yago2 data being supplied in N-Triples format like this:

<Embeth_Davidtz> <http://yago-knowledge.org/resource/actedIn> <Army_of_Darkness> .

and the staging table:

create table yago2_stage (
  RDF$STC_sub varchar2(4000) not null,
  RDF$STC_pred varchar2(4000) not null,
  RDF$STC_obj varchar2(4000) not null,
  RDF$STC_sub_ext varchar2(64),
  RDF$STC_pred_ext varchar2(64),
  RDF$STC_obj_ext varchar2(64),
  RDF$STC_canon_ext varchar2(64)
) compress;

and the sql loader call:

sqlldr userid=lucene/*** control=yago2.ctl data=yago2_1.nt direct=true skip=0 load=95000000 discardmax=10000 bad=yago2.bad discard=yago2.dis log=yago2.log errors=0
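Once the staging table is populated, the actual load into the semantic net is done with sem_apis.bulk_load_from_staging_table(). The following is a sketch; the model name 'yago2' and the parallelism flags are my assumptions, not taken from the original setup:

```sql
-- Sketch only: model name 'yago2' and the flags are assumed values.
begin
  sem_apis.bulk_load_from_staging_table(
    model_name  => 'yago2',
    table_owner => 'LUCENE',
    table_name  => 'YAGO2_STAGE',
    flags       => ' parallel=4 parallel_create_index ');
end;
/
```

The model would have to exist beforehand (sem_apis.create_sem_model), and the flags string is where most of the resource tuning for a load of this size happens.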

(more…)