Topic/s – Towards Topics-based, Semantics-assisted News Search


A research project that I have been heading since January 2012, namely Topic/S – Automatic Extraction and Search of Topics and Trends in Editorial Data (NewsRoom), has had its first scientific paper accepted at WIMS’13 – International Conference on Web Intelligence, Mining and Semantics, to be held June 12-14, 2013, in Madrid, Spain. The paper is entitled “Towards Topics-based, Semantics-assisted News Search” and summarizes the work in progress of the ongoing research project. This is the full reference:

M. Voigt, M. Aleythe, P. Wehner: Towards Topics-based, Semantics-assisted News Search.
Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (WIMS’13), ACM, 2013

Do note that the work within the Topic/S project is funded by the European Social Fund / Free State of Saxony, contract no. 99457/2677.

Have fun. P

Recursively counting files by extension in a windows shell script


It took me ages to get this tiny piece of code running. Not much logic to incorporate, is there? Some looping, counting and printing. However, scripting this old windows shell eventually turned out to be a nightmare. OK, I learned a lot on the way to this script, about delayed expansion in nested loops (blocks) and about assignments of arithmetic expressions. But the lasting lesson learned is: next time, install python or something comparable first (or go and learn PowerShell).

Well, at the end of the day I noticed again and again how much I profit from the knowledge available on the net. So I’m sharing this script back for everyone who is interested.

Remarks:

  • get me a count of all files of type (or extension) %1, recursing into any dir within the current dir
  • the pushd / popd stuff is used because %%A cannot be used following “for /r” ;-)
  • the !CNT! is used because %CNT% would be expanded only once, when the whole outer block is parsed, so the changes made in the inner loop would not be visible in the echo ;-) ;-)
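
@echo off
setlocal enabledelayedexpansion
rem loop over the direct subdirectories of the current directory
rem "delims=" keeps directory names that contain spaces in one piece
for /f "delims=" %%A in ('dir /b /a:d') do (
  pushd "%%A"
  set CNT=0
  rem count every file below this directory that matches the extension
  for /r %%X in (*.%1) do ( set /a CNT+=1 )
  echo %%A !CNT!
  popd
)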
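
For comparison, here is a minimal sketch of the same count in python, the route suggested above. It is not part of the original script: the function name count_by_extension is made up for illustration, and it assumes Python 3 with nothing but the standard library.

```python
from pathlib import Path


def count_by_extension(root, ext):
    """Map each direct subdirectory of root to the number of *.ext files below it."""
    counts = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # rglob descends into every nested directory, much like "for /r"
        counts[sub.name] = sum(1 for f in sub.rglob(f"*.{ext}") if f.is_file())
    return counts
```

Calling count_by_extension(".", "txt") returns a dict of directory names and counts that can be printed line by line, which is what the echo in the batch loop does.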

Enjoy!

Do not filter for dedicated access in (oracle) sparql


I recently had to resort to good old autotrace when attempting to optimize a sparql query in oracle’s semantic technology stack. The point was that the still growing graph of data had turned some queries that were once fast into lame old rusty cars (another good old story as well).

So, what sort of query am I talking about? Not much of a monster. Stripped down for simplicity, just imagine a graph of newspaper articles that relate to named entities through intermediate nodes. Each such node represents a named entity recognition match and offers a hook for additional match information, say the weight of the match and so on. The query features sort of an entry point (which it actually wasn’t!), here the uri of the article. That entry point is merely applied as a filter expression to the pattern nodes (?art_1 = article, ?sim_1 = sem item match, ?sitm = sem item). Executing the following simplified query originally took around 8 secs of run time, cough, cough.

select art_1, sim_1, sitm
from table(
  sem_match('{
    ?art_1 tpcs:hasSemItemMatch ?sim_1 . ?sim_1 tpcs:hasSemItem ?sitm .
    filter ( (?art_1 = <http://www.topic-s.de/topics-facts/id/article/926791705>) ) }',
  sem_models('topics'), null,
  sem_aliases(sem_alias('tpcs','http://www.topic-s.de/topics-schema#')), null));

Read more of this post

Evaluating free mockup apps: pencil and prototyper-free


There are quite a lot of mockup apps around, but only some of them are under active development and also offered with a non-commercial license. I evaluated (only roughly) two of them, namely pencil (2.0.3) and prototyper-free (2.1.0), for use in a web-ui project.

pencil is a classic in the mockup scene and probably became most popular by shipping a firefox plugin variant that allows for quick web-based mockup production. However, since firefox meanwhile outruns any developer with its updates, the current plugin turned out to be incompatible with my running firefox installation, and I settled for the windows installation package. prototyper-free, on the other hand, is a branch of the commercial, fully fledged mockup tool that has been restricted in functionality and, more specifically, in extensibility (see below), without preventing you from getting your mockup job done efficiently.
Read more of this post

The 11g pivot query and the group by clause


The long awaited pivot query feature has been available since 11g. It saves us time in writing those infamous decode / group by queries (see expert one-on-one from Tom Kyte or the web for examples) that flip (grouped by) leading row values to column names. Searching the web for application examples, however, only reveals this emp-table stuff, e.g. on http://orafaq.com/wiki/PIVOT, which still contains the well known group by within the base query.

SELECT *
  FROM (SELECT job, deptno, sum(sal) sal FROM emp GROUP BY job, deptno)
         PIVOT ( sum(sal) FOR deptno IN (10, 20, 30, 40) );

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
CLERK           1300       1900        950
SALESMAN                              5600
PRESIDENT       5000
MANAGER         2450       2975       2850
ANALYST                    6000
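
To make the flip itself tangible outside the database, here is a rough sketch in python; it is not oracle code. The input rows are the aggregated values from the output above. The function name pivot and the use of None for empty cells are assumptions made for illustration.

```python
# Aggregated (job, deptno, sum of sal) rows, taken from the output above.
ROWS = [
    ("CLERK", 10, 1300), ("CLERK", 20, 1900), ("CLERK", 30, 950),
    ("SALESMAN", 30, 5600),
    ("PRESIDENT", 10, 5000),
    ("MANAGER", 10, 2450), ("MANAGER", 20, 2975), ("MANAGER", 30, 2850),
    ("ANALYST", 20, 6000),
]


def pivot(rows, columns):
    """Flip the deptno values of the rows into columns, summing sal per cell."""
    table = {}
    for job, deptno, sal in rows:
        # every job gets a row with one (initially empty) cell per column
        cells = table.setdefault(job, dict.fromkeys(columns))
        if deptno in cells:
            cells[deptno] = sal if cells[deptno] is None else cells[deptno] + sal
    return table
```

pivot(ROWS, [10, 20, 30, 40])["CLERK"] gives {10: 1300, 20: 1900, 30: 950, 40: None}, which corresponds to the CLERK line of the output.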

Well, what I wanted to achieve was a simple flip of rows of status values and counts into columns of status names and count values, based on some data like the following.

Read more of this post

Just another discussion of unicode character conversion for oracle


Running an oracle in multibyte unicode storage like AL32UTF8 is, disregarding the char and byte column length topic, actually no different from the old days of single byte storage, e.g. in WE8MSWIN1252. However, any job that includes some sort of character conversion between character, decimal and hex representations does require at least a basic understanding of the available unicode storage options and sql functions in oracle. To me, the main reason for common problems is the mismatch imposed by oracle’s inconsistent design of the sql functions ascii, asciistr, chr, nchr and unistr with respect to the database and the national characterset.

The following has been executed on an 11gR2 on win64 using these database and national characterset storage options.

SQL> select * from NLS_database_PARAMETERS where parameter like '%CHARACTERSET%';
PARAMETER                VALUE
---------                -----
NLS_CHARACTERSET         AL32UTF8
NLS_NCHAR_CHARACTERSET   AL16UTF16
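
As a side note on what “character, decimal and hex representations” means for these two charactersets, the following python sketch (not oracle sql) prints the unicode code point of a character next to its UTF-8 and UTF-16 byte encodings, which is what AL32UTF8 and AL16UTF16 store. The function name representations is made up for illustration, and no claim is made here about which of these values any particular oracle function returns.

```python
def representations(ch):
    """Return code point and byte-level views of a single character."""
    utf8 = ch.encode("utf-8")        # byte sequence as stored in UTF-8
    utf16 = ch.encode("utf-16-be")   # byte sequence as stored in big-endian UTF-16
    return {
        "code_point_dec": ord(ch),
        "code_point_hex": f"{ord(ch):04X}",
        "utf8_hex": utf8.hex().upper(),
        "utf8_dec": int.from_bytes(utf8, "big"),
        "utf16_hex": utf16.hex().upper(),
    }
```

For the euro sign, representations("€") yields the code point 8364 (hex 20AC), the UTF-8 bytes E282AC (14844588 when read as one decimal number) and the UTF-16 bytes 20AC. The gap between 8364 and 14844588 is exactly the kind of mismatch that makes conversions confusing.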

Read more of this post

Resource allocation when importing a yago2 ontology into an oracle semantic database


This is a short review of the resources necessary when importing a full yago2 ontology, which is about 200 million triples, into an oracle semantic database. Some information and snippets about the way to execute the import are given, but this is not the main focus of the article. I’m still on a fresh, not otherwise loaded 11gR2.0.1 database.

The import mainly followed the instructions given with the rdf demos, that is, using sqlldr and bulkload.ctl to populate a staging table as proposed in ORACLE_HOME\md\demo\network\rdf_demos and afterwards employing sem_apis.bulk_load_from_staging_table() to actually load the data into the semantic net. bulkload.ctl has in fact not been changed at all, the yago2 data being supplied in nt triples formatting like this:

<Embeth_Davidtz> <http://yago-knowledge.org/resource/actedIn> <Army_of_Darkness> .

and the staging table:

create table yago2_stage (
  RDF$STC_sub varchar2(4000) not null,
  RDF$STC_pred varchar2(4000) not null,
  RDF$STC_obj varchar2(4000) not null,
  RDF$STC_sub_ext varchar2(64),
  RDF$STC_pred_ext varchar2(64),
  RDF$STC_obj_ext varchar2(64),
  RDF$STC_canon_ext varchar2(64)
) compress;

and the sql loader call:

sqlldr userid=lucene/*** control=yago2.ctl data=yago2_1.nt direct=true skip=0 load=95000000 discardmax=10000 bad=yago2.bad discard=yago2.dis log=yago2.log errors=0

Read more of this post

NLS quirks when loading triples into oracle semantic models


Something really strange happened during my latest tests of the bulk loading feature of oracle semantic technology. I was working through the demos supplied with the oracle 11gr2 examples media, that is $ORACLE_HOME/md/demo/network/rdf_demos/bulkload{n}.* and other files, when the following error was raised while executing the load from the staging table. Obviously, oracle complains about an invalid dateTime format. However, the more I rechecked the actual object value of the triple against the XMLSchema standard, in particular ISO 8601 for date and time datatypes, the more I learned that the given value was as correct as can be (note, by the way, that the rdf demos do in fact contain triples that are erroneous by intent, to debug the code in bulkload.ctl).

begin sem_apis.bulk_load_from_staging_table('yago2', 'lucene', 'yago2_stage'); end;
/
Line 117: ORA-13199: During LBV:ORA-13199: Element Parse Error: Invalid date/time value [debug info: GCVN-timestamp: 2030-10-20T12:10:00Z] (value: "2030-10-20T12:10:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>) ORA-06512: in "MDSYS.MD", Zeile 1723
ORA-06512: in "MDSYS.MDERR", Zeile 17
ORA-06512: in "MDSYS.SDO_RDF", Zeile 9
ORA-06512: in "MDSYS.SDO_RDF_INTERNAL", Zeile 768

In order to narrow down the problem, I executed a simple insert statement using SDO_RDF_TRIPLE_S but only achieved the same result. Searching the net for comparable use cases, I increasingly sensed that I was caught in an NLS-related problem. To prove this idea, I executed another simple insert statement carrying an actual object value of XMLSchema#double as follows:

INSERT INTO yago2_data VALUES (null, SDO_RDF_TRIPLE_S ('yago2', '<http://example.org/Nathan>',
  '<http://example.org/age>', '"1.2"^^<http://www.w3.org/2001/XMLSchema#double>'))
/
Line 65: ORA-55303: SDO_RDF_TRIPLE_S-Konstruktor nicht erfolgreich: BNode-non-reuse case: SQLERRM=ORA-55328: Versuch, den literalen Wert "1.2"^^<http://www.w3.org/2001/XMLSchema#double> einzufügen, war nicht erfolgreich
ORA-06512: in "MDSYS.MD", Zeile 1723
ORA-06512: in "MDSYS.MDERR", Zeile 17
ORA-06512: in "MDSYS.SDO_RDF_TRIPLE_S", Zeile 64

Another ora-code but essentially the same outcome, right? OK, since my database sessions usually run in a german NLS environment, I tried changing the decimal separator from “.” to “,” and finally succeeded.

So I ask myself, what the hell is this? The interpretation of XMLSchema datatypes being dependent on the nls settings of the session environment. Hey guys @ oracle, did you ever read the XMLSchema standard (#!%&§$)?

Well, the appropriate settings that will resolve this loading quirk go like this:

ALTER SESSION SET NLS_DATE_LANGUAGE = American;
ALTER SESSION SET NLS_NUMERIC_CHARACTERS = '.,' ;

have fun (nevertheless)!

Running a standalone apex listener on a 11gr2


This is just a cookbook version of getting the oracle apex listener up and running on a (windows) oracle 11gR2. Necessary resources comprise the (otn authenticated) installation and developer guide for apex listener (pdf) as well as the download area for apex listener (zip). Note that there is another installation and developer guide for apex listener that comes as html pages with the downloadable package. It contains release notes as of August 2011, so it should be more recent than the pdf version (as of February 2011). There is also an apex listener forum hosted by oracle; see the APEX Listener OTN Forum for any user driven discussion threads.

I tested the standalone version of apex listener for use in some rapid prototyping developments, establishing an easy backend data connection for today’s ajax-based web ui’s. If you evaluate apex listener with production environments in mind, do consider deploying the code to a standard j2ee container host.

The installation (and developer) guide is, as mostly with oracle, a well done step-by-step manual that you just follow, and this is the way I chose to go: Get the downloadable package from the resource above. It’s an archive versioned as apex_listener.1.1.3.243.11.40.zip to this date and contains a single war file along with some documentation.

I already have apex installed with this database instance and decided to extract the archive to E:\oracle\product\11.2.0\dbhome_1\apex-listener, referred to as the “temp” directory, for whatever reason, in the installation guide. From this root directory I launched the war file along with some documented parameters to specify the deployment directory, the apex image directory, the http port to use, and so on:

E:\oracle\product\11.2.0\dbhome_1\apex-listener\temp>java -Dapex.home=E:\oracle\product\11.2.0\dbhome_1\apex-listener\runtime -Dapex.images=E:\oracle\product\11.2.0\dbhome_1\apex\images -Dapex.port=8585 -jar apex.war

The deployment directory is the place where the war file will be extracted to and where runtime information as well as configuration data will be written. Do not use -Dapex.erase=true if you decide to follow my installation hierarchy, because doing so will erase all configuration data too.

Read more of this post

Reading an xml file as an oracle external table


There are a couple of posts around that imply or claim that an oracle external table is able to read xml files by design. This is not really true. It may work for you, or may happen to fit your current xml file structure, but you should generally not attempt to read xml with external tables. Here is why.

What external tables (or rather sql loader) can actually do is read line oriented, file based data in bulk and quite fast. They offer a lot of settings to parametrize the loading process, the reader, according to your input format. You may, for example, have a look at the excellent series of posts by Jiri discussing these options, starting with Oracle External Tables by Examples part 1 – TAB delimited fields.

Well, this functionality may also be (mis)used to read xml instances from files. Given a file like the following, one may be interested in extracting the contents of the token tags.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<FILE_INFO>
	<PERSON_BILD>
		<token_row><token>Fritz &amp; Fischer</token></token_row>
		<token_row><token>Boris Borsberg</token></token_row>
	</PERSON_BILD>
</FILE_INFO>
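
For contrast, this is roughly what the same extraction looks like with a real xml parser. It is a minimal python sketch using the standard library’s xml.etree.ElementTree, not the oracle approach discussed in this post. SAMPLE is simply the file above, and the function name extract_tokens is made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<FILE_INFO>
	<PERSON_BILD>
		<token_row><token>Fritz &amp; Fischer</token></token_row>
		<token_row><token>Boris Borsberg</token></token_row>
	</PERSON_BILD>
</FILE_INFO>
"""


def extract_tokens(xml_bytes):
    """Return the text of every token element, in document order."""
    root = ET.fromstring(xml_bytes)
    # iter() finds the elements wherever they sit, independent of line breaks
    return [token.text for token in root.iter("token")]
```

extract_tokens(SAMPLE) returns ['Fritz & Fischer', 'Boris Borsberg']. The parser resolves the &amp; entity and does not care how the tags are spread over lines, two things a line oriented reader does not do on its own.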

Some external table definition may then look like this:

Read more of this post
