XMLType.schemaValidate is your only true friend for xmlschema validation

When it comes to introducing xml-data into your database storage, you might, as all sincere developers do, at first attempt to take care of proper data integrity checking on import. Since xml is a really powerful but a somewhat complex document-alike data representation, such integrity checking must incorporate proving the document strucuture, as sort of integrated data types, iff you like, as well as the document data, in terms of facets of given actual values. According to the xml standard, xmlschema is the means of choice here, offering another xml-spelled specification layer to achive the two beforementioned goals. Eventually, a given xml-instance will have to be thrown against the xmlschema provided, to assure its integrity (and being well-formed too, btw), the earlier the better, at best on data import already.

The paragraph above is actually not very oracle specific. Any implementation of the xml standard, as oracle’s xml db does, proposes this course of action. However, since some xml operations may become quite costly when the xml-instances get large and the whole dom tree has to be set up in memory, oracle, as others, dabbles at dodging and shifting pricey work to the most reasonable extend possible. You may load xml into the database, claiming any xml-instances are just fine, validate xml-instances only to be well-formed, manage the validation status of xml-instances at your own responsibility and so forth. There is nothing wrong about that whatsoever. The point is, though, it might be a stony walkway to learn to distinguish the maybe from the certain, to establish a reliable check to safeguard your xml data integrity, just as simple as ALTER TABLE mytab ADD(CONSTRAINT mytab_CHK CHECK(INSERT_TS is not null) NOT DEFERRABLE ENABLE);.

Jep, what next? I’m going to briefly discuss the xmlschema / xmlinstance used below and then show what I experienced will happen using the various ways of “validating” the instance against the schema. The post is somewhat lengthy, but do not get frightened, this is for the c&p examples all along the way.


Reading an xml file as an oracle external table

There is a couple of posts around that imply or claim an oracle external table to be able to read xml files by design. This is not true, really. It may work for you or actually may fit your current xml file structure but you should generally not approach to read xml by external tables. Here is why.

What external tables or sql loader for better can do actually is reading line oriented, file based data in bulk and quite fast. It offers a lot of settings to parametrize the loading process, the reader, according to your input format. You may for example have a look at the excellent series of posts discussing this options by Jiri starting here Oracle External Tables by Examples part 1 – TAB delimited fields.

Well, this functionality, may also be (mis)used to read xml instances from files. Given a file as the following one may be interested in extracting the contents of the token tags.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
		<token_row><token>Fritz &amp; Fischer</token></token_row>
		<token_row><token>Boris Borsberg</token></token_row>

Some external table definition may then look like this: