XMLType.schemaValidate is your only true friend for xmlschema validation


When it comes to introducing xml-data into your database storage, you might, as all sincere developers do, at first attempt to take care of proper data integrity checking on import. Since xml is a really powerful but a somewhat complex document-alike data representation, such integrity checking must incorporate proving the document strucuture, as sort of integrated data types, iff you like, as well as the document data, in terms of facets of given actual values. According to the xml standard, xmlschema is the means of choice here, offering another xml-spelled specification layer to achive the two beforementioned goals. Eventually, a given xml-instance will have to be thrown against the xmlschema provided, to assure its integrity (and being well-formed too, btw), the earlier the better, at best on data import already.

The paragraph above is actually not very oracle specific. Any implementation of the xml standard, as oracle’s xml db does, proposes this course of action. However, since some xml operations may become quite costly when the xml-instances get large and the whole dom tree has to be set up in memory, oracle, as others, dabbles at dodging and shifting pricey work to the most reasonable extend possible. You may load xml into the database, claiming any xml-instances are just fine, validate xml-instances only to be well-formed, manage the validation status of xml-instances at your own responsibility and so forth. There is nothing wrong about that whatsoever. The point is, though, it might be a stony walkway to learn to distinguish the maybe from the certain, to establish a reliable check to safeguard your xml data integrity, just as simple as ALTER TABLE mytab ADD(CONSTRAINT mytab_CHK CHECK(INSERT_TS is not null) NOT DEFERRABLE ENABLE);.

Jep, what next? I’m going to briefly discuss the xmlschema / xmlinstance used below and then show what I experienced will happen using the various ways of “validating” the instance against the schema. The post is somewhat lengthy, but do not get frightened, this is for the c&p examples all along the way.

The playground xmlschema, referred to as v_sch in the registerSchema call below, is a stolen fragment from some of my gis adventures and features a root-element tpcs_si to contain a location loc with a latitude lat (longitude saved for brevity). The latitude sources from an extended type definition to facet the decimal values. The schema implies the use of an explicit namespace of the elements defined throughout the xmlinstances.

<xs:schema targetNamespace="tpcs_namespace"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tpcs="tpcs_namespace"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
  <xs:element name="tpcs_si">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="loc" minOccurs="1" maxOccurs="1">
          <xs:complexType>
            <xs:attribute name="lat" type="tpcs:latDecimal" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="latDecimal">
    <xs:annotation><xs:documentation>neg for equator south, 90 is theoretical</xs:documentation></xs:annotation>
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="-90"/>
      <xs:maxInclusive value="90"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

The playground xmlinstance, referred to as v_doc below, just develops what is allowed to as well as pointing to the schema location of its namespace. We are going to poke around with the facet upon /tpcs:tpcs_si/loc/@lat later, to have the oracle xml parser complain (or not). That’s it.

<tpcs:tpcs_si
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:tpcs="tpcs_namespace"
    xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
  <tpcs:loc lat="51.04809" />
</tpcs:tpcs_si>

Yet another requisite is to register the xmlschema with xml db to use the url pointer in operations. Since we do not need any data structures generated on the oracle layer, these parameters have been set to false.

declare
  v_url varchar2(32767) := 'http://www.bitbach.de/tpcs.xsd';
  v_sch varchar2(32767) := '...';
begin
  dbms_xmlschema.registerschema(schemaurl => v_url,
    schemadoc => v_sch, local => true, gentypes => false, gentables => false);
end;
/

Ok then, let’s go testing.

schema just given in the instance

select xmltype('
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc lat="51.04809" />
    </tpcs:tpcs_si>',
  validated => 0, wellformed => 0)
from dual
;
select xmltype('
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc_abcdef lat="51.04809" />
    </tpcs:tpcs_si>',
  validated => 0, wellformed => 0)
from dual
;
select xmltype('
    <tpcs:tpcs_si_abcdef
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc lat="51.04809" />
    </tpcs:tpcs_si>',
  validated => 0, wellformed => 0)
from dual
;
ORA-31011: XML-Parsing nicht erfolgreich
ORA-19202: Fehler bei XML-Verarbeitung 
LPX-00225: End-Element-Tag "tpcs:tpcs_si" stimmt nicht mit Start-Element-Tag "tpcs:tpcs_siabcdef" überein.
select xmltype('
    <tpcs:tpcs_si_abcdef
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc lat="51.04809" />
    </tpcs:tpcs_si_abcdef>',
  validated => 0, wellformed => 0)
from dual
;

So obviously, just providing xmlschema information, i.e. a proper pointer, just with the xmlinstance is not sufficient. Only the well-formed status will be checked. In-depth document structure or value facets (not shown here), remains unregarded.

schema given in the instance and as a parameter to xmltype

I sourced that from howto validate xml content against an xml schema and it sounds plausible, not? Btw, this what the other crowds in java and python do all day, right?

select xmltype('
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc_abcdef lat="51.04809" />
    </tpcs:tpcs_si>',
  schema => 'http://www.bitbach.de/tpcs.xsd',
  validated => 0, wellformed => 0)
from dual
;
select xmltype('
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc_abcdef lat="51.04809" />
    </tpcs:tpcs_si>',
  validated => 0, wellformed => 0)
  .createSchemaBasedXML('http://www.bitbach.de/tpcs.xsd')
from dual
;

Dont’t know what happens here. The xmltype documentation) is indeed not very chatty but does imply, imho, that the xmlinstance will be connected to the the xmlschema, i.e. is schema-based, which in turn, imho, does suggest validation activity(?). Maybe, the calls behave that unexpected because the generated xmlinstance is still transient?

Let’s try xmltype.isSchemaBased().. and find out that only isSchemaBased triggers true xmlinstance validation. So, createSchemaBasedXML may establish a link between a schema and an instance, but does demonstrably not validate.

select xmltype('
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc_abcdef lat="51.04809" />
    </tpcs:tpcs_si>',
  validated => 0, wellformed => 0)
  .createSchemaBasedXml('http://www.bitbach.de/tpcs.xsd')
  .isSchemaBased()
from dual
;
ORA-30937: Keine Schemadefinition für "loc_abcdef" (Namespace "tpcs_namespace") in übergeordnetem Knoten "/tpcs_si"
ORA-06512: in "SYS.XMLTYPE", Zeile 206
30937. 00000 -  "No schema definition for '%s' (namespace '%s') in parent '%s'"

Funny enough though, that isSchemaBased throws an exception, which is not a documented behaviour.

schema given in the instance and storing the instance persistently in a xmlschema-based table

--drop table tpcs_si purge
--;
create table tpcs_si of xmltype
  xmlschema "http://www.bitbach.de/tpcs.xsd"
  element "tpcs_si"
;
insert into tpcs_si values(xmltype('
  <tpcs:tpcs_si
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:tpcs="tpcs_namespace"
      xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
    <tpcs:loc lat="51.04809" />
  </tpcs:tpcs_si>'))
;
insert into tpcs_si values(xmltype('
  <tpcs:tpcs_si
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:tpcs="tpcs_namespace"
      xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
    <tpcs:loc_abcdef lat="51.04809" />
  </tpcs:tpcs_si>'))
;
insert into tpcs_si values(xmltype('
  <tpcs:tpcs_si
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:tpcs="tpcs_namespace_abcdef"
      xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
    <tpcs:loc lat="51.04809" />
  </tpcs:tpcs_si>'))
;
SQL-Fehler: ORA-19007: Schema - stimmt nicht mit erwartetem Schema http://www.bitbach.de/tpcs.xsd überein.
19007. 00000 -  "Schema %s does not match expected %s."
*Cause:    The given XML document conformed to a different schema than expected.
*Action:   Insert/Update only the XML documents that conform to that particular schema.

Seems that even in persistent xmltype storage, deep validation of a xmlinstance is still delayed. Well-formedness will be checked against as the namespace and the root element do. Let’s test on with xmltype.schemaValidate() used in a trigger, as noticed in the xml db documentation.

create or replace trigger tpcs_si_bir before insert on tpcs_si for each row
begin
  if (:new.object_value is not null) then :new.object_value.schemaValidate(); end if;
end;
/
insert into tpcs_si values(xmltype('
  <tpcs:tpcs_si
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:tpcs="tpcs_namespace"
      xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
    <tpcs:loc_abcdef lat="51.04809" />
  </tpcs:tpcs_si>'))
;

Finally, finally! So this the right wrench? How about transient xmtype instance again?

schemaValidate on a transient instance

declare
  v_doc varchar2(32767) := '
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc lat="51.04809" />
    </tpcs:tpcs_si>';
  v_obj xmltype := xmltype(v_doc);
begin
  v_obj.schemaValidate();
end;
/
declare
  v_doc varchar2(32767) := '
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc_abcdef lat="51.04809" />
    </tpcs:tpcs_si>';
  v_obj xmltype := xmltype(v_doc);
begin
  v_obj.schemaValidate();
end;
/
ORA-30937: Keine Schemadefinition für "loc_abcdef" (Namespace "tpcs_namespace") in übergeordnetem Knoten "/tpcs_si"
ORA-06512: in "SYS.XMLTYPE", Zeile 354
ORA-06512: in Zeile 11
declare
  v_doc varchar2(32767) := '
    <tpcs:tpcs_si
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:tpcs="tpcs_namespace"
        xsi:schemaLocation="tpcs_namespace http://www.bitbach.de/tpcs.xsd" >
      <tpcs:loc lat="11151.04809" />
    </tpcs:tpcs_si>';
  v_obj xmltype := xmltype(v_doc);
begin
  v_obj.schemaValidate();
end;
/
LSX-00292: Wert "11151.04809" ist größer als Höchstwert "90" (inclusive) aufgetreten
ORA-06512: in "SYS.XMLTYPE", Zeile 354
ORA-06512: in Zeile 11

bottom line

Ok, so it seems that in practice it does not matter whether some xmlinstance is transient or persistent, is schema-based or non-schema-based, is marked as schema-valid =1 or =0. Bring on schemaValidate as your safeguard, straightening the validation status out for shure. To me, I in fact can comprehend why oracle attempts to save cpu for xml-processing whenever possible, however, the xml db documentation does not introduce this idea anywhere and the xmltype-method names are of not much further help either. So again, one is forced to test, test, test, already in the early stages of development to undetrstand what the new toolbox around actually offers.

Sourced also : validate xml in oracle against xsd.

Have fun, Peter

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s