This tutorial describes the NVDL language. Its elements and attributes and their meanings are explained with the help of example NVDL schemas. We start with simple scenarios and gradually move to more difficult and specific ones. A similar approach is used in the NRL tutorial by James Clark [NRL], which served as a base for this document. The first example is probably the simplest NVDL schema imaginable.
Example 1. The “Hello World!” NVDL schema
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://hello-world">
    <validate schema="hello-world.dtd"/>
  </namespace>
</rules>
The root element in any NVDL schema is called rules. It contains all the rules that determine how the validation process is executed. In Example 1, “The “Hello World!” NVDL schema”, we have just one rule. If a section of the validated XML instance belongs to the http://hello-world namespace, the whole section is sent for validation against the hello-world.dtd subschema. Elements from other namespaces are rejected, which is the default behaviour.
This example is basically equivalent to classical single-namespace validation. The validation process is the same as if we validated directly against the hello-world.dtd schema using a DTD validator. Let's move to a more realistic example, where we expect several namespaces in one instance.
Example 2. Compound document schema with
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/2000/svg">
    <validate schema="svg.sch"/>
  </namespace>
</rules>
In the example we see two different namespace rules, each of which contains a validate action. This example shows a basic mapping between a namespace URI (specified by the ns attribute) and a schema URI (specified by the schema attribute).
The meaning of the NVDL script is very simple. Every section that belongs to the XHTML namespace is validated against the xhtml.rng Relax NG schema, and every SVG section against the Schematron rules defined in svg.sch.
Validation of sections against subschemas can be adjusted using several attributes or child elements of the validate element. The schema attribute is not the only way to specify a subschema. In some cases, it may be useful to use the schema element instead, to embed a subschema directly into our NVDL script. The schema element may contain either text, in case our subschema is not XML-based, or a foreign XML fragment.
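As a minimal sketch of the embedded form, the following script carries a small Relax NG grammar inside the schema element instead of pointing to an external file. The hello element name is an assumption made for illustration only; the namespace is the one from Example 1.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://hello-world">
    <validate>
      <schema>
        <!-- Embedded subschema: a foreign XML fragment in the Relax NG namespace -->
        <grammar xmlns="http://relaxng.org/ns/structure/1.0" ns="http://hello-world">
          <start>
            <!-- "hello" is a hypothetical element name -->
            <element name="hello">
              <text/>
            </element>
          </start>
        </grammar>
      </schema>
    </validate>
  </namespace>
</rules>
```

Embedding keeps a small subschema next to the rule that uses it, at the price of making the NVDL script itself a compound document.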
One of the problems an NVDL dispatcher has to solve is which validator to invoke for each subschema. In most cases the subschema is written in XML, so the schema language can easily be recognized from the namespace of the subschema's root element. But in some cases the subschema is in a different format, and the NVDL dispatcher has to determine the schema language from the MIME type. In case the MIME type is not available, the schema language should be specified manually in the NVDL script using the schemaType attribute.
A typical example of a non-XML schema language format is DTD. In this case the value of the schemaType attribute should be application/xml-dtd. Another widely used example is Relax NG in the compact syntax. In this case we may specify application/x-rnc as the schemaType attribute value.
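The two MIME types mentioned above could be used as in the following sketch. The file name xhtml.rnc is an assumption; the other names come from the earlier examples.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://hello-world">
    <!-- DTD is not XML, so the schema language is stated explicitly -->
    <validate schema="hello-world.dtd" schemaType="application/xml-dtd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <!-- Relax NG compact syntax; xhtml.rnc is a hypothetical file name -->
    <validate schema="xhtml.rnc" schemaType="application/x-rnc"/>
  </namespace>
</rules>
```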
Some validators use specific options to adjust the validation process. Such options may be specified directly in an NVDL script; the NVDL dispatcher takes care of passing them to the appropriate validator. Options are expressed using option elements inside the validate action. Their name and value pairs are set using the name and arg attributes. If the validation process requires the validator to support a particular option, the mustSupport attribute should be set to true. An error is then reported if the validator doesn't support it.
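A sketch of an option passed to a Schematron validator might look as follows. The option name URI and its value are purely hypothetical; real option names are defined by the individual validators and NVDL implementations.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="wcag.sch">
      <!-- Hypothetical option: the validator must understand it, otherwise an error is reported -->
      <option name="http://example.org/validator/option/phase"
              arg="strict"
              mustSupport="true"/>
    </validate>
  </namespace>
</rules>
```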
Frequently, we need to allow or reject all elements in a particular context. It doesn't make much sense to express such behaviour using a specific schema language. Instead, NVDL offers predefined schemas for this purpose. A validate action may be replaced by allow or reject. In the following example, XHTML sections are validated using the xhtml.rng schema, but all SVG sections are allowed without even attempting to validate them. All other element sections are rejected. Normally we can leave the anyNamespace rule out, as rejecting any section that doesn't match any of the defined rules is the default behaviour.
Example 3. Predefined schemas
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/2000/svg">
    <allow/>
  </namespace>
  <anyNamespace>
    <reject/>
  </anyNamespace>
</rules>
When we talk about validation, there is another interesting NVDL feature. For one namespace we may specify several validate actions. This tells the NVDL dispatcher to invoke several validators for every matching section. As different schema languages are more or less suitable for expressing particular kinds of constraints, this is a reasonable use case. Sometimes we may achieve better validation results using a combination of two or more schema languages.
One promising schema language combination is Relax NG and Schematron. While Relax NG is suitable for defining the elementary grammar of a vocabulary (mostly parent-child relations), Schematron is especially useful for expressing complex validation rules across the XML instance tree.
Imagine we would like to produce valid XHTML documents that are also accessible with respect to the Web Content Accessibility Guidelines (WCAG). While Relax NG is the right solution for expressing the XHTML grammar, Schematron is the preferable language for expressing the various complex accessibility rules. As shown in the next example, multiple validate elements cause XHTML sections to be validated against both schemas.
Example 4. Multiple validate elements for a single namespace
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
    <validate schema="wcag.sch"/>
  </namespace>
</rules>
Rules are represented by the namespace and anyNamespace elements, and they consist of a condition and a list of actions. A rule is triggered whenever an element or attribute section matches its condition. In that case, each action in the list is executed on that section. NVDL defines several different types of actions. One of them is the validate action, which was mentioned earlier.
Rule conditions are defined using a namespace URI. The namespace rule is applicable to any section whose namespace matches the value of the rule's ns attribute. The anyNamespace rule works differently. As it matches every section that doesn't have any applicable namespace rule defined, it basically specifies the default behaviour. In Example 3, “Predefined schemas”, we have seen the use of the anyNamespace rule in conjunction with the reject action. When we use allow instead, we change the default strict validation behaviour so that NVDL validates laxly. This means any section in an arbitrary namespace with no matching rule is automatically allowed, without being validated.
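Lax validation as described above can be sketched by taking Example 3 and switching the anyNamespace rule from reject to allow:

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <!-- Sections in any other namespace are accepted without validation -->
  <anyNamespace>
    <allow/>
  </anyNamespace>
</rules>
```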
Sometimes it's desirable to match several namespaces whose URIs follow a common pattern. For that reason, NVDL introduces wild-cards. The wildCard attribute of the namespace rule sets a special symbol (one character) that stands for one or more unspecified characters. If wildCard is not present, the default wild-card symbol is a star (*). Wild-cards are useful, for example, when the same schema (or behaviour) applies to several versions or variants of a language and those versions differ slightly in the namespace URI.
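The following sketch shows both the default wild-card symbol and a custom one. The example.org namespaces and the schema file names are hypothetical and stand for a vocabulary whose namespace URI ends with a version number.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Default symbol "*": intended to cover versioned URIs such as http://example.org/ns/report/1.0 -->
  <namespace ns="http://example.org/ns/report/*">
    <validate schema="report.rng"/>
  </namespace>
  <!-- Custom symbol "%" declared with the wildCard attribute -->
  <namespace ns="http://example.org/ns/invoice/%" wildCard="%">
    <validate schema="invoice.rng"/>
  </namespace>
</rules>
```

A custom symbol is mainly useful when the star character itself has to appear literally in the namespace URI.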
Rules can match elements, attributes or both. By default they match elements, which means rules apply to element sections only. This can be altered using the match attribute on both namespace and anyNamespace rules. The match attribute accepts the values "elements", "attributes" or "elements attributes".
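As a small sketch, the script below validates XHTML element sections and additionally accepts XLink attributes wherever they occur. The value spelling follows the plural form used in Example 16 (match="attributes"); the choice of the XLink namespace is only an illustration.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- No match attribute: the rule applies to element sections only -->
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <!-- This rule applies to attribute sections in the XLink namespace -->
  <namespace ns="http://www.w3.org/1999/xlink" match="attributes">
    <allow/>
  </namespace>
</rules>
```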
In the previous examples we considered simple global rules applicable to all element and attribute sections in the entire document. With modes we gain much more flexibility. We may specify different rules, which are applicable in different contexts of the document.
The previous examples, where namespace and anyNamespace elements were directly contained in the root rules element, may be understood as NVDL scripts with just one global mode. When using multiple modes, the root element doesn't contain rules directly. Instead it contains several child mode elements, which then contain the rules. In this scenario a startMode attribute has to be present on the rules element to specify the initial mode.
We can transit from one mode to another every time an action is executed on a particular section. There are basically two ways of specifying transitions between modes. Every action (e.g. the validate action) can have a useMode attribute, which references a different mode by its unique name. Names are assigned to modes using their name attribute. Named modes are always children of the rules element. The other approach is nesting modes directly into actions. There is one important difference between the two approaches. While named modes can be referenced by multiple actions, a transition to a nested mode is only possible by executing its parent action. If neither useMode nor a nested mode is defined for an action, the action transits by default back to the same mode.
Let's look at a simple mode example. Imagine we would like to ensure our instances have SVG sections nested into XHTML, not the other way around. In the initial mode, we allow only XHTML namespace sections to occur. For any nested section, we transit to the nested mode, where just the SVG namespace is allowed. In our example, we prevent the SVG fragments from containing any other foreign namespace fragments.
Example 5. Using modes
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
  <mode name="init">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml20.rng" useMode="nested"/>
    </namespace>
  </mode>
  <mode name="nested">
    <namespace ns="http://www.w3.org/2000/svg">
      <validate schema="svg.sch"/>
    </namespace>
  </mode>
</rules>
Let's make the same example more readable using nested modes. The meaning of the two scripts is equivalent; they differ only in syntax.
Example 6. Using nested modes
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
  <mode name="init">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml20.rng">
        <mode>
          <namespace ns="http://www.w3.org/2000/svg">
            <validate schema="svg.sch"/>
          </namespace>
        </mode>
      </validate>
    </namespace>
  </mode>
</rules>
As we see, the mode element can appear as a child of the rules element or it can be nested inside actions. But there is one more place where modes can appear: they can be included inside other modes. In that case the NVDL dispatcher takes care of merging the two modes into one. If a child mode has rules with the same condition as the parent mode, those rules are overridden by the parent rules.
Example 7. Mode inclusion
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
  <mode name="init">
    <mode>
      <anyNamespace>
        <allow/>
      </anyNamespace>
    </mode>
    <anyNamespace>
      <reject/>
    </anyNamespace>
  </mode>
</rules>
The previous NVDL script is equivalent to the following one. Both scripts simply reject any instance.
Example 8. Merged mode
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
  <mode name="init">
    <anyNamespace>
      <reject/>
    </anyNamespace>
  </mode>
</rules>
Mode inclusion is especially useful when including external modes using XInclude. With XInclude we can only include well-formed XML fragments. This means we cannot include individual rules directly; they need to have a root element. It's straightforward to encapsulate rules intended for inclusion in a parent mode element.
Mode inclusion is a nice way to achieve mode inheritance. Imagine that in Example 6, “Using nested modes”, we would like to allow both XHTML and SVG fragments to have nested RDF sections to provide some meta-data. The next example shows the solution.
Example 9. Mode inheritance
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" startMode="init">
  <mode name="init">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml20.rng">
        <mode>
          <xi:include href="rdfmode.xml"/>
          <namespace ns="http://www.w3.org/2000/svg">
            <validate schema="svg.sch">
              <mode>
                <xi:include href="rdfmode.xml"/>
              </mode>
            </validate>
          </namespace>
        </mode>
      </validate>
    </namespace>
  </mode>
</rules>
rdfmode.xml
<!-- The included file needs its own NVDL namespace declaration -->
<mode xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <validate schema="rdf.rng"/>
  </namespace>
</mode>
Until now, we have talked about actions in general, but only the validate action was introduced and explained in detail. Several other actions and their meanings are explained in the following text. One of them is the attach action, which allows re-attaching child sections to their parent so that they can be validated together as one fragment.
Attaching sections results in XML fragments with multiple namespaces. At first, this may look like going against the principle of NVDL, as NVDL is all about separating fragments from different namespaces. But as we have already discussed, NVDL is not the only tool that can handle compound document validation. Modern validation languages, such as Relax NG or XML Schema, cope very well with compound documents. If we have a nicely designed compound document schema written in one of those languages, it makes good sense to use it. To do so, we need to use attach.
Imagine we have a very well designed schema for XHTML in Relax NG, which defines abstract classes for inline and block elements. In this case, it's very easy to allow nested SVG fragments to occur just in the context of an inline or block element, without the need to list all such elements explicitly. Listing them would be necessary if we wanted to use pure NVDL to describe the contexts in which SVG can freely occur. Here it makes sense to use the Relax NG compound document schema and the attach action inside the SVG namespace rule, as shown in the following example.
Example 10. Attaching SVG sections back to XHTML
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="xhtml">
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml+svg.rng" useMode="svg"/>
    </namespace>
  </mode>
  <mode name="svg">
    <namespace ns="http://www.w3.org/2000/svg">
      <attach/>
    </namespace>
  </mode>
</rules>
When combining different XML vocabularies, we usually embed foreign elements or attributes as children into different contexts of the parent language. But sometimes the language we would like to validate is wrapped by a different language. This can occur, for example, when we use XML-based scripting or templating languages. Imagine we would like to validate XHTML embedded in Java Server Pages (JSP) or in XSLT.
This is exactly the case where we use the NVDL unwrap action. With unwrap, we leave the current section out completely. This means all the child sections of the current section are attached directly to its parent. With this approach, we can completely filter out (unwrap) the templating language and validate just its content. A very common example of using templating languages is XHTML styling of domain-specific XML languages using XSLT stylesheets. Consider the following example.
Example 11. Validating XHTML wrapped by XSLT
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng" useMode="xslt"/>
    </namespace>
    <namespace ns="http://www.w3.org/1999/XSL/Transform">
      <allow/>
    </namespace>
  </mode>
  <mode name="xslt">
    <namespace ns="http://www.w3.org/1999/XSL/Transform">
      <unwrap/>
    </namespace>
    <namespace ns="http://www.w3.org/1999/xhtml">
      <attach/>
    </namespace>
  </mode>
</rules>
This NVDL schema simply filters out every occurrence of XSLT. The pure XHTML is then sent for validation against the xhtml.rng schema.
The unwrap concept works nicely in simple situations, but in complex scenarios we may run into trouble. With templating languages it is hard to predict the execution flow. Imagine just a simple if-else condition, which is a must-have feature of any templating language. When we unwrap the templating language mark-up, both normally disjoint alternatives are attached to the validated fragment. This can cause problems when the schema language restricts the number of occurrences of an element, especially when just one occurrence is allowed.
Example 12. XHTML wrapped by XSLT
The following XSLT stylesheet produces XHTML documents with either the "Book" title, the "Magazine" title or simply an "Item" title, depending on the root element of the input document. Every time, just one single condition is met, so the resulting XHTML document always has just one title element. A different situation occurs when we unwrap the templating language. See the result in the next example.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml" version="1.0">
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <xsl:choose>
          <xsl:when test="book">
            <title>Book</title>
          </xsl:when>
          <xsl:when test="magazine">
            <title>Magazine</title>
          </xsl:when>
          <xsl:otherwise>
            <title>Item</title>
          </xsl:otherwise>
        </xsl:choose>
      </head>
      <body>
        <p> .. </p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Example 13. XHTML fragment after unwrapping XSLT
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Book</title>
    <title>Magazine</title>
    <title>Item</title>
  </head>
  <body>
    <p> .. </p>
  </body>
</html>
Even though the XSLT stylesheet always produces valid XHTML, after unwrap is used we get an invalid validation fragment. According to the XHTML specification, just one title element is allowed inside head.
To solve this issue, we can use a modified version of the XHTML schema which allows any number of elements in any context. In simple scenarios, we can also fine-tune unwrapping using the NVDL context element, which will be discussed later.
While the attach action attaches the whole section to its parent, attachPlaceholder attaches just a special placeholder element instead of the whole section. The placeholder element belongs to the http://purl.oclc.org/dsdl/nvdl/ns/instance/1.0 namespace and it has two attributes. The ns attribute specifies the section's namespace URI and the localName attribute contains the local name of the section's root element.
The placeholder element can then be declared in the subschema, so we can check the context of different foreign fragments without actually validating them in this particular subschema.
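A possible use of attachPlaceholder is sketched below. The schema file name xhtml-with-placeholder.rng and the mode names are assumptions; the idea is that this XHTML subschema declares the placeholder element wherever foreign content may occur, while SVG itself is validated separately.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="xhtml">
  <mode name="xhtml">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <!-- Hypothetical subschema that declares the NVDL placeholder element -->
      <validate schema="xhtml-with-placeholder.rng" useMode="foreign"/>
    </namespace>
  </mode>
  <mode name="foreign">
    <namespace ns="http://www.w3.org/2000/svg">
      <!-- The SVG section is validated on its own ... -->
      <validate schema="svg.sch"/>
      <!-- ... and only a placeholder is attached to the XHTML parent -->
      <attachPlaceholder/>
    </namespace>
  </mode>
</rules>
```

With this setup the XHTML validator would see a placeholder element with ns="http://www.w3.org/2000/svg" and the local name of the SVG root element in place of the whole SVG fragment.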
The three recently discussed actions (attach, unwrap and attachPlaceholder) are so-called “no result actions”, as they don't directly result in validation of anything. For obvious reasons, those actions are mutually exclusive, and thus just one of them can be present in the same rule.
The cancelNestedAction element may occur inside rules in the same place where actions usually occur, but it is not an action itself. It basically prevents any action from being executed. When cancelNestedAction is present, neither another cancelNestedAction element nor any action may occur in that particular rule.
cancelNestedAction is particularly useful when we have a general rule defined but would like to define an exception for some namespace. The following example illustrates this approach. Sections from any namespace are attached to the parent section, except for SVG sections, because we don't want them to be validated with the same subschema.
Example 14. Do not attach SVG fragments to XHTML
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng" useMode="attach"/>
    </namespace>
  </mode>
  <mode name="attach">
    <namespace ns="http://www.w3.org/2000/svg">
      <cancelNestedAction/>
    </namespace>
    <anyNamespace>
      <attach/>
    </anyNamespace>
  </mode>
</rules>
There is one more note to make about cancelNestedAction. If we use mode inclusion, cancelNestedAction rules are left out of included child modes.
Modes offer a mechanism to change the NVDL dispatcher behaviour during the validation process depending on the context of the currently processed section. In the previous examples, the initial mode for a section was always determined using the action's default transition. In this case, we are basically telling the NVDL dispatcher how to treat a specific section depending on the namespace of its parent section. If we look back at Example 10, “Attaching SVG sections back to XHTML”, this NVDL schema may be interpreted as follows: attach SVG sections to their parent whenever they are in the context of an XHTML section, and reject every SVG section in any other context.
Expressing context as the parent section namespace is likely to be sufficient in most cases. But sometimes it's necessary to specify context more precisely, for example as an element path within the parent section.
In NVDL, any action can have several context child elements, each with a path attribute and a mode transition. The transition is defined analogously to the default transition of an action: we can use either a useMode attribute or a nested mode element.
The path attribute defines the context within an element section. The syntax is inspired by XPath but is much simpler. As we specify context within one element section, we don't need to use any namespace prefixes. There are also no axes, functions or other advanced XPath constructs. The path attribute value is basically a list of one or more choices separated by the "|" delimiter. Each choice is then a list of local element names separated by the path separator "/". If preceded by a path separator, the choice is considered an absolute path (a path from the root element); otherwise the path is relative.
Every time an action is executed for an element section, the NVDL dispatcher first goes through the list of its context children. If any of the path expressions matches the current section's context path, the transition of that particular context is invoked. Otherwise the action's default transition is used. A section's context path begins at the root element of its parent section and includes every local element name down to the parent element of the section.
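To illustrate the path syntax, here is a sketch with two context elements on one action. The mode names metadata and graphics, and the choice of paths, are assumptions made for illustration. Child sections whose context path matches none of the paths fall back to the default transition, which here leads back to the root mode.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng">
        <!-- One relative choice: a head element -->
        <context path="head" useMode="metadata"/>
        <!-- Two choices separated by "|": an absolute path and a relative path -->
        <context path="/html/body|div/p" useMode="graphics"/>
      </validate>
    </namespace>
  </mode>
  <mode name="metadata">
    <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <validate schema="rdf.rng"/>
    </namespace>
  </mode>
  <mode name="graphics">
    <namespace ns="http://www.w3.org/2000/svg">
      <validate schema="svg.sch"/>
    </namespace>
  </mode>
</rules>
```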
The context element is particularly useful when we would like to allow an embedded vocabulary to occur just in a certain context of the parent vocabulary. Let's say we have an XHTML document which can contain some meta-data expressed using RDF. As meta-data are not intended to be rendered, we don't want them to occur in the document's body. The best place to put meta-data is the head section. The following example shows how to express such a requirement using NVDL.
Example 15. Allow RDF in the head context only
RDF fragments are allowed just inside the head element. According to the XHTML schema, there is just one head element, thus we can use a relative path to identify it. In this case, NVDL allows RDF inside any head across the document, but an occurrence of head in a wrong place would not pass validation against the xhtml.rng schema. We could also use the absolute path /html/head to define the context. Then, for NVDL, the RDF fragment can occur only in that particular head.
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">
  <mode name="root">
    <namespace ns="http://www.w3.org/1999/xhtml">
      <validate schema="xhtml.rng">
        <context path="head" useMode="rdf"/>
      </validate>
    </namespace>
  </mode>
  <mode name="rdf">
    <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <validate schema="rdf.rng" useMode="attach"/>
    </namespace>
  </mode>
  <mode name="attach">
    <anyNamespace>
      <attach/>
    </anyNamespace>
  </mode>
</rules>
In the previous text we did not distinguish between attribute and element sections. By default, NVDL attaches attribute sections back to their element sections, but using the match attribute on rules, we can handle attribute sections specifically. Each attribute section consists of attributes belonging to the same namespace with the same parent element. Using the match attribute, we can invoke a validate action on an attribute section. This means a standalone attribute set is sent to be validated against the specified subschema.
A set of attributes is not well-formed XML on its own. That's why NVDL creates a dummy element called virtualElement, in the http://purl.oclc.org/dsdl/nvdl/ns/instance/1.0 namespace, and attaches the attributes to it before sending them for validation.
Modern validation languages, e.g. Relax NG or XML Schema, have very expressive constructs for constraining attribute values. Specific handling of attribute sections allows us to define such constraints in a separate subschema. A nice example is the attributes from the default XML namespace http://www.w3.org/XML/1998/namespace (for more details refer to [NS]). It's undoubtedly convenient to have a specific subschema for those attributes, as they can occur in any compound language. We can simply reuse such a subschema every time we want to allow them, without the need to define them again and again in every specific subschema.
NVDL expects attribute-only subschemas to define no additional elements. To make validation using such a subschema possible, NVDL first performs a schema-language-specific transformation on it, so that the attributes can be attached to the virtualElement. The NVDL specification shows the Relax NG specific transformation as an example. The schema is wrapped as follows:
<element><anyName/>...content of the original schema...</element>
Here anyName allows any element to contain the defined attributes. For other schema languages, NVDL implementations should introduce different transformation rules.
Example 16. Validation of default XML attributes
The NVDL schema in the example validates the values of the default XML attributes. The attributes aren't attached back to their elements, thus we don't need to define them in the some-language-schema.rng subschema.
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="... some language namespace ...">
    <validate schema="some-language-schema.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/XML/1998/namespace" match="attributes">
    <validate schema="xmlattr.rng"/>
  </namespace>
</rules>
Next we see a fragment of the xmlattr.rng subschema. The xml:space attribute value is constrained to preserve or default.
<group datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" xmlns="http://relaxng.org/ns/structure/1.0">
  <optional>
    ...
    <attribute name="xml:space">
      <choice>
        <value>preserve</value>
        <value>default</value>
      </choice>
    </attribute>
    ...
  </optional>
</group>
The NVDL dispatcher transforms the xmlattr.rng subschema before validation into the following form.
<element datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" xmlns="http://relaxng.org/ns/structure/1.0">
  <anyName/>
  <group>
    <optional>
      ...
      <attribute name="xml:space">
        <choice>
          <value>preserve</value>
          <value>default</value>
        </choice>
      </attribute>
      ...
    </optional>
  </group>
</element>
Sometimes we cannot rely on namespaces to decompose our documents into sections. Some languages don't use namespaces (for example DocBook 4.2). In this case we need some other mechanism to extract sections. NVDL offers the trigger construct to achieve that.
trigger elements can occur as children of the rules element. There are two obligatory attributes on trigger. The ns attribute specifies the namespace and the nameList attribute is a space-separated list of element local names. A trigger is fired for any element whose namespace exactly matches the one specified and whose local name is contained in the nameList. In addition, the element must not be the root of the current element section and its parent must not be matched by the same trigger.
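A trigger for a document without namespaces might be sketched as follows. The element names in nameList, the schema file names and the mode names are hypothetical, and the sketch assumes that an empty ns value denotes elements in no namespace. Each bibliography or glossary element then starts a new element section, even though its namespace is the same as that of its parent.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="body">
  <!-- Start a new section at these elements (hypothetical names) -->
  <trigger ns="" nameList="bibliography glossary"/>
  <mode name="body">
    <namespace ns="">
      <validate schema="docbook-body.rng" useMode="backmatter"/>
    </namespace>
  </mode>
  <mode name="backmatter">
    <!-- Sections created by the trigger are validated separately -->
    <namespace ns="">
      <validate schema="docbook-backmatter.rng"/>
    </namespace>
  </mode>
</rules>
```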
NVDL allows schema annotation using the message element. Every action (for example validate) can have a message element or attribute. This is the right place to attach comments or hints to make the NVDL schema more understandable to humans.
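Both forms of annotation are sketched below on the two validate actions from Example 4. The message texts are made up for illustration.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml">
    <!-- Attribute form -->
    <validate schema="xhtml.rng" message="Checks the XHTML grammar"/>
    <!-- Element form -->
    <validate schema="wcag.sch">
      <message>Checks the WCAG accessibility rules</message>
    </validate>
  </namespace>
</rules>
```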
 In NVDL we have element sections and attribute sections. An element section is defined as an element such that a single namespace applies to itself and to all its descendant elements. An attribute section is basically a non-empty set of attributes having the same namespace.
 The term instance is used for the input document of the validation process.
 This applies e.g. to DTD or Relax NG in the compact syntax.
 When embedding subschemas directly into NVDL, such schema becomes effectively a compound document.
 An NVDL dispatcher is an application doing validation and dispatching in compliance with the NVDL specification.
 Not every schema language combination is automatically useful. Some languages are more suitable to be combined than others. In general, there are two major categories of schema languages: grammar-oriented and rule-based. Usually it makes more sense to combine languages from different categories. For more information refer to [HTML-VAL].
 Web Content Accessibility Guidelines is a set of recommendations aimed to make Web content accessible to people with all sorts of disabilities. For more information refer to [WCAG].