
Introduction to XML Namespaces & Schema

Introduction

Traditionally, XML document structures are defined using a DTD (Document Type Definition). However, DTDs suffer from many limitations, such as :

  • They are written in a syntax based on Extended Backus-Naur Form (EBNF), which differs from normal XML. Hence, you must spend some time learning the new syntax

  • DTDs provide very limited data types

  • DTDs are not extensible

  • DTDs do not provide support for namespaces

These limitations called for a new, flexible and extensible specification. XML Schema is such a specification. It is used to define the structure of an XML document. In other words, it is a kind of metadata (data about data). The schema itself is an XML document.

Following are the advantages of XML Schemas :

  • They are written in XML itself

  • They support namespaces i.e. your XML document can be based on one or more schemas

  • Being XML they are easy to write

  • They provide various data types which DTDs lack

  • They can be inherited within your XML document

Currently, IE5+ is the only browser that supports a subset of the XML Schema specification, called XDR (XML-Data Reduced). Note that this is Microsoft's implementation and not a W3C standard. All the examples presented here require IE5+ to be installed.

What are Namespaces?

Before going into the details of XML Schema, one should know the meaning of the term Namespace. Many XML books present this concept in a rather abstract manner. Here we will try to understand the concept using an analogy from the real programming world.

Consider that you are developing an application which makes use of two third-party components, say Component1 and Component2 (the Microsoft gang can assume that the components are VB ActiveX DLLs and the Sun gang can assume them to be Java class packages :-). Both components provide a class named "MyClass". Now, you declare a variable x like this :

Dim x As MyClass                ' in VB
MyClass x = new MyClass();      // in Java

Got the problem? How will the compiler (VB or Java) know which MyClass to instantiate, since a class with the same name exists in both components? To avoid the problem, you modify the declaration like this :

Dim x As Component1.MyClass                            ' in VB
Component1.MyClass x = new Component1.MyClass();       // in Java

Now the compiler will understand clearly which instance to create because you are using "fully qualified" class names to avoid "name collisions".

The same thing applies to XML documents as well. What if your XML document is based on multiple schemas and the same element name exists in more than one of them? How will the XML parser know your intention? This is where XML Namespaces come into the picture.

A namespace is a collection of names which are used in XML documents as element types and attribute names. The collection is uniquely identified by a URI (Uniform Resource Identifier).

Thus in our example Component1 represents one namespace which provides MyClass and Component2 represents another namespace providing its own MyClass.

Note : The analogy shown above is only meant to help you understand the general concept of Namespaces.

Using Namespaces

Now, let us see how to use namespaces in our XML documents. To indicate that an element belongs to a namespace you write something like this :

<myelement xmlns="my_namespace_uri">

</myelement>

The above markup tells the XML parser that the element and all its children belong to the namespace my_namespace_uri. Note the use of the xmlns attribute. Assigning your namespace URI directly to the xmlns attribute makes that URI the default for the element and all its unprefixed child elements.
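To see the effect of a default namespace, here is a minimal sketch using Python's standard xml.etree.ElementTree module. Python is not part of the original examples, and the child element named "child" is a hypothetical addition for illustration. ElementTree reports each element name in the form "{namespace-uri}local-name", which makes the inherited namespace visible.

```python
import xml.etree.ElementTree as ET

# "my_namespace_uri" is the placeholder URI from the text above;
# the <child> element is a hypothetical addition for illustration.
doc = '<myelement xmlns="my_namespace_uri"><child>some data</child></myelement>'

root = ET.fromstring(doc)

# Collect the qualified name of the root and of every descendant.
names = [element.tag for element in root.iter()]
print(names)
# ['{my_namespace_uri}myelement', '{my_namespace_uri}child']
```

Both the element and its child carry the same namespace URI, even though xmlns was written only once.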

But what if I am using two namespaces? In such cases you modify the declaration as follows :

<myelement xmlns:prefix1="my_namespace1_uri" xmlns:prefix2="my_namespace2_uri">
<prefix1:child1>some data</prefix1:child1>
<prefix2:child2>some data</prefix2:child2>
<prefix2:child1>some data</prefix2:child1>
</myelement>

Now, you are identifying each child explicitly using its fully qualified tag name.
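The following sketch, again assuming Python's standard xml.etree.ElementTree module rather than anything used in the original article, parses the two-prefix document and shows that child1 from the first namespace and child1 from the second are treated as different elements. The lookup prefixes "a" and "b" are hypothetical; only the URIs have to match.

```python
import xml.etree.ElementTree as ET

doc = """
<myelement xmlns:prefix1="my_namespace1_uri" xmlns:prefix2="my_namespace2_uri">
  <prefix1:child1>some data</prefix1:child1>
  <prefix2:child2>some data</prefix2:child2>
  <prefix2:child1>some data</prefix2:child1>
</myelement>
"""

root = ET.fromstring(doc)

# The parser replaces each prefix with the URI it is bound to.
tags = [child.tag for child in root]
print(tags)

# Hypothetical lookup prefixes: the prefix text is arbitrary, the URI is
# what identifies the namespace.
ns = {"a": "my_namespace1_uri", "b": "my_namespace2_uri"}
first_child1 = root.findall("a:child1", ns)
second_child1 = root.findall("b:child1", ns)
print(len(first_child1), len(second_child1))
```

Note that myelement itself has no prefix and no default namespace is declared here, so it does not belong to either namespace.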

In many cases you will find that though you are using multiple namespaces, only one is being used most of the time. In such cases you can make that namespace the default as follows :

<myelement xmlns="my_namespace1_uri" xmlns:prefix1="my_namespace2_uri">
<child1>I am from default namespace</child1>
<prefix1:child2>I am from other namespace</prefix1:child2>
</myelement>
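A default namespace is only a shorthand. As a rough check, the sketch below (Python's standard xml.etree.ElementTree module, not part of the original article) compares the document above with an equivalent version in which every element is prefixed explicitly. The prefix "p0" is a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET

with_default = """<myelement xmlns="my_namespace1_uri" xmlns:prefix1="my_namespace2_uri">
<child1>I am from default namespace</child1>
<prefix1:child2>I am from other namespace</prefix1:child2>
</myelement>"""

# Same document, but the first namespace is bound to the hypothetical
# prefix "p0" instead of being the default.
fully_prefixed = """<p0:myelement xmlns:p0="my_namespace1_uri" xmlns:prefix1="my_namespace2_uri">
<p0:child1>I am from default namespace</p0:child1>
<prefix1:child2>I am from other namespace</prefix1:child2>
</p0:myelement>"""


def qualified_names(xml_text):
    """Return the "{uri}local-name" form of every element in the document."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


default_names = qualified_names(with_default)
same = default_names == qualified_names(fully_prefixed)
print(default_names)
print(same)
```

Both forms resolve to the same qualified names, so the choice between them is a matter of convenience.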

Back to XML Schema

Let us consider following XML document :

<?xml version="1.0"?>

<books>
	<book isbn="100">
		<title>Visual Basic</title>
		<author>Author1</author>
	</book>
	<book isbn="101">
		<title>Java</title>
		<author>Author2</author>
	</book>
	<book isbn="102">
		<title>Linux</title>
		<author>Author3</author>
	</book>
</books>      

We want to enforce structural rules such that :

  • The books element (which is also the root element) should contain instances of the book element only

  • All the book elements should contain attribute isbn

  • All the book elements should contain sub elements only (no text data)

  • The attribute data type must be integer

  • The child elements of book i.e. title and author must appear in the same sequence i.e. title first then author

Here is a schema which does that. The schema is stored as booksschema.xml

<?xml version="1.0" ?>

<Schema name="booksschema"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes" >

<AttributeType name="isbn" dt:type="int" required="yes"/>

<ElementType name="books">
	<element type="book" />
</ElementType>

<ElementType name="book" content="eltOnly" order="seq">
	<attribute type="isbn" />
	<element type="title" />
	<element type="author" />
</ElementType>

<ElementType name="title"></ElementType>
<ElementType name="author"></ElementType>

</Schema>      

Let us dissect it.

  • <Schema name="booksschema"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes" >
    This line specifies the schema name and the namespaces used. Here, the default namespace is "urn:schemas-microsoft-com:xml-data". Note that this namespace will work only in IE5+. Every schema should start with the element Schema and must have a name.

  • <AttributeType name="isbn" dt:type="int" required="yes"/>
    This line declares an attribute isbn with data type of integer. The isbn attribute is specified to be mandatory by "required" attribute.

  • <ElementType name="books">
    <element type="book" />
    </ElementType>
    The above block specifies that the books element should contain one or more instances of the book element.

  • <ElementType name="book" content="eltOnly" order="seq">
    <attribute type="isbn" />
    <element type="title" />
    <element type="author" />
    </ElementType>
    The above block states that the book element should contain sub elements only and not PCDATA (by means of the content attribute). It also specifies that the book element should have an isbn attribute and two sub elements, title and author. Further, using the order attribute we enforce that the sub elements appear in the same sequence.

  • <ElementType name="title"></ElementType>
    <ElementType name="author"></ElementType>
    Finally, declarations of title and author elements appear. Here, they do not contain sub elements but you can extend them as per your own requirement.
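Python's standard library cannot validate against this Microsoft schema format, so the following is not schema validation. It is only a hand-written sketch of the same structural rules, which may help to make clear what the schema asks the parser to enforce. The function name check_books and the broken sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0"?>
<books>
  <book isbn="100"><title>Visual Basic</title><author>Author1</author></book>
  <book isbn="101"><title>Java</title><author>Author2</author></book>
</books>"""

# Hypothetical broken document: non-integer isbn, author before title.
BAD = """<books>
  <book isbn="abc"><author>Author3</author><title>Linux</title></book>
</books>"""


def check_books(xml_text):
    """Return a list of rule violations; an empty list means all rules hold."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "books":
        problems.append("root element must be books")
    for number, book in enumerate(root, start=1):
        if book.tag != "book":
            problems.append(f"child {number}: books may contain book elements only")
            continue
        isbn = book.get("isbn")
        if isbn is None:
            problems.append(f"book {number}: isbn attribute is missing")
        elif not re.fullmatch(r"[+-]?[0-9]+", isbn):
            problems.append(f"book {number}: isbn must be an integer")
        # Text directly inside book (before or between children) is not allowed.
        stray_text = [book.text] + [child.tail for child in book]
        if any(text and text.strip() for text in stray_text):
            problems.append(f"book {number}: text data is not allowed")
        # title must come first, then author.
        if [child.tag for child in book] != ["title", "author"]:
            problems.append(f"book {number}: expected title followed by author")
    return problems


good_problems = check_books(GOOD)
bad_problems = check_books(BAD)
print(good_problems)
print(bad_problems)
```

The point of a schema is that you declare these rules once and the parser does the checking, instead of writing such code by hand for every document type.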

Now we will link this schema with our XML document.

<?xml version="1.0"?>

<books xmlns="x-schema:d:\bipin\xml\booksschema.xml">      

We modified the definition of the root element to include the namespace. The syntax should be exactly as shown; only the path will change, depending on where your schema file is stored.
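Notice that the namespace URI itself carries the location of the schema file. The sketch below, which assumes Python's standard xml.etree.ElementTree module and a hypothetical shortened document, only shows how that location can be read back from the root element. It does not load or apply the schema; that work is done by the MSXML parser.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the books document with the
# x-schema namespace from the text. A raw string keeps the backslashes.
doc = r'<books xmlns="x-schema:d:\bipin\xml\booksschema.xml"><book isbn="100"/></books>'

root = ET.fromstring(doc)

# root.tag has the form "{namespace-uri}books"; pull out the URI part.
namespace_uri = root.tag[1:].split("}", 1)[0]

PREFIX = "x-schema:"
if namespace_uri.startswith(PREFIX):
    schema_location = namespace_uri[len(PREFIX):]
else:
    schema_location = None

print(schema_location)
```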

You can test that the document is being validated against the schema by using IE's Validate XML context menu option. You should have the latest MSXML parser installed to enable this feature.


Bipin Joshi is a software consultant, an author and a yoga mentor having 22+ years of experience in software development. He also conducts online courses in ASP.NET MVC / Core and Design Patterns. He is a published author and has authored or co-authored books for Apress and Wrox press. Having embraced the Yoga way of life he also teaches Meditation and Mindfulness to interested individuals.


Posted On : 07 January 2001


Tags : XML