
Introduction To XML

What is XML?

  • XML stands for eXtensible Markup Language
  • You have probably used HTML tags and elements. HTML provides a fixed set of elements, and we are bound to use only those elements.
  • XML, on the other hand, allows us to create our own tags and elements
  • Since we create our own tags, they can be descriptive, making the document more readable
  • HTML is designed to display your data in a web browser
  • XML is designed to represent your data rather than its display. The display of data is taken care of by other means such as CSS, XSL or custom applications
  • An HTML page is primarily meant to be displayed in a web browser
  • XML data can be used by any application, including a web browser, that understands how to interpret the data
  • Since the data is separated from display, any change in data can be easily incorporated without touching the display mechanism
  • XML originated from SGML (Standard Generalized Markup Language), which provides specifications to create markup languages
  • HTML is also an example of a markup language
  • XML is maintained by the World Wide Web Consortium (W3C), whereas SGML is an ISO standard (ISO 8879)
  • XML made its first public appearance in 1996
  • The first official specification of XML was published in 1998
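
The separation of data from display described above can be illustrated with a small sketch. It is written in Python using the standard library's xml.etree.ElementTree module; the tag names, the DATA string and the two rendering functions are all invented for this illustration and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical data that uses tag names we made up ourselves.
DATA = "<book><author>Author 1</author><title>Title 1</title></book>"


def as_text(book):
    """Render the book as a plain-text line."""
    return f"{book.findtext('title')} by {book.findtext('author')}"


def as_html(book):
    """Render the same data as an HTML fragment."""
    return f"<p><b>{book.findtext('title')}</b> ({book.findtext('author')})</p>"


book = ET.fromstring(DATA)
print(as_text(book))
print(as_html(book))
```

The XML itself says nothing about fonts or layout. Each function decides how to present the data, so the presentation can change without touching the data.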

A Simple XML document

Consider the following file named myfirstxml.xml, which represents a simple XML document. Try to compare it with HTML. XML files are just plain text files with the .xml extension.

myfirstxml.xml

<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "mylibrary.dtd">
<catalog>
  <book book_no="100">
    <author>Author 1</author>
    <title>Title 1</title>
    <photo src="photo1.gif" />
  </book>
  <book book_no="200">
    <author>Author 2</author>
    <title>Title 2</title>
    <photo src="photo2.gif" />
  </book>
</catalog>
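
To see how an application might consume such a document, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The SAMPLE string repeats the catalog inline (without the DOCTYPE line, since this parser does not validate against a DTD), and the list_books function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# The catalog document, embedded as a string so the sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book book_no="100">
    <author>Author 1</author>
    <title>Title 1</title>
    <photo src="photo1.gif" />
  </book>
  <book book_no="200">
    <author>Author 2</author>
    <title>Title 2</title>
    <photo src="photo2.gif" />
  </book>
</catalog>"""


def list_books(xml_text):
    """Return one dictionary per <book> element in the catalog."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "book_no": book.get("book_no"),          # attribute value
            "author": book.findtext("author").strip(),
            "title": book.findtext("title").strip(),
            "photo": book.find("photo").get("src"),  # attribute of an empty element
        })
    return books


for entry in list_books(SAMPLE):
    print(entry)
```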

Common XML Terms

  • Processing instructions

<?xml version="1.0"?>

They are special instructions enclosed in a pair of <? and ?>. Strictly speaking, the XML specification calls this particular line the XML declaration rather than a processing instruction, although it uses the same syntax. The name xml must be in lower case, with no space after <?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

  • version : Specifies the XML version used for the document. Currently it should be 1.0
  • encoding : Optional argument. Specifies the character encoding used
  • standalone : Optional argument. Specifies whether the document depends on any external document or markup. If your document is based on an external DTD then set it to "no".
  • Document Type Declaration

<!DOCTYPE catalog SYSTEM "mylibrary.dtd">

If your XML document is based on a DTD, you must declare that DTD here. The name that follows <!DOCTYPE is not arbitrary: for the document to be valid it must match the name of the root element (catalog in this document). It need not be the same as the DTD file name

  • Tag

<author>

Tags mark the beginning and end of a particular piece of data. They are enclosed between a pair of < and >. Generally a start tag (<name>) and the matching end tag (</name>) together form an element

  • Element

<author></author>

<photo src="photo1.gif" />

An element generally consists of a start tag, an end tag and the content between them. However, an element with no content can be written in an alternative way, as shown in the second example. Here, instead of using a pair of <photo> and </photo>, we have used the shortcut <photo ... />

  • Attribute

book_no

Attributes provide some extra information about an element. They are written inside the start tag as name="value" pairs, e.g. book_no="100"

  • Root

catalog

Every XML document must have exactly one element at the top of the hierarchy, called the root element. All other elements are nested inside it

  • Tree

<catalog>

----

</catalog>

An XML document can be viewed as an inverted tree, with the root element at the top and all other elements at various branch levels

  • Node

catalog

Each point which starts a branch or sits at a leaf level is called a node

  • Parent

catalog

Parent elements are the elements that have sub elements

  • Child

book

Child elements are the elements directly beneath a parent element
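
The root, tree, node, parent and child terms can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree module; the DOC string is a cut-down version of the catalog and the walk function is a hypothetical helper for this illustration. Note that this particular library exposes only elements as tree nodes; attributes and text are stored on the elements themselves.

```python
import xml.etree.ElementTree as ET

# A cut-down catalog with one book, kept on one line for brevity.
DOC = (
    '<catalog><book book_no="100">'
    "<author>Author 1</author><title>Title 1</title>"
    "</book></catalog>"
)


def walk(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    nodes = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        nodes.extend(walk(child, depth + 1))
    return nodes


root = ET.fromstring(DOC)  # the root element: catalog
for depth, tag in walk(root):
    print("  " * depth + tag)
```

Here catalog is the root and the parent of book, while book is in turn the parent of author and title, which are leaf nodes.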

Basic Rules of XML Grammar

  • XML is case sensitive, so the start tag and end tag of an element must use exactly the same case
    e.g.
    <mytag> is not the same as <MYTAG> or <MyTag>
  • Every start tag must have a corresponding end tag
    e.g.
    <mytag>Some Data
    <my_other_tag>Some other data</my_other_tag>
    The above XML is wrong because <mytag> does not have a corresponding end tag </mytag>
  • Empty elements may be written in abbreviated form instead of as a start tag and end tag pair
    e.g.
    <photo src="mypicture.gif" /> is equivalent to <photo src="mypicture.gif"></photo>
  • All tags must be nested properly
    e.g.
    <mytag>some data
    <my_other_tag>Some other data
    </mytag>
    </my_other_tag>
    The above XML is not well-formed because the tags overlap. The correct nesting would be
    <mytag>some data
    <my_other_tag>Some other data
    </my_other_tag>
    </mytag>
  • All attribute values must be enclosed in quotation marks (single or double)
    e.g.
    <book book_no=100> is invalid. Valid usage would be
    <book book_no="100">
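
These rules can be checked mechanically, because a conforming XML parser rejects any document that breaks them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the is_well_formed helper and the CASES table are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One small document per grammar rule discussed above.
CASES = {
    "matching case": "<mytag>Some data</mytag>",
    "case mismatch": "<mytag>Some data</MYTAG>",
    "missing end tag": "<root><mytag>Some data<other>x</other></root>",
    "bad nesting": "<mytag>a<my_other_tag>b</mytag></my_other_tag>",
    "unquoted attribute": "<book book_no=100></book>",
    "quoted attribute": '<book book_no="100"></book>',
    "empty element, long form": '<photo src="mypicture.gif"></photo>',
    "empty element, short form": '<photo src="mypicture.gif" />',
}

for label, text in CASES.items():
    print(f"{label}: {is_well_formed(text)}")
```

This only tests well-formedness, that is, the grammar rules listed here. It does not check the document against a DTD.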

What is a DTD ?

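  • DTD stands for Document Type Definition
  • It defines the structure and rules that an XML document based on it must follow
  • A DTD is written in its own declaration syntax (for example <!ELEMENT ...> and <!ATTLIST ...>). The XML specification describes that syntax formally using Extended Backus-Naur Form (EBNF)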

Bipin Joshi is a software consultant, an author and a yoga mentor having 22+ years of experience in software development. He also conducts online courses in ASP.NET MVC / Core and Design Patterns. He is a published author and has authored or co-authored books for Apress and Wrox press. Having embraced the Yoga way of life he also teaches Meditation and Mindfulness to interested individuals. To know more about him click here.


Posted On : 18 December 2000


Tags : XML