What is XML? A Practical Guide for Developers

If you've worked with legacy enterprise APIs, consumed RSS feeds, opened a .docx file, or drawn an SVG, you've used XML — even if you didn't know it. XML powers more of the software world than most developers realise. It's older than JSON, more verbose, and more complex — but also more capable in specific contexts. This is a practical guide to what XML actually is, how it works, and when it still makes sense to use it.

XML stands for eXtensible Markup Language. The W3C published the XML 1.0 specification in 1998, and it became the dominant data exchange format of the early web era. The "extensible" part is key: unlike HTML, which has a fixed set of tags, XML lets you define your own tags to describe any data structure you need.

What XML Looks Like

Here's a complete XML document representing a product from an e-commerce API. This covers all the core syntax features you'll encounter:

xml

<?xml version="1.0" encoding="UTF-8"?>
<product id="SKU-8821" inStock="true">
  <name>Wireless Noise-Cancelling Headphones</name>
  <brand>SoundCore</brand>
  <price currency="USD">149.99</price>
  <categories>
    <category>Electronics</category>
    <category>Audio</category>
    <category>Accessories</category>
  </categories>
  <specs>
    <spec name="battery">30 hours</spec>
    <spec name="connectivity">Bluetooth 5.2</spec>
    <spec name="weight">250g</spec>
  </specs>
  <description>
    Premium wireless headphones with active noise cancellation,
    30-hour battery life, and foldable design.
  </description>
</product>

Let's break down the key parts. The first line is the XML declaration; it tells parsers this is XML 1.0 encoded in UTF-8. The <product> tag is the root element (every XML document has exactly one). The id and inStock name-value pairs on that same tag are attributes. Everything between an element's opening and closing tags is either child elements or text content.

Elements vs Attributes — A Genuine Design Decision

One thing that trips up new XML authors: you can represent the same data as an element or as an attribute. Both of these are valid XML for the same piece of data:

xml

<!-- As an attribute -->
<price currency="USD">149.99</price>

<!-- As a child element -->
<price>
  <amount>149.99</amount>
  <currency>USD</currency>
</price>

The conventional wisdom: use attributes for metadata about the element (identifiers, flags, units), and use child elements for the actual data content — especially if that content might become complex, repeated, or need its own attributes in the future. Attributes can only hold plain text; child elements can hold any XML structure.
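To make the trade-off concrete, here is a minimal Python sketch that reads the price from either representation. The helper name read_price is hypothetical and exists only for illustration; it assumes the two shapes shown above and nothing else.

```python
import xml.etree.ElementTree as ET

attribute_form = '<price currency="USD">149.99</price>'
element_form = """<price>
  <amount>149.99</amount>
  <currency>USD</currency>
</price>"""

def read_price(xml_text):
    """Return (amount, currency) from either representation (hypothetical helper)."""
    price = ET.fromstring(xml_text)
    if price.get('currency') is not None:
        # Attribute form: currency is metadata, the amount is the text content
        return float(price.text), price.get('currency')
    # Child-element form: both values are nested elements
    return float(price.find('amount').text), price.find('currency').text

print(read_price(attribute_form))  # (149.99, 'USD')
print(read_price(element_form))    # (149.99, 'USD')
```

Note that the consuming code has to know which shape to expect, which is one reason to settle the element/attribute question early and keep it consistent across a format.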

Well-Formedness vs Validity — Two Different Standards

XML has two levels of correctness that are often confused:

Well-formed XML follows the basic syntax rules: one root element, all tags properly closed, attributes quoted, no illegal characters. Any XML parser can check this.
Valid XML conforms to a specific schema — either a DTD (Document Type Definition) or an XML Schema (XSD). Validity checking requires both the document and the schema. You can have well-formed XML that's not valid against a particular schema.

In practice: most enterprise XML integrations care about validity, not just well-formedness. If you're building a SOAP integration or exchanging XML-based HL7 healthcare records (such as CDA documents), the schema defines exactly what elements are required, what types they hold, and what values are allowed. Use the XML Validator to check well-formedness quickly.
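A well-formedness check can also be sketched in a few lines of Python using only the standard library. The function name is_well_formed is an illustrative assumption. This checks syntax only: ElementTree does not validate against a DTD or XSD, so schema validation needs a separate tool (the third-party lxml package is one common choice).

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        # ParseError carries the line and column of the first syntax problem
        return False, str(exc)

good = '<product id="SKU-8821"><name>Headphones</name></product>'
bad = '<product id="SKU-8821"><name>Headphones</product>'  # <name> is never closed

print(is_well_formed(good))    # (True, None)
print(is_well_formed(bad)[0])  # False
```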

Real-World Places XML Appears

SOAP web services. A great deal of older enterprise software still runs on SOAP: banking APIs, ERP systems, payment gateways. Every SOAP message is a well-defined XML document with an Envelope, an optional Header, and a Body.
RSS and Atom feeds. Podcast feeds, news site feeds, and YouTube channel feeds are all XML documents. RSS 2.0 and Atom are both XML-based formats that date back to the 2000s.
SVG images. Scalable Vector Graphics is XML. When you open a .svg file in a text editor, you're reading XML. This is why you can style SVGs with CSS and manipulate them with JavaScript.
Office documents. A .docx, .xlsx, or .pptx file is a ZIP archive containing XML files. Office Open XML is how Microsoft stores all modern Office document formats.
Android layouts. Android UI layouts are defined in XML files. If you've done Android development, you've written a lot of XML.
Maven and pom.xml. Java's Maven build system uses a pom.xml file to define project dependencies and build configuration — familiar to any Java developer.
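As a small illustration of one item on this list, the sketch below reads item titles from an RSS 2.0 style feed in Python. The feed content and URLs are made up for the example; only the rss/channel/item/title nesting reflects the real format.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; real feeds carry many more fields
feed = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

root = ET.fromstring(feed)

# Items live under the single <channel> element
titles = [item.findtext('title') for item in root.iterfind('channel/item')]

print(root.findtext('channel/title'))  # Example Feed
print(titles)                          # ['First post', 'Second post']
```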

Parsing XML in JavaScript — DOMParser

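In the browser, you can parse XML strings using the built-in DOMParser API. It is the same API used for parsing HTML strings, but with the 'application/xml' type the parser is strict: malformed input produces a parsererror element instead of being silently repaired.

javascript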

const xmlString = `<?xml version="1.0"?>
<product id="SKU-8821">
  <name>Wireless Headphones</name>
  <price currency="USD">149.99</price>
  <categories>
    <category>Electronics</category>
    <category>Audio</category>
  </categories>
</product>`;

const parser = new DOMParser();
const doc = parser.parseFromString(xmlString, 'application/xml');

// Check for parse errors
const parseError = doc.querySelector('parsererror');
if (parseError) {
  console.error('XML parse error:', parseError.textContent);
} else {
  const name = doc.querySelector('name').textContent;
  const currency = doc.querySelector('price').getAttribute('currency');
  const categories = [...doc.querySelectorAll('category')].map(el => el.textContent);

  console.log(name);       // Wireless Headphones
  console.log(currency);   // USD
  console.log(categories); // ['Electronics', 'Audio']
}

Parsing XML in Python — ElementTree

Python's standard library includes xml.etree.ElementTree — no third-party packages needed for basic XML parsing:

python

import xml.etree.ElementTree as ET

xml_string = """<?xml version="1.0"?>
<product id="SKU-8821">
  <name>Wireless Headphones</name>
  <price currency="USD">149.99</price>
  <categories>
    <category>Electronics</category>
    <category>Audio</category>
  </categories>
</product>"""

root = ET.fromstring(xml_string)

name = root.find('name').text
currency = root.find('price').get('currency')
categories = [el.text for el in root.findall('categories/category')]

print(name)        # Wireless Headphones
print(currency)    # USD
print(categories)  # ['Electronics', 'Audio']
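One thing the example above glosses over: find() returns None when an element is missing, and every value comes back as a string. The following sketch shows one way to read values more defensively. The 'unknown' default for brand is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<product id="SKU-8821" inStock="true">'
    '<name>Wireless Headphones</name>'
    '<price currency="USD">149.99</price>'
    '</product>'
)

# findtext() returns the default instead of raising when the element is absent
brand = root.findtext('brand', default='unknown')

# XML has no built-in types at the parser level: convert strings yourself
price = float(root.findtext('price'))
in_stock = root.get('inStock') == 'true'

print(brand)     # unknown
print(price)     # 149.99
print(in_stock)  # True
```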

The Gotchas That Bite XML Authors

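Entity encoding. Five characters have predefined entities and should be escaped in XML text and attribute values: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;. The ampersand and the left angle bracket are never allowed unescaped in text or attribute values, so forgetting to escape an ampersand in a URL breaks the entire document.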
Namespaces. Namespaced XML looks like <soap:Envelope xmlns:soap="...">. Queries without namespace awareness will fail to find elements. This is one of the most common XML parsing bugs.
Whitespace sensitivity. Whitespace between elements is technically meaningful in XML (unlike HTML). Most parsers handle it sensibly, but it can cause surprises when comparing documents.
Character encoding. The XML declaration should match the actual file encoding. A UTF-8 file declared as ISO-8859-1 will parse incorrectly for any non-ASCII characters.
No native array type. XML doesn't have arrays. You represent lists by repeating elements, which means DOM queries in JavaScript return a NodeList rather than an array. Convert with Array.from() or spread when you need array methods such as map() or filter().
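The first two gotchas can be sketched in Python with the standard library. The URL and the status element are made up for illustration. The namespace URI used here is the SOAP 1.1 envelope namespace; substitute whatever URI your documents actually declare.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Entity encoding: escape text before embedding it in markup
url = 'https://example.com/search?q=headphones&sort=price'
escaped = escape(url)
print(escaped)  # https://example.com/search?q=headphones&amp;sort=price

# The parser decodes the entity again, so the original value round-trips
link = ET.fromstring(f'<link>{escaped}</link>')
print(link.text == url)  # True

# Namespaces: an unqualified query does not match a namespaced element
soap = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <status>ok</status>
  </soap:Body>
</soap:Envelope>"""
root = ET.fromstring(soap)

print(root.find('Body'))  # None

# Map a prefix of your choice to the namespace URI and pass the map to find()
ns = {'soap': 'http://schemas.xmlsoap.org/soap/envelope/'}
body = root.find('soap:Body', ns)
print(body.find('status').text)  # ok
```

The prefix in the query only has to match the key in the ns mapping, not the prefix used in the document; what the parser compares is the namespace URI.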

XML Tools Worth Knowing

Working with XML regularly? These tools will save you time: XML Formatter to pretty-print minified XML responses, XML Validator to check well-formedness, XML to JSON to convert XML responses to JSON for easier handling, and XML XPath to test XPath queries against a document.

Wrapping Up

XML is verbose and complex compared to JSON, but it's not a legacy footnote — it's the foundation of SVG, Office documents, RSS feeds, SOAP services, and Android UIs. Understanding the element/attribute distinction, the difference between well-formed and valid XML, and the common gotchas around entity encoding and namespaces will serve you well whenever XML shows up in your work. And it will show up.
