If you've worked with legacy enterprise APIs, consumed RSS feeds, opened a .docx file,
or drawn an SVG, you've used XML — even if you didn't know it. XML powers more of the software world
than most developers realise. It's older than JSON, more verbose, and more complex — but also more
capable in specific contexts. This is a practical guide to what XML actually is, how it works,
and when it still makes sense to use it.
XML stands for eXtensible Markup Language. The W3C published the XML 1.0 specification in 1998, and it became the dominant data exchange format of the early web era. The "extensible" part is key: unlike HTML, which has a fixed set of tags, XML lets you define your own tags to describe any data structure you need.
What XML Looks Like
Here's a complete XML document representing a product from an e-commerce API. It covers the core syntax you'll meet most often: a declaration, nested elements, attributes, and text content.
<?xml version="1.0" encoding="UTF-8"?>
<product id="SKU-8821" inStock="true">
  <name>Wireless Noise-Cancelling Headphones</name>
  <brand>SoundCore</brand>
  <price currency="USD">149.99</price>
  <categories>
    <category>Electronics</category>
    <category>Audio</category>
    <category>Accessories</category>
  </categories>
  <specs>
    <spec name="battery">30 hours</spec>
    <spec name="connectivity">Bluetooth 5.2</spec>
    <spec name="weight">250g</spec>
  </specs>
  <description>
    Premium wireless headphones with active noise cancellation,
    30-hour battery life, and foldable design.
  </description>
</product>
Let's break down the key parts. The first line is the XML declaration: it tells parsers
this is XML 1.0 encoded in UTF-8. The <product> tag is the root element (every
XML document has exactly one). The id and inStock pairs inside that opening tag
are attributes. Everything between an element's opening and closing tags is either
child elements or text content.
Elements vs Attributes — A Genuine Design Decision
One thing that trips up new XML authors: you can represent the same data as an element or as an attribute. Both of these are valid XML for the same piece of data:
<!-- As an attribute -->
<price currency="USD">149.99</price>
<!-- As a child element -->
<price>
  <amount>149.99</amount>
  <currency>USD</currency>
</price>
The conventional wisdom: use attributes for metadata about the element (identifiers, flags, units), and use child elements for the actual data content, especially if that content might become complex, repeated, or need its own attributes in the future. Attributes can only hold plain text; child elements can hold any XML structure.
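To make the trade-off concrete, here is a minimal Python sketch that reads the price from either representation. The helper name read_price is hypothetical, invented for this illustration; the two XML snippets are the ones shown above.

```python
import xml.etree.ElementTree as ET

# Two encodings of the same price, taken from the examples above.
AS_ATTRIBUTE = '<price currency="USD">149.99</price>'
AS_CHILDREN = "<price><amount>149.99</amount><currency>USD</currency></price>"


def read_price(xml_text):
    """Return (amount, currency) from either representation.

    read_price is a hypothetical helper, not part of any library.
    """
    price = ET.fromstring(xml_text)
    amount_el = price.find("amount")
    if amount_el is not None:
        # Child-element form: both values live in nested elements.
        return float(amount_el.text), price.findtext("currency")
    # Attribute form: the amount is the element text, the currency an attribute.
    return float(price.text), price.get("currency")


print(read_price(AS_ATTRIBUTE))  # (149.99, 'USD')
print(read_price(AS_CHILDREN))   # (149.99, 'USD')
```

Consumers have to know which form a document uses, which is one reason to settle the element-or-attribute question early and apply it consistently.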
Well-Formedness vs Validity — Two Different Standards
XML has two levels of correctness that are often confused:
- Well-formed XML follows the basic syntax rules: one root element, all tags properly closed, attributes quoted, no illegal characters. Any XML parser can check this.
- Valid XML conforms to a specific schema — either a DTD (Document Type Definition) or an XML Schema (XSD). Validity checking requires both the document and the schema. You can have well-formed XML that's not valid against a particular schema.
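A well-formedness check is easy to sketch with Python's standard library, since any parser rejects input that breaks the syntax rules. The helper name is_well_formed and the sample strings are assumptions made for this example. Checking validity against a DTD or XSD is a separate step that needs a schema-aware library, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    is_well_formed is a hypothetical helper name used for illustration.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


good = '<product id="SKU-8821"><name>Headphones</name></product>'
unclosed = '<product id="SKU-8821"><name>Headphones</product>'  # <name> never closed
unquoted = "<product id=SKU-8821></product>"                     # attribute not quoted

for label, text in [("good", good), ("unclosed", unclosed), ("unquoted", unquoted)]:
    ok, error = is_well_formed(text)
    print(label, ok, error)
```

Only the first string passes. The other two fail before any schema is consulted, which is the distinction between the two levels of correctness.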
Real-World Places XML Appears
- SOAP web services. The older half of enterprise software still runs on SOAP — banking APIs, ERP systems, payment gateways. Every SOAP message is a well-defined XML document with an Envelope, Header, and Body.
- RSS and Atom feeds. Every podcast feed, news site feed, and YouTube channel subscription is an XML document. RSS 2.0 and Atom are both XML-based formats that have existed since the early 2000s.
- SVG images. Scalable Vector Graphics is XML. When you open a .svg file in a text editor, you're reading XML. This is why you can style SVGs with CSS and manipulate them with JavaScript.
- Office documents. A .docx, .xlsx, or .pptx file is a ZIP archive containing XML files. Office Open XML is how Microsoft stores all modern Office document formats.
- Android layouts. Android UI layouts are defined in XML files. If you've done Android development, you've written a lot of XML.
- Maven and pom.xml. Java's Maven build system uses a pom.xml file to define project dependencies and build configuration, familiar to any Java developer.
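As one illustration from the list above, this is roughly what reading an RSS 2.0 feed looks like in Python. The feed is hand-written for this sketch, and its titles and URLs are made-up placeholders. A real feed would carry more fields and would usually be fetched over HTTP first.

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written RSS 2.0 feed; titles and URLs are placeholders.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <link>https://example.com/</link>
    <description>A placeholder feed</description>
    <item>
      <title>Episode 1</title>
      <link>https://example.com/ep1</link>
    </item>
    <item>
      <title>Episode 2</title>
      <link>https://example.com/ep2</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)
channel = root.find("channel")

# Each <item> is one entry in the feed; collect (title, link) pairs.
episodes = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]

print(channel.findtext("title"))  # Example Podcast
for title, link in episodes:
    print(title, link)
```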
Parsing XML in JavaScript — DOMParser
In the browser, you can parse XML strings using the built-in DOMParser API. It is the same API used to parse HTML strings; only the MIME type argument changes:
const xmlString = `<?xml version="1.0"?>
<product id="SKU-8821">
<name>Wireless Headphones</name>
<price currency="USD">149.99</price>
<categories>
<category>Electronics</category>
<category>Audio</category>
</categories>
</product>`;
const parser = new DOMParser();
const doc = parser.parseFromString(xmlString, 'application/xml');
// DOMParser does not throw on malformed XML; it returns a document
// containing a <parsererror> element, so check for that explicitly.
const parseError = doc.querySelector('parsererror');
if (parseError) {
  console.error('XML parse error:', parseError.textContent);
} else {
  const name = doc.querySelector('name').textContent;
  const currency = doc.querySelector('price').getAttribute('currency');
  const categories = [...doc.querySelectorAll('category')].map(el => el.textContent);
  console.log(name); // Wireless Headphones
  console.log(currency); // USD
  console.log(categories); // ['Electronics', 'Audio']
}
Parsing XML in Python — ElementTree
Python's standard library includes xml.etree.ElementTree, so no third-party packages are needed for basic XML parsing:
import xml.etree.ElementTree as ET
xml_string = """<?xml version="1.0"?>
<product id="SKU-8821">
<name>Wireless Headphones</name>
<price currency="USD">149.99</price>
<categories>
<category>Electronics</category>
<category>Audio</category>
</categories>
</product>"""
root = ET.fromstring(xml_string)
name = root.find('name').text
currency = root.find('price').get('currency')
categories = [el.text for el in root.findall('categories/category')]
print(name) # Wireless Headphones
print(currency) # USD
print(categories) # ['Electronics', 'Audio']
The Gotchas That Bite XML Authors
- Entity encoding. XML predefines five entities for characters that clash with its syntax: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;. Strictly, & and < must always be escaped in text and attribute values, while > and the quotes only need it in certain positions, but escaping all five is the safe habit. Forgetting to escape an ampersand in a URL breaks the entire document.
- Namespaces. Namespaced XML looks like <soap:Envelope xmlns:soap="...">. Queries without namespace awareness will fail to find elements. This is one of the most common XML parsing bugs.
- Whitespace sensitivity. Whitespace between elements is preserved by XML parsers and handed to the application as text nodes, rather than being collapsed the way a browser collapses it when rendering HTML. Most libraries handle it sensibly, but it can cause surprises when comparing documents.
- Character encoding. The XML declaration should match the actual file encoding. A UTF-8 file declared as ISO-8859-1 will parse incorrectly for any non-ASCII characters.
- No native array type. XML doesn't have arrays. You represent lists by repeating elements, which means DOM queries such as querySelectorAll return NodeLists rather than JavaScript arrays, so convert with Array.from() or spread when you need array methods.
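The first two gotchas are the easiest to reproduce. Below is a minimal Python sketch of both, using only the standard library. The URL, the ex prefix, and the namespace URI are made-up placeholders chosen for this example, not real SOAP values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Gotcha 1: a URL with a bare ampersand must be escaped before it goes into XML.
url = "https://example.com/search?q=xml&page=2"
link_xml = f"<link>{escape(url)}</link>"
print(link_xml)  # <link>https://example.com/search?q=xml&amp;page=2</link>

# The parser turns the entity back into the original character.
assert ET.fromstring(link_xml).text == url

# Gotcha 2: namespaced elements are invisible to queries that ignore the namespace.
# The namespace URI here is a placeholder invented for this sketch.
NS = {"ex": "https://example.com/ns/envelope"}
doc = ET.fromstring(
    '<ex:Envelope xmlns:ex="https://example.com/ns/envelope">'
    "<ex:Body>hello</ex:Body>"
    "</ex:Envelope>"
)

print(doc.find("Body"))              # None: the plain name does not match
print(doc.find("ex:Body", NS).text)  # hello
```

The prefix in the query only has to match the key in the NS mapping; what the parser actually compares is the namespace URI, so the document and the query may use different prefixes.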
XML Tools Worth Knowing
Working with XML regularly? These tools will save you time: XML Formatter to pretty-print minified XML responses, XML Validator to check well-formedness, XML to JSON to convert XML responses to JSON for easier handling, and XML XPath to test XPath queries against a document.
Wrapping Up
XML is verbose and complex compared to JSON, but it's not a legacy footnote — it's the foundation of SVG, Office documents, RSS feeds, SOAP services, and Android UIs. Understanding the element/attribute distinction, the difference between well-formed and valid XML, and the common gotchas around entity encoding and namespaces will serve you well whenever XML shows up in your work. And it will show up.