Sam's infrequently-updated cabinet of curiosities
Saturday, 16 June 2007

E4X

E4X, short for "ECMAScript for XML", is an extension to ECMAScript (i.e. JavaScript, JScript, ActionScript...) with new syntax and built-in objects for more convenient handling of XML fragments. It seems to be used most frequently with ActionScript 3 (Flash), but is also available in recent Mozilla/Firefox releases.

I whipped up this guide after a quick read-through of the specification and a bit of playing around. Corrections are more than welcome.

In order, it briefly outlines:

  • The syntax for declaring literal XML values
  • XML and XMLList objects
  • Variable interpolation in XML literals
  • The new syntax for traversal of XML objects
  • Namespace considerations
  • The methods of XML objects

First-class XML

E4X XML objects can be created by passing a string to the XML constructor function, but that's hardly exciting. Much more interesting is the new syntax for XML literals, similar to that in Scala. It's exactly what you'd expect:

var x = <elm id="1">
    <a>content</a>
</elm>;

There's no more need to bother with painful string concatenation or backslashed line continuations.

Even better, XML objects are first-class citizens. They have properties and methods; they can be deleted, concatenated and iterated over.

var y = x + <elm id="2" />;
var name = <xml />.name();

XML and XMLList

As well as XML, E4X defines the XMLList, an ordered collection of XML objects similar to the W3C DOM NodeList.

The literal syntax is rather less intuitive:

var xl = <>
    <a />
    <b />
    <c />
</>;

Much of E4X's expressive power comes from the blurring of the line between XMLList and XML objects. The typeof operator returns "xml" for both.

The advantage is that you rarely need to worry about which you have. A single-item XMLList is treated identically to an XML object, and even longer lists share many of the same methods. The text() method of an XML object returns its text content. On an XMLList it returns the concatenated text content of all list members.

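If you do need to tell the difference, check the .length(): an XML object's length is always 1, so any other value means you have an XMLList. (A single-item XMLList also reports 1, but as noted it behaves like an XML object anyway.)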

Literal Interpolation

When declaring a literal, expressions inside braces (curly brackets) are automatically evaluated.

var name = "bob^%*";
var tag = "person";
var p = <{tag} id="3">{name.replace(/[^a-z]/ig, "")}</{tag}>;
// <person id="3">bob</person>

Braced values are not, however, evaluated inside CDATA sections or quoted attribute values:

var att = "id";
var val  = 3;
var a = <person {att}="{val}">bob</person>;
// <person id="{val}">bob</person>

var b = <person {att}={val}>bob</person>;
// <person id="3">bob</person>

Interpolated attribute values are automatically quoted; any XML entities are automatically escaped.

val = "\"<>";
b = <person {att}={val}>bob</person>;
// <person id="&quot;&lt;>">bob</person>
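
To make the escaping rule concrete, here is a rough sketch in plain JavaScript (no E4X syntax, so it can be tried in any engine). The helper escapeAttributeValue is hypothetical, not an E4X built-in; it follows my reading of the specification, which escapes the double quote, <, & and the tab, newline and carriage-return characters in attribute values but leaves > alone.

```javascript
// Hypothetical helper, not an E4X built-in: approximates how an
// interpolated attribute value is escaped when serialized.
function escapeAttributeValue(value) {
    return String(value)
        .replace(/&/g, "&amp;")   // ampersand first, so later entities survive
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/\n/g, "&#xA;")
        .replace(/\r/g, "&#xD;")
        .replace(/\t/g, "&#x9;");
}

var val = "\"<>";
var serialized = '<person id="' + escapeAttributeValue(val) + '">bob</person>';
// <person id="&quot;&lt;>">bob</person>
```

The ampersand has to be replaced before anything else; otherwise the & in each freshly inserted entity would itself be escaped.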

Literal braces should be escaped as &#x7B; and &#x7D; for { and } respectively.

Accessing XML Properties

XML objects can be filtered and traversed using an object syntax similar to ElementTree and BeautifulSoup, with a bit of XPath thrown in.

A node's child elements can be accessed as properties:

var x = <people class="example">
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name  = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

x.[name] is the same as x.child([name]).

Attributes

As with XPath, "@" is used to access attributes.

var id = x.person[0].@id;

x.@[name] is identical to x.attribute([name]).

Descendants

The .. operator accesses all descendants, not just the immediate children.

var names = x..name;
var ids = x..@id;

x..[name] is equivalent to x.descendants([name]).

The Wildcard

The "*" wildcard matches all names.

var persons  = x.*;
var all      = x..*;
var attrs    = x..@*;

The wildcard is magic in more than one context, but in this one it's equivalent to QName(null, "*").

var all = x.descendants(QName(null, "*"));

Filtering Predicates

A parenthesised expression after a dot filters a list: it is evaluated against each member in turn, and only the members for which it is true are kept.

var me     = x.person.(name == "sam");
var either = x.person.(@id == 1 || @id == 2);
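
For readers more at home with arrays, the two filters above behave roughly like Array.prototype.filter. This is only an analogy in plain JavaScript: the people array and its id and name fields are stand-ins I've invented for the XML, not anything E4X provides.

```javascript
// Plain arrays and objects standing in for an XMLList (hypothetical model).
var people = [
    { id: 1, name: "sam" },
    { id: 2, name: "elizabeth" }
];

// Roughly: x.person.(name == "sam")
var me = people.filter(function (person) {
    return person.name == "sam";
});

// Roughly: x.person.(@id == 1 || @id == 2)
var either = people.filter(function (person) {
    return person.id == 1 || person.id == 2;
});
```

The difference is that an E4X filter puts each member in scope, so name and @id can be written bare instead of as person.name and person.id.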

Predicates can be nested and quite complex:

var me = x..*.(name == "sam" && 
    name.parent().(@id == 1).name() == "person");

They're not quite as useful as they could be, however. Unlike XPath, E4X expressions cannot easily be used to search ancestor axes.

The previous example illustrates a potential problem. It only works because the list of matches is reduced to one by (name == "sam") before the parent() method is invoked.

This expression, on the other hand, will raise an exception:

x..*.(name.parent().@id == 1);

The filter does not examine the parent of every name in turn; it looks for the single parent of the entire list of names together. It returns undefined unless every member shares the same parent.

Deletion

The delete keyword works on arbitrary E4X expressions:

delete x.person.(@id == 1); // that's me gone 
delete x..person;           // ... and everyone else

Assignment

You can also use the normal assignment operator:

x..name[0] = "batman";
x.@pointless = "new attribute!";
x.person += <person id="3"><name>alfred</name></person>;

In some circumstances you can also assign to an expression that would return a list:

x.* = <goodbye_previous_content />;

But those nodes were all together, so replacing them at once is a natural operation. This, on the other hand, is illegal:

x.person.@newattributes = "for all";

Iteration

There are several ways to iterate over XMLList and XML objects, though for XML the exercise is meaningless:

x[0] == x;
// true

Nevertheless. First, iteration over list indices:

var i, elm;
for (i in x..*) {
    elm = x..*[i];
}

The same can be accomplished with a conventional for (;;) loop and the length() method.

var list = x..*;
for (i = 0; i < list.length(); ++i) {
    elm = list[i];
}

Most useful of all, though, is the new for each .. in syntax, allowing direct manipulation of matching nodes:

var elm;
for each (elm in x.person) {
    // attribute values are strings, so convert before adding
    elm.@id = Number(elm.@id) + 1;
}

Namespaces

E4X has robust namespace support, but (as anyone with XML experience must expect) namespaces complicate an otherwise simple model.

var x = <xml>
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == "value one";
// true

With namespaces, you have to use a qualified name.

var x = <xml xmlns="http://example.com/">
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == undefined;
// true

var example = Namespace("http://example.com/");
x.example::v1 == "value one";
// true

Note the use of the :: scoping operator. You can also suggest a namespace prefix and/or construct the QName directly:

var example = Namespace("example", "http://example.com/");
var name = QName(example, "v1");
var same = QName("http://example.com/", "v1");

If more liberal matching is required, the * wildcard signifies any namespace.

x.*::v1 == "value one";
// true

The any-namespace wildcard is different from the unnamed namespace. It can also be expressed by passing null as the namespace argument to the QName constructor. The following are equivalent:

x.*::v1
x.child(QName(null, "v1"));

The default namespace

Using perhaps the most self-explanatory syntax ever devised, you can set the default XML namespace in the current scope.

var example = Namespace("http://example.com/");
default xml namespace = example;
// or
default xml namespace = "http://example.com/";

var x = <xml />;
x.toXMLString();
// <xml xmlns="http://example.com/"/>

To reiterate: in the current scope.

toString() vs. toXMLString()

There is an important difference between the toString and toXMLString methods.

var x = <people>
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

names.toString();
// <name>sam</name> <name>elizabeth</name>
name.toString();
// sam

toString returns different values depending on whether or not an object is considered "complex". If there are no child elements (other types, such as XML comments, don't count), it returns the element's text content only. This is very useful in most cases but a painful gotcha in others.
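
The rule is easier to see in a toy model. The sketch below is plain JavaScript with made-up stand-ins: an element is a { name, children } object and a text node is a string. The function names mirror the E4X methods but are my own approximations; attributes, comments and pretty-printing are ignored.

```javascript
// Plain-object stand-ins for XML nodes (hypothetical shape, not an E4X API):
// an element is { name, children }, a text node is a string.
function hasSimpleContent(node) {
    return node.children.every(function (child) {
        return typeof child === "string";
    });
}

function toXMLString(node) {
    if (typeof node === "string") return node;
    return "<" + node.name + ">" +
        node.children.map(toXMLString).join("") +
        "</" + node.name + ">";
}

function toStringLike(node) {
    // Simple content: text only. Complex content: full markup.
    return hasSimpleContent(node) ? node.children.join("") : toXMLString(node);
}

var nameNode = { name: "name", children: ["sam"] };
var personNode = { name: "person", children: [nameNode] };

toStringLike(nameNode);   // sam
toStringLike(personNode); // <person><name>sam</name></person>
```

If you always want markup, call toXMLString(); if you always want text, ask for text() explicitly rather than relying on toString().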

Extending E4X

ECMAScript lets you do wonderful things by extending Object.prototype, String.prototype etc. with new methods.

It's much harder with E4X. The prototypes of XML and XMLList are read-only, so new methods can't be added directly. Most of their existing methods throw exceptions if they are applied to any other object. Procedural code will have to do.

Future versions will have built-in support for custom types based on XML schemas.

Global function reference

isXMLName( value ) : bool

Is the value usable as an XML name?
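
As a rough idea of what that means, here is an ASCII-only approximation in plain JavaScript. isXMLNameAscii is a hypothetical stand-in: the real function accepts the much larger set of Unicode name characters, and I'm assuming (from my reading of the spec) that colons are rejected because the name must not carry a prefix.

```javascript
// Hypothetical approximation of isXMLName, restricted to ASCII.
// Assumption: names may not contain a colon.
function isXMLNameAscii(value) {
    return /^[A-Za-z_][A-Za-z0-9_.\-]*$/.test(String(value));
}

isXMLNameAscii("person");     // true
isXMLNameAscii("first-name"); // true
isXMLNameAscii("2fast");      // false: cannot start with a digit
isXMLNameAscii("a b");        // false: no spaces
isXMLNameAscii("ns:tag");     // false: colon rejected (see assumption above)
```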

XML Constructor Reference

The XML constructor has several properties managing global settings for XML processing and serialization.

XML.ignoreComments

Ignore XML comments. (Default: true.)

XML.ignoreProcessingInstructions

Ignore XML processing instructions. (Default: true.)

XML.ignoreWhitespace

Ignore whitespace. (Default: true.)

XML.prettyPrinting

Pretty-print XML output with toXMLString() etc. (Default: true.)

XML.prettyIndent

Pretty indent level for child nodes. (Default: 2.)


There are also three methods to more easily apply and restore settings for use, say, within a function.

XML.settings()

Get an Object containing the above settings.

XML.defaultSettings()

Get an object containing the default settings.

XML.setSettings([settings])

Set XML settings from, e.g., an object returned by XML.settings().
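
The usual pattern is save, change, restore. Below is a minimal sketch of that pattern as a helper. withXMLSettings is a name I've made up; it accepts any object exposing settings() and setSettings(), so in an E4X engine you would pass XML itself. The fakeXML object is only a stand-in so the sketch runs without E4X.

```javascript
// Hypothetical helper: run fn with temporary settings, then restore.
// xmlCtor is anything exposing settings() and setSettings(), such as
// the E4X XML constructor.
function withXMLSettings(xmlCtor, temporary, fn) {
    var saved = xmlCtor.settings();
    try {
        xmlCtor.setSettings(temporary);
        return fn();
    } finally {
        xmlCtor.setSettings(saved); // restored even if fn throws
    }
}

// A stand-in so the sketch runs without E4X; assumed behaviour only.
var fakeXML = {
    prettyPrinting: true,
    settings: function () { return { prettyPrinting: this.prettyPrinting }; },
    setSettings: function (s) { this.prettyPrinting = s.prettyPrinting; }
};

var during = withXMLSettings(fakeXML, { prettyPrinting: false }, function () {
    return fakeXML.prettyPrinting;
});
// during is false; fakeXML.prettyPrinting is true again afterwards
```

With the real XML constructor I'd build the temporary object by taking XML.settings() and changing the one field of interest, rather than relying on how a partial object is handled.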

XML Object Reference

addNamespace([namespace])

Add a namespace declaration to the object.

appendChild(child)

Append a node to the object's list of children.

attribute(attributeName)

Returns an XMLList of zero or one matching attributes.

Same as element.@attributeName.

attributes()

Returns an XMLList of attributes.

Same as element.@*.

child(propertyName or index)

Same as element.propertyName or element[index].

childIndex()

Returns the node's position in the parent's list of children, or -1 if there is no parent or its children are unordered.

children()

Returns an XMLList of children.

Same as element.*.

comments()

Returns an XMLList of child nodes that are comments.

Same as element.(*.nodeKind() == 'comment').

contains(value)

Same as element == value.

copy()

Return a deep copy of the object, detached from its parent.

descendants(name)

Return all descendants with the given name, or, if name is null or undefined, all descendants.

Same as element..name.

elements([name])

Returns all child elements with the given name, or, if name is null or undefined, all child elements.

Same as element.(*.nodeKind() == 'element').

hasOwnProperty(prop)

The same as on any other object.

hasComplexContent()

Returns true if the node has complex content (in effect, if it has child elements).

hasSimpleContent()

The opposite of hasComplexContent.

inScopeNamespaces()

Returns an Array of in-scope Namespace objects.

insertChildAfter(anchor, child)

insertChildBefore(anchor, child)

Insert a child node before or after the specified anchor node. If the anchor is null, insertChildAfter inserts the child before all existing children, and insertChildBefore inserts it after all of them.

If the anchor is not in this XML object, do nothing.

length()

Return the length of the object. For XML objects this is always 1.

localName()

Return the local part of the qualified name. (A node's name not including its namespace.)

name()

Return the qualified name. (Including namespace.)

var x = <xml xmlns="http://example.com/">abc</xml>;
x.name() == "http://example.com/::xml";
x.localName() == "xml";
x.namespace() == "http://example.com/";

namespace([prefix])

Return the in-scope namespace specified by prefix, or:

  • If no namespace matches, return undefined.
  • If prefix is not provided, return the namespace associated with the object's qualified name.

namespaceDeclarations()

Return an Array of Namespace objects representing namespaces declared (as in assigned a prefix) on this XML object.

nodeKind()

Returns the type of XML node, one of attribute, element, comment, processing-instruction, text.

normalize()

Merge adjacent text nodes and remove empty text nodes on this object and all its descendants.

parent()

Return the parent node. On an XMLList, this method returns undefined unless all members share the same parent.

processingInstructions([ name ])

Returns all child processing instructions with the given name, or, if name is null or undefined, all child processing instructions.

Same as element.(*.nodeKind() == 'processing-instruction').

prependChild(value)

Insert value at the beginning of the object's child nodes.

propertyIsEnumerable(prop)

Will the specified property be enumerated in a for .. in loop? Same as for other objects.

removeNamespace(namespace)

If possible, remove the given namespace from the object and all descendants. removeNamespace will not remove a namespace if it is referenced in that object or any of its children.

replace(propertyName, value)

Replace value specified by propertyName, where propertyName is a name, numeric index or * wildcard, with value.

setChildren(value)

Replace the object's children with value.

setLocalName(name)

Change the object's local name using a string or the localName property of a QName object.

setName(name)

Set the object's name and alter the in-scope namespaces to fit.

setNamespace(ns)

Replace the object's default namespace with ns.

text()

Returns an XMLList of all child text nodes. (This method takes no arguments.)

Same as element.(*.nodeKind() == 'text').

toString()

Returns a string representation. Elements with simple content (i.e., no child elements) are returned as text; complex elements are returned as XML.

toXMLString()

An XML serialization of the object.

valueOf()

Return this object.

XMLList Reference

Most methods are the same. Descendant methods such as children() and text() are simply applied to all members of the list and the results combined. Others, like parent(), don't work when it isn't logical that they do so -- consult your common sense.

Optional Features

Implementations are allowed to include these optional features, or not. Currently Mozilla seems to be on the "or not" side of the fence, but they're easy enough to implement in userspace if you need them.

domNode()

Return a W3C DOM node representation of the object.

domNodeList()

Return a W3C DOM NodeList representation.

xpath(exp)

Apply the XPath expression exp and either return an XMLList of results or throw a TypeError.