Sam's infrequently-updated cabinet of curiosities
Tuesday, 23 June 2009

Backbars and JavaScript Bitmaps

Oh, that's right, I have a blog! I've got a bunch of old projects to document, but first, a minor diversion of this past week.

Eliazar Parra published a fabulous user-script to add "backbars" to social link sites like Digg and Reddit, where content is voted on and scored. The idea of a backbar is that each item is unobtrusively shaded with a background that corresponds with its score.

The backbars effectively transform the page into a bar chart, and in some cases it's a spectacular improvement. His Stack Overflow example highlights the shortcomings of the original design, which disguises entire orders of magnitude by turning "39,900" into "39.9k", which has the same visual weight as "3,996".
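To make the bar-chart idea concrete, here is a minimal Python sketch of the scaling step. It assumes simple linear scaling against the highest score on the page; the function name is hypothetical and the original user-script may well scale differently.

```python
def backbar_widths(scores, max_width=100.0):
    """Scale each score to a bar width, relative to the highest score."""
    top = max(scores) if scores else 0
    if top <= 0:
        return [0.0 for _ in scores]
    # Negative scores are clamped to an empty bar.
    return [round(max(score, 0) / top * max_width, 1) for score in scores]


widths = backbar_widths([39900, 3996])
print(widths)  # [100.0, 10.0]
```

Two numbers that look alike as text end up as bars ten times apart in width.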

Backbars make all the difference:

Stack Overflow design

Love it!

Generating Images

However, the original implementation has external dependencies, the most vexing being coloured rectangles hosted on the author's site. As a result, you can't even change backbar colours.

This led down the rabbit hole into dynamically generating the images from within the script. Most formats are frustratingly complex to implement (I am not porting zlib to JavaScript), so it had to be simple bitmaps.

Luckily, they don't only come uncompressed; there are several flavours of run-length compression, a simple scheme whereby you replace a series like redpixel + redpixel + redpixel... with a single redpixel * 100. In the case of these big blocks of colour, it takes 2 bytes to encode 255 pixels, quite a staggering improvement over the 765 bytes (255 pixels at 3 bytes each) of the uncompressed 24-bit version.
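A rough Python sketch of the run-length idea, for illustration only. It is not jsbmp's code: it encodes one row of 8-bit palette indices as (count, index) byte pairs, and leaves out the end-of-line and end-of-bitmap markers a real file would need.

```python
def rle_encode_row(indices):
    """Run-length encode one row of palette indices as (count, index) pairs."""
    out = bytearray()
    i = 0
    while i < len(indices):
        value = indices[i]
        run = 1
        # A run's length has to fit in one byte, so cap it at 255.
        while (i + run < len(indices)
               and indices[i + run] == value
               and run < 255):
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


row = [7] * 255               # 255 pixels, all using palette entry 7
print(len(rle_encode_row(row)))  # 2
```

A 256th pixel of the same colour would start a second pair, because the count byte tops out at 255.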

The end result: jsbmp, a small library for generating bitmaps in JavaScript.

Pretty useless, but there are a few intriguing possibilities. Sparklines, maybe?

Not Invented Here

And then of course I had to do something about jQuery, so I ended up reimplementing the script. Enter Admiral Backbar. (It's a trap!)

Sunday, 26 October 2008

Notes on installing Roundup at Dreamhost

Installing Roundup 1.4.6 at Dreamhost stretched from the "15-30 minutes" specified in the installation docs to something more like four hours. This is a collection of notes for next time.

Dramatis Personae

Roundup is the software. It's billed as "a simple-to-use and -install issue-tracking system", but it's so configurable that it's probably better described as a lightweight tracker-oriented framework.

The Roundup installation includes the Roundup module -- i.e., what you get when you import roundup -- and a set of administration scripts: roundup-admin, roundup-server etc.

You can use the roundup-admin script to create a tracker. If Roundup is a framework, the tracker is the application. It includes a data schema, page templates, extensions, custom behaviours and so on.

To access the tracker you will need to set up the web interface.

1. Installation

If you get the source distribution, the instructions suggest that you install it. Don't.

Which is to say, the docs assume that you are a server administrator attempting to add the Roundup module and associated scripts into your Python installation directory. If you run this:

`python setup.py install --prefix test_directory`

The result is:

  • test_directory/Lib/site-packages/roundup/ -- Roundup module
  • test_directory/Scripts/ -- platform-specific administration scripts
  • test_directory/share/roundup/templates/ -- included templates
  • test_directory/share/roundup/cgi-bin/ -- included web interface
  • test_directory/share/locale/ -- translation files

i.e., files that slot neatly into a Python install. Because I have not installed a custom Python and obviously can't touch the server-wide installation, this structure is not ideal.

2. Be Careful About Changing The Directory Structure

I want the Roundup module in my directory for Python libraries, /home/me/pylib. I don't care about locales and the scripts are just thin wrappers around Python files in roundup/scripts.

It's the templates that are the problem. The comment on the listTemplates function in roundup/admin.py reveals the 5-step process Roundup uses to find them:

Look in the following places, where the later rules take precedence:

  1. <roundup.admin.__file__>/../../share/roundup/templates/*
    this is where they will be if we installed an egg via easy_install
  2. <prefix>/share/roundup/templates/*
    this should be the standard place to find them when Roundup is installed
  3. <roundup.admin.__file__>/../templates/*
    this will be used if Roundup's run in the distro (aka. source) directory
  4. <current working dir>/*
    this is for when someone unpacks a 3rd-party template
  5. <current working dir>
    this is for someone who "cd"s to the 3rd-party template dir

Either throw the templates directory into /home/me/pylib too or only ever run the roundup-admin script from a template directory. Or, if you don't already have a directory for libraries, just run it from the source distribution.

3. Create a Tracker

Run roundup-admin if you've installed Roundup and the script is on your PATH, otherwise python roundup_admin.py.

Type install and follow the prompts:

Enter tracker home: /home/me/tracker/
Templates: classic, minimal
Select template [classic]: classic
Back ends: anydbm, mysql, sqlite
Select backend [anydbm]: mysql

If the template or backend files can't be found, it won't be possible to select them, so exit and fix the problem.

You'll have to come back to roundup-admin later, but now it's time to configure the tracker.

4. Configure Your Tracker

Head to /home/me/tracker/ and edit config.ini. Search for "NO DEFAULT" to find the items that need to be set. The tracker won't run at all if they're not.
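If you'd rather not search by eye, a few lines of Python can list the offending lines. This is only a sketch: the helper name is made up, and the sample lines are a hypothetical excerpt rather than a copy of a real config.ini.

```python
def find_unset(lines, marker="NO DEFAULT"):
    """Return (line_number, text) for every line that mentions the marker."""
    return [(number, line.rstrip())
            for number, line in enumerate(lines, start=1)
            if marker in line]


# Hypothetical excerpt; a real config.ini will differ.
sample = [
    "[tracker]\n",
    "web = NO DEFAULT\n",
    "name = Roundup issue tracker\n",
]
for number, text in find_unset(sample):
    print(number, text)
```

To check the real file, pass an open file object instead of the sample list, e.g. find_unset(open("/home/me/tracker/config.ini")).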

In the Dreamhost context, consider a subdomain -- http://tracker.example.com/ -- because it will make deployment much easier.

Return to roundup-admin and initialise the tracker to set up the default user accounts and roles.

5. Creating The Web Interface

I spent several hours failing to track down mysterious bugs apparently caused by some combination of my tracker settings and the bundled CGI interface. Try using the WSGI interface instead:

#!/usr/bin/env python2.4

# Enable HTML tracebacks
import cgitb
cgitb.enable()

# obtain the WSGI request dispatcher
from roundup.cgi.wsgi_handler import RequestDispatcher
tracker_home = '/home/me/tracker/'
app = RequestDispatcher(tracker_home)

from wsgiref.handlers import CGIHandler
CGIHandler().run(app)

Note that wsgiref isn't included in Python 2.4. Dreamhost isn't providing 2.5 yet, so just download the package and put it somewhere handy.

6. Deploying With Passenger

You can use Phusion Passenger to deploy WSGI applications; the wiki has more details. Just set it up in the panel and modify the web interface a little:

#!/usr/bin/env python2.4

import sys, cgitb

# Enable HTML tracebacks
cgitb.enable()

# Even if you've got your paths set up to find
# your python libraries automatically, Passenger's
# interpreter won't.
sys.path.append("/home/me/pylib")

from roundup.cgi.wsgi_handler import RequestDispatcher

# The WSGI app has to be called "application"
application = RequestDispatcher("/home/me/tracker/")

# That's it.

Save it as passenger_wsgi.py and you're good to go.
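If Passenger serves an import error instead of the tracker, a stand-in application that prints the interpreter's module search path can show what Passenger actually sees. This is a debugging sketch, not part of Roundup; I have only reasoned about it under Python 3, though it avoids version-specific syntax.

```python
import sys


def application(environ, start_response):
    """Stand-in WSGI app: list the interpreter's module search path."""
    body = "\n".join(sys.path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Temporarily saving this as passenger_wsgi.py should show whether /home/me/pylib is on the path before the Roundup import is attempted.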

Wednesday, 30 April 2008

Mrs. Gamely's Words

Mark Helprin's Winter's Tale is a delight, even if its apparent themes are little more than an excuse for the wordplay window-dressing. I might not have found perfect justice, but a hall of light and mirrors built from language is quite enough for me.

One thing that needled, not being the sort to read with dictionary in hand, was this:

Though Mrs. Gamely was by all measures prescientific and illiterate, she did know words. Where she got them was anyone's guess, but she certainly had them. Virginia speculated that the people on the north side of the lake, steeped in variations of English both tender and precise, had made with their language a tool with which to garden a perfect landscape. Those who are isolated in small settlements may not know of the complexities common to great cities, but their hearts are rich, and so words are generated and retained. Mrs. Gamely's vocabulary was enormous. She knew words no one had ever heard of, and she used words every day that had been mainly dead or sleeping for hundreds of years. Virginia checked them in the Oxford dictionary, and found that (almost without exception) Mrs. Gamely's usage was flawlessly accurate. For instance, she spoke of certain kinds of dogs as Leviners. She called the areas near Quebec march-lands. She referred to diclesiums, liripoops, rapparees, dagswains, bronstrops, caroteels, opuntias, and soughs. She might describe something as patibulary, fremescent, pharisaic, Roxburghe, or glockamoid, and words like mormal, jeropigia, endosmic, mage, palmerin, thos, vituline, Turonian, galingale, comprodor, nox, gaskin, secotine, ogdoad, and pintulary fled from her lips in Pierian saltarellos. Their dictionary looked like a sow's ear, because Virginia spent inordinate proportions of her days racing through it, though when Mrs. Gamely was angry a staff of ten could not have kept pace with her, and half a dozen linguaphologists would have collapsed from hypercardia.

Winter's Tale (New York: Harvest, 2005), 225-226

For reference (thanks, Oxford English Dictionary!):

Leviner
? (presumably not from "levin", to emit flashes of lightning)
march-land
a border territory
diclesium
botanical term for a kind of dry, seed-retaining fruit
liripoop
part of a graduate's hood in early academic costume; later (presumably by derivation) "to have [one's] liripoop" was to have learned a lesson or part
rapparee
a 17th-century Irish pikeman; later an Irish bandit
dagswain
a kind of rough bed-cover
bronstrops (singular)
a female procuress [of sexual services]
caroteel
an old commercial measure of quantity ("a caroteel of cloves")
opuntia
originally a Greek herb; later an American cactus
sough
a whisper or murmur or breath; or, a drain or swampy place; or, a ploughshare
patibulary
of or relating to a gallows (patibulum: fork-shaped gibbet)
fremescent
growing noisy
pharisaic
of the Pharisees, hence legalistic, self-righteous, devoted to the letter and not the spirit
Roxburghe
a style of bookbinding
glockamoid
shaped like an arrow-head (note: not in OED)
mormal
a kind of scab or sore
jeropigia
from Portuguese "geropiga", a mixture of grape juice, brandy and sugar used to adulterate wines
endosmic
relating to endosmosis, the flow of a fluid from an area of lesser concentration to one of greater
mage
a magician, or more generally a wise man
Palmerin
16th Century Spanish romantic hero, hence any knightly champion
thos
old Greek and Latin name for some kind of canine animal not definitively identified by subsequent historians
vituline
of or like a calf (vitulus: calf)
Turonian
part of the Cretaceous period
galingale
a gingery spice; better known as galangal
comprodor
possibly a misspelling of "comprador", a native steward or head servant, intermediary with the locals
Nox
personification of the night, from nox, night
gaskin
a kind of breeches; or, formation from "gasket"
secotine
possibly a misspelling of "Seccotine", a brand of glue originating in the 19th century
ogdoad
the number eight, a set of eight; or specifically the Ogdoad, eight divine beings of ancient Egypt
pintulary
?
Pierian
relating to Pieria, home of the Muses; hence, poetic
saltarello
an animated Italian and Spanish dance

Trawling the book for the rest of Helprin's vocabulary I leave for someone else, but special mention is due to "amphibological" -- of amphibology/amphiboly, ambiguity of speech, especially deriving from grammatical construction -- for appearing in context in the title Amphibological Whimsey Dances. It's a better name for wordplay than wordplay.

Tuesday, 05 February 2008

IntelliType XML

Microsoft hardware is fabulous, but -- like everyone else ever to buy one -- I was disgruntled to find that the Natural Ergonomic Keyboard 4000's centre-keyboard lever is set to control zoom, and only zoom, rather than something like vertical scroll. Worse, IntelliType won't let you change it, unlike almost every other special key on the board.

Luckily, it turns out that it's a limitation of the interface, not the software; you can get a lot more done with a little registry tweaking and two XML configuration files in the installation directory:

  • C:\Program Files\Microsoft IntelliType Pro\commands.xml
  • C:\Program Files\Microsoft IntelliType Pro\mscmdkey.xml

There are two different kinds of customisation possible:

  1. Use the registry to map keys to commands specified in mscmdkey.xml
  2. Change the behaviour of those commands in specific contexts with commands.xml

First, we want to look at mscmdkey.xml.

mscmdkey.xml

mscmdkey.xml lists commands sent out by the keyboard to IntelliType. It should probably be treated as read-only, but it's informative; commands.xml can't easily be edited without it. The commands look like this:

<Command name= 'VOLUME_UP_COMMAND' id='700' isUI='false' default='true' >
    <ResourceIDs displayName='3809' description='4009' descriptionPlusAccel='0' osdText='4270'/>
</Command>

The magic numbers are inscrutable (a displayName of 3809, an osdText of 4270?), but the name and id are simple enough.

The name is a human-readable label. VOLUME_UP_COMMAND is the default action of the volume-up button. SAVE_COMMAND corresponds to the "Save" button above F11. Some of the commands don't have buttons on all keyboards: the Natural 4000 has no "next track" button (MEDIA_NEXT_TRACK_COMMAND), but the Natural MultiMedia does. Some of the commands never have buttons: BATTERY_LOW_COMMAND is an automatic feature of the battery-powered wireless models.

We'll need the id when we get to commands.xml. To change the Zoom lever we want:

  • <Command name= 'ZOOM_IN_COMMAND' id='319' default='true' >
  • <Command name= 'ZOOM_OUT_COMMAND' id='320' default='true' >

319 and 320, respectively.
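Looking ids up by hand gets tedious, so here is a small Python sketch that builds a name-to-id table from Command elements. The Commands root element in the sample is a hypothetical wrapper; I'm not assuming anything about the real file beyond the Command entries quoted above.

```python
import xml.etree.ElementTree as ET

# Entries shaped like the ones quoted above. The <Commands> root is a
# hypothetical wrapper, not necessarily the real file's root element.
SAMPLE = """
<Commands>
  <Command name='ZOOM_IN_COMMAND' id='319' default='true'/>
  <Command name='ZOOM_OUT_COMMAND' id='320' default='true'/>
</Commands>
"""


def command_ids(xml_text):
    """Map each command name to its integer id."""
    root = ET.fromstring(xml_text)
    return {cmd.get("name"): int(cmd.get("id"))
            for cmd in root.iter("Command")}


ids = command_ids(SAMPLE)
print(ids["ZOOM_IN_COMMAND"], ids["ZOOM_OUT_COMMAND"])  # 319 320
```

For the real file, read its text (or use ET.parse on the path) and apply the same iteration over Command elements.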

Some commands have an MSReserved sub-element. The appCommand attribute corresponds to a WM_APPCOMMAND parameter (e.g., for NEW_COMMAND it's 29, the defined value of APPCOMMAND_NEW).

I've had no success trying to add my own commands under high, unused ids.

Remapping keys

The IntelliType software allows you to remap some, but not all, of the keyboard's keys. When you do, the changes are saved to the registry at HKEY_CURRENT_USER\Software\Microsoft\IntelliType Pro\EventMapping.

The easiest way to find the key-code of a key you want to remap is to use IntelliType to assign it, then modify the values. The entry for my second favourites button looks like this:

[HKEY_CURRENT_USER\Software\Microsoft\Intellitype Pro\EventMapping\79]
"ShellExecute"="cmd.exe"
"Friendly"="Python"
"Arguments"="/K python"
"Command"=dword:00000320

The important one is Command. dword:00000320 is 0x320 hex, or 800 in decimal. Search mscmdkey.xml for a command with an id of 800 and you'll find SHELL_EXECUTE_COMMAND -- that sounds a lot like what we're doing. You can use any of the commands from mscmdkey.xml here, though not all of them work.
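The hex-to-decimal step is easy to get wrong by hand, so here is a tiny Python sketch of the conversion in both directions. The function names are made up for illustration.

```python
def dword_to_id(value):
    """Convert a .reg value such as 'dword:00000320' to a decimal id."""
    prefix = "dword:"
    if not value.lower().startswith(prefix):
        raise ValueError("expected a dword value, got %r" % value)
    return int(value[len(prefix):], 16)


def id_to_dword(command_id):
    """Format a decimal command id the way it appears in a .reg file."""
    return "dword:%08x" % command_id


print(dword_to_id("dword:00000320"))  # 800
print(id_to_dword(800))               # dword:00000320
```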

Friendly is just a human-readable name (this entry displays in the favourites menu as "start Python"). ShellExecute and Arguments are specific to SHELL_EXECUTE_COMMAND.

IntelliType can only set the favorites buttons to applications and URLs, but I miss the next and previous track buttons from my Natural MultiMedia. I've put them as faves 4 and 5:

[HKEY_CURRENT_USER\Software\Microsoft\Intellitype Pro\EventMapping\81]
"Command"=dword:000002c0

[HKEY_CURRENT_USER\Software\Microsoft\Intellitype Pro\EventMapping\82]
"Command"=dword:000002bf

0x2c0 is 704, MEDIA_PREVIOUS_TRACK_COMMAND; 0x2bf is 703, MEDIA_NEXT_TRACK_COMMAND.

Some keys can't be remapped. The "My Favorites" key, for example, can be disabled with DISABLE_COMMAND (400) but operates as normal with any other value.

Editing commands.xml

commands.xml lets us redefine what those commands actually do in any given context. It makes sense: not all software is alike, so triggering (e.g.) a spell check will require something different in OpenOffice.Org Writer than in Microsoft Word. Many of the mappings are simpler than you might think: SAVE_COMMAND (311), for example, is just a macro that performs the "Ctrl + s" keyboard shortcut!

commands.xml looks something like this, trimmed for brevity:

<DPGCmd>
    <ENG>
        <Application UniqueName="StandardSupport">
            <C311 Type="5" KeySeq="ctrl s" />
            <C401 Type="5" KeySeq="F7" />
        </Application>
    </ENG>
    <ALL>
        <Application UniqueName="Notepad" AppName="Notepad">
            <C311 Type="1" wParam="0x10001" />
            <C401 Type="0" />
            <C309 Type="5" KeySeq="alt F4" />
        </Application>
    </ALL>
</DPGCmd>

We're not interested in a lot of this. Each installed language has an element under the root DPGCmd node defining the function of a command in that locale. ENG means English. The special element ALL applies to all languages; unless you're a hardcore polyglot then it's all you'll need.

Commands are mapped on a per-application basis, but AppName doesn't seem to actually do anything: change "Notepad" to "Potato" and the keys will work the same.

UniqueName is the important one, and refers to the window class name passed to the relevant Windows API functions. If you want to do application-specific customisation, there's third-party software around (e.g. The Customiser) that can get a class name from any active window. If not, the special UniqueName value "StandardSupport" applies to all window classes.

Be careful not to have conflicting rules. Commands defined under specific UniqueNames will override commands defined under StandardSupport, but they can't be defined under specific languages if they're also in ALL.
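The override rule can be modelled in a few lines of Python. This sketch only covers the precedence described here (a specific UniqueName first, then StandardSupport); it ignores the language elements entirely and makes no claim about how IntelliType really resolves things.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<DPGCmd>
    <ENG>
        <Application UniqueName="StandardSupport">
            <C311 Type="5" KeySeq="ctrl s" />
            <C401 Type="5" KeySeq="F7" />
        </Application>
    </ENG>
    <ALL>
        <Application UniqueName="Notepad" AppName="Notepad">
            <C311 Type="1" wParam="0x10001" />
            <C401 Type="0" />
            <C309 Type="5" KeySeq="alt F4" />
        </Application>
    </ALL>
</DPGCmd>"""


def resolve(root, unique_name, command_id):
    """Return the attributes of the rule that would apply, or None."""
    tag = "C%d" % command_id
    # Try the specific window class first, then the catch-all.
    for name in (unique_name, "StandardSupport"):
        for app in root.iter("Application"):
            if app.get("UniqueName") == name:
                rule = app.find(tag)
                if rule is not None:
                    return dict(rule.attrib)
    return None


root = ET.fromstring(SAMPLE)
print(resolve(root, "Notepad", 311))  # {'Type': '1', 'wParam': '0x10001'}
print(resolve(root, "Potato", 311))   # {'Type': '5', 'KeySeq': 'ctrl s'}
```

A window class with no Application entry of its own ("Potato" here) falls through to the StandardSupport rule.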

Commands

Every Application contains elements with names like C319, where 319 is the id of a command in mscmdkey.xml. All of these command-elements have a Type attribute:

  • Types 1-4 take different kinds of undocumented magic numbers. Type 2 commands seem to be handled by Windows, rather than the active application, as they execute shell functions (open default browser, search window etc.), but that's all I can figure out.
  • Type 7 commands take another type as a subtype. By default it's only used for OFFICE_TASK_PANE_COMMAND, so I'm assuming it's a hack just for that.

The others are easier:

  • Type 0 disables the key.
  • Type 6 takes an Activator which is passed to the window. Some of them are evidently application-specific -- IllustratorZoomin, IE7Save -- but others seem more general. An incomplete listing:

    • ZoomIn
    • ZoomOut
    • ScrollUp
    • ScrollDown

Type 5 is the fun one, which takes a simple keyboard macro in its KeySeq attribute. As mentioned above, "Save" is implemented like this:

<C311 Type="5" KeySeq="ctrl s" />

i.e., it's exactly the same as pressing "Ctrl" and "s". Multiple chords can be separated with a | pipe character, so if you want a "Hello world!" button:

<C203 Type="5" KeySeq="shift h | e | l | l | o | 
    space | w | o | r | l | d | shift 1" />
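Typing those chords out by hand is error-prone, so here is a Python sketch that generates a KeySeq value from plain text. It only knows the key names used in this post (letters, space, shift) and assumes "!" is shift 1, as on a US layout; anything else raises an error rather than guessing.

```python
def keyseq(text):
    """Build a KeySeq string for text made of letters, spaces and '!'."""
    chords = []
    for ch in text:
        if ch == " ":
            chords.append("space")
        elif ch == "!":
            chords.append("shift 1")  # assumes a US keyboard layout
        elif ch.isascii() and ch.isalpha():
            chords.append("shift " + ch.lower() if ch.isupper() else ch)
        else:
            raise ValueError("no chord known for %r" % ch)
    return " | ".join(chords)


print(keyseq("Hello world!"))
# shift h | e | l | l | o | space | w | o | r | l | d | shift 1
```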

Macros seem to be deliberately limited, possibly as a security feature. You can't alt tab out, for example, or navigate menus with alt | f | downarrow | downarrow | enter. You can implement macros that reach into "Save As" or "Open" file-pickers, though, and presumably other kinds of dialog.

See Also

I wouldn't even have considered looking for baroque XML configuration files if the internet hadn't said it was possible. Joel Bennett's guide has the most detail by far, plus a handy bit of XSL to remove the F-lock annoyance.

Sunday, 28 October 2007

Sight and Sound Top Tens

In 1952, Sight and Sound magazine followed up a Belgian poll of directors' favourite films with a similar referendum, this time for critics. The result was sixty-odd top-ten lists and an aggregate "ten best films", with Vittorio De Sica's Bicycle Thieves coming in at number one.

The poll would be of relatively little interest if it had ended there, but it was repeated in 1962 -- and 1972, 1982, 1992, 2002... Collected, the lists show the evolution of the cinematic canon (or at least the critical zeitgeist, which may not be precisely the same thing) over the last half-century. For example: in 1952 Citizen Kane shared thirteenth place, but 1962 saw it move to first position, where it's stayed ever since. Ten years after that, it was even further in front.

I suppose that watching items move up and down lists is only of interest to a certain kind of person, but that kind of person is me. It's annoyed me no end that so many of the websites collecting lists of greatest films only provide the S&S aggregate top tens: those by individual critics offer much greater variety, as well as scope for more interesting statistical projects.

To that end, now that I've dug up the original magazines, here they are in full (or close to it):

The BFI has graciously put all of the 2002 results online already.

Not every list was published in the magazine, though from 1962 onwards the effort was made. I haven't included the comments, many to the effect of "you bastards, how can I pick just ten?"

There's a remarkable range of creative interpretations of the words "top ten films". Some included twelve or fifteen. Some included single entries like "The Apu Trilogy" (really three films), "Chaplin's Mutual films" (more than ten), and in one case "Anthology of the works of W. C. Fields" (more than thirty). Some picked small extracts -- like a single musical number -- over films entire. One picked a specific, unreleased cut, subsequently destroyed by re-editing. Some sent in lists of directors.

Note on accuracy

Please forgive any missing diacritics: the OCR was hard on them, and I'm willing to sacrifice a cedilla here and an acute there to save time. All other corrections are welcome.

Monday, 10 September 2007

Drupal Role

Drupal 4.2.0 was the first CMS I ever installed, back in 2003. A few weeks later it became the first CMS I botched an upgrade to, at which point I moved on to other pastures.

That was that until last week, when John demanded help with a custom module for his mountaineering club site, to enable users to enter unique codes in their profile area and have their membership "upgraded" to a new role. I gather the point is to pass out the codes in meatspace when members join or pay their dues -- it's a clever idea.

A day or three later and the result was this module.

Most of my experience developing plugins for other people's PHP comes from WordPress, so it was interesting to use such a dramatically different API. WordPress requires explicit registration of hook functions, e.g.

add_action("edit_post", "my_edit_post_action");

But Drupal uses magic function names. If you have a module called "mymodule", any function called mymodule_init will be automatically hooked into hook_init.
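For anyone who hasn't met name-based hooks before, the idea can be imitated in a few lines of Python. This is an analogy only, with a made-up invoke_all helper; it says nothing about how Drupal implements its dispatch.

```python
def mymodule_init():
    return "mymodule initialised"


def othermodule_init():
    return "othermodule initialised"


def invoke_all(hook, modules, namespace):
    """Call <module>_<hook> for every module that defines it."""
    results = []
    for module in modules:
        func = namespace.get("%s_%s" % (module, hook))
        if callable(func):
            results.append(func())
    return results


print(invoke_all("init", ["mymodule", "othermodule", "quiet"], globals()))
# ['mymodule initialised', 'othermodule initialised']
```

Nothing is registered anywhere: defining a function with the right name is the registration, which is exactly the implicitness at issue.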

It's an elegant solution, though I favour the Pythonic "explicit is better than implicit" philosophy too much to be comfortable with it.

One of the best (and worst) things about the whole experience was the Forms API. Instead of writing HTML, you just write some code like this:

function mymodule_form() {
    $form['name'] = array(
      '#type' => 'textfield',
      '#title' => t('Name'),
      '#description' => t('What are you called?')
    );
    $form['submit'] = array(
      '#type' => 'submit',
      '#value' => t('Yield!'),
    );
    return $form;
}

Hook the function in at the appropriate place, and Drupal renders it, themed prettily, with anti-CSRF nonces already handled. More magic functions, hooked in as mymodule_form_validate() and mymodule_form_submit(), can be used for validation and form submission actions respectively.

As always, the downside of a leaky abstraction is that customizations not provided for in the API are much harder than they should be, but I expect that the vast majority of modules never have any problems.

Drupal has lots of other niceties missing from WordPress too, like the watchdog logging system and documentation that doesn't suck. I have issues with it as a user -- the learning curve for administration is comparatively steep -- but it's flexible, powerful, and easy to extend. As a developer I'm extremely impressed.

Tuesday, 14 August 2007

JavaScript Bayes

I wanted to have some Bayesian fun in a user-script, so did a quick JavaScript port of the fabulous Divmod Reverend Python module.

It's somewhat limited, but dead easy to use:

var guesser = new Bayes();
guesser.train("hannibal", "I love to kill people and eat them.");
guesser.train("austen", "Come, let us have tea and scones in Mr. Bingley's gazebo.");
guesser.guess("Jane, these scones are simply delightful!");
// [["austen", 0.9999]]

guesser.train("hannibal", "I love to kill people and eat them with tea and scones.");
guesser.guess("Give me those scones or I'll kill and eat you.");
// [["hannibal", 0.9481433307479079], ["austen", 0.6203339133520634]]

It's missing some stuff, but does enough to be getting along with.

As a test application, I went on and wrote up one of the examples given in the Reverend docs: a script to tell whether you write more like Charles Dickens or Jane Austen. It's both pointless and inaccurate, but I suppose it qualifies. :)

Saturday, 23 June 2007

Userscript: IMDb Decoder Ring

It seems to be a Greasemonkey kind of month. IMDb ratings are fuzzy in the middle, so Tom Moertel made a decoder ring listing what the rating means in terms of the movie's per-genre percentile ranking. Leprechaun 5's 3.2 rating seems bad enough even with an even distribution; in reality, it has a worse rating than 90% of movies in the database.

Anyhow, this userscript puts the data conveniently inline.

Before:

Shrek 3 at IMDb without the script

After:

Shrek 3 at IMDb with the script enabled

Download

Monday, 18 June 2007

Userscript: Reddit unread comments helper

Edit - 2008-05-27: Updated to work with the (horrid) Reddit redesign. I can't be bothered updating the screenshots too.

Or: (ab)using Greasemonkey and Google Gears to add features that would be handled better server-side.

The script tracks comments you've seen at Reddit, then exposes the data in several small ways that each make your life a little easier. Features:

  • On the main Reddit list pages, replace the "n comments" links with "x unread comments (n total)".

    Before:

    before the userscript is applied

    After:

    after the userscript is applied
  • On clicking through to a page where you've already read some of the comments, jump to the first unread comment.

  • Highlight unread comments with a bright but non-distracting left margin.

Download or install it.

Notes

My ulterior motive was testing the Gears DB with Greasemonkey. More than once I've wished it had a binding to SQLite, and with Gears it does: it just got a thousand times more useful. It'd be nicer yet if I could save to an arbitrary cross-domain database, but this is still a tremendous step up.
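The bookkeeping is simple enough to sketch with Python's sqlite3 module. The table layout and function names below are hypothetical, chosen for illustration; they are not the script's actual schema or the Gears API.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the 'seen' table exists."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen ("
               "story_id TEXT, comment_id TEXT, "
               "PRIMARY KEY (story_id, comment_id))")
    return db


def mark_seen(db, story_id, comment_ids):
    """Record comments as read; ones already recorded are ignored."""
    db.executemany("INSERT OR IGNORE INTO seen VALUES (?, ?)",
                   [(story_id, cid) for cid in comment_ids])
    db.commit()


def unread(db, story_id, comment_ids):
    """Return the ids on the page not yet recorded, in page order."""
    rows = db.execute("SELECT comment_id FROM seen WHERE story_id = ?",
                      (story_id,))
    seen = {row[0] for row in rows}
    return [cid for cid in comment_ids if cid not in seen]


db = open_db()
mark_seen(db, "story1", ["c1", "c2"])
page = ["c1", "c2", "c3", "c4"]
new = unread(db, "story1", page)
print("%d unread comments (%d total)" % (len(new), len(page)))
# 2 unread comments (4 total)
```

The first entry of the unread list is also the comment to jump to when the page loads.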

This script uses a bit of a hack and writes itself directly into the window, rather than just manipulating the DOM from the usual plexiglass sandbox. Strictly speaking it's not necessary, and only possible at all because I have no use for the GM_* API functions, but a userscript with Gears does require at least some meddling of this kind.

Gears prompts the user to allow it to run on a specific domain, but the dialog doesn't appear if it's initialized from within Greasemonkey; it has to be done from the unsafe window. Once it's set up -- once the local database has been created and what have you -- Greasemonkey is fine, but that first step is critical.

Still, that's basically the only hurdle, and it's trivial to surmount. Gears is a dream: the API seems a little sparsely featured, but it's so easy to build a platform around that the lack of convenience methods doesn't matter. I wrote a very simple DB wrapper of my own, and others are already building full ORMs. There's no sight of JavaScript on Jacks just yet, but it can't be far off.

Saturday, 16 June 2007

E4X

E4X, short for "ECMAScript for XML", is an extension to ECMAScript (i.e. JavaScript, JScript, ActionScript...) with new syntax and built-in objects for more convenient handling of XML fragments. It seems to be used most frequently with ActionScript 3 (Flash), but is also available in recent Mozilla/Firefox releases.

I whipped up this guide after a quick read-through of the specification and a bit of playing around. Corrections are more than welcome.

In order, it briefly outlines:

  • The syntax for declaring literal XML values
  • XML and XMLList objects
  • Variable interpolation in XML literals
  • The new syntax for traversal of XML objects
  • Namespace considerations
  • The methods of XML objects

First-class XML

E4X XML objects can be created by passing a string to the XML constructor function, but that's hardly exciting. Much more interesting is the new syntax for XML literals, similar to that in Scala. It's exactly what you'd expect:

var x = <elm id="1">
    <a>content</a>
</elm>;

There's no more need to bother with painful string concatenation or backslashed line continuations.

Even better, XML objects are first-class citizens. They have properties and methods; they can be deleted, concatenated and iterated over.

var y = x + <elm id="2" />;
var name = <xml />.name();

XML and XMLList

As well as XML, E4X defines the XMLList, an ordered collection of XML objects similar to the W3C DOM NodeList.

The literal syntax is rather less intuitive:

var xl = <>
    <a />
    <b />
    <c />
</>;

Much of E4X's expressive power comes from the blurring of the line between XMLList and XML objects. Both have a type of xml; instanceof xml returns true for both.

The advantage is that you rarely need to worry about which you have. A single-item XMLList is treated identically to an XML object, and even longer lists share many of the same methods. The text() method of an XML object returns its text content. On an XMLList it returns the concatenated text content of all list members.

If you do need to tell the difference, just check the .length(): an XML object's length is always 1.

Literal Interpolation

When declaring a literal, expressions inside braces (curly brackets) are automatically evaluated.

var name = "bob^%*";
var tag = "person";
var p = <{tag} id="3">{name.replace(/[^a-z]/ig, "")}</{tag}>;
// <person id="3">bob</person>

Braced values are not, however, evaluated in CDATA sections, such as the contents of attribute values:

var att = "id";
var val  = 3;
var a = <person {att}="{val}">bob</person>;
// <person id="{val}">bob</person>

var b = <person {att}={val}>bob</person>;
// <person id="3">bob</person>

Interpolated attribute values are automatically quoted; any XML entities are automatically escaped.

val = "\"<>";
b = <person {att}={val}>bob</person>;
// <person id="&quot;&lt;>">bob</person>

Literal braces should be escaped as &#x7B; and &#x7D; for { and } respectively.

Accessing XML Properties

XML objects can be filtered and traversed using an object syntax similar to ElementTree and BeautifulSoup, with a bit of XPath thrown in.

A node's child elements can be accessed as properties:

var x = <people class="example">
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name  = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

x.[name] is the same as x.child([name]).

Attributes

As with XPath, "@" is used to access attributes.

var id = x.person[0].@id;

x.@[name] is identical to x.attribute([name]).

Descendants

The .. operator accesses all descendants, not just the immediate children.

var names = x..name;
var ids = x..@id;

x..[name] is equivalent to x.descendants([name]).

The Wildcard

The "*" wildcard matches all names.

var persons  = x.*;
var all      = x..*;
var attrs    = x..@*;

The wildcard is magic in more than one context, but in this one it's equivalent to QName(null, "*").

var all = x.descendants(QName(null, "*"));

Filtering Predicates

var me     = x.person.(name == "sam");
var either = x.person.(@id == 1 || @id == 2);

Predicates can be nested and quite complex:

var me = x..*.(name == "sam" && 
    name.parent().(@id == 1).name() == "person");

They're not quite as useful as they could be, however. Unlike XPath, E4X expressions cannot easily be used to search ancestor axes.

The previous example illustrates a potential problem. It only works because the list of matches is reduced to one by (name == "sam") before the parent() method is invoked.

This expression, on the other hand, will raise an exception:

x..*.(name.parent().@id == 1);

The filter does not examine the parent of every name in turn; it looks for the single parent of the entire list of names together. On a list, parent() returns undefined unless every member shares the same parent, and looking up @id on undefined is what raises the exception.
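
The "one parent for the whole list" rule is easy to model outside E4X. The sketch below uses plain objects as stand-in nodes, each with a parent reference; sharedParent and the node shapes are hypothetical names for illustration, not anything E4X provides.

```javascript
// Hypothetical model of XMLList.parent(): plain objects stand in for nodes,
// each carrying a "parent" reference.
function sharedParent(nodes) {
    if (nodes.length === 0) {
        return undefined;
    }
    var first = nodes[0].parent;
    for (var i = 1; i < nodes.length; ++i) {
        if (nodes[i].parent !== first) {
            return undefined; // members disagree, so there is no single parent
        }
    }
    return first;
}

var person1 = { id: 1 };
var person2 = { id: 2 };
var samName = { text: "sam", parent: person1 };
var lizName = { text: "elizabeth", parent: person2 };

var one  = sharedParent([samName]);           // person1
var both = sharedParent([samName, lizName]);  // undefined
```

This is why filtering down to a single match first, as in the working example, makes parent() safe to call.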

Deletion

The delete keyword works on arbitrary E4X expressions:

delete x.person.(@id == 1); // that's me gone 
delete x..person;           // ... and everyone else

Assignment

You can also use the normal assignment operator:

x..name[0] = "batman";
x.@pointless = "new attribute!";
x.person += <person id="3"><name>alfred</name></person>;

In some circumstances you can also assign to an expression that would return a list:

x.* = <goodbye_previous_content />;

But those nodes were all together, so replacing them at once is a natural operation. This, on the other hand, is illegal:

x.person.@newattributes = "for all";

Iteration

There are several ways to iterate over XMLList and XML objects, though for XML the exercise is meaningless:

x[0] == x;
// true

Nevertheless. First, iteration over list indices:

var i, elm;
for (i in x..*) {
    elm = x..*[i];
}

The same can be accomplished with a for;; loop and the length() method.

var list = x..*;
for (i = 0; i < list.length(); ++i) {
    elm = list[i];
}

Most useful of all, though, is the new for each .. in syntax, allowing direct manipulation of matching nodes:

var elm;
for each (elm in x.person) {
    elm.@id = Number(elm.@id) + 1; // a bare += 1 would concatenate strings: "1" becomes "11"
}

Namespaces

E4X has robust namespace support, but (as anyone with XML experience must expect) namespaces complicate an otherwise simple model.

var x = <xml>
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == "value one";
// true

With namespaces, you have to use a qualified name.

var x = <xml xmlns="http://example.com/">
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == undefined;
// true

var example = Namespace("http://example.com/");
x.example::v1 == "value one";
// true

Note the use of the :: scoping operator. You can also suggest a namespace prefix and/or construct the QName directly:

var example = Namespace("example", "http://example.com/");
var name = QName(example, "v1");
var same = QName("http://example.com/", "v1");

If more liberal matching is required, the * wildcard signifies any namespace.

x.*::v1 == "value one";
// true

The wildcard (any-name) namespace is different from the unnamed namespace. It can also be expressed by passing null as the namespace argument of the QName constructor. The following are equivalent:

x.*::v1
x.child(QName(null, "v1"));
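
The three lookups in this section (unqualified, qualified, wildcard) differ only in how the namespace half of the name is compared. A rough plain-JavaScript model, with hypothetical names throughout (matchesQName, and plain objects carrying uri and localName), might look like this; it assumes that a null uri means "any namespace" and an empty string means "no namespace".

```javascript
// Hypothetical model of name matching: uri === null means "any namespace",
// uri === "" means "no namespace".
function matchesQName(node, qname) {
    var uriMatches = (qname.uri === null) || (qname.uri === node.uri);
    var nameMatches = (qname.localName === "*") || (qname.localName === node.localName);
    return uriMatches && nameMatches;
}

// Stand-in for the <v1> element declared in the http://example.com/ namespace.
var v1 = { uri: "http://example.com/", localName: "v1" };

var unqualified = matchesQName(v1, { uri: "", localName: "v1" });                     // false
var qualified   = matchesQName(v1, { uri: "http://example.com/", localName: "v1" }); // true
var anyNs       = matchesQName(v1, { uri: null, localName: "v1" });                   // true
```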

The Default Namespace

Using perhaps the most self-explanatory syntax ever devised, you can set the default XML namespace in the current scope.

var example = Namespace("http://example.com/");
default xml namespace = example;
// or
default xml namespace = "http://example.com/";

var x = <xml />;
x.toXMLString();
// <xml xmlns="http://example.com/"/>

To reiterate: in the current scope.

toString() vs. toXMLString()

There is an important difference between the toString and toXMLString methods.

var x = <people>
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

names.toString();
// <name>sam</name> <name>elizabeth</name>
name.toString();
// sam

toString returns different values depending on whether or not an object is considered "complex". If there are no child elements (other types, such as XML comments, don't count), it returns the element's text content only. This is very useful in most cases but a painful gotcha in others.
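
To make the rule concrete, here is a toy model in plain JavaScript. It is a sketch under heavy assumptions: nodes are plain objects with a name and a children array, strings stand for text nodes, and attributes, escaping, comments and pretty-printing are all ignored. The function names are hypothetical.

```javascript
// Hypothetical model of the toString() rule. A node is a plain object:
// { name: "name", children: [ "text" or node, ... ] }.
function hasSimpleContent(node) {
    for (var i = 0; i < node.children.length; ++i) {
        if (typeof node.children[i] !== "string") {
            return false; // a child element makes the content "complex"
        }
    }
    return true;
}

function toXMLString(node) {
    var inner = "";
    for (var i = 0; i < node.children.length; ++i) {
        var child = node.children[i];
        inner += (typeof child === "string") ? child : toXMLString(child);
    }
    return "<" + node.name + ">" + inner + "</" + node.name + ">";
}

function toStringModel(node) {
    if (hasSimpleContent(node)) {
        return node.children.join(""); // text only, no markup
    }
    return toXMLString(node);
}

var nameNode   = { name: "name", children: ["sam"] };
var personNode = { name: "person", children: [nameNode] };

var simple  = toStringModel(nameNode);    // "sam"
var complex = toStringModel(personNode);  // "<person><name>sam</name></person>"
```

The gotcha follows directly: the same call yields bare text or markup depending on the data, so code that needs markup every time should call toXMLString().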

Extending E4X

ECMAScript lets you do wonderful things by extending Object.prototype, String.prototype etc. with new methods.

It's much harder with E4X. The prototypes of XML and XMLList are read-only, so new methods can't be added directly. Most of their existing methods throw exceptions if they are applied to any other object. Procedural code will have to do.

Future versions will have built-in support for custom types based on XML schemas.

Global Function Reference

isXMLName( value ) : bool

Is the value usable as an XML name?

XML Constructor Reference

The XML constructor has several properties managing global settings for XML processing and serialization.

XML.ignoreComments

Ignore XML comments. (Default: true.)

XML.ignoreProcessingInstructions

Ignore XML processing instructions. (Default: true.)

XML.ignoreWhitespace

Ignore whitespace. (Default: true.)

XML.prettyPrinting

Pretty-print XML output with toXMLString() etc. (Default: true.)

XML.prettyIndent

Pretty indent level for child nodes. (Default: 2.)


There are also three methods to more easily apply and restore settings for use, say, within a function.

XML.settings()

Get an Object containing the above settings.

XML.defaultSettings()

Get an object containing the default settings.

XML.setSettings([settings])

Set XML settings from, e.g., an object returned by XML.settings().
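
The save-and-restore pattern these methods are meant for can be sketched as a small helper. Everything named here is hypothetical: withSettings is not part of E4X, and fakeXML is a stand-in object so that the sketch runs in an engine without E4X. The only assumption about the real XML constructor is the one stated above, namely that it exposes settings() and setSettings().

```javascript
// Hypothetical helper: run fn with temporary settings, then restore.
// "host" is anything exposing settings() and setSettings(), as the E4X
// XML constructor does.
function withSettings(host, overrides, fn) {
    var saved = host.settings();
    try {
        for (var key in overrides) {
            if (Object.prototype.hasOwnProperty.call(overrides, key)) {
                host[key] = overrides[key];
            }
        }
        return fn();
    } finally {
        host.setSettings(saved); // restore even if fn throws
    }
}

// Stand-in for the XML constructor so the sketch runs without E4X.
var fakeXML = {
    prettyPrinting: true,
    prettyIndent: 2,
    settings: function () {
        return { prettyPrinting: this.prettyPrinting, prettyIndent: this.prettyIndent };
    },
    setSettings: function (s) {
        this.prettyPrinting = s.prettyPrinting;
        this.prettyIndent = s.prettyIndent;
    }
};

var seenInside = withSettings(fakeXML, { prettyPrinting: false }, function () {
    return fakeXML.prettyPrinting;
});
// seenInside is false; fakeXML.prettyPrinting is true again afterwards
```

In an engine with E4X, the first argument would presumably be XML itself.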

XML Object Reference

addNamespace([namespace])

Add a namespace declaration to the object.

appendChild(child)

Append a node to the object's list of children.

attribute(attributeName)

Returns an XMLList of zero or one matching attributes.

Same as element.@attributeName.

attributes()

Returns an XMLList of attributes.

Same as element.@*.

child(propertyName or index)

Same as element.propertyName or element[index].

childIndex()

Returns the node's position in the parent's list of children, or -1 if there is no parent or its children are unordered.

children()

Returns an XMLList of children.

Same as element.*.

comments()

Returns an XMLList of child nodes that are comments.

Same as element.(*.nodeKind() == 'comment').

contains(value)

Same as element == value.

copy()

Return a deep copy of the object, detached from its parent.

descendants(name)

Return all descendants with the given name, or, if name is null or undefined, all descendants.

Same as element..name.

elements([name])

Returns all child elements with the given name, or, if name is null or undefined, all child elements.

Same as element.(*.nodeKind() == 'element').

hasOwnProperty(prop)

The same as on any other object.

hasComplexContent()

Returns true if the node has complex content (in effect, if it has child elements).

hasSimpleContent()

The opposite of hasComplexContent.

inScopeNamespaces()

Returns an Array of in-scope Namespace objects.

insertChildAfter(anchor, child)

insertChildBefore(anchor, child)

Insert a child node before or after the specified anchor node. If the anchor is null, insertChildAfter inserts at the start of the children (after no nodes) and insertChildBefore inserts at the end (before no nodes).

If the anchor is not in this XML object, do nothing.

length()

Return the length of the object. For XML objects always return 1.

localName()

Return the local part of the qualified name. (A node's name not including its namespace.)

name()

Return the qualified name. (Including namespace.)

var x = <xml xmlns="http://example.com/">abc</xml>;
x.name() == "http://example.com/::xml";
x.localName() == "xml";
x.namespace() == "http://example.com/";

namespace([prefix])

Return the in-scope namespace specified by prefix, or:

  • If no namespace matches, return undefined.
  • If prefix is not provided, return the default namespace.

namespaceDeclarations()

Return an Array of Namespace objects representing namespaces declared (as in assigned a prefix) on this XML object.

nodeKind()

Returns the type of XML node, one of attribute, element, comment, processing-instruction, text.

normalize()

Merge adjacent text nodes and remove empty text nodes on this object and all descendants.

parent()

Return the parent node. On an XMLList, this method returns undefined unless all members share the same parent.

processingInstructions([ name ])

Returns all child processing instructions with the given name, or, if name is null or undefined, all child processing instructions.

Same as element.(*.nodeKind() == 'processing-instruction').

prependChild(value)

Insert value at the beginning of the object's child nodes.

propertyIsEnumerable(prop)

Will the specified property be enumerated in a for .. in loop? Same as for other objects.

removeNamespace(namespace)

If possible, remove the given namespace from the object and all descendants. removeNamespace will not remove a namespace if it is referenced in that object or any of its children.

replace(propertyName, value)

Replace the value specified by propertyName, where propertyName is a name, numeric index or * wildcard, with value.

setChildren(value)

Replace the object's children with value.

setLocalName(name)

Change the object's local name using a string or the localName property of a QName object.

setName(name)

Set the object's name and alter the in-scope namespaces to fit.

setNamespace(ns)

Replace the object's default namespace with ns.

text()

Returns an XMLList of all child text nodes.

Same as element.(*.nodeKind() == 'text').

toString()

Returns a string representation. Elements with simple content (i.e., no child elements) are returned as text; complex elements are returned as XML.

toXMLString()

An XML serialization of the object.

valueOf()

Return this object.

XMLList Reference

Most methods are the same. Descendant methods such as children() and text() are simply applied to all members of the list and the results combined. Others, like parent(), don't work when it isn't logical that they do so -- consult your common sense.

Optional Features

Implementations are allowed to include these optional features, or not. Currently Mozilla seems to be on the "or not" side of the fence, but they're easy enough to implement in userspace if you need them.

domNode()

Return a W3C DOM node representation of the object.

domNodeList()

Return a W3C DOM NodeList representation.

xpath(exp)

Apply the XPath expression exp and either return an XMLList of results or throw a TypeError.

Tuesday, 08 May 2007

Sunshine

And, while with silent lifting mind I've trod
The high, untrespassed sanctity of space,
Put out my hand and touched the face of God.

-- J.G. Magee, High Flight

Beware spoilers.

It's a terrible shame that the second half of Event Horizon belonged to one of the worst science fiction films of the 1990s, because the first half promised one of the best.

Dr. Weir (Sam Neill) and his team head into space to investigate an experimental spacecraft, thought lost seven years before when a test of its faster-than-light engine went awry. There's a whole universe of possible explanations for its reappearance, from aliens to temporal anomalies to quantum somethings-or-others, until Weir speaks those ludicrous, devastating words: the ship has returned from "a dimension of pure evil". It's not just a wretched turn of phrase; it's not just that the anticipated twist is the equivalent of "Christine in space" or "HAL666". It's a betrayal.

Science fiction is science fiction, and scientific endeavour is founded on one basic principle: given time and study, given logic, curiosity and empirical investigation, we can figure it out. We can find out what makes it go. We can reverse-engineer the secrets of the universe.

By contrast, Weir gloats:

Did you really think you could destroy this ship? She's defied space and time. She's been to a place you couldn't possibly imagine.

Another character describes the ship's destination as somewhere "beyond scientific reality", "hell". These are not the words of a scientist; they are the words of a priest. Event Horizon invokes the inexplicable supernatural and thereby becomes fantasy. Its moral is the moral of Babel or Icarus: who are we, mere mortals, to touch the sky?

Sunshine, written by Alex Garland (28 Days Later) and directed by Danny Boyle (Trainspotting), is a significantly better film, though its science is almost as dubious and it echoes its predecessor's painful tumble into creature-feature horror. A team of astronauts are sent aboard the ship Icarus II to reignite our dying sun with a nuclear device, but events take a turn for the Horizon when they discover the Icarus I, thought lost seven years earlier when it failed in the same mission, intact but apparently abandoned.

When the crew have their corresponding encounter with an entity beyond easy understanding, it's with nothing so abstruse as extradimensional evil: they find the sun. Even for the audience it approaches the spiritual. Like 2001: A Space Odyssey, some of the most memorable moments are long, slow shots of objects in space; and though the discordant post-rock score couldn't be further from 2001's classical, the effect is the same. At once beautiful, frightening and inspiring, the images hint at the sublime. In Sunshine, they are images of the sun, so massive, so bright, so fundamentally alien to human experience that an emotional response is inevitable.

For the characters the experience is palpably religious. They dream about it at night; Searle (Cliff Curtis) sits in the observation chamber and bathes in light until he burns. Pinbacker (Mark Strong) makes it explicit: he looks into the sun and finds God there. The Icarus myth had already been registered in the names of the ships, but he invokes it again in his attempt to sabotage the mission. He is the instrument of God's will: in accordance with the divine plan, humanity must be allowed to die.

Pinbacker is a murderous psychopath; the crew sensibly refuse to believe him. Humanity wins, and God's plot (or Pinbacker's interpretation of it) is foiled. That is not to say that the spiritual element of the film is diminished, but for the remaining characters it is a purely secular matter. Capa (Cillian Murphy) at one point comments that their computer simulations are inadequate, because the high heat and the high gravity are enough to bend time and space. The phenomenon is graphically illustrated in the penultimate sequence, and though obviously unscientific -- Boyle has quipped the warning, "Kids! Don't stick your arm in the sun!" -- it's a figurative expression of the same sense of wonder that most of us find in our interactions with the natural world. A flower is no less perfect as the culmination of millions upon millions of evolutionary generations than it is as the work of a divine craftsman. The sun is no less magnificent as the product of purely natural energies. Even the irreligious can appreciate sublimity.

In Event Horizon's defense, it's not necessarily unscientific to claim something as unimaginable: human brains have limits. We evolved that way, coded to survive as hunter-gatherers. We were not coded to imagine the ten or eleven dimensions postulated by theories of quantum mechanics; it's hard enough to accept that time is a fourth. (How do you visualize four dimensions on a three-dimensional plane?) Less esoteric but no less true, most of us couldn't easily distinguish between a hot oven at 200°C and one at 250°C. To a primitive brain it hardly matters: both are hot enough to burn. How then can we deal with the surface of the sun, closer to 5000°C, or the core, over 12000000°C? Death occurs too soon for nerve impulses to even reach the brain. "Unimaginable" is an appropriate label.

But imagination isn't the end-all, and even if it fails we're still capable of comprehending the difference between "two hundred" and "twelve million". One of the many glories of science is that "unimaginable" is not the same as "indescribable".