JavaScript Bayes

August 15, 2007

I wanted to have some Bayesian fun in a user-script, so I did a quick JavaScript port of the fabulous Divmod Reverend Python module.

It’s somewhat limited, but dead easy to use:

var guesser = new Bayes();
guesser.train("hannibal", "I love to kill people and eat them.");
guesser.train("austen", "Come, let us have tea and scones in Mr. Bingley's gazebo.");
guesser.guess("Jane, these scones are simply delightful!");
// [["austen", 0.9999]]

guesser.train("hannibal", "I love to kill people and eat them with tea and scones.");
guesser.guess("Give me those scones or I'll kill and eat you.");
// [["hannibal", 0.9481433307479079], ["austen", 0.6203339133520634]]

It’s missing some stuff, but does enough to be getting along with.
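
For anyone curious what sits behind train() and guess(), here is a minimal sketch of a word-count classifier with the same two-method shape. It is not the port described above, nor Reverend's algorithm: it uses plain naive Bayes with add-one smoothing, and the name TinyBayes, its tokenizer and its normalised scores are all assumptions made for illustration, so its numbers will differ from those shown earlier.

```javascript
// Hypothetical stand-in, not the Bayes class from the post.
function TinyBayes() {
    this.pools = Object.create(null); // pool name -> { counts, total }
    this.vocab = Object.create(null); // every word seen during training
}

// Lowercase, then split on anything that is not a letter or apostrophe.
TinyBayes.prototype.tokenize = function (text) {
    return text.toLowerCase().split(/[^a-z']+/).filter(function (word) {
        return word.length > 0;
    });
};

TinyBayes.prototype.train = function (pool, text) {
    var p = this.pools[pool] ||
        (this.pools[pool] = { counts: Object.create(null), total: 0 });
    var vocab = this.vocab;
    this.tokenize(text).forEach(function (word) {
        p.counts[word] = (p.counts[word] || 0) + 1;
        p.total += 1;
        vocab[word] = true;
    });
};

// Returns [[pool, score], ...] sorted best first; the scores sum to 1.
TinyBayes.prototype.guess = function (text) {
    var words = this.tokenize(text);
    var pools = this.pools;
    var names = Object.keys(pools);
    var vocabSize = Object.keys(this.vocab).length;

    // Log-likelihood of the text under each pool, with add-one smoothing.
    var logs = names.map(function (name) {
        var p = pools[name];
        return words.reduce(function (sum, word) {
            var count = p.counts[word] || 0;
            return sum + Math.log((count + 1) / (p.total + vocabSize));
        }, 0);
    });

    // Subtract the maximum before exponentiating to avoid underflow.
    var max = Math.max.apply(null, logs);
    var exps = logs.map(function (l) { return Math.exp(l - max); });
    var norm = exps.reduce(function (a, b) { return a + b; }, 0);

    return names.map(function (name, i) {
        return [name, exps[i] / norm];
    }).sort(function (a, b) { return b[1] - a[1]; });
};
```

With only a sentence or two of training text per pool the scores stay close together; the ranking is the interesting part.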

As a test application, I went on and wrote up one of the examples given in the Reverend docs: a script to tell whether you write more like Charles Dickens or Jane Austen. It’s both pointless and inaccurate, but I suppose it qualifies. :)

Userscript: IMDb Decoder Ring

June 23, 2007

It seems to be a Greasemonkey kind of month. IMDb ratings are fuzzy in the middle, so Tom Moertel made a decoder ring listing what the rating means in terms of the movie’s per-genre percentile ranking. Leprechaun 5’s 3.2 rating seems bad enough even with an even distribution; in reality, it has a worse rating than 90% of movies in the database.

Anyhow, this userscript puts the data conveniently inline.

Before:

Shrek 3 at IMDb without the script

After:

Shrek 3 at IMDb with the script enabled

Download

Userscript: Reddit unread comments helper

June 19, 2007

Or: (ab)using Greasemonkey and Google Gears to add features that would be handled better server-side.

The script tracks comments you’ve seen at Reddit, then exposes the data in several small ways that each make your life a little easier. Features:

  • On the main Reddit list pages, replace the “n comments” links with “x unread comments (n total)”.

    Before:

    before the userscript is applied

    After:

    after the userscript is applied

  • On clicking through to a page where you’ve already read some of the comments, jump to the first unread comment.

  • Highlight unread comments with a bright but non-distracting left margin.

Download or install it.
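
The bookkeeping behind those features is small. The sketch below is not the userscript's code: it keeps everything in memory where the script uses a local database, and SeenStore, linkLabel and the id strings are hypothetical names chosen for the example.

```javascript
// Hypothetical in-memory store standing in for the script's local database.
function SeenStore() {
    this.seen = Object.create(null); // storyId -> { commentId: true }
}

// Record that these comments have been displayed for a story.
SeenStore.prototype.markSeen = function (storyId, commentIds) {
    var bucket = this.seen[storyId] ||
        (this.seen[storyId] = Object.create(null));
    commentIds.forEach(function (id) { bucket[id] = true; });
};

// How many comments of this story have been recorded so far.
SeenStore.prototype.seenCount = function (storyId) {
    return Object.keys(this.seen[storyId] || {}).length;
};

// Ids on the current page that were never recorded, in page order.
// The first entry is the comment to jump to.
SeenStore.prototype.unread = function (storyId, commentIds) {
    var bucket = this.seen[storyId] || Object.create(null);
    return commentIds.filter(function (id) { return !bucket[id]; });
};

// Text for the rewritten "n comments" link on list pages.
function linkLabel(store, storyId, total) {
    var unreadCount = Math.max(0, total - store.seenCount(storyId));
    return unreadCount + " unread comments (" + total + " total)";
}
```

On a list page only the total is known, so the unread figure is the total minus the number of recorded ids; on the comments page the full id list is available and unread() can pick out individual comments to highlight.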

Notes

My ulterior motive was testing the Gears DB with Greasemonkey. More than once I’ve wished it had a binding to SQLite, and with Gears it does: it just got a thousand times more useful. It’d be nicer yet if I could save to an arbitrary cross-domain database, but this is still a tremendous step up.

This script uses a bit of a hack and writes itself directly into the window, rather than just manipulating the DOM from the usual plexiglass sandbox. Strictly speaking it’s not necessary, and only possible at all because I have no use for the GM_* API functions, but a userscript with Gears does require at least some meddling of this kind.

Gears prompts the user to allow it to run on a specific domain, but the dialog doesn’t appear if it’s initialized from within Greasemonkey; it has to be done from the unsafe window. Once it’s set up — once the local database has been created and what have you — Greasemonkey is fine, but that first step is critical.

Still, that’s basically the only hurdle, and it’s trivial to surmount. Gears is a dream: the API seems a little sparsely featured, but it’s so easy to build a platform around that the lack of convenience methods doesn’t matter. I wrote a very simple DB wrapper of my own, and others are already building full ORMs. There’s no sight of JavaScript on Jacks just yet, but it can’t be far off.

E4X

June 17, 2007

E4X, short for “ECMAScript for XML”, is an extension to ECMAScript (i.e. JavaScript, JScript, ActionScript…) with new syntax and built-in objects for more convenient handling of XML fragments. It seems to be used most frequently with ActionScript 3 (Flash), but is also available in recent Mozilla/Firefox releases.

I whipped up this guide after a quick read-through of the specification and a bit of playing around. Corrections are more than welcome.

In order, it briefly outlines:

  • The syntax for declaring literal XML values
  • XML and XMLList objects
  • Variable interpolation in XML literals
  • The new syntax for traversal of XML objects
  • Namespace considerations
  • The methods of XML objects

First-class XML

E4X XML objects can be created by passing a string to the XML constructor function, but that’s hardly exciting. Much more interesting is the new syntax for XML literals, similar to that in Scala. It’s exactly what you’d expect:

var x = <elm id="1">
    <a>content</a>
</elm>;

There’s no more need to bother with painful string concatenation or backslashed line continuations.

Even better, XML objects are first-class citizens. They have properties and methods; they can be deleted, concatenated and iterated over.

var y = x + <elm id="2" />;
var name = <xml />.name();

XML and XMLList

As well as XML, E4X defines the XMLList, an ordered collection of XML objects similar to the W3C DOM NodeList.

The literal syntax is rather less intuitive:

var xl = <>
    <a />
    <b />
    <c />
</>;

Much of E4X’s expressive power comes from the blurring of the line between XMLList and XML objects. Both report the same type: typeof returns "xml" for either.

The advantage is that you rarely need to worry about which you have. A single-item XMLList is treated identically to an XML object, and even longer lists share many of the same methods. The text() method of an XML object returns its text content. On an XMLList it returns the concatenated text content of all list members.

If you do need to tell the difference, just check the .length(): an XML object’s length is always 1.

Literal Interpolation

When declaring a literal, expressions inside braces (curly brackets) are automatically evaluated.

var name = "bob^%*";
var tag = "person";
var p = <{tag} id="3">{name.replace(/[^a-z]/ig, "")}</{tag}>;
// <person id="3">bob</person>

Braced values are not, however, evaluated inside quoted attribute values (or CDATA sections). To interpolate an attribute value, leave the quotes off:

var att = "id";
var val  = 3;
var a = <person {att}="{val}">bob</person>;
// <person id="{val}">bob</person>

var b = <person {att}={val}>bob</person>;
// <person id="3">bob</person>

Interpolated attribute values are automatically quoted; any XML entities are automatically escaped.

val = "\"<>";
b = <person {att}={val}>bob</person>;
// <person id="&quot;&lt;>">bob</person>
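
For comparison, this is roughly the escaping you would otherwise write by hand when building attributes through string concatenation. It is a plain-JavaScript sketch, not part of E4X: the function names are made up, and the whitespace replacements reflect my reading of the specification rather than anything shown above.

```javascript
// Plain-JavaScript approximation of E4X attribute escaping.
// escapeAttributeValue and attribute are hypothetical helper names.
function escapeAttributeValue(value) {
    return String(value)
        .replace(/&/g, "&amp;")   // ampersand first, so later entities survive
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")    // ">" is left alone inside attribute values
        .replace(/\n/g, "&#xA;")  // assumption: whitespace handling per the spec
        .replace(/\r/g, "&#xD;")
        .replace(/\t/g, "&#x9;");
}

// Build name="value" with the value quoted and escaped.
function attribute(name, value) {
    return name + '="' + escapeAttributeValue(value) + '"';
}
```

Calling attribute("id", "\"<>") gives the same id="&quot;&lt;>" seen in the interpolated literal.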

Literal braces should be escaped as &#x7B; and &#x7D; for { and } respectively.

Accessing XML Properties

XML objects can be filtered and traversed using an object syntax similar to ElementTree and BeautifulSoup, with a bit of XPath thrown in.

A node’s child elements can be accessed as properties:

var x = <people class="example">
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name  = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

x.[name] is the same as x.child([name]).

Attributes

As with XPath, “@” is used to access attributes.

var id = x.person[0].@id;

x.@[name] is identical to x.attribute([name]).

Descendants

The .. operator accesses all descendants, not just the immediate children.

var names = x..name;
var ids = x..@id;

x..[name] is equivalent to x.descendants([name]).

The Wildcard

The “*” wildcard matches all names.

var persons  = x.*;
var all      = x..*;
var attrs    = x..@*;

The wildcard is magic in more than one context, but in this one it’s equivalent to QName(null, "*").

var all = x.descendants(QName(null, "*"));
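
If the difference between x.name, x..name and the wildcard is hard to picture, the following plain-JavaScript sketch mimics it on an ordinary object tree. It is only an analogy: el, child and descendants here are hypothetical helpers, and the model contains elements only, whereas the real x..* also returns text nodes.

```javascript
// Hypothetical plain-object stand-in for an element: { name, children }.
function el(name, children) {
    return { name: name, children: children || [] };
}

// "*" matches every name, anything else must match exactly.
function nameMatches(pattern, node) {
    return pattern === "*" || pattern === node.name;
}

// Like x.name: immediate children only.
function child(node, pattern) {
    return node.children.filter(function (c) {
        return nameMatches(pattern, c);
    });
}

// Like x..name: every element below node, in document order.
function descendants(node, pattern) {
    var found = [];
    node.children.forEach(function (c) {
        if (nameMatches(pattern, c)) {
            found.push(c);
        }
        found = found.concat(descendants(c, pattern));
    });
    return found;
}

var people = el("people", [
    el("person", [el("name")]),
    el("person", [el("name")])
]);

child(people, "name").length;        // 0: name is not a direct child
descendants(people, "name").length;  // 2
descendants(people, "*").length;     // 4: two person and two name elements
```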

Filtering Predicates

var me     = x.person.(name == "sam");
var either = x.person.(@id == 1 || @id == 2);

Predicates can be nested and quite complex:

var me = x..*.(name == "sam" && 
    name.parent().(@id == 1).name() == "person");

They’re not quite as useful as they could be, however. Unlike XPath, E4X expressions cannot easily be used to search ancestor axes.

The previous example illustrates a potential problem. It only works because the list of matches is reduced to one by (name == "sam") before the parent() method is invoked.

This expression, on the other hand, will raise an exception:

x..*.(name.parent().@id == 1);

The filter does not examine the parent of every name in turn; it looks for the single parent of the entire list of names together. It returns undefined unless every member shares the same parent.
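
The rule is easier to see written out. This is a plain-JavaScript sketch of the behaviour as described, using hypothetical { name, parent } objects rather than real XML nodes:

```javascript
// Mimics parent() on a list: one shared parent, or undefined.
// Each node is a hypothetical { name, parent } record.
function commonParent(nodes) {
    if (nodes.length === 0) {
        return undefined;
    }
    var first = nodes[0].parent;
    for (var i = 1; i < nodes.length; i++) {
        if (nodes[i].parent !== first) {
            return undefined; // members disagree, so there is no answer
        }
    }
    return first;
}
```

With a single-member list the answer is simply that member's parent, which is why narrowing the list first makes the earlier example work; reading a property of the undefined result is what throws.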

Deletion

The delete keyword works on arbitrary E4X expressions:

delete x.person.(@id == 1); // that's me gone 
delete x..person;           // ... and everyone else

Assignment

You can also use the normal assignment operator:

x..name[0] = "batman";
x.@pointless = "new attribute!";
x.person += <person id="3"><name>alfred</name></person>;

In some circumstances you can also assign to an expression that would return a list:

x.* = <goodbye_previous_content />;

But those nodes were all together, so replacing them at once is a natural operation. This, on the other hand, is illegal:

x.person.@newattributes = "for all";

Iteration

There are several ways to iterate over XMLList and XML objects, though for XML the exercise is meaningless:

x[0] == x;
// true

Nevertheless. First, iteration over list indices:

var i, elm;
for (i in x..*) {
    elm = x..*[i];
}

The same can be accomplished with an ordinary for loop and the length() method.

var list = x..*;
for (i = 0; i < list.length(); ++i) {
    elm = list[i];
}

Most useful of all, though, is the new for each .. in syntax, allowing direct manipulation of matching nodes:

var elm;
for each (elm in x.person) {
    elm.@id += 1;
}

Namespaces

E4X has robust namespace support, but (as anyone with XML experience must expect) they complicate an otherwise simple model.

var x = <xml>
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == "value one";
// true

With namespaces, you have to use a qualified name.

var x = <xml xmlns="http://example.com/">
        <v1>value one</v1>
        <v2>value two</v2>
    </xml>;

x.v1 == undefined;
// true

var example = Namespace("http://example.com/");
x.example::v1 == "value one";
// true

Note the use of the :: scoping operator. You can also suggest a namespace prefix and/or construct the QName directly:

var example = Namespace("example", "http://example.com/");
var name = QName(example, "v1");
var same = QName("http://example.com/", "v1");

If more liberal matching is required, the * wildcard signifies any namespace.

x.*::v1 == "value one";
// true

The wildcard “any namespace” is different from the unnamed namespace; it is what you get by passing null as the namespace argument to the QName constructor. The following are equivalent:

x.*::v1
x.child(QName(null, "v1"));
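
The distinction between “any namespace” and “no namespace” can be spelled out as a matching rule. The sketch below is plain JavaScript on hypothetical { uri, localName } records, not the E4X QName type, and follows the description above rather than the specification text:

```javascript
// Hypothetical name record.
// uri === null means "any namespace"; uri === "" means "no namespace".
function qname(uri, localName) {
    return { uri: uri, localName: localName };
}

// Does a pattern name (which may contain wildcards) match an actual name?
function matchesName(pattern, actual) {
    var uriOk = pattern.uri === null || pattern.uri === actual.uri;
    var localOk = pattern.localName === "*" ||
        pattern.localName === actual.localName;
    return uriOk && localOk;
}

var v1 = qname("http://example.com/", "v1");

matchesName(qname(null, "v1"), v1);                 // true: any namespace
matchesName(qname("", "v1"), v1);                   // false: unnamed only
matchesName(qname("http://example.com/", "*"), v1); // true: any local name
```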

The default namespace

Using perhaps the most self-explanatory syntax ever devised, you can set the default XML namespace in the current scope.

var example = Namespace("http://example.com/");
default xml namespace = example;
// or
default xml namespace = "http://example.com/";

var x = <xml />;
x.toXMLString();
// <xml xmlns="http://example.com/"/>

To reiterate: in the current scope.

toString() vs. toXMLString()

There is an important difference between the toString and toXMLString methods.

var x = <people>
    <person id="1"><name>sam</name></person>
    <person id="2"><name>elizabeth</name></person>
</people>;

var names = x.person.name;
var name = x.person[0].name;

names.toXMLString();
// <name>sam</name> <name>elizabeth</name>
name.toXMLString();
// <name>sam</name>

names.toString();
// <name>sam</name> <name>elizabeth</name>
name.toString();
// sam

toString returns different values depending on whether or not an object is considered “complex”. If there are no child elements (other types, such as XML comments, don’t count), it returns the element’s text content only. This is very useful in most cases but a painful gotcha in others.
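
Stripped of detail, the rule looks like this. It is a plain-JavaScript sketch on a hypothetical { name, children } model in which strings stand for text nodes; attributes, escaping, comments and pretty-printing are all ignored, and toStringLike is a made-up name.

```javascript
// Children are either strings (text) or { name, children } objects.
function isElement(child) {
    return typeof child === "object";
}

// "Simple" content means no child elements.
function hasSimpleContent(node) {
    return !node.children.some(isElement);
}

// Always serialises the element itself, tags included.
function toXMLString(node) {
    var inner = node.children.map(function (c) {
        return isElement(c) ? toXMLString(c) : c;
    }).join("");
    return "<" + node.name + ">" + inner + "</" + node.name + ">";
}

// Text only for simple content, full markup otherwise.
function toStringLike(node) {
    return hasSimpleContent(node) ? node.children.join("") : toXMLString(node);
}

var nameNode = { name: "name", children: ["sam"] };
var personNode = { name: "person", children: [nameNode] };

toStringLike(nameNode);   // "sam"
toXMLString(nameNode);    // "<name>sam</name>"
toStringLike(personNode); // "<person><name>sam</name></person>"
```

The gotcha is that the same call changes shape as soon as an element gains a child element, so code that relied on getting bare text starts receiving markup.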

Extending E4X

ECMAScript lets you do wonderful things by extending Object.prototype, String.prototype etc. with new methods.

It’s much harder with E4X. The prototypes of XML and XMLList are read-only, so new methods can’t be added directly. Most of their existing methods throw exceptions if they are applied to any other object. Procedural code will have to do.

Future versions will have built-in support for custom types based on XML schemas.

Global function reference

isXMLName( value ) : bool

Is the value usable as an XML name?

XML Constructor Reference

The XML constructor has several properties managing global settings for XML processing and serialization.

XML.ignoreComments

Ignore XML comments. (Default: true.)

XML.ignoreProcessingInstructions

Ignore XML processing instructions. (Default: true.)

XML.ignoreWhitespace

Ignore whitespace. (Default: true.)

XML.prettyPrinting

Pretty-print XML output with toXMLString() etc. (Default: true.)

XML.prettyIndent

Pretty indent level for child nodes. (Default: 2.)


There are also three methods to more easily apply and restore settings for use, say, within a function.

XML.settings()

Get an Object containing the above settings.

XML.defaultSettings()

Get an object containing the default settings.

XML.setSettings([settings])

Set XML settings from, e.g., an object returned by XML.settings().
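
A typical use is to change a setting for the duration of one call and then put things back. The helper below is a sketch of that pattern, not something E4X provides: withXMLSettings is a made-up name, and the constructor is passed in as an argument so the helper can also be exercised against a stub that offers the same settings() and setSettings() methods.

```javascript
// Run fn with some settings overridden, then restore the previous ones.
// xmlCtor is expected to behave like the E4X XML constructor: settings()
// returns a snapshot object and setSettings(snapshot) applies it again.
function withXMLSettings(xmlCtor, overrides, fn) {
    var saved = xmlCtor.settings();
    try {
        for (var key in overrides) {
            if (Object.prototype.hasOwnProperty.call(overrides, key)) {
                xmlCtor[key] = overrides[key];
            }
        }
        return fn();
    } finally {
        // Runs even if fn throws, so the overrides never leak out.
        xmlCtor.setSettings(saved);
    }
}

// Intended use in an E4X-capable engine (not runnable elsewhere):
//   withXMLSettings(XML, { prettyPrinting: false }, function () {
//       return x.toXMLString();
//   });
```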

XML Object Reference

addNamespace([namespace])

Add a namespace declaration to the object.

appendChild(child)

Append a node to the object’s list of children.

attribute(attributeName)

Returns an XMLList of zero or one matching attributes.

Same as element.@attributeName.

attributes()

Returns an XMLList of attributes.

Same as element.@*.

child(propertyName or index)

Same as element.propertyName or element[index].

childIndex()

Returns the node’s position in the parent’s list of children, or -1 if there is no parent or its children are unordered.

children()

Returns an XMLList of children.

Same as element.*.

comments()

Returns an XMLList of child nodes that are comments.

Same as element.(*.nodeKind() == 'comment').

contains(value)

Same as element == value.

copy()

Return a deep copy of the object, detached from its parent.

descendants(name)

Return all descendants with the given name, or, if name is null or undefined, all descendants.

Same as element..name.

elements([name])

Returns all child elements with the given name, or, if name is null or undefined, all child elements.

Same as element.(*.nodeKind() == 'element').

hasOwnProperty(prop)

The same as on any other object.

hasComplexContent()

Returns true if the node has complex content (in effect, if it has child elements).

hasSimpleContent()

The opposite of hasComplexContent.

inScopeNamespaces()

Returns an Array of in-scope Namespace objects.

insertChildAfter(anchor, child)

insertChildBefore(anchor, child)

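Insert a child node before or after the specified anchor node. If the anchor is null, insertChildAfter inserts the child before all existing children (after none of them), and insertChildBefore inserts it after all of them (before none).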

If the anchor is not in this XML object, do nothing.

length()

Return the length of the object. For XML objects always return 1.

localName()

Return the local part of the qualified name. (A node’s name not including its namespace.)

name()

Return the qualified name. (Including namespace.)

var x = <xml xmlns="http://example.com/">abc</xml>;
x.name() == "http://example.com/::xml";
x.localName() == "xml";
x.namespace() == "http://example.com/";

namespace([prefix])

Return the in-scope namespace specified by prefix, or:

  • If no namespace matches, return undefined.
  • If prefix is not provided, return the default namespace.

namespaceDeclarations()

Return an Array of Namespace objects representing namespaces declared (as in assigned a prefix) on this XML object.

nodeKind()

Returns the type of XML node, one of attribute, element, comment, processing-instruction, text.

normalize()

Merge adjacent text nodes and remove empty text nodes on this node and all descendants.
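
As a rough picture of what that means, here is the same operation on a plain-JavaScript model where strings stand for text nodes and { name, children } objects for elements. The model and the function are illustrative assumptions, not the E4X implementation.

```javascript
// Merge adjacent text, drop empty text, and recurse into child elements.
// Modifies the hypothetical { name, children } node in place and returns it.
function normalize(node) {
    var merged = [];
    node.children.forEach(function (child) {
        var last = merged.length - 1;
        if (typeof child !== "string") {
            merged.push(normalize(child));   // element: normalise its subtree
        } else if (child === "") {
            // empty text node: leave it out
        } else if (last >= 0 && typeof merged[last] === "string") {
            merged[last] += child;           // join with the preceding text
        } else {
            merged.push(child);
        }
    });
    node.children = merged;
    return node;
}
```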

parent()

Return the parent node. On an XMLList, this method returns undefined unless all members share the same parent.

processingInstructions([ name ])

Returns all child processing instructions with the given name, or, if name is null or undefined, all child processing instructions.

Same as element.(*.nodeKind() == 'processing-instruction').

prependChild(value)

Insert value at the beginning of the object’s child nodes.

propertyIsEnumerable(prop)

Will the specified property be enumerated in a for .. in loop? Same as for other objects.

removeNamespace(namespace)

If possible, remove the given namespace from the object and all descendants. removeNamespace will not remove a namespace if it is referenced in that object or any of its children.

replace(propertyName, value)

Replace value specified by propertyName, where propertyName is a name, numeric index or * wildcard, with value.

setChildren(value)

Replace the object’s children with value.

setLocalName(name)

Change the object’s local name using a string or the localName property of a QName object.

setName(name)

Set the object’s name and alter the in-scope namespaces to fit.

setNamespace(ns)

Replace the object’s default namespace with ns.

text()

Returns an XMLList of all child text nodes.

Same as element.(*.nodeKind() == 'text').

toString()

Returns a string representation. Elements with simple content (i.e., no child elements) are returned as text; complex elements are returned as XML.

toXMLString()

An XML serialization of the object.

valueOf()

Return this object.

XMLList Reference

Most methods are the same. Descendant methods such as children() and text() are simply applied to all members of the list and the results combined. Others, like parent(), don’t work when it isn’t logical that they do so — consult your common sense.

Optional Features

Implementations are allowed to include these optional features, or not. Currently Mozilla seems to be on the “or not” side of the fence, but they’re easy enough to implement in userspace if you need them.

domNode()

Return a W3C DOM node representation of the object.

domNodeList()

Return a W3C DOM NodeList representation.

xpath(exp)

Apply the XPath expression exp and either return an XMLList of results or throw a TypeError.

Pitchfork: Interview: Glen Hansard and Markéta Irglová

It's insane. We honestly have no idea how this all happened. The original marketing plan for Once was to get one 35mm print made, which was going to cost us like four grand. A lot of money. We were going to drive around Ireland in a car, and [writer/director] John [Carney] was going to introduce the film. We figured there were enough Frames fans in Ireland to fill the cinemas. I was going to play a few songs with Mar at the end, and we were going to sell the DVD on the way out. That was huge.

coolest FizzBuzz solution yet

Hiding the hard work in a magic number:

(1..100).map {|i| srand(1781773465) if (i%15)==1; [i, "Fizz", "Buzz", "FizzBuzz"][rand(4)]}

Interior Decorating: The Hitchcock Bathroom

Walls dripping blood, shower curtain complete with silhouette, even a plan for towels embroidered "Bates Motel".

The Last Empire: China’s Pollution Problem Goes Global

In June 2006, an official at China's State Council said environmental damage (everything from crop loss to health care costs) was costing 10 percent of its gross domestic product--in other words, all of the economy's celebrated growth.

Wuxia Masks: On Come Drink With Me and the Beijing Opera

In a 1984 interview with Charles Tesson in Les Cahiers du Cinéma, Hu made a pretty surprising statement when discussing Come Drink With Me: "I didn't want to use real martial arts what we call real kung-fu. I had seen it in tournaments, I didn't find it very beautiful and I didn't understand a thing about it; as a matter of fact, I still don't." The question practically asks itself: how could a man with no interest in martial arts revolutionize martial arts cinema? The fact of the matter is, Hu never saw the martial arts in his films solely as "action"; for him, to have "action" occur on the screen was not enough to make a film an action movie. The kung-fu in Come Drink With Me (and in his later Wuxia films like Dragon Gate Inn and A Touch Of Zen) was never conceived as actual confrontation, but as dance, performance. In fact, the action in the film(s) is choreographed to the performing style of Beijing Opera and the rhythm and beat of its orchestral score (a score mainly performed by traditional instruments from Opera, the wailing flute and the Chinese tempo-drums).

$7,500,000 DreamHost billing error

Almost had a heart attack:

This is just a notice that your DreamHost Account #XXXX ("XXXX's Account") has a balance of $1335.57 (including any charges not due until 2009-01-13), with $1335.57 due (since 2008-12-13).

The Fabrications of “Pre-Code Cinema”

Most people know two things about the Hays Code. One is that the bedrooms of all married couples could contain only twin beds, which had to be at least 27 inches apart. The other is that although the Code was written in 1930, it was not enforced until 1934, and that as a result, the "pre-Code cinema" of the early 1930s violated its rules with impunity in a series of "wildly unconventional films" that were "more unbridled, salacious, subversive, and just plain bizarre" than in any other period of Hollywood's history.

Neither of these things is true.