Resource and representation; or, miscellaneous things learned while reading W3C mailing list archives, part one

Roy Fielding:

At no time whatsoever is the resource transferred across the network when doing a GET. Only a REPRESENTATION of that resource is transferred, and the fragment refers to a target within the representation and not within the resource. That is why fragments are media-type specific.

Files

Practically speaking, 99% of experience with URIs is going to be with http URLs on the World Wide Web. It comes from using websites, developing websites, and administering servers. For me, at least, it was a little of this and a little of that, only learning just enough to be getting along with the task at hand. That task never involved writing an HTTP server, so there were definitely no specifications involved. (Who reads standards anyway?)

That limited experience encouraged a file-centric view of the Web. It’s natural enough, given that most Web server software automagically provides a 1:1 mapping between files in a directory and publicly-accessible URLs:

  • /~sam/public_html/index.html = http://example.com/index.html
  • C:\apache\htdocs\pages\giraffe.jpg = http://example.com/pages/giraffe.jpg
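As a rough sketch of that 1:1 mapping, here is roughly what a file-serving web server does with the path part of a URL. The document root and the function name are assumptions made up for illustration; real servers add configuration, index files and access checks on top of this.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical document root, chosen for illustration only.
DOCUMENT_ROOT = PurePosixPath("/home/sam/public_html")


def url_to_file_path(url, root=DOCUMENT_ROOT):
    """Map a URL to a path under the document root, one file per URL."""
    path = unquote(urlsplit(url).path)
    # Drop empty, "." and ".." segments so the result cannot escape the root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return root.joinpath(*segments)


print(url_to_file_path("http://example.com/pages/giraffe.jpg"))
```

Nothing in the URL itself says "file"; the mapping lives entirely in the server, which is why it is so easy to mistake for a property of the Web.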

Ergo, resources are files/documents, and it’s the resource itself — at least a bitwise copy of it — sent down the wire in response to requests. The server gets a request for index.html, and that’s what it sends back: you can tell it’s the same file; just “view source”.

Fake files

Moving from hand-coded HTML to dynamic content ought to have fixed this misconception, but it made things worse. Instead of this:

http://example.com/page/giraffe.html

I had this:

http://example.com/blog/01/01/giraffe

Instead of an HTML file, I had a text document about giraffes, stored in a database, dynamically converted into HTML and interpolated into a template written in yet another language, sent as bits and bytes to a remote client, which rendered it (styled by another language again) into something pretty and human-readable. Which of those was the “resource”?
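The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ENTRIES dictionary stands in for the database, and the entry text, the template and the render function are all hypothetical.

```python
import html
from string import Template

# Stand-in for the database: URL path -> stored plain-text entry (hypothetical data).
ENTRIES = {
    "/blog/01/01/giraffe": {"title": "Giraffe", "body": "Giraffes are tall & quiet."},
}

# Hypothetical page template; the stored text is interpolated into it.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def render(path):
    """Return (status code, HTML) for a path; no file on disk is involved."""
    entry = ENTRIES.get(path)
    if entry is None:
        return 404, "<html><body><h1>Not Found</h1></body></html>"
    # Escape the stored text so it is safe to interpolate into the template.
    return 200, PAGE.substitute(
        title=html.escape(entry["title"]),
        body=html.escape(entry["body"]),
    )


status, page = render("/blog/01/01/giraffe")
print(status, page)
```

The client receives HTML either way; it has no means of telling whether a file ever existed.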

I thought of it as faking files. It was conning browsers into thinking they’d been given real files when, in fact, it was magic fairy gold. I thought it was a clever trick. A quick Google search turns up plenty of pages with titles like “fake files/directories using mod_rewrite”, so I’m clearly not alone.

There are other reasons to take this view, too, like the insistence on calling an HTTP 404 status code a “File Not Found” error. (It’s just “Not Found”.)

Resources

It wasn’t “faking files”, because a URI doesn’t identify a file. Not only in practical terms, because the server software can generate its output however it pleases, but theoretically. It’s not a “Uniform File Identifier”: a URI identifies a resource.

RFC 3986: Uniform Resource Identifier (URI): Generic Syntax:

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. […]

This specification does not limit the scope of what might be a resource; rather, the term “resource” is used in a general sense for whatever might be identified by a URI.

That sounds frighteningly ambiguous, but the generality is liberating. Even if you don’t want to use a URI in ways that aren’t intuitive (what does it mean for one to identify a “physical resource”?), it’s obvious that an abstract resource is more flexible than a file. A resource can be anything. (A file is a resource, because “any information that can be named can be a resource”; it’s just a limited one.)

Conceptually, it’s a whole different ballgame.

Negotiation

To return to the quote that began this entry: performing an HTTP GET request does not transfer a resource. It transfers a representation of that resource. What makes this interesting is the explicit permission to have more than one representation of the same thing.

  • A multilingual site will have pages available in more than one language.
  • Some clients prefer different formats. An essay could be presented in plain text, HTML, Atom, PDF, streaming audio, interpretive dance video…

The only necessary thing is that the essential characteristics of the resource are present in every representation. Since a URI can identify just about anything, that means it’s up to you to decide what those characteristics are.

My favourite example is a URI that identifies a circle. It being a simple matter of geometry, there are dozens of possible representations which provide enough information to construct the same circle:

  1. The radius, plus coordinates of the centre.
  2. Three or more points on the circumference.
  3. Three points making up an equilateral triangle, with the qualification that the circle meets the midpoint of each side.

And so on and so on and so on. Each of these can be presented in many different formats:

  1. Plain text, XML, or some other vaguely human-readable format.
  2. A vector graphics format, such as SVG.
  3. Spoken aloud and recorded as an MP3.

They share the same essential characteristics — the data required to reconstruct the circle. That’s it. Not all clients can handle all formats, of course: my math is extremely rusty, so calculus is right out; and I’m not fluent in binary. In HTTP, the solution is that clients send an Accept header specifying which formats are acceptable. The server gives them what they want, or, if that’s not possible, replies with “406 Not Acceptable”.
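To make the circle concrete, here is a minimal Python sketch in which the three descriptions listed earlier each reconstruct the same circle. The function names and the (cx, cy, r) tuple are assumptions for illustration, not part of any specification.

```python
import math


def from_centre_radius(cx, cy, r):
    """Representation 1: the centre coordinates plus the radius."""
    return (cx, cy, r)


def from_three_points(p1, p2, p3):
    """Representation 2: the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0:
        raise ValueError("points are collinear")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cy, math.hypot(x1 - cx, y1 - cy))


def from_equilateral_triangle(a, b, c):
    """Representation 3: the circle touching the midpoint of each side."""
    sides = ((a, b), (b, c), (c, a))
    midpoints = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in sides]
    return from_three_points(*midpoints)


root3 = math.sqrt(3)
print(from_centre_radius(0, 0, 1))
print(from_three_points((1, 0), (0, 1), (-1, 0)))
print(from_equilateral_triangle((0, 2), (-root3, -1), (root3, -1)))
```

Up to floating-point rounding, all three calls describe the unit circle centred on the origin. None of the inputs is the circle; each one merely carries enough information to rebuild it.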

There are good reasons not to use content negotiation: it can be confusing to users, it’s more expensive to implement, it causes errors in some situations, and it adds complexity that’s probably unnecessary. More to the point, it’s likely that most clients will want the same format anyway, so why bother?

On the other hand, it opens up a lot of interesting possibilities, and in some contexts the costs are minimized. It can be useful if diversity of formats is already going to be supported, for example, such as when creating an interface for a web service. Instead of different URIs for data in XML or JSON, the preferred type can be specified in the Accept header.
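For the web service case, the server-side choice might look something like the sketch below. It is deliberately simplified: it honours q-values and wildcards but ignores the rule that more specific media ranges take precedence, and it treats an empty Accept header the same as a missing one. The function names are my own, not from any library.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((parts[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as "application/*" covers the given media type."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(accept_header, available):
    """Return the chosen media type, or None (meaning 406 Not Acceptable)."""
    if not accept_header.strip():
        # Assumption: no stated preference means anything is acceptable.
        return available[0] if available else None
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


formats = ["application/xml", "application/json"]
print(negotiate("application/xml;q=0.5, application/json", formats))
print(negotiate("text/html", formats))
```

The same URI serves both formats; a None result is the point at which a server would answer “406 Not Acceptable”.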

Methods

There’s an aptly-titled section in Fielding’s dissertation called “Manipulating Shadows”:

Defining resource such that a URI identifies a concept rather than a document leaves us with another question: how does a user access, manipulate, or transfer a concept such that they can get something useful when a hypertext link is selected? REST answers that question by defining the things that are manipulated to be representations of the identified resource, rather than the resource itself. An origin server maintains a mapping from resource identifiers to the set of representations corresponding to each resource. A resource is therefore manipulated by transferring representations through the generic interface defined by the resource identifier.

How can a resource be “manipulated by transferring representations”?

The circle example from the previous section is a good illustration. Changing the radius of a circle results in a different circle, either larger or smaller, but that doesn’t mean that a radius and a centre point are the circle. They’re just a representation of it.

On the Web, the representation submitted to the server is most likely to be the body of an HTTP POST request (from an HTML form, for example), which consists of nothing more than specially encoded key-value pairs. The server can take that data and construct weblog comments, airline bookings, whatever.
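A small sketch of that last step, using Python's standard urllib.parse module. The field names (author, text) and the shape of the comment record are hypothetical; a real weblog would validate and store far more.

```python
from urllib.parse import parse_qs, urlencode


def comment_from_form(body):
    """Build a comment record from an application/x-www-form-urlencoded body."""
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; take the first value of each field.
    return {
        "author": fields.get("author", ["Anonymous"])[0],
        "text": fields.get("text", [""])[0],
    }


# What a browser would send for a two-field form (hypothetical field names).
body = urlencode({"author": "Sam", "text": "Giraffes & circles"})
print(body)
print(comment_from_form(body))
```

The encoded string is the representation that crosses the network; the comment the server builds from it is the resource being created.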

Atom

This explains everything I never understood about the Atom Publishing Protocol. How is it possible to POST an Atom entry to a collection and create a resource that will display in HTML? Especially one that will display alongside significant non-entry data: CSS for presentation, sidebars, navigation etc. Why is the link URL different from the edit-URI and different from the ID URI? Doesn’t the Atom representation identified by the edit-URI better represent the entry than the one located by the link?

The technical side, the mechanics of it, that was always easy. It was the theory that didn’t make sense. It was like “faking files”: a kind of dodgy hack that worked, certainly, but felt dirty, like taking advantage of a weakness in the system. Like a kludge.

If anything, the truth is the exact opposite. Having a separate edit-URI feels like a nod to practicality; there’s nothing stopping a pure implementation using content-negotiation to serve both HTML and Atom from the same address. If there’s a hack there, it’s in mapping files to URLs without thought for what resources those URIs will identify. It’s quick and convenient, and sure, it works. But it’s still a kludge.

(A rose by any other name?)

This entry was posted on Wednesday, November 8th, 2006, in the categories “web”, “REST” and “HTTP”.
