Falsehoods Programmers Believe About Names


From "People have exactly one canonical full name" to "People have names": names are hard.

Broken Promises: Responding to the Surprising Failure of Anonymization


Supposedly "anonymous" datasets have a history of revealing far more personal information (including identifying details and details which could reasonably be considered private, such as sexuality) than intended.

Digg's "your friend has Dugg this" dataset


Brave new non-relational world:

For this feature, the fully denormalized Cassandra dataset weighs in at 3 terabytes and 76 billion columns.



I was going to express a fond wish that this was a good first step towards the kind of free access that could give us an Australian TheyWorkForYou -- then discovered that one has existed since June 2008.


Downloadable Australian government datasets under open licenses!

Freebase: "an open, shared database of the world's knowledge"


Most of the first-generation mashups have been limited to the data freely available from governments, sites with APIs, etc.; I can't wait to see what we'll get now that they can use Freebase. I can't wait to use it myself, for that matter. Best of all, those magical letters: "CC-BY".

Piggy Bank - turn your browser into a mashup platform


Piggy Bank plugins are site-specific screen-scrapers that extract structured data as RDF, which can be analyzed, sorted and connected at your leisure, or shared through a "semantic bank".
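
To make the scrape-to-RDF idea concrete, here is a rough sketch in Python. It is not Piggy Bank's own plugin format or API. The sample markup, the `example.org` namespace, and the names `BookScraper`, `scrape` and `to_ntriples` are all made up for illustration; the only real vocabulary used is Dublin Core's `title` and `creator`. It pulls two fields out of a hypothetical page and writes them as N-Triples, one of the plain-text RDF serializations.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper is written against one specific site.
SAMPLE_HTML = """
<div class="book" id="b1"><span class="title">Dune</span><span class="author">Frank Herbert</span></div>
<div class="book" id="b2"><span class="title">Emma</span><span class="author">Jane Austen</span></div>
"""

BASE = "http://example.org/book/"            # made-up namespace for subjects
DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element set

# Map the page's CSS classes (assumed) onto RDF predicates.
FIELD_TO_PREDICATE = {"title": DC + "title", "author": DC + "creator"}


class BookScraper(HTMLParser):
    """Collects (subject, predicate, literal) triples from the sample markup."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None   # URI of the book currently being read
        self._field = None     # class of the span currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "div" and cls == "book":
            self._subject = BASE + attrs.get("id", "")
        elif tag == "span" and cls in FIELD_TO_PREDICATE:
            self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if self._subject and self._field and text:
            self.triples.append(
                (self._subject, FIELD_TO_PREDICATE[self._field], text)
            )

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div":
            self._subject = None


def scrape(html):
    """Run the scraper over an HTML string and return its triples."""
    parser = BookScraper()
    parser.feed(html)
    parser.close()
    return parser.triples


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)
```

Calling `print(to_ntriples(scrape(SAMPLE_HTML)))` writes one triple per line. Once the data is in that shape, sorting it, joining it with triples scraped from another site, or handing it to a shared store no longer depends on the page it came from.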

