Tuesday 27 May 2008

Miasma theory - wrong in the 1840s, wrong now

A couple of years ago I wrote:
My generation draws the Internet as a cloud that connects everyone; the younger generation experiences it as oxygen that supports their digital lives. The old generation sees this as a poisonous gas that has leaked out of their pipes, and they want to seal it up again.

Bill Thompson and Nick Carr are worried about governments interfering too:

In the real world national borders, commercial rivalries and political imperatives all come into play, turning the cloud into a miasma as heavy with menace as the fog over the Grimpen Mire that concealed the Hound of the Baskervilles in Arthur Conan Doyle's story.

Except that, if you have read or listened to Steven Johnson's excellent The Ghost Map, you'll know that the miasma theory of disease was a fatal error for urban England in the 1840s - the real problem was not the bad smells in the air, but the diseases in the water. The fault, dear governments, lies not in our clouds but in your pipes.

Monday 26 May 2008

An API is a bespoke suit, a standard is a t-shirt

Brad is calling for APIs, and even the NYT is proposing one, but there is a problem with APIs that goes beyond Dave's concern about availability.

When a site designs an API, what they usually do is take their internal data model and expose every nook and cranny of it in great detail. Obviously, this fits their view of the world, or they wouldn't have built it that way, so they want to share it with everyone. In one way this is like the form-fitting lycra that weekend cyclists are so enamoured of, but working with such APIs is like being a bespoke tailor - you have to measure each site carefully and cut your code exactly right to fit its shapes, and the effort is the same for every site you deal with (you get more skilled at it over time, but it is a craft nonetheless).

Conversely, when a site adopts a standard format for expressing their data, or for interacting with it, you can put your code together once, try it out against some conformance tests, and be confident it will work across a wide range of different sites - it's like designing a t-shirt for Threadless instead.

Putting together such standards - like HTML5, OpenID, OAuth or OpenSocial, or, for Dave's example of reviews, hReview - takes more thought and reflection than just replicating your own internal data structures, but the payoff is that implementations can interoperate without knowing of each other's existence, let alone having to have a business relationship.
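
To make the contrast concrete, here is a rough TypeScript sketch (the site names and field shapes are invented for illustration, and a real consumer would run a microformats parser over the page to get the hReview properties): with bespoke APIs you end up hand-cutting one adapter per site, while a shared format needs one parser, written and tested once.

```typescript
// A review as a consumer wants it, whichever site published it.
interface Review {
  itemName: string;
  rating: number;   // e.g. 1-5
  reviewer: string;
}

// Bespoke-suit approach: one hand-cut adapter per site, each mirroring that
// site's internal data model (these shapes are invented examples).
const siteAdapters: Record<string, (raw: any) => Review> = {
  "example-books.com": (raw) => ({
    itemName: raw.book.title,
    rating: raw.stars,
    reviewer: raw.user.display_name,
  }),
  "example-films.com": (raw) => ({
    itemName: raw.movie_title,
    rating: raw.score / 20,        // this site scores out of 100
    reviewer: raw.critic,
  }),
  // ...and another adapter for every new site you want to support.
};

// T-shirt approach: any site that marks its reviews up in a shared format
// like hReview can be handled by one parser. The properties object here
// stands in for the output of a microformats parser.
function parseStandardReview(properties: Record<string, string>): Review {
  return {
    itemName: properties["item"],
    rating: Number(properties["rating"]),
    reviewer: properties["reviewer"],
  };
}
```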

I had this experience at work recently, when the developers of the Korean social network idtail visited. I was expecting to talk to them about implementing OpenSocial on their site, but they said they had already implemented an OpenSocial container and apps using OpenID login, and had built their own developer site for Korean OpenSocial developers just from reading the specification docs.

I'm looking forward to more 'aha' moments like that this week at I/O.

Tuesday 6 May 2008

Portable Apps, not data?

Brad Templeton has a post on Data Hosting not Data Portability that fits in neatly with the VRM proposal I discussed yesterday. In fact, what he describes is a great fit for OpenSocial.

He says:

Your data host’s job is to perform actions on your data. Rather than giving copies of your data out to a thousand companies (the Facebook and Data Portability approach) you host the data and perform actions on it, programmed by those companies who are developing useful social applications.

Which is exactly what an OpenSocial container does - mediate access to personal and friend data for third-party applications.

This environment has complete access to the data, and can do anything with it that you want to authorize. The developers provide little applets which run on your data host and provide the functionality. Inside the virtual machine is a Capability-based security environment which precisely controls what the applets can see and do with it.

This maps exactly onto Caja, the capability-based JavaScript security model that is being used in OpenSocial.
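
As a rough illustration of the capability idea (the interfaces below are invented for the sketch, not Caja's or OpenSocial's actual API): the host never hands an applet the raw data store, only narrow capability objects for whatever the user has authorized, so anything not passed in is simply out of reach.

```typescript
// Hypothetical capabilities a data host might grant to an applet.
interface ReadProfileCap {
  displayName(): string;
  hometown(): string;
}

interface MessageFriendsCap {
  send(text: string): void;
}

// The applet is just a function: it can only use what it is handed.
type Applet = (profile: ReadProfileCap, messaging?: MessageFriendsCap) => void;

function runApplet(applet: Applet, userAuthorizedMessaging: boolean) {
  const profile: ReadProfileCap = {
    displayName: () => "Alice",          // would come from the host's store
    hometown: () => "San Francisco",
  };
  const messaging: MessageFriendsCap = {
    send: (text) => console.log("would send to friends:", text),
  };
  // The host decides which capabilities to pass in; there is no ambient
  // authority for the applet to reach the rest of your data.
  applet(profile, userAuthorizedMessaging ? messaging : undefined);
}
```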

Your database would store your own personal data, and the data your connections have decided to reveal to you. In addition, you would subscribe to a feed of changes from all friends on their data. This allows applications that just run on your immediate social network to run entirely in the data hosting server.

Again, a good match for OpenSocial's Activity Streams (and don't forget persistent app data on the server).
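
To sketch how that could look (a hypothetical data-host interface, loosely in the spirit of OpenSocial's activities and persistent app data - none of these names are the real API), an app can run entirely against the host without copying the social graph anywhere else:

```typescript
// Hypothetical shapes, not the actual OpenSocial API.
interface Activity {
  actorId: string;
  title: string;
  postedTime: number;
}

interface DataHost {
  // Changes your friends have chosen to share, delivered as a feed.
  friendActivities(since: number): Promise<Activity[]>;
  // Per-application key/value storage kept on the host.
  getAppData(key: string): Promise<string | undefined>;
  setAppData(key: string, value: string): Promise<void>;
}

// Renders what friends have been doing and remembers where it left off,
// all against the data host's own store.
async function showRecentFriendActivity(host: DataHost) {
  const since = Number((await host.getAppData("lastSeen")) ?? "0");
  const activities = await host.friendActivities(since);
  for (const a of activities) {
    console.log(`${a.actorId}: ${a.title}`);
  }
  await host.setAppData("lastSeen", String(Date.now()));
}
```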

Currently, everybody is copying your data, just as a matter of course. That’s the default. They would have to work very hard not to keep a copy. In the data hosting model, they would have to work extra hard, and maliciously, and in violation of contract, to make a copy of your data. Changing it from implicit to overt act can make all the difference.

The situation is worse than that: asking people for their logins to other sites is widespread and dangerous. I'd hope Brad would support OAuth as a step along the way to his more secure model, especially combined with the REST APIs that are part of OpenSocial 0.8.

If you're interested in these aspects of OpenSocial, do join in the linked mailing lists, and come along to the OpenSocial Summit on May 14th (just down the road from IIW).

Monday 5 May 2008

Mixing degrees of publicness in HTTP

At the Data Sharing Workshop the other day, we had a discussion about how to combine OAuth and Feeds, which I was reminded of by Tim Bray's discussion of Adriana and Alec's VRM proposal today.
The session was tersely summarized here, but let me recap the problem.

When you are browsing the web, you often encounter pages that show different things depending on who you are, such as blogs, wikis, webmail or even banking sites. They do this by getting you to log in, and then using a client-side cookie to save you the bother of doing that every time. When you want to give a site access to another one's data (for example, when letting Flickr check your Google Contacts for friends), you need to give it a URL to look things up at.

The easy case is public data - then the site can just fetch it, or use a service that caches public data from several places, like the Social Graph API. This is like a normal webpage, which is the same for everyone, returning an HTTP 200 response with the data.

The other common case is where the data is private. OAuth is a great way for you to delegate access to your data at a web service to someone else, which is done by returning an HTTP 401 response with a WWW-Authenticate: OAuth header showing that authentication is needed. If the fetching site then sends a valid Authorization header, it can have access to the data.
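
In rough TypeScript terms, the fetching site's half of that exchange looks something like this (the signing step is stubbed out - a real OAuth 1.0 Authorization header carries a consumer key, token, nonce, timestamp and signature):

```typescript
// Stub standing in for real OAuth 1.0 request signing.
function buildOAuthHeader(url: string): string {
  return 'OAuth realm="' + url + '", oauth_consumer_key="stub", oauth_signature="stub"';
}

async function fetchPrivateData(url: string): Promise<string> {
  const first = await fetch(url);
  const challenge = first.headers.get("WWW-Authenticate") ?? "";
  if (first.status === 401 && challenge.startsWith("OAuth")) {
    // The 401 plus the OAuth challenge says delegated access is on offer,
    // so retry with a (properly signed) Authorization header.
    const retry = await fetch(url, {
      headers: { Authorization: buildOAuthHeader(url) },
    });
    return retry.text();
  }
  return first.text();
}
```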

The tricky case is where there is useful data that can be returned to anyone with a 200, but additional information could be supplied to a caller with authentication (think of this like the social network case, where friends get to see your home phone number and address, but strangers just get your hometown). In this case, returning a 401 would be incorrect, as there is useful data there.

What struck me was that in this case, the server could return a 200, but include a WWW-Authenticate: OAuth header to indicate that more information is available if you authenticate correctly. This seems the minimal change that could support this duality, and much easier than requiring and signalling separate authenticated and unauthenticated endpoints through an HTML-level discovery model, or, worse, adding a new response code to HTTP. What I'd like to know from people with deeper HTTP experience than me is whether this is viable, and whether it is likely to be benign for existing clients - will they choke on a 200 with a WWW-Authenticate header?
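
Here is a minimal sketch of what I mean, as a toy Node server in TypeScript (the profile data and the OAuth check are stand-ins - a real server would verify the signature in the Authorization header, not just its prefix):

```typescript
import { createServer } from "http";

// Stand-in for real OAuth signature verification.
function verifyOAuth(authorization?: string): boolean {
  return typeof authorization === "string" && authorization.startsWith("OAuth ");
}

const publicProfile = { name: "Alice", hometown: "San Francisco" };
const fullProfile = { ...publicProfile, phone: "+1 555 0100" };

createServer((req, res) => {
  const authed = verifyOAuth(req.headers.authorization);
  res.statusCode = 200;
  res.setHeader("Content-Type", "application/json");
  if (!authed) {
    // Still a useful public answer; the header just hints that
    // authenticating would reveal the friends-only fields.
    res.setHeader("WWW-Authenticate", 'OAuth realm="http://example.com/"');
  }
  res.end(JSON.stringify(authed ? fullProfile : publicProfile));
}).listen(8080);
```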

HTTP does have a 203 response, meaning Non-Authoritative Information, but I suspect returning that is more likely to have side effects.