Monday, 3 October 2005

Resilient File Formats

On the question of which file formats succeed, the answer is those that are resilient: the ones that provide a method for expansion and a way for multiple versions to coexist safely. Backwards compatibility is a necessary part of this, but it is not sufficient; forwards compatibility is what wins out.
I see three big generations of file format here:
  1. RFC 822 style (ASCII key:value, as in Mail headers and HTTP headers)

  2. IFF style (keyed binary blobs with length offsets) (IFF, AIFF, TIFF, QuickTime, WAV, AVI, MPEG4)

  3. SGML style (ASCII <tag> </tag> model) (SGML, HTML, XML, XHTML)

In each case, these let different versions of the same format coexist by establishing that it is OK to discard elements you don't understand.
This provides baseline compatibility (old parsers generally don't crash on new data, as they do with more naive formats), but it still takes work to define the format's sub-elements so that they interoperate.
It provides for graceful degradation, with older or less-featured clients able to display the subset they understand, rather than balking completely.
If you replace an element with a more general one, you may need to continue to include the old version for the previous generation of parsers.
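The 'discard what you don't understand' rule is easiest to see in the IFF style, where every chunk carries its own length. What follows is a minimal sketch rather than a parser for any real format: the chunk IDs (`NAME`, `BODY`, `XTRA`) and the helper names are made up for illustration, and it assumes a 4-byte ID, a big-endian 32-bit length, and IFF-style padding to an even byte count.

```python
import io
import struct

# Hypothetical chunk types that this (older) reader understands.
KNOWN_CHUNKS = {b"NAME", b"BODY"}


def read_chunks(stream):
    """Yield (chunk_id, payload) for known chunks, skipping unknown ones."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data (or a truncated header)
        chunk_id, length = struct.unpack(">4sI", header)
        if chunk_id in KNOWN_CHUNKS:
            yield chunk_id, stream.read(length)
        else:
            # Unknown chunk: the length field lets us step over it safely.
            stream.seek(length, io.SEEK_CUR)
        if length % 2:
            stream.seek(1, io.SEEK_CUR)  # skip the pad byte after odd lengths


def make_chunk(chunk_id, payload):
    """Build one chunk: ID, length, payload, plus a pad byte if needed."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id, len(payload)) + payload + pad


# A file from a newer writer that includes a chunk this reader has never seen.
data = (
    make_chunk(b"NAME", b"demo")
    + make_chunk(b"XTRA", b"from a newer writer")
    + make_chunk(b"BODY", b"hello")
)
chunks = list(read_chunks(io.BytesIO(data)))
```

Here the reader recovers the `NAME` and `BODY` chunks and steps over `XTRA` without needing to know what it contains. A newer writer that wanted to supersede `BODY` could, in the same spirit, emit both the old chunk and its replacement so that older readers still find something they understand.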
Having worked at Apple on QuickTime for 5 years, and spent 10 years before that tracking it, I've seen that it does take some care to adapt and update in a way that will not break old clients, but the benefits for users of your format are immense (the unofficial motto there was 'no movie left behind'). Of course, if your users are happy, this helps your adoption.
HTML took this from SGML, and in many ways expanded it further because user agents tolerated sloppy markup, to the point where people writing parsers had a bit of a tough time of it.
XML was an over-reaction to this - it instituted draconian parsing by design, and effectively gave the green light for everyone to make up their own format without consideration for others at all (with namespaces as a fig leaf to cover this, and coerce coexistence post hoc).
Microformats build on the older model of backward compatibility through selective enhancement. This is a bit more work for the parser and format designer, but much less for those creating data using the format, who can readily pick up the latest version to enhance their existing HTML without harming their other uses.
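As a rough illustration of selective enhancement, the sketch below reads a fragment of HTML and picks out only the class names it knows about, ignoring the rest. It is not a conforming microformats parser: the class set, the `KnownClassExtractor` name, and the sample markup are assumptions for the example, and it assumes well-nested markup with no void elements such as `<br>`.

```python
from html.parser import HTMLParser

# Hypothetical subset of hCard-style class names this consumer understands.
KNOWN_CLASSES = {"fn", "org"}


class KnownClassExtractor(HTMLParser):
    """Collect text inside elements whose class is in KNOWN_CLASSES."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._stack = []  # one entry per open element: a known class or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        known = next((c for c in classes if c in KNOWN_CLASSES), None)
        self._stack.append(known)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing element with a known class.
        current = next((c for c in reversed(self._stack) if c), None)
        if current and data.strip():
            self.found[current] = self.found.get(current, "") + data.strip()


page = (
    '<div class="vcard">'
    '<span class="fn">Example Person</span> '
    '<span class="nickname">Ex</span> '
    '<span class="org">Example Org</span>'
    "</div>"
)
parser = KnownClassExtractor()
parser.feed(page)
found = parser.found
```

The `nickname` span is simply passed over by this consumer, while a browser still renders the whole fragment as ordinary HTML. The markup keeps working for every other use, and a consumer that understands more classes can extract more.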
Working within XHTML does impose constraints on how you can express things, but as Cory Doctorow put it last week:
"It's like this: engineering is all about constraint. Given a span of foo feet and materials of tensile strength of bar, build a bridge that doesn't go all fubared. Write a fun video-game for an eight-bit console that'll fit in 32K. Build the fastest airplane, or the one with the largest carrying capacity... But these days, there's not much traditional constraint. I've got the engineer's most dangerous luxury: plenty. All the computational cycles I'll ever need. Easy and rapid prototyping. Precision tools."


Working with constraints is what makes for good Art, and good Engineering, whether the constraints are cultural or structural.
Without shared meaning there can be no communication. Microformats work to converge shared meaning without disrupting other uses, and to enhance rather than replace what you are doing already.

