Friday, 7 January 2005

Semantics in translation

Finding translations of foreign writing can be tricky.
Tim Oren writes a requirements list:
[...]a small and pragmatic first step towards the inter-language blogosphere I've been writing about recently. Specifically, the idea of an RSS tag or something of the sort that would denote posts saying the same thing in different tongues, and be bait for aggregators and crawlers interested in that information.
This set off my semantic XHTML radar. Surely we can express this with a rel attribute on a link?
A quick rummage turns up existing specification text at the W3C:
Alternate
Designates substitute versions for the document in which the link occurs. When used together with the lang attribute, it implies a translated version of the document.
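
To make that concrete, here is a minimal sketch of what an aggregator or crawler might do with such links. It is only an illustration, not anything Tim or the W3C specify: the class name AlternateLinkParser, the function find_translations, and the example.org URLs are all made up for this post. It assumes the language is declared with hreflang on the link, falling back to lang as the quoted text suggests.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect <link> and <a> elements marked rel="alternate" that declare a language."""

    def __init__(self):
        super().__init__()
        self.translations = []  # list of (language, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of link types
        rels = (attr.get("rel") or "").lower().split()
        # Assumption: hreflang names the target's language; fall back to lang
        language = attr.get("hreflang") or attr.get("lang")
        href = attr.get("href")
        if "alternate" in rels and language and href:
            self.translations.append((language, href))


def find_translations(html_text):
    """Return (language, href) pairs for every translated version the page links to."""
    parser = AlternateLinkParser()
    parser.feed(html_text)
    return parser.translations


# Hypothetical page: two translations, plus a feed and a stylesheet to be ignored
SAMPLE = """
<html lang="en">
<head>
  <title>Semantics in translation</title>
  <link rel="alternate" hreflang="fr" href="http://example.org/post.fr.html" />
  <link rel="alternate" type="application/rss+xml" href="http://example.org/feed.xml" />
  <link rel="stylesheet" href="style.css" />
</head>
<body>
  <p>See the <a rel="alternate" hreflang="de" href="http://example.org/post.de.html">German version</a>.</p>
</body>
</html>
"""

print(find_translations(SAMPLE))
```

The feed link is skipped because it carries no language, which is the distinction the specification text draws: alternate on its own only means a substitute version, and it is the language attribute that marks it as a translation.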

Let's see how this fits in with Tim's list:
Quicky Requirements
  • Need both TRANSLATES and TRANSLATED-BY flavors. Since the former can be spoofed, the latter form embedded in the original doc will have more credibility.
Bidirectional links can affirm an authoritative translation, much as reciprocal rel="me" links do in XFN. We could perhaps add original and translation values for rel if we define a new profile.
  • Need Source and Target URLs. Should be able to point at whole docs or tagged spans (posts) within docs. Arbitrary linkage problematic due to limits of good ol' HTML.
The rel attribute does this, with an implicit reference to the document you are reading. If you want subsections, a <blockquote cite="..."> could be used.
  • Source and Target languages, in ISO-639.2
The lang attribute does this for the document you are reading, and the link itself can carry the target's language as an attribute (hreflang, in HTML 4).
  • Translation type: Manual, Automatic. More flavors?
  • Translation authority: Who or what did it. What existing designators can be coopted?
  • Translation time and date stamp, and perhaps an MD-5 hash of the original. This is a placeholder for the whole versioning can o' worms. If the original is edited or updated, we have a state consistency problem...
These belong as metadata in the translated document, probably as explicit human-readable text. The XOXO definition list model might be useful here.
  • Should do something useful in contemporary browsers, shouldn't be relying on having RSS readers/aggregators available in all target languages
Well, exactly. That is the whole point of semantic XHTML. You can deploy all this today. Please do!
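
As a rough sketch of the bidirectional check suggested for the TRANSLATES and TRANSLATED-BY requirement: a translation counts as affirmed only when it points back at the original and the original points at it. The rel values original and translation are the hypothetical ones floated above and are not defined in any existing profile. The function names and the example URLs are also invented here, and the pages are passed in as strings rather than fetched.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (rel value, href) pairs from <link> and <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated list, so one link can carry several values
        for rel in (attr.get("rel") or "").lower().split():
            self.links.append((rel, href))


def links_with_rel(html_text, rel):
    """Return the set of hrefs that the page links to with the given rel value."""
    parser = RelLinkParser()
    parser.feed(html_text)
    return {href for found_rel, href in parser.links if found_rel == rel}


def is_affirmed_translation(original_url, original_html, translation_url, translation_html):
    """True only if each page links to the other with the matching rel value."""
    # Hypothetical rel values: "original" on the translation, "translation" on the original
    claims = original_url in links_with_rel(translation_html, "original")
    affirms = translation_url in links_with_rel(original_html, "translation")
    return claims and affirms


# Hypothetical pages for illustration
ORIGINAL_URL = "http://example.org/post.html"
TRANSLATION_URL = "http://example.net/post.fr.html"
SPOOF_URL = "http://example.com/fake.fr.html"

ORIGINAL_HTML = (
    '<html lang="en"><head>'
    '<link rel="alternate translation" hreflang="fr" href="http://example.net/post.fr.html" />'
    "</head></html>"
)
# Both the real translation and the spoof claim the same original
TRANSLATION_HTML = (
    '<html lang="fr"><head>'
    '<link rel="alternate original" hreflang="en" href="http://example.org/post.html" />'
    "</head></html>"
)

print(is_affirmed_translation(ORIGINAL_URL, ORIGINAL_HTML, TRANSLATION_URL, TRANSLATION_HTML))
print(is_affirmed_translation(ORIGINAL_URL, ORIGINAL_HTML, SPOOF_URL, TRANSLATION_HTML))
```

The comparison here is an exact string match on URLs. A real crawler would need to resolve relative links and normalise URLs before comparing, which this sketch leaves out.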
