Wednesday, 23 April 2014

Fragmentions for Poets

Since the original fragmention implementation and discussion, we've been trying it out in various contexts, including any wordpress blog, Shakespeare's complete works and even the indiewebcamp wiki. Also there have been a lot of reactions and suggestions. Some of these occur often enough that I thought I'd write some responses down.

What if the linked-to text changes?

Several people pointed to the New York Times Emphasis project, which builds IDs from initial letters of sentences in a paragraph to provide some degree of resilience against the linked-to text being changed. It also tries using edit distances if it can't find the text.

Whether you still want to link to changed text is a tricky problem - if it has completely been removed, then the annotation or point of linking may have gone (pointing out a typo, or misstatement). Even a small change (adding the word 'not' for example) can mean that the point of linking has changed, so my first thought is that changing the text breaking the link can be reasonable.

If you want some fuzzy matching to go on, having more of the linked-to text in the fragment can only help the linked-to page identify where in the text was intended. Indeed, if enough is included, you could show the difference between what was linked to and what is there now.

What if the linked-to text occurs more than once?

By default, go to the first instance. If you want a different link, use more words to create a unique reference. While there have been proposals to link to the nth occurrence using more complex syntax, I don't think this is actually a natural choice, and likely to be more fragile. The NYT Emphasis tool mentioned above switched from a nth sentence type model to a content dependent one for this reason; fragmentions simplify and extend this idea.

I have tried to come up with a use case that fits this goal - the closest I can think of is referring to a particular repetition of a line of poetry, for example in a villanelle.

I can link to the 3rd line of the 1st verse by citing it in full. If I wanted the final line, linking to it and the line before would work. I think this is clear, though linking to night & Rage would be enough.

The only reason I can think of to link to specific lines would be to discuss them in the context of surrounding lines, so I think this works adequately.

If a tool is made to allow readers to construct a link to a specific phrase, indicating that that phrase is not unique to encourage them to choose a longer phrase may be worth it.

Could you combine an id and a fragmention?

A link of form #id##some+words has been suggested, but again I'm not sure I see the utility. This is the nth occurrence idea in a different guise. It combines two addressing models in one, making it harder to construct and more fragile to resolve.

Can we get rid of ## and just use #?

The other thought, based on closer reading of the HTML5 spec ID attribute:

The id attribute specifies its element's unique identifier (ID). [DOM]

The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters.

There are no other restrictions on what form an ID can take; in particular, IDs can consist of just digits, start with a digit, start with an underscore, consist of just punctuation, etc

is that we may not need the ## (which technically makes an invalid URL) at all.

If an HTML5 id cannot contain a space, then a fragment that contains one like #two+words can never match an id (as id="two words" would be invalid). If it can't be an id it should be treated as a fragmention.

If you really want a one-word fragmention a trailing space like #word+ could be used.

This means that the idea of fragmention could be simplified: if a fragment contains a space it MUST be a fragmention, and should be searched for in the text. If it doesn't match any IDs in the page, it COULD be a fragmention and should be searched for in the text anyway.

Fragmentions become a fallback to be used when an id can't be found.

Do not go gentle into that good night

Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.

Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.

And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.

by Dylan Thomas

hear him read it

Originally on my own website

Thursday, 17 April 2014

Fragmentions - linking to any text

A couple of weeks ago, I went to a w3c workshop about annotations on the web. It was an interesting day, hearing from academics, implementers, archivists and publishers about the ways they want to annotate things on the web, in the world, and in libraries. The more I listened, the more I realised that this was what the web is about. Each page that links to another one is an annotation on it.

Tim Berners-Lee's invention of the URL was a brilliant generalisation that means we can refer to anything, anywhere. But it has had a few problems over time. The original "Cool URLs don't change" has given way to Tim's "eventually every URL ends up as a porn site".

Instead of using URLs, Google's huge success means that searching for text can be more robust than linking. If I want to point you to Tom Stoppard's quote from The Real Thing:

I don’t think writers are sacred, but words are. They deserve respect. If you get the right ones in the right order, you can nudge the world a little or make a poem which children will speak for you when you’re dead.

the search link is more resilient than linking to Mark Pilgrim's deleted post about it, which I linked to in 2011.

Another problem is that linking in HTML is defined to address pages as a whole, or fragments within them, but only if the fragments are marked up as an id on an element. I can link to a blog post within a page by using the link:

http://epeus.blogspot.com/2003_02_01_archive.html#post-body-90336631

because the page contains markup:

<div class="post-body entry-content" id="post-body-90336631" >

But to do that I had to go and inspect the HTML and find the id, and make a link specially, by hand.

What if instead we combined these two ideas:

  • use a fragment to identify part of a page
  • mention words in the page as the identifier

I've named these "fragmentions"

To tell these apart from an id link, I suggest using a double hash - ## for the fragment, and then words that identify the text. For example:

http://epeus.blogspot.com/2003_02_01_archive.html##annotate+the+web

means "go to that page and find the words 'annotate the web' and scroll to show them"

If you click the link, you'll see that it works. That's because when I mentioned this idea in the indiewebcamp IRC channel, Jonathan Neal wrote a script to implement this, and I added it to my blog and to kevinmarks.com. You can add it to your site too.

However, we can't get every site to add this script. So, Jonathan also made a Chrome Extension so that these links will work on any site if you're running Chrome. (They degrade safely to linking to the page on other browsers).

So, try it out. Contribute to the discussion on the Indiewebcamp Fragmentions page, or annotate this page by linking to it with a fragmention from your own blog or website.

Maybe we can persuade browser writers that fragmentions should be included everywhere.

Originally posted on kevinmarks.com