Thursday 27 January 2005

Links and Votes

The rel="nofollow" attribute has caused a lot of impassioned debate, particularly regarding using it to link to things you disagree with or disapprove of.

I like this, as I am in favour of people adding more user-created metadata to their pages. It is a defence against a publicity attack — saying something outrageous to get a reaction — and that is why I prefer the Vote Link categories of vote-for, vote-against and vote-abstain to the explicitly search-engine directed nofollow.

Google's PageRank explanation has long said:
PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives.


I just think we should be able to link without voting, or link and vote against too.
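To make the idea concrete, here is a minimal Python sketch of how a spider might tally links as votes once authors can say what kind of vote they mean. It is not how Google or Technorati actually count links: the weights, the names VoteLinkParser and tally_votes, and the choice to look in both rel and rev are assumptions made for the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: how much each link type counts as a vote.
VOTE_WEIGHTS = {"vote-for": 1, "vote-abstain": 0, "vote-against": -1, "nofollow": 0}


class VoteLinkParser(HTMLParser):
    """Collects (href, weight) pairs from the <a> elements of one page."""

    def __init__(self):
        super().__init__()
        self.votes = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Look in both rel and rev (an assumption), since either may carry the value.
        values = ((attrs.get("rel") or "") + " " + (attrs.get("rev") or "")).lower().split()
        # A plain link with no recognised value counts as a vote for the target.
        weight = 1
        for value in values:
            if value in VOTE_WEIGHTS:
                weight = VOTE_WEIGHTS[value]
                break
        self.votes.append((href, weight))


def tally_votes(html):
    """Return a Counter mapping each linked URL to its net vote from this page."""
    parser = VoteLinkParser()
    parser.feed(html)
    parser.close()
    tally = Counter()
    for href, weight in parser.votes:
        tally[href] += weight
    return tally
```

With this, a page can link to something twice approvingly, link to something else in disapproval, and mention a third thing without voting at all, and the tally keeps the three cases apart.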

Tuesday 25 January 2005

Tagging is for posts

Over the past few days I've seen a lot of reaction to Technorati's new feature. I hope to deal with the wider social networking, spam and ontological issues at a future date, but I'd like to have another go at clarifying the details of rel="tag", because if Tim Bray, Roland Tanglao and Kevin Burton are all puzzled by it, we're obviously not conveying the idea clearly enough.

It is not for tagging a URL.

The new rel="tag" syntax we have proposed and adopted is attempting to solve a particular problem — how to tag web pages or blog posts.

It seems that there is demand for a general decentralised syntax for tagging URLs, and that is certainly something to think about, but this is not meant for that.

Tags do not require Technorati-proprietary URLs.

The tag help pages on the Technorati site can be read in that way, but they are focused around linking to the pages themselves; the underlying specification is clear that the url is to a 'tagspace' - a place that collates or defines tags.
I think this is an elegant way for the discussion about ambiguous tags to be resolved - if there are colliding meanings for a tag, you can link to a tagspace that disambiguates them, just as Tim describes.

Tags are not invisible metadata.

They are meant to be visible links on your page and in your posts. The 'this is just like meta keywords and will suffer the same death' argument misses this point.

Part of the Google insight was that visible data is more trustworthy than invisible - the links that people make visible on their pages are more part of their work than those they hide away for robots alone. Of course it doesn't make them immune to gaming, but it does make the gaming rather more obvious to human readers, especially to authors, who may not always be aware what invisible metadata is being generated on their behalf.

Thus, the <link rel="tag" href="..." /> syntax is not read by the Technorati spider.

This point is also behind some of the problems people have had with having their links picked up by the Technorati spiders - they sequester the tags away between posts, and omit them from their feeds. We're working hard on improving our tag detection (and many thanks to people who have spent time going back and forth on this with us, especially Stephanie), but we do expect them to be part of the post body.
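As an illustration of the visible-links rule, here is a minimal Python sketch of a parser that picks up rel="tag" only from a elements in the post body and ignores link elements. It is not the Technorati spider's code; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTagParser(HTMLParser):
    """Collects rel="tag" links from <a> elements, skipping <link> elements."""

    def __init__(self):
        super().__init__()
        self.tag_links = []   # list of (href, link text)
        self._href = None     # href of the tag link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Only <a> is considered; <link rel="tag"> is invisible and so ignored.
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "tag" in rels and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.tag_links.append((self._href, "".join(self._text).strip()))
            self._href = None


def visible_tag_links(post_html):
    """Return the (href, link text) pairs for visible tag links in a post."""
    parser = VisibleTagParser()
    parser.feed(post_html)
    parser.close()
    return parser.tag_links
```

Feeding it the body of a post, rather than the whole page, mirrors the expectation that tags travel with the post and therefore show up in feeds too.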

I hope this makes more sense now; we'll continue updating the formal spec and discussion pages in the light of this feedback.

Why not sell News?

Dan says:
One of these days, a newspaper currently charging a premium for access to its article archives will do something bold: It will open the archives to the public -- free of charge but with keyword-based advertising at the margins.


Why can't newspapers give away the Olds and sell the News?
Online subscription for fresh news, free access to day-old archives, when the paper version is fishwrap.

Thursday 20 January 2005

Welcome, Google. Seriously.

The rel="nofollow" initiative spearheaded by Matt and Jason at Google is a welcome step in helping webpage authors give semantic hints to search engines.
I first wrote about this in March 2003, suggesting a way to link while expressing endorsement, neutrality or disapproval, and cited some previous suggestions there. This later became the Vote Links specification, which has been supported by Technorati for a while, and it is great to see Google, Yahoo and MSN picking up on this idea.

Since Wednesday morning, the Technorati spiders have been respecting the rel="nofollow" attribute on links, along with rel="vote-abstain" and rel="vote-against", excluding these links from Authority counts and Cosmos searches (we've seen just over a thousand nofollows so far).

I've taken the liberty of drafting a more formal specification for the nofollow rel attribute, and I trust we can submit this to a suitable standards body in future.

I'd also like to second Dave's call for a Spam Squashing Summit, as there are many other sources of spam in blogs and elsewhere on the web.

Tuesday 18 January 2005

Tag spec published

Following on from the last post, we have published a formal specification for the rel="tag" HTML extension, and it will be submitted to GMPG.

Friday 14 January 2005

Clarifying tags

Dave said:
tell us how you support open protocols and we'll figure out how to plug you in. 

We use the existing HTML standard rel attribute for link relationships and add a new link type "tag".

Niall said:
Adding rel="tag" to any link should be enough to build a tag library for links off the link text. Technorati instead grabs the last part of the URL after the "/" and treats it as a post tag. I was hoping for a decentralized del.icio.us implementation.

This is a misunderstanding of what rel on a link means. It defines a relationship between the page containing the link and the linked page. We have defined a new value "tag" which means 'treat the linked page as a category tag'. It is the link itself that has meaning here, not the link text you use.

If you use the pages at http://technorati.com/tag/ as your destination, we know what this means. You could also use another domain that has tag-specific urls, like flickr, del.icio.us, or wikipedia. The assumption the technorati spider makes is that the last path component of the url is the tag, so http://apple.com/ipod would even work.
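The last-path-component assumption is simple enough to sketch in a few lines of Python. This is an illustration only, not the spider's actual code; the function name tag_from_url is hypothetical, and decoding %-escapes and treating "+" as a space are assumptions about how multi-word tags appear in URLs.

```python
from urllib.parse import unquote, urlsplit


def tag_from_url(href):
    """Return the last non-empty path component of href, or None if there is none."""
    path = urlsplit(href).path
    components = [part for part in path.split("/") if part]
    if not components:
        return None
    # Assumption: "+" and %20 both stand for a space inside a tag.
    return unquote(components[-1].replace("+", " "))
```

So tag_from_url("http://technorati.com/tag/ipod") and tag_from_url("http://apple.com/ipod") both give "ipod", whichever tagspace you chose to link to.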

We think that linking to our tag collator is a useful thing to do here, but they are your links and you can do as you wish.

This was never meant to be a replacement for delicious, but a complement to it. del.icio.us is a great way to label links with tags. The new rel="tag" enables you to label your own blog posts with tags.

Thursday 13 January 2005

Can I have an inclusive?

The CBS memos case known as 'Rathergate' has been picked over for months in the blog world, so it was a bit of a surprise to me that only now have CBS issued their report.

When told that the memos were fake, Rather said "If the documents are not what we were led to believe, I'd like to break that story." He is thinking of a story he can put EXCLUSIVE on. But whom would he be excluding? Presumably other big media organisations.

More reflective journalists, such as Dan Gillmor, are instead thinking how they can put INCLUSIVE on their stories - they are measuring success by how many people they bring into the conversation, and they recognise it doesn't necessarily start with them.

Monday 10 January 2005

Silly names department

A taxonomy is designed for putting things in a single place; taxonomies work for libraries where there is one copy of a book. Tags are designed to mark things you want later, without moving them. But, as if by magic, you can call later and bring them to you.

So how about we call it a 'Tagsonomy' ?

Friday 7 January 2005

Technorati developers contest winners

Congratulations to Joshua Tauberer and the other entrants. Go see what they have made and try them out.

Gillmor gets Hayekian

In Distributed Journalism's Future, Dan Gillmor says:

In a posting yesterday about how bloggers helped keep the pressure on U.S. House Republicans to reconsider an ethical issue, I mentioned the way two bloggers convinced average citizens to call their members of Congress and ask how they'd voted on the issue (it was a secret ballot). The inquiring citizens then let one of the bloggers know, and he posted the running results of the tally.

I said this was an example of something I'm calling "distributed journalism." Chris Nolan called today to ask what I meant by this, and here's some of what I told her. (Here's her eWeek story on the subject.)

I think of distributed journalism as somewhat analogous to any project or problem that can be broken up into little pieces, where lots of people can work in parallel on small parts of the bigger question and collectively — and relatively quickly — bring to bear lots of individual knowledge and/or energy to the matter. Some open-source software projects work this way. The important thing is the parallel activity by large numbers of people, in service of something that would be difficult if not impossible for any one or small group of them to do alone, at least in a timely way.

This is a very good summary of Hayek's 'spontaneous order' idea - that individuals acting independently can achieve more precisely because they are working in parallel on their own goals. Dan's 'people versus government' subtext here is an interesting aspect of this, reminding me of Jane Jacobs' 'Two moralities' too.

Suppose, for example, that we assemble a nationwide group of volunteers — lawyers who are familiar with statutes — and ask each of them to take a small section of one of those immense congressional bills that the members of Congress don't even read themselves. Suppose, further, that we could get this analysis posted before the House and Senate did their final votes. We might catch a lot of sleazy stuff before it became law. Today we're lucky if we know about any of it before it actually passes.

This is promising, but it is still a bit too top-down and hierarchical - someone in the middle is parcelling out the bills to lawyers to analyse, and somehow has to match each lawyer's expertise with a legislative area. There is a better way, and Joshua Tauberer has already built it.
govtrack.us collates all US Government bills into a more readable form than the official sites, and it also collates and adds weblog comments on each bill (see the sidebar on this copyright bill). This way we don't need a central co-ordinator, we just need to encourage lawyers, or indeed other citizens, to review bills that they have expertise or interest in, and blog the results with a link.

Semantics in translation

Finding translations of foreign writing can be tricky.
Tim Oren writes a requirements list:
[...]a small and pragmatic first step towards the inter-language blogosphere I've been writing about recently. Specifically, the idea of an RSS tag or something of the sort that would denote posts saying the same thing in different tongues, and be bait for aggregators and crawlers interested in that information.
This set off my semantic XHTML radar. Surely we can express this with a rel attribute on a link?
A quick rummage finds me existing specification text at the W3C:
Alternate
Designates substitute versions for the document in which the link occurs. When used together with the lang attribute, it implies a translated version of the document.

Let's see how this fits in with Tim's list:
Quicky Requirements
  • Need both TRANSLATES and TRANSLATED-BY flavors. Since the former can be spoofed, the latter form embedded in the original doc will have more credibility.
Bidirectional links can affirm an authoritative translation, as with XFN's rel="me". We could perhaps add original and translation values for rel if we define a new profile.
  • Need Source and Target URLs. Should be able to point at whole docs or tagged spans (posts) within docs. Arbitrary linkage problematic due to limits of good ol' HTML.
The rel attribute does this, with an implicit reference to the document you are reading. If you want subsections, a <blockquote cite="..."> could be used.
  • Source and Target languages, in ISO-639.2
The lang attribute does this, in the head of the document you are reading, and as an attribute on the link.
  • Translation type: Manual, Automatic. More flavors?
  • Translation authority: Who or what did it. What existing designators can be coopted?
  • Translation time and date stamp, and perhaps an MD-5 hash of the original. This is a placeholder for the whole versioning can o' worms. If the original is edited or updated, we have a state consistency problem...
These belong as metadata in the translated document, probably as explicit human readable text. The XOXO definition list model might be useful here.
  • Should do something useful in contemporary browsers, shouldn't be relying on having RSS readers/aggregators available in all target languages
Well, exactly. That is the whole point of semantic XHTML. You can deploy all this today. Please do!
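For crawlers and aggregators, picking these links up takes very little code. Here is a minimal Python sketch, assuming translations are marked with rel="alternate" plus a language attribute on the link; it reads hreflang first and falls back to lang, which is an assumption about what authors will write, and the names TranslationLinkParser and find_translations are hypothetical.

```python
from html.parser import HTMLParser


class TranslationLinkParser(HTMLParser):
    """Finds the page language and any rel="alternate" links that name a language."""

    def __init__(self):
        super().__init__()
        self.page_lang = None
        self.translations = []   # list of (language, href)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            # The source language is declared on the document itself.
            self.page_lang = attrs.get("lang") or attrs.get("xml:lang")
            return
        if tag not in ("a", "link"):
            return
        rels = (attrs.get("rel") or "").lower().split()
        lang = attrs.get("hreflang") or attrs.get("lang")
        href = attrs.get("href")
        # Alternates without a language (feeds, print versions) are skipped.
        if "alternate" in rels and lang and href:
            self.translations.append((lang, href))


def find_translations(html):
    """Return (page language, [(target language, URL), ...]) for one document."""
    parser = TranslationLinkParser()
    parser.feed(html)
    parser.close()
    return parser.page_lang, parser.translations
```

Translation type, authority and timestamps are left out here, since those would live as readable text in the translated document rather than on the link.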

Tuesday 4 January 2005

Online maps for cyclists and pedestrians

Here's a nice simple idea for any company that does online route-finding - create a version that finds routes optimised for cyclists or pedestrians.
I realise this will complicate your routing algorithms that seem to be 'head for the nearest freeway as soon as possible', but cyclists and pedestrians have different needs:
  • Prefer roads with cycle paths and pavements (sidewalks)

  • Take steepness into consideration as well as distance (this one is especially true for San Francisco)

  • Allow for short-cuts through alleys and going the wrong way up one-way streets

  • Show contour lines or gradient markers

  • For extra credit, link in public transport timetables - especially ones that carry bikes
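The first two points in the list above amount to changing the cost of each street segment before running an ordinary shortest-path search. Here is a minimal Python sketch using Dijkstra's algorithm; the graph layout, the function names and the penalty numbers are all invented for illustration and would need tuning against real riders.

```python
import heapq


def edge_cost(length_m, climb_m, has_cycle_path, climb_penalty=10.0, path_discount=0.8):
    """Effective cost of one street segment for a cyclist (weights are made up)."""
    # Each metre climbed costs as much as climb_penalty metres on the flat.
    cost = length_m + climb_penalty * max(climb_m, 0.0)
    # Segments with a cycle path are preferred by making them cheaper.
    return cost * path_discount if has_cycle_path else cost


def best_route(graph, start, goal):
    """Dijkstra over graph[node] = [(neighbour, length_m, climb_m, has_cycle_path), ...].

    Returns (total cost, list of nodes), or None if goal cannot be reached.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, length_m, climb_m, has_cycle_path in graph.get(node, []):
            if neighbour not in settled:
                step = edge_cost(length_m, climb_m, has_cycle_path)
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None
```

Short-cuts through alleys and walking the wrong way up a one-way street would be handled in the graph rather than the cost function, by adding edges that the driving version leaves out.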


Go on Jeremy, see if Yahoo maps can do this for us.

A strange view of reality

Chris Anderson passes on this David Foster Wallace quote:
TV is not vulgar and prurient and dumb because the people who compose the audience are vulgar and dumb. Television is the way it is simply because people tend to be extremely similar in their vulgar and prurient and dumb interests and wildly different in their refined and aesthetic and noble interests.

This is partly true, but I think there is something else going on too - the reality TV explosion is more than just chasing ratings through prurience. When I look at most reality TV shows, I am reminded of Greek mythology. The TV network looks down from Olympus, and plucks some ordinary mortal from obscurity, and gets him to do strange things for our amusement, unsure whether he'll see a shower of gold or be chained to a rock for eternity.

I think that the reality TV trend plays to the networks' need to feel powerful, and as they lose the wholesale power of swaying opinion and determining conversation topics, they reach for the power over individuals in a more direct and insidious way.

The notion of 'reality TV' bears some examination - 'reality' apparently consists of obeying arbitrary and complex rules, lying to your family or backstabbing your competitors, twisting the truth to put you in the best possible light in the rare hope of getting a million dollars instead of them. Anyone who has been through the TV commissioning process will see where this model of reality comes from.

Of course, this just suggests a few more possible reality shows:
  • Sisyphus rocks

  • Augean stable hand

  • Pandora's dilemma - "take the money", "open the box"

  • Blind Date with Zeus - what animal will you get?