Friday, 10 September 2010

The Slutsky vanishes - Google Instant has a smutty mind

At the Google Instant launch on Wednesday, I ran into my former colleague, the writer and internet-famous video star Irina Slutsky. We sat together, so naturally, when we were trying out Google Instant during the launch, I typed her name in. And an odd thing happened - Google whited out the Instant search results.

you better recognize

Irina asked about this, and Johanna Wright of Google replied that they white out some words related to sex and hate speech, in case inappropriate results appeared for people who weren't expecting it. 'Slut' is one of these words, but it is not clear at all why 'Slutsky' is.

I already wrote about my concerns that Google's predictive words could narrow the range of searched-for terms into clichés - that as you type, Google, in Stoppard's words, is

announcing every stale revelation of the newly enlightened, like stout Cortez coming upon the Pacific — war is profits, politicians are puppets, Parliament is a farce, justice is a fraud, property is theft… It’s all here: the Stock Exchange, the arms dealers, the press barons… You can’t fool Brodie — patriotism is propaganda, religion is a con trick, royalty is an anachronism… Pages and pages of it. It’s like being run over very slowly by a travelling freak show of favourite simpletons, the India rubber pedagogue, the midget intellectual, the human panacea…

At least these suggestions are based on integrating over the text of the web; the words that get the silent whiteout treatment seem to have been chosen by a committee though, and clearly an American one at that, as it whites out 'ass' but not 'arse', 'shit' but not 'shite', 'slut' but not 'slag' and so on (I didn't type every smutty British slang word in, life is too short).

However, the modern-day Bowdlers at Google don't white you out based on what you type, but on what they predict you're going to type.

If I type 'blue-footed', it predicts I'm typing 'blue-footed booby', and as 'boobies' is an Official Google Smutty Word, my search goes white (in fact 'blue-foo' is enough).

Similarly, typing 'turn again d' implies 'turn again Dick Whittington', and 'dick' is an Official Google Smutty Word.

The same is true for Irina - so shocking is her last name that all you have to type is 'irina sl' and the Google whiteout erases her from results.
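The behaviour in these three examples can be sketched in a few lines. This is only a guess at the logic, not Google's code: the completion table, the list of blocked stems and the function names are all hypothetical, and matching on word prefixes is an assumption that happens to explain why 'Slutsky' and 'booby' get caught.

```python
import re

# Hypothetical blocked stems, inferred from the examples in this post.
BLOCKED_STEMS = ("slut", "dick", "boob")

# Toy completion table standing in for the real predictor (an assumption).
COMPLETIONS = [
    "blue-footed booby",
    "turn again dick whittington",
    "irina slutsky",
    "who killed cock-robin",
    "green tea",
]


def predict(prefix):
    """Return the first toy completion that starts with the typed text, or None."""
    typed = prefix.lower()
    for candidate in COMPLETIONS:
        if candidate.startswith(typed):
            return candidate
    return None


def is_whited_out(prefix):
    """True when the predicted completion, not the typed text, hits a blocked stem."""
    predicted = predict(prefix)
    if predicted is None:
        return False
    words = re.findall(r"[a-z]+", predicted)
    return any(word.startswith(stem) for word in words for stem in BLOCKED_STEMS)


for typed in ("blue-foo", "turn again d", "irina sl", "green t"):
    print(typed, "->", predict(typed), "| whited out:", is_whited_out(typed))
```

The point of the sketch is that the check runs on the prediction, so the typed text itself never needs to contain anything smutty.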

Weirdly, if you type 'who killed cock' it is completed to 'who killed cock-robin' with a hyphen inserted, which implies someone has edited the auto-complete list manually.

My worldview and sense of appropriateness are probably close enough to Google's committee's that I'm not going to be too bothered by this, but I do wonder about them deciding what the norms of speech are for everyone in the world.

Tuesday, 7 September 2010

If Google predicts your future, will it be a cliché?


I wonder if Michael Frayn saw the launch of Google Scribe today, and smiled to himself. In 1965, Frayn wrote a book, The Tin Men, which featured a mechanism that wrote newspaper articles by joining together clichéd phrases through a small number of rules.
There's an explanatory extract from it in this discussion of why you should avoid clichés when writing poetry.
He opened the filing cabinet and picked out the first card in the set. Traditionally, it read. Now there was a random choice between cards reading coronations, engagements, funerals, weddings, comings of age, births, deaths, or the churching of women. The day before he had picked funerals, and been directed on to a card reading with simple perfection are occasions for mourning. Today he closed his eyes, drew weddings, and was signposted on to are occasions for rejoicing.
The wedding of X and Y followed in logical sequence, and brought him a choice between is no exception and is a case in point. Either way there followed indeed. Indeed, whichever occasion one had started off with, whether coronations, deaths, or births, Goldwasser saw with intense mathematical pleasure, one now reached this same elegant bottleneck. He paused on indeed, then drew in quick succession it is a particularly happy occasion, rarely, and can there have been a more popular young couple. From the next selection, Goldwasser drew X has won himself/herself a special place in the nation’s affections, which forced him to go on to and the British people have clearly taken Y to their hearts already.
Goldwasser was surprised, and a little disturbed, to realise that the word “fitting” had still not come up. But he drew it with the next card — it is especially fitting that.
This gave him the bride/bridegroom should be, and an open choice between of such a noble and illustrious line, a commoner in these democratic times, from a nation with which this country has long enjoyed a particularly close and cordial relationship, and from a nation with which this country’s relations have not in the past been always happy.
Feeling that he had done particularly well with “fitting” last time, Goldwasser now deliberately selected it again. It is also fitting that, read the card, to be quickly followed by we should remember, and X and Y are not merely symbols — they are a lively young man and a very lovely young woman. Goldwasser shut his eyes to draw the next card. It turned out to read In these days when. He pondered whether to select it is fashionable to scoff at the traditional morality of marriage and family life or it is no longer fashionable to scoff at the traditional morality of marriage and family life. The latter had more of the form’s authentic baroque splendour, he decided.
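Goldwasser's filing cabinet is a small state machine: each card offers a few phrases and signposts the next card. Here is a minimal sketch of that idea, assuming a cut-down set of cards. The phrases come from the extract above, but the card names, the punctuation and the exact wiring between cards are my own simplification, and only the wedding path is modelled.

```python
import random

# Each card maps to the phrases that may be drawn from it, paired with the
# name of the card to go to next (None ends the article). Card names are
# hypothetical; the phrases are taken from the extract.
CARDS = {
    "start": [("Traditionally,", "occasion")],
    "occasion": [("weddings", "rejoicing")],
    "rejoicing": [("are occasions for rejoicing.", "event")],
    "event": [("The wedding of X and Y", "link")],
    "link": [("is no exception.", "indeed"), ("is a case in point.", "indeed")],
    "indeed": [("Indeed,", "happy")],  # the "elegant bottleneck"
    "happy": [("it is a particularly happy occasion.", "fitting")],
    "fitting": [("It is especially fitting that", "bride")],
    "bride": [("the bride/bridegroom should be", "origin")],
    "origin": [
        ("of such a noble and illustrious line.", None),
        ("a commoner in these democratic times.", None),
    ],
}


def compose(rng):
    """Walk the cards from 'start', drawing one phrase at each step."""
    phrases = []
    card = "start"
    while card is not None:
        phrase, card = rng.choice(CARDS[card])
        phrases.append(phrase)
    return " ".join(phrases)


print(compose(random.Random()))
```

Every run produces a grammatical paragraph, and none of them says anything.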
George Orwell, in Politics and the English Language, described this way of writing:

As I have tried to show, modern writing at its worst does not consist in picking out words for the sake of their meaning and inventing images in order to make the meaning clearer. It consists in gumming together long strips of words which have already been set in order by someone else, and making the results presentable by sheer humbug. The attraction of this way of writing is that it is easy. It is easier—even quicker, once you have the habit—to say “In my opinion it is not an unjustifiable assumption that” than to say “I think”. If you use ready-made phrases, you not only don't have to hunt about for the words; you also don't have to bother with the rhythms of your sentences since these phrases are generally so arranged as to be more or less euphonious. When you are composing in a hurry—when you are dictating to a stenographer, for instance, or making a public speech—it is natural to fall into a pretentious, Latinized style. Tags like “a consideration which we should do well to bear in mind” or “a conclusion to which all of us would readily assent” will save many a sentence from coming down with a bump.
Clearly, Google Scribe has been trained on the vast corpus of English language text that is also used for Google Translate to come up with plausible sentence fragments. Equally clearly, that means it is bound to be plucking phrases that have been written before out of the web for you, and favouring those that have been said most often. It won't come up with a crisp, resoundingly clear phrase for you, unless it has already been said many times before.
Orwellian prediction

The most likely words to follow “clocks were” now, according to Google, are “striking thirteen”. I hope Orwell would appreciate the irony.
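I don't know how Scribe works internally, but the "favour whatever has been said most often" effect is easy to reproduce with a toy model. The sketch below counts which word follows each two-word context in a tiny made-up corpus and suggests the most frequent one. The corpus, the function names and the two-word context size are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def train(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    follow = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            follow[context][words[i + context_size]] += 1
    return follow


def suggest(follow, text, context_size=2):
    """Return the most frequently seen next word, or None if the context is new."""
    context = tuple(text.lower().split()[-context_size:])
    counts = follow.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up corpus: the famous phrase is quoted three times, a fresh one once.
corpus = ["the clocks were striking thirteen"] * 3 + ["the clocks were ticking loudly"]
model = train(corpus)
print(suggest(model, "clocks were"))
```

The much-quoted continuation wins on counts alone, and a phrase nobody has written before gets no suggestion at all.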
Now, this is amusing in itself, but it is also indicative of a wider problem. If you've done much web searching for, say, home maintenance tips, you'll see a lot of prose that has either been written by a machine of this type, or by poorly paid human writers who use a very similar compositional process. We have a kind of mutated Turing Test going on all around us, where robotic writers are trying to convince robotic readers that they are human, and their stilted prose is worth presenting to the real people searching. Of course, the robots are searching too, to get the source material that is fed into their word mills to create this shambling facsimile of human prose.
It may be impressive that computers can now write bad prose like so many people do, but I do wonder about Eric Schmidt's grand vision of Google predicting what we will want to do before we think of it ourselves. Will it in fact be what we wanted, or will it be a mishmash of expected behaviours, that we'll regret on our deathbeds?
1. I wish I'd had the courage to live a life true to myself, not the life others expected of me.
This was the most common regret of all. When people realise that their life is almost over and look back clearly on it, it is easy to see how many dreams have gone unfulfilled.
A scene in Tom Stoppard's The Real Thing sums this up well:
He’s a lout with language. I can’t help somebody who thinks, or thinks he thinks, that editing a newspaper is censorship, or that throwing bricks is a demonstration while building tower blocks is social violence, or that unpalatable statement is provocation while disrupting the speaker is the exercise of free speech… Words don’t deserve that kind of malarkey. They’re innocent, neutral, precise, standing for this, describing that, meaning the other, so if you look after them you can build bridges across incomprehension and chaos. But when they get their corners knocked off, they’re no good any more, and Brodie knocks corners off without knowing he’s doing it. So everything he writes is jerry-built. It’s rubbish. An intelligent child could push it over. I don’t think writers are sacred, but words are. They deserve respect. If you get the right ones in the right order, you can nudge the world a little or make a poem which children will speak for you when you’re dead.
People are used to typing questions into a box on Google and getting a machine's suggestions. Increasingly though, they're typing emotions into a box on Twitter or Facebook, and getting a human response instead.


Thursday, 2 September 2010

Welcome Apple, seriously

Yesterday's update of iTunes added Ping, a music-focused social network. When I tried it out early in the evening, it had Facebook Connect enabled, and both imported friends from Facebook, and notified me when new ones joined. Shortly afterwards, Mark Zuckerberg joined, and shortly after that the Facebook connection was missing.
This morning, neither company is talking on the record, though Kara Swisher reports that Steve Jobs complained about 'onerous terms' from Facebook.

This naturally reminds me of the problems we had with Google Friend Connect, where Facebook's accusation of a ToS violation was never backed up by an explanation of what would not violate the terms, leading to the "Data Roach Motel" accusations at Supernova. The underlying issue is whether you should give another company veto power over your application. Last time I wrote on this, it was Apple's veto I was warning about, though at the same time Apple was trying to avoid giving Adobe veto power over their platform again.

The thing is, we have been round this cycle before, and the answer is known too - the way to interoperate with another company without having to have a business agreement with them is to use open standards, not proprietary APIs.

Apple knows this - they have helped lead development of HTML5 and WebKit, along with many other standards in the past, including podcasting and MPEG4. Facebook knows this too, and they have been strong supporters of OAuth and Activity Streams, and even of Portable Contacts, when it's them doing the importing.

Clearly it is good for us as users to be able to delegate our contact lists to an existing source - this week's launch of conference sharing site Lanyrd shows that. It's also in our interests to be able to propagate the actions of playing, liking and purchasing music, videos and anything else between sites of our choosing, so that we can share with our friends, and so we can get more useful recommendations for the future (at minimum, not suggesting things we already have).

This was the core of the discussion at the VRM Workshop last week in Boston - that we should have control over who sees what about us. I think that with these common standards we can solve both problems: individuals are saved from re-entering their information everywhere and get to control what flows where, and companies get the ability to interoperate without bizdev and single-source lock-in. Activity Streams (and the associated standards they build on) are our best hope for this.