Where will Google Translate and statistics lead us?

I have been critical of the mighty Google in earlier posts – The dangers of personalisation in search – but I’d like to talk about another side of the giant, one that, in my view, is much more likely to bring benefits to mankind than the insidiousness of search personalisation.

Hans Rosling presented a fascinating programme on BBC Four (available on BBC iPlayer for those not fortunate enough to live in the UK) called The Joy of Statistics. The way he presented statistical information about world health made me realise that I need to up my game in reporting Google Analytics data to clients, but I digress.

Hans explained that Google Translate is a statistical product. I hadn’t given much thought to how Google translates anything, but it is revolutionary. Earlier attempts at automated translation relied on compiling rules and structures, in much the same way as a teacher might teach a person a foreign language. But this approach is beset with problems: for every rule and structure there are exceptions, for which further rules need to be constructed. So Google, with its vast experience in managing huge data resources, turned the problem on its head and treated translation as a problem in statistical correlation.
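To make that idea a little more concrete, here is a deliberately tiny sketch of my own – nothing like Google’s actual system, which is vastly more sophisticated – showing how word translations can fall out of parallel text purely by counting co-occurrences, with no grammar rules anywhere:

```python
from collections import defaultdict

# A toy parallel corpus of (English, French) sentence pairs. Google's
# real training data is millions of professionally translated documents;
# these four pairs are purely for illustration.
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

# Count how often each English word appears in a sentence alongside
# each French word.
cooccur = defaultdict(lambda: defaultdict(int))
for english, french in corpus:
    for e in english.split():
        for f in french.split():
            cooccur[e][f] += 1

# For each English word, pick the French word it co-occurs with most
# often. The pairings simply fall out of the counts:
# house -> maison, car -> voiture, the -> la, a -> une.
for e, counts in cooccur.items():
    print(e, "->", max(counts, key=counts.get))
```

Scale that counting up from four toy sentence pairs to the enormous body of human-translated text Google has indexed, and you have the intuition behind statistical machine translation.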

It all sounded very reminiscent of latent semantic indexing (LSI), which is at the heart of how Google ‘reads’ a page of content. I don’t feel I need to understand LSI in order to use Google, or even to optimise a site, in much the same way as I don’t need to understand the internal combustion engine to drive a car. But it does help to have some idea, at least in outline, of how Google ‘thinks’, in order to understand what it is looking for and where.
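For anyone who does want a peek under the bonnet, here is a minimal sketch of the idea behind LSI. The term-document matrix and its numbers are made up for illustration; the truncated singular value decomposition is the standard LSI machinery:

```python
import numpy as np

# A made-up term-document matrix: rows are terms, columns are documents,
# and entry [i, j] counts how often term i appears in document j.
terms = ["engine", "car", "fuel", "translate", "language"]
A = np.array([
    [2.0, 1.0, 0.0],  # engine
    [1.0, 2.0, 0.0],  # car
    [1.0, 1.0, 0.0],  # fuel
    [0.0, 0.0, 3.0],  # translate
    [0.0, 1.0, 2.0],  # language
])

# LSI takes a singular value decomposition and keeps only the k strongest
# 'concept' dimensions, discarding the noisier ones.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_concepts = U[:, :k] * s[:k]  # each term as a point in concept space

# Terms used in similar contexts land close together in concept space,
# even when they never co-occur directly.
for term, vec in zip(terms, term_concepts):
    print(f"{term:>10}: {np.round(vec, 2)}")
```

Terms that occur in similar documents end up with similar coordinates in the reduced ‘concept’ space, which is one way a machine can judge what a page is about without understanding a word of it.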

There is no doubt that Google is an awesome thing, and the way I have always tried to get a feel for how Google reads a page is to look at a page of Arabic, about which I know absolutely nothing, and try to imagine working out what it means using statistical correlations with the same material in English, which I do understand. Now I know that is not exactly – or even approximately – the way Google works, but I find it a useful metaphor.

It sounds remarkably like the task the code breakers of Bletchley Park faced during WW2, which accelerated the development of computing at the hands of Tommy Flowers, Alan Turing and so many others. Or like the deciphering of the Rosetta Stone.

But back to Google Translate. By adopting a statistical method of machine translation, Google has the potential to translate any language into any other language. At the moment, Google Translate is used to translate copy found on the internet. But Google is also looking at speech recognition, again using statistical methods, and when that becomes reliable and is combined with Google Translate, we have the possibility of being able to talk to anybody in the world, irrespective of whether or not we share a common language.
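If that combination does arrive, it would presumably be a chain of three statistical systems. The sketch below is entirely hypothetical – the function names are my own stand-ins, not real APIs – but it shows the shape of the pipeline:

```python
# Entirely hypothetical stubs: NOT real APIs, just stand-ins for the
# three statistical systems such a pipeline would need.

def recognise_speech(audio: bytes, lang: str) -> str:
    return "where is the station"  # stub: statistical speech recognition

def translate_text(text: str, source: str, target: str) -> str:
    return "où est la gare"  # stub: statistical machine translation

def synthesise_speech(text: str, lang: str) -> bytes:
    return text.encode()  # stub: text-to-speech synthesis

def speech_to_speech(audio: bytes, source: str, target: str) -> bytes:
    # Chain the three systems: speech in one language in,
    # speech in another language out.
    text = recognise_speech(audio, source)
    translated = translate_text(text, source, target)
    return synthesise_speech(translated, target)
```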

Now, being able to talk to anyone, anywhere is an awesome thought. I think it should lead towards greater understanding, world peace and harmony, but I admit it is such a powerful tool that I don’t think I – or even Google – can envisage where it might lead, or with what consequences.

