I’ve always been curious about the information architecture behind search tools: the infrastructure and the algorithms. As I am not a mathematician or a programmer, some of the answers can be too complex for me.
However, I have found a couple of gems about Google. The first takes us back in time to Stanford University and two eager students, Sergey Brin and Larry Page, working on a large-scale prototype search engine. Their paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine, introduced some key ideas that are familiar to us now but were revolutionary at the time, and which still underpin the strength of Google today.
Google buys, rather than leases, computer equipment for maximum control over its infrastructure. Google chief executive officer Eric Schmidt defended that strategy in a May 31 call with financial analysts. “We believe we get tremendous competitive advantage by essentially building our own infrastructures,” he said.
Google does more than simply buy lots of PC-class servers and stuff them in racks, Schmidt said: “We’re really building what we think of internally as supercomputers.”
Previous search engines had not analyzed links in the systematic way that Google did, an approach that was part of the two young researchers’ original ideas. If you’d like more answers, How Does a Google Query Work provides a few clues.
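For readers who do like a little code, the link analysis idea can be illustrated with a toy version of PageRank, the ranking method described in the Brin and Page paper. This is only a minimal sketch and not Google's actual implementation: the three-page `toy_web`, the function name `pagerank`, the damping value of 0.85 and the fixed number of iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: `links` maps each page to the pages it links to."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web, invented for this example.
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web the "home" page ends up with the highest score because both other pages link to it, which is the intuition behind treating a link as a kind of vote.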
Niall Kennedy, a web technologist, has also come to my rescue with his post on Google phrase analysis. He explains that a few more details about Google’s possible analysis of page text are available from a recently published June 2006 patent application by Googler Anna Patterson. The application details how a search engine like Google might analyze text phrases and date-based topics, and associate a web page with related topics even if the specific topic does not appear in the document itself. The 22-page document further emphasizes Google’s current work on “shingle” analysis to discover important phrases and concepts.
He provides a neat Google search diagram for muggles like me!
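The “shingle” analysis mentioned above can also be sketched in a few lines. The example below shows the general textbook idea of shingling (breaking text into overlapping runs of words and comparing the resulting sets), not the specific method in the patent application, whose details are not covered here. The function names `shingles` and `jaccard`, the shingle size of three words and the two sample sentences are assumptions made up for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word runs ("shingles") in a text."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two made-up sentences that share their opening phrase.
doc_a = "google analyzes links between web pages"
doc_b = "google analyzes links between many sites"

similarity = jaccard(shingles(doc_a), shingles(doc_b))
print(f"Shared phrases: {sorted(shingles(doc_a) & shingles(doc_b))}")
print(f"Similarity: {similarity:.2f}")
```

The two sentences share two of their three-word shingles, so the comparison picks up the phrase they have in common even though the sentences as a whole are different.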