Wednesday, June 09, 2010

Google's New Indexing System Is Fully Caffeinated [Google]

Source: http://gizmodo.com/5559015/googles-new-indexing-system-is-fully-caffeinated

Google's latest web indexing system, the tool that pre-scans the entire web to have a ready answer to your search query, promises "50 percent fresher results for web searches." It's called Caffeine. And it comes with staggering Google search stats.

The main difference with Caffeine is that, rather than indexing one entire group of sites (represented in the lead graphic as a layer), then another, less prioritized group, then yet another, everything is pretty much indexed constantly. Teased for several months now, Caffeine is the sort of update Google needs to keep pace with search services like Twitter. And indeed, Google will need to keep such innovations coming, because more of our world is translated from analog to digital, and more quickly, every day.

So now for those wicked Google stats:

• Every second Caffeine processes hundreds of thousands of pages in parallel.
• If this were a pile of paper, it would grow three miles taller every second.
• Caffeine takes up nearly 100 million gigabytes of storage in one database.
• Caffeine adds new information at a rate of hundreds of thousands of gigabytes per day.
• You would need 625,000 of the largest iPods to store that much information.
• If these iPods were stacked end-to-end they would go for more than 40 miles.
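
For the curious, the iPod figures hold up to a quick back-of-the-envelope check. The sketch below divides the quoted storage total by the quoted iPod count, then estimates the length of the stack. The storage and count figures come from the stats above; the 103.5 mm device height is an assumption of this sketch (roughly the height of a 160 GB iPod classic) and is not stated in the post.

```python
# Figures quoted in the post
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes"
IPOD_COUNT = 625_000            # "625,000 of the largest iPods"

# Assumption (not from the post): height of one iPod in millimetres
ASSUMED_IPOD_HEIGHT_MM = 103.5

MM_PER_MILE = 1_609_344         # 1 mile is exactly 1,609.344 metres


def gb_per_ipod(total_gb, count):
    """Capacity each iPod would need in order to hold the whole database."""
    return total_gb / count


def stack_length_miles(count, height_mm):
    """Length of `count` devices laid end to end, in miles."""
    return count * height_mm / MM_PER_MILE


print(gb_per_ipod(TOTAL_STORAGE_GB, IPOD_COUNT))
print(round(stack_length_miles(IPOD_COUNT, ASSUMED_IPOD_HEIGHT_MM), 1))
```

Under those assumptions the numbers work out to 160 GB per iPod and a stack a little over 40 miles long, which is consistent with the stats as quoted.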

[Google]