I went to a presentation by a Google engineer, Narayanan "Shiva" Shivakumar, tonight at the BC Cancer Agency in Vancouver. It was organized by HPC Vancouver. He talked about some of the backend things that power Google, such as the hardware itself, GFS (Google File System), MapReduce (an abstract parallelization interface for running algorithms on large data sets) (see also MapReduce at Wikipedia), and BigTable (a massive hash table, their own custom database).
He showed some pictures illustrating the growth in Google's computing power, from GFS's early days in a Stanford lab to the present massive data centres. GFS is amazing: files are stored as redundant 64 MB chunks on file servers, with everything replicated and load-balanced automatically (and the degree of redundancy varies depending on what the data is). I thought MapReduce was pretty cool. It reminded me of the map and reduce functions in Python.
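To make the Python analogy concrete, here is a toy sketch of the classic word-count example in the MapReduce style: a map step emits (key, value) pairs, a shuffle step groups values by key, and a reduce step combines each group. This is my own illustration, not Google's implementation; the documents and words are made up.

```python
from functools import reduce

# Invented toy input: each string stands in for one document.
docs = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each document yields (word, 1) pairs.
mapped = [pair for doc in docs for pair in map(lambda w: (w, 1), doc.split())]

# Shuffle phase: group the values by key (word).
grouped = {}
for word, count in mapped:
    grouped.setdefault(word, []).append(count)

# Reduce phase: combine the values for each key into a total count.
counts = {word: reduce(lambda a, b: a + b, vals)
          for word, vals in grouped.items()}

print(counts["the"])  # "the" appears three times across the documents
```

The real system, of course, distributes the map and reduce phases across thousands of machines and handles failures for you; the programmer only writes the two functions.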
Near the beginning of his talk he mentioned that they need a lot of computing power to compute PageRank, and said something about constructing a large graph in which each node is a URL (a web page) and each edge represents a link (I assume the graph is directed?). He did not say anything more about that problem, though...
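My guess at what that graph computation looks like, as a minimal sketch: represent the web as a directed graph (page → pages it links to) and run the power-iteration form of PageRank over it. The link structure and damping factor here are my own assumptions for illustration, not anything from the talk.

```python
# Tiny invented directed link graph: key = page, value = pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict-of-lists link graph.

    Assumes every page has at least one outgoing link (no dangling nodes).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Each page keeps a base amount of rank (the random-jump term)...
        new = {p: (1 - damping) / n for p in pages}
        # ...and distributes the rest evenly among the pages it links to.
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

rank = pagerank(links)
# Page "c" is linked from both "a" and "b", so it outranks "b".
```

The hard part at Google's scale is presumably not the iteration itself but holding a graph of billions of nodes and edges across many machines, which is exactly where infrastructure like GFS and MapReduce would come in.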
Update (2006-07-29): I think I may have gotten food poisoning from the refreshments at this event. If anyone else got food poisoning, please let me know. I am trying to figure out whether I got infected at this event or at lunch that day.