I woke up a bit early this morning wondering how to describe a petabyte. I can easily count to a hundred and do math in that range. In the thousands and above, I resort to scientific notation. A petabyte is 1,000,000,000,000,000 bytes, or 10^15 bytes. How do I convey that incredible size in a comprehensible way?
Well, most of us nowadays have a gigabyte of memory in our laptops. A thousand laptops is a terabyte and a million laptops is a petabyte. What if you could make a million laptops all work together on the same problem? What could they do? What would you want them to do?
There are about two hundred billion stars in our Milky Way Galaxy, or 2x10^11 stars. Five thousand Milky Way galaxies would contain 10^15 stars, a petastar.
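As a quick sanity check on those two comparisons, here is a minimal Python sketch of the arithmetic. It assumes the round figures used above (one gigabyte per laptop, 2x10^11 stars per galaxy); the variable names are just for illustration.

```python
# Round figures taken from the text above.
PETA = 10**15                    # a petabyte in bytes, or a "petastar" in stars
BYTES_PER_LAPTOP = 10**9         # one gigabyte of memory per laptop
STARS_PER_GALAXY = 2 * 10**11    # about two hundred billion stars

laptops_needed = PETA // BYTES_PER_LAPTOP
galaxies_needed = PETA // STARS_PER_GALAXY

print(f"Laptops for a petabyte:  {laptops_needed:,}")
print(f"Galaxies for a petastar: {galaxies_needed:,}")
```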
If you started typing and typed a petabyte of data, it would show on the screen as a very long string. How long would it be? Well, if you typed 5 characters per centimeter, then 10^15 characters would be 2x10^14 centimeters long, or two billion kilometers.
OK, that's still pretty incomprehensible. Let's see how long the beam of a flashlight takes to go from one end to the other. Light travels at 3x10^10 cm/sec, so it needs about 6,667 seconds to traverse the petabyte string, almost two hours.
This is still hard to grasp. Consider instead getting on a jet and flying at 1000 km/hr over the string. The trip would take 2x10^6 hours, about 230 years. Better fly first class.
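The string-length figures are easy to rework yourself. The sketch below is only a back-of-the-envelope calculation under the same assumptions as the text (5 characters per centimeter, light at 3x10^10 cm/sec, a jet at 1000 km/hr); the 365.25-day year is my own assumption for converting hours to years.

```python
# Assumptions from the text, plus a 365.25-day year for the last conversion.
CHARS = 10**15                   # one character per byte of the petabyte
CHARS_PER_CM = 5
CM_PER_KM = 10**5
LIGHT_CM_PER_SEC = 3e10
JET_KM_PER_HOUR = 1000
HOURS_PER_YEAR = 24 * 365.25

length_cm = CHARS / CHARS_PER_CM
length_km = length_cm / CM_PER_KM

light_seconds = length_cm / LIGHT_CM_PER_SEC
light_hours = light_seconds / 3600

flight_hours = length_km / JET_KM_PER_HOUR
flight_years = flight_hours / HOURS_PER_YEAR

print(f"String length: {length_km:.3g} km")
print(f"Light needs:   {light_seconds:,.0f} s ({light_hours:.2f} hours)")
print(f"Jet needs:     {flight_hours:.3g} hours ({flight_years:.0f} years)")
```

With these inputs the light beam takes a little under two hours and the flight comes out near 228 years. Change the typing density or the jet speed and the rest follows.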
Of course, you could not type a petabyte yourself in your lifetime, nor could you and all of your friends. But the Web is perhaps a tenth of a petabyte or so right now and is still growing really fast. Lots of people are typing at the same time and computers are helping them. With Hadoop on a good-sized cloud, you can run analytics over that dataset in reasonable time. What kind of questions do you want to ask?