The Internet Archive

Intro

Archives are good. Original documents, original materials, well taken care of, can provide invaluable information to researchers who are trying to find out why and how something happened or who are trying to get a perspective on a subject. Our US National Archives and Records Administration (http://www.archives.gov/) is one great example, but almost every governmental organization, every not-for-profit organization, every academic institution, every body that needs to document its history, and many people who have made a profound intellectual or aesthetic impact on our lives need to develop some kind of archive.

But how do you archive the entire Internet, the entire World Wide Web? How do you archive transient electronic information?

Solving those problems and creating such an archive is the goal of the not-for-profit Internet Archive (http://www.archive.org), which is headquartered in the Presidio in San Francisco. To quote its home page, "The Internet Archive is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public."

To that end, the organization's Web crawlers (also called Web spiders) download everything that they can from the Internet. The Archive says it has over one petabyte (1,000,000,000,000,000 bytes) of information stored on its disk arrays and is downloading about 20 terabytes each month.

This translates into over 40 billion Web pages in the Archive's database. But remember, this is a very rough estimate.
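
To get a feel for the scale of these figures, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above and assumes decimal units (1 petabyte = 10^15 bytes, 1 terabyte = 10^12 bytes). The function names are made up for illustration. Because the Archive also stores movies, music, and texts, the per-page figure is at best a loose upper bound, not a measurement.

```python
# Rough arithmetic on the figures quoted by the Archive.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 TB = 10**12 bytes).
PETABYTE = 10**15
TERABYTE = 10**12

stored_bytes = 1 * PETABYTE            # "over one petabyte"
page_count = 40 * 10**9                # "over 40 billion Web pages"
monthly_growth_bytes = 20 * TERABYTE   # "about 20 terabytes each month"


def average_bytes_per_page(total_bytes, pages):
    """Storage divided evenly across pages (a loose upper bound)."""
    return total_bytes / pages


def months_to_add(target_bytes, bytes_per_month):
    """Months needed to download target_bytes at a constant rate."""
    return target_bytes / bytes_per_month


print(average_bytes_per_page(stored_bytes, page_count))  # 25000.0
print(months_to_add(PETABYTE, monthly_growth_bytes))     # 50.0
```

In other words, the quoted numbers work out to at most about 25 kilobytes per page, and at the quoted download rate it would take roughly 50 months to add another petabyte.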

Note: The Internet Archive is mirrored at the Bibliotheca Alexandrina (the new Library of Alexandria) in Egypt at: http://www.bibalex.org/English/initiatives/internetarchive/web.htm.

Internet Archive Tools

The Wayback Machine (anybody remember the reference?)

Recall (currently inactive)

Movie Archive

Live Music Archive

Text Archive