Some years ago, I posted an article here about Google Books. At that time, I was very impressed with what Google had done and had high hopes for its future. Sadly, since that time, I have become increasingly disappointed with the quality and availability of the Google Book collections. However, that deficit is much less frustrating for me since I learned about the Internet Archive. The discovery of the Wayback Machine was an added bonus.
Why you want to know about the Internet Archive and the Wayback Machine …
The Internet Archive is older than Google Books, having been founded in 1996, while Google Books was first introduced in 2004. The Internet Archive is also a much more ambitious project, since its stated aim " … is building a digital library of Internet sites and other cultural artifacts in digital form." This means they archive not only print content, but also audio and visual materials as well as entire web sites. One can think of the Internet Archive as the digital version of the ancient Royal Library of Alexandria, which was the largest library in the ancient world, holding a copy of nearly every book produced during its era. Quite appropriately, the mirror site for the Internet Archive is the Bibliotheca Alexandrina.
Though the Internet Archive is working hard to preserve film, music and electronic records of all kinds, my focus here is its massive Text collection. They have nearly two dozen scanning centers in five different countries and it is estimated that between them they scan at least 1,000 books every day. From 2006 to 2008, Microsoft Corporation was partnering with the Internet Archive in its now defunct Live Search Books project. Microsoft provided financial support as well as superior scanning equipment for this effort. Over the course of those two years, more than 300,000 books were scanned, all of which were added to the collections of the Internet Archive. When Microsoft closed down its Live Search Books project, in May of 2008, they donated their scanning equipment to the Internet Archive, where it continues to be used at the scanning centers operated by Internet Archive members.
It is that specialized book scanning equipment which makes the scanned texts at the Internet Archive light-years better than anything done by Google Books. Though I am not a noted fan of Microsoft, the scanning equipment which they developed and donated to the Internet Archive effort was carefully and intelligently designed and manufactured. I have what I consider a great advantage with regard to my ongoing Regency research: I work less than a block from the main branch of the Boston Public Library, which operates one of the Internet Archive scanning centers. Over a year ago, I was introduced to the director of the scanning center there and was very fortunate to be offered a tour of the facility. That tour included an opportunity to watch one of the technicians at work scanning a book with the Microsoft scanning equipment. In that case, it was a book from the personal library of John Adams, which is owned by the Boston Public Library. At the time, they were in the process of scanning John Adams’ entire book collection into digital format, thus making it available to the world for study, while keeping the original volumes safe in their climate-controlled storage vault.
Unlike the scanning process for Google, in which books are apparently laid flat on a table and held open by a human (all those inadvertent images of thumbs and fingers on Google book pages attesting to that method), the Internet Archive scanners do not require human hands on the book pages while they are scanned. Instead, the scanner has an adjustable V-shaped cradle into which the book is laid so that it is fully supported with no more pressure on its often fragile spine than is necessary. Once the book has been placed into the cradle, a matching V-shaped cover made of a pair of glass panes is lowered onto the book, the point of the V pressing into the gutter of the book just firmly enough to hold the book in the cradle and keep the two visible pages flat without putting undue pressure on the spine. A pair of lenses is set above this book cradle and glass cover unit, each at an angle so that it focuses precisely on the plane of one side of the V-shaped glass cover. When the scanning technician triggers the scanner, an image is taken of each open page. With a foot pedal, the technician can then raise the glass cover, turn the page, lower the cover and scan another two pages and so on, until the entire book has been scanned.
Not only does the Internet Archive scanning method put less pressure on a book’s delicate spine, since the book is not forced fully open on a flat surface by occasionally careless humans, but the image of each page is also clean and crisp. The glass cover of the scanner holds the pages completely flat and still while they are scanned, so there are never any pages with hideously warped and twisted text, or with some parts in focus and other parts blurry. Each page of a book which has been scanned by the Internet Archive is clean, in sharp focus and fully legible, with nary a human finger or thumb in sight. [Author’s Note: There is a caveat which must be included here. In the fall of 2007, it is estimated that approximately 900,000 books from the Google Book project were uploaded to the Internet Archive. These are all full copies of books, but sadly, due to Google’s sloppy scanning practices, not all of the books in that group will have pages which are fully legible. However, in most cases, search results at the Internet Archive note if the book was scanned by Google, so you will be aware of that fact and can select a copy from another source, if it is available.]
Another advantage of the Internet Archive is that it offers only full copies of books, since they only scan books which are out of copyright and in the public domain. There are no teasing "Preview" views, which might appear during a search on Google Books, often with the information you need not visible. Nor are there any of those annoying "Snippet" pages which offer only metadata on the book, but none of its contents. And unlike Google Books, the Internet Archive pays close attention to books with multiple volumes, making the effort to clearly differentiate each volume in a set. This can be crucial to those of us who might want to read a complete Regency-era novel, most of which were published in three volumes.
Most of the books at the Internet Archive are available in multiple file formats, which provides more options for reading those books on various devices. But even better, you can read the books online. And when you do so, the book can be viewed on your full computer screen, unlike Google Books, which restricts your viewing area to a small portion of the screen. When Google Books was first introduced, the text of the book was available in about three-quarters of the screen. But now, when trying to read a book online at Google Books, the actual text of the book is restricted to less than half the screen. The rest is cluttered up with ads and controls which take up entirely too much of your screen real estate. That is not a problem at the Internet Archive. When you choose to view a book online, the book fills your whole screen, which is much easier on the eyes when doing in-depth research.
Search at the Internet Archive can easily be refined based on the type of media you are seeking. Since I only go there looking for books, I select "Texts" from the "Media Types" pick list before I run my search. A texts search can be refined even more by selecting from other criteria on the pick list, such as American Libraries, University Libraries, Project Gutenberg or Children’s Library. These refinements will reduce the number of search results that are returned, since a smaller portion of the database will have to be searched. However, since I am not always certain where books I am seeking might be found, I prefer to run a wider search. I do get a larger results set, but I prefer scrolling through that longer list to taking the chance that I might miss something by using a tighter search. There is also an option to search all media types, which will return an even longer list of search results, but covers all media types in the Internet Archive collection. An advanced search option is also available, if you want to more closely refine your search for a specific item based on any keywords you are using.
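For readers who like to script their searches, the "Texts" refinement described above can be sketched as a search address built in Python. This is only an illustration under assumptions of mine, not something taken from the Internet Archive's own instructions: the `advancedsearch.php` address, the `mediatype:texts` filter and the `fl[]`, `rows` and `output` parameters reflect my understanding of the site's public search interface, and the function name `build_text_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed address of the Internet Archive advanced search interface.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_text_search_url(keywords: str, rows: int = 25) -> str:
    """Build a search URL limited to the 'Texts' media type.

    The query syntax and parameter names are assumptions for illustration.
    """
    # Restrict the keyword search to texts, like picking "Texts" from the list.
    query = f"({keywords}) AND mediatype:texts"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each result
        ("fl[]", "title"),
        ("rows", rows),          # maximum number of results to return
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_text_search_url("regency etiquette"))
```

Leaving out the `mediatype:texts` part of the query would correspond to the wider all-media search, with the longer results list that comes with it.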
One of my favorite features of the Internet Archive is the Wayback Machine. It got its name from the machine that the very smart dog, Mr. Peabody, built for his friend Sherman, in the Peabody’s Impossible History segments which were part of the Rocky and Bullwinkle cartoons from the 1960s. Since 1996, the Internet Archive has been archiving as many web pages as they can, and all of those archived pages are available for search using the Wayback Machine. The Wayback Machine makes it possible for you to see how quite a lot of web sites looked in the past. Would you like to see how http://www.janeauten.org or http://www.georgetteheyer.com looked five or ten years ago? Just type that URL into the search box of the Wayback Machine and click the "Take Me Back" button. In fact, the Wayback Machine has become the only way to see the Good Ton web site once that wonderful traditional Regency resource went offline.
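The same lookup can also be written as a plain web address, which is handy if you want to bookmark a site as it looked on a particular date. The short Python sketch below builds such an address. It assumes the `https://web.archive.org/web/<date>/<address>` pattern that Wayback Machine links generally follow, where the date is written as digits such as `20050101` and an asterisk asks for the overview of all captures. The function name `wayback_url` is my own invention for illustration.

```python
def wayback_url(site_url: str, date: str = "") -> str:
    """Return a Wayback Machine address for site_url.

    date is a string of 4 to 14 digits (YYYY up to YYYYMMDDhhmmss).
    When it is empty, the address points at the overview of all captures.
    """
    if date and not (date.isdigit() and 4 <= len(date) <= 14):
        raise ValueError("date must be 4 to 14 digits, e.g. 20050101")
    # "*" requests the overview; a date requests the capture closest to it.
    stamp = date if date else "*"
    return f"https://web.archive.org/web/{stamp}/{site_url}"


if __name__ == "__main__":
    # One of the addresses mentioned in the article, as of early 2005.
    print(wayback_url("http://www.georgetteheyer.com", "20050101"))
    # The overview of every capture of the same site.
    print(wayback_url("http://www.georgetteheyer.com"))
```

If no capture exists for the exact date requested, the Wayback Machine normally shows the nearest one it has, so the date does not need to be precise.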
Once you enter a URL into the Wayback Machine search box and click the button, you will be presented with a unique search results page. Across the top is a grid of years, and below that is a twelve-month calendar. When you click on a specific year, blue dots will appear on various dates on the calendar below. Each of those blue dots is a link to a snapshot of the web site for that specific date. Keyword searching is not currently supported at the Wayback Machine, so you do need the correct URL for any web site you would like to see. It must be noted that the Wayback Machine does respect robots.txt files. These are files which a web site uses to tell search engines to stay away. Any web site which has posted a robots.txt file will not be included in the Internet Archive web site database. Therefore, the Internet Archive cannot make every web site of the past available to searchers, but it does have quite a lot of them available for viewing. Would you like to see how the web site of your favorite Regency author looked when they first started writing? Just type their URL into the Wayback Machine search box and go back in time to have a look.
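To make the robots.txt idea a little more concrete, here is a small Python sketch of how a well-behaved crawler might read such a file before visiting a page. It uses the robots.txt parser from Python's standard library. The sample file contents, the `example.com` addresses and the crawler names `ia_archiver` and `SomeOtherBot` are assumptions chosen purely for illustration, and this is not a description of how the Internet Archive's own software works.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one crawler is shut out of the whole site,
# while every other crawler is only kept out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ia_archiver", page))   # False
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", page))  # True
```

As the sample file shows, a robots.txt file can single out one crawler or cover only part of a site, so the exclusion is not always all or nothing.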
One can sign up for a free Virtual Library Card with the Internet Archive. One advantage of setting up an account is that you can then create bookmarks of materials within the collection that you are using for research. You can also sign up for a monthly newsletter from the Internet Archive which will keep you apprised of new additions to the collections. If you live in the San Francisco area, the home of the Internet Archive, you can also sign up for email notifications of local Internet Archive events which are held in the Bay Area.
Earlier this week, I received my copy of the Internet Archive monthly newsletter, in which they announced a most impressive statistic. They now have over two million books scanned and available online for researchers. And they are already at work scanning more books for their next million. The Internet Archive is a rich resource for researchers and scholars around the world, regardless of the topics in which they are interested. All of the books you will find there are complete copies, and all are available for download. The Internet Archive also has large audio and visual archives, which include popular music and television news and entertainment programs, along with all of the web sites which can be viewed using the Wayback Machine. Take some time to look over the offerings at the Internet Archive. You are certain to find something there of interest, whether or not it is related to the Regency.