Library of Congress Twitter Archive Approaches 170 Billion Tweets, Takes 24 Hours to Search

By I-Hsien Sherwood | i.sherwood@latinospost.com (staff@latinospost.com) | First Posted: Jan 05, 2013 10:53 PM EST

The Library of Congress has almost completed amassing a backlog of every public message ever tweeted on the social networking site Twitter.

The ambitious ongoing project hopes to preserve for posterity every vain or vapid comment sent into the ether by celebrity-obsessed teenage minds, as well as some useful information.

Since it started the effort in 2010, the Library of Congress has gathered about 170 billion tweets, an archive that now grows by nearly half a billion tweets every day.

"The Library's first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date," posted Gayle Osterberg, director of communications for the Library of Congress, on the organization's website.

"This month, all those objectives will be completed," she wrote. "We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012."

For its part, Twitter has agreed to provide all public tweets to the Library free of charge, indefinitely.

But the Library still isn't sure what to do with all that information now that it has it.

"The Library's focus now is on addressing the significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way," wrote Osterberg.

A white paper provided by the Library illustrates the challenges posed by such a daunting pile of information. "The Library has not yet provided researchers access to the archive. Currently, executing a single search of just the fixed 2006-2010 archive on the Library's systems could take 24 hours. This is an inadequate situation in which to begin offering access to researchers, as it so severely limits the number of possible searches."
