There’s a photo of my daughter that I love. She sits, smiling, in our old yard, her chubby hands clutching the fresh grass. It was taken in 2013, when she was almost a year old, with an old Samsung digital camera. I originally stored it on a laptop before transferring it to a beefy external hard drive.
A few years later, I uploaded it to Google Photos. When I search for the word “grass”, Google’s algorithm finds it. This always makes me smile.
I pay Google £1.79 a month to keep my memories safe. I’m putting a lot of trust in a company that’s only been around for 26 years, but the hassle it eliminates seems worth it. There is so much stuff these days, and the admin required to keep it all up to date and secure is too much of a burden.
My parents didn’t have this problem. They would occasionally take photos of me with a film camera, and every so often they would print them on paper and put them in an album. These photos can still be seen today, some 40 years later, on faded and yellowed photographic paper – a few photos for each year. Many of my memories from the following decades are also recorded on paper. The letters I received from my friends when I traveled abroad in my 20s were handwritten on lined paper. I still have them tucked away in a shoebox, a fun but relatively small archive from an offline time. We no longer have these space limitations. My iPhone takes thousands of photographs a year. Our Instagram and TikTok feeds are constantly updated. Collectively, we send billions of WhatsApp messages, texts, emails and tweets.
However, although all this data is abundant, it is also more ephemeral. One day, in the perhaps not-so-distant future, YouTube will no longer exist, and your videos may be lost forever. Facebook – and your uncle’s vacation posts – will disappear. There is precedent for this. MySpace, the first large-scale social network, deleted all photos, videos and audio files uploaded before 2016, apparently by accident. Entire sections of Usenet newsgroups, home to some of the Internet’s earliest conversations, have gone offline forever and been erased from history. In June of this year, more than 20 years of music journalism disappeared when the MTV News archives were taken offline.
For many archivists, alarm bells are ringing. Around the world, they are scraping up defunct websites and at-risk data collections to save as much of our digital lives as possible. Others are working on ways to store this information in formats that will last hundreds, perhaps even thousands, of years.
The effort raises complex questions. What is important to us? How and why do we decide what to keep – and what to let go of? And how will future generations understand what we are capable of saving?
“Welcome to the challenge of every historian, archaeologist, novelist,” says Genevieve Bell, cultural anthropologist. “How do you understand what’s left? And then how do you avoid reading it through the lens of now?”
The last-chance saloon
There are more things being created now than at any time in history. At Google’s I/O conference this year, the company’s CEO Sundar Pichai said that 6 billion photos and videos are uploaded to Google Photos every day. More than 40 million WhatsApp messages are sent per minute.
Even with so much more volume, our data is more fragile than ever. Books may burn in the odd library fire, but digital information is much easier to erase forever. We’ve seen this happen – not just in incidents like the accidental deletion of MySpace data, but also, sometimes, intentionally.
In 2009, Yahoo announced that it would retire the GeoCities website hosting platform, putting millions of carefully crafted web pages at risk. Even though most of them may seem irrelevant – GeoCities was famous for its amateurish, beginner’s aesthetic and for its pages dedicated to various collections, obsessions or fandoms – they represented an embryonic chapter of the web, and one that was about to be lost forever.
And it would have been, if an impromptu group of volunteer archivists, led by Jason Scott, hadn’t intervened. “We took action, and part of the anger and confusion at the time was that we were going from downloading a bunch of interesting sites to suddenly taking over an anchor site from the early days of the web,” Scott recalls.
His group, called the Archive Team, mobilized quickly and downloaded as many GeoCities pages as possible before the platform closed for good. Ultimately, he and his team were able to save a large part of the site, archiving millions of pages between April and October 2009. Scott estimates they were able to download and store about a terabyte, but notes that GeoCities’ size grew and shrank over time and was around nine terabytes at its peak. Much of it is probably gone forever. “It was 100% user-generated works, popular art, and honest examples of human beings writing information and stories that didn’t exist anywhere else,” he says.
Known for his top hat and cyberpunk-inspired sense of style, Scott has made it his life’s mission to help save parts of the web that are in danger of being lost. “Increasingly, it is understood that archives, archiving and preservation are a choice, a duty, and not something that simply emerges, like tides,” he says.
Now, Scott works as a “free-range archivist and software curator” at the Internet Archive, an online library founded in 1996 by Internet pioneer Brewster Kahle to save and maintain data that would otherwise be lost. Over the past two decades, the Internet Archive has accumulated a massive library of material scraped from the web, including content from GeoCities. It doesn’t just save purely digital artifacts; it also has a vast collection of digitized books that it has scanned and rescued. Since its inception, the Internet Archive has collected more than 145 petabytes of data, including more than 95 million public media files such as films, images and text. Nearly half a million MTV News pages were saved. Its Wayback Machine, which allows users to rewind and see what certain sites looked like at a given point in time, has more than 800 billion web pages stored and captures another 650 million a day.
It also records and collects TV channels from around the world and even saves videos from TikTok and YouTube. Everything is stored in multiple data centers owned by the Internet Archive. It’s a Sisyphean job.
As a society, we are creating so many new things that each year we must delete more than we did the year before, says Jack Cushman, director of Harvard’s Library Innovation Lab, which helps libraries and technologists learn from each other. “We have to figure out what to save and what not to save,” he says. “And how do we decide?”
(MIKE MCQUADE)
Archivists have to make such decisions constantly. Which TikToks should we save for posterity, for example?
We shouldn’t try too hard to imagine what future historians would find interesting about us, says Niels Brügger, an Internet researcher at Aarhus University in Denmark. “We can’t imagine what historians 30 years from now would like to study today, because we have no idea,” he says.
“Therefore, we should not try to anticipate and restrict the possible questions that future researchers would ask.”
Instead, according to Brügger, we should just keep as many things as possible and let future researchers sort through them later. “As a historian, I would definitely opt to take it all, and then the historians will figure out what the hell they’re going to do with it,” he says.
At the Internet Archive, priority is given to what has the greatest risk of loss, says Jefferson Bailey, who works there. He helps develop archiving software for libraries and institutions. “Material that is ephemeral, or endangered, or that has not yet been digitized and would therefore be more easily destroyed because it is in analog or printed format – those take priority,” he says.
People can request that pages be archived. Libraries and institutions also make recommendations, and the team handles the rest. On open social networks like TikTok and YouTube, teams of library archivists around the world select certain accounts, copy what they want to save, and share those copies with the Internet Archive. These can be snapshots of what’s trending each day, as well as tweets or videos from accounts run by notable individuals, like the US president.
The process cannot capture everything, but it offers a good slice of what has preoccupied us in the first decades of the 21st century. While historical records are typically based on the private letters and belongings of society’s wealthiest, an archival system that collects tweets will always be a little more egalitarian.
“You can get a very interesting and diverse portrait of our cultural moments from the last 30, 40 years,” says Bailey. “This is very different from what a traditional archive was like 100 years ago.”
As citizens, we could also help future historians. Brügger suggests that people could make “data donations” of their personal correspondence to archives. “One week a year, invite everyone to donate that week’s emails,” he says. “If you had these periods of email correspondence from thousands of people year after year, that would be really great.” Scott imagines that future historians will eventually use AI to query these archives and gain unique insight into how we lived.
“You can ask a machine, ‘Could you show me images of people having fun at amusement parks with their families in the 1960s?’ and it will say, ‘Here it is,’” he says. “The work we have done so far has been done in faith that something like this could exist.”
The past guides the future
Human knowledge does not always disappear with a dramatic flourish, as GeoCities did; sometimes it is erased little by little. You don’t know something is missing until you go back to check. An example of this is “link rot”, where hyperlinks on the web no longer direct you to the right target, leaving you with broken pages and dead ends. The Pew Research Center, in a study from May this year, found that 23% of web pages that existed in 2013 are no longer accessible. It’s not just web links that die without constant curation and care. Unlike paper, the formats that now hold most of our data require certain software or hardware to run, and these tools can quickly become obsolete. Many of our files can no longer be read because, for example, the applications that read them have disappeared or the data has been corrupted.
One way to mitigate this problem is to regularly transfer important data to newer media before the programs needed to read it are lost forever. At the Internet Archive and other libraries, the way information is stored is updated every few years. However, for data that is not actively maintained, it may only be a few years before the hardware needed to read it is no longer available. Think of once-ubiquitous storage media like Zip drives or CompactFlash cards.
Some scholars are looking for ways to ensure that we can always access older digital formats, even if the hardware needed to read them has become a museum piece. The Olive project, directed by Mahadev Satyanarayanan of Carnegie Mellon University, aims to enable anyone to use any tool, no matter how old, “with just one click”. Since 2012, his team has been working to create a massive decentralized network that supports “virtual machines” – emulators for old or defunct operating systems and all the software they run. Keeping ancient data alive in this way is a means of guarding against what computer scientist Danny Hillis once dubbed the “digital dark ages,” a reference to the early medieval period, when the lack of written material left future historians with little to go on. Hillis, an MIT alumnus who pioneered parallel computing, thinks the rapid technological upheaval of our time will leave much of our lives a mystery to scholars.
“When people look back at this period, they will say, ‘Oh, well, you know, there was a kind of incomprehensibly rapid technological change and a lot of history was lost during that transformation,’” he says. Hillis was one of the founders (with Brian Eno and Stewart Brand) of the Long Now Foundation, a San Francisco-based organization known for its compelling art and science projects, such as the Clock of the Long Now, a giant mechanical device funded by Jeff Bezos and currently under construction on a mountain in West Texas, designed to keep accurate time for 10,000 years. The foundation also created the Rosetta Disk, a nickel circle engraved on a microscopic scale with documentation for around 1,500 of the world’s languages.
In February, a copy of the disk landed on the Moon aboard the Odysseus probe. Part of Long Now’s focus is to help people think about how we protect our history for future generations. It’s not just about making historians’ lives easier; it’s about helping us be “better ancestors,” according to the organization’s mission statement.
It’s a sentiment that resonates with Vint Cerf, one of the founders of the Internet. “As I get older, I keep thinking: how can I be a good ancestor?” he says. “An understanding of what happened in the past is useful for anticipating or interpreting what is happening in the present and what might happen in the future,” says Cerf. There are “all kinds of scenarios in which the absence of knowledge of the past is a debilitating weakness for a society.” “If we don’t remember, we can’t think, and the way society remembers is by writing things down and putting them in libraries,” agrees Kahle. Without these repositories, he says, “people will be confused about what is true and what is not true.” Kahle created the Internet Archive as a way to ensure all knowledge is free for everyone, but feels the balance of power has shifted from libraries to corporations.
And that will likely be a problem for keeping things accessible in the long run. “If this is left to the corporations, everything ends,” he says. “We are not just talking about classic published works – like your magazine or your books – but Facebook pages, Twitter [now X], your personal blogs. Overall, all of these are on corporate platforms now. And all of this will disappear.”
The loss of our long-term digital archives has real implications for how society functions, says Harvard’s Cushman, who points out that our legal decisions and documentation are largely stored digitally. Without a permanent, unalterable record, we can no longer rely on past judgments to inform the present. His team created ways to allow courts and law journals to archive copies of web pages in the Harvard Law Library, where they are kept indefinitely as a record of legal precedents.
He’s also creating tools for people to interact with these archives, either by scrolling through historical versions of a website or using a custom GPT to interact with collections.
Several other groups are working on similar solutions. The US Library of Congress has suggested standards for storing video, audio and web files so that they remain accessible to future generations. It urges archivists to consider certain questions, such as whether the data includes instructions for accessing it or how widely adopted the format has been (the idea being that a more prevalent format is less likely to become obsolete).
But ultimately, digital files are more difficult to maintain than physical ones, says Cushman. “If you run out of budget and leave the books in a quiet, dark room for 10 years, they’ll be fine,” he says. “If you don’t pay your AWS bill for a month, your files will be gone forever.”
Storage for impossible timescales
Even the physical way we maintain digital data is impermanent. Most long-term storage in data centers – for use in disaster recovery, among other applications – is on magnetic hard drives or tapes. Hard drives wear out after a few years; tape is a little better, but it still doesn’t last much beyond a decade or so of use before it starts to fail.
Companies make new backups all the time, so this is less of an issue in the short to medium term. However, when you want to store important cultural, legal, or historical information for the ages, you must think differently. You need something that can store an abundance of data, but also stands the test of time and doesn’t need constant care. DNA has often been touted as a long-term storage option. It can hold surprising amounts of information and is incredibly durable; pieces of bone contain readable DNA from hundreds of thousands of years ago.
However, today, encoding information in DNA is expensive and slow, and specialized equipment is needed to “read” the data later. This makes it impractical as a serious long-term medium for storing the knowledge of our world, at least for now.
(MIKE MCQUADE)
Fortunately, there are already some attractive alternatives. One of the most advanced ideas is Project Silica, currently under development at Microsoft Research in Cambridge, UK, where Richard Black and his team are creating a form of long-term storage in squares of glass, capable of lasting hundreds or even thousands of years. Each is made using a powerful and precise laser, which writes nanoscale deformations in the glass below the surface, capable of encoding bits of information.
These tiny imperfections are layered on top of one another in the glass and then read with a powerful microscope, which can detect how light is refracted and polarized. Machine learning is used to decode the bits, and each square contains enough training data to allow future historians to retrain a model from scratch if necessary, says Black. When I hold one of the silica squares in my hand, it feels pleasantly sci-fi, as if I’d just pulled it out to turn off HAL in 2001: A Space Odyssey. The encoded data is visible as a faint blue where the light hits the imperfections and scatters.
A video shared by Microsoft shows these squares being microwaved, boiled, baked in the oven, and zapped with a high-powered magnet, all with no apparent ill effects. Black envisions using silica to maintain long-term scientific archives, such as medical information or meteorological data, for decades to come. Essentially, the technology can create archives that are isolated from the Internet and do not require power or special care. They can simply be locked away in a silo and should still be readable centuries from now. “Humanity has never stopped building microscopes,” says Black.
In 2019, Warner Bros. archived some of their back catalog on silica glass, including the 1978 classic Superman.
Black’s team also designed a library storage system for Project Silica. Shelves filled with thousands of glass squares line a small room in the Cambridge office. Attached to the shelves, bag-sized robots zip along them and occasionally stop, release one of the supports, and climb up or down between shelves before darting off again. When they arrive at a certain location, they stop and take one of the squares, about the size of a CD, from the shelf. Its content is read, and the robot returns to its position.
Meanwhile, deep in the vaults of an abandoned mine in Svalbard, Norway, GitHub is storing some of history’s most important software (including the source code for Linux, Android, and Python) on a special film that its creators claim could last more than 500 years. The material, manufactured by the company Piql, is coated with microscopic silver halide crystals that permanently darken when exposed to light. A high-power light source is used to create dark pixels just six micrometers in diameter, which encode binary data. A scanner then reads the data back. Instructions for accessing the information are written in English on each reel, in case there is no one else around to explain how it works.
In addition to the GitHub collection, the repository, known as the Arctic World Archive, also includes data provided by the Vatican and the European Space Agency, as well as diverse artwork and images from governments and institutions around the world.
Yale University, for example, stored a collection of software, including Microsoft Office and Adobe, as Piql data. Just a few hundred meters away is the Svalbard Global Seed Vault, a repository that preserves a selection of the world’s biodiversity for future generations. Data about what each seed container holds is also kept on Piql film.
Ensuring that this information is stored in formats that can still be decoded hundreds of years from now will be critical. As Cushman points out, we still argue about the correct way to play Charlie Chaplin’s films because the intended playback speed was never recorded. “When researchers try to access these materials decades from now, how expensive will it be to build tools to display them, and what will be the chances of getting it wrong?” he asks. Ultimately, the motivation for these projects is the idea that they will act as a backstop for humanity: a long-term solution that will withstand an apocalypse, an electromagnetic pulse from the Sun, or the end of civilization, and that will allow us to start again. Something to let people know we were here.
Accidents welcome
Sometime in the first century, a Roman woman named Claudia Severa was planning a huge birthday party at a fort in northern England. She asked a servant to write an invitation to one of her best friends on a wooden tablet, and then signed it with a flourish. Claudia could never have suspected that, almost 2,000 years later, the Vindolanda tablets (of which her invitation is the most famous) would be used to give us a unique insight into the daily lives of Romans in England at that time. This is always the way. Throughout history, the strangest and most random things have survived to serve as guides for historians.
The same will happen to us. Despite the efforts of archivists, librarians, and storage researchers, it is impossible to know for sure what data will still be accessible once we are long gone. And we might be surprised at what future generations find interesting when they come across it. Which batch of archived emails or TikToks will be the key to unlocking our era for future historians and anthropologists? And what will they think of us? Historians sifting through our digital detritus may be left with a host of unanswered questions and only best guesses to go on.
“You would have to ask who had digital technology,” says Bell. “And how did they power it? Who has to make choices about this? And how was it stored and distributed? Who witnessed it?” We don’t know what will still be in operation 20, 50 or 100 years from now. Maybe Google Photos’ cloud storage will have been abandoned, a giant pile of old hard drives buried in the ground.
Or perhaps, with luck, one of Scott’s spiritual heirs among the archivists will have saved it before it went under. Maybe someone downloaded it onto some kind of glass disk and kept it in a vault somewhere. Perhaps some future anthropologist will find it one day, dust it off, and discover that it is still readable.
They might pick a file at random, build some kind of software emulator, and find a billion photos from 2013.
And see a happy chubby girl sitting on the grass.
(NIALL FIRTH)
(Source: MIT Technology Review)