Rushing to document and save: The War in Ukraine 2022 web archive

May 12th, 2022

by Liladhar R. Pendse, Librarian for East European, Central European, Central Asian and Armenian Studies Collections, UC Berkeley

I spent my youth in Belarus, Russia, and Ukraine during the Soviet era and was taught that the principle of Druzhba Narodov (“the Friendship of the Peoples”) was a multiculturally affirmative counterbalance to American Imperialism. I could never have imagined the conflict and destruction that we all now witness.

World War II caused unprecedented loss of life and property. The excesses of Stalinism suppressed cultural diversity. We’ve come a long way since. Technologies have evolved and increased our interconnectedness, which makes the scale of loss this time both unimaginable and tragically real.

As Ukraine tries to assert its multicultural national identity within its complex history and current democratic state status, governmental and non-governmental websites represent viewpoints or cultural-political information that could be lost if not preserved. And the outages began even before Russian tanks crossed the borders.

Ukrainians know the loss of access to cultural heritage resources from all-too-recent experience. Government websites went dark during Russia’s 2014-15 annexation of Crimea. Similar problems persisted in the internationally unrecognized separatist regions of the self-proclaimed Donetsk and Luhansk People’s Republics.

Screenshot of archived web page announcing state symbols in Crimea post-2014

A web capture from the Ukraine Crisis: 2014-2015 web archive announces the sanctioned forms and uses of state symbols in Crimea after Russian annexation.

Leveraging my recent experience curating the At-Risk Afghanistan Web Archiving Project (ARAWA): 2021 and Belarus Crisis 2020-2021, I decided early to begin archiving Ukrainian websites, with the specific goal of creating a sustainable, discrete topical archive that documents the Russian invasion of Ukraine. UC Berkeley Library’s Library Information Technology department contributed its Archive-It account to the effort, which now includes approximately 345 GB of data from 59 websites.

This was a good and necessary way to get started quickly, but a more extensive and sustainable archive required much wider collaboration. Parallel collecting efforts have since taken shape, led by Harvard University’s Ukrainian Research Institute and the multi-institutional Saving Ukrainian Cultural Heritage Online (SUCHO) project.

But I remember feeling desperate about how to go about safeguarding more resources without adding redundancies and duplicating labor, as each day in this war meant loss of information for the future. Being a pragmatic librarian, I am not much of a believer in miracles, but perhaps I was wrong.

I had an unexpected e-visit from two angels in disguise: Mirage Berry and Kody Willis from the Internet Archive. They jumped into action by creating an Archive-It account, with 2 TB of data allotted immediately, to house and expand the War in Ukraine: 2022 Web Archive and to preserve the narratives of the invasion as it occurs.

Academic neutrality and the representation of many diverse viewpoints are guiding principles of this preservation effort. To that end, I am privileged to include the work of Dr. Gudrun Wirtz of the Bavarian State Library and Kirill Babeev, a student at the London School of Economics and Political Science, in curating the archive.

Together we divided the project into seven distinct sections, or collections:

  • Russian Language Media
  • The Separatist Enclaves
  • Ukraine: Educational and Cultural Institutions
  • Ukraine: NGOs and Social Media
  • Ukraine: News
  • Ukrainian Governmental Websites
  • Ukrainian Public Figures/ Ministries on Twitter

Ukrainian governmental websites make up the most voluminous collection in the archive because of their special vulnerability to loss during Russian attempts at regime change. At the time of publishing, we have already collected more than 1 TB from 718 seeds in this collection. We also determined it necessary to collect the social media presences of various Ukrainian offices and officials in order to preserve their view of the Russian invasion.

Screenshot of archived web page from the Holodomor-Genocide museum in Kyiv, Ukraine

A web capture from the National Museum of the Holodomor-Genocide in Kyiv, Ukraine, announcing an emergency support fund for culture and media preservation in Ukrainian, Russian, and English.

Accordingly, we agreed to include the Luhansk and Donetsk separatist enclave websites and social media accounts so that future researchers can understand the conflicting and even irreconcilable viewpoints that they represent.

Finally, any web archive of the War in Ukraine is incomplete for scholarship unless it provides access to Russian narratives. So far, we have successfully collected 196 GB of data from 83 Russian-language seed sites. Authoritative voices inside the country can pose technical or contextual challenges for archiving. While a major media site like smotrim.ru is not accessible to the Internet Archive’s web crawling machines in California, independent media sites have been censored or taken down entirely within Russia.

Photograph of robots wrapped in Ukrainian flags painting a Lada car

A web capture of a photograph hosted by the Russian state-owned news agency RIA Novosti, appearing to show robots wrapped in Ukrainian colors painting a Soviet-era Lada sedan.

After just two eventful months of crawling, the web archive already includes 1.7 TB in total from 983 seed sites. Still, there are more narratives and more kinds of narrative that we must include to preserve these primary historical sources justly. For instance, we next plan to include more seeds from Western European sources.

While the war is likely to continue until either Russian aggression in Ukraine is defeated or the Russian government decides to stop the fight and restore peace in the region, the destruction and loss will be permanent. One decision to cross the international borders in February of 2022 will have consequences that reverberate into future generations. As archivists, we cannot be bystanders or silent witnesses; rather, we must do everything possible to steward the heritage and human stories threatened with erasure.