Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

July 19th, 2023

by the Archiving & Data Services team

Since 2020, the Internet Archive has been hard at work developing an exciting new service – ARCH (Archives Research Compute Hub). ARCH combines more than a decade of the Internet Archive’s experience supporting computational research, through efforts like Archive-It Research Services and broader Internet Archive data services, with our recent collaboration with the Archives Unleashed team at the University of Waterloo and York University.

After a year of private beta use, including by many Archive-It pilot partners, ARCH is now released as a publicly available service!


Screenshot of the ARCH interface.

What does ARCH do?

With ARCH’s intuitive interface, users can build custom research collections from existing digital collections, generate and access research-ready datasets, and analyze those datasets both in ARCH and in other analysis and visualization tools. In line with best practices in reproducibility, ARCH also supports open publication and preservation of user-generated datasets. ARCH is a key asset for enabling new types of access for research, scholarship, classroom instruction, and collection management. Optimized for web, text, and image collections, ARCH gives users the power to study and understand digital collections in new ways.

Image of ARCH's dataset generation interface

Generate datasets simply.

ARCH currently works with public Archive-It collections that cover a broad range of subjects, events, and timeframes. ARCH also works with various portions of the overall Wayback Machine global web archive going back to 1996. Users will be able to browse public Archive-It collections and select them for analysis, much as they can browse partner web archive collections and archived websites on archive-it.org. Through ARCH, users can analyze research collections built from multiple original institutional collections. For example, a user could combine multiple institutional climate change collections, thus creating the most representative research collection possible. ARCH will include the ability for collection stewards to know when their collections are being used in ARCH in order to support organizational reporting on usage and impact.

Archive-It partners who want to test out ARCH or promote it to their users and stakeholder communities, or who have any questions about the service, can contact us here. We will also be scheduling webinars and open calls over the next few months to introduce the Archive-It community to the ARCH service.

Who is ARCH for?

ARCH is for any user who seeks an accessible approach to working with digital collections at scale. Possible users include, but are not limited to, librarians and archivists seeking to support new uses of digital collections or to better understand their collections, and researchers and educators seeking to explore disciplinary questions computationally – e.g., in data science, digital humanities or social sciences, digital scholarship, AI, machine learning, and more! Recent research efforts making use of ARCH include, but are not limited to, analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

How can I get access to ARCH?

Libraries, archives, and museums may acquire an organizational account in order to provide access to any user. The cost of ARCH is calculated from the data processing and infrastructure costs associated with the volume of data being analyzed in the service, with some variance for different institution sizes and use cases. In lieu of an organizational account, any individual may sign up for the service or request access to a free sandbox account to begin experimenting with ARCH. All account types include the ability to generate unlimited research-ready datasets, access in-browser visualization, conduct analysis, and openly publish and permanently preserve derived datasets at archive.org.

As part of the release of ARCH, currently active Archive-It partners can get a trial ARCH account and use one of their Archive-It collections in ARCH to generate one-time datasets in order to better understand the service and its features. Early ARCH adopters will receive a discount in the first year of service.

Interested in learning more?

Please reach out to us via the following form or at arch@archive.org.

We look forward to talking with you!