Exploring art resources on the web as data: A hands-on workshop for data analysis and instruction

August 14th, 2023

by Karl Blumenthal, Web Archivist, Internet Archive

Art librarians, web archivists, students, and educators gathered at the National Gallery of Art in Washington, D.C., last month to explore data from online art resources computationally and at scale. Attendees left the workshop with an understanding of web archive research use cases and how to support them. They learned the processes of creating web archive collections and analyzing them hands-on as data, which prepared them to consult with faculty and researchers on questions that call for computational methods.

IMLS and CARTA logos

Like the previous workshop held at the Art Libraries Society of North America (ARLIS/NA) conference in Mexico City, this event was supported by a generous grant from the Institute of Museum and Library Services (IMLS). The IMLS grant, A National Network of Art Libraries Building Web Archives, supports the Collaborative ART Archive (CARTA), which collected and shared the workshop’s sample web archive collection data.

Photographs of a tour group at the National Gallery of Art Library (left) and workshop attendees at work in the classroom (right)

Left: Executive Librarian Roger Lawson leads workshop attendees on a tour of the National Gallery of Art Library. Right: Karl Blumenthal, Web Archivist for the Internet Archive, tours attendees through the WARC file format.

Workshop participants learned about the curatorial and technical decisions that make a web archive collection, then constructed their own in response to researcher questions and institutional needs, using the Internet Archive’s Archive-It service to run their first web crawls. Collection topics included auction houses, alternative art spaces, regional art scenes, and art history e-journals. 

Drawing inspiration from tours of NGA’s library and modern art collections, participants then transformed their collections into datasets for computational research using the Archives Research Compute Hub (ARCH), the dataset engine developed in collaboration with the Archives Unleashed Project and with support from the Andrew W. Mellon Foundation.

Screenshot of the Voyant web platform for text analysis

Screenshot of a web page text dataset from the Art Galleries web archive collection interpreted by the natural language processing (NLP) tools hosted on the Voyant platform.

Using CARTA’s Art Galleries web archive collection as a demonstration, participants examined the contents of different dataset types and practiced parsing and reading them with free, browser-based tools. They visualized and explored their data as communication network graphs, digital object repositories, text mines, and more.
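For readers who prefer a script to a browser-based tool, the network graph exploration described above can be approximated in a few lines of Python. This is only a minimal sketch, not the workshop's own method: the sample rows are invented for illustration, and the column names (`crawl_date`, `source`, `target`, `count`) are an assumption about how a domain graph dataset might be laid out, so check the header of your own file before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for a domain graph dataset.
# The column names are an assumption; adjust them to match your file's header.
SAMPLE_CSV = """crawl_date,source,target,count
20230601,gallery-a.example,museum-b.example,12
20230601,gallery-a.example,journal-c.example,3
20230601,journal-c.example,museum-b.example,5
"""


def inbound_link_totals(csv_file):
    """Sum the link counts pointing at each target domain."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["target"]] += int(row["count"])
    return totals


# Read the in-memory sample; for a real file, pass open(path, newline="") instead.
totals = inbound_link_totals(io.StringIO(SAMPLE_CSV))

# List domains from most to least linked-to.
for domain, link_count in totals.most_common():
    print(domain, link_count)
```

Ranking domains by inbound links is one simple way to see which sites a collection's pages point to most often before moving on to a full graph visualization.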

Internet Archive staff will continue to iterate on these workshop materials to create more live training and on-demand tutorials for supporting computational research at the scale of web archives. In the meantime, anyone may find the sample data from this workshop, along with short guides to analyzing its contents with popular, open-source tools, in the new ARCH Help Center.

If you would like to see specific tutorials added or host an instructional event of your own, please reach out to the ARCH program and development team anytime at arch [at] archive [dot] org, and stay tuned for more updates!