A brief look at web accessibility in Archive-It

June 16th, 2022

by Tori Maches, Digital Archivist, UC San Diego,

Lydia Tang, PhD, Outreach and Engagement Coordinator, Lyrasis,

Tanya Ulmer, Web Archivist, Archive-It.

Introduction

In May 2021, the Society of American Archivists’ Web Archiving Section hosted a chat about web accessibility. A recurring question throughout the session was how accessible web archives are. We follow up with Tori Maches, Lydia Tang, and Tanya Ulmer to take a deeper look at the accessibility of archived websites in the Archive-It service, as well as the accessibility of the web archiving process.

How accessible are archived websites?

We looked at some of the earliest web crawls in Archive-It, from as far back as 1996, and used the Web Accessibility Evaluation Tool (WAVE) to do a very basic accessibility check. What we learned is that, in some ways, the web was more accessible then than it is now! While one can think of cringeworthy neon letters and dancing cats, which can be problematic for color contrast, there was less room for error overall (beyond bespoke coding adventures). The main tenets of accessibility for websites (proper headings, ordered lists, etc.) were usually present, particularly for the university and government websites found in the Archive-It service. As websites became more complex and visually rich, the foundational elements that screen readers still rely on today were more often missed.

As an example, we reviewed two different crawls of the City of San Diego’s homepage: a snapshot from September 2007, and a second snapshot from October 2021.

While the 2007 version of the page has a plain design typical of government websites in the mid-2000s, it also has high contrast between text and background colors, more consistent alt-text on images, and no dynamic content. However, it does use tables to structure the content, rather than screen reader-friendly headings.

Comparison of the Wayback page for the City of San Diego from 2007 on top with its same version analyzed by WAVE on bottom

Wayback page for the City of San Diego website in 2007 (top)
and the same Wayback page analyzed using the WAVE Tool (bottom)

In contrast, the 2021 version has a more contemporary, streamlined appearance, but it has lower contrast between text and background colors, empty links and buttons, and more missing alt-text. While the page no longer uses tables, it does have many unordered lists.

Comparison of the Wayback page for the City of San Diego from 2021 on top with its same version analyzed by WAVE on bottom

Wayback page for the City of San Diego website in 2021 (top)
and the same Wayback page analyzed using the WAVE Tool (bottom)
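The color contrast that WAVE flags can also be checked by hand. As a minimal sketch, the following Python computes the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG 2.x) for a pair of colors. The hex values are hypothetical placeholders, not colors measured from either City of San Diego capture, and the function names are our own. The 4.5:1 threshold is the WCAG AA minimum for normal-size text.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground: str, background: str) -> bool:
    """True if the pair reaches the WCAG AA minimum of 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


# Hypothetical color pairs, chosen only for illustration.
print(meets_aa_normal_text("#000000", "#ffffff"))  # black text on a white background
print(meets_aa_normal_text("#ffffff", "#ff8c00"))  # white text on an orange background
```

A ratio like this only covers the color pair itself. It says nothing about font weight or width, which also affect how readable text is in practice.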

Inaccessible elements common in present-day web design can also make it harder to archive websites. For example, in addition to not working well with screen readers, dynamic visual content is often more challenging to capture and replay. Prior to the release of the New Wayback in September 2021, it was often difficult or impossible to replay more complex visual elements such as expandable lists, pop-ups, data visualizations, and certain types of embedded content, or to know for sure whether they were captured. These elements are also obstacles to accessibility because they are so visually centered, which makes them challenging to convey in an auditory form.

Now, more of this content can be played back, but capturing it can still involve significant trial and error. For example, the current version of the City of San Diego homepage includes an Accessibility Tools pop-up with links to turn on screen reading, magnification, and screen masking; a contrast toggle; and information about other accessibility options and resources. This pop-up does not fully play back in the captured version (only the contrast toggle and “more info” link work), though it may work better after changes to the crawl’s scope.

The Accessibility of WARCs

Websites are currently archived using the Web ARChive (WARC) file format. A WARC file is essentially a container file of HTML, CSS, JavaScript, and other technical elements that get reconstituted for access through Archive-It’s Wayback. When looking at the web archiving process, we asked if and how it captures accessibility elements.
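To make the "container" idea concrete, here is a minimal Python sketch that reads records from an uncompressed WARC stream using only the standard library. It is an illustration under simplifying assumptions, not how Archive-It or Wayback reads WARCs: real files are usually gzip-compressed, carry more headers (such as WARC-Date and WARC-Record-ID) than the sample built here, and are better handled with a dedicated library. The function names and the example.org record are hypothetical.

```python
import io


def read_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC byte stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # blank lines separate one record from the next
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line at record start: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The block is exactly Content-Length bytes; for a "response" record
        # it holds the HTTP headers followed by the page content itself.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


def make_record(uri: str, body: bytes) -> bytes:
    """Build a simplified 'response' record for demonstration purposes only."""
    http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" + body
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(http)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + http + b"\r\n\r\n"


sample = make_record("http://example.org/", b'<img src="logo.png" alt="City logo">')
for headers, block in read_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

Because the HTML is stored as delivered by the server, any alt-text, headings, or lists in the page markup are carried into the record unchanged.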

Essentially, we found two main points:

  1. WARC files capture the main technical “bones” of a webpage (headings, lists, and alt-text).
  2. However, there are limitations to what WARC files currently capture. For example, when archiving a video, the crawler captures the MP4 video file, but does not appear to capture the caption text files (such as .srt) when they are present.
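One rough way to spot the caption gap described in the second point is to compare the list of URLs a crawl captured and flag videos with no caption file beside them. The Python sketch below assumes, hypothetically, that a caption file shares its video's path and base name (for example tour.mp4 and tour.srt). Many sites do not follow that convention, so a flagged video is a prompt for a manual check rather than proof that captions are missing. All URLs and names here are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit

VIDEO_EXTENSIONS = (".mp4",)
CAPTION_EXTENSIONS = (".srt", ".vtt")


def videos_missing_captions(captured_urls):
    """Return video URLs that have no same-named caption file among the captured URLs."""
    captioned = set()
    videos = []
    for url in captured_urls:
        parts = urlsplit(url)
        stem, ext = posixpath.splitext(parts.path)
        key = (parts.netloc.lower(), stem)  # same host, same path without extension
        ext = ext.lower()
        if ext in CAPTION_EXTENSIONS:
            captioned.add(key)
        elif ext in VIDEO_EXTENSIONS:
            videos.append((key, url))
    return [url for key, url in videos if key not in captioned]


# Hypothetical list of captured URLs.
captured = [
    "https://example.org/media/tour.mp4",
    "https://example.org/media/tour.srt",
    "https://example.org/media/council.mp4",
]
print(videos_missing_captions(captured))
```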

Comparing the WARC and ARC files for the City of San Diego’s archived pages from 2007 and 2021, we find many of those technical “bones” for the accessibility features mentioned above. The ARC file for the 2007 capture shows that the contrast between the text color and the background color was very high, making the text readable for low-vision users. Most image files also had reasonable alt-text coded in, as the alt-text for the image of then-Mayor Jerry Sanders on line 398973 shows:

Part of a 2007 ARC file for the City of San Diego capture with alt-text for an image of then Mayor, Jerry Sanders

Screenshot of a portion of the 2007 ARC file for the City of San Diego capture
with alt-text for the Mayor’s image

The footer of the website even has a useful link to a “Text-only version”, which is carried over into the ARC. On the other hand, the ARC shows that these HTML pages used tables to structure the content, rather than headings which are much friendlier to screen readers and improve accessibility overall.

The October 2021 WARC shows evidence of the dynamic features the City of San Diego has implemented for accessibility. While the default contrast for the City of San Diego website isn’t always the highest (for example, in the pink banner at the top), the Accessibility pop-up’s “high contrast” toggle is written into the WARC file, although its playback is not functional on the archived page itself. Alt-text is also present in the WARC. For example, see the alt-text applied to the City’s logo on line 101.

Screenshot of a portion of the 2021 WARC file for City of San Diego capture with alt-text for the city's logo

Screenshot of a portion of the 2021 WARC file for the City of San Diego capture
with alt-text for the city’s logo

This 2021 WARC file shows that tables were no longer used to structure the content on the page, but headings could be used more effectively to make the pages more accessible for everyone. Overall, many of the accessibility features do appear to get written into the WARCs, but they don’t necessarily function on the archived page as they do on the live site, or as well as users with disabilities might like.
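The kind of manual inspection described above (looking for alt-text, headings, and layout tables in the archived HTML) can be roughly approximated in code. The following Python sketch uses the standard library's HTML parser to tally those "bones" in a page's markup. It is far simpler than WAVE, the class and function names are our own, and the sample markup is invented for illustration rather than taken from either City of San Diego capture.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class AccessibilityBones(HTMLParser):
    """Collect a few structural accessibility signals from an HTML document."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = []  # src of each <img> with no alt attribute
        self.heading_levels = []      # e.g. [1, 2, 2, 3], in document order
        self.table_count = 0
        self.list_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # An empty alt="" is valid for decorative images, so only a
            # completely absent attribute is recorded here.
            if "alt" not in attrs:
                self.images_missing_alt.append(attrs.get("src") or "")
        elif tag in HEADING_TAGS:
            self.heading_levels.append(int(tag[1]))
        elif tag == "table":
            self.table_count += 1
        elif tag in ("ul", "ol"):
            self.list_count += 1


def audit(html: str) -> AccessibilityBones:
    """Parse the markup and return the populated collector."""
    parser = AccessibilityBones()
    parser.feed(html)
    parser.close()
    return parser


# Invented markup in the style of a table-based layout.
SAMPLE = """
<table><tr><td>
  <img src="mayor.jpg" alt="Portrait of the mayor">
  <img src="spacer.gif">
</td></tr></table>
<ul><li>Services</li></ul>
"""

report = audit(SAMPLE)
print("Images missing alt:", report.images_missing_alt)
print("Heading levels:", report.heading_levels)
print("Tables:", report.table_count, "Lists:", report.list_count)
```

A check like this only confirms that the markup was written into the archive. It cannot tell whether dynamic features, such as a contrast toggle, still work on playback.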

The Accessibility of Archive-It’s Public Interface

Next, we looked at the accessibility of the Archive-It public website. As we mentioned earlier, accessibility can be an ever-evolving target, but there are some primary elements that we looked at.

Using the WAVE accessibility Chrome browser extension, we assessed the Archive-It website, archive-it.org. At this very basic glance, it looks pretty good. WAVE didn’t detect any errors, and identified several positive accessibility attributes. The primary and pervasive drawback was the low-contrast theme colors: white or gray text on an orange or green background. An additional detail to note is that the WAVE tool doesn’t consider fonts when assessing overall contrast and visibility. So while the top menu text of “Home, Explore, Learn More,…” appears to be low contrast on visual inspection because the font is very narrow, it does not register as low contrast in the automated checker.

Screenshot of the WAVE tool analyzing the homepage for archive-it.org

Screenshot of a portion of Archive-It’s public website’s homepage analyzed with the WAVE tool

Screenshot of the WAVE tool analyzing a collection landing page on archive-it.org

Screenshot of a portion of The University of San Diego’s Human Rights at the California-Mexico Border Website Collection landing page on archive-it.org analyzed with the WAVE tool

In examining the calendar view of an archived website, WAVE noted that some alt-text was long and that some text was very small, but otherwise didn’t identify any color contrast or structural errors. However, when trying to navigate the calendar view with ZoomText (a screen magnifier and reader for low-vision users), there was no audio indication of when an archived page was available. It simply read through each day of every month, without acknowledging the visual bolding of the archived dates or providing further information. The timeline with the vertical bars fared similarly: ZoomText only read off the progressing years, without indicating which had archived pages available. Of course, this is just an example using a single tool (with much thanks to Jeff Swada for the demonstration!), but it indicates that further accessibility and usability testing would be worth deeper inquiry.

Screenshot of the WAVE tool analyzing a Wayback Calendar page

Screenshot of a portion of the Calendar Page for the Seed URL http://www.cndh.org.mx/
analyzed using the WAVE tool

Conclusion

Researching this blog post was an educational journey for all of us. None of us are truly accessibility experts, but we are continuously striving to learn more. Here are some links to resources that have been helpful to us as we learned more about accessibility:

An Early History of Web Accessibility by Jay Hoffmann for The History of the Web blog on January 14, 2019. Overview of web accessibility as a field from the mid-1990s to early 2000s, including the development of the WAVE accessibility checker.

Digital Library Federation’s Digital Accessibility Working Group. From their wiki: “The DLF Digital Accessibility Working Group (DAWG) exists to explore issues related to ensuring that the digital resources within information organizations meet the needs of disabled users and staff.” We also found their documentation on Accessibility Auditing Resources and their Accessibility Auditing Shortlist to be helpful.

Society of American Archivists’ Guidelines for Accessible Archives for People with Disabilities. Recommendations for making spaces and services in archives inclusive, and ways to improve accessibility even with limited institutional resources.

W3C Web Content Accessibility Guidelines. From the introduction: “Following these guidelines will make content more accessible to a wider range of people with disabilities, including accommodations for blindness and low vision, deafness and hearing loss, limited movement, speech disabilities, photosensitivity, and combinations of these, and some accommodation for learning disabilities and cognitive limitations.”

Slide deck from SAA Web Archiving Section/Accessibility and Disability Section coffee chat by Lydia Tang. Introduction to web accessibility and its relationship to web archives.

How a web designed for the visually impaired is a better web for everyone by Jason Webber for the UK Web Archive Blog on Dec. 15, 2021. Describes the evolution and effect of web accessibility guidance, implementation, and law in the UK, with examples from archived web pages.

International Association of Accessibility Professionals’ Certificates. Information about accessibility as a field, and certificate programs for developing expertise in various specializations within the accessibility field.