eDiscovery for Today: Handling Modern Data

Heidi Amaniera

October 25, 2023

Many of us on the eDiscovery front line have noticed how much data has changed over the last decade. In the past, it was unusual to get a request to process chat data. And since COVID, requests to handle audio/video (“A/V”) files have risen sharply. In fact, unstructured data now makes up 80-90% of all new data, and it is growing three times faster than structured data. By 2025, it is expected to reach 175 zettabytes, five times its 2018 size! [1]

What exactly is unstructured data? The technical definition is any information that is not arranged according to a preset data model or schema. Simply put, if the data is not organized in a manner that makes it easy to process, then it is unstructured. Examples of unstructured data include social media content, chat data, content created with collaborative software, rich media, and machine-generated data.

Challenges

Structured data is generally very easy to process. It has clear and identifiable metadata that can be captured, and its content can be displayed in a standard document viewer. This is not the case with unstructured data. Below are four main reasons why unstructured data poses challenges in the eDiscovery world:

Even if the data has an internal structure of some kind, it usually cannot fit into a pre-defined data model. This makes it very difficult to collect, process, and present with the most common eDiscovery software.
Unstructured data commonly contains billions of items that require not only definition but also a means to pre-filter and manage them in a manner consistent with the needs of the project, without trying to boil the ocean.
The collaborative elements common in unstructured data require special handling to ensure that content, creator/editor, version history, and other tracking information are preserved.
Unstructured data is commonly updated with new content that may require repeated collections during the discovery process of a project.

As a common example of these challenges, think of that chat thread you have had going for the last three years. A few days or a week may go by with no activity, and then bam! The same players pick up the conversation to provide updates on the same topic or to bring up a new but similar item. Or maybe it’s a team chat conversation that covers many different topics. Now, what part of that chat is related to your eDiscovery project? Where is the cut-off point? How much of that thread do you want to view per record in your database? How do you track who was added to the chat and when, and who was removed and when? All of these challenges require a game plan of some kind. Below we will review a couple of scenarios that demonstrate how cutting-edge eDiscovery software like OpenText™ Axcelerate™ handles these challenges.

Chat Data

Axcelerate Chat features support chat data from:

Bloomberg
Slack
Microsoft Teams
Mobile device collections using:
- Cellebrite
- XRY
- Oxygen
XML formats

Chat data can be ingested into Axcelerate either from an export or directly from the chat application by using one of the Axcelerate-compatible connectors (see the section on connectors below). There are no native chat files associated with this type of data; it doesn’t export like MSG files into an email container. This data is primarily a stream of information, encoded in a manner that needs to be reassembled for easy viewing (see examples below).

[Example 1: Axcelerate Near Native view]

[Example 2: Axcelerate Text view]

There are a couple of options for avoiding lengthy chat documents. The default is to split chats by day, although the system never splits channel messages from their replies, even if the replies span several days. There is also an Adaptive Chat Splitting option that splits chats at identified gaps in the dates of the conversation thread. Chat splitting can also be disabled, for example, if the case team wants to produce a complete chat as one document.
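To make the two splitting strategies more concrete, here is a minimal Python sketch. It is not Axcelerate's implementation: the `Message` record, its field names, the function names, and the eight-hour gap threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Message:
    # Hypothetical message record; the field names are illustrative only.
    msg_id: str
    sent: datetime
    author: str
    text: str
    parent_id: Optional[str] = None  # set when the message is a reply


def split_by_day(messages):
    """Group messages into one document per calendar day.

    A reply is filed under the day of the message that started its
    thread, so a message is never separated from its replies.
    """
    by_id = {m.msg_id: m for m in messages}

    def thread_root(m):
        # Follow parent links up to the message that started the thread.
        while m.parent_id is not None and m.parent_id in by_id:
            m = by_id[m.parent_id]
        return m

    docs = {}
    for m in sorted(messages, key=lambda m: m.sent):
        docs.setdefault(thread_root(m).sent.date(), []).append(m)
    return [docs[day] for day in sorted(docs)]


def split_by_gap(messages, max_gap=timedelta(hours=8)):
    """Start a new document whenever the chat pauses for longer than max_gap."""
    docs = []
    previous = None
    for m in sorted(messages, key=lambda m: m.sent):
        if previous is None or m.sent - previous.sent > max_gap:
            docs.append([])  # the silence was long enough to open a new document
        docs[-1].append(m)
        previous = m
    return docs
```

In this sketch, a reply sent two days after the original message still lands in the same day-based document as that message, while the gap-based split ignores calendar boundaries and only looks at how long the conversation went quiet.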

Whatever option is selected, all chat documents belonging to the same chat can easily be located in Axcelerate with the Chat History Smart Filter. Other chat-specific filters include Chat Platform, Chat Event Type, and Chat Count. The Chat Event Type filter is particularly useful for identifying when participants are added or removed.
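As a rough illustration of why event types matter, the Python sketch below replays join and leave events to work out who was in a chat at a given moment. The event tuples, the `"join"` and `"leave"` labels, and the `members_at` function are hypothetical and do not reflect Axcelerate's data model or filter values.

```python
from datetime import datetime


def members_at(events, when):
    """Return the set of participants present in a chat at time `when`.

    `events` is a list of (timestamp, kind, person) tuples, where kind is
    "join" or "leave". These labels are assumptions for this sketch.
    """
    present = set()
    for timestamp, kind, person in sorted(events):
        if timestamp > when:
            break  # later events do not affect membership at `when`
        if kind == "join":
            present.add(person)
        elif kind == "leave":
            present.discard(person)
    return present


events = [
    (datetime(2023, 1, 1), "join", "alice"),
    (datetime(2023, 1, 1), "join", "bob"),
    (datetime(2023, 3, 1), "leave", "bob"),
    (datetime(2023, 4, 1), "join", "carol"),
]
print(members_at(events, datetime(2023, 2, 1)))
```

Replaying the events in time order answers the question raised earlier: who could have seen a given message depends on who had joined, and not yet left, when it was sent.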

Chat Viewer

The optional Axcelerate Chat Viewer has extra features that make chat review easier. These include the ability to highlight selected chat contributors using the Members list (see example below).

[Example 3: Chat Viewer: Near Native]

While viewing a chat document in the Chat Viewer, users can select specific messages within the chat thread for redaction and production purposes. After selecting the desired messages, users can switch to the Redaction view, which displays only the selected messages for that document. They can then redact and mark up this view of the document and ready it for production. This simplifies the process of producing only the relevant messages in a chat thread without manually redacting the non-relevant portions. To learn more about Axcelerate’s chat features, take a look at this short YouTube video.

Audio/Video Support

Axcelerate CE rolled out support for audio and video (“A/V”) files in its 23.4 release. Because A/V files are not inherently searchable, a Transcription feature now quickly generates text from your media. This release also includes an Audio-Video Player for easy playback. Users can even synchronize the playback with the media’s transcript (see example below).

[Example 4: Audio-Video Player]
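One simple way to think about synchronizing playback with a transcript is as a lookup: given the current playback position, find the transcript segment that started most recently. The Python sketch below shows that idea with a binary search. The segment layout (start time in seconds plus text) and the `segment_at` function are assumptions for illustration, not a description of how the Audio-Video Player works internally.

```python
from bisect import bisect_right


def segment_at(segments, position):
    """Return the transcript text being spoken at `position` seconds.

    `segments` is a list of (start_seconds, text) pairs sorted by start
    time. Returns None if playback has not reached the first segment.
    """
    starts = [start for start, _ in segments]
    index = bisect_right(starts, position) - 1  # last segment starting at or before position
    return segments[index][1] if index >= 0 else None


transcript = [
    (0.0, "Good morning, everyone."),
    (2.5, "Let's get started."),
    (6.0, "First item on the agenda."),
]
print(segment_at(transcript, 3.0))
```

The same lookup works in reverse for navigation: clicking a transcript line would simply seek the player to that segment's start time.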

OpenText plans to continue expanding this new feature to include many more A/V support options, including:

Ability to redact A/V files and/or their transcripts by timeline or time stamp selection
Generation of redaction list menu for further navigation options
Ability to assign redaction reasons to each markup
Production of redacted A/V files natively

This new feature is covered in more detail in a separate blog post.

Other Unstructured Data and Use of Connectors

Another great way that Axcelerate supports other types of unstructured data is through Data Connectors. These connectors allow data to be ingested directly into Axcelerate from the original source system. This not only saves time, because there is no longer any exported data to deal with, but also allows seamless transfer of other unstructured data from systems like Box, Atlassian Confluence, Google Drive, and more. Currently, Axcelerate has over 40 connectors.

In addition to Axcelerate’s chat features, newly released A/V support for Cloud, and Data Connectors, the platform has many other options for addressing unstructured data. For example, there are parsers for seamlessly ingesting forensic images without mounting the data. The Cloud Edition of Axcelerate also has a special Native Excel viewer for seamless review of large Excel files. For more information, please take a look at the Axcelerate Product Overview.

[1] See Researchworld.com article “Possibilities and Limitations of Unstructured Data”.

Heidi Amaniera

Heidi Amaniera is a Director in LegalTech Professional Services with worldwide leadership responsibilities over off-cloud and public cloud implementation, enablement, managed services, and consulting for Axcelerate. Heidi’s background includes management of eDiscovery services within both the vendor and law firm environments. She also spent over 15 years as a litigation paralegal specializing in Intellectual Property.
