Continuous machine learning: keep up with the digital document deluge

Daren Gardner

February 1, 2024 · 3 minutes read

Digital business files have replaced many paper documents, and the volume of content is expected to soar in the coming years. Every day, I talk to organizations leveraging intelligent document processing solutions to help them cope with the digital document deluge. But even today’s automated platforms can fall behind.

Traditional machine learning lost a step

As document content and layouts change over time, capture systems need costly, time-consuming manual upkeep that cuts into efficiency and revenue. AI adds efficiency and accuracy to automated capture workflows, but machine learning models also take time and resources to train and calibrate.

Machine learning accuracy drifts and degrades over time as the layouts of incoming documents change. Keeping models accurate relies on data scientists working through a labor-intensive cycle of retraining, sometimes at the code and database level. These specialized skill sets and activities come at a considerable cost to the organization. Updates also tend to be infrequent and made without input from key knowledge workers.
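To make the idea of drift concrete, here is a minimal sketch of a rolling accuracy monitor. It is an illustration only, not part of any OpenText product: the class name `RollingAccuracy`, the window size, and the alert threshold are all assumptions chosen for the example.

```python
from collections import deque


class RollingAccuracy:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=50, alert_below=0.9):
        self.outcomes = deque(maxlen=window)  # True = prediction was correct
        self.alert_below = alert_below        # assumed threshold, tune per use case

    def record(self, correct):
        """Store whether the latest prediction matched the verified value."""
        self.outcomes.append(bool(correct))

    @property
    def accuracy(self):
        """Share of correct predictions in the window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifting(self):
        """Flag drift only once the window is full, to avoid early false alarms."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy < self.alert_below


monitor = RollingAccuracy(window=10, alert_below=0.9)
for _ in range(10):
    monitor.record(True)       # a familiar layout: every prediction is right
stable = monitor.drifting()

for _ in range(3):
    monitor.record(False)      # a new layout arrives and predictions start failing
drifted = monitor.drifting()
```

In a periodic retraining cycle, a signal like this would only trigger a ticket for the data science team; the model itself stays unchanged until the next retraining run.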

Take the leap with continuous machine learning

A shift is under way from traditional machine learning to continuous machine learning (CML). Many organizations have turned to CML for the content classification and data extraction that intelligent document processing depends on. With CML, models are updated on the go as they encounter new data and layouts in production. Updates occur in real time and in small batches, which reduces computation time. More importantly, CML reduces the data and human resources required to retrain machine learning models.
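As a rough illustration of updating a model in small batches, the sketch below uses a toy naive Bayes document classifier whose counts are adjusted as each batch arrives, so there is no full retraining step. The class `IncrementalDocClassifier`, its method names, and the sample documents are hypothetical; this is a generic online-learning example, not a description of how any vendor implements CML.

```python
import math
from collections import Counter, defaultdict


class IncrementalDocClassifier:
    """Toy multinomial naive Bayes whose counts are updated batch by batch."""

    def __init__(self):
        self.class_counts = Counter()             # documents seen per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def partial_fit(self, batch):
        """Fold a small batch of (text, label) pairs into the model."""
        for text, label in batch:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        """Return the most likely label, or None if nothing has been learned."""
        if not self.class_counts:
            return None
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalDocClassifier()
model.partial_fit([
    ("invoice number total amount due", "invoice"),
    ("purchase order ship to quantity", "purchase_order"),
])
# A document type the model has never seen is forced into a known class.
before = model.predict("remittance advice payment reference")

# One small production batch is enough to introduce the new class.
model.partial_fit([("remittance advice payment reference amount paid", "remittance")])
after = model.predict("remittance advice payment reference")
```

Because each update only increments counters, its cost scales with the size of the new batch rather than with the full training history, which is the property the paragraph above attributes to small-batch updates.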

How does OpenText leverage CML for information capture and intelligent document processing?

OpenText leverages a CML approach that offers flexibility, accuracy, and efficiency for automated information capture while minimizing or eliminating manual machine learning model retraining.

OpenText information capture products and intelligent document processing solutions solve the machine learning challenge by embedding CML. An AI approach to information capture and data extraction, continuous machine learning eliminates data staleness through an ongoing refresh as the model self-corrects and relearns. Humans in the loop ensure data accuracy as part of daily production runs – eliminating the need for week- and month-long pauses as data scientists scrub data sets to retrain models.

The OpenText approach to CML relies on methodology embedded in its Information Extraction Engine (IEE). Data and differing layouts can quickly be reinforced with just a few clicks by a knowledge worker using a human-in-the-loop UI. IEE continuously assesses human feedback to reinforce or adjust the model accordingly. IEE eliminates the need for a team of data scientists to maintain and retrain machine learning models.
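The feedback cycle described above can be sketched in a few lines. Everything here is an assumption made for illustration: `FieldRule`, `FeedbackLoop`, the anchor-text rule format, the confidence step of 0.2, and the review threshold of 0.8 are invented for the example and do not reflect IEE's internal design.

```python
from dataclasses import dataclass, field


@dataclass
class FieldRule:
    """Hypothetical learned rule: which label text precedes a field's value."""
    anchor: str
    confidence: float = 0.5


@dataclass
class FeedbackLoop:
    rules: dict = field(default_factory=dict)  # (layout, field name) -> FieldRule
    review_threshold: float = 0.8              # assumed cut-off for human review

    def extract(self, layout, field_name, lines):
        """Return (value, needs_review) for one field of one document."""
        rule = self.rules.get((layout, field_name))
        if rule is None:
            return None, True  # unseen layout: always ask a person
        for line in lines:
            if line.startswith(rule.anchor):
                value = line[len(rule.anchor):].strip()
                return value, rule.confidence < self.review_threshold
        return None, True  # anchor not found: layout may have changed

    def confirm(self, layout, field_name):
        """Reviewer accepted the value: reinforce the rule."""
        rule = self.rules[(layout, field_name)]
        rule.confidence = min(1.0, rule.confidence + 0.2)

    def correct(self, layout, field_name, anchor):
        """Reviewer pointed at the right label: replace the rule."""
        self.rules[(layout, field_name)] = FieldRule(anchor=anchor)


loop = FeedbackLoop()
doc = ["Inv #: 10293", "Total: 88.10"]

first = loop.extract("vendor_a", "invoice_number", doc)   # nothing learned yet
loop.correct("vendor_a", "invoice_number", "Inv #:")      # one click by a reviewer
second = loop.extract("vendor_a", "invoice_number", doc)  # extracted, still reviewed
loop.confirm("vendor_a", "invoice_number")
loop.confirm("vendor_a", "invoice_number")
third = loop.extract("vendor_a", "invoice_number", doc)   # extracted, no review needed
```

The point of the sketch is the shape of the loop: a correction takes effect on the very next document, and repeated confirmations gradually remove that layout from the review queue.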


Ready to learn more about CML?

Download the position paper "Continuous machine learning: Your AI edge" for more information about:

  • How CML recognizes documents
  • How to ensure humans are in the loop
  • What’s coming next in CML for intelligent document processing

Daren Gardner

Daren is the Director of Product Management for Capture products at OpenText. His team of product managers is dedicated to providing best-in-class capture technology, products, and services. Daren has focused on capture for more than 20 years, and during this time his team has created many innovative capture products and services. In his spare time, Daren enjoys restoring antique clocks and building full-size working replicas of astromech droids from Star Wars.
