Choosing an intelligent document processing (IDP) solution to automate your manual and paper-based processes can be difficult. With multiple options available, how do you ensure you pick the one that will provide the most return on your investment?
One key thing companies often overlook is the approach solutions use for extracting data. Solutions that use a “hybrid approach” leveraging multiple technologies and methodologies typically provide better results — with less cost and effort — than traditional IDP solutions that use just one or two.
We sat down with ClearDox® Chief Technology Officer Marc LeFebvre to discuss the differences between hybrid and traditional approaches to IDP software solutions, the advantages of the hybrid approach and more.
ML: A traditional approach involves using optical character recognition (OCR) technology in combination with one artificial intelligence (AI) and machine learning (ML) solution to extract data from documents. A hybrid approach leverages multiple technologies and methodologies in a plug-and-play fashion tailored for each document type. With a hybrid approach, data from a form-based document like an invoice would be extracted using different techniques than the ones used to extract data from a text-heavy document like a contract.
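To make the "tailored for each document type" idea concrete, here is a minimal sketch of per-document-type routing. It is not the ClearDox implementation: the stage names (`form_field_model`, `language_model`, `clause_parser`, `general_model`) and the `stages_for` helper are hypothetical, chosen only to illustrate that an invoice and a contract can be sent through different chains of techniques.

```python
from typing import Dict, List

# Hypothetical stage names for illustration only; a real system would
# map each name to an actual OCR engine, AI/ML model, or parser.
PIPELINES: Dict[str, List[str]] = {
    "invoice": ["ocr", "form_field_model"],                  # form-based document
    "contract": ["ocr", "language_model", "clause_parser"],  # text-heavy document
}

# Assumed fallback for document types with no tailored pipeline.
DEFAULT_PIPELINE: List[str] = ["ocr", "general_model"]


def stages_for(doc_type: str) -> List[str]:
    """Return the ordered extraction stages for a document type."""
    return PIPELINES.get(doc_type.lower(), DEFAULT_PIPELINE)


if __name__ == "__main__":
    for doc_type in ("invoice", "contract", "memo"):
        print(doc_type, "->", stages_for(doc_type))
```

A traditional approach, in these terms, would amount to a single fixed list of stages applied to every document type.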
With both approaches, models must be built for each document type that needs to be processed. However, the effort is much more costly and labor-intensive with a traditional approach. For example, if you want to process bills of lading (BOLs), a couple hundred BOLs must be fed into the system so the AI/ML solution can learn how to extract the required data points from all the BOL formats you receive.
Most AI/ML technologies will extract only around 90 to 95 percent of what you need, so the process usually needs to be repeated with new documents until everything works correctly. It’s a very time-intensive and expensive endeavor that often involves a data scientist or AI/ML developer.
With a hybrid approach, the IDP solution can build models much more quickly and with fewer documents. If one AI/ML solution doesn’t find all the needed data from the document sample set, additional AI/ML solutions and even other technologies are used to find and extract it so the model is built right the first time around. Because things are up and running faster, implementation costs are significantly lower than when a traditional approach is used.
ML: I think that’s a fair statement because with a hybrid approach, the customer has more control over the extraction process — they can use the technologies that will deliver the most accurate results for each document type.
For example, our ClearDox solution can extract data from just about any document type with extremely high accuracy because there is a wide range of extraction options baked into it. These options consist of multiple OCR engines, multiple AI/ML solutions and our proprietary parsing engine.
We use all the options in conjunction with each other like a plug-and-play processing pipeline. For instance, inventory reports can be passed through an OCR engine, then through an AI/ML solution in combination with our parsing engine to fill in any gaps. Since we’re not committed to any one technology or product, we can use the one that’s best for each situation.
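The gap-filling behavior described above can be sketched as a chain of extractors, where each later extractor is asked only for the fields the earlier ones missed. This is an illustrative sketch under stated assumptions, not ClearDox's actual pipeline: `ml_model_stub`, `rule_parser_stub` and `run_pipeline` are hypothetical names, the regular expressions stand in for real AI/ML and parsing engines, and the input is assumed to be text already produced by an OCR step.

```python
import re
from typing import Callable, Dict, List

# An extractor takes OCR text and returns whichever fields it could find.
Extractor = Callable[[str], Dict[str, str]]


def ml_model_stub(text: str) -> Dict[str, str]:
    """Hypothetical stand-in for an AI/ML extractor that only finds 'product'."""
    found: Dict[str, str] = {}
    match = re.search(r"Product:\s*(\w+)", text)
    if match:
        found["product"] = match.group(1)
    return found


def rule_parser_stub(text: str) -> Dict[str, str]:
    """Hypothetical stand-in for a rule-based parser that finds 'quantity'."""
    found: Dict[str, str] = {}
    match = re.search(r"Quantity:\s*(\d+)", text)
    if match:
        found["quantity"] = match.group(1)
    return found


def run_pipeline(
    text: str, required: List[str], extractors: List[Extractor]
) -> Dict[str, str]:
    """Run extractors in order; later ones only fill fields still missing."""
    result: Dict[str, str] = {}
    for extract in extractors:
        missing = [field for field in required if field not in result]
        if not missing:
            break  # every required field is already filled
        candidate = extract(text)
        for field in missing:
            if field in candidate:
                result[field] = candidate[field]
    return result


if __name__ == "__main__":
    ocr_text = "Product: Wheat\nQuantity: 500"  # assumed OCR output
    print(run_pipeline(ocr_text, ["product", "quantity"],
                       [ml_model_stub, rule_parser_stub]))
```

With only `ml_model_stub` in the list, the `quantity` field would stay empty, which corresponds to the single-technology case; adding the second extractor fills the gap without retraining the first.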
As mentioned, with a traditional approach you only get around 90 percent of the way to where you need to go. If you process 10 documents, for example, eight might turn out great, but two won’t. So now you need to re-train your model by feeding it 50 more documents.
“With a traditional approach to IDP, it’s like using a hammer to solve every problem that pops up. With a hybrid approach, you don’t just have the hammer — you also have the screwdriver, wrench and pliers so you can use whatever’s most appropriate for the situation.”
Marc LeFebvre, Chief Technology Officer, ClearDox
ML: While all document types benefit from a hybrid approach, complex or unstructured documents such as contracts, BOLs and inventory reports definitely benefit the most. These documents often require custom data extraction and processing logic that’s not supported by AI/ML models. Successfully processing these documents requires a hybrid IDP solution like ClearDox that can leverage multiple methodologies and technologies.
ClearDox CTO Marc started writing code in 1996 and hasn’t looked back since. He has led software development for companies spanning a variety of industries including commodities, e-commerce and education.