IDP and OCR in Questions and Answers: The Main Things You Need to Know

When we were preparing this material, we argued for a long time about which characters could be used to visually compare OCR and IDP. The Coen brothers, Thor and Loki, Caesar and Brutus, and many others were suggested. As you can see, we at Smart Engines did not come to a consensus on this matter.


Hello, tekkix! Today we will discuss what IDP and OCR are, how they fundamentally differ, and where the truth lies (along with Smart Engines). No long introductions: let's go!

For those who want it in a nutshell ⬇️


OCR significantly surpasses IDP. And here's why:

  • IDP operation is impossible without OCR technologies. No intelligent processing can occur without prior recognition and extraction of data from the document;

  • contextual analysis, which underlies IDP, can correct OCR errors, but harms data processing accuracy. For users who require complete correspondence of information in the document and its digital copy, this is unacceptable;

  • good OCR works accurately and autonomously, and does NOT require external operators to process document images. No HITL and no clouds! And if the recognition is accurate, IDP has nothing to correct;

  • people need quality OCR, even if they are confused by the terms. The market keeps growing, but let's be honest: only a handful can build their own quality OCR. Those who want to but cannot have to take someone else's OCR (from Google or Amazon) and sell it in IDP packaging;

  • for science, IDP does not exist, at least for now. This follows from the statistics of mentions of OCR and IDP in scientific publications and reports.

More details below.

Contents:

  • What is OCR

  • On the use of OCR

  • A few words about IDP

  • What's the difference?

  • Part about milk 🥛

  • And the part about the cow 🐄

  • How about scientifically?

What is OCR

Optical Character Recognition or OCR is a technology that allows you to recognize and analyze characters in images and video streams and convert them into a machine-readable editable format. In other words:

OCR is a technology for "reading" an image and extracting textual information from it

Modern OCR solutions can automatically improve image quality, add contrast, or increase sharpness to improve recognition accuracy. OCR algorithms using machine learning and ultra-light neural networks identify and extract document content, and if difficulties arise, they mark the problematic area for subsequent human evaluation.
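The "mark the problematic area" step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any vendor's API: the `RecognizedField` type, the per-field `confidence` score, and the 0.9 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedField:
    """One recognized field (hypothetical structure for this sketch)."""
    name: str
    text: str
    confidence: float  # assumed score in [0, 1] reported by the recognizer


def flag_for_review(fields: List[RecognizedField], threshold: float = 0.9) -> List[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


fields = [
    RecognizedField("surname", "IVANOV", 0.99),
    RecognizedField("number", "4510 12345G", 0.62),  # doubtful last character
]
print(flag_for_review(fields))
```

The recognized text is never altered here; a low score only routes the field to a person.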

OCR extracts information from any source: scans, photos, or videos, in an application or a browser window, and so on. Moreover, it can "read" and recognize with character-by-character accuracy even frankly poor "inputs": with distorted proportions, creases, spines, heraldic lines, security elements such as holographic details or a guilloche background, and the like. And all this is fast and reliable, without entrusting the content of documents to third parties. Our OCR, by the way, has recently become available even for document verification to detect forgeries.

In general, the main task solved by OCR is the automatic accurate conversion of characters into text that can be edited if necessary. And the main direction of work in this area today is document recognition, including ID recognition and authentication.

Read more about the history of the development of Russian computer vision technologies, which recently celebrated their thirtieth anniversary, here and here.

Russian OCR is 30 years old. We recall how the first domestic recognition technology appeared (Part 1. OCR Tiger) In 2023, the first Russian commercial text recognition technology turned exactly 30 years old ...habr.com

The rapid development of OCR technologies in our country prompted programmers to create a program capable of scanning a key document: the Russian passport. This opened the way for subsequent innovations and improvements. We also described how domestic OCR systems "mastered" the Russian passport here.

So, conclusion number one: modern high-quality OCR is fast, accurate, and reliable. In a word, cool.

On the use of OCR

For every industry that actively works with documents, whether banking, retail, telecom, libraries and archives, medicine, logistics, manufacturing, or many others, OCR has long been a familiar and integral part of the workflow. Don't believe it? Count for yourself how many of the following you recognize.

  1. Digitization of text data: OCR is widely used to convert paper documents or books into digital files, making it easy to search, edit, and store information electronically. This, by the way, is where it all started. A paper document can be torn, lost, or given to a dog to chew on, which is unlikely to happen to a digital one. In addition, digitization automatically means increased accessibility, and users understand this perfectly. The proof is the piles of important books, invoices, receipts, and other documentation that have been digitized and converted to PDF.

  2. Automation of business processes: OCR allows companies to automate data entry, minimizing manual input and, in some tasks, eliminating it completely. This significantly increases productivity and reduces the risk of errors, which is especially important in areas that demand accuracy and confidentiality. The result is a very real benefit: savings on costly "oops" and "oops((" moments.

  3. Fast text recognition on images: Used in server and mobile solutions, it allows recognizing text (including handwritten) from photos or scans. This feature allows you to extract information from bank statements, legal contracts, invoices, and other documents of any size, shape, and seriousness in a matter of moments. Recognized and forgotten.

  4. Security and control systems: OCR is used in access control systems, where it is necessary not only to read the data in identity documents but also to check their authenticity. The goal is to keep out those who should not get in, detect fraudsters, and generally make our world safer. For example, such solutions are used effectively today in automatic checkpoint systems at airports, at airline and railway ticket offices, and, of course, in banks.


A few words about IDP

Now to IDP. Here, everything is not so straightforward: alas, no one has yet managed to formulate a more or less coherent definition that the scientific community accepts. IDP stands for Intelligent Document Processing. In a nutshell, it combines OCR with technologies for interpreting text, extracting valuable information, and processing that information the way a human would. At the same time, it demonstratively ignores OCR and is presented as a new stage in the evolution of recognition systems. In short:

IDP reads and "thinks" about the extracted text, evaluates and refines it

Some argue that the distinguishing feature of IDP is the use of natural language processing (NLP) technology. Thanks to it, the extracted information is structured with the context of the original data in mind. And the more interpretations a single word or phrase has, the more complex the process becomes. For example, is "замок" a fortress or a door mechanism? Is "машина" a car or a computer? And that is just the beginning: what about "coffee drink", "grain product", "carbonated drink", "dairy product"? And "red light"?

Here is another catch. As noted above, IDP uses OCR to convert text into a machine-readable format (an essential step without which even a super-IDP will not work), and then uses machine learning technologies to interpret the data contained in the document. Therefore, the better the OCR technology, the faster and easier it is to integrate and work with IDP. But without OCR, further "intelligent" work with the data is impossible. Which is not surprising: the data simply will not exist in text form!
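The dependency described above can be sketched as a two-stage pipeline. This is only an illustration under assumptions: `process_document`, `fake_ocr`, and `fake_interpret` are hypothetical names invented for the example, and the stand-in functions do no real recognition or language processing.

```python
from typing import Callable, Dict, Optional


def process_document(
    image: bytes,
    ocr: Optional[Callable[[bytes], str]],
    interpret: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Run recognition first, then interpretation of the recognized text."""
    if ocr is None:
        # Without recognition there is no text, so there is nothing to interpret.
        raise ValueError("OCR stage is required before interpretation")
    text = ocr(image)
    return interpret(text)


# Hypothetical stand-ins for real recognition and interpretation engines.
def fake_ocr(image: bytes) -> str:
    return "Invoice No: 42"


def fake_interpret(text: str) -> Dict[str, str]:
    key, _, value = text.partition(":")
    return {key.strip(): value.strip()}


print(process_document(b"raw image bytes", fake_ocr, fake_interpret))
```

Whatever the second stage does, its input is the output of the first, so any recognition error propagates into the interpretation.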

Whether it is fair, after this, to compare OCR and IDP as equal, independent competitors is a rather rhetorical question. The fact is that all the illustrative evidence of the cosmic speed IDP can reach when competing with abstract little people and Natashas is absolutely true! Only the credit for this goes not to IDP but to the OCR at its core.

The second conclusion: the functioning of IDP is simply impossible without OCR technologies. This, as they say, is the foundation.


What's the difference?

Now to the facts. OCR is a computer vision technology recognized by the scientific community. It is used worldwide and performs a clearly defined task: it recognizes the content of any document (text, graphs, tables, images, diagrams) and outputs it absolutely accurately. All information processed by OCR algorithms becomes available for editing, searching, analysis, and any other manipulation that can be performed on textual data. The content of the final digital version does not differ from the content of the original. Noted.

Now to IDP. This is a software solution that, according to open sources, collects, transforms, and processes data from documents based on AI. Moreover, as we have already found out, collecting, transforming, and processing information is less a unique ability of IDP as a technology than a function of the OCR algorithms "embedded" in it. Today, IDP is often presented as a more comprehensive intelligent automation tool, in which OCR is just a quick initial stage that, in 2024, is not even worth dwelling on. But this is not quite true.

Everything that IDP technology boasts about is essentially done not by IDP "as a whole" but by the OCR technology inseparable from it. Let's go step by step:

  • Distinguishing types of documents, fields, tables, and their contents. Today, this can hardly be called an achievement, let alone a know-how. Cognitive Technologies' OCR could do this three decades ago!

  • Using AI algorithms in work. Look at the programs of the most prestigious conferences on artificial intelligence and you will not find a section dedicated to IDP. But document recognition and analysis using OCR? Absolutely!

  • (and of course) Processing thousands of documents at high speed. Speed is not something to boast about these days: it is nothing more than a technical characteristic. Some developers today achieve impressive performance even on mobile devices! (yes, this is about Smart Engines)

In short, all the key selling points of IDP are actually delivered by OCR. But what happens after OCR text recognition, the so-called processing, raises justified doubts. Round.

Let's summarize in simple words:

IDP equals OCR plus processing. No questions about OCR, plenty about processing.


Part about milk 🥛

Thus, the entire hyped "unique" potential of IDP is more of an add-on than a technology per se. And so be it! But the second component of IDP - processing - is sometimes frankly dangerous for business documentation, bringing chaos where strict order is needed.

Judge for yourself: if the original version of an important document says "100 (thousand)", should anyone think it over and correct it? Suppose it is indeed a simple typo, not a deliberate distortion. OCR will preserve it in the digital copy and flag it. What to do next is up to the person. But predicting the behavior of IDP is much more difficult: which do you like more, 1000 (thousand) or 100 (hundred)? In which direction will the AI's imagination fly?
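The difference between preserving a discrepancy and "fixing" it can be shown with a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the vocabulary covers only two words, and no real OCR or IDP product is claimed to work this way.

```python
from typing import List, Tuple

# Toy vocabulary; a real system would need far more than two words.
WORD_VALUES = {"hundred": 100, "thousand": 1000}


def split_amount(raw: str) -> Tuple[int, str]:
    """Split text like '100 (thousand)' into the digits and the bracketed word."""
    digits, _, rest = raw.partition(" ")
    return int(digits), rest.strip("()")


def preserve_and_flag(raw: str) -> Tuple[str, bool]:
    """Return the text untouched, plus a flag saying digits and word disagree."""
    number, word = split_amount(raw)
    return raw, WORD_VALUES.get(word) != number


def possible_corrections(raw: str) -> List[str]:
    """List both 'fixes' a correcting layer would have to choose between."""
    number, word = split_amount(raw)
    fixes = []
    if word in WORD_VALUES:
        fixes.append(f"{WORD_VALUES[word]} ({word})")  # trust the word
    for candidate, value in WORD_VALUES.items():
        if value == number:
            fixes.append(f"{number} ({candidate})")  # trust the digits
    return fixes


print(preserve_and_flag("100 (thousand)"))
print(possible_corrections("100 (thousand)"))
```

The first function leaves the decision to a person; the second shows that an automatic correction has two equally plausible outcomes and no way to tell which one the author meant.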

All this contextual thinking is an example of processing. Another example is when milk turns into cheese, cottage cheese, or sour cream. It seems to us that this is appropriate and useful in the agro-industrial sector, but in the field of technology it can do considerable harm. Not convinced by the milk example? What if we are talking about the textbook phrase "execute cannot pardon"? Where will the comma be placed and, most importantly, is it needed at all if it is not in the original?

In practical terms, this situation poses a certain threat to the customer who wants to obtain accurate source data in digital form. And it doesn't really matter whether it's a Russian passport, a credit agreement, an invoice, or a handwritten leave application. Sometimes, the vital things for the client are hidden precisely in the non-obvious details, and there are so many of them in all our variety of documents today that they can be interpreted and "corrected" endlessly. But does the user really need it?

Hence the third conclusion: if you want to get all the data in digital form intact, you need OCR.


And the part about the cow 🐄

A reasonable question may arise here: why is there so much hype around IDP lately? What are all these hypothetical cases and far-fetched comparisons for, from which - oh miracle! - IDP emerges as the unchanging winner? The answer is prosaic: it's nothing more than the power of marketing. Quoting a classic: it's impossible to convince consumers to buy your toast for six dollars. But if you just play a little with the name and put a crouton up for sale - easily!


Let's look at the situation from the perspective of marketing theory. OCR holds a high market share, but the growth rate of sales has slowed down. According to the BCG matrix, we have a classic cash cow (we did not make this term up, honestly). Here's what Wikipedia says about it:

High market share, but low sales growth rate. "Cash cows" need to be protected and maximally controlled. Their attractiveness is explained by the fact that they do not require additional investments and at the same time provide good cash income.

To change the situation and give the product a new impetus, it was simply decided to repackage it. One nuance emerged, though: really high-quality OCR is expensive. IDP, on the other hand, needs very little: any free recognizer, publicly available text processing models, and one or two people to keep an eye on it all. Voila! You can shout left and right that you have learned to solve a super task. But will it be reliable, convenient, and controllable for the client?
