What is OCR software? It is a common Google search, and even accountants regularly ask the question. OCR is a technology that is used all over the world, often without people realising it. In this blog post, I explain how it works and what its main advantages are.
What is OCR?
OCR stands for “Optical Character Recognition”. The technology reads characters from an image and prepares them for further processing.
What does OCR software do?
OCR software has been used for decades to digitise books and paper documents. One example is scanners and copiers that are equipped with OCR technology and enable paper documents to be scanned directly into editable Word files or saved as PDF files.
In recent years, technical innovations have also opened up new applications. These include the automatic recognition of number plates, traffic signs (autonomous driving), and passports and driving licences (identification).
OCR software is increasingly an umbrella term for technologies that serve many different purposes. One such purpose is invoice recognition, the niche we work in at TriFact365.
How does the OCR software work?
The technology is complex, but it can be explained simply in 3 steps: (1) input, (2) throughput and (3) output. “Input, throughput, output” are the characteristics of an open system (https://en.wikipedia.org/wiki/Open_systeem), a concept we also apply at TriFact365. Based on these 3 steps, I will explain the concept of OCR in more detail:
1. Reading in images (input)
Everything you scan or photograph is an image and can be read, provided of course that it is supplied in the correct format. Examples are images of books, magazines, work instructions, business documents and, of course, invoices.
2. Character recognition (throughput)
Once an image has been supplied, the actual character recognition takes place. This consists of 3 phases (source: https://en.wikipedia.org/wiki/Optical_character_recognition):
In the first phase (pre-processing), the OCR software checks, for example, whether the image was scanned straight and at the right size and whether the edges are clean. It also performs various other operations to optimise the supplied image for the next phase.
In the second phase, the OCR software analyses the image at pixel level and identifies characters such as letters, numbers and punctuation marks. The underlying techniques can be very complex and usually involve neural networks and other computer vision techniques.
In the third phase, accuracy can be increased further by constraining the results with a lexicon. This is a list of words that may appear in the document.
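The third phase can be illustrated with a small sketch. It assumes the recogniser has already produced a list of tokens, and it uses fuzzy string matching from Python's standard library as a stand-in for whatever a real OCR engine does internally. The lexicon, the function name correct_with_lexicon and the cutoff value are all made up for illustration.

```python
from difflib import get_close_matches

# Hypothetical lexicon: words we expect to find in the document.
LEXICON = ["invoice", "total", "amount", "date", "number", "vat"]


def correct_with_lexicon(tokens, lexicon=LEXICON, cutoff=0.75):
    """Replace each OCR token with its closest lexicon word, if one is close enough.

    Tokens without a sufficiently similar lexicon entry are kept unchanged.
    """
    corrected = []
    for token in tokens:
        # n=1 asks for the single best match at or above the similarity cutoff.
        matches = get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Typical OCR confusions: "l" read for "i", "0" read for "o".
raw_tokens = ["lnvoice", "t0tal", "am0unt", "xyz"]
print(correct_with_lexicon(raw_tokens))
# ['invoice', 'total', 'amount', 'xyz']
```

The cutoff is a trade-off: set it too low and unrelated words are "corrected" into lexicon entries, set it too high and genuine recognition errors slip through.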
3. Export of raw data (output)
The output of the OCR software (usually a file) can contain letters (in multiple languages), numbers and other characters. So if you run an invoice through OCR software, the raw output is not yet a booking proposal. Why is that? Because the jumble of characters does not yet have any relationship to the fields of a journal entry.
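To make the gap between raw output and booking fields concrete, here is a minimal sketch of an interpretation step. The sample text, the field names and the regular expressions are invented for illustration, and this is not how TriFact365 interprets invoices; real invoices vary far too much for a handful of fixed patterns.

```python
import re

# Made-up raw OCR output: just characters, no notion of "fields" yet.
RAW_OCR_TEXT = """ACME Supplies B.V.
Invoice number: 2021-0042
Invoice date: 15-03-2021
Total amount: EUR 121.00
"""

# Hypothetical patterns that tie pieces of raw text to named fields.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice number:\s*(\S+)"),
    "invoice_date": re.compile(r"Invoice date:\s*(\d{2}-\d{2}-\d{4})"),
    "total_amount": re.compile(r"Total amount:\s*EUR\s*([\d.]+)"),
}


def extract_fields(raw_text):
    """Return a dict of field name -> matched text, or None if not found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(RAW_OCR_TEXT))
```

Only after a step like this does the raw text become something an accounting system can work with, and any field the patterns miss comes back as None and still needs human attention.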
Customised OCR software
The providers of OCR technology have not been idle in recent years. OCR systems are increasingly being optimised for processing very specific data. I have already mentioned the applications for autonomous driving and identification. These are backed by billions in investment from Big Tech (Google, Amazon, Facebook, Apple and Microsoft), among others, and further investment in innovation and start-ups is flowing in from the automotive sector (autonomous driving) and from banking/SaaS platforms.
As a Dutch niche provider, we at TriFact365 also work intensively on our self-developed software for the interpretation of raw OCR data.
How OCR from TriFact365 works
All digital booking documents received by TriFact365 pass through our self-learning software. The aim is to recognise 100% of invoices and generate automatic booking proposals.
TriFact365 develops its own machine learning (“OCR+”), which enables us to take the recognition of invoice data and its allocation to booking proposals to a much higher level than was thought possible just a few years ago.
Our OCR claim
The journey TriFact365 embarked on a few years ago is paying off. Recognition rates continue to increase across all measured customers and our unique approach to real-time rule recognition is now live for all users and showing promising results. We are currently achieving a performance of around 90% correctly recognised fields across all customers.
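A figure such as "around 90% correctly recognised fields" can be read as a field-level recognition rate. The sketch below shows one plausible way to compute such a rate. It is an assumption about the metric, not TriFact365's actual measurement method, and the sample data is invented.

```python
def field_recognition_rate(predicted, expected):
    """Share of expected fields whose recognised value matches exactly.

    Both arguments are lists of dicts (one dict per invoice) that map
    a field name to its value.
    """
    total = 0
    correct = 0
    for predicted_doc, expected_doc in zip(predicted, expected):
        for field, expected_value in expected_doc.items():
            total += 1
            if predicted_doc.get(field) == expected_value:
                correct += 1
    return correct / total if total else 0.0


# Two invented invoices with five fields each; one field is misread.
expected = [
    {"number": "A1", "date": "01-02-2021", "net": "100.00", "vat": "21.00", "total": "121.00"},
    {"number": "B7", "date": "03-02-2021", "net": "50.00", "vat": "10.50", "total": "60.50"},
]
predicted = [
    {"number": "A1", "date": "01-02-2021", "net": "100.00", "vat": "21.00", "total": "121.00"},
    {"number": "B7", "date": "03-02-2021", "net": "50.00", "vat": "10.50", "total": "60.80"},
]
print(field_recognition_rate(predicted, expected))  # 0.9
```

Note that a 90% field rate does not mean 90% of invoices are error-free: in this example only one of the two invoices is recognised completely.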
In view of the changes we will be launching in 2021 and the many innovations we already have in the pipeline for 2022 and 2023, it seems realistic to us that invoice recognition will exceed 95% in the next two years. Our goal is to achieve invoice recognition of over 99% with self-learning OCR software.
The above figures are backed up by our internal measurements and reports. Our team of OCR specialists concludes that a proportion of invoices is already processed 100% error-free. For that reason, the “automatic chargeback” is being announced as an improvement at Expo 2021.
4 Advantages of TriFact365 OCR software
The TriFact365 software includes super-fast, self-learning OCR software that can process pages and produce raw output, including punctuation, in a fraction of a second. As a user, you won’t notice any of these techniques under the bonnet, but you will enjoy the following advantages.
Advantage 1: Automatic conversion of files into the correct OCR format
Some users scan in PDF format, others in JPG or TIFF format. As a universal submission portal, TriFact365 also accepts Word, Excel and all common scanned file formats in addition to PDF. TriFact365 automatically converts these into a format that our OCR software can read. No action is required on your part: TriFact365 takes care of everything for you.
Advantage 2: All files are read by the OCR software
With TriFact365, all incoming documents are read by our OCR software immediately after delivery. This automates the tagging of documents after they have been entered, which in turn saves work steps and therefore time when processing accounting documents.
Advantage 3: OCR software suitable for all business documents
At present, mainly accounting documents such as incoming invoices, sales invoices and receipts are processed with OCR. This is to be extended to business documents such as contracts, annual financial statements, etc., which will then be made searchable.
Advantage 4: Combining OCR output (data) with machine learning (AI) to generate automatic booking suggestions down to line level
By combining large-scale OCR with machine learning, our cloud software presents accurate booking proposals (journal entries) within seconds. All you then have to do is perform a visual check, and with one click the invoice is posted to your accounting system. Helpful functions make invoice processing even smoother.
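To give an idea of what a booking proposal looks like as data, here is a simplified sketch that turns recognised amounts into a balanced journal entry for a purchase invoice. The function names, the field names and the ledger account numbers are hypothetical; TriFact365's actual proposals are generated by its own self-learning software and go down to invoice line level, which this sketch does not attempt.

```python
from decimal import Decimal


def build_booking_proposal(fields, expense_account="4000",
                           vat_account="1500", creditor_account="1600"):
    """Build journal entry lines for a purchase invoice.

    `fields` holds the recognised net and VAT amounts as strings.
    The account numbers are placeholders, not a real chart of accounts.
    """
    net = Decimal(fields["net_amount"])
    vat = Decimal(fields["vat_amount"])
    gross = net + vat
    zero = Decimal("0")
    return [
        {"account": expense_account, "debit": net, "credit": zero},
        {"account": vat_account, "debit": vat, "credit": zero},
        {"account": creditor_account, "debit": zero, "credit": gross},
    ]


def is_balanced(lines):
    """A journal entry is valid only if total debits equal total credits."""
    return sum(line["debit"] for line in lines) == sum(line["credit"] for line in lines)


proposal = build_booking_proposal({"net_amount": "100.00", "vat_amount": "21.00"})
for line in proposal:
    print(line["account"], line["debit"], line["credit"])
print("balanced:", is_balanced(proposal))
```

Decimal is used instead of float so that amounts such as 0.10 are represented exactly, which matters when debits and credits have to match to the cent.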