In the era of data deluge, efficient data curation is a persistent challenge across organisations. It spans the end-to-end management of the data lifecycle, from acquisition through extraction to downstream consumption of the data.
Though the world is undergoing a digital transformation at a rapid pace, fully or partially handwritten documents, such as application forms, cheques, historical manuscripts and artefacts, are still in use and warrant manual intervention.
Manual extraction is time-consuming, laborious and error-prone. Many commercially available solutions for extracting data from documents, especially cursive handwriting, are far from delivering customer delight. Such products depend heavily on a team of data scientists to train on the variations, require large amounts of data, and are hard to implement and maintain.
ANTstein Cognitive Machine Reading (CMR) is the only platform that seamlessly addresses customer pain points for extracting printed, block or cursive handwriting data from structured and unstructured documents using its proprietary machine learning algorithms powered by fractal science. It is language agnostic and requires no code, facilitating plug and play for ease of operation. It is easy to implement, and the machine learning engine of the platform can learn the continuously evolving variations in patterns using its self-learning capability. In this blog, we will highlight the unique approach taken to solve the mystery of handwriting recognition.
The Challenges of Cursive Handwriting Extraction
Optical character recognition (OCR) and optical font recognition (OFR) based techniques fare poorly at detecting, recognizing and extracting offline cursive handwritten text. The problem is made tougher by the post-facto nature of the analysis, in which stroke properties such as pressure, speed, slant and irregular spacing must be inferred from the finished image. The complexity is further accentuated by the connected nature of the characters, which come with a host of variations.
In recent years, deep learning methods have been employed to learn the features of handwritten text and help OCR improve its performance on cursive handwriting. The fundamental shortcoming of OCR/OFR, however, is its template-driven approach, and the problem persists even after deep learning models have come into the fray. The challenges are magnified by poor quality of the paper and/or the image. Hence such an approach yields poor accuracy and demands more manual intervention.
The input to any supervised learning algorithm consists of the features of the data (text/image) along with their labels. The features may be relevant or irrelevant, and sometimes the relevant features may be redundant.
The goal of dimensionality reduction is to pick the optimal subset of relevant features that are not redundant. Selecting such a subset can boost model performance, both by improving accuracy and by reducing computational complexity. The size of such an optimal set is called the intrinsic dimension of the data.
The computational complexity of identifying the intrinsic dimension stems from the need to compare all possible feature subsets (for m features there are mCn combinations of each size n, and 2^m subsets in all) to arrive at an optimal set. Indiscriminate use of too many features leads to the curse of dimensionality: too many dimensions cause an exponential growth in the amount of data points needed for training the model and generalization.
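The explosion in subset counts is easy to tally directly. A minimal Python sketch (the function name is illustrative):

```python
from math import comb

# Count the feature subsets an exhaustive search would have to score.
def n_subsets(m: int) -> int:
    # Sum of mCn over all subset sizes n = 1..m, which equals 2^m - 1.
    return sum(comb(m, n) for n in range(1, m + 1))

for m in (10, 20, 30):
    print(m, n_subsets(m))  # grows exponentially: 1023, 1048575, 1073741823
```

Doubling the number of candidate features squares the size of the search space, which is why exhaustive search for the intrinsic dimension is infeasible in practice.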
There are many feature selection methods used for dimensionality reduction, such as principal component analysis (PCA), linear discriminant analysis (LDA) and factor analysis (FA), to name a few. Most of these techniques suffer from the assumption of linearity, while real-life data exhibits non-linearity.
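As an illustration of these linear techniques, PCA can be written in a few lines of numpy via the singular value decomposition; a minimal sketch, assuming the usual centre-then-project formulation:

```python
import numpy as np

def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)            # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # coordinates in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features
Z = pca(X, 2)                          # reduced to 2 dimensions
print(Z.shape)                         # (100, 2)
```

The projection is a purely linear map, which is exactly the limitation noted above: structure that lies on a curved manifold is not captured by any choice of linear axes.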
In sharp contrast to traditional feature extraction techniques, which suffer from these unrealistic assumptions, the deep learning models available in the market perform feature selection on the fly in their convolution layers. Feature engineering is controlled by the choice of architecture, and these models are also able to deal with non-linearity in the data.
Yet these models come with high computational complexity. They require humongous amounts of data to learn the plethora of variations in stroke patterns, even when the quality of the document or image is good. Modelers also have very little handle on the features extracted, their relative importance (relevance), or the combinations that contribute to the model's robustness and performance.
Fractal science and its components sit in the sweet spot, providing a balance between computationally costly deep learning models and less effective traditional dimension reduction techniques for estimating the intrinsic dimension of the data.
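ANTstein's fractal-based algorithms are proprietary and not described here, but as an illustration of how fractal geometry can estimate intrinsic dimension, the classical Grassberger-Procaccia correlation dimension fits the scaling law C(r) ~ r^D, where C(r) is the fraction of point pairs closer than r. A minimal numpy sketch (illustrative only, not the platform's method):

```python
import numpy as np

def correlation_dimension(X: np.ndarray, r_small: float, r_large: float) -> float:
    """Grassberger-Procaccia estimate: C(r) ~ r^D, so D is the slope
    of log C(r) against log r between the two radii."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    dists = dists[np.triu_indices(len(X), k=1)]   # each pair counted once
    corr_sum = lambda r: np.mean(dists < r)       # fraction of close pairs
    return (np.log(corr_sum(r_large)) - np.log(corr_sum(r_small))) / (
        np.log(r_large) - np.log(r_small))

# Points on a 1-D curve embedded in 3-D: the estimate should be close to 1,
# even though the ambient feature space has 3 dimensions.
t = np.linspace(0.0, 1.0, 400)
X = np.stack([t, np.sin(t), np.cos(t)], axis=1)
print(correlation_dimension(X, 0.05, 0.2))        # close to 1 for a 1-D manifold
```

The point of such estimators is that they recover a low intrinsic dimension directly from the geometry of the data, without exhaustively searching feature subsets or training a large network.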
Learn more about fractal science in part two of this blog series.
Download your copy of Capture Cursive Handwriting using Fractal Science technical brief.