In my previous blog post, I discussed the challenges of cursive handwriting extraction, as well as dimensionality reduction and the importance of feature selection. Feature selection, one approach to dimensionality reduction, consists of picking the optimal subset of relevant, non-redundant features to provide the best input for a supervised learning algorithm. To appreciate the power of feature selection techniques that rely on fractal science, it is important to first understand the fundamentals of fractal science.
The formal definition of a fractal as per Mandelbrot (Mandelbrot, 1977) is “a subset of the Euclidean space for which the Hausdorff-Besicovitch dimension strictly exceeds the topological dimension.” This definition is rather terse, and there is no consensus on the exact definition of fractals.
Another mathematically rigorous definition that is often cited is “a function that is continuous everywhere but differentiable nowhere.” Though Mandelbrot discussed the concept of fractional dimension in his papers published in 1967, he did not introduce the term “fractal” until 1975.
Understanding these definitions requires familiarity with calculus, but the subject should be accessible to everyone. A common alternative is to describe fractals through their properties. The commonly cited properties help to drive home this rather elusive topic, which shakes the very fundamentals of our mathematical understanding, or lack of it.
Any object, be it natural or manmade, may be a fractal if it is irregular in shape, looks similar at every level of magnification, and contains copies of itself when broken into smaller pieces.
The Fractal Dimension
Due to the peculiar properties of fractals, traditional Euclidean geometry, which approximates all structures by their closest smooth shapes (the earth by a sphere, mountains by cones, etc.), does not hold up when dealing with fractals.
While the dimensions taught in Euclidean geometry are integers (1, 2, 3, ...), the dimensions of fractals can be fractional, hence the name. To illustrate this concept, consider these aspects of Figure 1:
The piece of paper (the background in Figure 1) on which several geometrical figures are drawn is a plane.
A plane has two dimensions.
The lines in the plane have one dimension.
The circled point in the plane has zero dimensions; it can be thought of as a circle drawn with zero radius.
The cube drawn on the plane represents an object with three dimensions.
The word “AntWorks” in Figure 1 may be considered a curve that is continuous everywhere but non-differentiable.
Therefore, the word “AntWorks” can be treated as a fractal. It is neither a line nor a plane, and it contains self-similar copies. (This does not mean that the character “A” recurs throughout the word; the self-similar patterns are in the curves that make up the word.) Its dimension is neither one nor two but a fraction somewhere in between, say 1.7.
Figure 1: Fractal Dimension Illustration
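To make the idea of a fractional dimension concrete, here is a minimal Python sketch of the self-similarity dimension. If a shape is made of N copies of itself, each shrunk by a factor r, its dimension D satisfies N = r^D, so D = log N / log r. The function name and the Koch curve example are illustrative assumptions; they are not part of Figure 1 or of ANTstein CMR.

```python
import math


def similarity_dimension(copies: int, scale_factor: float) -> float:
    """Return D such that copies == scale_factor ** D."""
    return math.log(copies) / math.log(scale_factor)


# A line segment cut into thirds gives 3 copies at 1/3 scale.
line_dim = similarity_dimension(3, 3)

# A square cut into a 3 x 3 grid gives 9 copies at 1/3 scale.
square_dim = similarity_dimension(9, 3)

# The Koch curve replaces each segment with 4 copies at 1/3 scale.
koch_dim = similarity_dimension(4, 3)

print(f"line:   {line_dim:.3f}")
print(f"square: {square_dim:.3f}")
print(f"Koch:   {koch_dim:.3f}")
```

The line and the square come out at the familiar one and two dimensions, while the Koch curve lands at roughly 1.26: more than a line, less than a plane.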
The fractal dimension of an object or image is used as an estimate of the intrinsic dimension of the data. Several methods have been proposed in the literature for calculating the fractal dimension of an object. Popular ones include:
Differential box counting
Triangle box counting
Improved triangle box counting
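The methods listed above all refine the same basic idea. As a rough illustration, the sketch below implements plain box counting on a binary image. It is not differential or triangle box counting, and it is not the proprietary ANTstein CMR method. The image is assumed to be given as a set of (x, y) foreground pixel coordinates, and the function names and test shapes are hypothetical. The estimate is the slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain at least one foreground pixel.

```python
import math


def box_count(pixels, box_size):
    """Count the grid cells of side box_size that hold at least one pixel."""
    return len({(x // box_size, y // box_size) for x, y in pixels})


def box_counting_dimension(pixels, box_sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(pixels, s)) for s in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


SIZE = 64
sizes = [1, 2, 4, 8, 16]

# Three synthetic shapes on a 64 x 64 grid (assumed test data).
line = {(x, 0) for x in range(SIZE)}
square = {(x, y) for x in range(SIZE) for y in range(SIZE)}
# Pixels where x AND y share no set bits form a Sierpinski triangle pattern.
sierpinski = {(x, y) for x in range(SIZE) for y in range(SIZE) if x & y == 0}

line_dim = box_counting_dimension(line, sizes)
square_dim = box_counting_dimension(square, sizes)
sierpinski_dim = box_counting_dimension(sierpinski, sizes)

print(f"line:       {line_dim:.3f}")
print(f"square:     {square_dim:.3f}")
print(f"Sierpinski: {sierpinski_dim:.3f}")
```

The line and the filled square recover dimensions close to one and two, while the Sierpinski pattern comes out near log 3 / log 2, about 1.585. Real handwriting images are noisier, so the choice of box sizes and the quality of the binarisation affect the estimate considerably.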
ANTstein CMR estimates the fractal dimension with its own proprietary, efficient box counting method, a hybrid approach that modifies traditional box counting to overcome the shortcomings of the popular methods available.
Cursive handwriting is a natural candidate for fractal analysis. The roughness of handwriting strokes, their repeating patterns, and their independence of scale can all be measured by the fractal dimension.
The higher the fractal dimension, the rougher the object. ANTstein CMR computes the fractal dimension of data from structured and unstructured documents that contain block and/or cursive handwriting, as well as machine-printed characters.
After engineering the features, ANTstein CMR trains a proprietary fractal-based machine learning algorithm on the labelled features to detect and extract cursive handwriting. The model compares the fractal features (fractal dimension, lacunarity and entropy) of the labelled handwriting characters with those extracted from incoming samples and gives a composite confidence score to aid operational decisions.
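The source does not say how lacunarity and entropy are computed, so the following is only a sketch under stated assumptions. Lacunarity is taken as the ratio of the mean squared box mass to the squared mean box mass on a fixed grid, and entropy as the Shannon entropy (in bits) of how the foreground pixels are distributed across boxes. The image is assumed to be square, with a side divisible by the box size, and all function names are hypothetical.

```python
import math


def box_masses(pixels, size, box_size):
    """Foreground-pixel count of every box in a size x size image.

    Empty boxes are included, since gaps are what lacunarity measures.
    Assumes size is divisible by box_size.
    """
    per_side = size // box_size
    masses = [0] * (per_side * per_side)
    for x, y in pixels:
        masses[(y // box_size) * per_side + (x // box_size)] += 1
    return masses


def lacunarity(masses):
    """Mean of squared masses divided by the square of the mean mass."""
    mean = sum(masses) / len(masses)
    mean_sq = sum(m * m for m in masses) / len(masses)
    return mean_sq / (mean * mean)


def shannon_entropy(masses):
    """Entropy in bits of the share of foreground pixels in each box."""
    total = sum(masses)
    return -sum((m / total) * math.log2(m / total) for m in masses if m > 0)


SIZE, BOX = 64, 8

# Two synthetic images (assumed test data): fully inked, and left half inked.
full = {(x, y) for x in range(SIZE) for y in range(SIZE)}
half = {(x, y) for x in range(SIZE // 2) for y in range(SIZE)}

full_masses = box_masses(full, SIZE, BOX)
half_masses = box_masses(half, SIZE, BOX)

print(lacunarity(full_masses), shannon_entropy(full_masses))
print(lacunarity(half_masses), shannon_entropy(half_masses))
```

A uniformly filled image has the lowest possible lacunarity of 1, and the value rises as the image develops larger gaps. The entropy drops as the ink concentrates in fewer boxes.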
The confidence is 100 percent when the difference between the ground-truth and predicted fractal features is zero, and it approaches 0 when the difference between the fractal dimensions is very large.
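The actual scoring function is proprietary and not described here. One simple mapping with the stated behaviour (100 at zero difference, decaying towards 0 as the difference grows) is an exponential of the distance between feature vectors. In the sketch below, the Euclidean distance, the exponential decay, the scale parameter and the feature values are all assumptions chosen for illustration.

```python
import math


def confidence_score(reference, observed, scale=1.0):
    """Map the distance between two feature dicts to a 0-100 score.

    Hypothetical formula: 100 * exp(-distance / scale). Both dicts must
    share the same keys.
    """
    distance = math.sqrt(
        sum((reference[key] - observed[key]) ** 2 for key in reference)
    )
    return 100.0 * math.exp(-distance / scale)


# Made-up feature values, for illustration only.
reference = {"fractal_dimension": 1.70, "lacunarity": 1.25, "entropy": 5.0}
close = {"fractal_dimension": 1.68, "lacunarity": 1.30, "entropy": 4.9}
far = {"fractal_dimension": 1.10, "lacunarity": 9.00, "entropy": 0.5}

identical_score = confidence_score(reference, reference)
close_score = confidence_score(reference, close)
far_score = confidence_score(reference, far)

print(f"identical: {identical_score:.1f}")
print(f"close:     {close_score:.1f}")
print(f"far:       {far_score:.1f}")
```

In practice the features live on different numeric ranges, so they would need to be normalised or weighted before the distance is taken. Otherwise the feature with the largest range dominates the score.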
This approach achieves more than 70% accuracy immediately after initial setup. This is significantly higher than state-of-the-art, OCR-based models powered by deep neural networks. With its ability to learn new patterns through its self-learning capability, typically a feedback loop, ANTstein CMR continues to improve extraction accuracy over time, thereby reducing dependence on error-prone manual effort. It also aids intelligent automation of processes that involve handwriting, such as cheque processing and application form processing, to name a few.
Apart from cursive handwriting extraction, ANTstein CMR also includes the following features, driven by proprietary machine learning algorithms based on fractal science:
Checkbox detection and extraction
Signature detection and verification
The ANTstein CMR solution architecture is shown in Figure 9.
Figure 9: ANTstein CMR Cursive Handwriting Extraction Solution Architecture
While digital transformation is moving ahead at a ferocious pace, handwritten documents are still widely used. As customer expectations for faster responses and 24/7 access to up-to-date information grow, organisations are challenged to find ways around manual processing.
A basic understanding of the available technologies, and of how fractal science-based handwriting recognition sits at the leading edge of efficient data curation, can put your enterprise ahead of the competition when it comes to providing excellent customer service.
Download your copy of the “Capture Cursive Handwriting using Fractal Science” technical brief.