Every business is subject to know-your-customer (KYC) requirements in order to detect and avoid fraud in financial transactions. When KYC is done electronically, the process is known as e-KYC. Nowadays, from banking to telecom to other businesses, e-KYC has become an inevitable feature to have. At ValueFirst we understand this, and our chatbot offering (Surbo) has taken care of it. Our offering is based on a hybrid solution that can extract textual information from provided identity card images. This solution is an amalgamation of standard image processing techniques and deep learning models.
Our solution is based on five sequential steps. It starts by reshaping the image to a standard size; in the second step the solution detects edges in the image; in the third step Hough lines are identified on the detected edges and the image is rotated if it is tilted; and in the fourth and fifth steps, text detection and text recognition are performed on the identity card, respectively. Our implemented solution achieves around 85% accuracy.
The implemented approach for automatic text extraction from identity cards consists of five steps, as discussed below.
- Reshaping to standard size
- Edge detection
- Hough Lines and angle correction
- Text detection from the identity card
- Text Recognition from detected text
Figure 2.1 Text extraction steps in the implemented solution
2.1. Step descriptions:
Following is the description of the steps in our implemented solution.
- Reshaping: As shown in figure 2.1, in the reshaping step the provided identity card images are reshaped to a standard size. This reshaping normalizes both high-resolution and low-resolution images to a moderate resolution (i.e. 1344 × 1344), which is suitable for most image processing techniques.
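As a rough illustration of this step, a fixed 1344 × 1344 resize can be done with a simple nearest-neighbour mapping; the sketch below is a minimal numpy version (the interpolation method our pipeline actually uses is an implementation detail; something like OpenCV's cv2.resize with bilinear interpolation would be a typical choice):

```python
import numpy as np

def resize_nearest(img, out_h=1344, out_w=1344):
    """Nearest-neighbour resize of an H x W (or H x W x C) image array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]
```

Both upscaling a low-resolution card and downscaling a high-resolution one go through the same call, which is what makes the later phases resolution-independent.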
- Edge detection: Once the image is reshaped, it is fed to the edge detection phase. In this phase, instead of using the Canny edge detection technique, we have used a convolutional neural network (CNN) model for detecting edges. This CNN model is not as aggressive as Canny, which detects even minuscule details as edges; instead, our model detects the more prominent edges, such as the borders of the identity card.
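The CNN's basic building block is the same 2-D convolution used by classical edge filters; a CNN simply learns its kernels from data instead of using hand-designed ones. As a minimal, hypothetical illustration (not our trained model), a Sobel kernel applied via convolution responds strongly at card-border-like intensity edges:

```python
import numpy as np

# Hand-designed Sobel kernel for vertical edges; a CNN learns kernels
# like this (and many more) from training data.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out
```

On a flat region the response is zero; across a sharp intensity step (like a card border on a background) the magnitude is large, which is exactly the behaviour the learned model exploits.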
- Hough lines and angle correction: The Hough transform is a method for detecting shapes in an image in mathematical form. It can detect a shape even when its lines are broken or slightly distorted. To find the Hough lines, the image first has to go through the edge detection technique discussed in the second phase. Once we have found the Hough lines, we can identify the angles of the different lines, and based on those angles, the image is rotated to the correct orientation.
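For intuition, once line segments are available (e.g. from OpenCV's cv2.HoughLinesP), the tilt can be estimated from their angles and the image rotated by the negative of that estimate. The sketch below is a simplified version of this idea; the (x1, y1, x2, y2) segment format and the 45-degree cut-off are assumptions for illustration:

```python
import math

def estimate_tilt(segments):
    """Median angle (degrees) of near-horizontal segments (x1, y1, x2, y2).

    The image would then be rotated by -angle to correct the tilt.
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        if abs(angle) <= 45:               # ignore vertical card borders
            angles.append(angle)
    if not angles:
        return 0.0
    angles.sort()
    return angles[len(angles) // 2]        # median is robust to outliers
```

Taking the median rather than the mean keeps one or two spurious segments (e.g. from text strokes) from skewing the correction angle.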
- Text detection: Once the image shape and orientation are fixed, our system moves to the fourth phase, which detects and localizes the text in the given identity card. In this phase too there is a dark horse (i.e. deep learning) at work, which does the job of localizing and detecting the text in the identity card.
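Our production detector is a deep learning model; as a much-simplified, hypothetical stand-in for intuition only, text rows on a light card background can be localized by counting dark "ink" pixels per row:

```python
import numpy as np

def detect_text_bands(gray, ink_threshold=128, min_ink=3):
    """Return (start_row, end_row) bands of rows that contain dark pixels.

    A crude projection-profile localizer; a deep detection model would
    instead return tighter word-level boxes, but the goal is the same:
    find the regions that should be passed on to recognition.
    """
    ink_per_row = (gray < ink_threshold).sum(axis=1)
    bands, start = [], None
    for i, count in enumerate(ink_per_row):
        if count >= min_ink and start is None:
            start = i
        elif count < min_ink and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, gray.shape[0]))
    return bands
```

Each returned band corresponds to one red box drawn on the output image and becomes a crop for the recognition phase.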
- Text recognition: The last phase of our implementation takes the areas where text was detected and recognizes what text is written there. For this job we have used the Tesseract library.
2.2. Implementation details
As discussed in the step descriptions, an identity card image has to go through all five phases (reshaping, edge detection, Hough lines and angle correction, text detection, and text recognition) for text to be extracted from it. Following is the heuristic for the implemented solution.
Image → (Reshaping | Edge Detection | Hough Lines | Angle Correction | Text Detection | Text Recognition) → Text
Equation 2.1 Heuristic of the implemented solution
To give a better intuition of what is happening behind the scenes, we have described our implementation in the form of the heuristic above. Each step mentioned in this heuristic has already been discussed in the step description section.
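The heuristic maps naturally onto a chain of callables, where each phase consumes the previous phase's output. A minimal sketch (the phase functions here are placeholders, not our actual signatures):

```python
def run_pipeline(image, phases):
    """Apply the pipeline phases in order: Image -> ... -> Text."""
    result = image
    for phase in phases:
        result = phase(result)
    return result
```

This composition is also why the phases can be improved independently, e.g. swapping the orientation-correction phase without touching detection or recognition.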
2.3. Data pre-processing:
As identity cards hold sensitive and personal information, there is no public dataset available for identity cards, and it is very difficult to collect a reasonable number of personal identity cards within an organization. To overcome this challenge, we collected a few identity cards and then generated a reasonable number of testing identity cards using image augmentation techniques. The augmentation involves brightness changes, height and width shifts, and horizontal flipping of the images.
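These three augmentations can be expressed directly over pixel arrays; the numpy sketch below is an illustrative stand-in (a library such as Keras' ImageDataGenerator offers the same options via brightness_range, width_shift_range, height_shift_range, and horizontal_flip):

```python
import numpy as np

def augment(img, brightness=0, dx=0, dy=0, flip=False):
    """Brightness shift, width/height shift (wrap-around), horizontal flip."""
    out = np.clip(img.astype(int) + brightness, 0, 255).astype(np.uint8)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))   # crude positional shift
    if flip:
        out = out[:, ::-1]
    return out
```

Applying combinations of these transforms to each collected card multiplies a handful of real samples into a test set that varies in lighting, position, and mirroring.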
3. Results and discussion
We have evaluated our implemented solution on 56 different identity cards rotated at different angles, with different lighting conditions, different zoom levels, etc. When our implemented solution is fed an image, it produces an image with corrected orientation, the identified text areas marked with red boxes on the image, and the recognized text. Sample input and output images are shown in figure 3.1.
Figure 3.1 Sample demonstration of our solution
On the testing dataset, our solution reaches 85% accuracy on a diverse set of identity cards (including Aadhaar cards, PAN cards, and driving licenses). The following table shows the accuracy of our implemented solution on various fields (these specific fields were chosen because they are common to most of the identity cards on which testing was run).
Text recognition accuracy
The following bar plot shows the accuracy we achieved on different fields, with a threshold line at 80%.
Bar chart for accuracy on different identity card text fields
As the bar chart demonstrates, all of the field bars cross the red threshold line, meaning the solution is able to identify the mentioned identity card fields correctly with an accuracy of more than 80%.
4. Conclusion
Traditionally, identity card recognition was done using image processing techniques, which work well for reading text from straight identity card images, or from identity cards that do not have varying resolutions or rotation tilts. Moreover, different lighting conditions tend to produce different outputs in traditional solutions. In our implemented solution, different resolutions and orientations of identity card images are automatically taken care of. To correct the resolution, our system converts the provided identity card to a standard resolution, and to correct the orientation, it uses a CNN-based model together with the Hough lines technique. The amalgamation of the different phases in the implemented model has provided an accuracy of around 85%.
5. Future work
Although the implemented solution gives an accuracy of around 85%, there are still areas where we can make further improvements. These areas are as follows:
- More robust orientation correction: The Hough lines technique works really well for correcting the orientation of images with a tilt of less than 45 degrees, but if the orientation tilt of an identity card is more than 45 degrees in either direction, Hough lines show signs of non-performance. We can train a deep learning model that correctly identifies the rotation (0 to 360 degrees) of the identity card. This would help to correct the orientation of the identity card more accurately.
- Multilingual support: Currently the implemented solution is only able to extract English text correctly; it does not support the national language. As future work, we would like to add multilingual support to this solution.
Reach out to us on email@example.com for any assistance and information.