Deep Learning-Based Video KYC

Video KYC (Know Your Customer) lets customers use their smartphone cameras to verify their identity to a business. In banking, video KYC is used to validate and speed up loan and credit card applications, and beyond banking it is becoming the new normal in other industries as well. At ValueFirst, we offer video KYC built on advanced deep learning techniques. Video KYC essentially combines face detection, face recognition, and face verification, and our deep learning-based solution handles all three. We use pre-trained networks to obtain accurate face detection and recognition on our own face dataset. The solution has four steps. First, faces are detected in the image. Second, each detected face is passed to a face embedding step, which produces an embedding (a numeric feature vector) for the face. Third, the embeddings are fed to a logistic regression classifier that identifies the face against a database of known faces (i.e., who is this person?). The first three steps perform face detection and recognition; if the goal is to verify a face against a known identity (i.e., is this the person they claim to be?), the fourth step is also needed. The implemented solution achieved around 90% accuracy on the specific faces we tested.
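To make step three concrete, below is a minimal sketch of a multinomial (softmax) logistic regression classifier trained on face embeddings with plain stochastic gradient descent. It uses only the Python standard library so it runs anywhere; the class name `SoftmaxFaceClassifier`, the learning rate, the epoch count, and the L2 normalization of embeddings are our own assumptions rather than details of the solution described here, and a production system would typically use a library classifier such as scikit-learn's `LogisticRegression` instead.

```python
import math


def l2_normalize(vec):
    """Scale an embedding to unit length (an assumed preprocessing step)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(scores):
    """Turn raw class scores into probabilities, shifted by the max for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxFaceClassifier:
    """Hypothetical multinomial logistic regression over face embeddings."""

    def __init__(self, n_features, labels, lr=0.5, epochs=200):
        self.labels = list(labels)
        self.lr = lr
        self.epochs = epochs
        # One weight row and one bias per known person.
        self.W = [[0.0] * n_features for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def _scores(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def fit(self, embeddings, names):
        index = {name: k for k, name in enumerate(self.labels)}
        for _ in range(self.epochs):
            for emb, name in zip(embeddings, names):
                x = l2_normalize(emb)
                probs = softmax(self._scores(x))
                target = index[name]
                for k, p in enumerate(probs):
                    # Cross-entropy gradient w.r.t. score k is p_k - 1[k == target].
                    grad = p - (1.0 if k == target else 0.0)
                    self.W[k] = [w - self.lr * grad * xi
                                 for w, xi in zip(self.W[k], x)]
                    self.b[k] -= self.lr * grad
        return self

    def predict(self, embedding):
        """Return (name, probability) for the most likely known person."""
        probs = softmax(self._scores(l2_normalize(embedding)))
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best], probs[best]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real VGGFace2 embeddings are much longer.
    X = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
    y = ["alice", "alice", "bob", "bob"]
    clf = SoftmaxFaceClassifier(n_features=3, labels=["alice", "bob"]).fit(X, y)
    print(clf.predict([0.95, 0.05, 0.0]))
```

`predict` returns the best-matching name together with its probability, which is one natural way to produce a confidence percentage for each recognized face.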

2. Implementation:

As discussed in the abstract, our face detection and recognition solution consists of four steps, the fourth of which (face verification) is optional. The full four-step solution covers face detection, recognition, and verification, but because verification is only needed for some use cases, we kept the fourth step optional. The steps are:

  • Face detection
  • Face embedding capturing
  • Face recognition
  • Face verification

Figure 2.1 Face detection, recognition and verification steps in the implemented solution

2.1 Step descriptions:

The steps in our implemented solution are described below.

2.1.1 Face Detection: As shown in Figure 2.1, face detection is the first step of our solution. This step detects faces in the given image or video. We use transfer learning for face detection with a Multi-task Cascaded Convolutional Network (MTCNN). Before being passed to the MTCNN network, the image has to be resized to 160×160 pixels. An implementation of MTCNN is available through pip installation.

2.1.2 Face Embedding: In this step we again use transfer learning, here with a VGGFace2 network implementation to compute face embeddings. This step takes the face coordinates produced by the face detection step, crops the face from the full image, and resizes it to 224×224 pixels so that it can be passed as input to the VGGFace2 network. We used a Keras implementation of the VGGFace2 network.

2.1.3 Face Recognition: Once the face embedding step outputs the embedding of a face, we pass this embedding as features to a classifier that has already been trained on a labeled dataset. This classifier recognizes the face against the database of known faces.

2.1.4 Face Verification: The last step is face verification. This step also works on the face embeddings and checks whether a face belongs to a known identity. For example, given an image of a person together with the identity card that person has produced, we may want to know whether the identity card actually belongs to that person. Such cases require the face verification step.
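The hand-off between detection and embedding (take the face box, crop the face, resize it to 224×224) can be sketched as below. The sketch assumes the detector reports each face as an (x, y, width, height) box, which is the layout of the `'box'` field returned by `MTCNN().detect_faces()` in the `mtcnn` pip package. To stay dependency-free it represents images as nested lists of pixel values and uses nearest-neighbour resizing; with NumPy arrays the crop is a single slice and the resize would normally be done by an image library. The function names are hypothetical, and boxes are clamped to the image because a detector's box can run past the image edge.

```python
def clamp_box(box, img_w, img_h):
    """Convert an (x, y, width, height) box to (x1, y1, x2, y2) inside the image."""
    x, y, w, h = box
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(img_w, x + w), min(img_h, y + h)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box does not overlap the image")
    return x1, y1, x2, y2


def crop_face(image, box):
    """Crop the detected face; `image` is a list of rows of pixel values."""
    img_h, img_w = len(image), len(image[0])
    x1, y1, x2, y2 = clamp_box(box, img_w, img_h)
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image, out_w, out_h):
    """Nearest-neighbour resize: each output pixel copies the closest input pixel."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def prepare_face(image, box, size=224):
    """Crop the face and resize it to size x size, the input size used for VGGFace2 here."""
    return resize_nearest(crop_face(image, box), size, size)


if __name__ == "__main__":
    img = [[r * 10 + c for c in range(10)] for r in range(8)]  # toy 10x8 "image"
    face = prepare_face(img, [2, 1, 3, 4])
    print(len(face), len(face[0]))
```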

2.2 Implementation details:

As discussed in the step descriptions, an image passes through these steps (Face Detection, Face Embedding, Face Recognition, and, when needed, Face Verification) in order to recognize the faces in it or to verify whether two faces in the image belong to the same person. The heuristic for the implemented solution is given below.

Equation 2.1 Heuristic of the implemented solution

2.2.1 Algorithm: To give better intuition for what happens behind the scenes, we describe our implementation in the form of a heuristic, shown below.

Equation 2.2 Heuristic for our solution
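The sketch below is one possible reading of the four-step flow described in Section 2.1, not a transcription of Equation 2.2. Each step is passed in as a function, so `detect`, `embed`, `classify`, and `verify` are hypothetical placeholders standing in for the MTCNN detector, the VGGFace2 embedder, the trained classifier, and the optional verification check. Injecting the steps this way keeps the orchestration testable with simple stubs and makes the optional fourth step explicit.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Embedding = List[float]


@dataclass
class FaceResult:
    box: Box
    name: str
    confidence: float
    matches_reference: Optional[bool] = None  # None when verification is skipped


def run_pipeline(
    image: Any,
    detect: Callable[[Any], List[Box]],
    embed: Callable[[Any, Box], Embedding],
    classify: Callable[[Embedding], Tuple[str, float]],
    verify: Optional[Callable[[Embedding], bool]] = None,
) -> List[FaceResult]:
    """Run detection, embedding, recognition and (optionally) verification per face."""
    results = []
    for box in detect(image):                  # Step 1: face detection
        embedding = embed(image, box)          # Step 2: crop, resize, embed
        name, confidence = classify(embedding)  # Step 3: face recognition
        # Step 4 (optional): verification against a known identity.
        match = verify(embedding) if verify is not None else None
        results.append(FaceResult(box, name, confidence, match))
    return results
```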

Each step in the above heuristic is described in the step descriptions section.

2.2.2 Data pre-processing: Because only a very limited dataset of employee images was available to us, we used transfer learning for this solution. Even so, to learn a good set of features for our specific image dataset, we need a reasonable number of training images. To overcome this, we collected a single image per employee and generated additional training images from it with image augmentation. The augmentation applies brightness changes, height and width shifts, and horizontal flips.
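The three augmentations named above can be sketched as follows for a single grayscale image stored as nested lists, so the example needs only the standard library. The brightness range, the 10% maximum shift, the 50% flip probability, edge-repeat filling for shifted-in pixels, and the function names are all our own assumptions. If the pipeline is Keras-based, `ImageDataGenerator` exposes the same transforms through its `brightness_range`, `width_shift_range`, `height_shift_range`, and `horizontal_flip` arguments.

```python
import random


def adjust_brightness(img, factor):
    """Multiply every pixel by `factor` and clip to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def shift(img, dx, dy):
    """Shift right by dx and down by dy; uncovered pixels repeat the nearest edge."""
    h, w = len(img), len(img[0])
    return [[img[min(h - 1, max(0, r - dy))][min(w - 1, max(0, c - dx))]
             for c in range(w)]
            for r in range(h)]


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment(img, n, brightness=(0.7, 1.3), max_shift=0.1, seed=0):
    """Generate n randomly augmented copies of one employee image."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    h, w = len(img), len(img[0])
    out = []
    for _ in range(n):
        a = adjust_brightness(img, rng.uniform(*brightness))
        dx = rng.randint(-int(max_shift * w), int(max_shift * w))
        dy = rng.randint(-int(max_shift * h), int(max_shift * h))
        a = shift(a, dx, dy)
        if rng.random() < 0.5:
            a = hflip(a)
        out.append(a)
    return out
```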

Results and discussion

We evaluated our solution on 10 different employees, both individually and in groups within a single frame, across different poses, different wearables, different lighting conditions, and different zoom levels. When fed a video or an image, the solution outputs an image with a red box around each face, labeled with the recognized person's name and the confidence as a percentage. The solution can also compare a person's face with the photo on their identity card. Sample input and output images are shown in Figure 3.1.

Figure 3.1 Sample demonstration of our solution
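The identity-card comparison relies on the verification step, which works on face embeddings. The exact comparison metric is not stated here, so the sketch below assumes a common approach: compute the cosine similarity between the embedding of the live face and that of the ID-card photo, and accept the pair if the similarity reaches a threshold. The 0.5 threshold and the function names are hypothetical; a real threshold would be tuned on held-out pairs of matching and non-matching faces.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero length")
    return dot / (norm_a * norm_b)


def is_same_person(live_embedding, id_card_embedding, threshold=0.5):
    """Hypothetical verification rule: accept if similarity >= threshold."""
    return cosine_similarity(live_embedding, id_card_embedding) >= threshold
```

Some face-matching code reports cosine distance (1 minus the similarity) instead, in which case the comparison flips to "distance at most a threshold".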

On the test dataset, our solution reaches about 90% accuracy across a diverse set of images with different poses, expressions, and lighting conditions.


Conventional machine learning methods such as Support Vector Machines, Principal Component Analysis, and Linear Discriminant Analysis have limited capacity to exploit large volumes of data, so they do not achieve high accuracy in face detection and recognition. With the arrival of the big data and deep learning era, the shortage of training data and of networks capable of learning such complex problems has become a thing of the past, and with transfer learning we no longer have to reinvent the wheel. Our implementation is based on transfer learning and performs very well on images with different poses, varying facial expressions, and different lighting conditions. Beyond detecting and recognizing faces, the solution can also match a face against a known identity. This four-step solution achieved about 90% accuracy across all the tested images.

Future work

Although the implemented solution achieves around 90% accuracy, there are areas where it can be improved further:

  • Training with more data: Our solution already works well with few training examples, reaching 90% accuracy in identifying faces, but to push this accuracy further we want to train our models on more examples.
  • Liveness detection: When faces are captured from video, our solution does not yet check whether the face is live. To make the solution more robust, we plan to add liveness detection in the future.

Reach out to us on for any assistance and information.