Scale Chatbots to handle heavy concurrent traffic
In this blog, we talk about the challenges of handling a large number of concurrent users and persistent connections in Surbo, our chatbots for business platform built on a Python stack, and the approaches we took. For an enterprise product, scaling is essential to handle sudden spikes of concurrent traffic along with overall traffic growth. Given the nature of a chatbot application, we expect more and more concurrent connection requests, each persisting for a couple of minutes. We targeted 100k concurrent users to conclude the technical design validation.
During the initial design validation runs, uWSGI turned out to be the bottleneck. Under uWSGI, every request runs in its own cleanly separated thread, so a blocking request (such as a synchronous database call or an API call) does not influence other requests. However, the maximum number of concurrent connections is limited by the number of workers or threads, and this is known to scale badly when you need tens of thousands of open connections.
To overcome this behaviour, we thought of a model in which the thread does not execute the tasks immediately each time a request is made; instead, it defers the work to a later moment and maintains a persistent connection. For this we needed an event-driven, non-blocking runtime environment.
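To make the contrast with thread-per-request concrete, here is a minimal sketch of the event-driven model. It uses Python's standard asyncio module rather than our production stack, and handle_request and serve_many are hypothetical names chosen for illustration. The sleep stands in for a slow database or API call.

```python
import asyncio
import time
from typing import List, Tuple


async def handle_request(request_id: int, wait_seconds: float) -> int:
    # Stand-in for a slow I/O wait (database, upstream API). While this
    # handler is suspended, the event loop is free to run other handlers.
    await asyncio.sleep(wait_seconds)
    return request_id


async def serve_many(n_requests: int, wait_seconds: float) -> Tuple[List[int], float]:
    # Start all handlers on a single thread and wait for every one of them.
    started = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(i, wait_seconds) for i in range(n_requests))
    )
    return list(results), time.perf_counter() - started


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve_many(1000, 0.2))
    print(f"{len(results)} requests finished in {elapsed:.2f}s on one thread")
```

Handled one after another, 1000 requests of 0.2 seconds each would take over three minutes; here the waits overlap, so the total stays close to a single wait.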
We then evaluated a few options for an asynchronous, event-based framework, i.e.
The approximate performance results were similar, without any major deviations in our case, so we chose to proceed with Tornado.
Python does have other asynchronous frameworks, i.e.
There is also the Python standard library module asyncio. You can evaluate these and choose as per your use case.
Using Tornado, we built WebSocket connections to our chatbot in order to hold many simultaneous persistent connections and improve the user’s experience of the real-time application. WebSocket provides bidirectional communication over a persistent connection between a client and a server, which is the core of our chatbot APIs. A key point to note: this helps us scale to a very large number of connections without a matching increase in hardware.
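The idea of one process holding many open, bidirectional connections can be sketched with the standard library alone. Note the assumptions: this uses plain TCP streams from asyncio, not Tornado and not the WebSocket protocol, and on_connect, demo and the "bot: " reply prefix are hypothetical names made up for this illustration.

```python
import asyncio

# Writers of every connection that is currently open on the server side.
open_connections = set()


async def on_connect(reader, writer):
    # One lightweight coroutine per connection; no thread is tied up
    # while the connection sits idle waiting for the next message.
    open_connections.add(writer)
    try:
        while True:
            line = await reader.readline()
            if not line:  # client closed the connection
                break
            writer.write(b"bot: " + line)
            await writer.drain()
    finally:
        open_connections.discard(writer)
        writer.close()


async def demo(n_clients: int = 50):
    # Port 0 lets the OS pick a free port for this local demo.
    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    clients = [await asyncio.open_connection("127.0.0.1", port)
               for _ in range(n_clients)]
    replies = []
    for reader, writer in clients:
        writer.write(b"hello\n")
        await writer.drain()
        replies.append(await reader.readline())
    peak = len(open_connections)  # all clients are still connected here
    for _, writer in clients:
        writer.close()
        await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return peak, replies


if __name__ == "__main__":
    peak, replies = asyncio.run(demo())
    print(f"{peak} connections held open at once; first reply: {replies[0]!r}")
```

A WebSocket server adds a handshake and message framing on top, but the connection-holding pattern is the same.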
We make async calls for I/O-bound work: the call passes control on to whatever actually performs the task, and once the task finishes an event is generated and a callback reports back to the main thread, which then sends the response to the client.
[The Tornado framework works on an event loop, i.e. it keeps polling for events while connections stay open, waiting for a callback event to occur.]
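The hand-off and callback pattern can be sketched as follows. This is an illustration with the standard asyncio module and a thread pool, not our Tornado code; blocking_lookup, handle_message and the "profile-" strings are hypothetical stand-ins for a synchronous database or API client and its result.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(user_id: int) -> str:
    # Hypothetical blocking call, e.g. a synchronous database driver.
    time.sleep(0.2)
    return f"profile-{user_id}"


async def handle_message(user_id: int, pool: ThreadPoolExecutor, sent: list) -> None:
    loop = asyncio.get_running_loop()
    # Hand the blocking work to a worker thread so the loop keeps serving.
    future = loop.run_in_executor(pool, blocking_lookup, user_id)
    # The callback runs on the event loop thread once the result is ready;
    # appending to `sent` stands in for writing the response to the client.
    future.add_done_callback(lambda f: sent.append(f.result()))
    await future


async def main(n_users: int = 20):
    sent = []
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        started = time.perf_counter()
        await asyncio.gather(*(handle_message(u, pool, sent) for u in range(n_users)))
        elapsed = time.perf_counter() - started
    return sent, elapsed


if __name__ == "__main__":
    sent, elapsed = asyncio.run(main())
    print(f"sent {len(sent)} responses in {elapsed:.2f}s")
```

With a native async client library the thread pool would not be needed; the pool is only a way to keep a blocking call off the event loop.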
Testing the asynchronous model with persistent WebSocket connections, we were able to achieve a few thousand connections but failed to hit the minimum expectation of 100k connections. After a few configuration changes we reached the milestone. Here is how.
To open persistent connections with our backend servers, or with the proxy servers in between, an important part is the ability of the load-testing client to establish a lot of concurrent connections. A lot of the time, our clients were not able to generate the amount of load that we required for our servers; the problem was not with the client software but with limits at the operating system level.
First, we increased the OS-level (Linux in our case) open file limit to allow more connections, since every TCP connection/socket also opens a file descriptor. Set this ulimit to a higher number, depending on the number of concurrent TCP connections you need; by default the limit is 1024, which you can check with the ulimit -n command.
In case you are wondering how to change the limit, here is the quick reference:
a. Set the maximum number of open file handles in /etc/sysctl.conf:
b. Set the soft and hard nofile limits in the file /etc/security/limits.conf:
* soft nofile
* hard nofile
root soft nofile
root hard nofile
The ‘*’ means that we are setting the limit for all users except root; for root we need to specify it separately. ‘hard’ and ‘soft’ are the hard and soft limits. After setting this configuration, reboot the system.
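To verify which limit a process actually runs with, the same numbers that ulimit -n reports can be read from Python with the standard resource module (Unix only). This is a small sketch; fits_under_limit and its reserved margin of 64 descriptors are assumptions made for illustration, not measured values.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_under_limit(target_connections: int, soft_limit: int, reserved: int = 64) -> bool:
    # Each TCP connection needs one file descriptor; `reserved` is a rough,
    # assumed allowance for log files, stdin/stdout and similar.
    return target_connections + reserved <= soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
    print("room for 100k connections:", fits_under_limit(100_000, soft))
```

With the default soft limit of 1024 the check fails long before 100k connections, which matches what we saw in the first test runs.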
Second, as we were doing load testing with a single client machine, we were unable to open more concurrent TCP connections, because the number of outbound connections depends on the number of ports available on the system.
This is the problem of TCP port exhaustion: the system has no more ephemeral ports left to make further TCP connections to our servers.
The default range is most commonly 32768 through 61000, which means approx. 28000 possible connections.
This concern can be handled by widening the ephemeral port range in the Linux kernel; changing the range from the default to 1024 through 65535 is a practical way to more than double the number of ephemeral ports available for use.
echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
Now we were able to make around 65000 connections from a single machine. As the initial target was to hit 100k, we added another machine, and decided to use a 2-core CPU and 8 GB of RAM per machine.
Finally, with all the above configuration changes in place and the load users creating persistent WebSocket connections to the server, we hit 100000 persistent connections. Well done!
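The port arithmetic behind these numbers is easy to check. The sketch below counts the ports in a range and estimates how many client machines a target needs, assuming every client connects to one server address and port, so that each connection consumes one ephemeral port. The function names are hypothetical, and current_range only works on Linux.

```python
import math
from pathlib import Path

PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ports_in_range(low: int, high: int) -> int:
    # Both ends of the range are usable, hence the +1.
    return high - low + 1


def clients_needed(target_connections: int, low: int, high: int) -> int:
    # Assumes one ephemeral port per connection to a single server endpoint.
    return math.ceil(target_connections / ports_in_range(low, high))


def current_range():
    # Linux only: returns (low, high), or None where the file does not exist.
    if not PORT_RANGE_FILE.exists():
        return None
    low, high = PORT_RANGE_FILE.read_text().split()
    return int(low), int(high)


if __name__ == "__main__":
    print(ports_in_range(32768, 61000))           # default range
    print(ports_in_range(1024, 65535))            # widened range
    print(clients_needed(100_000, 1024, 65535))   # machines for 100k
    print(current_range())
```

The default range gives 28233 ports and the widened one 64512, so two client machines are enough for 100k connections, where the default range would have needed four.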
Take a quick look at the results: more than 100k concurrent TCP connections were established with RAM usage at 7.5 GB, and these connections weren’t idle; they were performing some I/O-bound and in-memory computation tasks.
Here is a look at the client-wise persistent connections (2 machines used):
As we can imagine this spreading over millions of users, the same design would serve the required persistent connections given scalable hardware, and is unlikely to suffer from slowdown issues.
We are adding more client machines for more connections, and will share the next set of results and hardware expectations soon in Part 2 of this blog.
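For a rough feel of what larger targets would need, the reported figures (about 7.5 GB of RAM at 100k connections) can be turned into a per-connection estimate. This is back-of-the-envelope arithmetic only. It assumes memory grows linearly with connection count, treats the whole 7.5 GB as connection cost (so it is an upper bound that includes baseline usage), and reads GB as GiB. The function names are hypothetical.

```python
def memory_per_connection_kib(total_gib: float, connections: int) -> float:
    # 1 GiB = 1024 * 1024 KiB
    return total_gib * 1024 * 1024 / connections


def projected_memory_gib(target_connections: int, total_gib: float, connections: int) -> float:
    # Linear extrapolation from one measured point; real growth may differ.
    return total_gib * target_connections / connections


if __name__ == "__main__":
    per_conn = memory_per_connection_kib(7.5, 100_000)
    print(f"about {per_conn:.1f} KiB per connection")
    print(f"1M connections: about {projected_memory_gib(1_000_000, 7.5, 100_000):.0f} GiB")
```

Under these assumptions each connection costs a little under 80 KiB, and one million connections would need on the order of 75 GiB spread across servers.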
Surbo, the chatbots for business platform, has already developed great intelligence; now it gains more power for an amazing experience. Try it at www.surbo.io, you will love it! To know more, drop a mail to email@example.com.
Keep watching this space for more tech news! We are excited to share more learning & experiences on new technology!