Sometimes I see small spikes of visits to this blog, especially when a post reaches sites like Reddit or Hacker News. But the extra load never consumes a significant amount of server resources. Thinking about that, a question came to my mind: how many requests per second would my blog be able to dispatch with decent response times?
The engine running this blog does not use any kind of database, only the file system. The blog loads all the contents from disk, renders the Markdown files, and stores them along with the rest of the assets in an LRU cache that, given the small size of this blog, fits in a few megabytes of main memory.
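Just to illustrate the idea (this is not the actual goblog code; the hashicorp/golang-lru library and the function names are assumptions made for the example), the caching logic could look roughly like this:

```go
package blogcache

import (
	"os"

	lru "github.com/hashicorp/golang-lru"
)

// pageCache keeps rendered pages and assets in memory, evicting the least
// recently used entries when it is full (256 entries is an arbitrary value).
var pageCache, _ = lru.New(256)

// renderMarkdown stands in for whatever Markdown renderer the engine uses.
func renderMarkdown(src []byte) []byte { return src }

// GetPage returns the rendered page for a path, reading and rendering it
// from disk only on a cache miss.
func GetPage(path string) ([]byte, error) {
	if cached, ok := pageCache.Get(path); ok {
		return cached.([]byte), nil
	}
	src, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	html := renderMarkdown(src)
	pageCache.Add(path, html)
	return html, nil
}
```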
Given the small amount of traffic this blog receives (a few tens of users every day), my $6/month droplet at DigitalOcean usually sits under 3% usage of its single vCPU and at around 50% of its 1 GB of memory. Concretely, according to my New Relic Infrastructure monitor, the goblog process running this blog normally stays near 0% CPU and uses around 50 MB of memory (for both the code and the cached data).
Answering that question gave me an excuse to learn the basics of K6, which I used to simulate multiple users connecting to different pages of my blog.
So I first created this JavaScript file:
```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 200,
  duration: '5m',
};

const urlGroups = [ /* list of URLs */ ];

export default function () {
  let idx = Math.floor(Math.random() * urlGroups.length);
  let urlGroup = urlGroups[idx];
  urlGroup.forEach((url) => http.get(url));
}
```
The `urlGroups` variable is a 2D array where each first-level entry is an array with the URLs of the payloads that a typical browser would fetch together: for example, the URL of a blog post plus the URLs of the files/images it links to. The `options` object means that we simulate 200 virtual users (VUs) continuously connecting to random entries of the `urlGroups` list for 5 minutes.
Running the above script from my laptop wouldn't let me send enough load to the server, so I installed the K6 operator on a Kubernetes cluster of three 16-CPU nodes, copied the previous script into a ConfigMap, and deployed the following file:
```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 48
  script:
    configMap:
      name: my-stress-test
      file: script.js
```
Running `kubectl get jobs` in the namespace where I deployed the K6 instance shows the list of jobs running the 48 load generators. When they finish, each pod prints its connection statistics, so you need to aggregate them manually to get the global results. I did it with a bit of Bash:
```bash
# Merge all the results from the logs into a single file
kubectl logs -l k6_cr=k6-sample --tail=20 > results.txt

# Filter the metrics lines, get the reported metric, and aggregate it
grep http_reqs results.txt | awk '{ print $2 }' | awk '{sum += $1;} END {print sum;}'
grep data_received results.txt | awk '{ print $2 }' | awk '{sum += $1;} END {print sum;}'
```
The results were, for both 200 and 1000 Virtual Users:
| Metric | Value for 200 Virtual Users | Value for 1000 Virtual Users |
|---|---|---|
| HTTP request duration | 94 ms (median), 103 ms (p95) | 299 ms (median), 520 ms (p95) |
| Requests/second | 2100 | 2924 |
| Data received | 24 GB | 33.6 GB |
| System CPU | 84% | 100% |
| System memory | 61% | 66% |
| `goblog` process CPU | 48% | 56% |
| `goblog` process memory | 95 MB | 134 MB |
Clearly, the `goblog` process is limited by running in a virtual machine with a single vCPU, which it has to share with other services in the system (e.g. the blog logs each request to a file, and that makes the `systemd-journald` service consume around 10% of the CPU during the high-load scenario). It even has to share CPU time with the garbage collector of its own Go runtime.
Conclusion
Given the low traffic of this blog, being able to dispatch nearly 3,000 requests per second is, by far, more than enough.
However, if I wanted to increase the number of requests per second the server can accept, I could still take a few actions before migrating to a bigger VM:
- Optimize some services in my operating system (I didn't spend much time on this).
- Reduce the size of the returned payloads: compact the HTML, compress images, use CDN URLs for third-party libraries... (see the sketch after this list).
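As one illustration of that second point, responses could also be compressed on the fly. The following is only a hypothetical sketch of a gzip middleware, not something goblog necessarily does; the names `WithGzip` and `gzipResponseWriter` are made up for the example:

```go
package middleware

import (
	"compress/gzip"
	"net/http"
	"strings"
)

// gzipResponseWriter wraps http.ResponseWriter so everything the wrapped
// handler writes goes through the gzip compressor.
type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

func (w gzipResponseWriter) Write(b []byte) (int, error) {
	return w.gz.Write(b)
}

// WithGzip compresses the response body when the client announces gzip support.
func WithGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}
```

Wrapping the blog's handler with something like `WithGzip` trades a bit of CPU for less data transferred, which also matters for the network quota mentioned below.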
After a few minutes of load testing I had spent around 6% of my monthly network transfer limit (1,000 GB). So before publishing this article I decided to implement a per-client rate limiter, to prevent a few incautious readers from draining my network quota by trying to reproduce the above experiments 😅.
Try compulsively refreshing this page and you'll end up receiving a nice HTTP 429 error. You will be able to refresh again after a few seconds of inactivity.
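The general shape of such a limiter, sketched here with Go's golang.org/x/time/rate package, is a token bucket per client IP that turns excess requests into 429 responses. The parameters (5 requests/second, burst of 10) and the names are arbitrary examples; the real implementation on this blog may differ:

```go
package ratelimit

import (
	"net"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

// One token bucket per client IP, protected by a mutex.
var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

// limiterFor returns (or lazily creates) the limiter for a client IP.
// 5 requests/second with a burst of 10 are arbitrary example values.
func limiterFor(ip string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[ip]
	if !ok {
		l = rate.NewLimiter(5, 10)
		limiters[ip] = l
	}
	return l
}

// WithRateLimit rejects requests with HTTP 429 once a client exceeds its quota.
func WithRateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !limiterFor(ip).Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

A real deployment would also want to expire idle entries from the map (which is what lets you refresh again after a few seconds of inactivity) and to take proxy headers into account when identifying the client.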