

A simple web service was deployed in two environments:
(1) a single-node cloud VM, and
(2) a three-node Kubernetes cluster.
The same application (a lightweight HTTP API) was used in both cases to ensure a fair comparison. Load testing was performed with wrk (https://github.com/wg/wrk) to generate traffic and measure the latency distribution, including 99th-percentile latencies. Deployment times were measured by observing rollout duration (for example, timing kubectl rollout status on Kubernetes) to capture how long it takes to roll out a new version. We also tracked the monthly infrastructure cost of each setup, based on typical cloud VM pricing and managed Kubernetes fees.
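For the Kubernetes side, the general recipe for timing a rollout looks like the sketch below; the deployment name, container name, and image tag are placeholders, not the exact objects from the benchmark:

```bash
# Hypothetical names: trigger a new rollout by updating the container image.
kubectl set image deployment/web-api api=registry.example.com/web-api:v2

# 'rollout status' blocks until the rollout finishes (or fails), so wrapping it
# in 'time' gives an end-to-end deployment duration.
time kubectl rollout status deployment/web-api
```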
Key configuration details included: a consistent VM instance size (for the single VM and for each cluster node), identical application code and container image, and default rolling update settings on Kubernetes (for zero-downtime deploys). The Kubernetes Deployment used one replica per node (3 replicas total) to mirror the single-instance setup. Readiness probes were enabled on Kubernetes to simulate real-world conditions (pods only receive traffic after passing health checks). The wrk tool was run from a separate client machine to simulate concurrent requests (e.g. wrk -t12 -c400 -d30s ...) and collect latency statistics at high load. These conditions reflect real-world resource constraints rather than an idealized lab setup: no ultra-high-end hardware and no unrealistic optimizations.
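A rough sketch of the Kubernetes side of this setup is shown below. The manifest is illustrative only: the names, image, port, and probe timings are assumptions consistent with the description above, not the exact configuration we ran.

```bash
# Illustrative 3-replica Deployment with a readiness probe and the default
# RollingUpdate strategy. All names and values are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api                  # hypothetical name
spec:
  replicas: 3                    # three pods, one per node in our small cluster
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: api
        image: registry.example.com/web-api:v1   # same image used on the VM
        ports:
        - containerPort: 8080
        readinessProbe:          # pod receives traffic only after passing checks
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 15      # assumed probe interval
          successThreshold: 2    # two consecutive successes before "ready"
EOF
```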
Single VM: Deploying a new version on a single VM (for example, updating the app binary or Docker container and restarting the service) is straightforward and fast, but usually involves brief downtime unless additional mechanisms are used. In our tests, a simple VM deploy (stop old version, start new version) completed in only a few seconds (on the order of 5–10 seconds for a small image and quick app startup). This downtime is brief but not zero – essentially a quick restart of the process.
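The single-VM update path was the classic stop-and-start sequence. A minimal sketch, assuming the app runs as a Docker container (names and ports are placeholders):

```bash
# Simple stop-and-start deploy on the VM; container and image names are placeholders.
docker pull registry.example.com/web-api:v2   # fetch the new image ahead of time
docker stop web-api && docker rm web-api      # brief downtime starts here
docker run -d --name web-api -p 80:8080 registry.example.com/web-api:v2
# Downtime ends once the new container is listening; typically a few seconds.
```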
Kubernetes Cluster: Deploying to the 3-node Kubernetes cluster was slower in terms of completion time, due to the rolling update strategy. Kubernetes performs a rolling update by incrementally launching new pods and terminating old pods to avoid downtime. Even for this simple service, the rollout wasn’t instantaneous – the cluster needed to pull the new container image on each node and wait for pods to become ready. With default settings, each new pod had to pass its readiness probe before the next pod was updated. For example, with a typical health check interval (e.g. 15 seconds) and requiring 2 successful checks, a new pod might take on the order of ~30 seconds to be marked ready in the best case (and ~60 seconds if the first check fails, as two consecutive successes are needed). As a result, a full rolling deployment of the new version across all 3 nodes took on the order of 30–60 seconds to complete in our benchmark (no downtime, but a longer rollout). This aligns with expectations – Kubernetes deliberately slows down deploys to ensure stability and no loss of traffic. In contrast, a single VM restart is faster but causes a brief outage.
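One way to see this readiness gating in action (using the placeholder labels from the manifest sketch above) is simply to watch the pods during a rollout:

```bash
# Each new pod sits at "Running 0/1" until its readiness probe passes; only then
# does the rolling update proceed to replace the next old pod.
kubectl get pods -l app=web-api -w
```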
It’s worth noting that these times vary with configuration. Tweaking the rollout strategy (e.g. allowing more parallel pod startups with maxSurge, or using faster readiness probes) can speed things up, but our test kept the defaults to reflect a common real-world setup.
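For reference, a more aggressive rollout could be configured roughly as follows; the values are illustrative and were not used in the benchmark:

```bash
# Illustrative tweak: let all replacement pods start in parallel and keep every
# old pod serving until a new one is ready.
kubectl patch deployment web-api --type merge -p \
  '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":3,"maxUnavailable":0}}}}'
```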
Note: the single VM update was nearly instantaneous (a few seconds, with a momentary blip in service), whereas the Kubernetes rolling update took on the order of tens of seconds but kept the service continuously available throughout.
One of the critical metrics for web services is tail latency – how slow the slowest requests are. We measured the 99th percentile (p99) latency under a sustained load using wrk. The results highlighted differences in both environments:
Single VM: The single VM handled moderate throughput (hundreds of requests per second in our test) up to a point, but as load increased we observed significant queuing and variability in response times. In one scenario, at roughly 1000 req/s against the single VM, average latency was around 200 ms, and during traffic spikes the p99 latency shot up to 500 ms or more (meaning the slowest 1% of requests took half a second or longer). This indicates that under heavy load a single instance can suffer high tail latency, most likely from resource saturation, since there is no capacity beyond the one machine.
Kubernetes Cluster: Moving to a three-node cluster improved tail latency by spreading the load. With the same total traffic split across 3 pods (one on each node), each instance handled fewer requests, reducing contention. In our tests, the cluster's 99th-percentile latency was significantly lower than the single VM's under equivalent aggregate load: the cluster kept p99 latency well below the 500 ms+ seen on the single VM. In short, the 3-node cluster handled traffic spikes more gracefully. We observed p99 latencies on the order of 150–250 ms in the Kubernetes setup even at high load, versus 500 ms+ on the single VM, and the cluster's tail latency was not only lower but also more stable, with fewer extreme outliers.
Latency Considerations: Kubernetes can introduce some overhead (an additional network hop through the service proxy, cluster DNS lookups, etc.), so at very low loads a single VM might respond slightly faster simply because there are fewer moving parts. In one published migration, a service with ~20 ms median latency on a bare EC2 VM initially saw ~100–200 ms responses in Kubernetes until the networking and DNS configuration were tuned, after which latency was comparable with far fewer high-end outliers. In well-tuned scenarios, that overhead is minimal, and the benefit of a cluster is that it prevents latency degradation under high load by adding capacity. Our benchmark bore this out: at modest load, both setups had low latency (tens of milliseconds at p50), but under heavy load the single VM's p99 climbed much higher than the cluster's. Proper resource limits and cluster tuning are crucial; if CPU limits are set too strictly, pods can get throttled and incur large latency penalties. (One extreme example showed p99 latency jumping from ~195 ms to 2.5 seconds after a pod moved to a larger node, due to CPU scheduling quirks, underscoring how configuration can affect tail performance.) For our more straightforward toy service we didn't hit anything that severe; the cluster's tail latency stayed in an acceptable range and was generally better than the single VM's at the same traffic level.
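As a purely illustrative example of the kind of knob involved, CPU and memory requests/limits can be set on the Deployment like this (values are placeholders and should be sized from observed usage, not copied):

```bash
# Overly tight CPU limits can cause throttling and tail-latency spikes;
# these values are illustrative only.
kubectl set resources deployment/web-api \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi
```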
In summary, the single VM exhibited good median performance but poor tail latency under stress (p99 in the hundreds of milliseconds), whereas the Kubernetes cluster maintained a tighter latency distribution – its p99 was lower, thanks to load balancing across nodes and no single point of saturation. The trade-off is that Kubernetes adds a bit of constant overhead (which we mitigated with tuning), but yields more consistent performance for high-percentile latencies when scaling out.
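For reference, the latency distributions quoted above come from wrk's built-in percentile reporting, enabled with the --latency flag; a typical invocation (the endpoint is a placeholder) looks like:

```bash
# Prints a latency distribution (50%/75%/90%/99%) alongside throughput stats.
wrk -t12 -c400 -d30s --latency http://SERVICE_ENDPOINT/
```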
Cost is a crucial factor in this comparison. We calculated monthly cloud costs for both setups using typical rates (AWS in this case, since costs vary by provider):
Single Node VM Cost: Running a single modest VM (e.g. an AWS t3.small instance) is relatively cheap, roughly $10/month for one small instance in our pricing assumptions. Three such VMs (to roughly match the cluster's total compute) would cost about $29 per month in total. Our single-service scenario only needs one VM, but to equalize resources it is worth comparing three standalone VMs against the 3-node cluster: the raw compute cost is roughly the same on both sides, so the difference comes from the orchestration layer.
Three-Node Kubernetes Cluster Cost: A Kubernetes cluster of three similar nodes carries the same VM costs (3 × t3.small) plus overhead for the managed control plane (if using a managed service such as EKS). On AWS EKS, there is a fixed control-plane fee of about $0.10–0.11 per hour (≈ $80 per month) for cluster management, so the cluster's cost isn't just the nodes. In our example, 3 × t3.small (~$29/month) + ~$80 control plane comes to around $110 per month total, nearly four times the cost of the equivalent VMs without Kubernetes on AWS. On other clouds the difference may be smaller; for instance, Azure AKS and Google GKE often waive control-plane fees for small clusters, making a 3-node cluster cost about the same as three VMs (roughly $100–$150/month depending on instance size). But on AWS, that extra charge makes Kubernetes notably pricier at small scale.
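Putting the arithmetic in one place, using the approximate figures quoted above (a back-of-the-envelope sketch, not a pricing reference):

```bash
# Rough monthly totals from the figures in the text (USD); actual prices vary
# by region, instance family, and over time.
NODE_MONTHLY=9.70         # one small burstable instance (~$29 / 3 nodes)
CONTROL_PLANE_MONTHLY=80  # managed control-plane fee (EKS, ~$0.11/hr)

echo "3 standalone VMs:   $(echo "3 * $NODE_MONTHLY" | bc) USD/month"
echo "3-node EKS cluster: $(echo "3 * $NODE_MONTHLY + $CONTROL_PLANE_MONTHLY" | bc) USD/month"
```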
In practical terms, the single VM is the clear winner on cost for small deployments. You pay for just one instance and no orchestration overhead. The 3-node Kubernetes cluster gave us better resilience and performance under load, but at a significantly higher monthly cost in the AWS scenario we benchmarked: roughly four times the cost of the equivalent standalone VMs once control-plane fees are included, and an order of magnitude more than a single small instance. It's important to note that as you run more services on the cluster, that fixed control-plane cost gets amortized. In real-world usage, one Kubernetes cluster might host many services, whereas the single-VM approach might require multiple VMs for multiple services. Our comparison is on a per-environment basis: one app on one VM vs the same app on one cluster.
To summarize: for a toy web service, a single cloud VM can be extremely cost-effective (just tens of dollars a month). A Kubernetes cluster carrying the same app is likely to cost on the order of $100/month (assuming small node sizes) once you factor in management fees and the fact that you’re running 3 instances for high availability. This reflects real-world economics – Kubernetes has a higher baseline cost, which you justify by running more workloads on the cluster or needing its features (scalability, auto-healing, etc.).
Deploy Time: The single VM can update very quickly (seconds) but with a brief outage, whereas the Kubernetes cluster achieves zero-downtime rolling deployments at the expense of a slower rollout (tens of seconds) due to sequential pod updates.
Tail Latency: In high-load scenarios, the single VM suffered from poor p99 latency (500 ms+ spikes) as it approached its limits, while the 3-node Kubernetes cluster delivered much more stable latency, keeping 99th-percentile responses under a few hundred milliseconds in our tests by distributing load. Kubernetes adds some overhead, but it prevents the extreme tail delays by avoiding overload on a single machine.
Cost: A small Kubernetes cluster is substantially more expensive than a single VM for the same app (roughly 3–4× the monthly cost of the equivalent standalone VMs in our benchmark, and far more than one small instance alone) due to running multiple nodes and paying control-plane fees. For a toy service, this overhead might not be worth it; you're paying for capabilities (auto-scaling, self-healing, etc.) that a simple app may not need at small scale.
Overall, the benchmark underscores a typical trade-off: Kubernetes improves reliability and scaling at the cost of higher complexity and cost. It can handle deployments and traffic spikes more robustly (with lower tail latency under load and zero-downtime updates), but a single VM can be simpler, cheaper, and sufficiently performant for low traffic. These results are grounded in real-world numbers – for example, ~30–60s rollout times on K8s vs a few seconds on a VM, p99 latencies halved (or better) by clustering, and an extra ~$80/month overhead for a managed cluster on AWS. Teams should weigh these factors based on their needs: if 99th-percentile latency and uptime during deploys are critical and traffic is high, the cluster wins; if cost and simplicity matter more for a small service, a single VM may suffice.