Benchmarks

Quanton delivers up to 4× faster performance across a variety of industry-standard workloads — including TPC-DS (read-heavy analytics), TPCx-BB (mixed query/write), TPC-DI (incremental ingestion), and LakeLoader (bulk load) — compared to OSS Apache Spark. See the full results in Apache Iceberg on Quanton: 3x Faster Apache Spark Workloads.

Run the TPC-DS benchmark yourself

The steps below walk through running all 99 TPC-DS queries against both OSS Spark and Quanton on your local machine using minikube.

Production results at scale

This setup is sized to fit on a developer laptop. The same suite scales up to larger Parquet datasets and node counts for production-grade comparisons — see Industry benchmarks below for results on real cloud infrastructure.

Prerequisites

Clone and build

git clone https://github.com/onehouseinc/quanton-operator
cd quanton-operator

# Build and load the data generation image
docker build -t tpcds-datagen:latest -f benchmarks/Dockerfile.datagen benchmarks/
minikube image load tpcds-datagen:latest
# Load the OSS Spark image into minikube
minikube image load apache/spark:3.5.0

Run

./benchmarks/run.sh

The script generates TPC-DS data, submits all 99 queries against both OSS Spark and Quanton on the same cluster, then collects the results.

Watch the Spark UI

While a phase is running, port-forward to the active driver pod to open the Spark UI at http://localhost:4040:

# Find the live driver pod
kubectl get pods -A | grep driver

# During Phase 3 — OSS Spark queries
kubectl port-forward oss-spark-tpcds-driver 4040:4040

# During Phase 4 — Quanton queries
kubectl port-forward quanton-tpcds-parquet-driver 4040:4040

The script runs phases sequentially, so only one driver is live at a time. Re-bind the port-forward as the script transitions between phases.
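Re-binding by hand is easy to miss when a phase ends. As a rough sketch, the helper below polls for whichever of the two driver pods named above is Running and forwards port 4040 to it, looping so the forward is re-established after each phase. The function names live_driver and follow_spark_ui are hypothetical, and like the commands above the sketch assumes the pods live in your current kubectl namespace.

```shell
#!/usr/bin/env bash
# Sketch only: pod names come from the commands above; everything else is illustrative.

# Print the name of the benchmark driver pod that is currently Running.
# Returns non-zero when neither driver is live.
live_driver() {
  local pod phase
  for pod in oss-spark-tpcds-driver quanton-tpcds-parquet-driver; do
    # A missing pod makes kubectl fail; treat that as "not running".
    phase="$(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Running" ]; then
      echo "$pod"
      return 0
    fi
  done
  return 1
}

# Keep localhost:4040 bound to the live driver until interrupted.
follow_spark_ui() {
  local pod
  while true; do
    if pod="$(live_driver)"; then
      echo "Forwarding $pod -> http://localhost:4040"
      # port-forward exits when the pod goes away; loop and re-bind.
      kubectl port-forward "$pod" 4040:4040 || true
    fi
    sleep 5
  done
}
```

Source the file in a second terminal and run follow_spark_ui while ./benchmarks/run.sh is in progress; stop it with Ctrl-C once the benchmark finishes.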

Interpreting results

When the run completes, the script prints a table to stdout:

Query | OSS-Parquet(s) | Quanton-Parquet(s) | Speedup

Raw JSON results are saved to ./results/sf_<scale>/ — one file per engine. Compare per-query and total wall-clock time using the Speedup column. Results vary by cluster size and data scale; larger production clusters will show more pronounced gains.
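This page does not spell out how the Speedup column is computed. A natural reading is OSS seconds divided by Quanton seconds, so values above 1 favor Quanton. Under that assumption, here is a small sketch for recomputing per-query and total speedups from timings copied out of the table. The function names and sample numbers are illustrative and are not taken from the benchmark scripts or from real runs.

```shell
# Assumption: speedup = OSS seconds / Quanton seconds.

# speedup OSS_SECONDS QUANTON_SECONDS -> ratio with two decimals
speedup() {
  awk -v oss="$1" -v q="$2" 'BEGIN {
    if (q <= 0) { print "n/a"; exit }
    printf "%.2fx\n", oss / q
  }'
}

# Reads lines of "query oss_seconds quanton_seconds" on stdin and prints
# the speedup of total wall-clock time across all listed queries.
total_speedup() {
  awk '{ oss += $2; q += $3 }
       END { if (q > 0) printf "%.2fx\n", oss / q; else print "n/a" }'
}

# Example with made-up timings:
#   speedup 12.0 4.0                               # prints 3.00x
#   printf 'q1 10 5\nq2 20 5\n' | total_speedup    # prints 3.00x
```

The total-time figure weights long-running queries more heavily than an average of per-query ratios would, which tends to be the more useful number for wall-clock comparisons.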

Troubleshooting

./benchmarks/run.sh exits silently at Phase 0 with no error message after the "Building datagen Docker image" line.

The script uses set -euo pipefail, so the first command that fails aborts the run without further output. Three common causes:

  1. Stale Docker client: earlier Docker Desktop versions ship a client (API 1.43) that is incompatible with the daemon embedded in newer minikube releases (API 1.44+), so the build fails immediately. Update Docker Desktop to the latest version.
  2. minikube isn't running: eval $(minikube docker-env) returns env vars pointing at a non-existent daemon. Check with minikube status; if it reports host: Stopped, run minikube start.
  3. Spark Operator not installed (or webhook not ready): the script gets past Phase 0, but Phase 2 (datagen submission) fails with failed calling webhook ... connect: connection refused. Reinstall per Local Quickstart, or wait about 30 seconds for the webhook pod to become Ready and re-run.
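The first two causes above can be checked before launching the script. The following is a minimal sketch, not part of the benchmark repository: the function names are hypothetical, and the 1.44 threshold is simply the API version quoted above. The Spark Operator check is left out because pod names and namespaces depend on how the operator was installed.

```shell
# True when dotted version $1 is lower than $2 (major.minor compared numerically).
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 < y[1] + 0)
    exit !(x[2] + 0 < y[2] + 0)
  }'
}

# True when minikube reports its host as Running.
minikube_running() {
  [ "$(minikube status --format '{{.Host}}' 2>/dev/null)" = "Running" ]
}

# Check causes 1 and 2 from the list above; messages go to stderr.
preflight() {
  local client
  if ! minikube_running; then
    echo "minikube host is not Running; try: minikube start" >&2
    return 1
  fi
  client="$(docker version --format '{{.Client.APIVersion}}' 2>/dev/null || true)"
  if [ -n "$client" ] && version_lt "$client" 1.44; then
    echo "Docker client API $client is older than 1.44; update Docker Desktop" >&2
    return 1
  fi
  echo "preflight ok"
}
```

Running preflight && ./benchmarks/run.sh would then surface either problem with a readable message instead of a silent exit.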

To see the actual error when the script exits silently, run the failing step manually:

eval $(minikube docker-env)
docker build -t tpcds-datagen:latest -f benchmarks/Dockerfile.datagen benchmarks/
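
If it is not obvious which step failed, Bash's -x tracing can help narrow it down. Here is a minimal sketch, assuming run.sh is a Bash script (its use of set -o pipefail suggests so); trace_tail is a hypothetical helper name.

```shell
# Run a script with command tracing and show only the last few lines,
# which usually include the command that triggered the exit.
trace_tail() {
  script="$1"
  lines="${2:-20}"
  bash -x "$script" 2>&1 | tail -n "$lines"
}
```

For example, trace_tail ./benchmarks/run.sh prints the tail of the trace. Each traced command is prefixed with +, and the last one shown is typically the command that failed.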