Benchmarks
Compared to OSS Apache Spark, Quanton delivers up to 4× faster performance across a range of industry-standard workloads: TPC-DS (read-heavy analytics), TPCx-BB (mixed query/write), TPC-DI (incremental ingestion), and LakeLoader (bulk load). See the full results in Apache Iceberg on Quanton: 3x Faster Apache Spark Workloads.
Run the TPC-DS benchmark yourself
The steps below walk through running all 99 TPC-DS queries against both OSS Spark and Quanton on your local machine using minikube.
This setup is sized to fit on a developer laptop. The same suite scales up to larger Parquet datasets and node counts for production-grade comparisons — see Industry benchmarks below for results on real cloud infrastructure.
Prerequisites
- minikube installed and running
- Docker
- kubectl
- Spark Operator and Quanton Operator deployed — see Local Quickstart
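Before running anything, a quick preflight can confirm that the CLIs listed above are on your PATH. The following is a minimal bash sketch. check_prereqs is a hypothetical helper, not part of the quanton-operator repo. It only checks that the commands exist; it does not check versions or whether minikube and the operators are actually up.

```bash
#!/usr/bin/env bash
# Minimal preflight sketch: report which required CLIs are missing from PATH.
# Checks presence only; it does not verify versions, that minikube is
# running, or that the Spark/Quanton operators are deployed.
check_prereqs() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_prereqs minikube docker kubectl && minikube status
```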
Clone and build
git clone https://github.com/onehouseinc/quanton-operator
cd quanton-operator
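# Build the TPC-DS data generation image and load it into minikube
docker build -t tpcds-datagen:latest -f benchmarks/Dockerfile.datagen benchmarks/
minikube image load tpcds-datagen:latest
# Pull the OSS Spark base image and load it into minikube
docker pull apache/spark:3.5.0
minikube image load apache/spark:3.5.0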
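To confirm that both images actually reached the cluster, you can filter the output of minikube image ls. The sketch below uses a hypothetical helper, required_images_present, which is not part of the repo. It assumes minikube lists one image per line, possibly with a registry prefix such as docker.io/library/.

```bash
#!/usr/bin/env bash
# Sketch: confirm that each required image reference appears in the output
# of `minikube image ls`, read from stdin. minikube may print images with a
# registry prefix (for example docker.io/library/), so a line matches if it
# equals the reference or ends with "/<reference>".
required_images_present() {
  local listing img missing=0
  listing="$(cat)"
  for img in "$@"; do
    if ! printf '%s\n' "$listing" | awk -v want="$img" '
        $0 == want || substr($0, length($0) - length(want)) == ("/" want) { found = 1 }
        END { exit !found }'; then
      echo "image not loaded in minikube: $img" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# minikube image ls | required_images_present tpcds-datagen:latest apache/spark:3.5.0
```

Matching on a "/" boundary rather than a plain suffix keeps a reference like datagen:latest from falsely matching tpcds-datagen:latest.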
Run
./benchmarks/run.sh
The script generates TPC-DS data, submits all 99 queries against both OSS Spark and Quanton on the same cluster, then collects the results.
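If you want a copy of the script's output for later comparison or debugging, one option is to tee it to a log file while keeping the script's own exit status. run_logged below is a hypothetical bash helper, not something the repo provides.

```bash
#!/usr/bin/env bash
# Sketch: run a command, copy its stdout and stderr to a log file via tee,
# and return the command's own exit status rather than tee's.
run_logged() {
  local log="$1"; shift
  "$@" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"
}

# Usage:
# run_logged tpcds-run.log ./benchmarks/run.sh
```

Returning ${PIPESTATUS[0]} matters here: without it, and without pipefail in the calling shell, the function would report tee's success even when run.sh fails.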
Watch the Spark UI
While a phase is running, port-forward to the active driver pod to open the Spark UI at http://localhost:4040:
# Find the live driver pod (and its namespace)
kubectl get pods -A | grep driver
# During Phase 3: OSS Spark queries
# (add -n <namespace> if the driver isn't in your current namespace)
kubectl port-forward oss-spark-tpcds-driver 4040:4040
# During Phase 4: Quanton queries
kubectl port-forward quanton-tpcds-parquet-driver 4040:4040
The script runs phases sequentially, so only one driver is live at a time. When the script moves to the next phase, stop the current port-forward and start a new one against the next phase's driver pod.
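Rather than copying pod names by hand, you can pick the running driver and its namespace from kubectl output. This sketch assumes the default column layout of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...) and that driver pod names end in -driver, as the names above do. live_driver is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace> <pod>" for the first pod whose name ends in -driver and whose
# STATUS column is Running. Assumed columns: NAMESPACE NAME READY STATUS ...
# Exits non-zero if no running driver is found.
live_driver() {
  awk '$2 ~ /-driver$/ && $4 == "Running" { print $1, $2; found = 1; exit }
       END { exit !found }'
}

# Usage:
# read -r ns pod < <(kubectl get pods -A --no-headers | live_driver)
# kubectl port-forward -n "$ns" "$pod" 4040:4040
```

Filtering on STATUS skips drivers from earlier phases that have already completed, so rerunning the two usage lines re-binds the port-forward to whichever phase is currently live.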
Interpreting results
When the run completes, the script prints a table to stdout:
Query | OSS-Parquet(s) | Quanton-Parquet(s) | Speedup
Raw JSON results are saved to ./results/sf_<scale>/, one file per engine. Use the Speedup column to compare per-query and total wall-clock times. Results vary with cluster size and data scale; larger production clusters typically show more pronounced gains.
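To get a single aggregate number from the printed table, you can total each engine's column. This is a hypothetical sketch, not part of the repo. It assumes table rows look like "<query> | <oss seconds> | <quanton seconds> | <speedup>" and defines aggregate speedup as total OSS time divided by total Quanton time. It does not read the JSON files, whose schema isn't described here.

```bash
#!/usr/bin/env bash
# Sketch: total the per-engine times from the results table (read on stdin)
# and report an aggregate speedup = total OSS seconds / total Quanton seconds.
# Assumed row format: <query> | <oss seconds> | <quanton seconds> | <speedup>
summarize_results() {
  awk -F'|' '
    tolower($1) ~ /total/ { next }             # skip any summary row
    $2 ~ /[0-9]/ && $3 ~ /[0-9]/ {             # skip header and separator lines
      oss += $2; quanton += $3; n++
    }
    END {
      if (n == 0 || quanton == 0) { print "no query rows found" > "/dev/stderr"; exit 1 }
      printf "queries=%d oss_total=%.1fs quanton_total=%.1fs speedup=%.2fx\n",
             n, oss, quanton, oss / quanton
    }'
}

# Usage, if the run's stdout was saved to run.log:
# grep '|' run.log | summarize_results
```

Summing times before dividing weights long-running queries more heavily than averaging the per-query Speedup values would. For a full run, queries should equal 99.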
Troubleshooting
./benchmarks/run.sh exits silently at Phase 0 with no error message after the "Building datagen Docker image" line.
The script uses set -euo pipefail, so any failed command exits without further output. Three common causes:
- Stale Docker client: older Docker Desktop versions ship a client (API 1.43) that's incompatible with the daemon embedded in newer minikube releases (API 1.44+). The build fails immediately. Update Docker Desktop to the latest version.
- minikube isn't running: eval $(minikube docker-env) returns env vars pointing at a non-existent daemon. Check with minikube status; if it reports host: Stopped, run minikube start.
- Spark Operator not installed (or webhook not ready): the script gets past Phase 0, but Phase 2 (datagen submission) fails with failed calling webhook ... connect: connection refused. Reinstall per Local Quickstart, or wait about 30 seconds for the webhook pod to become Ready and re-run.
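The first two causes can be checked from the shell before re-running. The helpers below are a hypothetical sketch. They parse the host: line of minikube status and compare Docker API versions as MAJOR.MINOR, assuming a client at API 1.44 or later avoids the stale-client failure. docker version --format '{{.Client.APIVersion}}' is a standard Docker CLI template. The webhook case is not covered.

```bash
#!/usr/bin/env bash
# Sketch of checks for the stale-client and minikube-stopped causes.
# The two helpers read plain text/arguments, so they can be tested offline.

# Returns 0 if `minikube status` output on stdin reports "host: Running".
minikube_host_running() {
  grep -qE '^host:[[:space:]]*Running'
}

# Returns 0 if Docker API version $1 is >= $2 (MAJOR.MINOR strings, e.g. 1.43).
api_version_ge() {
  local a_major=${1%%.*} a_minor=${1#*.} b_major=${2%%.*} b_minor=${2#*.}
  (( 10#$a_major > 10#$b_major || (10#$a_major == 10#$b_major && 10#$a_minor >= 10#$b_minor) ))
}

# Hypothetical wrapper that runs the real commands; not invoked automatically.
diagnose_phase0() {
  if ! minikube status 2>/dev/null | minikube_host_running; then
    echo "minikube host is not running: run 'minikube start'" >&2
  fi
  local client_api
  client_api="$(docker version --format '{{.Client.APIVersion}}' 2>/dev/null)"
  if [[ -n "$client_api" ]] && ! api_version_ge "$client_api" 1.44; then
    echo "Docker client API $client_api is older than 1.44: update Docker Desktop" >&2
  fi
}
```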
To see the actual error when the script exits silently, run the failing step manually:
eval $(minikube docker-env)
docker build -t tpcds-datagen:latest -f benchmarks/Dockerfile.datagen benchmarks/
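If the manual step succeeds but the script still exits silently, tracing the script shows the last command it ran before set -e stopped it. trace_run is a hypothetical wrapper around bash -x. It assumes run.sh is a bash script, which its use of set -euo pipefail suggests.

```bash
#!/usr/bin/env bash
# Sketch: run a bash script with xtrace enabled, keep the full trace and
# output in a log file, and print the last trace lines if the script fails.
trace_run() {
  local log="$1" script="$2"; shift 2
  local status=0
  bash -x "$script" "$@" >"$log" 2>&1 || status=$?
  if (( status != 0 )); then
    echo "script exited with status $status; last trace lines:" >&2
    tail -n 20 "$log" >&2
  fi
  return "$status"
}

# Usage:
# trace_run run-trace.log ./benchmarks/run.sh
```

The final "+ ..." line in the trace is the command that failed.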