Run Quanton on Dataproc on GKE (Experimental)
This guide runs the Quanton Spark engine inside Google's Dataproc on GKE — jobs are submitted through the Dataproc Jobs API, but the Quanton runtime (with its native Velox engine) executes the work. It uses a custom Spark container image plus the Quanton license credentials minted by the Quanton Operator.
This is an experimental integration. Dataproc on GKE provides and manages the Spark runtime, so this guide uses a custom container image and a runtime overlay to bring Quanton's engine into the managed pods. For a fully supported Quanton deployment, use the GKE guide (Quanton Operator on a standard GKE cluster).
How it works
Dataproc on GKE mounts its own Spark at /usr/lib/spark and runs an entrypoint
that, before launching Spark, executes an optional /opt/init-script.sh. We use
that hook to overlay Quanton's Spark distribution onto SPARK_HOME at runtime,
and we mount the Quanton license (JWT + issuer cert + mTLS cert) that the engine
validates against the Onehouse control plane.
Prerequisites
- Google Cloud CLI installed and logged in
- kubectl, Helm >= 3.x, Docker, jq, and python3
- gke-gcloud-auth-plugin
- onehouse-values.yaml from the Onehouse console
Set a few shell variables used throughout:
export PROJECT=<project-id> REGION=us-west2 ZONE=us-west2-a
export GKE=quanton-dpgke VCLUSTER=quanton-vc
export AR=${REGION}-docker.pkg.dev/${PROJECT}/quanton
export BUCKET=gs://quanton-dpgke-staging-${REGION}-${PROJECT}
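Every later command depends on these variables, so it can help to check them before continuing. The following is a minimal sketch, not part of the Quanton or Dataproc tooling; `check_vars` is a hypothetical helper name.

```shell
# Hypothetical helper: report any listed variable that is unset, empty,
# or still holds a <placeholder> value. Returns non-zero if any check fails.
check_vars() {
  local name value status=0
  for name in "$@"; do
    eval "value=\${$name:-}"    # indirect lookup of the variable named $name
    case "$value" in
      "")        echo "missing: $name" >&2; status=1 ;;
      *"<"*">"*) echo "placeholder not replaced: $name=$value" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, `check_vars PROJECT REGION ZONE GKE VCLUSTER AR BUCKET` prints nothing and returns 0 once every variable holds a real value.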
Cluster setup
Step 1: Enable APIs
gcloud services enable container.googleapis.com dataproc.googleapis.com \
artifactregistry.googleapis.com compute.googleapis.com --project $PROJECT
Step 2: Create a Standard GKE cluster
Use x86 nodes: Quanton's native engine targets x86_64 / Graviton, and the image in this guide is built for linux/amd64.
gcloud container clusters create $GKE --zone $ZONE --project $PROJECT \
--num-nodes 1 --machine-type n2-standard-4 \
--workload-pool=${PROJECT}.svc.id.goog
gcloud container clusters get-credentials $GKE --zone $ZONE --project $PROJECT
Step 3: Add a small untainted system pool
Dataproc's node pools are tainted (dataproc.googleapis.com/pool), which
kube-dns can't tolerate. Add a small untainted pool so cluster DNS stays
healthy (use a pd-standard disk to avoid the regional SSD quota):
gcloud container node-pools create system-pool --cluster $GKE --zone $ZONE \
--project $PROJECT --machine-type e2-small --num-nodes 1 \
--disk-type pd-standard --disk-size 30
Step 4: Artifact Registry + staging bucket
gcloud artifacts repositories create quanton --repository-format=docker \
--location=$REGION --project $PROJECT
gcloud auth configure-docker ${REGION}-docker.pkg.dev --quiet
gcloud storage buckets create $BUCKET --location=$REGION --project $PROJECT
Step 5: Create the Dataproc on GKE virtual cluster
gcloud dataproc clusters gke create $VCLUSTER --region $REGION --project $PROJECT \
--gke-cluster=$GKE --gke-cluster-location=$ZONE \
--spark-engine-version=latest --namespace=dataproc \
--pools="name=dp-default,roles=default,machineType=n2-standard-4,min=1,max=3" \
--pools="name=dp-spark,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=6" \
--setup-workload-identity --staging-bucket=${BUCKET#gs://}
Build the Quanton container image
Find the Dataproc base image your virtual cluster uses (it's the FROM for the
custom image):
kubectl get nodes -o json | python3 -c "import sys,json;print([n for img in json.load(sys.stdin)['items'] for i in img['status']['images'] for n in i['names'] if '/spark/dataproc_2.2:' in n][0])"
# e.g. us-west2-docker.pkg.dev/cloud-dataproc/spark/dataproc_2.2:3.5-dataproc-27
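If python3 is not at hand, the same lookup can probably be done with kubectl's jsonpath output and a small filter. This is a sketch under that assumption; `pick_base_image` is a hypothetical helper, and it keeps the same `/spark/dataproc_2.2:` match as the command above.

```shell
# Hypothetical helper: read space-separated image names on stdin and
# print the first tagged image from the dataproc_2.2 Spark repository.
pick_base_image() {
  tr -s ' ' '\n' | grep -m1 -F '/spark/dataproc_2.2:'
}
```

It could be fed with `kubectl get nodes -o jsonpath='{.items[*].status.images[*].names[*]}' | pick_base_image`, and the result stored in a variable for the `--build-arg BASE=` flag used when building the image.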
Dockerfile — keep /usr/lib/spark content pristine (so the managed runtime
accepts it) but make it writable; stage Quanton's Spark, Java 17, and the agent
binary; overlay at runtime via the init script:
# BASE: the dataproc_2.2 base image discovered above
ARG BASE
# QUANTON: dist.onehouse.ai/onehouseai/quanton-spark:quanton-operator-release-v0.26.0-al2023
ARG QUANTON
FROM ${QUANTON} AS q
FROM ${BASE}
USER root
COPY --from=q /opt/spark /opt/quanton-spark
COPY --from=q /usr/lib/jvm/java-17-amazon-corretto /opt/quanton-java17
COPY --from=q /usr/local/bin/spark-agent-runtime /usr/local/bin/spark-agent-runtime
RUN chmod -R a+rwX /usr/lib/spark
ENV JAVA_HOME=/opt/quanton-java17
ENV LD_LIBRARY_PATH=/usr/lib/spark/native
ENV PATH=/opt/quanton-java17/bin:/usr/lib/spark/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
COPY init-script.sh /opt/init-script.sh
init-script.sh — runs before Spark launches; overlays Quanton onto SPARK_HOME:
#!/usr/bin/env bash
echo "[quanton] type=$1 overlaying /usr/lib/spark"
if [ -d /opt/quanton-spark ]; then
rm -rf /usr/lib/spark/* 2>/dev/null || true
cp -a /opt/quanton-spark/. /usr/lib/spark/
fi
exit 0
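The overlay can be rehearsed locally before building the image. The sketch below applies the same rm/cp sequence to throwaway directories; the `overlay` function and the file names are hypothetical stand-ins, not part of the Quanton image.

```shell
# Hypothetical rehearsal of the overlay step on temporary directories.
# overlay SRC DST: replace the contents of DST with the contents of SRC.
overlay() {
  local src="$1" dst="$2"
  if [ -d "$src" ]; then
    rm -rf "${dst:?}"/* 2>/dev/null || true   # ${dst:?} aborts if dst is empty
    cp -a "$src"/. "$dst"/                    # trailing /. copies contents, dotfiles included
  fi
}

work=$(mktemp -d)
mkdir -p "$work/quanton-spark/jars" "$work/spark/jars"
touch "$work/quanton-spark/jars/quanton.jar" "$work/quanton-spark/.marker"
touch "$work/spark/jars/stock.jar"

overlay "$work/quanton-spark" "$work/spark"
ls -A "$work/spark"        # lists .marker and jars
ls "$work/spark/jars"      # lists quanton.jar only; stock.jar is gone
```

The destination directory itself is kept, which is why the Dockerfile only needs to make /usr/lib/spark writable rather than replace it.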
Build and push (target linux/amd64):
export QUANTON_IMAGE=dist.onehouse.ai/onehouseai/quanton-spark:quanton-operator-release-v0.26.0-al2023
export IMAGE=$AR/dpgke-quanton:v1
echo "<docker-token>" | docker login dist.onehouse.ai -u onehouseai --password-stdin
docker build --platform linux/amd64 --build-arg BASE=<dataproc-base> --build-arg QUANTON=$QUANTON_IMAGE -t $IMAGE .
docker push $IMAGE
Provide the Quanton license
Quanton's engine validates a license (JWT + issuer cert + mTLS cert) against the Onehouse control plane. The Quanton Operator mints these; run it once, then copy the secrets into the Dataproc job namespace.
Step 6: Install the operators and mint the license
helm repo add spark-operator https://kubeflow.github.io/spark-operator && helm repo update
helm upgrade --install spark-operator spark-operator/spark-operator \
--namespace spark-operator --create-namespace --set "spark.jobNamespaces={default}"
echo "<docker-token>" | helm registry login registry-1.docker.io -u onehouseai --password-stdin
helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
--namespace quanton-operator --create-namespace \
--set "quantonOperator.jobNamespaces={default}" -f onehouse-values.yaml
# GKE blocks the controller's priority class — remove it (see Troubleshooting):
kubectl patch deployment quanton-controller -n quanton-operator --type=json \
-p='[{"op":"remove","path":"/spec/template/spec/priorityClassName"}]'
# Run one example so the operator mints a per-job token and mounts issuer/mTLS:
kubectl apply -f https://raw.githubusercontent.com/onehouseinc/quanton-operator/main/examples/quanton-application.yaml
Step 7: Copy the license secrets into the Dataproc namespace
JWTSEC=$(kubectl get pod -n default quanton-spark-pi-java-example-driver -o json \
| python3 -c "import sys,json;d=json.load(sys.stdin);print([v['secret']['secretName'] for v in d['spec']['volumes'] if v.get('secret') and 'job-token' in v['secret']['secretName']][0])")
kubectl create namespace dataproc 2>/dev/null || true
for pair in "$JWTSEC:quanton-token" "quanton-operator-cert:quanton-issuer" "quanton-operator-mtls-secret:quanton-mtls"; do
src="${pair%%:*}"; dst="${pair##*:}"
kubectl get secret -n default "$src" -o json \
| jq --arg n "$dst" '.metadata={name:$n,namespace:"dataproc"}' \
| kubectl apply -n dataproc -f -
done
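The loop relies on shell parameter expansion to split each source:destination pair, and Step 5 uses the same mechanism to strip the gs:// scheme from the bucket name. A short illustration with stand-in values (the secret and bucket names below are examples, not values your cluster will produce):

```shell
pair="example-job-token:quanton-token"
src="${pair%%:*}"   # remove the longest suffix matching ":*" -> example-job-token
dst="${pair##*:}"   # remove the longest prefix matching "*:" -> quanton-token
echo "$src -> $dst"

bucket="gs://example-staging-bucket"
echo "${bucket#gs://}"   # remove the leading "gs://" -> example-staging-bucket
```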
Submit a job
Submit through Dataproc, pointing at the custom image and mounting the license.
spark.quanton.standalone.mode=true is required to activate the native
engine, and spark.quanton.control_plane.cloud=aws selects the (AWS-hosted)
Onehouse control plane regardless of where Spark runs:
PROPS="spark.kubernetes.container.image=$IMAGE"
PROPS="$PROPS@spark.executor.instances=1@spark.executor.cores=1@spark.driver.cores=1"
PROPS="$PROPS@spark.quanton.control_plane.cloud=aws@spark.quanton.control_plane.environment=production"
PROPS="$PROPS@spark.quanton.standalone.mode=true"
for r in driver executor; do
PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-token=/var/run/secrets/quanton/token"
PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-issuer=/var/run/secrets/quanton/issuer"
PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-mtls=/var/run/secrets/quanton/mtls"
done
gcloud dataproc jobs submit spark --cluster $VCLUSTER --region $REGION --project $PROJECT \
--class org.apache.spark.examples.SparkPi \
--jars file:///usr/lib/spark/examples/jars/calculate-pi-example_2.12-3.5.0.jar \
--properties="^@^${PROPS}" -- 100
Expected:
Pi is roughly 3.141592...
Job [...] finished successfully.
--properties uses the ^@^ prefix so that gcloud splits the list on @ instead
of the default comma. Size the driver and executors with core counts:
Dataproc's pod builder rejects Spark-style memory quantities like
spark.driver.memory=2g.
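Concatenating the property string by hand is easy to get wrong. As a sketch, the same @-separated string can be assembled from one property per argument; `join_props` is a hypothetical helper and the image name is a placeholder.

```shell
# Hypothetical helper: join its arguments with "@" so the result can be
# passed as --properties="^@^<joined string>".
join_props() {
  local IFS='@'
  printf '%s\n' "$*"    # "$*" joins arguments with the first character of IFS
}

EXAMPLE_PROPS=$(join_props \
  "spark.kubernetes.container.image=example-image:v1" \
  "spark.executor.instances=1" \
  "spark.quanton.standalone.mode=true")
echo "$EXAMPLE_PROPS"
# Print one property per line to review before submitting
echo "$EXAMPLE_PROPS" | tr '@' '\n'
```

This only works while no property key or value contains an @ character; if one does, pick a different delimiter in both the helper and the ^...^ prefix.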
Verify the native engine
Confirm Quanton's Velox engine actually loaded (not just the Spark fork) by checking the driver log:
gcloud dataproc jobs wait <job-id> --region $REGION --project $PROJECT 2>&1 | grep -iE "libvelox|Components registered|Quanton SQL Tab"
You should see:
ai.onehouse.quanton.component.package - Components registered within order: velox, velox-hudi, velox-delta, velox-iceberg
ai.onehouse.quanton.jni.JniLibLoader - Successfully loaded library linux/amd64/libvelox.so
ai.onehouse.quanton.backendsapi.velox.VeloxBackend - Quanton SQL Tab has been attached.
If these lines are absent, check first that spark.quanton.standalone.mode=true was set on the job.
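A single grep does not say which of the three markers is missing. As a sketch, each one can be tested separately against a saved driver log; `check_native_engine` is a hypothetical helper, and the log file is assumed to be the captured output of the `gcloud dataproc jobs wait` command.

```shell
# Hypothetical helper: verify that a saved driver log contains all three
# markers. Prints one line per marker and returns non-zero if any is missing.
check_native_engine() {
  local log="$1" marker missing=0
  for marker in "Components registered" "libvelox" "Quanton SQL Tab"; do
    if grep -qi -- "$marker" "$log"; then
      echo "found:   $marker"
    else
      echo "missing: $marker"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, redirect the wait output to driver.log (with 2>&1) and then run `check_native_engine driver.log`.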
Spark UI and the Quanton tab
While the driver runs, port-forward the UI:
kubectl port-forward -n dataproc <driver-pod> 4040:4040 # Spark UI + Quanton SQL tab
kubectl port-forward -n dataproc <driver-pod> 4041:4041 # agent sidebar (spark.quanton.agent.enabled=true)
To keep the UI up for inspection after a query, add
spark.quanton.agent.enabled=true and spark.quanton.agent.await.termination=true.
For finished applications, run a Spark History Server over spark.eventLog.dir.
Troubleshooting
Regional quota
New projects often hit SSD_TOTAL_GB (every node disk counts) and
IN_USE_ADDRESSES before CPU. Keep the node footprint small and use
pd-standard for the system pool.
kube-dns Pending
If kube-dns is Pending, you have no untainted node — add the system pool
(Step 3). Without DNS, executors fail to resolve the driver
(UnknownHostException).
Controller stuck at 0 replicas
If the quanton-controller ReplicaSet reports
insufficient quota to match these scopes: [{PriorityClass ...}], GKE is
blocking its priority class. Remove it (shown in Step 6).
Cleanup
gcloud dataproc clusters delete $VCLUSTER --region $REGION --project $PROJECT --quiet
gcloud container clusters delete $GKE --zone $ZONE --project $PROJECT --quiet
gcloud artifacts repositories delete quanton --location $REGION --project $PROJECT --quiet
gcloud storage rm -r $BUCKET
Next steps
- GKE guide: the fully supported Quanton Operator deployment
- Running Jobs: submit your own QuantonSparkApplication resources