
Run Quanton on Dataproc on GKE (Experimental)

This guide runs the Quanton Spark engine inside Google's Dataproc on GKE — jobs are submitted through the Dataproc Jobs API, but the Quanton runtime (with its native Velox engine) executes the work. It uses a custom Spark container image plus the Quanton license credentials minted by the Quanton Operator.

Experimental

This is an experimental integration. Dataproc on GKE provides and manages the Spark runtime, so this guide uses a custom container image and a runtime overlay to bring Quanton's engine into the managed pods. For a fully supported Quanton deployment, use the GKE guide (Quanton Operator on a standard GKE cluster).

How it works

Dataproc on GKE mounts its own Spark at /usr/lib/spark and runs an entrypoint that, before launching Spark, executes an optional /opt/init-script.sh. We use that hook to overlay Quanton's Spark distribution onto SPARK_HOME at runtime, and we mount the Quanton license (JWT + issuer cert + mTLS cert) that the engine validates against the Onehouse control plane.

Prerequisites

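You need gcloud, kubectl, helm, docker, jq, and python3 on your workstation. You also need Onehouse registry credentials (the <docker-token> used below) and an onehouse-values.yaml for the Quanton Operator. Set a few shell variables used throughout: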

export PROJECT=<project-id> REGION=us-west2 ZONE=us-west2-a
export GKE=quanton-dpgke VCLUSTER=quanton-vc
export AR=${REGION}-docker.pkg.dev/${PROJECT}/quanton
export BUCKET=gs://quanton-dpgke-staging-${REGION}-${PROJECT}
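You can check the derived names before creating anything. Below is a minimal Python sketch that rebuilds AR and BUCKET the same way as the exports. It rejects a bucket name that breaks the basic Cloud Storage rules for dot-free names: 3 to 63 characters, only lowercase letters, digits, hyphens, and underscores, and a letter or digit at each end. The helper name derive_names is hypothetical. The case it is meant to catch is a long region name combined with a long project ID.

```python
import re

# Basic Cloud Storage rules for bucket names without dots:
# 3-63 chars, lowercase letters/digits/hyphens/underscores,
# first and last character a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def derive_names(project: str, region: str) -> dict:
    """Rebuild AR and BUCKET the way the shell exports do (hypothetical helper)."""
    bucket = f"quanton-dpgke-staging-{region}-{project}"
    if not _BUCKET_NAME.fullmatch(bucket):
        raise ValueError(
            f"bucket name {bucket!r} ({len(bucket)} chars) breaks GCS naming rules; "
            "choose a shorter naming scheme"
        )
    return {
        "AR": f"{region}-docker.pkg.dev/{project}/quanton",
        "BUCKET": f"gs://{bucket}",
    }


if __name__ == "__main__":
    print(derive_names("my-project-123", "us-west2"))
```

With us-west2 even a 30-character project ID fits. A region such as northamerica-northeast1 combined with a long project ID does not.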

Cluster setup

Step 1: Enable APIs

gcloud services enable container.googleapis.com dataproc.googleapis.com \
artifactregistry.googleapis.com compute.googleapis.com --project $PROJECT

Step 2: Create a Standard GKE cluster

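Use x86 nodes. Quanton's native engine targets x86_64 and AWS Graviton (arm64). Graviton is AWS-only, and the image built below targets linux/amd64, so pick an x86 machine type such as n2-standard-4.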

gcloud container clusters create $GKE --zone $ZONE --project $PROJECT \
--num-nodes 1 --machine-type n2-standard-4 \
--workload-pool=${PROJECT}.svc.id.goog
gcloud container clusters get-credentials $GKE --zone $ZONE --project $PROJECT
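To confirm the nodes are x86, read the well-known kubernetes.io/arch node label. This is a hedged Python sketch over the output of kubectl get nodes -o json saved to a file. The helper non_amd64_nodes and the script name check_arch.py are hypothetical. An empty result means every node reports amd64.

```python
import json
import sys


def non_amd64_nodes(nodes_json: dict) -> list:
    """Return names of nodes whose kubernetes.io/arch label is not amd64."""
    names = []
    for node in nodes_json.get("items", []):
        meta = node.get("metadata", {})
        if meta.get("labels", {}).get("kubernetes.io/arch") != "amd64":
            names.append(meta.get("name", "<unnamed>"))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get nodes -o json > nodes.json && python3 check_arch.py nodes.json
    with open(sys.argv[1]) as f:
        print(non_amd64_nodes(json.load(f)) or "all nodes report amd64")
```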

Step 3: Add a small untainted system pool

Dataproc's node pools are tainted (dataproc.googleapis.com/pool), which kube-dns can't tolerate. Add a small untainted pool so cluster DNS stays healthy (use a pd-standard disk to avoid the regional SSD quota):

gcloud container node-pools create system-pool --cluster $GKE --zone $ZONE \
--project $PROJECT --machine-type e2-small --num-nodes 1 \
--disk-type pd-standard --disk-size 30
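Once the Dataproc pools from Step 5 exist, you can check that some node can still host kube-dns. A pod without tolerations can only land on a node that has no NoSchedule or NoExecute taints and is not cordoned. The Python sketch below lists those nodes and, separately, the nodes that carry the dataproc.googleapis.com/pool taint. It uses the same saved kubectl get nodes -o json input, and the helper names are hypothetical. It ignores node selectors and resource pressure, so treat it as a first check rather than a scheduler simulation.

```python
import json
import sys

DATAPROC_TAINT = "dataproc.googleapis.com/pool"
HARD_EFFECTS = ("NoSchedule", "NoExecute")  # PreferNoSchedule does not block scheduling


def nodes_for_untolerating_pods(nodes_json: dict) -> list:
    """Nodes a pod with no tolerations could use (considers taints and cordons only)."""
    names = []
    for node in nodes_json.get("items", []):
        spec = node.get("spec", {})
        hard = [t for t in spec.get("taints", []) if t.get("effect") in HARD_EFFECTS]
        if not hard and not spec.get("unschedulable", False):
            names.append(node["metadata"]["name"])
    return names


def dataproc_tainted_nodes(nodes_json: dict) -> list:
    """Nodes carrying the Dataproc pool taint."""
    return [
        node["metadata"]["name"]
        for node in nodes_json.get("items", [])
        if any(t.get("key") == DATAPROC_TAINT for t in node.get("spec", {}).get("taints", []))
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        nodes = json.load(f)
    print("untainted:", nodes_for_untolerating_pods(nodes))
    print("dataproc-tainted:", dataproc_tainted_nodes(nodes))
```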

Step 4: Artifact Registry + staging bucket

gcloud artifacts repositories create quanton --repository-format=docker \
--location=$REGION --project $PROJECT
gcloud auth configure-docker ${REGION}-docker.pkg.dev --quiet
gcloud storage buckets create $BUCKET --location=$REGION --project $PROJECT

Step 5: Create the Dataproc on GKE virtual cluster

gcloud dataproc clusters gke create $VCLUSTER --region $REGION --project $PROJECT \
--gke-cluster=$GKE --gke-cluster-location=$ZONE \
--spark-engine-version=latest --namespace=dataproc \
--pools="name=dp-default,roles=default,machineType=n2-standard-4,min=1,max=3" \
--pools="name=dp-spark,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=6" \
--setup-workload-identity --staging-bucket=${BUCKET#gs://}
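Each --pools value is a comma-separated list of key=value fields, and the roles field holds several roles separated by semicolons. The Python sketch below makes that structure explicit. It also totals the worst-case node count across pools, which helps when sizing quota. parse_pool_spec and max_nodes are hypothetical helpers, not gcloud's own parser.

```python
def parse_pool_spec(spec: str) -> dict:
    """Split one --pools value into fields; roles become a list (hypothetical helper)."""
    fields = dict(item.split("=", 1) for item in spec.split(","))
    fields["roles"] = fields.get("roles", "").split(";")
    for key in ("min", "max"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


def max_nodes(specs: list) -> int:
    """Worst-case node count if every pool scales to its max."""
    return sum(parse_pool_spec(s).get("max", 0) for s in specs)


# The two --pools values used in Step 5.
POOLS = [
    "name=dp-default,roles=default,machineType=n2-standard-4,min=1,max=3",
    "name=dp-spark,roles=spark-driver;spark-executor,machineType=n2-standard-4,min=1,max=6",
]

if __name__ == "__main__":
    for s in POOLS:
        print(parse_pool_spec(s))
    print("max Dataproc nodes:", max_nodes(POOLS))
```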

Build the Quanton container image

Find the Dataproc base image your virtual cluster uses (it's the FROM for the custom image):

kubectl get nodes -o json | python3 -c "import sys,json;print([n for node in json.load(sys.stdin)['items'] for img in node['status'].get('images',[]) for n in img.get('names',[]) if '/spark/dataproc_2.2:' in n][0])"
# e.g. us-west2-docker.pkg.dev/cloud-dataproc/spark/dataproc_2.2:3.5-dataproc-27
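Here is the one-liner above written out as a Python function for readability. It returns the first image name containing /spark/dataproc_2.2: and so skips digest references (name@sha256:...), which lack the colon right after dataproc_2.2. find_dataproc_base_image is a hypothetical name. The sketch assumes the Dataproc image is already cached on at least one node.

```python
import json
import sys
from typing import Optional


def find_dataproc_base_image(nodes_json: dict, marker: str = "/spark/dataproc_2.2:") -> Optional[str]:
    """Return the first cached image name containing marker, or None."""
    for node in nodes_json.get("items", []):
        for image in node.get("status", {}).get("images", []):
            for name in image.get("names", []):  # names is optional in the Node API
                if marker in name:
                    return name
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print(find_dataproc_base_image(json.load(f)) or "no dataproc_2.2 image cached yet")
```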

Dockerfile. Keep the /usr/lib/spark content pristine (so the managed runtime accepts the image) but make it writable. Stage Quanton's Spark distribution, Java 17, and the agent binary under /opt and /usr/local/bin, and leave the actual overlay to the init script at runtime:

# BASE: the dataproc_2.2 base image discovered above
ARG BASE
# QUANTON: e.g. dist.onehouse.ai/onehouseai/quanton-spark:quanton-operator-release-v0.26.0-al2023
ARG QUANTON
FROM ${QUANTON} AS q
FROM ${BASE}
USER root
COPY --from=q /opt/spark /opt/quanton-spark
COPY --from=q /usr/lib/jvm/java-17-amazon-corretto /opt/quanton-java17
COPY --from=q /usr/local/bin/spark-agent-runtime /usr/local/bin/spark-agent-runtime
RUN chmod -R a+rwX /usr/lib/spark
ENV JAVA_HOME=/opt/quanton-java17
ENV LD_LIBRARY_PATH=/usr/lib/spark/native
ENV PATH=/opt/quanton-java17/bin:/usr/lib/spark/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
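COPY init-script.sh /opt/init-script.sh
# Make the hook executable regardless of its permissions in the build context
RUN chmod 0755 /opt/init-script.sh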

init-script.sh — runs before Spark launches; overlays Quanton onto SPARK_HOME:

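#!/usr/bin/env bash
# Runs in each Spark pod before Spark launches; $1 is logged for debugging only.
echo "[quanton] type=$1 overlaying /usr/lib/spark"
if [ -d /opt/quanton-spark ]; then
  # Replace the Dataproc Spark files with Quanton's distribution at the same path,
  # so SPARK_HOME (/usr/lib/spark) does not change
  rm -rf /usr/lib/spark/* 2>/dev/null || true
  cp -a /opt/quanton-spark/. /usr/lib/spark/
fi
# Always exit 0 so the hook itself never reports failure
exit 0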
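The Python sketch below mimics init-script.sh on local directories to show exactly what the overlay does to SPARK_HOME:

- It skips the overlay when the source directory is missing.
- It empties the target. Like rm -rf dir/*, it leaves dotfiles alone.
- It copies the Quanton tree in with symlinks and metadata preserved, approximating cp -a.

The overlay function and the file names in the demo are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path


def overlay(src: Path, spark_home: Path) -> bool:
    """Mimic init-script.sh on local dirs; return True if the overlay ran."""
    if not src.is_dir():
        return False  # like `[ -d /opt/quanton-spark ]` failing: SPARK_HOME untouched
    for entry in spark_home.iterdir():
        if entry.name.startswith("."):
            continue  # the shell glob `*` does not match dotfiles
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # copy2 + symlinks=True roughly approximates `cp -a src/. spark_home/`
    shutil.copytree(src, spark_home, symlinks=True, dirs_exist_ok=True)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home, quanton = Path(tmp, "spark"), Path(tmp, "quanton-spark")
        (home / "jars").mkdir(parents=True)
        (home / "jars" / "dataproc-only.jar").write_text("base")
        (quanton / "native").mkdir(parents=True)
        (quanton / "native" / "libexample.so").write_text("quanton")
        overlay(quanton, home)
        print(sorted(str(p.relative_to(home)) for p in home.rglob("*")))
```

Any file that exists only in the Dataproc distribution disappears after the overlay. This is why the later verification step checks that the Quanton engine, and not the stock Spark, actually loaded.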

Build and push (target linux/amd64):

export QUANTON_IMAGE=dist.onehouse.ai/onehouseai/quanton-spark:quanton-operator-release-v0.26.0-al2023
export IMAGE=$AR/dpgke-quanton:v1
echo "<docker-token>" | docker login dist.onehouse.ai -u onehouseai --password-stdin
docker build --platform linux/amd64 --build-arg BASE=<dataproc-base> --build-arg QUANTON=$QUANTON_IMAGE -t $IMAGE .
docker push $IMAGE

Provide the Quanton license

Quanton's engine validates a license (JWT + issuer cert + mTLS cert) against the Onehouse control plane. The Quanton Operator mints these; run it once, then copy the secrets into the Dataproc job namespace.

Step 6: Install the operators and mint the license

helm repo add spark-operator https://kubeflow.github.io/spark-operator && helm repo update
helm upgrade --install spark-operator spark-operator/spark-operator \
--namespace spark-operator --create-namespace --set "spark.jobNamespaces={default}"

echo "<docker-token>" | helm registry login registry-1.docker.io -u onehouseai --password-stdin
helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
--namespace quanton-operator --create-namespace \
--set "quantonOperator.jobNamespaces={default}" -f onehouse-values.yaml

# GKE blocks the controller's priority class — remove it (see Troubleshooting):
kubectl patch deployment quanton-controller -n quanton-operator --type=json \
-p='[{"op":"remove","path":"/spec/template/spec/priorityClassName"}]'

# Run one example so the operator mints a per-job token and mounts issuer/mTLS:
kubectl apply -f https://raw.githubusercontent.com/onehouseinc/quanton-operator/main/examples/quanton-application.yaml

Step 7: Copy the license secrets into the Dataproc namespace

JWTSEC=$(kubectl get pod -n default quanton-spark-pi-java-example-driver -o json \
| python3 -c "import sys,json;d=json.load(sys.stdin);print([v['secret']['secretName'] for v in d['spec']['volumes'] if v.get('secret') and 'job-token' in v['secret']['secretName']][0])")
kubectl create namespace dataproc 2>/dev/null || true
for pair in "$JWTSEC:quanton-token" "quanton-operator-cert:quanton-issuer" "quanton-operator-mtls-secret:quanton-mtls"; do
  src="${pair%%:*}"; dst="${pair##*:}"
  # Replace the source metadata (uid, resourceVersion, owner refs) with a new name/namespace
  kubectl get secret -n default "$src" -o json \
    | jq --arg n "$dst" '.metadata={name:$n,namespace:"dataproc"}' \
    | kubectl apply -n dataproc -f -
done
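Step 7 has two JSON steps: picking the job-token secret off the driver pod, and the jq rewrite of the secret's metadata. The Python sketch below mirrors both as plain functions; job_token_secret and retarget_secret are hypothetical names. Replacing metadata wholesale drops the uid, resourceVersion, and ownerReferences of the source object. Left in place, those fields could make kubectl apply reject the new object or tie the copy to an owner in another namespace.

```python
import copy


def job_token_secret(driver_pod: dict) -> str:
    """Name of the first secret volume whose secretName contains 'job-token'."""
    for volume in driver_pod.get("spec", {}).get("volumes", []):
        secret = volume.get("secret")
        if secret and "job-token" in secret.get("secretName", ""):
            return secret["secretName"]
    raise LookupError("driver pod mounts no job-token secret")


def retarget_secret(secret: dict, name: str, namespace: str = "dataproc") -> dict:
    """Same effect as jq '.metadata={name:$n,namespace:"dataproc"}' on a Secret."""
    copied = copy.deepcopy(secret)  # leave the source object untouched
    copied["metadata"] = {"name": name, "namespace": namespace}
    return copied
```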

Submit a job

Submit through Dataproc, pointing at the custom image and mounting the license secrets. spark.quanton.standalone.mode=true is required to activate the native engine. spark.quanton.control_plane.cloud=aws selects the AWS-hosted Onehouse control plane regardless of where Spark runs:

PROPS="spark.kubernetes.container.image=$IMAGE"
PROPS="$PROPS@spark.executor.instances=1@spark.executor.cores=1@spark.driver.cores=1"
PROPS="$PROPS@spark.quanton.control_plane.cloud=aws@spark.quanton.control_plane.environment=production"
PROPS="$PROPS@spark.quanton.standalone.mode=true"
for r in driver executor; do
  PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-token=/var/run/secrets/quanton/token"
  PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-issuer=/var/run/secrets/quanton/issuer"
  PROPS="$PROPS@spark.kubernetes.$r.secrets.quanton-mtls=/var/run/secrets/quanton/mtls"
done

gcloud dataproc jobs submit spark --cluster $VCLUSTER --region $REGION --project $PROJECT \
--class org.apache.spark.examples.SparkPi \
--jars file:///usr/lib/spark/examples/jars/calculate-pi-example_2.12-3.5.0.jar \
--properties="^@^${PROPS}" -- 100

Expected:

Pi is roughly 3.141592...
Job [...] finished successfully.
Note

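The ^@^ prefix in --properties changes gcloud's list delimiter from a comma to @ (see gcloud topic escaping). The @-separated list is then split correctly, and values containing commas stay intact. Pass Spark sizing via cores, because Dataproc's pod builder rejects Spark-style memory quantities such as spark.driver.memory=2g.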
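You can also assemble the same property string programmatically. The Python sketch below joins the properties from the PROPS loop with @ and prepends the ^@^ marker; build_properties is a hypothetical helper. It refuses any value that itself contains the delimiter, such as an image pinned by digest (image@sha256:...), because gcloud would split that value in two.

```python
# Secret name -> mount path, as in the PROPS loop above.
SECRET_MOUNTS = {
    "quanton-token": "/var/run/secrets/quanton/token",
    "quanton-issuer": "/var/run/secrets/quanton/issuer",
    "quanton-mtls": "/var/run/secrets/quanton/mtls",
}


def build_properties(image: str, delimiter: str = "@") -> str:
    """Return a --properties value using gcloud's ^DELIM^ alternate-delimiter syntax."""
    props = [
        f"spark.kubernetes.container.image={image}",
        "spark.executor.instances=1",
        "spark.executor.cores=1",
        "spark.driver.cores=1",
        "spark.quanton.control_plane.cloud=aws",
        "spark.quanton.control_plane.environment=production",
        "spark.quanton.standalone.mode=true",
    ]
    for role in ("driver", "executor"):
        for secret, path in SECRET_MOUNTS.items():
            props.append(f"spark.kubernetes.{role}.secrets.{secret}={path}")
    clashing = [p for p in props if delimiter in p]
    if clashing:
        raise ValueError(f"delimiter {delimiter!r} appears inside: {clashing}")
    return f"^{delimiter}^" + delimiter.join(props)


if __name__ == "__main__":
    print(build_properties("us-west2-docker.pkg.dev/my-project/quanton/dpgke-quanton:v1"))
```

If you do need a digest-pinned image, pick a delimiter that cannot appear in any value and pass it as delimiter.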

Verify the native engine

Confirm Quanton's Velox engine actually loaded (not just the Spark fork) by checking the driver log:

gcloud dataproc jobs wait <job-id> --region $REGION --project $PROJECT 2>&1 | grep -iE "libvelox|Components registered|Quanton SQL Tab"

You should see:

ai.onehouse.quanton.component.package - Components registered within order: velox, velox-hudi, velox-delta, velox-iceberg
ai.onehouse.quanton.jni.JniLibLoader - Successfully loaded library linux/amd64/libvelox.so
ai.onehouse.quanton.backendsapi.velox.VeloxBackend - Quanton SQL Tab has been attached.

If these lines are absent, the native engine did not load. First check that spark.quanton.standalone.mode=true was set.
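If you are checking many jobs, you can turn the grep into a small check that reports which markers are missing. The Python sketch below runs a case-insensitive substring search for the same three patterns as the grep -iE above, over a saved copy of the driver output. missing_engine_markers and check_engine.py are hypothetical names.

```python
import sys

# Same patterns as the grep -iE above, matched case-insensitively.
ENGINE_MARKERS = {
    "velox library loaded": "libvelox",
    "components registered": "Components registered",
    "Quanton SQL tab attached": "Quanton SQL Tab",
}


def missing_engine_markers(log_text: str) -> list:
    """Return the labels of markers that do not appear in the driver log."""
    lowered = log_text.lower()
    return [label for label, needle in ENGINE_MARKERS.items() if needle.lower() not in lowered]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: gcloud dataproc jobs wait <job-id> --region $REGION > driver.log 2>&1
    #        python3 check_engine.py driver.log
    with open(sys.argv[1], errors="replace") as f:
        missing = missing_engine_markers(f.read())
    print("native engine markers found" if not missing else f"missing: {missing}")
```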

Spark UI and the Quanton tab

While the driver runs, port-forward the UI:

kubectl port-forward -n dataproc <driver-pod> 4040:4040   # Spark UI + Quanton SQL tab
kubectl port-forward -n dataproc <driver-pod> 4041:4041   # agent sidebar (needs spark.quanton.agent.enabled=true)

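To keep the UI up for inspection after a query, add spark.quanton.agent.enabled=true and spark.quanton.agent.await.termination=true. For finished applications, enable event logging (spark.eventLog.enabled=true, with spark.eventLog.dir set to a durable location) and run a Spark History Server over spark.eventLog.dir.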

Troubleshooting

Regional quota

New projects often hit SSD_TOTAL_GB (every node disk counts) and IN_USE_ADDRESSES before CPU. Keep the node footprint small and use pd-standard for the system pool.
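The Python sketch below is a back-of-the-envelope estimate of how quickly node disks add up. It assumes that pd-balanced and pd-ssd disks count toward SSD_TOTAL_GB while pd-standard does not. It also assumes that nodes without an explicit disk setting get a 100 GB pd-balanced boot disk, which is GKE's usual default. Verify both assumptions for your project. The pool list reuses this guide's node counts, with the Dataproc pools at their max; ssd_total_gb is a hypothetical helper.

```python
# Assumptions (verify for your project): pd-balanced and pd-ssd count toward
# SSD_TOTAL_GB, pd-standard does not, and nodes without an explicit disk
# setting get a 100 GB pd-balanced boot disk.
SSD_BACKED = {"pd-balanced", "pd-ssd"}


def ssd_total_gb(pools: list) -> int:
    """Sum boot-disk GB that would count against SSD_TOTAL_GB."""
    return sum(p["nodes"] * p["disk_gb"] for p in pools if p["disk_type"] in SSD_BACKED)


# Node counts from this guide, with the Dataproc pools scaled to their max.
GUIDE_POOLS = [
    {"name": "default-pool", "nodes": 1, "disk_gb": 100, "disk_type": "pd-balanced"},
    {"name": "system-pool", "nodes": 1, "disk_gb": 30, "disk_type": "pd-standard"},
    {"name": "dp-default", "nodes": 3, "disk_gb": 100, "disk_type": "pd-balanced"},
    {"name": "dp-spark", "nodes": 6, "disk_gb": 100, "disk_type": "pd-balanced"},
]

if __name__ == "__main__":
    print("SSD_TOTAL_GB needed at max scale:", ssd_total_gb(GUIDE_POOLS))
```

Under these assumptions the guide's layout needs 1,000 GB of SSD-backed disk at full scale. The pd-standard system pool adds nothing to that total, and lowering the Dataproc pools' max values shrinks it fastest.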

kube-dns Pending

If kube-dns is Pending, you have no untainted node — add the system pool (Step 3). Without DNS, executors fail to resolve the driver (UnknownHostException).

Controller stuck at 0 replicas

If the quanton-controller ReplicaSet reports insufficient quota to match these scopes: [{PriorityClass ...}], GKE is blocking its priority class. Remove it (shown in Step 6).

Cleanup

gcloud dataproc clusters delete $VCLUSTER --region $REGION --project $PROJECT --quiet
gcloud container clusters delete $GKE --zone $ZONE --project $PROJECT --quiet
gcloud artifacts repositories delete quanton --location $REGION --project $PROJECT --quiet
gcloud storage rm -r $BUCKET

Next steps

  • GKE guide — the fully supported Quanton Operator deployment
  • Running Jobs — submit your own QuantonSparkApplication resources