GCP

Deploy the Quanton Operator on Google Cloud Platform using GKE.

Tip: New to GKE? Follow the GKE deployment guide for a step-by-step walkthrough from cluster creation to your first Spark job.

GKE

Google Kubernetes Engine (GKE) is the recommended deployment target for Quanton on GCP. The Quanton Operator runs on your GKE cluster and manages the full Spark job lifecycle via Kubernetes.

Prerequisites

GKE cluster running Kubernetes >= 1.28
Helm 3.x or later, and kubectl configured for your cluster
onehouse-values.yaml downloaded from the Onehouse console
Outbound network access from your cluster to *.onehouse.ai and *.docker.io
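
The version requirement above can be checked from a shell before installing anything. This is a minimal sketch, not part of the Quanton tooling: `version_ge` and `check_k8s_version` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the check reads the kubelet version of the first node as a stand-in for the cluster version.

```shell
# Succeeds when dotted version $1 is at least $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: reads the kubelet version of the first node,
# e.g. "v1.29.4-gke.1043002", and compares it against the 1.28 minimum.
check_k8s_version() {
  raw="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
  ver="${raw#v}"      # drop the leading "v"
  ver="${ver%%-*}"    # drop the "-gke.N" suffix
  if version_ge "$ver" "1.28"; then
    echo "Kubernetes $ver: OK"
  else
    echo "Kubernetes $ver is older than 1.28" >&2
    return 1
  fi
}
```

Run `check_k8s_version` against the target cluster, and `helm version --short` to confirm the Helm client version.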

Step 1: Install the Spark Operator

The Quanton Operator builds on the Kubeflow Spark Operator. Install it first:

helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --create-namespace \
  --set "spark.jobNamespaces={default}"

Verify it's running:

kubectl get pods -n spark-operator

Step 2: Install the Quanton Operator

Install the chart with the onehouse-values.yaml file you downloaded from the Onehouse console:

helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
  --namespace quanton-operator \
  --create-namespace \
  --set "quantonOperator.jobNamespaces={default}" \
  -f onehouse-values.yaml

Verify the operator pod is running:

kubectl get pods -n quanton-operator
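
In scripts, listing pods is less convenient than blocking until the operators are ready. The sketch below assumes both charts deploy their controllers as Kubernetes Deployments, which this page does not confirm for the Quanton chart; `wait_for_deployments` is a hypothetical helper name.

```shell
# Hypothetical helper: blocks until every Deployment in namespace $1
# reports the Available condition, or fails after 180 seconds.
wait_for_deployments() {
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$1" --timeout=180s
}
```

For the installs above, that would be `wait_for_deployments spark-operator` followed by `wait_for_deployments quanton-operator`.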

Step 3: Submit a Spark job

Save the following manifest as my-spark-job.yaml:

apiVersion: quantonsparkoperator.onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-spark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    image: "dist.onehouse.ai/onehouseai/quanton-spark:release-v1.29.0-al2023"
    mainApplicationFile: "gs://my-bucket/jobs/my_job.py"
    sparkVersion: "3.5.0"
    sparkConf:
      "spark.hadoop.google.cloud.auth.service.account.enable": "true"
    driver:
      cores: 4
      memory: "8192m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 4
      memory: "8192m"

Apply the manifest:

kubectl apply -f my-spark-job.yaml
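
Once the manifest is applied, you can follow the job with kubectl. This sketch avoids guessing the resource name of the custom resource by pointing `kubectl get` at the manifest file, and it finds the driver through the `spark-role=driver` label that Spark sets on driver pods. `show_job` is a hypothetical helper name, and the label selector matches every driver in the namespace, not only this job's.

```shell
# Hypothetical helper: $1 is the manifest file, $2 the job namespace.
show_job() {
  manifest="$1"
  namespace="$2"
  # Show the current state of the resource defined in the manifest.
  kubectl get -f "$manifest"
  # Print the last 50 log lines of driver pods in the namespace.
  kubectl logs --namespace "$namespace" -l spark-role=driver --tail=50
}
```

For the example job, call `show_job my-spark-job.yaml default`.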

GCS access via Workload Identity

Use Workload Identity to authenticate pods to GCS without service account keys. In the commands below, replace <project> with your GCP project ID:

# Bind the Kubernetes service account to a GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  spark-gcs@<project>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<project>.svc.id.goog[default/spark-operator-spark]"

kubectl annotate serviceaccount spark-operator-spark \
  iam.gke.io/gcp-service-account=spark-gcs@<project>.iam.gserviceaccount.com \
  -n default

The GCP service account needs the roles/storage.objectAdmin role on the GCS buckets your jobs read from and write to.
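
One way to grant that role is a bucket-level IAM binding with `gcloud storage buckets add-iam-policy-binding`. This is a sketch: `grant_bucket_access` is a hypothetical helper name, and the bucket and service account in the usage note are the placeholder values used elsewhere on this page.

```shell
# Hypothetical helper: $1 is the bucket name (without gs://),
# $2 is the email of the GCP service account.
grant_bucket_access() {
  gcloud storage buckets add-iam-policy-binding "gs://$1" \
    --member "serviceAccount:$2" \
    --role roles/storage.objectAdmin
}
```

For example, `grant_bucket_access my-bucket "spark-gcs@<project>.iam.gserviceaccount.com"`, with <project> replaced by your project ID. Repeat the call for each bucket the jobs use.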

Dedicated node pool (optional)

For best performance, run Spark pods on a dedicated node pool:

gcloud container node-pools create spark-pool \
  --cluster=my-cluster \
  --machine-type=n2-standard-8 \
  --num-nodes=4 \
  --node-labels=workload=spark

Set a matching node selector in onehouse-values.yaml:

quantonOperator:
  nodeSelector:
    workload: spark

Then re-apply the Helm install with the updated values file.
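
Re-applying means running the Step 2 command again, since `helm upgrade --install` updates an existing release in place. The sketch wraps that same command in a function so the values file is the only argument; `reapply_quanton` is a hypothetical helper name.

```shell
# Hypothetical helper: re-runs the Step 2 install with values file $1.
reapply_quanton() {
  helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
    --namespace quanton-operator \
    --create-namespace \
    --set "quantonOperator.jobNamespaces={default}" \
    -f "$1"
}
```

After `reapply_quanton onehouse-values.yaml`, running `kubectl get pods -n quanton-operator -o wide` shows which node each pod was scheduled on.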
