Azure

Deploy the Quanton Operator on Microsoft Azure using AKS.

Tip: New to AKS? Follow the AKS deployment guide for a step-by-step walkthrough from cluster creation to your first Spark job.

AKS

Azure Kubernetes Service (AKS) is the recommended deployment target for Quanton on Azure. The Quanton Operator runs on your AKS cluster and manages the full Spark job lifecycle via Kubernetes.

Prerequisites

  • AKS cluster running Kubernetes >= 1.28
  • Helm >= 3.x and kubectl configured for your cluster
  • onehouse-values.yaml downloaded from the Onehouse console
  • Outbound network access from your cluster to *.onehouse.ai and *.docker.io
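
The version requirements above can be checked from a shell before installing anything. The following is a minimal sketch, not an official Onehouse script: version_ge and check_prereqs are hypothetical helper names, the Kubernetes version is read from the first node's kubelet (an approximation of the cluster version), and sort -V is assumed to be available.

```shell
# Hypothetical helpers; not part of the Quanton or Onehouse tooling.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort that supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Checks Kubernetes >= 1.28 and Helm >= 3 against the current kubectl context.
check_prereqs() {
  # kubeletVersion looks like "v1.29.2"; strip the leading "v".
  k8s_version=$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')
  k8s_version=${k8s_version#v}

  # "helm version --short" prints something like "v3.14.0+g3fc9f4b".
  helm_version=$(helm version --short)
  helm_version=${helm_version#v}
  helm_version=${helm_version%%+*}

  version_ge "$k8s_version" 1.28 || {
    echo "Kubernetes $k8s_version is older than 1.28" >&2
    return 1
  }
  version_ge "$helm_version" 3.0.0 || {
    echo "Helm $helm_version is older than 3.x" >&2
    return 1
  }
  echo "Prerequisites OK (Kubernetes $k8s_version, Helm $helm_version)"
}
```

Run check_prereqs after pointing kubectl at the target cluster. It does not test outbound network access or the presence of onehouse-values.yaml.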

If you don't have a cluster yet, or your Azure subscription is new and still needs the Microsoft.ContainerService and Microsoft.Compute resource providers registered, follow the AKS deployment guide first.
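
For reference, provider registration on a new subscription can be scripted with the Azure CLI. This is a sketch under the assumption that az is installed and logged in to the right subscription. The helper names ensure_provider and ensure_all_providers are hypothetical.

```shell
# Hypothetical helper: registers a resource provider if it is not registered yet.
ensure_provider() {
  state=$(az provider show --namespace "$1" --query registrationState --output tsv)
  if [ "$state" = "Registered" ]; then
    echo "$1 already registered"
  else
    # Registration is asynchronous; --wait blocks until it completes.
    az provider register --namespace "$1" --wait || return 1
    echo "$1 registered"
  fi
}

# Covers the two providers named in this guide.
ensure_all_providers() {
  for provider in Microsoft.ContainerService Microsoft.Compute; do
    ensure_provider "$provider" || return 1
  done
}
```

Calling ensure_all_providers is safe to repeat, since providers that are already registered are left untouched.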

Step 1: Install the Spark Operator

The Quanton Operator builds on top of the Kubeflow Spark Operator. Install the Spark Operator first:

helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --create-namespace \
  --set "spark.jobNamespaces={default}"

Verify it's running:

kubectl get pods -n spark-operator
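
In a script, it can be more convenient to block until the pods are ready than to poll kubectl get pods by hand. The sketch below wraps kubectl wait in a hypothetical helper named wait_for_pods. The 180s default timeout is an arbitrary choice, and the helper assumes every pod in the namespace is long-running (pods left behind by completed Jobs never become Ready and would cause a timeout).

```shell
# Hypothetical helper: waits until every pod in a namespace reports Ready.
# $1 = namespace, $2 = optional timeout (default 180s, an arbitrary choice).
wait_for_pods() {
  namespace=$1
  timeout=${2:-180s}
  kubectl wait --for=condition=Ready pod --all \
    --namespace "$namespace" --timeout="$timeout"
}
```

For this step the call would be wait_for_pods spark-operator. A non-zero exit status means at least one pod did not become Ready in time.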

Step 2: Install the Quanton Operator

helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
  --namespace quanton-operator \
  --create-namespace \
  --set "quantonOperator.jobNamespaces={default}" \
  -f /path/to/onehouse-values.yaml

Verify the operator pod is running and that the image-pull secret was created from your onehouse-values.yaml:

kubectl get pods -n quanton-operator
kubectl get secret -n quanton-operator | grep onehouse

If the registry secret is missing, the onehouseConfig.imagePullSecrets.accessToken field in your values file is empty or invalid. Re-download onehouse-values.yaml from the console and run the install command again.
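
The secret check can also be expressed as a command with a meaningful exit status, which is useful in CI. The sketch below assumes the chart creates a standard registry secret of type kubernetes.io/dockerconfigjson. That type is an assumption, not something this guide confirms, and has_pull_secret is a hypothetical helper name.

```shell
# Hypothetical helper: lists registry secrets in a namespace and fails if none exist.
# Assumes the image-pull secret has type kubernetes.io/dockerconfigjson.
has_pull_secret() {
  secrets=$(kubectl get secrets --namespace "$1" \
    --field-selector type=kubernetes.io/dockerconfigjson --output name)
  if [ -z "$secrets" ]; then
    echo "no image-pull secret found in $1" >&2
    return 1
  fi
  echo "$secrets"
}
```

Call it as has_pull_secret quanton-operator. If it fails while the grep above finds a secret, the secret simply has a different type and the assumption does not hold for your chart version.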

Step 3: Submit a Spark job

Save the following manifest as my-spark-job.yaml, replacing the storage account and container names and the <client-id>, <client-secret>, and <tenant-id> placeholders with your own values:

apiVersion: quantonsparkoperator.onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-spark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    image: "dist.onehouse.ai/onehouseai/quanton-spark:release-v1.29.0-al2023"
    mainApplicationFile: "abfss://my-container@mystorageaccount.dfs.core.windows.net/jobs/my_job.py"
    sparkVersion: "3.5.0"
    # ABFS OAuth settings for the storage account that holds the job file
    sparkConf:
      "spark.hadoop.fs.azure.account.auth.type.mystorageaccount.dfs.core.windows.net": "OAuth"
      "spark.hadoop.fs.azure.account.oauth.provider.type.mystorageaccount.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
      "spark.hadoop.fs.azure.account.oauth2.client.id.mystorageaccount.dfs.core.windows.net": "<client-id>"
      "spark.hadoop.fs.azure.account.oauth2.client.secret.mystorageaccount.dfs.core.windows.net": "<client-secret>"
      "spark.hadoop.fs.azure.account.oauth2.client.endpoint.mystorageaccount.dfs.core.windows.net": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
    driver:
      cores: 4
      memory: "8192m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 4
      memory: "8192m"

Then apply it:

kubectl apply -f my-spark-job.yaml
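
After applying the manifest, you will usually want to confirm that the resource exists and read the driver output. The helpers below are a sketch with hypothetical names (job_status, driver_logs). They rely on two things that are standard in Kubernetes and Spark but are not stated in this guide: kubectl get -f resolves the resource directly from the manifest, and Spark labels driver pods with spark-role=driver.

```shell
# Hypothetical helpers for following the job defined in my-spark-job.yaml.

# Shows the resource created from the manifest without needing its CRD resource name.
# $1 = manifest path (defaults to my-spark-job.yaml).
job_status() {
  kubectl get -f "${1:-my-spark-job.yaml}"
}

# Prints recent driver logs, selecting pods by the spark-role=driver label.
# $1 = namespace (default: default), $2 = number of lines (default: 100).
driver_logs() {
  kubectl logs --namespace "${1:-default}" --selector spark-role=driver --tail="${2:-100}"
}
```

If several Spark jobs run in the same namespace, driver_logs returns output from all of their drivers, so narrow the selector or pass the pod name to kubectl logs instead.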

ADLS Gen2 access

The sparkConf block above uses literal client ID/secret values for simplicity. For production, use AKS Workload Identity so pods authenticate to ADLS Gen2 without storing credentials in your job spec:

  1. Enable Workload Identity on the cluster (az aks update --enable-oidc-issuer --enable-workload-identity ...).
  2. Federate a User-Assigned Managed Identity to the spark-operator-spark service account in the default namespace.
  3. Grant that identity the Storage Blob Data Contributor role on your storage account.
  4. Replace the client ID and client secret settings in sparkConf with the Workload Identity token provider. See the Hadoop ABFS authentication docs for the exact spark.hadoop.fs.azure.account.oauth.provider.type value.
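
Steps 1 to 3 can be sketched with the Azure CLI as follows. The resource names reuse the placeholders from this guide (my-rg, my-cluster, mystorageaccount). The identity name quanton-spark-identity and the function names are hypothetical, and the storage account is assumed to live in the same resource group as the cluster. Step 4 is not covered, because the provider class depends on your Hadoop version.

```shell
# Sketch of steps 1-3; resource names are placeholders and error handling is minimal.
RESOURCE_GROUP=${RESOURCE_GROUP:-my-rg}
CLUSTER_NAME=${CLUSTER_NAME:-my-cluster}
STORAGE_ACCOUNT=${STORAGE_ACCOUNT:-mystorageaccount}
IDENTITY_NAME=${IDENTITY_NAME:-quanton-spark-identity}  # hypothetical name

# Builds the federated-credential subject for a service account.
# $1 = namespace, $2 = service account name.
federation_subject() {
  echo "system:serviceaccount:$1:$2"
}

setup_workload_identity() {
  # 1. Enable the OIDC issuer and Workload Identity, then read the issuer URL.
  az aks update --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --enable-oidc-issuer --enable-workload-identity || return 1
  issuer=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --query oidcIssuerProfile.issuerUrl --output tsv) || return 1

  # 2. Create a user-assigned identity and federate it to the service account.
  az identity create --resource-group "$RESOURCE_GROUP" --name "$IDENTITY_NAME" || return 1
  az identity federated-credential create --name "${IDENTITY_NAME}-fed" \
    --identity-name "$IDENTITY_NAME" --resource-group "$RESOURCE_GROUP" \
    --issuer "$issuer" \
    --subject "$(federation_subject default spark-operator-spark)" || return 1

  # 3. Grant the identity Storage Blob Data Contributor on the storage account.
  principal_id=$(az identity show --resource-group "$RESOURCE_GROUP" \
    --name "$IDENTITY_NAME" --query principalId --output tsv) || return 1
  storage_id=$(az storage account show --resource-group "$RESOURCE_GROUP" \
    --name "$STORAGE_ACCOUNT" --query id --output tsv) || return 1
  az role assignment create --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" --scope "$storage_id"
}
```

Azure Workload Identity also expects the service account to carry the azure.workload.identity/client-id annotation and the pods to carry the azure.workload.identity/use: "true" label. How to set those on Spark pods through a QuantonSparkApplication is not described here, so check the operator documentation before relying on this sketch.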

Dedicated node pool (optional)

For best performance, run Spark pods on a dedicated node pool:

az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name sparkpool \
  --node-vm-size Standard_D8s_v3 \
  --node-count 4 \
  --labels workload=spark

Set a matching node selector in onehouse-values.yaml:

quantonOperator:
  nodeSelector:
    workload: spark

Then re-run the helm upgrade --install command from Step 2 with the updated values file.
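
Before re-applying, it is worth confirming that the labeled nodes have actually joined the cluster, since a node selector with no matching nodes leaves pods Pending. The helpers below are a sketch with hypothetical names (spark_node_count, reapply_quanton). The Helm invocation simply repeats the Step 2 command with the values file passed as an argument.

```shell
# Hypothetical helper: counts nodes carrying the workload=spark label.
spark_node_count() {
  kubectl get nodes --selector workload=spark --output name | wc -l | tr -d ' '
}

# Hypothetical helper: repeats the Step 2 install with an updated values file.
# $1 = path to onehouse-values.yaml
reapply_quanton() {
  helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
    --namespace quanton-operator \
    --create-namespace \
    --set "quantonOperator.jobNamespaces={default}" \
    -f "$1"
}
```

With the node pool from the example above, spark_node_count should report 4 once all nodes are ready. After that, reapply_quanton /path/to/onehouse-values.yaml applies the new node selector.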