Deploy Quanton on AKS

This guide walks through deploying the Quanton Operator on a new AKS cluster, from creating the cluster to running your first Spark job.

Prerequisites

You'll need the Azure CLI (az), kubectl, and Helm installed locally, an Azure subscription, and your onehouse-values.yaml file downloaded from the Onehouse console.

Resource and cluster setup

Step 1: Log in and set your subscription

az login

List available subscriptions and set the one you want to use:

az account list --output table
az account set --subscription "<subscription-name-or-id>"

Step 2: Register required providers

If this is a fresh subscription, register the AKS and compute providers:

az provider register --namespace Microsoft.ContainerService
az provider register --namespace Microsoft.Compute

Check registration status before proceeding:

az provider show --namespace Microsoft.ContainerService --query registrationState
az provider show --namespace Microsoft.Compute --query registrationState

Wait until both return "Registered".
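If you'd rather script the wait than re-run the show commands by hand, a small polling helper can loop until a provider reports Registered. This is a sketch: the function name, retry count, and 15-second interval are all illustrative.

```shell
# Poll a provider's registrationState until it reads "Registered".
# Sketch only: retry count and sleep interval are arbitrary defaults.
wait_registered() {
  local namespace="$1" tries="${2:-40}"
  local state
  for _ in $(seq 1 "$tries"); do
    state=$(az provider show --namespace "$namespace" \
              --query registrationState --output tsv 2>/dev/null)
    if [ "$state" = "Registered" ]; then
      echo "$namespace: Registered"
      return 0
    fi
    sleep 15
  done
  echo "timed out waiting for $namespace" >&2
  return 1
}

# Usage:
#   wait_registered Microsoft.ContainerService
#   wait_registered Microsoft.Compute
```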

Step 3: Create a resource group

az group create --name quanton-test-rg --location westus2

Use any region, such as westus2, eastus, or westeurope.

Step 4: Create the AKS cluster

az aks create \
--resource-group quanton-test-rg \
--name quanton-aks \
--kubernetes-version 1.33.0 \
--node-count 2 \
--node-vm-size Standard_D4s_v3 \
--generate-ssh-keys

This takes ~5 minutes. To check available Kubernetes versions in your region:

az aks get-versions --location westus2 --output table

Note: Use a version that lists KubernetesOfficial in the SupportPlan column to avoid requiring a Premium tier cluster.
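If you want to surface only those versions from the command line, you can filter on the support plan with a JMESPath query. This is a sketch: the helper name is illustrative, and the query assumes the current az aks get-versions output shape, where each entry carries version and capabilities.supportPlan fields.

```shell
# Print only Kubernetes versions whose support plan includes
# KubernetesOfficial (i.e. no Premium tier required).
official_versions() {
  local location="${1:-westus2}"
  az aks get-versions --location "$location" \
    --query "values[?contains(capabilities.supportPlan, 'KubernetesOfficial')].version" \
    --output tsv
}

# Usage: official_versions westus2
```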

Step 5: Configure kubectl

az aks get-credentials --resource-group quanton-test-rg --name quanton-aks
kubectl get nodes

You should see 2 nodes in Ready state.
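Rather than polling kubectl get nodes, you can block until every node is Ready with kubectl wait. A sketch, with the wrapper name and 5-minute default timeout as illustrative choices:

```shell
# Block until every node reports the Ready condition, or time out.
wait_nodes_ready() {
  kubectl wait node --all --for=condition=Ready --timeout="${1:-300s}"
}

# Usage: wait_nodes_ready
```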

Install the operators

Step 6: Install the Spark Operator

The Quanton Operator extends the Kubeflow Spark Operator, so install it first:

helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace \
--set "spark.jobNamespaces={default}"

Verify it's running:

kubectl get pods -n spark-operator

Step 7: Install the Quanton Operator

helm upgrade --install quanton-operator oci://registry-1.docker.io/onehouseai/quanton-operator \
--namespace quanton-operator \
--create-namespace \
--set "quantonOperator.jobNamespaces={default}" \
-f /path/to/onehouse-values.yaml

Verify the pods are running (may take ~30–60 seconds to initialize):

kubectl get pods -n quanton-operator

Expected output once ready:

NAME                                   READY   STATUS    RESTARTS   AGE
dp-proxy-deployment-xxxx-xxxxx         1/1     Running   0          60s
quanton-controller-xxxx-xxxxx          3/3     Running   0          60s

If pods show PodInitializing or 0/1, wait a moment and re-run the command.
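To skip the manual re-checking, you can block until the operator pods are Ready in one command. A sketch, assuming every pod in the namespace is expected to reach Ready; the wrapper name and timeout are illustrative:

```shell
# Block until every pod in the quanton-operator namespace is Ready.
wait_operator_ready() {
  kubectl wait pod --all --namespace "${1:-quanton-operator}" \
    --for=condition=Ready --timeout="${2:-180s}"
}

# Usage: wait_operator_ready
```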

Submit and monitor a job

Step 8: Submit a test job

kubectl apply -f https://raw.githubusercontent.com/onehouseinc/quanton-operator/main/examples/quanton-application.yaml

Confirm the application was created:

kubectl get quantonsparkapplications -n default

Monitor the driver pod (may take 2–3 minutes while the Quanton image is pulled):

kubectl get pods -A | grep driver
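Because the driver pod can take a few minutes to appear, a small loop can wait for it and then follow its logs. This is a sketch: the helper name and 10-second interval are illustrative, and the pod name is taken from the example job above.

```shell
# Wait for the example driver pod to exist, then stream its logs.
follow_driver() {
  local pod="${1:-quanton-spark-pi-java-example-driver}"
  until kubectl get pod "$pod" >/dev/null 2>&1; do
    echo "waiting for $pod to be created..." >&2
    sleep 10
  done
  kubectl logs -f "$pod"
}

# Usage: follow_driver
```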

You can also track the job in the Onehouse console under Jobs:

Job running in Onehouse console

Once running, check the output:

kubectl logs -f quanton-spark-pi-java-example-driver | grep -i "pi is"

Expected output:

Pi is roughly 3.141592...

Once the job finishes, the console will show it as Completed:

Job completed in Onehouse console

Troubleshooting

ImagePullBackOff on quanton-controller

If kubectl get pods -n quanton-operator shows ImagePullBackOff on the quanton-controller pod, check the events:

kubectl describe pod -n quanton-operator <quanton-controller-pod-name> | grep -A 30 "Events:"

A TLS handshake timeout pulling from dist.onehouse.ai can indicate that the AKS nodes can't reach the Onehouse image registry, or it may be a transient failure that resolves on retry. First wait a minute and re-check the pod status. If the error persists, verify connectivity from inside the cluster:

kubectl run nettest --image=busybox --restart=Never -- \
sh -c "wget -qO- https://dist.onehouse.ai 2>&1 || true"
kubectl logs nettest
kubectl delete pod nettest

By default, AKS clusters use a standard load balancer for outbound internet access. If you've customized networking (a private cluster, custom NSGs, an egress firewall), you'll need to allowlist the required endpoints; see Network Configuration for the full list.

Checking the image pull secret

The Helm chart creates an image pull secret from your onehouse-values.yaml. Verify it exists:

kubectl get secret -n quanton-operator | grep onehouse

If the secret is missing, your onehouse-values.yaml may be missing the imagePullSecrets.accessToken field; download a fresh copy from the Onehouse console.
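If the secret exists but pulls still fail, decoding its registry config can confirm it actually targets dist.onehouse.ai. A sketch; substitute the secret name shown by the grep above (the helper name here is illustrative, and the jsonpath assumes a standard docker-registry secret with a .dockerconfigjson key):

```shell
# Decode a docker-registry pull secret's .dockerconfigjson payload.
show_pull_secret() {
  local name="$1" namespace="${2:-quanton-operator}"
  kubectl get secret "$name" --namespace "$namespace" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
}

# Usage: show_pull_secret <secret-name>
```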

Cleanup

When you're done, delete the resource group to remove all resources:

az group delete --name quanton-test-rg --yes --no-wait
