Spark Job Setup

Quanton uses the QuantonSparkApplication CRD — a thin wrapper around the Kubeflow SparkApplication that automatically injects the Quanton runtime image and manages authentication with the Onehouse control plane.

Dependencies

Before submitting jobs, ensure the following are installed on your cluster:

  • Spark Operator (v1.x or v2.x) — the Quanton Operator extends it rather than replacing it
  • Quanton Operator — see Project Creation for install instructions

Migrating existing SparkApplication resources

If you have existing Kubeflow SparkApplication resources, use the migration tool to convert them to QuantonSparkApplication format.

What the migration tool does

Field        Before (SparkApplication)         After (QuantonSparkApplication)
apiVersion   sparkoperator.k8s.io/v1beta2      onehouse.ai/v1beta2
kind         SparkApplication                  QuantonSparkApplication
metadata     Preserved as-is                   Preserved as-is
spec         Direct Spark config               Moved under spec.sparkApplicationSpec

The tool validates your input before transforming it — apiVersion, kind, metadata.name, spec, spec.type (Java/Scala/Python/R), and spec.mode (cluster/client) must all be present and valid.
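These checks can be sketched roughly as follows. This is an illustrative simplification, not the actual scripts/transform.py source; the `validate` function name is hypothetical.

```python
# Simplified sketch of the validation step described above (illustrative only;
# the real scripts/transform.py may implement it differently).
VALID_TYPES = {"Java", "Scala", "Python", "R"}
VALID_MODES = {"cluster", "client"}

def validate(app: dict) -> None:
    """Raise ValueError if a SparkApplication manifest is missing required fields."""
    if app.get("apiVersion") != "sparkoperator.k8s.io/v1beta2":
        raise ValueError("apiVersion must be sparkoperator.k8s.io/v1beta2")
    if app.get("kind") != "SparkApplication":
        raise ValueError("kind must be SparkApplication")
    if not app.get("metadata", {}).get("name"):
        raise ValueError("metadata.name is required")
    spec = app.get("spec")
    if not isinstance(spec, dict):
        raise ValueError("spec is required")
    if spec.get("type") not in VALID_TYPES:
        raise ValueError(f"spec.type must be one of {sorted(VALID_TYPES)}")
    if spec.get("mode") not in VALID_MODES:
        raise ValueError(f"spec.mode must be one of {sorted(VALID_MODES)}")
```

A manifest that passes `validate` without raising is safe to hand to the transformation step.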

Run the migration tool

# Convert a single file
python scripts/transform.py -input my-spark-app.yaml -output my-quanton-app.yaml

# Preview output (stdout)
python scripts/transform.py -input my-spark-app.yaml

Prerequisites: Python 3.9+ and pyyaml (pip install pyyaml).
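The core of the conversion can be sketched as below. This is a minimal illustration operating on an already-parsed manifest (e.g. the dict returned by pyyaml's `safe_load`), not the actual tool; the `to_quanton` function name is hypothetical.

```python
# Minimal sketch of the conversion (illustrative; not the real scripts/transform.py):
# swap apiVersion and kind, keep metadata untouched, and nest the original
# Spark spec under spec.sparkApplicationSpec.
def to_quanton(app: dict) -> dict:
    """Convert a parsed SparkApplication manifest to QuantonSparkApplication form."""
    if app.get("kind") != "SparkApplication":
        raise ValueError("expected kind: SparkApplication")
    return {
        "apiVersion": "onehouse.ai/v1beta2",
        "kind": "QuantonSparkApplication",
        "metadata": app["metadata"],                    # preserved as-is
        "spec": {"sparkApplicationSpec": app["spec"]},  # original spec, nested
    }
```

Because metadata is passed through unchanged, names, namespaces, labels, and annotations on the original resource all survive the conversion.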

Example transformation

Input SparkApplication:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-etl-job
  namespace: data-jobs
spec:
  type: Python
  mode: cluster
  mainApplicationFile: "s3://my-bucket/jobs/etl.py"
  sparkVersion: "3.5.0"
  driver:
    cores: 2
    memory: "4096m"
    serviceAccount: spark-operator-spark
  executor:
    cores: 4
    instances: 3
    memory: "8192m"

Output QuantonSparkApplication:

apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-etl-job
  namespace: data-jobs
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    mainApplicationFile: "s3://my-bucket/jobs/etl.py"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"

The Quanton Operator automatically injects the correct image from your onehouse-values.yaml — you don't need to specify it.

PySpark

The Quanton Spark image ships with Python 3.9, 3.11 (default), and 3.12. To select a specific version, set PYSPARK_PYTHON in your job spec:

driver:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
    PYSPARK_DRIVER_PYTHON: "/usr/bin/python3.12"
executor:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
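To verify the setting took effect, you can point a job at a tiny entrypoint that logs the interpreter it actually runs under. The script below is a hypothetical example (the filename and function name are not part of the Quanton image):

```python
# version_check.py (hypothetical entrypoint): prints the interpreter the job
# is actually running under, so you can confirm PYSPARK_PYTHON was honored.
import sys

def interpreter_version() -> str:
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"

if __name__ == "__main__":
    print(f"Python {interpreter_version()} at {sys.executable}")
```

Submit it as `mainApplicationFile` and check the driver log: the reported version should match the interpreter you selected via `PYSPARK_PYTHON`.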

Custom integrations

If you're deploying on a cloud Kubernetes platform, see the Integrations section for platform-specific instructions.