# Spark Job Setup
Quanton uses the `QuantonSparkApplication` CRD, a thin wrapper around the Kubeflow `SparkApplication` that automatically injects the Quanton runtime image and manages authentication with the Onehouse control plane.
## Dependencies
Before submitting jobs, ensure the following are installed on your cluster:
- Spark Operator (v1.x or v2.x): the Quanton Operator extends it rather than replacing it
- Quanton Operator — see Project Creation for install instructions
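You can confirm both operators are installed by checking for their CRDs. The CRD names below are assumptions inferred from the `apiVersion` values used in this guide; adjust them if your install differs.

```
# Assumed CRD names -- verify against your cluster.
kubectl get crd sparkapplications.sparkoperator.k8s.io
kubectl get crd quantonsparkapplications.onehouse.ai
```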
## Migrating existing SparkApplication resources
If you have existing Kubeflow `SparkApplication` resources, use the migration tool to convert them to the `QuantonSparkApplication` format.
### What the migration tool does
| Field | Before (`SparkApplication`) | After (`QuantonSparkApplication`) |
|---|---|---|
| `apiVersion` | `sparkoperator.k8s.io/v1beta2` | `onehouse.ai/v1beta2` |
| `kind` | `SparkApplication` | `QuantonSparkApplication` |
| `metadata` | Preserved as-is | Preserved as-is |
| `spec` | Direct Spark config | Moved under `spec.sparkApplicationSpec` |
The tool validates your input before transforming it: `apiVersion`, `kind`, `metadata.name`, and `spec` must be present, `spec.type` must be one of Java, Scala, Python, or R, and `spec.mode` must be either `cluster` or `client`.
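The core of the transformation can be sketched in a few lines of Python. This is an illustration of the field mapping and validation described above, not the actual `scripts/transform.py`, which may differ in details:

```python
# Sketch of the SparkApplication -> QuantonSparkApplication transform.
VALID_TYPES = {"Java", "Scala", "Python", "R"}
VALID_MODES = {"cluster", "client"}


def transform(doc: dict) -> dict:
    # Validate the same fields the migration tool checks.
    for field in ("apiVersion", "kind", "metadata", "spec"):
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
    if "name" not in doc["metadata"]:
        raise ValueError("metadata.name is required")
    spec = doc["spec"]
    if spec.get("type") not in VALID_TYPES:
        raise ValueError(f"spec.type must be one of {sorted(VALID_TYPES)}")
    if spec.get("mode") not in VALID_MODES:
        raise ValueError(f"spec.mode must be one of {sorted(VALID_MODES)}")

    # Apply the field mapping: new apiVersion and kind, metadata preserved,
    # and the original spec nested under spec.sparkApplicationSpec.
    return {
        "apiVersion": "onehouse.ai/v1beta2",
        "kind": "QuantonSparkApplication",
        "metadata": doc["metadata"],
        "spec": {"sparkApplicationSpec": spec},
    }
```

Note that the original `spec` is nested whole, so any Spark configuration it carries survives the conversion unchanged.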
### Run the migration tool
```shell
# Convert a single file
python scripts/transform.py -input my-spark-app.yaml -output my-quanton-app.yaml

# Preview output (stdout)
python scripts/transform.py -input my-spark-app.yaml
```
Prerequisites: Python 3.9+ and `pyyaml` (`pip install pyyaml`).
### Example transformation
Input `SparkApplication`:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-etl-job
  namespace: data-jobs
spec:
  type: Python
  mode: cluster
  mainApplicationFile: "s3://my-bucket/jobs/etl.py"
  sparkVersion: "3.5.0"
  driver:
    cores: 2
    memory: "4096m"
    serviceAccount: spark-operator-spark
  executor:
    cores: 4
    instances: 3
    memory: "8192m"
```
Output `QuantonSparkApplication`:

```yaml
apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-etl-job
  namespace: data-jobs
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    mainApplicationFile: "s3://my-bucket/jobs/etl.py"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"
```
The Quanton Operator automatically injects the correct runtime image from your `onehouse-values.yaml`; you don't need to specify it.
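Once converted, the resource is submitted like any other Kubernetes manifest. The commands below are a sketch; the plural resource name `quantonsparkapplications` is an assumption derived from the CRD kind.

```
kubectl apply -f my-quanton-app.yaml
kubectl get quantonsparkapplications -n data-jobs
```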
## PySpark
The Quanton Spark image ships with Python 3.9, 3.11 (the default), and 3.12. To select a specific version, set `PYSPARK_PYTHON` in your job spec:
```yaml
driver:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
    PYSPARK_DRIVER_PYTHON: "/usr/bin/python3.12"
executor:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
```
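To confirm which interpreter your job actually resolved, you can log the version from inside the application. This is a generic Python check, not a Quanton-specific API:

```python
import sys

# Print the interpreter this process is running under -- when
# PYSPARK_PYTHON is set, driver and executor processes should
# report the version you selected.
print(f"Python: {sys.executable} "
      f"({sys.version_info.major}.{sys.version_info.minor})")
```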
## Custom integrations
If you're deploying on a cloud Kubernetes platform, see the Integrations section for platform-specific instructions.