# Running Jobs
Quanton jobs are submitted as QuantonSparkApplication Kubernetes resources. The operator translates them into Spark jobs running on the Quanton compute engine.
## Job run states
| State | Description |
|---|---|
| Queued | Job has been submitted and is waiting for the Spark driver to start |
| Running | Job is actively executing on the cluster |
| Completed | Job finished without errors |
| Failed | Job failed — check driver logs for details |
| Canceled | Job was manually deleted while queued or running |
## Submit a job

Apply a QuantonSparkApplication manifest:

```shell
kubectl apply -f my-quanton-job.yaml
```

Check status:

```shell
kubectl get quantonsparkapplications -n <namespace>
```
## Minimal Java/Scala job

```yaml
apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-spark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Java
    mode: cluster
    mainClass: com.example.MyJob
    mainApplicationFile: "s3://my-bucket/jars/my-job.jar"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"
```
## Minimal Python job

```yaml
apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-pyspark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    mainApplicationFile: "s3://my-bucket/scripts/my_job.py"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"
```
The Quanton Operator automatically injects the correct runtime image — you don't specify it in the manifest.
## Default resource configuration

| Component | Spark config key | Default |
|---|---|---|
| Driver cores | `spark.driver.cores` | 4 |
| Driver memory | `spark.driver.memory` | 8g |
| Driver memory overhead factor | `spark.driver.memoryOverheadFactor` | 0.35 |
| Executor cores | `spark.executor.cores` | 4 |
| Executor memory | `spark.executor.memory` | 8g |
| Executor memory overhead factor | `spark.executor.memoryOverheadFactor` | 0.35 |
| Dynamic allocation | `spark.dynamicAllocation.enabled` | true |
| Min executors | `spark.dynamicAllocation.minExecutors` | 0 |
| Max executors | `spark.dynamicAllocation.maxExecutors` | 8 |
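As a worked example of the overhead factor: assuming the usual Spark sizing rule (pod memory = heap + max(heap × overheadFactor, 384 MiB) — an assumption about how Quanton applies the factor, not confirmed by the source), the 8g default translates to roughly 10.8 GiB per pod:

```python
# Sketch of pod memory sizing under the assumed Spark overhead rule:
# pod memory = heap + max(heap * overhead_factor, 384 MiB).
MIN_OVERHEAD_MIB = 384

def pod_memory_mib(heap_mib: int, overhead_factor: float = 0.35) -> int:
    """Total memory requested for a driver/executor pod, in MiB."""
    overhead = max(int(heap_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return heap_mib + overhead

# 8g heap (8192 MiB) with the 0.35 default factor:
# 8192 + 2867 = 11059 MiB, i.e. ~10.8 GiB per pod.
```

This is why a job whose executors are sized to "8g" can still be evicted on nodes with only 8 GiB free: the scheduler requests the heap plus overhead.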
## Pass Spark configuration

Add Spark conf keys under `sparkConf` (all values must be strings):

```yaml
spec:
  sparkApplicationSpec:
    sparkConf:
      "spark.sql.shuffle.partitions": "200"
      "spark.executor.memoryOverhead": "1g"
      "spark.dynamicAllocation.maxExecutors": "16"
```
## Resubmit a job

A QuantonSparkApplication supports only one run at a time. To resubmit, delete the resource and re-apply it:

```shell
kubectl delete quantonsparkapplication my-spark-job -n default
kubectl apply -f my-quanton-job.yaml
```

Or use a unique name for each run:

```shell
# Edit metadata.name to be unique, then:
kubectl create -f my-quanton-job.yaml
```
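To automate the unique-name approach, you can stamp the name with a timestamp before submitting. The `sed` templating below is a sketch: the base name and `my-quanton-job.yaml` are taken from the examples above, and the actual submission line is left commented.

```shell
# Base name from the manifest, plus a timestamp suffix for uniqueness.
BASE_NAME="my-spark-job"
RUN_NAME="${BASE_NAME}-$(date +%Y%m%d%H%M%S)"

# Rewrite metadata.name on the fly and submit (uncomment to actually run):
# sed "s/name: ${BASE_NAME}$/name: ${RUN_NAME}/" my-quanton-job.yaml | kubectl create -f -
echo "${RUN_NAME}"
```

Keeping old runs around under unique names also preserves their status and logs for later inspection.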
## Select a Python version

The Quanton Spark image includes Python 3.9, 3.11 (default), and 3.12. Override the interpreter with environment variables:

```yaml
driver:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
    PYSPARK_DRIVER_PYTHON: "/usr/bin/python3.12"
executor:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
```
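If a job depends on a specific interpreter, a fail-fast guard at the top of the script makes version mismatches obvious. This is a sketch, not part of the Quanton API; the expected version matches the env vars above:

```python
import sys

# Major.minor version that PYSPARK_PYTHON is expected to provide.
EXPECTED = (3, 12)

def interpreter_matches(version_info=sys.version_info, expected=EXPECTED):
    """True when the running interpreter's major.minor equals `expected`."""
    return (version_info[0], version_info[1]) == expected

# At the top of the job script you might fail fast:
# if not interpreter_matches():
#     sys.exit(f"expected Python {EXPECTED}, got {sys.version_info[:2]}")
```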
## Working with lakehouse formats

### Apache Hudi

Hudi is pre-installed on the Quanton Spark image, so no extra dependencies are needed:

```python
df.write.format("hudi") \
    .option("hoodie.table.name", "my_table") \
    .mode("append") \
    .save("s3://my-bucket/hudi/my_table")
```
### Apache Iceberg

Add the Iceberg dependency to your job JAR, or install it via pip for PySpark. Then enable Iceberg's SQL extensions and register a `SparkCatalog`:

```yaml
sparkConf:
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
  "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog"
  "spark.sql.catalog.my_catalog.type": "hadoop"
  "spark.sql.catalog.my_catalog.warehouse": "s3://my-bucket/iceberg"
```
### Delta Lake

Add the Delta Spark dependency and configure:

```yaml
sparkConf:
  "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
  "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
```
## Airflow integration
Use the Quanton Operator's Airflow provider to submit jobs from DAGs. This enables scheduling, dependency management, and retry logic on top of Quanton jobs.
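The source doesn't document the provider's API. If it follows the pattern of the community `cncf.kubernetes` provider, a DAG task might look like the following sketch — the operator import path, connection ID, and schedule here are assumptions, not confirmed Quanton APIs:

```python
import pendulum
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

with DAG(
    dag_id="quanton_spark_job",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submits the same manifest used with `kubectl apply` above.
    submit_job = SparkKubernetesOperator(
        task_id="submit_quanton_job",
        namespace="default",
        application_file="my-quanton-job.yaml",
        kubernetes_conn_id="kubernetes_default",
    )
```

Airflow's retry and scheduling settings then apply to the submission task, while the Quanton Operator still manages the Spark pods themselves.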