# Running Jobs

Quanton jobs are submitted as QuantonSparkApplication Kubernetes resources. The operator translates them into Spark jobs running on the Quanton compute engine.

## Job run states

| State | Description |
| --- | --- |
| Queued | Job has been submitted and is waiting for the Spark driver to start |
| Running | Job is actively executing on the cluster |
| Completed | Job finished without errors |
| Failed | Job failed; check driver logs for details |
| Canceled | Job was manually deleted while queued or running |
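
To read a run's current state from the CLI, you can query the resource's status. A minimal sketch, assuming the CRD reports state the way the Kubeflow Spark Operator does (adjust the JSONPath if Quanton's status schema differs):

```bash
# Assumed status layout: .status.applicationState.state, as in the
# Kubeflow Spark Operator; verify with `kubectl get ... -o yaml`.
kubectl get quantonsparkapplication my-spark-job -n default \
  -o jsonpath='{.status.applicationState.state}'
```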

## Submit a job

Apply a QuantonSparkApplication manifest:

```bash
kubectl apply -f my-quanton-job.yaml
```

Check status:

```bash
kubectl get quantonsparkapplications -n <namespace>
```
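
For a failed job, the driver logs are the first place to look. A sketch assuming the operator follows the common `<job-name>-driver` pod naming convention (confirm the actual pod name with `kubectl get pods`):

```bash
# Assumed pod name; list pods in the namespace if it differs.
kubectl logs my-spark-job-driver -n default
```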

### Minimal Java/Scala job

```yaml
apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-spark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Java
    mode: cluster
    mainClass: com.example.MyJob
    mainApplicationFile: "s3://my-bucket/jars/my-job.jar"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"
```

### Minimal Python job

```yaml
apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-pyspark-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    mainApplicationFile: "s3://my-bucket/scripts/my_job.py"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 3
      memory: "8192m"
```

The Quanton Operator automatically injects the correct runtime image — you don't specify it in the manifest.
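
For reference, a minimal sketch of what the my_job.py entry point above might contain (the logic is illustrative; only the SparkSession boilerplate is required):

```python
from pyspark.sql import SparkSession

# Cluster sizing and runtime come from the manifest, not the script.
spark = SparkSession.builder.appName("my-pyspark-job").getOrCreate()

df = spark.range(100)  # toy DataFrame for illustration
print(df.selectExpr("sum(id) AS total").first()["total"])

spark.stop()
```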

## Default resource configuration

| Component | Spark config key | Default |
| --- | --- | --- |
| Driver cores | spark.driver.cores | 4 |
| Driver memory | spark.driver.memory | 8g |
| Driver memory overhead | spark.driver.memoryOverheadFactor | 0.35 |
| Executor cores | spark.executor.cores | 4 |
| Executor memory | spark.executor.memory | 8g |
| Executor memory overhead | spark.executor.memoryOverheadFactor | 0.35 |
| Dynamic allocation | spark.dynamicAllocation.enabled | true |
| Min executors | spark.dynamicAllocation.minExecutors | 0 |
| Max executors | spark.dynamicAllocation.maxExecutors | 8 |
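
With these defaults, each executor pod requests roughly 8g × (1 + 0.35) ≈ 10.8g of memory (heap plus overhead), and dynamic allocation scales the job between 0 and 8 executors.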

## Pass Spark configuration

Add Spark conf keys under sparkConf:

```yaml
spec:
  sparkApplicationSpec:
    sparkConf:
      "spark.sql.shuffle.partitions": "200"
      "spark.executor.memoryOverhead": "1g"
      "spark.dynamicAllocation.maxExecutors": "16"
```

## Resubmit a job

A QuantonSparkApplication supports only one run at a time. To resubmit, either delete and re-apply:

```bash
kubectl delete quantonsparkapplication my-spark-job -n default
kubectl apply -f my-quanton-job.yaml
```

Or use a unique name each time:

```bash
# Edit metadata.name to be unique, then:
kubectl create -f my-quanton-job.yaml
```
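
If you resubmit often, you can stamp a unique suffix into the manifest on the fly, as in this sketch (it assumes the manifest's name line reads exactly `name: my-spark-job`):

```bash
# Append a timestamp to metadata.name, then create the resource.
sed "s/name: my-spark-job/name: my-spark-job-$(date +%s)/" my-quanton-job.yaml \
  | kubectl create -f -
```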

## Select a Python version

The Quanton Spark image includes Python 3.9, 3.11 (default), and 3.12. Override the interpreter with environment variables:

```yaml
driver:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
    PYSPARK_DRIVER_PYTHON: "/usr/bin/python3.12"
executor:
  envVars:
    PYSPARK_PYTHON: "/usr/bin/python3.12"
```
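
To confirm which interpreter the job actually picked up, you can print it from the script itself:

```python
import sys

# Logs the interpreter version the driver runs under, e.g. 3.12.x.
print(f"Driver Python: {sys.version}")
```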

## Working with lakehouse formats

### Apache Hudi

Hudi is pre-installed on the Quanton Spark image, so no extra dependencies are needed:

```python
# df is an existing Spark DataFrame.
df.write.format("hudi") \
    .option("hoodie.table.name", "my_table") \
    .mode("append") \
    .save("s3://my-bucket/hudi/my_table")
```

### Apache Iceberg

Add the Iceberg Spark runtime to your job, either bundled into your JAR or pulled in via spark.jars.packages for PySpark. Then enable Iceberg's SQL extensions and register a SparkCatalog:

```yaml
sparkConf:
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
  "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog"
  "spark.sql.catalog.my_catalog.type": "hadoop"
  "spark.sql.catalog.my_catalog.warehouse": "s3://my-bucket/iceberg"
```

### Delta Lake

Add the Delta Spark dependency and configure:

```yaml
sparkConf:
  "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
  "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

## Airflow integration

Use the Quanton Operator's Airflow provider to submit jobs from DAGs. This enables scheduling, dependency management, and retry logic on top of Quanton jobs.
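
The provider's own operator classes are the intended interface. As a stand-in until it is wired up, here is a hedged sketch that submits the manifest from a DAG with stock Airflow's BashOperator (this is not the provider's API):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Stand-in for the Quanton Airflow provider: apply the manifest via kubectl.
with DAG("quanton_job", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    submit = BashOperator(
        task_id="submit_quanton_job",
        bash_command="kubectl apply -f /opt/airflow/dags/my-quanton-job.yaml",
    )
```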