Agent AI

The Quanton Agent is an AI-powered debugging sidebar injected into every Apache Spark Web UI page. It ships as a JAR: copy it into /opt/spark/jars/ and register it through spark.plugins to activate it.

What it does

The agent adds an overlay panel on top of the Spark UI with five tabs:

Tab	Description
Chat	Streaming chat with an LLM about the running job. Full job context (stages, tasks, SQL plans, executor logs) is sent automatically. Supports Anthropic, Google Gemini, and OpenAI.
Recommendations	Auto-generated Spark config recommendations based on observed job behavior.
Diagnostics	Health alerts for spill, GC pressure, data skew, OOM risk, and straggler tasks.
Monitor	Real-time executor metrics: CPU, heap, shuffle IO, disk IO, active executors, failed tasks.
Settings	Configure LLM provider and API key (verified live, persisted in localStorage).
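
As a rough illustration of what a Diagnostics check could look like, the sketch below flags executors whose garbage-collection time is a large share of their task time. It assumes executor records shaped like the `ExecutorSummary` objects in Spark's REST API (`id`, `totalGCTime`, `totalDuration`, both in milliseconds). The function name and the 10% threshold are hypothetical and are not taken from the agent's code.

```python
from typing import Iterable

GC_FRACTION_THRESHOLD = 0.10  # assumed cutoff, not the agent's actual value


def gc_pressure_alerts(
    executors: Iterable[dict], threshold: float = GC_FRACTION_THRESHOLD
) -> list[str]:
    """Return IDs of executors whose GC time exceeds `threshold` of task time."""
    flagged = []
    for ex in executors:
        duration_ms = ex.get("totalDuration", 0)
        if duration_ms <= 0:
            continue  # no task time recorded yet, nothing to compare against
        if ex.get("totalGCTime", 0) / duration_ms > threshold:
            flagged.append(ex["id"])
    return flagged


# Made-up sample data for illustration only.
sample = [
    {"id": "1", "totalDuration": 100_000, "totalGCTime": 4_000},   # 4% GC
    {"id": "2", "totalDuration": 100_000, "totalGCTime": 25_000},  # 25% GC
]
print(gc_pressure_alerts(sample))  # ['2']
```

The other alerts in the table (spill, skew, stragglers) could follow the same pattern: a ratio computed from per-task or per-executor metrics, compared against a threshold.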

Architecture

Spark Driver JVM
│
└── SparkAgentPlugin (JAR)
      │
      ├── AgentDriverPlugin
      │     Registers filter + servlet on Spark's embedded Jetty server
      │
      ├── SidebarInjectionFilter
      │     Intercepts every HTML response from the Spark UI
      │     Injects sidebar.js + sidebar.css before </body>
      │
      └── AgentAssetsServlet
            Serves /agent-assets/sidebar.js and /agent-assets/sidebar.css

The backend (context builder, health monitor, LLM proxy) lives in spark-connect-server — a separate service. The JAR only handles UI injection and asset serving.
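
To make the injection step concrete, here is a minimal sketch of the string transformation that SidebarInjectionFilter is described as performing. The real filter is a servlet filter inside the JAR; this Python version only illustrates the idea. The asset paths come from the diagram above, while the exact tags, the `defer` attribute, and the function name are assumptions.

```python
import re

# Assumed markup; only the two /agent-assets/ paths come from the docs.
SIDEBAR_TAGS = (
    '<link rel="stylesheet" href="/agent-assets/sidebar.css">'
    '<script src="/agent-assets/sidebar.js" defer></script>'
)
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_sidebar(html: str) -> str:
    """Insert the sidebar tags just before the last </body> tag, if any."""
    if "/agent-assets/sidebar.js" in html:
        return html  # already injected, keep the operation idempotent
    last = None
    for last in _BODY_CLOSE.finditer(html):
        pass  # walk to the final match
    if last is None:
        return html  # not a full HTML page (JSON, fragment), leave untouched
    return html[: last.start()] + SIDEBAR_TAGS + html[last.start():]


page = "<html><body><h1>Jobs</h1></body></html>"
print(inject_sidebar(page))
```

Leaving responses without a closing body tag untouched matters because the Spark UI also serves JSON and static assets through the same server.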

Setup

Step 1: Build the JAR

Clone the spark-agent repo and build:

cd ui && npm install && npm run build   # compile React sidebar
mvn clean package -DskipTests          # package JAR
# Output: target/spark-agent-*.jar

Step 2: Add to your Spark image

Create a Dockerfile based on the Quanton Spark image:

FROM dist.onehouse.ai/onehouseai/quanton-spark:release-v1.29.0-al2023

# Copy the agent JAR
COPY target/spark-agent-*.jar /opt/spark/jars/

# Configure the plugin
RUN echo "spark.plugins=ai.quanton.spark.agent.SparkAgentPlugin" \
    >> /opt/spark/conf/spark-defaults.conf
RUN echo "spark.quanton.agent.server.url=https://your-connect-server.example.com" \
    >> /opt/spark/conf/spark-defaults.conf
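
One caveat with appending lines this way: `spark.plugins` is a comma-separated list, and when a properties file contains the same key twice, the later entry typically replaces the earlier one. If the base image already registers a plugin, the appended line would drop it. The sketch below shows one way to merge the agent into an existing entry instead. It is a hypothetical helper, not part of the agent, and `com.example.OtherPlugin` is a made-up class name.

```python
import re

AGENT_PLUGIN = "ai.quanton.spark.agent.SparkAgentPlugin"
# Matches "spark.plugins=value" or "spark.plugins value", but not longer keys.
_PLUGINS_LINE = re.compile(r"^\s*spark\.plugins(?:\s*=\s*|\s+)(.*)$")


def add_plugin(conf_text: str, plugin: str = AGENT_PLUGIN) -> str:
    """Return conf_text with `plugin` listed in a single spark.plugins entry."""
    plugins: list[str] = []
    kept: list[str] = []
    for line in conf_text.splitlines():
        match = _PLUGINS_LINE.match(line)
        if match is None:
            kept.append(line)  # unrelated setting or comment, keep as-is
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in plugins:
                plugins.append(name)
    if plugin not in plugins:
        plugins.append(plugin)
    # Emit one merged entry at the end of the file.
    kept.append("spark.plugins=" + ",".join(plugins))
    return "\n".join(kept) + "\n"


base = "spark.master k8s://example\nspark.plugins=com.example.OtherPlugin\n"
print(add_plugin(base), end="")
```

Running the helper twice yields the same file, so it is safe to call from a build script that may execute more than once.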

Build and push the image:

docker build -t your-registry/quanton-spark-agent:latest .
docker push your-registry/quanton-spark-agent:latest

Step 3: Use the image in your job

Reference your custom image in the QuantonSparkApplication. Override the default image by setting it explicitly:

apiVersion: onehouse.ai/v1beta2
kind: QuantonSparkApplication
metadata:
  name: my-agent-job
  namespace: default
spec:
  sparkApplicationSpec:
    type: Python
    mode: cluster
    image: "your-registry/quanton-spark-agent:latest"
    mainApplicationFile: "s3://my-bucket/jobs/my_job.py"
    sparkVersion: "3.5.0"
    driver:
      cores: 2
      memory: "4096m"
      serviceAccount: spark-operator-spark
    executor:
      cores: 4
      instances: 2
      memory: "8192m"

Step 4: Access the AI sidebar

While the job is running, port-forward the driver pod:

kubectl port-forward <driver-pod-name> 4040:4040 -n <namespace>

Open http://localhost:4040. The Quanton Agent sidebar appears in the bottom-right corner of every Spark UI page.
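
If the sidebar does not appear, a quick check is whether the driver is serving the agent's assets at all. The sketch below requests the two asset paths from the Architecture section and reports which ones return HTTP 200. The helper names are hypothetical, and the check assumes the port-forward above is still active.

```python
import urllib.error
import urllib.request
from typing import Callable

ASSET_PATHS = ("/agent-assets/sidebar.js", "/agent-assets/sidebar.css")


def http_status(url: str) -> int:
    """Fetch `url` and return the HTTP status code (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def check_assets(
    base_url: str = "http://localhost:4040",
    status_of: Callable[[str], int] = http_status,
) -> dict[str, bool]:
    """Map each asset path to True if it is served with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: status_of(base + path) == 200 for path in ASSET_PATHS}


# Usage, with the port-forward running:
#   print(check_assets())
```

A result with both values `False` usually points at the plugin not being loaded, for example because `spark.plugins` is missing the agent class, rather than at a browser problem.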

In Settings, enter your LLM API key (Anthropic, Google Gemini, or OpenAI). The key is stored in browser localStorage and never sent to Onehouse servers.

LLM providers

The agent is provider-agnostic. Configure the provider in the Settings tab or via environment variables:

Provider	Model examples
Anthropic	`claude-opus-4-6`, `claude-sonnet-4-6`
Google Gemini	`gemini-2.0-flash`, `gemini-2.5-pro`
OpenAI	`gpt-4o`, `gpt-4.1`

The agent auto-detects the provider from the API key prefix. Available models are fetched live from the provider API.
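
Prefix-based detection can be sketched as follows. The prefixes reflect commonly seen key formats for each provider at the time of writing and are an assumption here; the agent's actual rules are not documented on this page, and the function name is hypothetical.

```python
from typing import Optional

# Checked in order, so the more specific "sk-ant-" wins over the generic "sk-".
KEY_PREFIXES = (
    ("sk-ant-", "anthropic"),
    ("AIza", "gemini"),
    ("sk-", "openai"),
)


def detect_provider(api_key: str) -> Optional[str]:
    """Guess the LLM provider from an API key prefix, or None if unknown."""
    key = api_key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None


# Dummy keys for illustration only.
print(detect_provider("sk-ant-example"))  # anthropic
print(detect_provider("sk-example"))      # openai
```

Ordering matters because an Anthropic-style key also starts with `sk-`. Returning `None` for an unrecognized key lets the UI fall back to asking the user to pick a provider.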

Spark configuration reference

Config key	Description
`spark.plugins`	Must include `ai.quanton.spark.agent.SparkAgentPlugin`
`spark.quanton.agent.server.url`	URL of the `spark-connect-server` instance

spark-connect-server

The agent JAR forwards context queries to spark-connect-server, which:

Builds full job context (stages, tasks, SQL, logs) for chat
Runs health checks (spill, GC, skew, OOM, straggler detection)
Proxies LLM API calls with streaming (SSE)
Serves live executor metrics
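
For readers unfamiliar with SSE, the sketch below shows how a client might read streamed chat output. It follows the generic Server-Sent Events line format (`data:` fields, events separated by a blank line). The payload structure that spark-connect-server actually emits is not specified here, so this treats each payload as opaque text. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:  # a blank line terminates the current event
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event with no terminating blank line is dropped, as in the SSE format.


stream = ["data: Hello\n", "\n", ": ping\n", "data: wor\n", "data: ld\n", "\n"]
print(list(iter_sse_data(stream)))  # ['Hello', 'wor\nld']
```

Multiple `data:` lines within one event are joined with newlines, which is why the second event above arrives as a single two-line payload.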

See the spark-agent repository for spark-connect-server deployment instructions.
