Real Apache Spark. Inside Snowflake.

TL;DR: Offload your Snowflake ETL to real Apache Spark with Quanton for cutting-edge performance and up to 65% compute savings on virtual warehouse compute. Read Iceberg tables via Snowflake Catalog inside your containers, transform with Quanton, and write results back — without switching platforms or touching your governance model.

The Full Story

Our team spent two weeks at Snowflake Summit and Databricks Data+AI Summit, talking with more than 1,200 data engineers, architects, analytics leads, and founders. We took detailed notes. Two numbers stood out.

At Snowflake Summit — despite Snowflake’s sustained effort to brand themselves as the definitive home for a format they didn’t build and don’t own — only 15% of attendees were actively using Iceberg. Nearly the same proportion had never heard of it. 41.8% had no plans to use it at all. At Databricks Data+AI Summit, Delta Lake owned 58.9% of the room. Iceberg sat at 10.7%, Hudi at 8.9%.

State of Table Formats survey results from Snowflake Summit and Databricks Data+AI Summit — Survey of 1,200+ data professionals at Snowflake Summit and Databricks Data+AI Summit. Iceberg adoption remains low at both conferences despite years of industry-wide posturing around open formats.

Those numbers make more sense once you look past the keynotes. While the data community spent several years relitigating the table format wars — with warehouse vendors appointing themselves as judges of which Apache Parquet-based open option should win — none of them actually committed to open formats as their default. They committed to talking about them.

Look at what each platform defaults to in practice. Snowflake defaults to its internal closed file format. BigQuery defaults to Capacitor. Redshift defaults to its own disk-optimized format. Among the vendors who at least run on open formats: Databricks defaults to Delta Lake, after reportedly spending $1B to make Iceberg look more like Delta — their words, not ours. DuckDB defaults to DuckLake. ClickHouse, Pinot, and StarRocks all have their own native formats. Iceberg is everyone’s backup plan, the interoperability hatch you reach for when you need data to be readable outside your own platform. If open formats were genuinely the goal, the default would be Iceberg. It is not, at any of them.

This is open-format washing: the public posture of openness, without the commitment that would actually threaten the moat.

The Real Gap: Open Compute APIs and Spark

The format was never the hardest gap to close. The compute API was.

Diagram showing the gap between open table formats and open compute APIs in the data ecosystem — The open format debate kept the community focused on storage. The real missing layer was always compute.

Apache Spark is to data compute what Iceberg is now pushed to be for table formats: the open standard, with the ecosystem, the format support across Hudi, Iceberg, and Delta, and a decade of production battle-testing behind it. A platform that genuinely supported open data infrastructure would give you real Spark. Snowflake’s answer to this has been Snowpark — a proprietary lookalike.

Snowpark is not Spark

Snowpark was positioned as Spark on Snowflake — a familiar API surface for engineers who already knew PySpark. But familiarity with the API surface is not the same thing as running the engine. Snowpark runs on Snowflake’s own query engine with a Spark-compatible dialect layered on top, making it subject to that engine’s constraints rather than Spark’s.

Snowpark API surface compared to real Apache Spark capabilities — Snowpark supports PySpark syntax. What it doesn't support is the execution model that makes Spark indispensable at scale.

The PySpark syntax is there. DataFrames work. What is absent is the execution model that makes Spark worth running at scale. There are no RDDs. Spark Streaming and MLLib are not supported. UDF execution is constrained. The optimizer controls that experienced Spark engineers reach for — repartitioning strategies, broadcast hints, AQE configuration — are largely absent or behave differently. Snowpark’s execution diverges from real Spark in ways that compound in complex pipelines: skewed joins that Spark’s Adaptive Query Execution would resolve automatically, broadcast thresholds that don’t translate, shuffle behavior that is difficult to predict without deep knowledge of where the two engines part ways. For teams with meaningful Spark investment — production job code, custom libraries, format-specific integrations — Snowpark is not a migration path. It is a parallel product that resembles Spark from the outside and diverges from it where it matters most.

The choice nobody should have to make

Given Snowpark’s limits, the conventional wisdom has pointed Snowflake teams toward a difficult decision: accept the Snowpark ceiling or migrate to Databricks.

The difficult choice between staying on Snowflake with Snowpark limitations or migrating to Databricks — The conventional path to real Spark from Snowflake has required a full platform migration — with everything that entails.

A migration to Databricks means more than switching a compute platform. It means renegotiating contracts, retraining engineering teams, and rebuilding data pipelines that already work. It means taking on a separate per-compute-credit billing relationship with a new vendor on top of your existing Snowflake investment. And it means leaving behind the governance model your organization has built on Snowflake — the data access controls, network policies, audit logs, and SSO integrations that your security and compliance teams depend on. None of that travels to a new platform automatically.

The third option — running open-source Apache Spark directly on Kubernetes or EMR — trades one set of problems for another. You get real Spark, but you also inherit full responsibility for the operational layer: cluster management, version compatibility, driver and executor sizing, shuffle tuning, upgrade cycles. That operational surface is significant, and for teams without dedicated platform engineering capacity, it is often the primary reason Spark exploration stalls before it produces value.

The result is a gap specific to Snowflake users: Iceberg adoption is worth pursuing if you have an interoperability goal, and that interoperability is almost always driven by a need to run a real Spark engine alongside your warehouse. The question becomes whether you can run Spark fast enough and cheaply enough to justify the operational investment — and until now, Snowflake offered no credible answer to that question.

Today: Quanton on Snowpark Container Services

Today we are shipping Quanton on SPCS. This is real Apache Spark — the full engine, not a compatible subset — running directly inside your Snowflake account, accelerated by native vectorized execution, with an embedded AI co-pilot included.

Quanton on SPCS architecture diagram showing Spark running inside Snowflake — Quanton on SPCS: real Apache Spark running inside your Snowflake account on SPCS compute, billed as Snowflake credits.

On TPC-DS benchmarks, Quanton on SPCS delivers 2-5x improved price/performance compared to Databricks with Photon. Against Snowflake’s own warehouse, the same workloads burn 63% fewer Snowflake credits compared to a Gen2 Large Warehouse. Your job code does not change. There is no migration, no second cloud bill, and no new vendor relationship to manage.

Performance comparison between Quanton on SPCS and Databricks with Photon on TPC-DS benchmarks — TPC-DS benchmark results: Quanton on SPCS vs Databricks with Photon. 2-5x improved price/performance across the benchmark suite.

TPC-DS credit consumption comparison: Snowflake Gen2 Large Warehouse vs Quanton on SPCS — 1TB TPC-DS credit consumption: Snowflake Gen2 Large Warehouse at 0.93 credits vs Quanton on SPCS at 0.34 credits — 63% fewer credits burned.

Looking beyond Snowflake’s own warehouse, the price/performance picture across the broader market tells the full story. On TPC-DS, Onehouse lands at $1.00 normalized cost per workload. Databricks All-Purpose sits at $5.27. Snowflake Gen2 at $4.87. EMR Serverless at $3.55. Databricks SQL Serverless at $2.74.

TPC-DS price/performance comparison across Databricks, Snowflake, EMR, and Onehouse — TPC-DS Price/Performance across platforms. Databricks All-Purpose at $5.27, Snowflake Gen2 at $4.87, EMR Serverless at $3.55, Databricks SQL Serverless at $2.74 — and Onehouse at $1.00.

If the industry’s low Iceberg adoption is partly explained by the absence of a fast and affordable Spark engine to run it on, these numbers change that equation directly.

What Quanton actually is

Quanton is a drop-in execution layer for Apache Spark — not a partial accelerator for specific operations, but an engine that accelerates the entire job end-to-end: reading, processing, and writing.

Quanton's Velox-based execution architecture showing end-to-end acceleration — Quanton replaces Spark's JVM runtime with Velox, accelerating the full pipeline — not just isolated operations.

It replaces Spark’s JVM-based runtime with Velox, a vectorized C++ execution engine originally developed at Meta and subsequently open-sourced. The substitution is transparent to your code: the same DataFrame API, the same SQL surface, the same spark-submit invocation, the same support for Hudi, Iceberg, and Delta. What changes is the execution path a query takes once it enters the runtime.

The JVM carries inherent overhead for data-intensive workloads. Row-level processing, object allocation, garbage collection pressure, and Java serialization all impose costs that compound at scale. Velox is designed around columnar execution from the ground up. It uses SIMD vector instructions to process batches of column data simultaneously, manages memory off-heap to avoid GC interference, and eliminates the serialization round-trips that inflate I/O cost in standard Spark. For workloads involving large-scale aggregations, joins, and scans — the core of any analytics pipeline — the difference is material and consistent rather than workload-specific.

Why SPCS is the right deployment model

Snowpark Container Services is Snowflake’s bring-your-own-image runtime: you supply a container image, Snowflake runs it on compute pools inside your account, and the result is billed as Snowflake credits. Quanton slots into this model directly. Your Spark jobs run on SPCS compute, your entitlement is stored as a Snowflake secret encrypted at rest, and job submission happens from a Snowsight SQL worksheet. The experience for operators stays almost entirely inside the Snowflake UI.

The billing model is one part of the value here. Because SPCS compute is Snowflake-billed, running Quanton on SPCS does not require a separate cloud compute contract or a new vendor relationship. Your Spark workloads land in the same credits bucket as the rest of your Snowflake usage.

The governance model is the other part, and arguably more important for many teams. Your data does not leave your Snowflake account. Your existing network policies, private link configurations, audit logging, and access control policies apply to SPCS workloads exactly as they do to warehouse workloads. For teams where Snowflake was chosen in part because of its security and compliance posture, SPCS means Spark workloads inherit that posture rather than requiring a new security review and a new perimeter to manage.

An AI agent for every Spark job

Every Quanton deployment ships with an embedded AI co-pilot that functions as a dedicated Spark SRE watching every job in real time. It continuously ingests logs, stage DAGs, executor metrics, task timelines, shuffle statistics, and JVM/GC pressure into a single live view of what the job is doing.

Quanton AI agent monitoring a Spark job in real time inside the Spark UI — The Quanton AI agent lives inside the Spark UI, watching every job and surfacing specific diagnoses and remedies in real time.

When a job degrades or fails, the co-pilot identifies the specific mechanism rather than surfacing generic error messages. It distinguishes between an OOM caused by insufficient driver memory, an OOM caused by skewed join partitions, and an OOM caused by a broadcast hint applied to a table that has grown beyond the broadcast threshold — and it recommends the appropriate remedy for each. The same specificity applies to skew, fetch failures, broadcast timeouts, and the full range of operational issues that consume engineering time in production Spark environments.

The co-pilot lives inside the Spark UI and is backed by a knowledge server trained on a decade of Spark production experience, including lakehouse-specific patterns for Hudi and Iceberg workloads. You connect it with your own Claude or OpenAI API key, stored in your browser. Prompts and job data do not touch our servers.

How deployment works

The deployment model is designed around the constraint that most Snowflake operators work primarily in Snowsight, not the command line. Everything except a single one-time step runs from SQL worksheets.

That one step is pushing the Quanton container image to your Snowflake image repository. SPCS only runs images from your account’s own registry, and the push requires Docker. Once the image is in your registry, it never needs to be pushed again unless you want to update the version. After that, all job submission, monitoring, and lifecycle management happens in Snowsight.

The entitlement that enables Velox execution comes from your onehouse-values.yaml, downloaded from the Onehouse console when you create a project. You store the entitlement blob once as a Snowflake secret — a single SQL statement — and Quanton reads it from there at startup, deriving its configuration automatically. You never paste credentials into a job spec.

Getting from a new project to a running Iceberg write looks like this:

Create a project in the Onehouse console, select Quanton on Snowflake (SPCS), and download onehouse-values.yaml
Store the entitlement as a Snowflake secret with one SQL statement in Snowsight
Create an image repository, egress network rule, external access integration, and compute pool — all from a SQL worksheet
Push the Quanton image to your SPCS repository (one time, from the command line)
Submit a gate-check job from Snowsight to confirm Velox is live and the entitlement is accepted
Submit your first Iceberg or Hudi write to S3

The complete walkthrough with all SQL and job scripts is at quanton.dev/docs/guides/snowflake. If you have a Snowflake account with ACCOUNTADMIN access, you can be running your first job in under an hour.

Wrapping up: Stay on Snowflake. Beat Databricks.

The industry’s slow adoption of Iceberg is not an indication that data engineers don’t want open formats. It’s an indication that the tooling hasn’t made the switch worth it. Snowpark isn’t a real Spark alternative. Migrating to Databricks to get real Spark is an expensive and disruptive bet. Running Spark yourself carries an operational burden most teams can’t absorb.

Quanton on SPCS is the path that doesn’t require any of those tradeoffs. Real Apache Spark, running inside the account you already have, at a price/performance that beats every alternative in the market — without moving your data, your governance model, or your team.

Get started free — your first 100 GB is on us. The full deployment guide is at quanton.dev/docs/guides/snowflake, or join the community if you want to talk through your workload before deploying.