What happens when you give a distributed SQL engine a trillion-row dataset? You find out what it's really made of.
Last week, we put GizmoEdge—our distributed, IoT-ready data engine—to the test by running the Coiled 1 Trillion Row Challenge on Azure. The goal: process and summarize one trillion records from the measurements dataset as fast as possible.
Infrastructure Setup
We deployed a 1,000-worker GizmoEdge cluster, each worker powered by DuckDB and orchestrated through Kubernetes. Our cluster ran on Azure Standard E64pds v6 nodes, each providing 64 vCPUs and 504 GiB of RAM.
Each GizmoEdge worker pod was provisioned with 3.8 vCPUs (3800m) and 30 GiB RAM, allowing roughly 16 workers per node, so the 1,000-worker test required about 63 nodes in total.
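The node count follows from simple bin-packing arithmetic. Here is a minimal sketch of that calculation, assuming the full 64 vCPUs and 504 GiB of each node are schedulable (in practice Kubernetes reserves some capacity for system components, which this sketch ignores). The function and constant names are ours, for illustration only.

```python
import math

# Figures taken from the setup described above.
NODE_VCPUS = 64         # Azure Standard E64pds v6
NODE_RAM_GIB = 504
WORKER_VCPUS = 3.8      # 3800m per worker pod
WORKER_RAM_GIB = 30
TOTAL_WORKERS = 1000


def workers_per_node(node_vcpus, node_ram_gib, worker_vcpus, worker_ram_gib):
    """Return how many worker pods fit on one node.

    The tighter of the two limits (CPU or memory) decides.
    """
    by_cpu = math.floor(node_vcpus / worker_vcpus)
    by_ram = math.floor(node_ram_gib / worker_ram_gib)
    return min(by_cpu, by_ram)


per_node = workers_per_node(NODE_VCPUS, NODE_RAM_GIB, WORKER_VCPUS, WORKER_RAM_GIB)
nodes_needed = math.ceil(TOTAL_WORKERS / per_node)

print(per_node)      # 16
print(nodes_needed)  # 63
```

Both limits land on 16 workers per node (64 / 3.8 ≈ 16.8 and 504 / 30 = 16.8), and 1,000 / 16 = 62.5 rounds up to 63 nodes.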
Performance Results
Baseline Query
SELECT COUNT(*) FROM measurements;
- Execution time: < 0.5 seconds
- Rows counted: 1,000,000,000,000
Aggregation Challenge Query
SELECT station, MIN(measure), MAX(measure), AVG(measure)
FROM measurements
GROUP BY station
ORDER BY station;
- Execution time: < 5 seconds
- Result set: 412 rows
Each grouped row represented an aggregation of roughly 2.4 billion rows (1 trillion rows spread over 412 stations), and GizmoEdge completed the full query across all 1,000 workers in under 5 seconds.
How GizmoEdge Works
GizmoEdge's architecture is designed for massive scale, high performance, and secure execution.
SQL Parsing & Planning
The GizmoEdge Server receives a SQL query from the client, parses it, and generates two statements:
- A worker SQL to execute on each distributed node
- A combinatorial SQL to run server-side for final aggregation
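To make the two-statement split concrete, here is a small sketch of how the challenge query could be decomposed. The post does not show the SQL that GizmoEdge actually generates, so `WORKER_SQL`, `COMBINE_SQL`, the `partials` table, and the helper functions are all hypothetical. The sketch also uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies.

```python
import sqlite3

# Hypothetical worker-side statement: each worker reduces its shard to
# per-station partial aggregates. AVG is carried as SUM and COUNT.
WORKER_SQL = """
SELECT station,
       MIN(measure)   AS min_m,
       MAX(measure)   AS max_m,
       SUM(measure)   AS sum_m,
       COUNT(measure) AS cnt_m
FROM measurements
GROUP BY station
"""

# Hypothetical server-side statement: merge the partials from all workers.
COMBINE_SQL = """
SELECT station,
       MIN(min_m),
       MAX(max_m),
       SUM(sum_m) / SUM(cnt_m)
FROM partials
GROUP BY station
ORDER BY station
"""


def run_worker(rows):
    """Run the worker statement against one in-memory shard."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE measurements (station TEXT, measure REAL)")
    con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    partials = con.execute(WORKER_SQL).fetchall()
    con.close()
    return partials


def combine(partial_sets):
    """Load every worker's partials and run the combining statement."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE partials "
        "(station TEXT, min_m REAL, max_m REAL, sum_m REAL, cnt_m INTEGER)"
    )
    for partials in partial_sets:
        con.executemany("INSERT INTO partials VALUES (?, ?, ?, ?, ?)", partials)
    final = con.execute(COMBINE_SQL).fetchall()
    con.close()
    return final


# Two toy shards standing in for two workers.
shard_a = [("Oslo", 1.0), ("Oslo", 3.0), ("Lima", 20.0)]
shard_b = [("Oslo", 8.0), ("Lima", 22.0), ("Lima", 24.0)]

result = combine([run_worker(shard_a), run_worker(shard_b)])
print(result)  # [('Lima', 20.0, 24.0, 22.0), ('Oslo', 1.0, 8.0, 4.0)]
```

MIN and MAX can be merged by simply reapplying them, but AVG cannot: averaging the per-worker averages for Oslo would give (2.0 + 8.0) / 2 = 5.0 rather than the correct 4.0. That is why this sketch ships SUM and COUNT from each worker and divides only at the end.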
Shard Distribution
Each worker requests a data shard from the server. The server responds with:
- A SHA-256 hash of the shard file (to verify download integrity)
- A token-based authentication handshake that ensures only authorized workers can participate
Workers download, decompress, and materialize their shards into DuckDB databases built from Parquet files.
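The download-integrity check can be sketched as follows. This is an illustration of the general technique (streaming a file through SHA-256 and comparing against the digest the server issued), not GizmoEdge's actual worker code; `sha256_of_file` and `shard_is_intact` are hypothetical helper names, and the temporary file stands in for a downloaded shard.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shard_is_intact(path, expected_hex):
    """Compare the local digest with the one issued by the server."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demo: write a stand-in "shard" to disk, then verify it.
payload = b"example shard bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    shard_path = tmp.name

issued_hash = hashlib.sha256(payload).hexdigest()  # what the server would send
ok = shard_is_intact(shard_path, issued_hash)
tampered = shard_is_intact(shard_path, "0" * 64)
os.remove(shard_path)

print(ok)        # True
print(tampered)  # False
```

A worker would run a check like this before decompressing the shard, and discard the file and request it again if the digests differ.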
Secure "Trust But Verify" Model
All worker-server communication runs over TLS-encrypted WebSockets, ensuring confidentiality and authenticity. Each worker:
- Authenticates with a signed token validated by the server
- Verifies the shard's SHA-256 hash upon download to ensure it matches what the server issued
- Computes its own MD5 hash of the shard and returns it to the server
The server then compares the worker's hash with its own; only if they match does it "trust" that worker for subsequent query processing.
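A rough sketch of the server side of this model is below. The post does not say how tokens are signed or how the server stores its reference hash, so everything here is an assumption made for illustration: the HMAC-based token format, `SERVER_SECRET`, the `TrustRegistry` class, and its method names are all hypothetical.

```python
import hashlib
import hmac

SERVER_SECRET = b"demo-secret"  # hypothetical signing key, for this demo only


def sign_token(worker_id):
    """Issue a token of the (assumed) form '<worker_id>.<hex signature>'."""
    sig = hmac.new(SERVER_SECRET, worker_id.encode(), hashlib.sha256).hexdigest()
    return f"{worker_id}.{sig}"


def token_is_valid(token):
    """Recompute the signature and compare it in constant time."""
    worker_id, _, sig = token.rpartition(".")
    expected = hmac.new(SERVER_SECRET, worker_id.encode(), hashlib.sha256).hexdigest()
    return bool(worker_id) and hmac.compare_digest(sig, expected)


class TrustRegistry:
    """Tracks which workers have proven they hold the expected shard."""

    def __init__(self, shard_bytes):
        # Assumption: the server keeps its own MD5 of the shard to compare against.
        self.expected_md5 = hashlib.md5(shard_bytes).hexdigest()
        self.trusted = set()

    def report_hash(self, token, md5_hex):
        """Mark a worker as trusted only if its token and its hash both check out."""
        if not token_is_valid(token):
            return False
        if not hmac.compare_digest(md5_hex, self.expected_md5):
            return False
        self.trusted.add(token.rpartition(".")[0])
        return True


shard = b"example shard bytes"
registry = TrustRegistry(shard)

good = registry.report_hash(sign_token("worker-1"), hashlib.md5(shard).hexdigest())
bad_hash = registry.report_hash(sign_token("worker-2"), hashlib.md5(b"other").hexdigest())
forged = registry.report_hash("worker-3.deadbeef", hashlib.md5(shard).hexdigest())

print(good, bad_hash, forged)  # True False False
print(registry.trusted)        # {'worker-1'}
```

Only `worker-1` ends up trusted: `worker-2` holds a valid token but reports the wrong hash, and `worker-3` reports the right hash with a forged token.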
Parallel Execution & Aggregation
Once trusted, each worker executes its local query through DuckDB and streams intermediate Arrow IPC datasets back to the server over secure WebSockets. The server merges and aggregates all results in parallel to produce the final SQL result—often in seconds.
Heterogeneous Compute: From Cloud to Edge
GizmoEdge isn't limited to Azure VMs. It's designed for heterogeneous computing—running workers across IoT devices, laptops, mobile phones, or cloud clusters simultaneously.
See GizmoEdge distributing queries across AWS, Azure, GCP, Kubernetes pods, and edge devices like iPhones: https://www.youtube.com/watch?v=gIgFKniKAdk
Challenge Details
Want to learn more about the 1 Trillion Row Challenge? You can find full details, including how to access the publicly available dataset, at the official challenge repository: https://github.com/coiled/1trc
The challenge provides a comprehensive benchmark for testing distributed data processing systems at scale, making it an excellent way to evaluate real-world performance capabilities.
GizmoSQL Also Took the Challenge
GizmoEdge isn't the only GizmoData product that tackled the 1 Trillion Row Challenge. GizmoSQL, our single-node DuckDB Arrow Flight SQL server, also completed the challenge with impressive results.
Using a single AWS Graviton 4 instance, GizmoSQL processed the trillion-row dataset in just over 2 minutes. Read about GizmoSQL's approach and results to see how single-node performance compares to distributed execution.
What's Next
GizmoEdge is still pre-production, and we're inviting design partners who want to push the boundaries of distributed analytics.
If your organization works with multi-terabyte or even petabyte-scale data—and wants to see how GizmoEdge can execute your queries in seconds—reach out.
GizmoEdge — The distributed SQL engine for the modern data frontier.