    Performance
    GizmoData Team
    October 22, 2025

    GizmoEdge Takes on the 1 Trillion Row Challenge

    What happens when you give a distributed SQL engine a trillion-row dataset? GizmoEdge crushed the Coiled challenge with a 1,000-worker cluster on Azure.


    What happens when you give a distributed SQL engine a trillion-row dataset? You find out what it's really made of.

    Last week, we put GizmoEdge—our distributed, IoT-ready data engine—to the test by running the Coiled 1 Trillion Row Challenge on Azure. The goal: process and summarize one trillion records from the measurements dataset as fast as possible.

    Infrastructure Setup

    We deployed a 1,000-worker GizmoEdge cluster, each worker powered by DuckDB and orchestrated through Kubernetes. Our cluster ran on Azure Standard E64pds v6 nodes, each providing 64 vCPUs and 504 GiB of RAM.

    Each GizmoEdge worker pod was provisioned with 3.8 vCPUs (3800m) and 30 GiB of RAM. That fits 16 workers per node (60.8 vCPUs and 480 GiB in use), so 1,000 workers required 63 nodes (1,000 / 16 = 62.5, rounded up).
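The node count follows from simple bin-packing arithmetic. The sketch below reproduces it from the figures quoted above; it assumes each node's full 64 vCPUs and 504 GiB are available to pods, which ignores whatever Kubernetes reserves for system daemons. The function and constant names are illustrative only.

```python
import math

# Figures quoted in the post.
NODE_VCPUS = 64
NODE_RAM_GIB = 504
WORKER_VCPUS = 3.8      # written as 3800m in Kubernetes CPU units
WORKER_RAM_GIB = 30
TOTAL_WORKERS = 1000


def workers_per_node(node_vcpus, node_ram_gib, worker_vcpus, worker_ram_gib):
    """Return how many worker pods fit on one node, limited by CPU or RAM."""
    by_cpu = math.floor(node_vcpus / worker_vcpus)
    by_ram = math.floor(node_ram_gib / worker_ram_gib)
    return min(by_cpu, by_ram)


per_node = workers_per_node(NODE_VCPUS, NODE_RAM_GIB, WORKER_VCPUS, WORKER_RAM_GIB)
nodes_needed = math.ceil(TOTAL_WORKERS / per_node)
print(per_node, nodes_needed)  # prints "16 63"
```

With these pod sizes, CPU and RAM both cap out at 16 pods per node, so neither resource is left badly stranded.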

    Performance Results

    Baseline Query

    SELECT COUNT(*) FROM measurements;
    • Execution time: < 0.5 seconds
    • Rows counted: 1,000,000,000,000

    Aggregation Challenge Query

    SELECT station, MIN(measure), MAX(measure), AVG(measure)
    FROM measurements
    GROUP BY station
    ORDER BY station;
    • Execution time: < 5 seconds
    • Result set: 412 rows

    On average, each of the 412 result rows aggregates roughly 2.4 billion input rows (1 trillion / 412), and GizmoEdge completed the query across all 1,000 workers in under 5 seconds.

    Video: GizmoEdge completing the challenge.

    How GizmoEdge Works

    GizmoEdge's architecture is designed for massive scale, high performance, and secure execution.

    SQL Parsing & Planning

    The GizmoEdge Server receives a SQL query from the client, parses it, and generates two statements:

    • A worker SQL statement, which each worker executes against its local shard
    • A combinatorial SQL statement, which the server runs to combine the workers' partial results into the final result
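To make the two-statement split concrete, here is a minimal sketch of how the challenge query could be decomposed. The post does not show the SQL GizmoEdge actually generates, so `WORKER_SQL`, `COMBINE_SQL`, the `partials` table, and the column names are all assumptions. Python's built-in `sqlite3` stands in for DuckDB so the example runs with no extra packages. The key point is that `AVG` cannot be combined directly: each worker has to return a sum and a count instead.

```python
import sqlite3

# Hypothetical worker statement: partial aggregates per station for one shard.
WORKER_SQL = """
    SELECT station,
           MIN(measure)   AS min_measure,
           MAX(measure)   AS max_measure,
           SUM(measure)   AS sum_measure,
           COUNT(measure) AS count_measure
    FROM measurements
    GROUP BY station
"""

# Hypothetical server statement: combine the partials from every worker.
COMBINE_SQL = """
    SELECT station,
           MIN(min_measure),
           MAX(max_measure),
           SUM(sum_measure) / SUM(count_measure)
    FROM partials
    GROUP BY station
    ORDER BY station
"""


def run_worker(rows):
    """Run the worker SQL over one shard and return its partial aggregates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE measurements (station TEXT, measure REAL)")
    con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    partials = con.execute(WORKER_SQL).fetchall()
    con.close()
    return partials


def combine(all_partials):
    """Run the combining SQL over the partial aggregates of all workers."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE partials (station TEXT, min_measure REAL, "
        "max_measure REAL, sum_measure REAL, count_measure INTEGER)"
    )
    con.executemany("INSERT INTO partials VALUES (?, ?, ?, ?, ?)", all_partials)
    result = con.execute(COMBINE_SQL).fetchall()
    con.close()
    return result


# Two tiny stand-in shards with made-up readings.
shard_a = [("Oslo", 1.0), ("Oslo", 3.0), ("Lima", 20.0)]
shard_b = [("Oslo", 8.0), ("Lima", 22.0)]
final = combine(run_worker(shard_a) + run_worker(shard_b))
print(final)  # [('Lima', 20.0, 22.0, 21.0), ('Oslo', 1.0, 8.0, 4.0)]
```

Averaging the per-shard averages for Oslo would give (2.0 + 8.0) / 2 = 5.0, whereas the true mean of 1, 3, and 8 is 4.0. Carrying the sum and count through avoids that error.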

    Shard Distribution

    Each worker requests a data shard from the server. The server responds with:

    • A SHA-256 hash of the shard file (to verify download integrity)
    • A token-based authentication handshake that ensures only authorized workers can participate

    Workers download, decompress, and materialize their shards into DuckDB databases built from Parquet files.
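The download integrity check can be sketched with Python's standard `hashlib`. This is an illustration of the idea, not GizmoEdge's worker code: `sha256_of_file` and `shard_is_intact` are hypothetical helper names, and a temporary file with placeholder bytes stands in for a real shard.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large shard never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shard_is_intact(path, expected_hex):
    """Compare the local hash with the one the server supplied."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demo: write placeholder bytes to a temporary file and verify them.
payload = b"stand-in bytes for a compressed shard"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    shard_path = tmp.name

expected = hashlib.sha256(payload).hexdigest()
ok = shard_is_intact(shard_path, expected)
tampered = shard_is_intact(shard_path, "0" * 64)
os.remove(shard_path)
print(ok, tampered)  # prints "True False"
```

Reading in fixed-size chunks matters at this scale, since a shard may be far larger than a test file.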

    Secure "Trust But Verify" Model

    All worker-server communication runs over TLS-encrypted WebSockets, ensuring confidentiality and authenticity. Trust is established in four steps:

    1. The worker authenticates with a signed token, which the server validates
    2. The worker verifies the shard's SHA-256 hash upon download to ensure it matches what the server issued
    3. The worker computes its own SHA-256 hash of the shard and returns it to the server
    4. The server compares the hashes; only if they match does it "trust" that worker for subsequent query processing
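The handshake can be simulated in a few lines. Everything here is an assumption made for illustration: the post does not describe the token format, so the sketch signs a worker ID with HMAC-SHA256, and it uses SHA-256 for both the issued and the returned hash so that the two can be compared directly. `Server`, `sign_token`, and `worker_handshake` are hypothetical names, and the TLS/WebSocket transport is left out entirely.

```python
import hashlib
import hmac

SERVER_SECRET = b"demo-secret"  # hypothetical; never hard-code a real secret


def sign_token(worker_id):
    """Issue a token of the form '<worker_id>.<hmac signature>'."""
    sig = hmac.new(SERVER_SECRET, worker_id.encode(), hashlib.sha256).hexdigest()
    return f"{worker_id}.{sig}"


def token_is_valid(token):
    worker_id, _, sig = token.rpartition(".")
    expected = hmac.new(SERVER_SECRET, worker_id.encode(), hashlib.sha256).hexdigest()
    return bool(worker_id) and hmac.compare_digest(sig, expected)


class Server:
    def __init__(self, shard_bytes):
        self.shard_bytes = shard_bytes
        self.issued_hash = hashlib.sha256(shard_bytes).hexdigest()
        self.trusted = set()

    def issue_shard(self, token):
        # Step 1: only workers with a valid signed token receive a shard.
        if not token_is_valid(token):
            raise PermissionError("invalid worker token")
        return self.shard_bytes, self.issued_hash

    def confirm(self, token, reported_hash):
        # Step 4: trust the worker only if its reported hash matches.
        if token_is_valid(token) and hmac.compare_digest(reported_hash, self.issued_hash):
            self.trusted.add(token.rpartition(".")[0])
            return True
        return False


def worker_handshake(server, token, corrupt=False):
    shard, issued_hash = server.issue_shard(token)
    if corrupt:
        shard += b"!"  # simulate a damaged download
    local_hash = hashlib.sha256(shard).hexdigest()
    if local_hash != issued_hash:  # step 2: worker-side verification
        return False
    return server.confirm(token, local_hash)  # step 3: report the hash back


server = Server(b"stand-in shard bytes")
good = worker_handshake(server, sign_token("worker-1"))
bad = worker_handshake(server, sign_token("worker-2"), corrupt=True)
lying = server.confirm(sign_token("worker-3"), "0" * 64)
print(good, bad, lying, sorted(server.trusted))
```

Only `worker-1` ends up in the trusted set: `worker-2` rejects its own corrupted download, and `worker-3` reports a hash the server never issued.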

    Parallel Execution & Aggregation

    Once trusted, each worker executes its local query through DuckDB and streams intermediate Arrow IPC datasets back to the server over secure WebSockets. The server merges and aggregates all results in parallel to produce the final SQL result—often in seconds.

    Heterogeneous Compute: From Cloud to Edge

    GizmoEdge isn't limited to Azure VMs. It's designed for heterogeneous computing—running workers across IoT devices, laptops, mobile phones, or cloud clusters simultaneously.

    See GizmoEdge distributing queries across AWS, Azure, GCP, and edge devices like iPhones and Kubernetes pods: https://www.youtube.com/watch?v=gIgFKniKAdk

    Challenge Details

    Want to learn more about the 1 Trillion Row Challenge? You can find full details, including how to access the publicly available dataset, at the official challenge repository: https://github.com/coiled/1trc

    The challenge provides a comprehensive benchmark for testing distributed data processing systems at scale, making it an excellent way to evaluate real-world performance capabilities.

    GizmoSQL Also Took the Challenge

    GizmoEdge isn't the only GizmoData product that tackled the 1 Trillion Row Challenge. GizmoSQL, our single-node DuckDB Arrow Flight SQL server, also completed the challenge with impressive results.

    Using a single AWS Graviton 4 instance, GizmoSQL processed the trillion-row dataset in just over 2 minutes. Read about GizmoSQL's approach and results to see how single-node performance compares to distributed execution.

    What's Next

    GizmoEdge is still pre-production, and we're inviting design partners who want to push the boundaries of distributed analytics.

    If your organization works with multi-terabyte or even petabyte-scale data—and wants to see how GizmoEdge can execute your queries in seconds—reach out.

    GizmoEdge — The distributed SQL engine for the modern data frontier.
