ADBC Flight SQL 1.10.0: Bulk Ingestion for GizmoSQL

We're excited to announce the release of version 1.10.0 of the ADBC Flight SQL Python driver. This release brings bulk ingestion to Arrow Flight SQL servers such as GizmoSQL.

With this release, a client can stream Arrow data directly into GizmoSQL in three ways: create a new table, append to an existing table, or replace an existing table. Column types are carried over automatically from the source Arrow data.

What is Bulk Ingestion?

Bulk ingestion lets you push large volumes of data from your client application directly into an Arrow Flight SQL server. The Flight SQL driver now supports the DB-API cursor's adbc_ingest method, which lets you:

Create new tables from Arrow data with automatic schema inference
Append to existing tables
Replace existing tables with new data

The column types of the destination table automatically match those of your source Arrow RecordBatch, Table, or RecordBatchReader, so you don't need to define a schema by hand.

Combining with the ADBC Scanner DuckDB Extension

Bulk ingestion is even more useful alongside the ADBC Scanner DuckDB extension. Together, they let you push data from DuckDB (or from other GizmoSQL servers) directly into GizmoSQL, so you can move data between the engines in your analytics infrastructure.

Installation

The easiest way to install the ADBC Flight SQL driver is with Columnar's dbc tool:

dbc install flightsql

The driver is also published on PyPI, so you can install it with pip:

pip install adbc-driver-flightsql
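If you installed with pip, a quick check can confirm that you have a driver new enough for bulk ingestion (1.10.0 or later, per this release). Below is a minimal sketch. It reads the installed version with the standard-library importlib.metadata, so it applies only to pip installs, not to drivers installed with dbc. The helper names are hypothetical. Versions are compared as integer tuples, because a plain string comparison would rank "1.10.0" below "1.9.0".

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Bulk ingestion in the Flight SQL driver arrived in 1.10.0 (per this post)
MIN_VERSION = (1, 10, 0)


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn '1.10.0' (or '1.10.0rc1') into (1, 10, 0) for numeric comparison."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep the leading digits only
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as "1.10"
        parts.append(0)
    return tuple(parts)


def supports_bulk_ingest(text: str) -> bool:
    # Compare integer tuples; as strings, "1.10.0" < "1.9.0" would be True
    return parse_version(text) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("adbc-driver-flightsql")
    except PackageNotFoundError:
        print("adbc-driver-flightsql is not installed via pip")
    else:
        status = "OK" if supports_bulk_ingest(installed) else "too old for bulk ingestion"
        print(f"adbc-driver-flightsql {installed}: {status}")
```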

Python Dependencies

To run the example below, you'll need these dependencies:

adbc-driver-flightsql>=1.10.0
duckdb==1.4.3
pyarrow==22.0.*
codetiming==1.4.0
python-dotenv==1.2.*

Try It Yourself

We've set up a public GizmoSQL server so you can try bulk ingestion right now! Use these credentials:

GIZMOSQL_USERNAME=adbc-scanner
GIZMOSQL_PASSWORD="QueryDotFarmRules!123"

Add these to a .env file in your project directory. The example code below loads them automatically with python-dotenv's load_dotenv().
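The example below reads the credentials with os.environ[...]. If a variable is missing, that lookup fails with a bare KeyError. Below is a minimal sketch of a helper that fails with a clearer message and builds the db_kwargs dictionary in one place. The function name gizmosql_db_kwargs is hypothetical. The sketch assumes the same GIZMOSQL_USERNAME and GIZMOSQL_PASSWORD variables and falls back to plain environment variables if python-dotenv is not installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to plain environment variables
    load_dotenv = None

REQUIRED_VARS = ("GIZMOSQL_USERNAME", "GIZMOSQL_PASSWORD")


def gizmosql_db_kwargs() -> dict[str, str]:
    """Build the db_kwargs dict for dbapi.connect from .env or the environment."""
    if load_dotenv is not None:
        # By default, variables already set in the environment are not overridden
        load_dotenv()
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "username": os.environ["GIZMOSQL_USERNAME"],
        "password": os.environ["GIZMOSQL_PASSWORD"],
    }
```

You would then pass db_kwargs=gizmosql_db_kwargs() to dbapi.connect instead of building the dictionary inline.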

Example: Loading TPC-H Data into GizmoSQL

Here's a complete example showing how to generate TPC-H benchmark data in DuckDB and bulk ingest it into GizmoSQL:

```python
import os
import time

import duckdb
from adbc_driver_manager import dbapi
from codetiming import Timer
from dotenv import load_dotenv

# Timer logging setup
TIMER_TEXT = "{name}: Elapsed time: {:.4f} seconds"


def main():
    # Load GIZMOSQL_USERNAME and GIZMOSQL_PASSWORD from a local .env file
    load_dotenv()

    with Timer(name="Overall program", text=TIMER_TEXT, initial_text=True):
        # Generate TPC-H data at scale factor 1 (~1 GB) in an in-memory DuckDB database
        with Timer(name="  Generate TPCH data (1GB)", text=TIMER_TEXT, initial_text=True):
            duckdb_conn = duckdb.connect()
            duckdb_conn.install_extension("tpch")
            duckdb_conn.load_extension("tpch")
            duckdb_conn.execute(query="CALL dbgen(sf=1.0)")

        # Get an Arrow RecordBatchReader for the lineitem table (10,000 rows per batch)
        with Timer(name="  Get RecordBatch reader", text=TIMER_TEXT, initial_text=True):
            lineitem_arrow_reader = duckdb_conn.table("lineitem").fetch_arrow_reader(batch_size=10_000)

        # Bulk ingest into GizmoSQL
        with Timer(name="  Bulk ingest into GizmoSQL", text=TIMER_TEXT, initial_text=True):
            # Manage both the connection and the cursor so that both are closed afterwards
            with dbapi.connect(
                driver="flightsql",
                uri="grpc+tls://try-gizmosql-adbc.gizmodata.com:31337",
                db_kwargs={
                    "username": os.environ["GIZMOSQL_USERNAME"],
                    "password": os.environ["GIZMOSQL_PASSWORD"],
                },
                autocommit=True,
            ) as conn, conn.cursor() as cursor:
                ingest_start = time.perf_counter()
                # mode="replace" drops the table if it exists, then recreates it
                # with a schema derived from the Arrow data
                rows_loaded = cursor.adbc_ingest(
                    table_name="bulk_ingest_lineitem",
                    data=lineitem_arrow_reader,
                    mode="replace",
                )
                ingest_seconds = time.perf_counter() - ingest_start

                rows_per_sec = rows_loaded / ingest_seconds if ingest_seconds > 0 else float("inf")
                print(f"Loaded rows: {rows_loaded:,}")
                print(f"Ingest time: {ingest_seconds:.4f} s")
                print(f"Rows/sec: {rows_per_sec:,.2f}")

                # Verify the row count on the server
                cursor.execute("SELECT COUNT(*) FROM bulk_ingest_lineitem")
                result = cursor.fetchone()[0]
                print(f"Row count verification: {result:,}")

        duckdb_conn.close()


if __name__ == "__main__":
    main()
```
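The example computes throughput inline. adbc_ingest returns the number of rows inserted, but a driver may return -1 when it cannot provide that count, and dividing -1 by the elapsed time would produce a meaningless negative rate. Below is a minimal sketch that factors the timing into a reusable helper and reports no rate when the count is unknown. The names IngestStats and timed_ingest are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class IngestStats:
    rows: int       # rows reported by adbc_ingest (-1 if the driver could not tell)
    seconds: float  # wall-clock duration of the adbc_ingest call

    @property
    def rows_per_sec(self) -> float | None:
        # No meaningful rate when the row count is unknown or no time elapsed
        if self.rows < 0 or self.seconds <= 0:
            return None
        return self.rows / self.seconds


def timed_ingest(cursor, table_name: str, data, mode: str = "create") -> IngestStats:
    """Run cursor.adbc_ingest and time it with time.perf_counter()."""
    start = time.perf_counter()
    rows = cursor.adbc_ingest(table_name, data, mode=mode)
    return IngestStats(rows=rows, seconds=time.perf_counter() - start)
```

Inside the example's with block, you could call it as timed_ingest(cursor, "bulk_ingest_lineitem", lineitem_arrow_reader, mode="replace").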

Performance Results

Running this example against GizmoSQL produced the following output:

Timer Overall program started
Timer   Generate TPCH data (1GB) started
  Generate TPCH data (1GB): Elapsed time: 2.3585 seconds
Timer   Get RecordBatch reader started
  Get RecordBatch reader: Elapsed time: 0.0040 seconds
Timer   Bulk ingest into GizmoSQL started
Loaded rows: 6,001,215
Ingest time: 33.9456 s
Rows/sec: 176,789.18
Row count verification: 6,001,215
  Bulk ingest into GizmoSQL: Elapsed time: 35.9270 seconds
Overall program: Elapsed time: 38.2903 seconds

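6 million rows ingested in under 34 seconds
~177,000 rows/second throughput
Automatic schema inference from the Arrow data

Creating the RecordBatchReader took only about 4 ms because the reader is lazy: DuckDB produces the batches while they are being ingested. The 33.9-second ingest time therefore covers both scanning lineitem in DuckDB and sending the data over the network to the server. The remaining ~2 seconds of the bulk-ingest timer cover connecting to the server and running the verification query. Your throughput will vary with your network and hardware.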

Acknowledgments

GizmoData is proud to have contributed to this Apache Arrow ADBC release. Our pull request implementing bulk ingestion for the Flight SQL driver was merged and is now part of Apache Arrow ADBC release 22.

We'd like to extend our sincere thanks to the Columnar team, who created and help maintain ADBC, for their expertise and collaboration in bringing bulk ingestion support to the Flight SQL driver.

Get Started

Ready to try bulk ingestion with GizmoSQL? Here's how to get started:

Install the driver: dbc install flightsql
Try GizmoSQL or check out the documentation
Explore the GizmoSQL GitHub repository

Have questions or want to learn more? Contact us or reach out on GitHub.