We're excited to announce that ADBC Flight SQL Python driver version 1.10.0 has been released, bringing a game-changing feature to Arrow Flight SQL servers like GizmoSQL: bulk ingestion.
This release enables you to easily ingest data from clients directly into GizmoSQL, where it can create tables, append to existing tables, or replace existing tables—all while automatically preserving data types from your source Arrow data.
What is Bulk Ingestion?
Bulk ingestion allows you to push large volumes of data from your client application directly into an Arrow Flight SQL server. With the new adbc_ingest method, you can:
- Create new tables from Arrow data with automatic schema inference
- Append to existing tables
- Replace existing tables with new data
The data types in the destination table automatically match your source Arrow RecordBatch or Table—no manual schema definition required.
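As a rough sketch of how the three modes listed above might be selected in client code, the helper below validates the mode string and delegates to the cursor's `adbc_ingest` method. The helper name `ingest_with_mode` and the `SUPPORTED_MODES` set are hypothetical and not part of the driver; the set is limited to the modes this post describes, and the wrapper assumes the cursor exposes `adbc_ingest(table_name=..., data=..., mode=...)` as in the full example later in this post.

```python
# Modes described in this post. This set is an illustrative assumption;
# the driver itself decides which modes a given server accepts.
SUPPORTED_MODES = {"create", "append", "replace"}


def ingest_with_mode(cursor, table_name, data, mode="create"):
    """Hypothetical helper: check the mode, then call cursor.adbc_ingest.

    `cursor` is any object with an adbc_ingest method, for example a
    cursor obtained from an ADBC DB-API connection.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"Unsupported ingest mode {mode!r}; expected one of {sorted(SUPPORTED_MODES)}"
        )
    # Returns whatever the driver reports, normally the number of rows ingested.
    return cursor.adbc_ingest(table_name=table_name, data=data, mode=mode)
```

With a real connection, the same call would look like `ingest_with_mode(cursor, "bulk_ingest_lineitem", reader, mode="replace")`. Rejecting an unknown mode before any data is streamed avoids a round trip to the server for what is only a typo.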
Combining with the ADBC Scanner DuckDB Extension
This feature becomes even more powerful when combined with the ADBC Scanner DuckDB extension. You can easily push data from DuckDB (or other GizmoSQL servers) directly into GizmoSQL, enabling seamless data movement across your analytics infrastructure.
Installation
The ADBC Flight SQL driver is available on PyPI. The easiest way to install it is using Columnar's dbc tool:
dbc install flightsql
Or install directly via pip:
pip install adbc-driver-flightsql
Python Dependencies
To run the example below, you'll need these dependencies:
adbc-driver-flightsql
duckdb==1.4.3
pyarrow==22.0.*
codetiming==1.4.0
python-dotenv==1.2.*
Try It Yourself
We've set up a public GizmoSQL server so you can try bulk ingestion right now! Use these credentials:
GIZMOSQL_USERNAME=adbc-scanner
GIZMOSQL_PASSWORD="QueryDotFarmRules!123"
Add these to a .env file in your project directory, and the example code below will pick them up automatically.
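If the .env file is missing or a variable name is misspelled, looking the value up with `os.environ[...]` fails with a bare `KeyError`. The sketch below is one possible way to fail earlier with a clearer message. The function name `read_gizmosql_credentials` is hypothetical, and the returned keys simply mirror the `db_kwargs` used in the example in this post.

```python
import os

REQUIRED_VARS = ("GIZMOSQL_USERNAME", "GIZMOSQL_PASSWORD")


def read_gizmosql_credentials(environ=None):
    """Hypothetical helper: return credentials as a dict, or raise if any are unset."""
    env = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Add them to your .env file or export them in your shell."
        )
    return {
        "username": env["GIZMOSQL_USERNAME"],
        "password": env["GIZMOSQL_PASSWORD"],
    }
```

Call it after `load_dotenv()` so that values from the .env file are already present in the environment.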
Example: Loading TPC-H Data into GizmoSQL
Here's a complete example showing how to generate TPC-H benchmark data in DuckDB and bulk ingest it into GizmoSQL:
import os
import time

import duckdb
from adbc_driver_manager import dbapi
from codetiming import Timer
from dotenv import load_dotenv

# Timer logging setup
TIMER_TEXT = "{name}: Elapsed time: {:.4f} seconds"


def main():
    load_dotenv()

    with Timer(name="Overall program", text=TIMER_TEXT, initial_text=True):
        # Generate 1GB of TPC-H data in DuckDB
        with Timer(name=" Generate TPCH data (1GB)", text=TIMER_TEXT, initial_text=True):
            duckdb_conn = duckdb.connect()
            duckdb_conn.install_extension("tpch")
            duckdb_conn.load_extension("tpch")
            duckdb_conn.execute(query="CALL dbgen(sf=1.0)")

        # Get Arrow reader for the lineitem table
        with Timer(name=" Get RecordBatch reader", text=TIMER_TEXT, initial_text=True):
            lineitem_arrow_reader = duckdb_conn.table("lineitem").fetch_arrow_reader(batch_size=10_000)

        # Bulk ingest into GizmoSQL
        with Timer(name=" Bulk ingest into GizmoSQL", text=TIMER_TEXT, initial_text=True):
            with dbapi.connect(
                driver="flightsql",
                uri="grpc+tls://try-gizmosql-adbc.gizmodata.com:31337",
                db_kwargs={
                    "username": os.environ["GIZMOSQL_USERNAME"],
                    "password": os.environ["GIZMOSQL_PASSWORD"]
                },
                autocommit=True
            ).cursor() as cursor:
                ingest_start = time.perf_counter()
                rows_loaded = cursor.adbc_ingest(
                    table_name="bulk_ingest_lineitem",
                    data=lineitem_arrow_reader,
                    mode="replace"
                )
                ingest_seconds = time.perf_counter() - ingest_start

                rows_per_sec = rows_loaded / ingest_seconds if ingest_seconds > 0 else float("inf")
                print(f"Loaded rows: {rows_loaded:,}")
                print(f"Ingest time: {ingest_seconds:.4f} s")
                print(f"Rows/sec: {rows_per_sec:,.2f}")

                # Verify the row count
                cursor.execute("SELECT COUNT(*) FROM bulk_ingest_lineitem")
                result = cursor.fetchone()[0]
                print(f"Row count verification: {result:,}")


if __name__ == "__main__":
    main()
Performance Results
Running this example against the public GizmoSQL server produced the following output:
Timer Overall program started
Timer Generate TPCH data (1GB) started
Generate TPCH data (1GB): Elapsed time: 2.3585 seconds
Timer Get RecordBatch reader started
Get RecordBatch reader: Elapsed time: 0.0040 seconds
Timer Bulk ingest into GizmoSQL started
Loaded rows: 6,001,215
Ingest time: 33.9456 s
Rows/sec: 176,789.18
Row count verification: 6,001,215
Bulk ingest into GizmoSQL: Elapsed time: 35.9270 seconds
Overall program: Elapsed time: 38.2903 seconds
- 6 million rows ingested in under 34 seconds
- ~177,000 rows/second throughput
- Automatic schema inference from the Arrow data
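The throughput figure follows directly from the row count and ingest time in the log above. As a quick check of that arithmetic, here is a minimal sketch; the function name `rows_per_second` is ours, and the two inputs are copied from the output shown in this post.

```python
def rows_per_second(rows_loaded, elapsed_seconds):
    """Return ingest throughput, guarding against a zero or negative duration."""
    if elapsed_seconds <= 0:
        return float("inf")
    return rows_loaded / elapsed_seconds


# Values taken from the log output above.
rate = rows_per_second(6_001_215, 33.9456)
print(f"{rate:,.0f} rows/sec")
```

The result is a little under 177,000 rows per second, which matches the summary. The ingest time in the log is rounded to four decimal places, so the last digit can differ slightly from the logged Rows/sec value.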
Acknowledgments
GizmoData is proud to have contributed to this Apache Arrow release. Our pull request implementing bulk ingestion for the Flight SQL driver was merged and is now part of Apache Arrow ADBC Release 22.
We'd like to extend our sincere thanks to the Columnar team, who created and help maintain ADBC, for their expertise and collaboration in bringing bulk ingestion support to the Flight SQL driver.
Get Started
Ready to try bulk ingestion with GizmoSQL? Here's how to get started:
- Install the driver: dbc install flightsql
- Try GizmoSQL or check out the documentation
- Explore the GizmoSQL GitHub repository
Have questions or want to learn more? Contact us or reach out on GitHub.