$ pip install pyarrow duckdb
$ python3
>>> import duckdb
>>> import pyarrow.parquet as pq
>>> con = duckdb.connect(database=':memory:')
>>> con.execute("INSTALL tpch; LOAD tpch")
>>> con.execute("CALL dbgen(sf=10)")
>>> print(con.execute("show tables").fetchall())
[('customer',), ('lineitem',), ('nation',), ('orders',), ('part',), ('partsupp',), ('region',), ('supplier',)]
>>> tables = ["customer", "lineitem", "nation", "orders", "part", "partsupp", "region", "supplier"]
>>> for t in tables:
...     res = con.query("SELECT * FROM " + t)
...     pq.write_table(res.to_arrow_table(), t + ".parquet")
...
Generate TPC-H data in Parquet format
ljishen/tpch-data