Update README and release 0.2.0
Signed-off-by: Evan Wies <[email protected]>
neomantra committed Jan 16, 2025
1 parent f38da6d commit 9a17c0d
Showing 2 changed files with 25 additions and 4 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,6 @@
# CHANGELOG

-## v0.2.0 (unreleased)
+## v0.2.0 (2025-01-15)

* Add `dbn-go-file parquet` tool for processing DBN files with commands:
* `metadata`
@@ -58,7 +58,7 @@
## v0.0.8 (2024-05-28)

* Add initial Live API support
-* Add Mpb1, Mbp10, Mbo, Error, SymbolMapping, System, Statistics
+* Add Mbp1, Mbp10, Mbo, Error, SymbolMapping, System, Statistics
* Add Dockerfile
* Minor interface tweaks and bug fixes

25 changes: 23 additions & 2 deletions cmd/README.md
@@ -42,26 +42,47 @@ Usage:
Available Commands:
completion Generate the autocompletion script for the specified shell
help Help about any command
-json Prints the specified file's records as JSON
+json Prints the specified files' records as JSON
metadata Prints the specified file's metadata as JSON
parquet Writes the specified files' records as parquet
split Splits Databento download folders into "<feed>/<instrument_id>/Y/M/D/feed-YMD.type.dbn.zst"
Flags:
-h, --help help for dbn-go-file
-v, --verbose Verbose output
Use "dbn-go-file [command] --help" for more information about a command.
```


### `dbn-go-file parquet`

-`dbn-go-file split` is a command to generate Parquet files from DBN files. It strives to have the same output as the `to_parquet` function [in Databento's Python SDK](https://databento.com/docs/api-reference-historical/helpers/dbn-store-to-parquet?historical=python&live=python&reference=python). The simple [`dbn_to_parquet.py`](./dbn_to_parquet.py) script uses that to create tests.
+`dbn-go-file parquet` is a command to generate [Parquet files](https://parquet.apache.org) from DBN files. This tool strives to have the same output as the `to_parquet` function [in Databento's Python SDK](https://databento.com/docs/api-reference-historical/helpers/dbn-store-to-parquet?historical=python&live=python&reference=python). The included [`dbn_to_parquet.py`](./dbn_to_parquet.py) script uses that Python SDK to create tests.

```sh
./dbn_to_parquet.py tests/data/test_data.ohlcv-1s.dbn
parquet-reader tests/data/test_data.ohlcv-1s.dbn.parquet > py.parquet.txt

dbn-go-file parquet tests/data/test_data.ohlcv-1s.dbn
parquet-reader tests/data/test_data.ohlcv-1s.dbn.parquet > go.parquet.txt

diff py.parquet.txt go.parquet.txt
```
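
For reference, the Python-side conversion that `dbn_to_parquet.py` wraps boils down to roughly the following. This is a minimal sketch using Databento's Python SDK, assuming the `databento` package is installed; the exact options the script passes may differ.

```python
# Sketch of the reference conversion via Databento's Python SDK.
# Assumes `pip install databento`; dbn_to_parquet.py may pass different options.
import databento as db

store = db.DBNStore.from_file("tests/data/test_data.ohlcv-1s.dbn")
store.to_parquet("tests/data/test_data.ohlcv-1s.dbn.parquet")
```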

Parquet is a common columnar data persistence format. For example, DuckDB [natively supports](https://duckdb.org/docs/data/parquet/overview.html) Parquet files:

```sh
$ dbn-go-file parquet tests/data/test_data.ohlcv-1s.dbn
$ duckdb mycandles.duckdb
D CREATE TABLE mycandles AS SELECT * FROM './tests/data/test_data.ohlcv-1s.dbn.parquet';
D SELECT * FROM mycandles;
┌───────┬──────────────┬───────────────┬──────────┬──────────┬──────────┬──────────┬────────┬─────────┬──────────────────────────┐
│ rtype │ publisher_id │ instrument_id │ open │ high │ low │ close │ volume │ symbol │ ts_event │
│ uint8 │ uint16 │ uint32 │ double │ double │ double │ double │ uint64 │ varchar │ timestamp with time zone │
├───────┼──────────────┼───────────────┼──────────┼──────────┼──────────┼──────────┼────────┼─────────┼──────────────────────────┤
│ 32 │ 1 │ 5482 │ 372025.0 │ 372050.0 │ 372025.0 │ 372050.0 │ 57 │ ESH1 │ 2020-12-28 08:00:00-05 │
│ 32 │ 1 │ 5482 │ 372050.0 │ 372050.0 │ 372050.0 │ 372050.0 │ 13 │ ESH1 │ 2020-12-28 08:00:01-05 │
└───────┴──────────────┴───────────────┴──────────┴──────────┴──────────┴──────────┴────────┴─────────┴──────────────────────────┘
```
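
Any other Parquet-aware tooling can read the same output. For example, here is a minimal pandas sketch (an illustration only, assuming `pandas` and a Parquet engine such as `pyarrow` are installed):

```python
# Illustration only: load the dbn-go-file Parquet output into a DataFrame.
# Assumes pandas and a Parquet engine (e.g. pyarrow) are installed.
import pandas as pd

df = pd.read_parquet("tests/data/test_data.ohlcv-1s.dbn.parquet")
print(df[["symbol", "ts_event", "open", "high", "low", "close", "volume"]].head())
```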

### `dbn-go-file split`
