Given that
a. the number of options for dataset formatting will continue to grow, and having a CLI flag for each will get ungainly, and
b. this package will primarily be used by machines,
it seems like a good idea to accept a JSON blob containing all the bits. Indeed, we pretty much have to in order to accept schemas, anyway.
There are some options here:
1. switch out the `generate` interface entirely (i.e., remove the current one): `datalogistik generate '<blob>'`
2. add a separate interface in addition: `datalogistik generate --json '<blob>'` or `datalogistik generate-json '<blob>'`
3. accept blobs for certain parameters that can get complicated: `datalogistik generate -d fanniemae -f '<blob>'` or `datalogistik generate -d fanniemae --format-json '<blob>'`
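As a rough illustration of option 3, here is how a `--format-json` flag could parse a blob straight into a dict with `argparse`. This is a hypothetical sketch, not the actual datalogistik CLI; the flag names and the example format fields are assumptions.

```python
import argparse
import json

# Hypothetical CLI skeleton: parse a JSON blob for just the format options.
parser = argparse.ArgumentParser(prog="datalogistik")
subparsers = parser.add_subparsers(dest="command")
gen = subparsers.add_parser("generate")
gen.add_argument("-d", "--dataset")
# type=json.loads turns the blob into a dict at parse time,
# so a malformed blob fails fast with a usage error.
gen.add_argument("--format-json", type=json.loads, default={})

args = parser.parse_args(
    ["generate", "-d", "fanniemae", "--format-json", '{"name": "csv", "delimiter": "|"}']
)
print(args.dataset)       # fanniemae
print(args.format_json)   # {'name': 'csv', 'delimiter': '|'}
```

One nicety of this shape: unspecified format options simply don't appear in the dict, which fits point ii below about not forcing callers to spell out fields that don't matter.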
There are tradeoffs in maintenance burden and human usability. Regardless, JSON schemas should
i. be well documented in a fashion that will stay in sync as they evolve, and
ii. not require all fields to be specified where they don't matter (e.g., chunk size for CSVs) or where the defaults are fine (chunk size for Parquet, most of the time).
Given the new `Dataset` class in #62, it probably makes sense to accept part or all of its JSON-serialized form, so we could just unpack it with `Dataset(**<blob>)` or `Dataset(name="fanniemae", format=**<blob>)`.
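To make the unpacking idea concrete: if `Dataset` is (or wraps) something dataclass-like, `**`-splatting a decoded blob into the constructor gets partial-specification for free via field defaults. The fields below are stand-ins; the real class is defined in #62.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Stand-in for the Dataset class from #62; fields are illustrative only.
@dataclass
class Dataset:
    name: str
    format: str = "parquet"            # default satisfies point ii: callers may omit it
    compression: Optional[str] = None

# A partial blob: only the fields that matter are present.
blob = '{"name": "fanniemae", "format": "csv"}'
ds = Dataset(**json.loads(blob))
print(ds)  # Dataset(name='fanniemae', format='csv', compression=None)
```

A caveat with plain `**` unpacking is that unknown keys raise `TypeError`, which is arguably the right behavior for a machine-facing interface since it surfaces blob typos immediately.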
Before picking up work on this, we should decide which option we prefer.