xan partition

Partitions the given CSV data into chunks based on the value of a column.

The files are written to the output directory with filenames based on the
values in the partition column and the `--filename` flag.
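
To make the idea concrete, here is a minimal Python sketch of partitioning rows by a column value and deriving one output name per value from a `{}` template. It is an illustration only, not xan's implementation: the `partition_rows` helper and the sanitization rule (keeping only letters, digits, `_`, `.` and `-`) are assumptions, and the chunks are kept in memory instead of being written to disk.

```python
import csv
import io
import re


def partition_rows(reader, column, template="{}.csv"):
    """Group CSV rows by the value of `column`, keyed by output filename."""
    header = next(reader)
    index = header.index(column)
    chunks = {}
    for row in reader:
        # Hypothetical sanitization rule: drop characters that are unsafe in a filename.
        safe = re.sub(r"[^A-Za-z0-9_.-]", "", row[index])
        name = template.replace("{}", safe)
        # Each chunk starts with the header row, then collects its matching rows.
        chunks.setdefault(name, [header]).append(row)
    return chunks


data = "name,country\nalice,fr\nbob,us\nchloe,fr\n"
chunks = partition_rows(csv.reader(io.StringIO(data)), "country")
for name, rows in chunks.items():
    # Print each output name with its number of data rows (header excluded).
    print(name, len(rows) - 1)
```

In this sketch every distinct value needs its own chunk at the same time, which mirrors why an unsorted input may require many output files to be open at once.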

Note that most operating systems limit how many files a process can keep open
at once (often to 1024), so if you know the cardinality of the partitioned
column is very high, please sort the file on this column beforehand and use
the -S/--sorted flag.
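
The benefit of sorted input can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not xan's actual code: `partition_sorted` is a hypothetical helper, values are used as filenames without any sanitization, and the input is assumed to be already sorted on the partition column. Because equal values are adjacent, each output file can be written and closed before the next one is opened.

```python
import csv
import io
import itertools
import os
import tempfile


def partition_sorted(reader, column, outdir):
    """Write one file per run of equal values, keeping one file open at a time."""
    header = next(reader)
    index = header.index(column)
    written = []
    # groupby only merges adjacent rows, which is why the input must be sorted.
    for value, rows in itertools.groupby(reader, key=lambda row: row[index]):
        path = os.path.join(outdir, f"{value}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # The file is closed here, before the next group is opened.
        written.append(path)
    return written


data = "name,country\nalice,fr\nchloe,fr\nbob,us\n"
outdir = tempfile.mkdtemp()
paths = partition_sorted(csv.reader(io.StringIO(data)), "country", outdir)
print([os.path.basename(p) for p in paths])
```

With unsorted input, the same loop would reopen and overwrite a file each time a value reappears, so the sorted assumption is essential to this approach.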

Usage:
    xan partition [options] <column> <outdir> [<input>]
    xan partition --help

partition options:
    --filename <filename>    A filename template to use when constructing
                             the names of the output files.  The string '{}'
                             will be replaced by a value based on the value
                             of the field, but sanitized for shell safety.
                             [default: {}.csv]
    -p, --prefix-length <n>  Truncate the partition column after the
                             specified number of bytes when creating the
                             output file.
    -S, --sorted             Use this flag if you know the file is sorted
                             on the partition column in advance, so the command
                             can run faster, use less memory and keep fewer
                             files open.
    --drop                   Drop the partition column from results.

Common options:
    -h, --help             Display this message
    -n, --no-headers       When set, the first row will NOT be interpreted
                           as column names. Otherwise, the first row will
                           appear in all chunks as the header row.
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character.