Command-line utility, written in Python, for querying Elasticsearch in Lucene query syntax or Query DSL syntax and exporting the results as documents into a CSV file. The tool can query bulk documents across multiple indices and fetch only selected fields, which reduces query execution time.
From source:
$ pip install git+https://github.com/taraslayshchuk/es2csv.git
From pip:
$ pip install es2csv
Usage:
$ es2csv [-h] -q QUERY [-u URL] [-a AUTH] [-i INDEX [INDEX ...]]
[-D DOC_TYPE [DOC_TYPE ...]] [-t TAGS [TAGS ...]] -o FILE
[-f FIELDS [FIELDS ...]] [-d DELIMITER] [-m INTEGER]
[-s INTEGER] [-k] [-r] [-e] [--verify-certs]
[--ca-certs CA_CERTS] [--client-cert CLIENT_CERT]
[--client-key CLIENT_KEY] [-v] [--debug]
Arguments:
-q, --query QUERY Query string in Lucene syntax. [required]
-o, --output_file FILE CSV file location. [required]
-u, --url URL Elasticsearch host URL. Default is http://localhost:9200.
-a, --auth Elasticsearch basic authentication in the form of username:password.
-i, --index-prefixes INDEX [INDEX ...] Index name prefix(es). Default is ['logstash-*'].
-D, --doc_types DOC_TYPE [DOC_TYPE ...] Document type(s).
-t, --tags TAGS [TAGS ...] Query tags.
-f, --fields FIELDS [FIELDS ...] List of selected fields in output. Default is ['_all'].
-d, --delimiter DELIMITER Delimiter to use in CSV file. Default is ",".
-m, --max INTEGER Maximum number of results to return. Default is 0.
-s, --scroll_size INTEGER Scroll size for each batch of results. Default is 100.
-k, --kibana_nested Format nested fields in Kibana style.
-r, --raw_query Switch query format to the Query DSL.
-e, --meta_fields Add meta-fields in output.
--verify-certs Verify SSL certificates. Default is False.
--ca-certs CA_CERTS Location of CA bundle.
--client-cert CLIENT_CERT Location of Client Auth cert.
--client-key CLIENT_KEY Location of Client Cert Key.
-v, --version Show version and exit.
--debug Debug mode on.
-h, --help Show this help message and exit.
Examples:
Searching on localhost and saving the result to database.csv
$ es2csv -q 'host: localhost' -o database.csv
The same query in Query DSL syntax
$ es2csv -r -q '{"query": {"match": {"host": "localhost"}}}' -o database.csv
Very long queries can be read from a file
$ es2csv -r -q @'~/query string file.json' -o database.csv
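For reference, such a file simply contains the Query DSL body that would otherwise be passed inline; a minimal sketch, using the match query from the previous example:
{
  "query": {
    "match": {
      "host": "localhost"
    }
  }
}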
With a tag
$ es2csv -t dev -q 'host: localhost' -o database.csv
More tags
$ es2csv -t dev prod -q 'host: localhost' -o database.csv
On a custom Elasticsearch host
$ es2csv -u my.cool.host.com:9200 -q 'host: localhost' -o database.csv
Using secure Elasticsearch behind nginx? No problem!
$ es2csv -u http://my.cool.host.com/es/ -q 'host: localhost' -o database.csv
With SSL certificate verification enabled (off by default)
$ es2csv --verify-certs -u https://my.cool.host.com/es/ -q 'host: localhost' -o database.csv
With your own certificate authority bundle
$ es2csv --ca-certs '/path/to/your/ca_bundle' --verify-certs -u https://host.com -q '*' -o out.csv
Not the default port?
$ es2csv -u my.cool.host.com:6666/es/ -q 'host: localhost' -o database.csv
With Authorization
$ es2csv -u http://login:[email protected]:6666/es/ -q 'host: localhost' -o database.csv
With explicit Authorization
$ es2csv -a login:password -u http://my.cool.host.com:6666/es/ -q 'host: localhost' -o database.csv
Specifying an index
$ es2csv -i logstash-2015-07-07 -q 'host: localhost' -o database.csv
More indices
$ es2csv -i logstash-2015-07-07 logstash-2015-08-08 -q 'host: localhost' -o database.csv
Or an index mask
$ es2csv -i logstash-2015-* -q 'host: localhost' -o database.csv
And now together
$ es2csv -i logstash-2015-01-0* logstash-2015-01-10 -q 'host: localhost' -o database.csv
Collecting all data on all indices
$ es2csv -i _all -q '*' -o database.csv
Specifying a document type
$ es2csv -D log -i _all -q '*' -o database.csv
Selecting only the fields you are interested in, if you don't need all of them (the query runs faster)
$ es2csv -f host status date -q 'host: localhost' -o database.csv
Or a field mask
$ es2csv -f 'ho*' 'st*us' '*ate' -q 'host: localhost' -o database.csv
Selecting all fields (the default)
$ es2csv -f _all -q 'host: localhost' -o database.csv
Selecting meta-fields: _id, _index, _score, _type
$ es2csv -e -f _all -q 'host: localhost' -o database.csv
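Assuming the meta-fields appear as columns named after themselves (an illustration, not verified output), the resulting CSV header might begin:
_id,_index,_score,_type,host,...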
Selecting nested fields
$ es2csv -f comments.comment comments.date comments.name -q '*' -i twitter -o database.csv
Setting the maximum number of results
$ es2csv -m 6283185 -q '*' -i twitter -o database.csv
Retrieving 2000 results in just 2 requests (two scrolls of 1000 each)
$ es2csv -m 2000 -s 1000 -q '*' -i twitter -o database.csv
Changing the column delimiter in the CSV file (default ',')
$ es2csv -d ';' -q '*' -i twitter -o database.csv
Changing nested columns' output format to Kibana style
$ es2csv -k -q '*' -i twitter -o database.csv
An example JSON document
{
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"name": "John Smith",
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"name": "Alice White",
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
A CSV file in Kibana style format
body,comments.age,comments.comment,comments.date,comments.name,comments.stars,tags,title
Making your money work...,"28,31","Great article,More like this please","2014-09-01,2014-10-22","John Smith,Alice White","4,5","cash,shares",Nest eggs
A CSV file in default format
body,comments.0.age,comments.0.comment,comments.0.date,comments.0.name,comments.0.stars,comments.1.age,comments.1.comment,comments.1.date,comments.1.name,comments.1.stars,tags.0,tags.1,title
Making your money work...,28,Great article,2014-09-01,John Smith,4,31,More like this please,2014-10-22,Alice White,5,cash,shares,Nest eggs