Command-line utility, written in Python, for querying Elasticsearch in Lucene query syntax or Query DSL syntax and exporting the results as documents into a CSV file. The tool can query bulk documents across multiple indices and fetch only selected fields, which reduces query execution time.
From source:
$ pip install git+https://github.com/taraslayshchuk/es2csv.git
From pip:
$ pip install es2csv
Usage:
$ es2csv [-h] -q QUERY [-u URL] [-a AUTH] [-i INDEX [INDEX ...]]
[-D DOC_TYPE [DOC_TYPE ...]] [-t TAGS [TAGS ...]] -o FILE
[-f FIELDS [FIELDS ...]] [-d DELIMITER] [-m INTEGER]
[-s INTEGER] [-k] [-r] [-e] [--verify-certs]
[--ca-certs CA_CERTS] [--client-cert CLIENT_CERT]
[--client-key CLIENT_KEY] [-v] [--debug]
Arguments:
-q, --query QUERY Query string in Lucene syntax. [required]
-o, --output_file FILE CSV file location. [required]
-u, --url URL Elasticsearch host URL. Default is http://localhost:9200.
-a, --auth Elasticsearch basic authentication in the form of username:password.
-i, --index-prefixes INDEX [INDEX ...] Index name prefix(es). Default is ['logstash-*'].
-D, --doc_types DOC_TYPE [DOC_TYPE ...] Document type(s).
-t, --tags TAGS [TAGS ...] Query tags.
-f, --fields FIELDS [FIELDS ...] List of selected fields in output. Default is ['_all'].
-d, --delimiter DELIMITER Delimiter to use in CSV file. Default is ",".
-m, --max INTEGER Maximum number of results to return. Default is 0.
-s, --scroll_size INTEGER Scroll size for each batch of results. Default is 100.
-k, --kibana_nested Format nested fields in Kibana style.
-r, --raw_query Switch query format to the Query DSL.
-e, --meta_fields Add meta-fields in output.
--verify-certs Verify SSL certificates. Default is False.
--ca-certs CA_CERTS Location of CA bundle.
--client-cert CLIENT_CERT Location of Client Auth cert.
--client-key CLIENT_KEY Location of Client Cert Key.
-v, --version Show version and exit.
--debug Debug mode on.
-h, --help Show this help message and exit.
Examples:
Searching on localhost and saving the result to database.csv
$ es2csv -q 'host: localhost' -o database.csv
The same query in Query DSL syntax
$ es2csv -r -q '{"query": {"match": {"host": "localhost"}}}' -o database.csv
Very long queries can be read from a file
$ es2csv -r -q @'~/query string file.json' -o database.csv
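For reference, such a file simply contains the Query DSL body that would otherwise be passed inline; a minimal sketch, using the match query from the previous example:
{
  "query": {
    "match": {
      "host": "localhost"
    }
  }
}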
With a tag
$ es2csv -t dev -q 'host: localhost' -o database.csv
More tags
$ es2csv -t dev prod -q 'host: localhost' -o database.csv
On a custom Elasticsearch host
$ es2csv -u my.cool.host.com:9200 -q 'host: localhost' -o database.csv
Using secure Elasticsearch behind nginx? No problem!
$ es2csv -u http://my.cool.host.com/es/ -q 'host: localhost' -o database.csv
With SSL certificate verification enabled (off by default)
$ es2csv --verify-certs -u https://my.cool.host.com/es/ -q 'host: localhost' -o database.csv
With your own certificate authority bundle
$ es2csv --ca-certs '/path/to/your/ca_bundle' --verify-certs -u https://host.com -q '*' -o out.csv
Not the default port?
$ es2csv -u my.cool.host.com:6666/es/ -q 'host: localhost' -o database.csv
With Authorization
$ es2csv -u http://login:[email protected]:6666/es/ -q 'host: localhost' -o database.csv
With explicit Authorization
$ es2csv -a login:password -u http://my.cool.host.com:6666/es/ -q 'host: localhost' -o database.csv
Specifying an index
$ es2csv -i logstash-2015-07-07 -q 'host: localhost' -o database.csv
More indices
$ es2csv -i logstash-2015-07-07 logstash-2015-08-08 -q 'host: localhost' -o database.csv
Or an index mask
$ es2csv -i logstash-2015-* -q 'host: localhost' -o database.csv
And now together
$ es2csv -i logstash-2015-01-0* logstash-2015-01-10 -q 'host: localhost' -o database.csv
Collecting all data on all indices
$ es2csv -i _all -q '*' -o database.csv
Specifying a document type
$ es2csv -D log -i _all -q '*' -o database.csv
Selecting only the fields you are interested in, if you don't need all of them (the query runs faster)
$ es2csv -f host status date -q 'host: localhost' -o database.csv
Or a field mask
$ es2csv -f 'ho*' 'st*us' '*ate' -q 'host: localhost' -o database.csv
Selecting all fields (the default)
$ es2csv -f _all -q 'host: localhost' -o database.csv
Selecting meta-fields: _id, _index, _score, _type
$ es2csv -e -f _all -q 'host: localhost' -o database.csv
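Assuming the meta-fields appear as columns named after themselves (an illustration, not verified output), the resulting CSV header might begin:
_id,_index,_score,_type,host,...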
Selecting nested fields
$ es2csv -f comments.comment comments.date comments.name -q '*' -i twitter -o database.csv
Setting the maximum number of results
$ es2csv -m 6283185 -q '*' -i twitter -o database.csv
Retrieving 2000 results in just 2 requests (two scrolls of 1000 each)
$ es2csv -m 2000 -s 1000 -q '*' -i twitter -o database.csv
Changing the column delimiter in the CSV file (default ',')
$ es2csv -d ';' -q '*' -i twitter -o database.csv
Changing nested columns' output format to Kibana style
$ es2csv -k -q '*' -i twitter -o database.csv
An example JSON document
{
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"name": "John Smith",
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"name": "Alice White",
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
A CSV file in Kibana style format
body,comments.age,comments.comment,comments.date,comments.name,comments.stars,tags,title
Making your money work...,"28,31","Great article,More like this please","2014-09-01,2014-10-22","John Smith,Alice White","4,5","cash,shares",Nest eggs
A CSV file in default format
body,comments.0.age,comments.0.comment,comments.0.date,comments.0.name,comments.0.stars,comments.1.age,comments.1.comment,comments.1.date,comments.1.name,comments.1.stars,tags.0,tags.1,title
Making your money work...,28,Great article,2014-09-01,John Smith,4,31,More like this please,2014-10-22,Alice White,5,cash,shares,Nest eggs