Reading and writing pandas DataFrames in ElasticSearch
This package should work on both Python 2 (>=2.7) and Python 3 (>=3.4) but has primarily been tested on Python 2.7. ElasticSearch is required and should be version 2.x.
The package is hosted on PyPI and can be installed with pip:
pip install espandas
Alternatively, the development version can be installed from GitHub:
pip install git+https://github.com/dashaub/espandas.git
Unit tests can be run with pytest or nosetests. Code coverage can be measured with pytest-cov from PyPI:
py.test --cov=espandas
This example assumes ElasticSearch is running on localhost on the standard port (9200). If different connection information is needed, it can be passed to the Espandas() constructor as keyword arguments. The DataFrame to insert *must* have a column that will be used as the unique identifier _id in ElasticSearch; the default is uid_name = 'indexId'.
import pandas as pd
import numpy as np
from espandas import Espandas

# Example data frame
df = (100 * pd.DataFrame(np.round(np.random.rand(100, 5), 2))).astype(int)
df.columns = ['A', 'B', 'C', 'D', 'E']
df['indexId'] = (df.index + 100).astype(str)

# Create a client and write the DataFrame. If necessary, connection
# information to the ES cluster can be passed in the espandas constructor
# as keyword arguments.
INDEX = 'foo_index'
TYPE = 'bar_type'
esp = Espandas()
esp.es_write(df, INDEX, TYPE)

# Query for the first ten rows and see that they match the original
k = df.indexId[0:10]
res = esp.es_read(k, INDEX, TYPE)
res == df.iloc[0:10].astype('str')
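Because the identifier column is required, it can help to build and validate it before writing. The following is a minimal sketch that uses only pandas; the helper name `with_index_id` and its `source` parameter are hypothetical and not part of espandas. Only the column name `indexId` comes from the text above.

```python
import pandas as pd


def with_index_id(df, source=None, uid_name="indexId"):
    """Return a copy of df with a unique string identifier column.

    Hypothetical helper for illustration; not part of espandas.
    """
    out = df.copy()
    # Take identifiers from an existing column, or fall back to the row index.
    values = out[source] if source is not None else out.index
    out[uid_name] = [str(v) for v in values]
    # Identifiers must be unique so that rows do not overwrite each other.
    if out[uid_name].duplicated().any():
        raise ValueError("column %r must contain unique values" % uid_name)
    return out


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
prepared = with_index_id(df)
```

The returned frame can then be passed to esp.es_write(prepared, INDEX, TYPE) in the same way as the example data frame.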
(c) 2017 David Shaub
This package is free software released under the GPL-3 license.