Reading and writing pandas DataFrames in ElasticSearch
This package should work on both Python 2 (>=2.7) and Python 3 (>=3.4), but has primarily been tested on Python 2.7. ElasticSearch is required and should be version 2.x.
The package is hosted on PyPI and can be installed with pip:
pip install espandas
Alternatively, the development version can be installed from GitHub:
pip install git+git://github.com/dashaub/espandas.git
Unit tests can be run with pytest or nosetests. Code coverage can be measured with pytest-cov, available from PyPI:
py.test --cov=espandas
This example assumes ElasticSearch is running on localhost on the standard port. If different connection information needs to be specified, it can be passed to the Espandas() constructor as keyword arguments. The DataFrame to insert must have a column that will be used for the unique identifier _id in ElasticSearch. The name of that column is given by uid_name, which defaults to 'indexId'.
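Because the identifier column has to exist and hold unique values, it can be worth checking the DataFrame before running the example below. The following is a minimal sketch using only pandas. The helper name check_uid_column is hypothetical and not part of espandas, and the checks it performs are assumptions about what makes a usable identifier, not a description of what the package itself validates.

```python
import pandas as pd


def check_uid_column(df, uid_name="indexId"):
    """Hypothetical helper (not part of espandas): validate the id column."""
    # The column named by uid_name must be present
    if uid_name not in df.columns:
        raise ValueError("DataFrame has no column named %r" % uid_name)
    # Missing values cannot serve as document identifiers
    if df[uid_name].isnull().any():
        raise ValueError("Column %r contains missing values" % uid_name)
    # Each row needs its own identifier
    if not df[uid_name].is_unique:
        raise ValueError("Column %r contains duplicate values" % uid_name)
    return df


# Small frame built the same way as the example data
frame = pd.DataFrame({"A": [1, 2, 3]})
frame["indexId"] = (frame.index + 100).astype(str)
check_uid_column(frame)  # raises ValueError if the column is unusable
```

The helper returns the DataFrame unchanged, so it can be wrapped around the frame at the point where it is passed on for writing.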
import pandas as pd
import numpy as np
from espandas import Espandas
# Example data frame
df = (100 * pd.DataFrame(np.round(np.random.rand(100, 5), 2))).astype(int)
df.columns = ['A', 'B', 'C', 'D', 'E']
df['indexId'] = (df.index + 100).astype(str)
# Create a client and write the DataFrame. If necessary, connection
# information to the ES cluster can be passed to the Espandas constructor
# as keyword arguments.
INDEX = 'foo_index'
TYPE = 'bar_type'
esp = Espandas()
esp.es_write(df, INDEX, TYPE)
# Query for the first ten rows and see that they match the original
k = df.indexId[0:10]
res = esp.es_read(k, INDEX, TYPE)
res == df.iloc[0:10].astype('str')
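The == comparison above yields an elementwise boolean DataFrame, and pandas only allows it when both frames carry identical row and column labels. If a single True/False answer is wanted, or if the returned frame might come back in a different order, a comparison along the following lines may help. This is a sketch using only pandas. The helper name frames_match is hypothetical, and the result frame here is a hand-built stand-in, since the exact layout returned by es_read is not specified in this document.

```python
import pandas as pd


def frames_match(result, expected, uid_name="indexId"):
    """Hypothetical helper: compare two frames as strings, ignoring order."""
    def normalize(frame):
        # Compare everything as strings, as the example above does
        frame = frame.astype(str)
        # Put columns in a fixed order
        frame = frame[sorted(frame.columns)]
        # Put rows in a fixed order and discard the original index
        return frame.sort_values(uid_name).reset_index(drop=True)

    return normalize(result).equals(normalize(expected))


expected = pd.DataFrame({"A": [1, 2], "indexId": ["100", "101"]})
# Stand-in for a frame read back from ElasticSearch: same data,
# but with string values and a different row and column order
result = pd.DataFrame({"indexId": ["101", "100"], "A": ["2", "1"]})
print(frames_match(result, expected))
```

In the example above, the equivalent call would be frames_match(res, df.iloc[0:10]).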
(c) 2017 David Shaub
This package is free software released under the GPL-3 license.