A PySpark API for Gaffer #2
Comments
gh-595-pyspark-api: Add to python-api README. Include data auths in the python user. Change PythonSerialiserConfig to look for the nested JSON object "serialisers" within the python config JSON.
@m316257 @GCHQ-83497 Hello, could you please tell me the status of this issue? FYI, we are considering the alternative "fishbowl" shell as our way forward, and would be interested in whether anything you have here is complete enough or compatible enough to lift and reuse.
@m316257 correct me if I am wrong, as it has been a long time since I worked on this. @n3101's idea was to be able to interact directly with Gaffer across a network, so currently this can run most if not all queries from Python and get the results back. There was also jaffer, which was a Java version of this. The same applied to adding PySpark, so I believe that runs in a sort of remote mode as well (sorry, it's been nearly 2 years!). The last time I worked on this I had added some features so that you could hook into authentication and policy type stuff; I cannot for the life of me remember if that works or not. I also think there was a first-draft attempt at containerising Gaffer in this as well.
Gaffer has a Spark library with Scala and Java APIs for accessing data using Spark, generating RDDs and Spark DataFrames from Gaffer graphs.
Gaffer also has a Python shell with implementations of standard Gaffer operations that can be executed on the graph using Gaffer's REST service.
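For illustration, executing an operation through the REST service from Python might look roughly like the sketch below. The host, port and `/rest/v2/graph/operations/execute` path are assumptions about a particular deployment, and `build_operation` and `execute` are hypothetical helper names, not functions from the existing Python shell.

```python
import json
import urllib.request

# Assumed endpoint; adjust host, port and path to match your deployment.
REST_URL = "http://localhost:8080/rest/v2/graph/operations/execute"


def build_operation(class_name, **fields):
    """Return a JSON-serialisable dict describing one Gaffer operation."""
    operation = {"class": class_name}
    operation.update(fields)
    return operation


def execute(operation, url=REST_URL):
    """POST the operation as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(operation).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


get_all = build_operation("uk.gov.gchq.gaffer.operation.impl.get.GetAllElements")
# elements = execute(get_all)  # needs a running REST service, so left commented out
```

Keeping payload construction separate from the HTTP call means the operation JSON can be inspected or tested without a live graph.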
Extending the Python API to support Spark operations, producing RDDs and DataFrames, would open Gaffer up to a lot of useful Python and Spark data science and machine learning libraries.
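As a rough sketch of the kind of output this would enable, the snippet below flattens element dicts into flat rows that a DataFrame could be built from. This is a client-side conversion for illustration only, not how Gaffer's Spark library generates DataFrames. The element shape (`group`, `vertex`, `source`, `destination`, `properties`) is an assumption about the REST JSON, and `element_to_row` is a hypothetical helper.

```python
def element_to_row(element):
    """Flatten one element dict (assumed REST JSON shape) into a flat row."""
    row = {
        "group": element.get("group"),
        "vertex": element.get("vertex"),            # set for entities
        "source": element.get("source"),            # set for edges
        "destination": element.get("destination"),  # set for edges
    }
    # Property names are prefixed so they cannot clash with the fixed columns.
    for name, value in element.get("properties", {}).items():
        row["prop_" + name] = value
    return row


# Hypothetical sample data standing in for a REST response.
sample = [
    {"group": "BasicEdge", "source": "a", "destination": "b",
     "properties": {"count": 2}},
    {"group": "BasicEntity", "vertex": "a", "properties": {"count": 1}},
]
rows = [element_to_row(e) for e in sample]
# With PySpark installed, rows like these could be passed to
# SparkSession.createDataFrame, ideally together with an explicit schema.
```

Pulling every element through the REST service onto one machine would not scale to large graphs, which is part of why direct Spark support in the Python API is attractive.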