[WIP] Enable concurrent read from IPFS in replay #425
base: master
Conversation
@ibnesayeed Why were the rates with threading so much lower? I would expect them to be higher on average.
ipwb/replay.py
Outdated
fetchPayload.start()
fetchHeader.join(IPFSTIMEOUT)
fetchPayload.join(IPFSTIMEOUT - (time.time() - fetch_start))
header = message['header']
Please remove superfluous space to comply with pycodestyle.
ipwb/replay.py
Outdated
if os.name != 'nt':  # Bug #310
    signal.alarm(0)
message = {'header': None, 'payload': None}
fetchHeader = Thread(target=load_from_ipfs,
Please remove superfluous space to comply with pycodestyle.
ipwb/replay.py
Outdated
# asynchronous nature of threads which is being utilized to call this function.
def load_from_ipfs(digest, message, key):
    message[key] = IPFS_API.cat(digest)
An extra blank line is needed here to comply with pycodestyle (two blank lines before a top-level function definition).
ipwb/replay.py
Outdated
# The key here could either be 'header' or 'payload'.
# Using the mutable 'message' dict instead of returning a value due to the
# asynchronous nature of threads which is being utilized to call this function.
def load_from_ipfs(digest, message, key):
We have been using camelCaseFunctionNames whereas here you used an_underscore_delimited_function_name. Please make this consistent to match the style of the rest of the package.
We are not consistent about camelCasing. I initially named it that way, but found some functions defined with underscores, so I changed it just before committing the code. Changing it back to camelCase now.
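For illustration, the helper with the camelCase rename and the spacing fixes applied would look roughly like this (the exact final name in the code may differ):

# The key here could either be 'header' or 'payload'.
# Using the mutable 'message' dict instead of returning a value due to the
# asynchronous nature of the threads used to call this function.
def loadFromIPFS(digest, message, key):
    message[key] = IPFS_API.cat(digest)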
I am not sure about that yet. I did note in my initial comment that it is counter-intuitive. It could be due to some threading overhead, or our current thread implementation might not be how it should be. More fine-grained profiling is needed to find out which step is taking how much time.
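As a starting point, a minimal sketch of such fine-grained timing, assuming the existing IPFS_API client and IPFSTIMEOUT constant; timedCat and the digest variables are illustrative placeholders, not code from this PR:

import time
from threading import Thread

def timedCat(digest, message, timings, key):
    start = time.time()
    message[key] = IPFS_API.cat(digest)  # blocking call into the IPFS daemon
    timings[key] = time.time() - start

message, timings = {}, {}
overall = time.time()
threads = [Thread(target=timedCat, args=(digest, message, timings, key))
           for digest, key in ((headerDigest, 'header'),
                               (payloadDigest, 'payload'))]
for t in threads:
    t.start()
for t in threads:
    t.join(IPFSTIMEOUT)  # shared timeout budget simplified for brevity
print(timings, 'total:', time.time() - overall)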
Codecov Report

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
+ Coverage   23.29%   23.38%   +0.09%
==========================================
  Files           6        6
  Lines        1112     1116       +4
  Branches      169      167       -2
==========================================
+ Hits          259      261       +2
- Misses        836      838       +2
  Partials       17       17

Continue to review full report at Codecov.
@ibnesayeed Should we wait until we can show that this is a more efficient solution before merging, or go ahead and do it? This PR should resolve #310, but I would like to verify that on a Windows machine before merging anyway.
This PR brings some functional changes, so we should sit on it a little longer and test it in different ways in different environments first. Also, it is important to profile the code to identify the cause of the unexpected slowdown. If the slowdown is due to thread overhead, then we can find out how to make it more performant. If it is due to the fact that the IPFS server is running on the same machine, then it might perform better when a lookup is performed in the broader IPFS network.
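One way to do that profiling is Python's built-in cProfile around a single replay lookup; replayOneMemento below is a hypothetical stand-in for whichever code path is being measured:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
replayOneMemento(urir)  # hypothetical placeholder for the code path under test
profiler.disable()

# Show the 20 functions with the highest cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)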
I have been trying to profile this and observed some really strange behaviors. We need to isolate certain things in a separate script and test them. Also, before merging we need to move some exception handling into the threads, because the main thread would not be aware of exceptions raised there.
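A minimal sketch of what that could look like, assuming the shared dict is reused to carry an exception back to the main thread; the 'errors' key is an illustrative choice, not part of this PR:

def loadFromIPFS(digest, message, key):
    try:
        message[key] = IPFS_API.cat(digest)
    except Exception as e:  # e.g. IPFS client errors or timeouts
        message[key] = None
        message.setdefault('errors', {})[key] = e  # surface it to the main thread

# After joining the threads, the main thread can check and re-raise:
if message.get('errors'):
    raise list(message['errors'].values())[0]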
Please document them here.
Which things? As we talked about in person, the Py threading mechanism ought to be compared against a baseline, but that is likely not the culprit.
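One way to make that comparison in isolation is a micro-benchmark with a no-op target, so that only thread start/join overhead is measured; this is a hypothetical illustration, separate from the benchmarking repository mentioned in the next comment:

import timeit
from threading import Thread

def noop():
    pass

def threadedPair():
    a, b = Thread(target=noop), Thread(target=noop)
    a.start()
    b.start()
    a.join()
    b.join()

print('sequential:', timeit.timeit(lambda: (noop(), noop()), number=1000))
print('threaded:  ', timeit.timeit(threadedPair, number=1000))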
Created a separate benchmarking repository (IPFS API Concurrency Test) and reported the observation in the API repo (ipfs-shipyard/py-ipfs-http-client#131).
Maybe I'm missing a point, but have you considered putting a WSGI daemon in front of ipwb?
No, currently we are serving directly from Flask's built-in server. However, I am not sure what the choice of web server has to do with how the content is read from the IPFS server.
Not at all. From your benchmark I had the impression that the aim is to improve response time to web requests; a threading WSGI host process would facilitate that. If you are after reading the header and payload concurrently, trio is an option for Python 3.5+. (I know, but maybe that would motivate the port to Python 3.)
Thanks for the input, @funkyfuture. I was pushing for Py3 support from day one, but @machawk1 felt there were many people still using systems that only support Py2. However, we are now well-motivated to completely drop support for Py2 (as per #51). Once we are on Py3, we will explore asyncio for sure.
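For the record, a rough sketch of what an asyncio variant could look like once the code is on Py3, running the blocking IPFS client calls in the default executor; IPFS_API is the existing client object and the digest variables are placeholders:

import asyncio

async def loadBoth(headerDigest, payloadDigest):
    loop = asyncio.get_event_loop()
    # Run the two blocking cat() calls concurrently in the default thread pool.
    header, payload = await asyncio.gather(
        loop.run_in_executor(None, IPFS_API.cat, headerDigest),
        loop.run_in_executor(None, IPFS_API.cat, payloadDigest))
    return header, payload

header, payload = asyncio.get_event_loop().run_until_complete(
    loadBoth(headerDigest, payloadDigest))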
Note that we are trying to optimize asynchronous requests to IPFS, not the Web. Thanks for your input, @funkyfuture, and as @ibnesayeed said, we will look into using that library as applicable when we can get the rest of the code ported to Py3.
This is an initial implementation towards #379. Currently, multi-threading is only implemented in replay. However, the initial benchmark results are counter-intuitive.
[Benchmark results: Without Threading vs. With Threading]
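For reference, the threaded fetch introduced here works roughly as sketched below, simplified from the diff fragments above; headerDigest and payloadDigest stand in for the two IPFS hashes taken from the index record, the camelCase name follows the review discussion, and error handling is omitted:

import time
from threading import Thread

# The key here could either be 'header' or 'payload'. The mutable 'message'
# dict is used instead of a return value because this runs in worker threads.
def loadFromIPFS(digest, message, key):
    message[key] = IPFS_API.cat(digest)

message = {'header': None, 'payload': None}
fetchStart = time.time()
fetchHeader = Thread(target=loadFromIPFS, args=(headerDigest, message, 'header'))
fetchPayload = Thread(target=loadFromIPFS, args=(payloadDigest, message, 'payload'))
fetchHeader.start()
fetchPayload.start()
fetchHeader.join(IPFSTIMEOUT)
# The second join only gets whatever remains of the overall timeout budget.
fetchPayload.join(IPFSTIMEOUT - (time.time() - fetchStart))
header = message['header']
payload = message['payload']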