We can easily support multithreading here by having multiple threads run the download method of the xblock_extractor objects. However, we do have videos from youtube_dl which need to go through a separate queue (as that's throttled). So we need to handle that carefully here, since multithreading drastically improves the performance of this scraper. Maybe we can have a main multithreaded process (because it makes many HTTP requests) and handle youtube separately.
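For illustration, here is a minimal sketch of that split, assuming the xblock_extractor objects expose a `download()` method and a hypothetical `download_video()` helper that wraps youtube_dl (both names are assumptions, not the scraper's actual API):

```python
# Sketch only: a wide pool for unthrottled HTTP downloads, plus a single
# dedicated worker so youtube_dl requests stay sequential (throttled).
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def download_all(xblock_extractors, youtube_video_urls, download_video):
    with ThreadPoolExecutor(max_workers=10) as pool:
        # Main pool: many concurrent HTTP downloads.
        futures = [pool.submit(x.download) for x in xblock_extractors]

        # YouTube queue: one worker enforces sequential, throttled access.
        yt_queue = queue.Queue()
        for url in youtube_video_urls:
            yt_queue.put(url)

        def yt_worker():
            while True:
                try:
                    url = yt_queue.get_nowait()
                except queue.Empty:
                    return
                download_video(url)  # assumed helper wrapping youtube_dl

        yt_thread = threading.Thread(target=yt_worker)
        yt_thread.start()

        for future in futures:
            future.result()  # re-raise any download error
        yt_thread.join()
```

Keeping the youtube queue on a single worker is what enforces the throttling; the main pool size only affects the unthrottled HTTP requests.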
Agreed. Thanks for your experiments with multiprocessing.
This is very similar to other scrapers in that we have several concurrent usages (a rough executor mapping follows the list):

- long CPU-intensive work we don't want to supervise (ffmpeg)
- CPU-intensive work we do want to supervise (image optimization)
- unthrottled downloads
- throttled downloads
- unthrottled uploads
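To make those categories concrete, here is one possible mapping onto Python executors; this is a sketch under assumptions, not how any of our scrapers is currently wired, and the pool sizes are placeholders to be tuned by measurement:

```python
# Sketch: each workload class gets the concurrency primitive that fits it.
import subprocess
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# 1. long CPU-intensive work we don't supervise: detach an ffmpeg process
def reencode(src, dst):
    subprocess.Popen(["ffmpeg", "-i", src, dst])  # fire and forget

# 2. CPU-intensive work we do supervise: a process pool returns futures
image_pool = ProcessPoolExecutor(max_workers=4)

# 3 & 5. unthrottled downloads/uploads: I/O-bound, many threads are safe
io_pool = ThreadPoolExecutor(max_workers=16)

# 4. throttled downloads (youtube_dl): one worker keeps requests sequential
throttled_pool = ThreadPoolExecutor(max_workers=1)
```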
That's a lot of requirements, which calls for flexibility. Also, we should definitely assess our S3 performance before getting into this: we need to know where the bottlenecks are and which methods deliver best for those download/upload use cases.
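A hedged micro-benchmark along these lines could locate where adding workers stops helping; it assumes boto3 (our actual upload path may differ), and the bucket name and file list are placeholders:

```python
# Sketch: time the same upload batch at several worker counts.
import time
from concurrent.futures import ThreadPoolExecutor

import boto3  # assumption: boto3 is the S3 client in use

s3 = boto3.client("s3")

def upload_all(files, bucket, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path in files:
            pool.submit(s3.upload_file, path, bucket, path)
    return time.perf_counter() - start

# Compare e.g. 1, 4 and 16 workers on the same file set to find the knee.
```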
All of this makes it quite complex, which is why I think we should attempt to solve it on a less fragile scraper first (youtube?) and then document and replicate the approach onto the others.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.