ctakes containers are still too big #4

Open
tmills opened this issue Jun 28, 2017 · 5 comments
Comments

@tmills
Owner

tmills commented Jun 28, 2017

@MatthewVita asked in his last pull request:

One question for you (I can do this in a separate PR): should we just commit in the cTAKES zip artefacts (can COPY them in via Dockerfile)? The download site takes forever to pull them down. I realize this may not be a best practice, but...

I would like to do something about this, but I'm not crazy about adding even more jars (I'd like to remove the jars currently checked in at some point). It might be possible to just pick the individual jars we need with wget from the Apache servers? The servers may still be slow, but skipping the dependency parser alone would cut 250 MB from the download size.
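A rough sketch of the "pick individual jars with wget" idea, assuming the jars are fetched from Maven Central using the standard Maven 2 repository layout (`groupId` dots become path segments, then `artifactId/version/artifactId-version.jar`). The artifact list, the `4.0.0` version, and the `/opt/ctakes/lib` destination are hypothetical placeholders, not the project's actual module set. Note that this only builds URLs for the jars you list; unlike Maven itself, it does not resolve transitive dependencies.

```python
from dataclasses import dataclass

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class Artifact:
    group_id: str
    artifact_id: str
    version: str

    def jar_url(self, repo: str = MAVEN_CENTRAL) -> str:
        # Maven 2 repository layout: dots in the groupId become path separators.
        group_path = self.group_id.replace(".", "/")
        file_name = f"{self.artifact_id}-{self.version}.jar"
        return f"{repo}/{group_path}/{self.artifact_id}/{self.version}/{file_name}"


def wget_commands(artifacts, exclude=(), dest="/opt/ctakes/lib"):
    """Build one wget command per jar, skipping any artifactId in `exclude`.

    `dest` is a hypothetical install directory; wget's -P sets the
    directory the file is saved into, and -q suppresses progress output.
    """
    cmds = []
    for a in artifacts:
        if a.artifact_id in exclude:
            continue  # e.g. drop the large dependency parser
        cmds.append(f"wget -q -P {dest} {a.jar_url()}")
    return cmds


if __name__ == "__main__":
    # Hypothetical module list and version, for illustration only.
    needed = [
        Artifact("org.apache.ctakes", "ctakes-core", "4.0.0"),
        Artifact("org.apache.ctakes", "ctakes-clinical-pipeline", "4.0.0"),
        Artifact("org.apache.ctakes", "ctakes-dependency-parser", "4.0.0"),
    ]
    for cmd in wget_commands(needed, exclude={"ctakes-dependency-parser"}):
        print(cmd)
```

The printed commands could be pasted into a Dockerfile `RUN` step; keeping the list in one place makes it easy to see which modules (and how many megabytes) are being skipped.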

@tmills
Owner Author

tmills commented Jun 28, 2017

Or mavenize everything and let Maven figure out which jars to grab? IDK if it's standard to include Maven in containers; it certainly has a heavy enough footprint on its own.

@MatthewVita
Contributor

I've used Maven in a containerized setting. Sounds like a great idea because the Maven Central servers are fast. However, it may not help at all with the container size problem. Hmm.

@tmills
Owner Author

tmills commented Jul 5, 2017

Looked into Maven a bit: it can help us with the jars, but probably not with the UIMA and cTAKES downloads. Since it downloads the entire internet to compile one Java class, I doubt it's faster or smaller than the way it's set up now.

@MatthewVita
Contributor

Agreed

@MatthewVita
Contributor

Taking a step back, I don't think the container size is actually the issue here. It's more the download times. For instance, the Apache servers that we download from for the pipeline image take forever. Perhaps we can just "pull the pain forward" and commit the files into the repo and COPY them?
