- Status: Completed
- Type: Specific
- Work Package: WP3
- Coordinators: Iris Hendrickx (Radboud University)
- Participating Institutes: Radboud University, ICT-Research Platform Netherlands, NWO
- End-users: Participants in the workshop
- Developers: Iris Hendrickx (Radboud), Maarten van Gompel (Radboud)
- Interest Groups: DevOps, Text
- Task IDs: T098 (LaMachine)
Citation from M. van Gompel & I. Hendrickx (2018):
The ICT-Research Platform Netherlands and NWO organise a yearly one-week workshop ‘ICT with Industry’ to stimulate collaboration between industry and academia. The industrial partner provides a problem and a team of researchers from different backgrounds and universities collaborate to come up with solutions. We participated in the 2019 edition on the case study by the Dutch Royal Library who wanted to investigate automatic methods for cataloguing of textual cultural heritage objects, in this particular case a large collection of digital dissertations.
None; LaMachine was taken as an out-of-the-box solution.
LaMachine was used as the solution to bring the tools to the data. Better integration between host and VM, with regard to the shared data space, was implemented.
Various "common scientific data-related packages" as made available through LaMachine.
Citation from M. van Gompel & I. Hendrickx (2018):
LaMachine offered a convenient platform for a range of different explorations and experiments in the area of NLP and text mining. However, for some situations LaMachine, or rather Linux in general, was not a good fit for the audience of the workshop: for team members who did not have experience with a non-Windows environment, LaMachine was not a suitable or useful tool. The limit of LaMachine was also reached for members who wanted to use desktop text editors with a graphical user interface, as this is not offered by LaMachine. Moreover, we did not manage to get X-forwarding working in the Ubuntu Linux VM, and after a few attempts the team gave up on resolving this issue due to time pressure. This also demonstrates that fine-tuning the configuration of certain aspects of LaMachine, but especially beyond LaMachine, is beyond the reach of a data scientist without system administration skills. This certainly also applies to the installation as a whole in the SURFsara context, which involved things like the partitioning, formatting and mounting of (virtual) drives and setting up user accounts on the shared VM, all of which require some system administration skills and are too context-specific to be within the scope of LaMachine. LaMachine was convenient and sped up writing code, as the most common scientific data-related packages are already present in LaMachine.
References to related resources and publications and especially links to related use-cases:
- M. van Gompel & I. Hendrickx (2018). LaMachine: A meta-distribution for NLP software. CLARIN Annual Conference 2018.