-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abstract Config file parsing in R/config.R #230
Comments
1. PurposeThe purpose of this issue is to create get functions for configuration files. The motivation for this is that in each notebook in the vignettes folder, it loads in the config files by using a variable assignment (i.e. 2. ProcessThis will be done by creating get methods for all of the config files that will be used in the notebooks. Then, I replace the direct variable assignment with the get methods in R/config.R. 3. Task List
Updated API NomenclatureAll get functions now take in a parameter called Ex: |
@anthonyjlau Looks great! |
@anthonyjlau One more thing: Make sure you post here the function API nomenclature (i.e. update the first issue following the template). It would be a lot of time waste on your end if you went with a function nomenclature I would like to modify, as you would need to modify every notebook to fix that thereafter. |
Added get functions for mbox, jira, and github.
Functions that require something from a list (ie. conf[["version_control"]][["branch"]][4]) now have an index parameter. (ie. conf[["version_control"]][["branch"]][branch_index]) Also finished the rest of the get methods for all notebooks.
I normalized the comments by having it follow a format. I added if wrappers to catch null values.
I changed the documentation of the JIRA functions to return a list of project keys and storage paths (HADOOP case). I also changed the documentation of the mbox function to return a list of storage paths.
@Ssunoo2 @ian-lastname Ensure that the configs that are used on this project are in the final version and work with config.R. This includes the projects for the mailing lists (@ian-lastname), the ones used for Jira and Github (@Ssunoo2) and the ones for Bugzilla (@anthonyjlau). |
1. PurposeCurrently, the Notebooks (.Rmd files in the /vignettes folder) have hard coded file paths connected to the .yml files in the /conf folder. 2. ProcessThe process will be to follow the previous solution, creating get() functions within the R/config.R file and then replace the direct variable assignments within the Notebooks with the R/config.R get() functions. I (@crepesAlot) will be primarily focusing on documentation work, which includes updating the notebooks and ensuring that the formatting for the conf/ files match that of the ones from issue #286. 3. Task List
conf/ specificationsAn example of a line for the issue tracker for github would be: The projectName being listed before github would allow the user to save, find, or send all of the data of the project easier, rather than having to search in each data source for files related to the project. The concern with Anomaly Case 1 was that the refresher function may break if it assumes that all files within a folder belong together (ex: if it were to assume that all issues in the issue folder belong to the same issue_id) Both Anomaly Case 1 & 2 would be addressed by adding an extra depth to the file hierarchy to specify each project key. For the conf/kaiaulu.yml file, this would look like:
The unused portions are commented out while still maintaining the formatting. |
1. PurposeThe Notebook files (.Rmd files in the /vignettes folder) have hard-coded file paths connected to the config files in the /conf folder. For example, notebooks load in config files using variable assignment, such as 2. ProcessThe process will be to follow the previous solution, creating get() functions within the R/config.R file and then replace the direct variable assignments within the Notebooks with the R/config.R get() functions. I will primarily focus on implementing the get() functions in R/config.R and updating the notebooks (.Rmd files) in vignettes/ with the get() functions from R/config.R. 3. Task ListStep 1
Step 2
API NomenclatureThe API nomenclature is as follows: most get() functions will take in a parameter Ex: Step 3
|
I now realize the other half of the issue lies on #286, which is why this task requires two of you to work on it, sorry for the confusion. You need to read through that to get a better idea of why this task requires two of you. The purpose statement in #230 (comment) is a bit fuzzy. For example:
What is the problem that occurs by hardcoding paths instead of using get() from a config.R?
I am not sure what you mean by config.R automatically updating the Notebooks. How would this occur? Consider the purpose for Anthony's task to refine your specification on purpose. As for the get methods you need, which Anthony implemented a couple, To put in a nutshell: The other issue mentions that the way Kaiaulu specify project configuration files should be extended, to account for more edge cases. To extend the specification of configs, that means all configs on conf/ should be updated. This is purely a documentation task, no code involved. What you will notice, however, is that the moment you update these config files in conf/ another problem cascades: Because the paths in every notebook are hardcoded to specific paths in the config files, now every notebook has to be updated. If we ever need to extend the specification of the config again (which will happen to include other tools parameters), then once more, every notebook has to be updated. This is not a good organization. To address this "config specification change cascades into every notebook being updated", we want to instead call the get() functions of conf.R at the start of the notebooks, which is the goal of this issue, instead of hardcode the content of the config files. This means if all conf/ files have to be updated again, then only the implementation of the get() functions in conf.R have to be updated for the portions of the config files that are changed (so a subset of the functions), the notebooks do not need to do so, since the get functions implementation change, but the function definition and return are the same. The lesson here is Information Hiding. See: https://en.wikipedia.org/wiki/Information_hiding With that understanding, I would like you both to have a task specification edited above, with what each of you will be working on. Your purpose statement is the same, but you should make it more clear who wil work on conf.R and who will update the notebooks (or you may split this in half and half, it is your call). Note @daomcgill and @RavenMarQ will need to understand your work, as they will have a notebook for the mail downloader, and one for scitools understand that depend on the new config format and your get functions. p.s.: I would be mindful of pinging others on this issue who are not working on the task, as that may trigger e-mail notifications for them! |
@carlosparadis May you review the updated specifications? |
@beydlern sorry, edits do not send notifications, did you just update or did I miss this for a few days? |
I edited my specifications today, would you like me to revert the edit and post it as new one? |
No, it's alright. We can actually see the edits if we click the "edited" word in the message. Keep them in one comment. I will get back to you on it then, wanted to make sure I did not overlook! |
Unfortunately I attempted that, but it seems there was a limit to how low I'm allowed to expand the google images, and am unable to add anything below the tools section. I was hesitant to start shrinking everything down to make it fit, but if you'd prefer that, I can do so. |
@carlosparadis The GitHub PR #289 notebooks were updated to use the getter functions in config.R. These notebooks are currently using the new specification. Currently waiting for @crepesAlot to finish the config specification for Bugzilla so I may ensure that the Bugzilla notebook in the Bugzilla PR #293 follows the correct new specification. |
@beydlern thank you for the update. Somewhere in the issue sea of comments I recall answering a question about the difference of issue search and just |
- config.R functions are organized to follow the same formatting as the conf files - All conf files issue_trackers have been updated.
- get_mbox_mailing_list() and get_mbox_archive_type() were removed as the keys that they are grabbing are scheduled for removal.
- added two new bugzilla getter functions get_bugzilla_issue_comment_path and get_bugzilla_issue_path
- Implemented two pipermail mailing list getters get_pipermail_domain and get_pipermail_path
- Implemented two more getter functions: get_mbox_input_file() and get_pipermail_input_file()
- Refactored the download_mail.Rmd notebook to expect the use of the getters from R/config.R (i #230 contains the getter functions in R/config.R).
- Added Scitools understand section to tools - Updated mailing_list
- Implemented 4 Getter functions: get_understand_code_language(), get_understand_keep_dependencies_type(), get_understand_project_path(), and get_understand_output_path(). - Changed get_code_language() and get_keep_dependencies_type() to get_depends_code_language() and get_depends_keep_dependencies_type(), respectively as well as updated the appropriate notebooks that used these getters.
- Refactored the understand_showcase.Rmd notebook to expect the use of the getters from R/config.R (i #230 contains the getter functions in R/config.R).
- Updated NEWS.md - Minor change to mailing_list
@crepesAlot the updated #230 (comment) is not here yet, did you create it? |
@carlosparadis Here is a screenshot of the powerpoint version of the folder organization. The google drawings version has also been updated. |
- The project configuration sections of each notebook was incorrectly using the project directory (kaiaulu/) as their working directory rather than the directory that they reside in (/vignettes/) as their working directory. - Uncommented the output_filepath line for pattern4 tool in junit5.yml configuration file. - Uncommented the mod_mbox mailing list section for mailing_list in openssl.yml configuration file.
Originally, it was assumed that .git could reside folder-less to be accessed. While this is true, it adds inconvenience to the user since when data is obtained with .git, it is gererally through git clone, which also adds the source code. The consequential effect is that the path will be project/git_repo/project/.git rather than project/git_repo/.git. Not all configs were updated to adjust for this, only the ones used in the notebooks. Signed-off-by: Carlos Paradis <[email protected]>
This commits perform a major refactoring of how Kaiaulu interface with config files, and the suggested folder organization to store rawdata and analysis. The configuration files are generalized to account for anomaly cases when performing project analysis. For instance, long-lived projects may contain multiple repositories, issue trackers, mailing list, etc. The new template of the configuration file allows to account for this information. Moreover, changes to the config template cascaded in changes to all notebooks, as the access to the config was hardcoded to the file organization. A new set of get_ functions should make this the last commit that change in template cascades into notebooks. All actively maintained notebooks (not prefixed by underline under vignettes/) have been updated to use the get functions. Future changes, therefore, will only affect the get() functions in R/config.R. The folder organization of the filepaths has also been modified. Previously, filepaths assumed as default in the versioned config files suggested organizing code as rawdata/git_repo/projectX ; rawdata/jira/projectY. This organization was not practical for sharing data manually, as the user would need to zip several folders individually. The new organization is now rawdata/projectX/git_repo ; rawdata/projectX/jira. This means users only need to zip projectX and that will contain all the data wanted to be shared. A minor typo on graph.R was also fixed for merge function calls from `sorted=` to `sort=`.
This commits perform a major refactoring of how Kaiaulu interface with config files, and the suggested folder organization to store rawdata and analysis. The configuration files are generalized to account for anomaly cases when performing project analysis. For instance, long-lived projects may contain multiple repositories, issue trackers, mailing list, etc. The new template of the configuration file allows to account for this information. Moreover, changes to the config template cascaded in changes to all notebooks, as the access to the config was hardcoded to the file organization. A new set of get_ functions should make this the last commit that change in template cascades into notebooks. All actively maintained notebooks (not prefixed by underline under vignettes/) have been updated to use the get functions. Future changes, therefore, will only affect the get() functions in R/config.R. The folder organization of the filepaths has also been modified. Previously, filepaths assumed as default in the versioned config files suggested organizing code as rawdata/git_repo/projectX ; rawdata/jira/projectY. This organization was not practical for sharing data manually, as the user would need to zip several folders individually. The new organization is now rawdata/projectX/git_repo ; rawdata/projectX/jira. This means users only need to zip projectX and that will contain all the data wanted to be shared. A minor typo on graph.R was also fixed for merge function calls from `sorted=` to `sort=`. Signed-off-by: Carlos Paradis <[email protected]> Co-authored-by: Carlos Paradis <[email protected]> Co-authored-by: Anthony Lau <[email protected]> Co-authored-by: Nicholas Beydler <[email protected]> Co-authored-by: Mark Burgess <[email protected]>
@crepesAlot I had to make a few modifications on the actual config. Could you take a look on Kaiaulu and Helix config on my last two commits to see the differences? Here's two examples. It has mostly to do with the inner folders. Also the current use of /.git towards the end without it being in a folder with git log. While editing I realized that would not work well, because when a user downloads .git data, they generally use git clone. That means the source code comes along with the .git, which in turn means the folder of the source code always sit above one level of .git (I do not have a screenshot for this here, if not clear you can ask me on call). In addition to that you can see I reverted the deepest folder: We may need to do some edits on the diagram to reflect this. Let me know what you think. @beydlern pinging you too on this since you worked on this on M1 to make sure we are on the same page. This may also affect @daomcgill PR. Will take a look on it next. This may need another pass on the configs to fix the consistency (I only had time to fix the ones Kaiaulu needs on the notebooks that run). |
- Moved .git file into a subfolder within the git_repo folder, named after the project. - Made changes for class_folder_path in junit5.yml consistent with all configs. - Made changes for issue_tracker folder paths to be consistent across all configs.
@carlosparadis I pushed a commit where I matched the changes across all configs to make them consistent.
The part I am particularly confused on is the: |
Currently, modifying the /conf file schema requires checking every Notebook in Kaiaulu to make sure it still parses the project's yaml file correctly. The logic behind understanding the config file should be abstracted in a set of get methods, to de-couple all the Notebooks from it.
The coupling doesn't occur if it is adding fields, but any substantial re-organization will cause a mass co-change cascade. This should also help document the config file as R functions, and in the future write some utilities to analyze config files (e.g. a paper's project demographics could go here, since it would borrow info from the config to compute the latex table).
The text was updated successfully, but these errors were encountered: