i #230 Refactored More Notebooks
- Refactored most of the notebooks to replace the hard-coded paths with the getter functions in R/config.R
- Updated a few of the configuration files (.yml) in conf/, fixing some syntax and indentation errors
- Added another getter function in R/config.R called get_github_issue_event_path
- Edited DESCRIPTION to add myself as a contributor
- Edited NEWS.md to describe the getter function refactoring feature.
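The getter pattern these bullets describe can be sketched in plain R. This is a minimal, self-contained sketch that assumes the parsed configuration is a nested named list (the shape `yaml::read_yaml()` produces). `get_issue_event_path_sketch()` is a hypothetical stand-in that mirrors the body of `get_github_issue_event_path()` added to R/config.R, and the `conf` list is hand-built for illustration rather than read from conf/kaiaulu.yml.

```r
# Hypothetical stand-in mirroring get_github_issue_event_path() from R/config.R,
# defined locally so the sketch runs without Kaiaulu installed.
get_issue_event_path_sketch <- function(config_file, project_key_index) {
  # Walk issue_tracker -> github -> <project key> -> issue_event
  issue_event_path <- config_file[["issue_tracker"]][["github"]][[project_key_index]][["issue_event"]]
  if (is.null(issue_event_path)) {
    warning("Attribute does not exist in the configuration file.")
  }
  issue_event_path
}

# Assumption: a hand-built list standing in for a parsed configuration file.
conf <- list(
  issue_tracker = list(
    github = list(
      project_key_1 = list(
        issue_event = "../../rawdata/kaiaulu/github/issue_event/sailuh_kaiaulu/"
      )
    )
  )
)

# Before the refactor: the notebook indexes the parsed list directly.
path_before <- conf[["issue_tracker"]][["github"]][["project_key_1"]][["issue_event"]]

# After the refactor: a single getter call hides the nesting.
path_after <- get_issue_event_path_sketch(conf, "project_key_1")
identical(path_before, path_after)  # TRUE

# A project key that is not in the file (e.g. "mail_key_1") yields NULL,
# and the getter emits a warning instead of stopping.
missing_path <- suppressWarnings(get_issue_event_path_sketch(conf, "mail_key_1"))
is.null(missing_path)  # TRUE
```

One consequence of this design is that a wrong project key does not raise an error: the getter warns and returns `NULL`, so the problem typically surfaces only later, when the `NULL` path is used. Passing a key that actually exists in the configuration file, such as `"project_key_1"`, remains the caller's responsibility.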
beydlern committed Oct 3, 2024
1 parent 3202887 commit fda90ae
Showing 25 changed files with 205 additions and 200 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -20,7 +20,8 @@ Authors@R: c(
person('Nicole', 'Hoess', role = c('ctb')),
person('Anthony', 'Lau', role = c('ctb')),
person('Sean', 'Sunoo', role = c('ctb')),
-person('Ian Jaymes', 'Iwata', role= c('ctb'))
+person('Ian Jaymes', 'Iwata', role= c('ctb')),
+person('Nicholas', 'Beydler', role = c('ctb'))
)
Maintainer: Carlos Paradis <[email protected]>
License: MPL-2.0 | file LICENSE
1 change: 1 addition & 0 deletions NAMESPACE
@@ -60,6 +60,7 @@ export(get_filter_commit_size)
export(get_git_branches)
export(get_git_repo_path)
export(get_github_commit_path)
+export(get_github_issue_event_path)
export(get_github_issue_or_pr_comment_path)
export(get_github_issue_path)
export(get_github_issue_search_path)
1 change: 1 addition & 0 deletions NEWS.md
@@ -3,6 +3,7 @@ __kaiaulu 0.0.0.9700 (in development)__

### NEW FEATURES

+* `config.R` now contains a set of getter functions that centralize how configuration data is read, and the notebooks have been refactored to use them instead of indexing the parsed configuration file directly. For example, `git_repo_path <- config_file[["version_control"]][["log"]]` becomes `git_repo_path <- get_git_repo_path(config_file)`. [#230](https://github.com/sailuh/kaiaulu/issues/230)
* `refresh_jira_issues()` has been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded.
* `download_jira_issues()`, `download_jira_issues_by_issue_key()`, and `download_jira_issues_by_date()` have been added. This allows for downloading of Jira issues without the use of JirAgileR [#275](https://github.com/sailuh/kaiaulu/issues/275) and specification of issue ID and creation date ranges. It also interacts with `parse_jira_latest_date` to implement a refresh capability.
* `make_jira_issue()` and `make_jira_issue_tracker()` no longer create fake issues following the JirAgileR format, but instead follow the raw data format obtained from the JIRA API. This is compatible with the new parser function for JIRA. [#277](https://github.com/sailuh/kaiaulu/issues/277)
31 changes: 28 additions & 3 deletions R/config.R
@@ -410,7 +410,7 @@ get_mbox_path <- function(config_file, project_key_index) {
#' @export
get_mbox_domain <- function(config_file, project_key_index) {

-mbox_url <- config_file[["mailing_list"]][["mod_mbox"]][[project_key_index]][["archive_url"]]
+mbox_url <- config_file[["mailing_list"]][["mod_mbox"]][[project_key_index]][["mailing_list"]]

if (is.null(mbox_url)) {
warning("Attribute does not exist in the configuration file.")
@@ -433,7 +433,7 @@ get_mbox_domain <- function(config_file, project_key_index) {
#' @export
get_mbox_mailing_list <- function(config_file, project_key_index) {

-mailing_list <- config_file[["mailing_list"]][["mod_mbox"]][[project_key_index]][["mailing_list"]]
+mailing_list <- config_file[["mailing_list"]][["mod_mbox"]][[project_key_index]][["mailing_list_type"]]

if (is.null(mailing_list)) {
warning("Attribute does not exist in the configuration file.")
@@ -631,6 +631,31 @@ get_github_pull_request_path <- function(config_file, project_key_index) {
return(pull_request_path)
}

+#' Returns the local folder path for GitHub issue events for a specific project
+#' key.
+#'
+#' @description This function returns the local folder path for GitHub issue
+#' events for a specific project key, that is specified in the input
+#' parameter `config_file`. The input, `config_file` must be a parsed
+#' configuration file. The function will inform the user if the local folder
+#' path for the issue events exists in the parsed configuration file,
+#' `config_file`.
+#'
+#' @param config_file The parsed configuration file.
+#' @param project_key_index The name of the index of the project key (e.g. "project_key_1" or "project_key_2").
+#' @return The local folder path for GitHub issue events for project specified by key `project_key_index`.
+#' @export
+get_github_issue_event_path <- function(config_file, project_key_index) {
+
+issue_event_path <- config_file[["issue_tracker"]][["github"]][[project_key_index]][["issue_event"]]
+
+if (is.null(issue_event_path)) {
+warning("Attribute does not exist in the configuration file.")
+}
+
+return(issue_event_path)
+}
+
#' Returns the local folder path for GitHub commits for a specific project key.
#'
#' @description This function returns the local folder path for GitHub commits
@@ -783,7 +808,7 @@ get_jira_issues_comments_path <- function(config_file, project_key_index) {
#' @export
get_bugzilla_project_key <- function(config_file) {

-bugzilla_key <- config_file[["issue_tracker"]][["bugzilla"]][["project_key_index"]][["project_key"]]
+bugzilla_key <- config_file[["issue_tracker"]][["bugzilla"]][["project_key"]]

if (is.null(bugzilla_key)) {
warning("Attribute does not exist in the configuration file.")
15 changes: 8 additions & 7 deletions conf/camel.yml
@@ -47,7 +47,7 @@ version_control:
- camel-1.0.0

mailing_list:
-mod_mbox:
+mod_mbox:
project_key_1:
mailing_list: http://mail-archives.apache.org/mod_mbox/camel-dev
mbox: ../../rawdata/camel/mod_mbox/camel-dev/
@@ -72,12 +72,13 @@ mailing_list:

issue_tracker:
jira:
-# Obtained from the project's JIRA URL
-# domain: https://issues.apache.org/jira
-project_key: CAMEL
-# Download using `download_jira_data.Rmd`
-issues: ../../rawdata/camel/jira/issues/camel/
-issue_comments: ../../rawdata/camel/jira/issue_comments/camel/
+project_key_1:
+# Obtained from the project's JIRA URL
+domain: https://issues.apache.org/jira
+project_key: CAMEL
+# Download using `download_jira_data.Rmd`
+issues: ../../rawdata/camel/jira/issues/camel/
+issue_comments: ../../rawdata/camel/jira/issue_comments/camel/
# github:
# project_key_1:
# # Obtained from the project's GitHub URL
2 changes: 1 addition & 1 deletion conf/helix.yml
@@ -55,7 +55,7 @@ mailing_list:
mailing_list_type: helix-dev
archive_type: apache
project_key_2:
-mailing_list_type: http://mail-archives.apache.org/mod_mbox/helix-user
+mailing_list: http://mail-archives.apache.org/mod_mbox/helix-user
mbox: ../../rawdata/helix/mod_mbox/helix-user/
mailing_list_type: helix-user
archive_type: apache
9 changes: 5 additions & 4 deletions conf/kaiaulu.yml
@@ -44,7 +44,7 @@ version_control:
- master

# mailing_list:
-# mod_mbox:
+# mod_mbox:
# project_key_1:
# mailing_list: http://mail-archives.apache.org/mod_mbox/geronimo-dev
# mbox: ./../../../tse_motif_2021/dataset/mbox/apex-dev.mbox
@@ -84,7 +84,8 @@ issue_tracker:
# Download using `download_github_comments.Rmd`
issue_or_pr_comment: ../../rawdata/kaiaulu/github/issue_or_pr_comment/sailuh_kaiaulu/
issue: ../../rawdata/kaiaulu/github/issue/sailuh_kaiaulu/
-issue_search: ../..rawdata/kaiaulu/github/issue_search/sailuh_kaiaulu/
+issue_search: ../../rawdata/kaiaulu/github/issue_search/sailuh_kaiaulu/
+issue_event: ../../rawdata/kaiaulu/github/issue_event/sailuh_kaiaulu/
pull_request: ../../kaiaulu/github/pull_request/sailuh_kaiaulu/
commit: ../../rawdata/kaiaulu/github/commit/sailuh_kaiaulu/
# project_key_2:
@@ -101,10 +102,10 @@ issue_tracker:
# project_key: kaiaulu


-#vulnerabilities:
+vulnerabilities:
# Folder path with nvd cve feeds (e.g. nvdcve-1.1-2018.json)
# Download at: https://nvd.nist.gov/vuln/data-feeds
-#nvd_feed: rawdata/nvdfeed
+nvd_feed: rawdata/nvdfeed

# Commit message CVE or Issue Regular Expression (regex)
# See project's commit message for examples to create the regex
24 changes: 12 additions & 12 deletions conf/openssl.yml
@@ -45,18 +45,18 @@ version_control:
- master

mailing_list:
-# mod_mbox:
-# project_key_1:
-# mailing_list: http://mail-archives.apache.org/mod_mbox/geronimo-dev
-# mbox: ./../../../tse_motif_2021/dataset/mbox/apex-dev.mbox
-# mailing_list_type: geronimo-dev
-# archive_type: apache
-# project_key_2:
-# mailing_list: http://mail-archives.apache.org/mod_mbox/geronimo-user
-# mbox: ../../rawdata/geronimo/mod_mbox/geronimo-user/
-# mailing_list_type: geronimo-user
-# archive_type: apache
-pipermail:
+mod_mbox:
+project_key_1:
+mailing_list: http://mail-archives.apache.org/mod_mbox/geronimo-dev
+mbox: ./../../../tse_motif_2021/dataset/mbox/apex-dev.mbox
+mailing_list_type: geronimo-dev
+archive_type: apache
+project_key_2:
+mailing_list: http://mail-archives.apache.org/mod_mbox/geronimo-user
+mbox: ../../rawdata/geronimo/mod_mbox/geronimo-user/
+mailing_list_type: geronimo-user
+archive_type: apache
+pipermail:
project_key_1:
mailing_list: https://mta.openssl.org/pipermail/openssl-dev/
mbox: ../../rawdata/openssl/pipermail/openssl-dev/
25 changes: 25 additions & 0 deletions man/get_github_issue_event_path.Rd

2 changes: 1 addition & 1 deletion vignettes/causal_flaws.Rmd
@@ -63,7 +63,7 @@ keep_dependencies_type <- get_keep_dependencies_type(conf)
# Mailing List
# Specify project_key_index in get_mbox_path() (e.g. "project_key_1")
-mbox_path <- get_mbox_path(conf, "mail_key_1")
+mbox_path <- get_mbox_path(conf, "project_key_1")
# DV8 parameters
project_path <- get_dv8_folder_path(conf)
36 changes: 8 additions & 28 deletions vignettes/download_github_comments.Rmd
@@ -44,10 +44,13 @@ Therefore, in this Notebook we have to rely on three endpoints from the GitHub A
To use the pipeline, you must specify the organization and project of interest, and your token. Obtain a GitHub token by following the instructions [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).

```{r}
-conf <- yaml::read_yaml("../conf/kaiaulu.yml")
-save_path <- path.expand(conf[["issue_tracker"]][["github"]][["replies"]]) # Path you wish to save all raw data. A folder with the repo name and sub-folders will be created.
-owner <- conf[["issue_tracker"]][["github"]][["owner"]] # Has to match github organization (e.g. github.com/sailuh)
-repo <- conf[["issue_tracker"]][["github"]][["repo"]] # Has to match github repository (e.g. github.com/sailuh/perceive)
+conf <- get_parsed("conf/kaiaulu.yml")
+save_path_issue_or_pr_comments <- path.expand(get_github_issue_or_pr_comment_path(conf, "project_key_1"))
+save_path_issue <- get_github_issue_path(conf, "project_key_1")
+save_path_pull_request <- get_github_pull_request_path(conf, "project_key_1")
+save_path_commit <- get_github_commit_path(conf, "project_key_1")
+owner <- get_github_owner(conf, "project_key_1") # Has to match github organization (e.g. github.com/sailuh)
+repo <- get_github_repo(conf, "project_key_1") # Has to match github repository (e.g. github.com/sailuh/perceive)
# your file github_token (a text file) contains the GitHub token API
token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)
```
@@ -56,19 +59,10 @@ token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)

In this section we obtain the raw data (.json) containing all information the GitHub API endpoint provides. We parse the information of interest in the subsequent section.

-```{r eval = FALSE}
-dir.create(paste0(save_path))
-```
-
## Issues

First we will obtain all the issues (i.e. "first comments").

-```{r}
-save_path_issue <- paste0(save_path,"/issue/")
-```
-
-
```{r Collect all issues, eval = FALSE}
gh_response <- github_api_project_issue(owner,repo,token)
dir.create(save_path_issue)
@@ -81,11 +75,6 @@ github_api_iterate_pages(token,gh_response,

Next we obtain the "first comment" of every pull request.

-```{r}
-save_path_pull_request <- paste0(save_path,"/pull_request/")
-```
-
-
```{r Collect all pull requests, eval = FALSE}
gh_response <- github_api_project_pull_request(owner,repo,token)
dir.create(save_path_pull_request)
@@ -98,11 +87,6 @@ github_api_iterate_pages(token,gh_response,

Finally, we obtain the comments on both issues and pull requests (these do not include the data obtained from the prior two endpoints).

-```{r}
-save_path_issue_or_pr_comments <- paste0(save_path,"/issue_or_pr_comment/")
-```
-
-
```{r Collect all issue and pull request comments, eval = FALSE}
gh_response <- github_api_project_issue_or_pr_comments(owner,repo,token)
dir.create(save_path_issue_or_pr_comments)
@@ -117,10 +101,6 @@ The three endpoints used above do not contain author and e-mail information, onl

To do so, we can use the committer endpoint.

-```{r}
-save_path_commit <- paste0(save_path,"/commit/")
-```
-
```{r Collect all authors and committers name and e-mail, eval = FALSE}
gh_response <- github_api_project_commits(owner,repo,token)
dir.create(save_path_commit)
@@ -190,7 +170,7 @@ Note because we obtain the authors and committers name and e-mail, **only commen
Below we show the result of such merge, including the name and e-mail fields obtained from the commit table. As before, we do not display the body column to prevent breaking the HTML format.

```{r}
-replies <- parse_github_replies(save_path)
+replies <- parse_github_replies(save_path_issue_or_pr_comments)
tail(replies,2) %>%
gt(auto_align = FALSE)
6 changes: 3 additions & 3 deletions vignettes/download_mod_mbox.Rmd
@@ -35,13 +35,13 @@ As usual, the first step is to load the project configuration file.
```{r}
conf <- get_parsed("conf/helix.yml")
#Specify project_key_index in get_mbox_path() (e.g. "project_key_1")
-save_path_mbox <- get_mbox_path(conf, "mail_key_1")
+save_path_mbox <- get_mbox_path(conf, "project_key_1")
#Specify project_key_index in get_mbox_domain() (e.g. "project_key_1")
-mod_mbox_url <- get_mbox_domain(conf, "mail_key_1")
+mod_mbox_url <- get_mbox_domain(conf, "project_key_1")
#Specify project_key_index in get_mbox_mailing_list() (e.g. "project_key_1")
-mailing_list <- get_mbox_mailing_list(conf, "mail_key_1")
+mailing_list <- get_mbox_mailing_list(conf, "project_key_1")
start_year <- 2017
end_year <- 2018
27 changes: 6 additions & 21 deletions vignettes/github_api_showcase.Rmd
@@ -45,10 +45,12 @@ The goal of the following steps is to obtain the data when a project started ass
To use the pipeline, you must specify the organization and project of interest, and your token.

```{r}
-conf <- yaml::read_yaml("../conf/kaiaulu.yml")
-owner <- conf[["issue_tracker"]][["github"]][["owner"]] # Has to match github organization (e.g. github.com/sailuh)
-repo <- conf[["issue_tracker"]][["github"]][["repo"]] # Has to match github repository (e.g. github.com/sailuh/perceive)
-save_path <- path.expand(conf[["issue_tracker"]][["github"]][["replies"]]) # Path you wish to save all raw data. A folder with the repo name and sub-folders will be created.
+conf <- get_parsed("conf/kaiaulu.yml")
+owner <- get_github_owner(conf, "project_key_1") # Has to match github organization (e.g. github.com/sailuh)
+repo <- get_github_repo(conf, "project_key_1") # Has to match github repository (e.g. github.com/sailuh/perceive)
+save_path_issue_or_pr_comments <- path.expand(get_github_issue_or_pr_comment_path(conf, "project_key_1"))
+save_path_issue_event <- get_github_issue_event_path(conf, "project_key_1")
+save_path_commit <- get_github_commit_path(conf, "project_key_1")
# your file github_token contains the GitHub token API obtained in the steps above
token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)
```
@@ -57,38 +59,21 @@ token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)

In this section we obtain the raw data (.json) containing all information the GitHub API endpoint provides. We parse the information of interest in the subsequent section.

-```{r eval = FALSE}
-dir.create(paste0(save_path))
-```


## Issue Events

First we obtain all issue events of the project, so we may later subset issue assignments.

-```{r}
-save_path_issue_event <- paste0(save_path,"/issue_event/")
-```


```{r Collect all issue events, eval = FALSE}
gh_response <- github_api_project_issue_events(owner,repo,token)
dir.create(save_path_issue_event)
github_api_iterate_pages(token,gh_response,save_path_issue_event,prefix="issue_event")
```

## Commits

Next, we download commit data from the GitHub API. This is used to determine which users in the issue events do or do not have merge permissions.

-```{r}
-save_path_commit <- paste0(save_path,"/commit/")
-```


```{r Collect all project commit messages, eval = FALSE}
gh_response <- github_api_project_commits(owner,repo,token)
dir.create(save_path_commit)
github_api_iterate_pages(token,gh_response,save_path_commit,prefix="commit")
```
