Making Parsed Source Code Data Available Externally #314

Open
4 of 6 tasks
daomcgill opened this issue Oct 10, 2024 · 6 comments

@daomcgill
Collaborator

daomcgill commented Oct 10, 2024


Purpose

This issue is an extension of issue #313. The purpose here is to create configurable /exec scripts that make data tables available externally. The new scripts make the syntax extractor easier to use by exposing source code annotation and XML querying as configurable scripts.

Process

  1. Create script for annotating source code using srcML.
  2. Create script for querying the annotated data. This will accept a predefined query or a user-defined XPath query.
  3. Documentation

New Scripts

The new scripts run the syntax extractor using existing functions in R/src.R. The functionality is split into two parts:

  1. exec/annotate.R: Takes in a source code folder and uses srcML to generate an annotated XML file.
  2. exec/query.R: Accepts predefined XPath queries to extract syntactic elements from the XML files. Also allows the user to specify custom XPath queries. Outputs the query results.
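
The choice between a predefined and a user-defined query in exec/query.R could be handled by a small lookup. The following is a minimal base R sketch, not the script's actual implementation: `predefined_queries`, `resolve_xpath`, and the `function_names` entry are hypothetical names chosen for illustration, while the class-name XPath is the one quoted in the comment inside R/src.R.

```r
# Hypothetical table of predefined queries (names are illustrative only).
predefined_queries <- c(
  class_names    = "//src:class/src:name",    # XPath quoted in R/src.R
  function_names = "//src:function/src:name"  # assumed example, not from the issue
)

# Return the XPath to run: a custom query wins, otherwise look up the
# predefined query by name and fail loudly on an unknown name.
resolve_xpath <- function(query_type = NULL, custom_xpath = NULL) {
  if (!is.null(custom_xpath)) {
    return(custom_xpath)
  }
  if (is.null(query_type) || !(query_type %in% names(predefined_queries))) {
    stop("Unknown query type. Choose one of: ",
         paste(names(predefined_queries), collapse = ", "))
  }
  predefined_queries[[query_type]]
}

resolve_xpath(query_type = "class_names")
resolve_xpath(custom_xpath = "//src:unit")
```

Keeping the lookup separate from the code that calls srcML means new predefined queries can be added without touching the execution logic.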

Task List

  • Prerequisite: completion of issue Expanding the Syntax Extractor #313
  • Add annotate.R
  • Add query.R
  • Implement multiple types for queries
  • Add option for custom XPath queries
  • Documentation: explain how to use exec scripts, configuration and parameters

@daomcgill
Collaborator Author

@carlosparadis part II

@carlosparadis
Member

@daomcgill For this one I would consider making two execs, one that annotates, and the other that can query the file. Annotating can take a long time depending on the size of the project, hence the split.

Otherwise, I think this is good! We can take another pass once #313 is done.

Thanks!

daomcgill added a commit that referenced this issue Nov 5, 2024
- WIP

Signed off by: Dao McGill <[email protected]>
daomcgill added a commit that referenced this issue Nov 6, 2024
- annotate.R generates the XML file
- query.R calls the query functions, depending on options

Note: may have to revisit output format after getting into the fasttext notebook

Signed-off-by: Dao McGill <[email protected]>
@daomcgill
Collaborator Author

@carlosparadis I have added exec scripts for annotating and parsing. However, I have had trouble defining a generic function that can take any XPath query as an argument. A generic solution would have to accommodate distinct XML structures with differing hierarchical relationships, and I had to make changes for each query function. For example, certain queries required that I define and pass the namespace, while others worked without it (the preexisting functions did not require this). Do you think I should continue to pursue this?
In the meantime, I will use these exec scripts to move on to the issue in the code_embedding repo.
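
To illustrate the namespace point: srcML output declares a default namespace, so a library-side XPath query matches nothing unless the namespace is passed along. The sketch below assumes the XML is read with the xml2 package, which the issue does not confirm, and uses a tiny hand-written stand-in for real srcML output.

```r
library(xml2)  # assumption: xml2 is installed; the issue does not name an XML library

# Hand-written stand-in for a srcML file (real files are much larger).
doc <- read_xml(
  '<unit xmlns="http://www.srcML.org/srcML/src"><class>class <name>Foo</name><block>{}</block></class></unit>'
)

# Without a prefix, XPath looks for elements in no namespace, so this finds nothing.
no_ns <- xml_find_all(doc, "//class/name")

# xml2 exposes the default namespace under the prefix "d1"; renaming it to
# "src" lets the query use the same prefix as the srcml command line.
ns <- xml_ns_rename(xml_ns(doc), d1 = "src")
with_ns <- xml_find_all(doc, "//src:class/src:name", ns)

length(no_ns)
xml_text(with_ns)
```

Under these assumptions the unprefixed query should return an empty node set and the prefixed one should return the class name, which may explain why some queries needed the namespace passed explicitly.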

@carlosparadis
Member

Are you saying you can't use this function?

kaiaulu/R/src.R

Lines 327 to 341 in 7e7afba

query_src_text <- function(srcml_path,xpath_query,srcml_filepath){
  srcml_path <- path.expand(srcml_path)
  xpath_query <- path.expand(xpath_query)
  srcml_filepath <- path.expand(srcml_filepath)
  #srcml --xpath "//src:class/src:name" depends.xml
  srcml_output <- system2(srcml_path,
                          args = c('--xpath',paste0('"',xpath_query,'"'),
                                   srcml_filepath),
                          stdout = TRUE,
                          stderr = FALSE)
  return(srcml_output)
}

I expect the other functions to be more specific:

kaiaulu/R/src.R

Line 356 in 7e7afba

query_src_text_class_names <- function(srcml_path,srcml_filepath){

and

kaiaulu/R/src.R

Line 399 in 7e7afba

query_src_text_namespace <- function(srcml_path,srcml_filepath){

These have their own code logic. I do not expect them to reuse each other. The only reused function is query_src_text.

Let me know which function you are referring to when you mention defining a generic function.

Also, your issue specification should be updated to reflect the planned function signatures. That will help disambiguate.

@daomcgill
Collaborator Author

@carlosparadis I understand that query_src_text is the reusable generic function meant to handle the execution of XPath queries. I can add a way to call it from the exec script. This will return the results of the query passed in as an unstructured string. Does this sound right to you?
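
A call of this kind from an exec script might be sketched as follows. This is an illustration under stated assumptions, not Kaiaulu's code: `build_xpath_args` and `run_xpath_query` are hypothetical helpers, and the sketch swaps the manual double-quote wrapping in query_src_text for base R's `shQuote`, so that queries which themselves contain quotes are still passed to the shell intact.

```r
# Hypothetical helper: build the argument vector for `srcml --xpath`.
build_xpath_args <- function(xpath_query, srcml_filepath) {
  c("--xpath",
    shQuote(xpath_query),                  # shell-safe quoting of the query
    shQuote(path.expand(srcml_filepath)))  # also protects paths with spaces
}

# Hypothetical wrapper: run the query and return srcml's stdout
# as a character vector with one element per output line.
run_xpath_query <- function(srcml_path, xpath_query, srcml_filepath) {
  system2(path.expand(srcml_path),
          args = build_xpath_args(xpath_query, srcml_filepath),
          stdout = TRUE,
          stderr = FALSE)
}

# Inspect the arguments without needing srcml installed.
args <- build_xpath_args('//src:name[.="main"]', "/tmp/project.xml")
print(args)
```

Because `system2` with `stdout = TRUE` returns raw output lines, any structure (for example a table of names per file) would still have to be added by the caller.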

daomcgill added a commit that referenced this issue Nov 7, 2024
daomcgill added a commit that referenced this issue Nov 8, 2024
- Renamed query.R to src_content_parser.R
- Edited description
- Added descriptions for options
- Changed output path slightly
- Added a temp config file for easy testing for fasttext issue
NOTE: current output_path is a temporary solution that is useful for me right now. This will be fixed pre-merge.

Signed-off-by: Dao McGill <[email protected]>
@carlosparadis carlosparadis added this to the ics496-fall24-m3 milestone Nov 11, 2024
daomcgill added a commit that referenced this issue Nov 15, 2024
Signed-off-by: Dao McGill <[email protected]>
@carlosparadis
Member

I only just noticed this message from last week. I am assuming you clarified this on a call!

daomcgill added a commit that referenced this issue Nov 18, 2024
daomcgill added a commit that referenced this issue Dec 1, 2024
Signed-off-by: Dao McGill <[email protected]>
daomcgill added a commit that referenced this issue Dec 9, 2024
Signed-off-by: Dao McGill <[email protected]>
daomcgill added a commit that referenced this issue Dec 9, 2024
Signed-off-by: Dao McGill <[email protected]>
daomcgill added a commit that referenced this issue Dec 9, 2024
daomcgill added a commit that referenced this issue Dec 11, 2024