preamble_collection_tool

A tool that scans a source code base to detect, process, analyze, and catalog all identifier names, then determines whether those names contain preambles.

This tool is the implementation of the Graduate Capstone Project: Automated Detection and Analysis of Common Preambles and Their Meanings in Source Code Identifiers.

Capstone Project Abstract

The interpretation of identifier names is a significant problem in software development. Programmers must interpret identifier names before performing any software development or maintenance task, and code search and analysis techniques use identifier names to provide useful services to developers. High-quality identifier names reduce the amount of time developers spend reading code and increase the accuracy of techniques that use natural language information to support developers. Preambles are a particular type of token found in identifiers. Unlike typical tokens, preambles add no new information to the meaning of an identifier’s name; instead, they specify certain types of behavior (e.g., pointers) or help namespace identifiers to a specific module (e.g., in the case of the C programming language). Because preambles add no new information, they should be removed from, or at least identified in, identifiers before code analysis tries to interpret the meaning of an identifier name. The goal of my project is to validate and augment the types of preambles described in prior work and create a technique that can automatically detect preambles.
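
To make the idea of a preamble concrete, here is a minimal sketch that splits identifiers into terms and counts how often each first term occurs. It is an illustration only, not the tool's actual heuristics: the function names `split_identifier` and `first_term_counts` and the sample identifiers are hypothetical, and a frequently repeated first term is at most a candidate preamble.

```python
import re
from collections import Counter

# Matches, in order of preference: a run of capitals followed by a capitalised
# word ("HTTP" in "HTTPServer"), an optionally capitalised lowercase word,
# a run of capitals, or a run of digits. Underscores are skipped.
_TERM = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name):
    """Split an identifier into lowercase terms on underscores and case changes."""
    return [term.lower() for term in _TERM.findall(name)]


def first_term_counts(identifiers):
    """Count how often each first term appears across the given identifiers."""
    counts = Counter()
    for name in identifiers:
        terms = split_identifier(name)
        if terms:  # skip names with no recognisable terms, such as "_"
            counts[terms[0]] += 1
    return counts
```

For example, `first_term_counts(["gtk_window_new", "gtk_widget_show", "m_bufferSize", "pNext"])` counts `gtk` twice and `m` and `p` once each, which hints that `gtk` acts as a namespacing prefix rather than as part of each name's meaning.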

Dependencies & Installation

Before using preamble_collection_tool, the following dependencies must be installed:

Note that preamble_collection_tool is currently not platform independent and will only run on Unix-based operating systems (i.e., Linux or macOS).

After installing all necessary dependencies, create three new directories in the main working directory:

projects/
reports/
analysis/

After creating these directories, clone or download the source code projects you want to analyze into the newly created projects directory.

NOTE: This tool currently only detects identifiers in source files written in C, C++, C#, and Java.
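
The directory setup above can also be scripted. The following is a minimal sketch, assuming it is run from the main working directory; the helper name `create_working_dirs` is hypothetical and is not part of the tool.

```python
from pathlib import Path

# Directory names taken from the list above.
REQUIRED_DIRS = ("projects", "reports", "analysis")


def create_working_dirs(root="."):
    """Create the three directories under root, leaving existing ones untouched."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing setup.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_working_dirs()` with no arguments creates the directories in the current directory; projects still have to be cloned or downloaded into projects/ by hand.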

Usage & Operation

The tool is run through the following command: python3 preamble_tool.py [OPTION]

The following options are available:

preamble_tool.py --help or preamble_tool.py -h - Displays the help menu listing all options.
preamble_tool.py --collect-project-data or preamble_tool.py -cpd - Collects the identifier names from all source files of every project in the projects directory. The raw collected data is written to the reports directory.
preamble_tool.py --collect-first-terms or preamble_tool.py -cft - Scans all files in the reports directory and collects the most commonly used first terms of identifiers across all projects. Outputs two CSV files into the analysis directory: dictionary_first_terms.csv contains the first terms that are in the English dictionary, and other_first_terms.csv contains all others. For each first term, these files record how often it is used across all analyzed projects, an example identifier, and the location of that example.
preamble_tool.py --analyze-project-data or preamble_tool.py -apd - Analyzes all gathered data in the reports directory to determine which identifiers across all projects contain preambles. Outputs the final analysis results into preamble_identifiers.csv, located in the analysis directory.
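
The options above can be chained into a single run. The sketch below is one possible wrapper, not part of the tool: `build_commands` and `run_pipeline` are hypothetical helpers, and the stage order is an assumption. Collection has to come first because the other two options read from the reports directory, but whether -apd depends on the output of -cft is not stated here.

```python
import subprocess
import sys

# Short flags as listed above; the order is an assumption (see the note above).
STAGES = ("-cpd", "-cft", "-apd")


def build_commands(script="preamble_tool.py", python=sys.executable):
    """Return one argument list per stage, ready to pass to subprocess.run."""
    return [[python, script, flag] for flag in STAGES]


def run_pipeline(script="preamble_tool.py"):
    """Run each stage in turn, stopping at the first one that fails."""
    for command in build_commands(script):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the main working directory runs the three stages back to back. Building the commands separately from running them makes it easy to print or inspect them first.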

Research References & Sources

The heuristics this tool uses to collect and analyze identifier names are based on previously published research.

These sources are:

