GitHub Statistics Study

A Software Engineering Statistics Study using the GitHub API.

Abstract

Our team conducted a statistical survey of the language typing preferences of large software projects, using a proprietary algorithm that queries the GitHub API. In our sample, 56% of large software projects (28 of 50) used explicitly typed languages rather than implicitly typed languages.

Authors and Affiliations

Getting Started

Begin by installing the latest version of Python 3. Clone the repo to the folder where you would like to run it, then run the command: python main.py

Purpose of Study

Our goal is to determine whether large software projects prefer statically typed programming languages (such as C, C++ and Java) or dynamically typed languages (such as JavaScript, Python and PHP). It is often said that as software systems grow, static type checking becomes increasingly useful because it eliminates an entire class of bugs, namely type errors, at compile time.

We want to test whether this feature makes statically typed languages a more common choice for large software projects.

Hypothesis

We hypothesize that more than 50% of large open source software projects use statically typed programming languages as opposed to dynamically typed languages.

Definitions

Determining whether a project is large

We consider a software project large if it contains over 1,000,000 bytes and has over 10 contributors. Throughout this study, the size of a project refers to these two metrics.
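The size rule above can be written as a small predicate. This is only a sketch: the names ProjectSize, total_bytes_from_languages and is_large are hypothetical, and it assumes the byte count is the sum of per-language byte counts, the shape returned by the GitHub REST API's repository languages endpoint. The study's own script may measure size differently.

```python
from dataclasses import dataclass
from typing import Dict

MIN_BYTES = 1_000_000     # "over 1,000,000 bytes"
MIN_CONTRIBUTORS = 10     # "over 10 contributors"


@dataclass
class ProjectSize:
    """Hypothetical container for the two size metrics."""
    total_bytes: int
    contributors: int


def total_bytes_from_languages(language_bytes: Dict[str, int]) -> int:
    """Sum a {language: bytes} mapping (assumed input shape)."""
    return sum(language_bytes.values())


def is_large(project: ProjectSize) -> bool:
    """Both thresholds are strict: exactly 1,000,000 bytes or exactly 10 contributors does not qualify."""
    return (project.total_bytes > MIN_BYTES
            and project.contributors > MIN_CONTRIBUTORS)
```

For example, a project with {"Java": 900000, "Python": 200000} bytes and 12 contributors counts as large under this rule, while the same project with 10 contributors does not.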

Determining whether a language is statically typed or dynamically typed

We assemble a hash table that maps the most common languages to either 'explicit' or 'implicit'. The list is intended to be comprehensive and can be found in the file language-types.json. Each language was researched to check that its classification is correct. Language names were taken from Linguist, provided by GitHub.
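A lookup against such a table might look like the sketch below. The exact schema of language-types.json is not shown here, so the flat {language: 'explicit' | 'implicit'} shape, the EXAMPLE_LANGUAGE_TYPES excerpt and the function names are assumptions for illustration. The six example entries follow the languages named under Purpose of Study.

```python
import json
from typing import Dict, Optional

# Hypothetical excerpt; the real table lives in language-types.json.
EXAMPLE_LANGUAGE_TYPES: Dict[str, str] = {
    "C": "explicit",
    "C++": "explicit",
    "Java": "explicit",
    "JavaScript": "implicit",
    "Python": "implicit",
    "PHP": "implicit",
}


def load_language_types(path: str) -> Dict[str, str]:
    """Read the mapping from a JSON file (assumes a flat object of strings)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def classify(language: str, language_types: Dict[str, str]) -> Optional[str]:
    """Return 'explicit' or 'implicit', or None when the language is not mapped."""
    return language_types.get(language)
```

Returning None for an unmapped language leaves the decision about such projects to the caller, since the text does not say how they are handled.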

Statistical Population

Our statistical population is the set of large open source software projects found on GitHub.

Sample Size

N = 50

How Data Sampling is conducted

Data is sampled using the public GitHub API.

1. The script queries a random project and determines its size.
2. It verifies that the project ID is not already in the sample data set (to preserve independence).
3. If the project meets the size criteria for a large software project, it is added to the sample data set.
4. Steps 1 to 3 repeat until N projects have been inserted into the sample data set.
5. For each project in the sample data set, the script checks its most prominent language.
6. It looks up whether that language is statically typed or dynamically typed.
7. It inserts the project information into the corresponding data set.
8. The script calculates the sample mean, sample median, sample variance and sample standard deviation.
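The sampling procedure can be sketched as follows. This is not the code in main.py: collect_sample, split_by_typing and the project dictionary shape (id, languages as {language: bytes}, contributors) are assumptions, and the network call is replaced by a caller-supplied fetch_random_project function so the logic can be read and run offline. Placing unmapped languages in a separate "unmapped" group is also an assumption, since the text does not say how they are treated.

```python
from typing import Callable, Dict, List

MIN_BYTES = 1_000_000
MIN_CONTRIBUTORS = 10


def collect_sample(fetch_random_project: Callable[[], dict], n: int,
                   max_attempts: int = 100_000) -> List[dict]:
    """Draw projects until n distinct large projects are collected."""
    sample: List[dict] = []
    seen_ids = set()
    for _ in range(max_attempts):
        if len(sample) == n:
            break
        project = fetch_random_project()
        if project["id"] in seen_ids:
            continue  # already in the sample; skip to preserve independence
        total_bytes = sum(project["languages"].values())
        if total_bytes > MIN_BYTES and project["contributors"] > MIN_CONTRIBUTORS:
            seen_ids.add(project["id"])
            sample.append(project)
    return sample


def split_by_typing(sample: List[dict],
                    language_types: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group sampled projects by the typing of their most prominent language."""
    groups: Dict[str, List[dict]] = {"explicit": [], "implicit": [], "unmapped": []}
    for project in sample:
        languages = project["languages"]
        # Most prominent language = the one with the most bytes.
        # Sampled projects are large, so this mapping is never empty.
        top_language = max(languages, key=languages.get)
        kind = language_types.get(top_language)
        groups[kind if kind in groups else "unmapped"].append(project)
    return groups
```

The max_attempts bound is there only so the loop terminates if too few large projects are found; the source does not describe such a limit.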

Analysis

We found that 28 of the 50 projects inspected (56%) used explicitly typed languages. For more information about the math used to determine this, please refer to https://drive.google.com/file/d/1lbfze5P0Y2RtJqXF_VUFQSkdVS3xtguR/view?usp=sharing
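One way to relate this count to the hypothesis is an exact one-sided binomial test against a true proportion of 50%. The sketch below is an illustration only and is not necessarily the calculation in the linked document; the function name binomial_upper_tail is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))


n = 50
explicit_count = 28
proportion = explicit_count / n
p_value = binomial_upper_tail(explicit_count, n)

print(f"sample proportion = {proportion:.2f}")
print(f"one-sided p-value = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.24, meaning a split of 28 or more out of 50 would not be unusual if the true proportion were exactly 50%.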

jonthemango/github_statistics_study
