License Cop

A nifty script that fetches the licenses for all your third-party libraries.

Supported Platforms

The following platforms are supported:

System Requirements

You will need:

Using Docker

Docker lets you build the whole environment required to run license-bot with very little setup.

Building a Docker image

Enter the project directory, then run:

docker image build . -t license-bot-image

Running license bot

The following command collects all licenses for the given repository and writes them to report.txt:

docker run -e GITHUB_TOKEN='YOUR_GH_TOKEN' --entrypoint ./license-cop license-bot-image  https://github.com/toptal/some-repository report.txt

Running tests and linter

docker run --entrypoint ./test.sh license-bot-image
docker run --entrypoint ./lint.sh license-bot-image

Local installation

Installing Pipenv

It's advisable to install Pipenv locally, but on most systems installing it system-wide should work just fine. If you're using macOS with Homebrew:

$ pip3 install pipenv

Make sure your shell profile (eg: ~/.profile or ~/.bash_profile) exports the following environment variables, otherwise Pipenv will not work:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

The GITHUB_TOKEN environment variable

You need to have a valid GitHub personal access token with enough permissions to read the repositories you want.

This token needs to be exported to the GITHUB_TOKEN environment variable.
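As a minimal sketch of how a script could pick the token up and fail early when it is missing (the helper name `read_github_token` is an assumption for illustration, not part of License Cop):

```python
import os


def read_github_token(environ=os.environ):
    """Return the token from GITHUB_TOKEN, or raise if it is unset or blank."""
    token = environ.get('GITHUB_TOKEN', '').strip()
    if not token:
        raise RuntimeError('GITHUB_TOKEN is not set; export it before running')
    return token
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real shell environment.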

Installing Dependencies

This will install runtime dependencies only:

$ pipenv install

To install development dependencies (eg: test frameworks), you should also run:

$ pipenv install -d

Running License Cop

Once everything is set, run the ./license-cop script. It will print its usage instructions.

Development

Running Tests

We use pytest, which is installed by Pipenv, to execute our automated test suite.

To run the entire test suite, just invoke the ./test.sh script.

PEP-8

This project must adhere to the PEP-8 style guide.

You can check if your changes adhere to this style by invoking the ./lint.sh script.

Before Committing Your Changes

Please run the ./pre-commit.sh script before checking in your code. It runs the checks above and helps ensure nothing breaks in master.

Architecture and Domain

Source Code Repository

Source code repositories, or simply repositories, are file hierarchies that store and version source code.

Currently, only GitHub repositories are supported, being represented by instances of the GithubRepository class. If necessary, support for different version control platforms can be easily added.

Package, Version and Dependency

Packages are binary artifacts. A project, when fully assembled, is ultimately a package itself. Packages have versions (eg: 2.5.3).

A package may depend on other packages. Package dependencies, or just dependencies, describe the requirements that a package must be shipped with.

A package version is represented by the PackageVersion class, and has the following data:

  • name (eg: httparty).
  • version number (eg: 2.5.3).
  • runtime dependencies — dependencies required by production code.
  • development dependencies — dependencies required only for testing and local development.
  • licenses (eg: BSD and Apache 2.0).

A dependency has the following information, and it is represented by the Dependency class:

  • package name (eg: httparty).
  • kind (runtime or development).
  • version requirements (eg: a version greater than or equal to 2.0.3 but less than 2.1).
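The two records above can be pictured as plain data classes. This is a minimal sketch rather than the project's actual code: the field names, the `DependencyKind` enum and the `some-dependency` package are assumptions chosen to mirror the bullet lists.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DependencyKind(Enum):
    RUNTIME = 'runtime'
    DEVELOPMENT = 'development'


@dataclass(frozen=True)
class Dependency:
    name: str
    kind: DependencyKind
    requirement: str = ''  # empty string stands for "any version"


@dataclass
class PackageVersion:
    name: str
    number: str
    licenses: List[str] = field(default_factory=list)
    runtime_dependencies: List[Dependency] = field(default_factory=list)
    development_dependencies: List[Dependency] = field(default_factory=list)


# Values taken from the examples in the lists above; the dependency is made up.
httparty = PackageVersion(
    name='httparty',
    number='2.5.3',
    licenses=['BSD', 'Apache 2.0'],
    runtime_dependencies=[
        Dependency('some-dependency', DependencyKind.RUNTIME, '>= 2.0.3, < 2.1'),
    ],
)
```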

Dependencies are resolved by a package manager, which will query and download them from a package registry.

Package Registry

Package registries are online hubs to store and share package versions. Examples of registries are RubyGems and PyPI.

The PackageRegistry module is responsible for interacting with the package registry API of the given platform, fetching information about package versions, dependencies and licenses. For instance, RubyGems provides a nice REST API.

Manifest

A manifest is a set of one or more files that describe the dependencies a project relies on. These files are processed by the package manager of a given platform.

For example, the Ruby platform has the Gemfile manifest, which usually sits at the root of the project and is processed by the bundler tool. This file has a structure like this:

source 'http://rubygems.org/'

gemspec

gem 'httparty', '~> 2.0.3'

group :test do
  gem 'rspec'
end

group :development do
  gem 'rake'
  gem 'rubocop'
end

The source line specifies RubyGems as the package registry to use.

Parsing this file should result in a list of runtime and development dependencies. Here httparty is a runtime dependency, and its version should be greater than or equal to 2.0.3 but less than 2.1. Likewise, rspec, rake and rubocop are all development dependencies, and any version can be used, preferably the latest.
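To make the `~>` range concrete, here is a small Python sketch of the check. It is an illustration only, not License Cop's resolver: the function names are hypothetical, and it handles plain numeric versions with at least two segments in the constraint.

```python
def parse_version(text):
    """Turn '2.0.3' into (2, 0, 3); only plain numeric segments are handled."""
    return tuple(int(part) for part in text.strip().split('.'))


def pad(version, width):
    """Right-pad a version tuple with zeros so tuples compare segment by segment."""
    return version + (0,) * (width - len(version))


def satisfies_pessimistic(version, base):
    """Check `version` against the constraint '~> base'."""
    lower = parse_version(base)
    if len(lower) < 2:
        raise ValueError('this sketch expects at least two segments')
    # Drop the last segment and bump the one before it: 2.0.3 gives 2.1
    upper = lower[:-2] + (lower[-2] + 1,)
    candidate = parse_version(version)
    width = max(len(lower), len(upper), len(candidate))
    return pad(lower, width) <= pad(candidate, width) < pad(upper, width)
```

With this sketch, `satisfies_pessimistic('2.0.9', '2.0.3')` holds, while `2.1.0` and `2.0.2` are both rejected.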

The Manifest class has the following data:

  • platform (eg: Python).
  • repository (eg: https://github.com/requests/requests).
  • paths on that repository (eg: requirements.txt and requirements-test.txt).
  • runtime dependencies.
  • development dependencies.

Repository Matcher

The RepositoryMatcher module is responsible for browsing a repository's file structure, detecting manifest files for a given platform and parsing them. The result is a list of dependencies.

It works as follows:

let P be a platform
let R be a repository
let T be the file tree of R
for each file F from T:
    if F matches a format specified by P:
        then parse F for a list of runtime and development dependencies
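The matching step of this loop can be sketched with Python's standard fnmatch module. This is an assumption-laden illustration: the function name is hypothetical, and matching on the file name only (rather than the full path) is a guess at how the real matcher behaves.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def match_manifest_paths(tree_paths, patterns):
    """Return the paths whose file name matches any shell-style pattern."""
    return [
        path for path in tree_paths
        if any(fnmatchcase(basename(path), pattern) for pattern in patterns)
    ]


# A made-up file tree for the fictional Foobar platform.
tree = ['README.md', 'Foofile', 'lib/core.foo', 'plugins/extra/extra.foospec']
matches = match_manifest_paths(tree, ['Foofile', '*.foospec'])
```

Each matched path would then be parsed for its runtime and development dependencies.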

Dependency Resolver and Resolution

Each manifest is fed to a DependencyResolver instance, which will query the package registry of the given platform in order to obtain information about package versions. It will then find a set of package versions that match all dependency requirements specified in the manifest.

The result of this step is a tree, represented by the DependencyResolution class. This is an example:

+ pytest-mock:1.6.3 → MIT
⎮--= [runtime] mock:2.0.0 → BSD-2-Clause
⎮--+ [runtime] pytest:3.2.2 → MIT license
⎮  ⎮--= [runtime] colorama:0.3.9 → BSD
⎮  ⎮--= [runtime] ordereddict:1.1 → <no licenses found>
⎮  ⎮--= [runtime] argparse:1.4.0 → Python Software Foundation License
⎮  ⎮--• [runtime] setuptools:36.5.0 → MIT
⎮  ⎮--= [runtime] py:1.4.34 → MIT license
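A tree of this shape can be modelled as nodes with children. The sketch below is not the DependencyResolution class itself: `ResolutionNode` and `render` are hypothetical names, the output format is simplified, and only part of the example tree is reproduced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResolutionNode:
    name: str
    number: str
    licenses: List[str]
    children: List['ResolutionNode'] = field(default_factory=list)


def render(node, depth=0):
    """Return one indented line per node, depth first."""
    licenses = ', '.join(node.licenses) or '<no licenses found>'
    lines = ['{}{}:{} -> {}'.format('  ' * depth, node.name, node.number, licenses)]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# A subset of the example tree above.
root = ResolutionNode('pytest-mock', '1.6.3', ['MIT'], [
    ResolutionNode('mock', '2.0.0', ['BSD-2-Clause']),
    ResolutionNode('pytest', '3.2.2', ['MIT license'], [
        ResolutionNode('ordereddict', '1.1', []),
    ]),
])
print('\n'.join(render(root)))
```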

Adding Support for a Platform

In order to support a platform, you need to do three things:

  1. Implement the platform's PackageRegistry class.
  2. Implement the platform's RepositoryMatcher class.
  3. Register the platform.

Implement the PackageRegistry

Suppose you want to support the Foobar platform. You would then create a FoobarPackageRegistry class inside app/platforms/foobar/package_registry.py.

This class should extend PackageRegistry and implement all of its abstract methods:

def _fetch_version(self, name, number)
def _fetch_latest_version(self, name)

These methods should each return an instance of PackageVersion.

The superclass already defines a _session attribute that contains an instance of a requests session. You can use this session to make HTTP requests to the platform's package registry.
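Put together, a registry for the fictional Foobar platform might look roughly like the sketch below. Everything outside the two method names and the `_session` attribute is an assumption: the stand-in base class (including its constructor, which takes the session as an argument only so the sketch can run without the project), the `registry.foobar.example` endpoint, the JSON field names and the `_build_version` helper are all made up for illustration.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Stand-ins for the project's own classes, so the sketch runs by itself.
PackageVersion = namedtuple('PackageVersion', 'name number licenses')


class PackageRegistry(ABC):
    def __init__(self, http_session):
        self._session = http_session  # a requests session in the real project

    @abstractmethod
    def _fetch_version(self, name, number):
        ...

    @abstractmethod
    def _fetch_latest_version(self, name):
        ...


class FoobarPackageRegistry(PackageRegistry):
    BASE_URL = 'https://registry.foobar.example/api'  # hypothetical endpoint

    def _fetch_version(self, name, number):
        response = self._session.get('{}/{}/{}'.format(self.BASE_URL, name, number))
        response.raise_for_status()
        return self._build_version(response.json())

    def _fetch_latest_version(self, name):
        response = self._session.get('{}/{}/latest'.format(self.BASE_URL, name))
        response.raise_for_status()
        return self._build_version(response.json())

    @staticmethod
    def _build_version(data):
        # Field names are assumptions about the made-up Foobar API.
        return PackageVersion(data['name'], data['number'], data.get('licenses', []))
```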

Most package registry APIs are able to retrieve the license for a given package version. Very often, though, the license field is incomplete or absent. If the package registry provides a GitHub repository URL for the given version, we can leverage GitHub's license API to determine the license.

To make this process as easy as possible, the superclass defines the _find_licenses_in_code_repository_urls method, which receives a list of (possible) URLs, checks whether they reference valid GitHub repositories and, if they do, retrieves their licenses.

def _find_licenses_in_code_repository_urls(self, urls)
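The first half of that job, deciding which candidate URLs point at a GitHub repository, can be sketched as follows. This is not the superclass's implementation: the regular expression and the `extract_github_repositories` helper are assumptions, and the license lookup itself is left out.

```python
import re

# Accepts http(s) URLs on github.com with an owner and a repository segment,
# an optional '.git' suffix and any trailing path, query or fragment.
GITHUB_REPOSITORY_URL = re.compile(
    r'^https?://(?:www\.)?github\.com/'
    r'(?P<owner>[\w.-]+)/(?P<name>[\w.-]+?)(?:\.git)?(?:[/#?].*)?$'
)


def extract_github_repositories(urls):
    """Return unique (owner, name) pairs for URLs that look like GitHub repositories."""
    found = []
    for url in urls:
        match = GITHUB_REPOSITORY_URL.match(url or '')
        if match:
            pair = (match.group('owner'), match.group('name'))
            if pair not in found:
                found.append(pair)
    return found
```

Registries often expose several URL fields (homepage, source code, bug tracker), so accepting a list and skipping empty or unrelated entries keeps the caller simple.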

Also, please make sure you cover your implementation with tests using pytest and VCR. These tests should be placed under test/platforms/foobar/test_foobar_package_registry.py (it's necessary to include the platform name in the file name because of a pytest limitation).

Implement the RepositoryMatcher

Likewise, you should create a FoobarRepositoryMatcher class inside app/platforms/foobar/repository_matcher.py.

This class should extend RepositoryMatcher and implement all of its abstract methods. It should be initialized with a list of Unix shell-style wildcard patterns to be matched against the repository's file tree.

Subclasses of RepositoryMatcher should pass this list of patterns to the superclass's __init__ method. For example:

class FoobarRepositoryMatcher(RepositoryMatcher):
    def __init__(self):
        super().__init__(['Foofile', '*.foospec'])

FoobarRepositoryMatcher should also override the _fetch_manifest method. This method receives the repository and a match object (ManifestMatch).

def _fetch_manifest(self, repository, match)

The match object will have a list of GitNode instances that match one of the specified patterns.

You can then use the repository to fetch the contents of the files you need. This method should return an instance of Manifest.
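A rough sketch of such an override is shown below. The real GitNode, ManifestMatch, Manifest and repository classes are not shown in this document, so the stand-ins here are hypothetical, as are the `nodes` and `path` attributes, the `read_text_file` accessor and the made-up Foofile format. The sketch also omits the RepositoryMatcher base class so that it runs by itself.

```python
from collections import namedtuple

# Hypothetical stand-ins; the project's real classes may differ.
GitNode = namedtuple('GitNode', 'path')
ManifestMatch = namedtuple('ManifestMatch', 'nodes')
Manifest = namedtuple(
    'Manifest',
    'platform repository paths runtime_dependencies development_dependencies',
)


class InMemoryRepository:
    def __init__(self, url, files):
        self.url = url
        self._files = files

    def read_text_file(self, path):  # hypothetical accessor
        return self._files[path]


def parse_foofile(text):
    """Parse a made-up format: one 'runtime NAME' or 'development NAME' per line."""
    runtime, development = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        kind, name = parts
        if kind == 'runtime':
            runtime.append(name)
        elif kind == 'development':
            development.append(name)
    return runtime, development


class FoobarRepositoryMatcher:
    def _fetch_manifest(self, repository, match):
        runtime, development, paths = [], [], []
        for node in match.nodes:
            found = parse_foofile(repository.read_text_file(node.path))
            runtime.extend(found[0])
            development.extend(found[1])
            paths.append(node.path)
        return Manifest('Foobar', repository.url, paths, runtime, development)
```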

Don't forget to cover your FoobarRepositoryMatcher with tests. They should be placed in test/platforms/foobar/test_foobar_repository_matcher.py.

Register the Platform

First, create an app/platforms/foobar/__init__.py file. Then build an instance of Platform as follows:

from app.platforms.foobar.package_registry import *
from app.platforms.foobar.repository_matcher import *
from app.platform import *


INSTANCE = Platform('Foobar', FoobarRepositoryMatcher(), FoobarPackageRegistry())

Finally, register this platform instance in the app initialization file, following the structure already in place.
