A nifty script that fetches the licenses for all your third-party libraries.
The following platforms are supported:
- Ruby — RubyGems:
- Python — PyPI:
  - Requirements files (eg: requirements.txt, requirements-test.txt ...)
  - Pipfile
  - setup.py
- Node.js — NPM:
  - package.json
- Scala:
  - build.sbt
  - project folder with *.scala or *.sbt files
- JVM (Java, Scala...):
- iOS (Swift, Objective-C) — CocoaPods:
- Elixir — Hex:
You will need:
- Python 3.6+
- Pipenv
Docker lets you build the whole environment required to run license-bot with little setup.
Enter the project directory, then run:
docker image build . -t license-bot-image
The following command collects all licenses for the given repository and writes them to report.txt:
docker run -e GITHUB_TOKEN='YOUR_GH_TOKEN' --entrypoint ./license-cop license-bot-image https://github.com/toptal/some-repository report.txt
To run the test suite inside the container:
docker run --entrypoint ./test.sh license-bot-image
To run the linter inside the container:
docker run --entrypoint ./lint.sh license-bot-image
It's advisable to install Pipenv locally, but on most systems installing it system-wide should work just fine. If you're using macOS with Homebrew:
$ pip3 install pipenv
Make sure your shell profile (eg: ~/.profile or ~/.bash_profile) exports the following environment variables, otherwise Pipenv will not work:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
You need to have a valid GitHub personal access token with enough permissions to read the repositories you want.
This token needs to be exported to the GITHUB_TOKEN environment variable.
This will install runtime dependencies only:
$ pipenv install
To install development dependencies (eg: test frameworks), you should also run:
$ pipenv install -d
Once everything is set, run the ./license-cop script. It will print its usage instructions.
We use pytest, which is installed by Pipenv, to execute our automated test suite.
To run the entire test suite, just invoke the ./test.sh script.
This project must adhere to the PEP-8 style guide.
You can check if your changes adhere to this style by invoking the ./lint.sh script.
Please use the ./pre-commit.sh script before checking in your code in order to run the checks above and ensure nothing breaks in master.
Source code repositories, or simply repositories, are file hierarchies that store and version source code.
Currently, only GitHub repositories are supported, represented by instances of the GithubRepository class. If necessary, support for other version control platforms can be added easily.
Packages are binary artifacts. A project, when fully assembled, is ultimately a package itself. Packages have versions (eg: 2.5.3).
A package may depend on other packages. Package dependencies, or just dependencies, describe the requirements that a package must be shipped with.
A package version is represented by the PackageVersion class, and has the following data:
- name (eg: httparty).
- version number (eg: 2.5.3).
- runtime dependencies: dependencies required by production code.
- development dependencies: dependencies required only for testing and local development.
- licenses (eg: BSD and Apache 2.0).
A dependency is represented by the Dependency class and has the following information:
- package name (eg: httparty).
- kind (runtime or development).
- version requirements (eg: a version greater than or equal to 2.0.3 but less than 2.1).
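The two records described above can be pictured as plain data classes. This is only a minimal sketch under stated assumptions, not the project's actual code: the field names, the DependencyKind enum, the example-lib dependency and the use of dataclasses (Python 3.7+) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DependencyKind(Enum):
    # Hypothetical enum: the text only says a dependency is runtime or development.
    RUNTIME = 'runtime'
    DEVELOPMENT = 'development'


@dataclass
class Dependency:
    name: str                 # package name, eg: httparty
    kind: DependencyKind      # runtime or development
    requirement: str = ''     # version requirement text, eg: '>= 2.0.3, < 2.1'


@dataclass
class PackageVersion:
    name: str
    number: str
    runtime_dependencies: List[Dependency] = field(default_factory=list)
    development_dependencies: List[Dependency] = field(default_factory=list)
    licenses: List[str] = field(default_factory=list)


# Illustrative values only; 'example-lib' is a made-up dependency.
example = PackageVersion(
    name='httparty',
    number='2.5.3',
    runtime_dependencies=[
        Dependency('example-lib', DependencyKind.RUNTIME, '>= 2.0.3, < 2.1'),
    ],
    licenses=['BSD', 'Apache 2.0'],
)
```

Keeping the requirement as text defers its interpretation to whichever platform the dependency belongs to, since each package manager has its own requirement syntax.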
Dependencies are resolved by a package manager, which will query and download them from a package registry.
Package registries are online hubs to store and share package versions. Examples of registries are RubyGems and PyPI.
The PackageRegistry module is responsible for interacting with the package registry API of the given platform, fetching information about package versions, dependencies and licenses. For instance, RubyGems provides a nice REST API.
A manifest is a set of one or several files that describe the dependencies a project relies on. They are processed by the package manager of a given platform.
For example, the Ruby platform has the Gemfile manifest, which usually sits at the root of the project and is processed by the bundler tool. This file has a structure like this:
source 'http://rubygems.org/'
gemspec
gem 'httparty', '~> 2.0.3'
group :test do
  gem 'rspec'
end
group :development do
  gem 'rake'
  gem 'rubocop'
end
The source line specifies that RubyGems is the package registry to use.
Parsing this file should result in a list of runtime and development dependencies. Here httparty is a runtime dependency, and its version should be greater than or equal to 2.0.3 but less than 2.1. Likewise, rspec, rake and rubocop are all development dependencies, and any version can be used, preferably the latest.
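The range quoted above comes from the RubyGems pessimistic operator (~>): the last segment of the version is dropped and the new last segment is incremented to obtain the exclusive upper bound. The sketch below illustrates that rule; the function names are hypothetical, and it handles purely numeric versions only (no prerelease tags, no padding of shorter versions).

```python
def pessimistic_bounds(requirement):
    """Expand '~> X.Y.Z' into (inclusive lower bound, exclusive upper bound)."""
    pieces = requirement.split()
    if len(pieces) != 2 or pieces[0] != '~>':
        raise ValueError('expected a requirement like "~> 2.0.3"')
    lower = [int(segment) for segment in pieces[1].split('.')]
    # Drop the last segment (when there is more than one), then bump the new last one.
    upper = lower[:-1] if len(lower) > 1 else lower[:]
    upper[-1] += 1
    return tuple(lower), tuple(upper)


def satisfies(version, requirement):
    """Check a dotted numeric version against a '~>' requirement."""
    lower, upper = pessimistic_bounds(requirement)
    candidate = tuple(int(segment) for segment in version.split('.'))
    return lower <= candidate < upper


bounds = pessimistic_bounds('~> 2.0.3')   # ((2, 0, 3), (2, 1))
accepted = satisfies('2.0.9', '~> 2.0.3')  # True
rejected = satisfies('2.1.0', '~> 2.0.3')  # False
```

Tuple comparison is enough here because Python compares tuples element by element, which matches how dotted numeric versions are ordered.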
The Manifest class has the following data:
- platform (eg: Python).
- repository (eg: https://github.com/requests/requests).
- paths on that repository (eg: requirements.txt and requirements-test.txt).
- runtime dependencies.
- development dependencies.
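To make the link between manifest paths and dependency lists concrete, here is a rough sketch of turning the text of a requirements file into (name, requirement) pairs. It is an assumption-laden illustration rather than the project's parser: parse_requirements is a hypothetical helper, and it deliberately ignores extras, environment markers, URLs and included files.

```python
import re

# A package name followed by an optional version specifier, eg: 'requests>=2.18,<3'.
_REQUIREMENT_LINE = re.compile(r'^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$')


def parse_requirements(text):
    """Return (name, requirement) pairs, skipping blanks, comments and pip options."""
    pairs = []
    for raw_line in text.splitlines():
        line = raw_line.split('#', 1)[0].strip()  # strip trailing comments
        if not line or line.startswith('-'):      # eg: '-r other-file.txt'
            continue
        match = _REQUIREMENT_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs


# Made-up file contents, standing in for requirements.txt and requirements-test.txt.
runtime = parse_requirements('requests>=2.18,<3\n# a comment\npyyaml\n')
development = parse_requirements('-r requirements.txt\npytest==3.2.2\n')
```

In this picture, the pairs from requirements.txt would become the manifest's runtime dependencies and those from requirements-test.txt its development dependencies.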
The RepositoryMatcher module is responsible for browsing a repository's file structure, detecting manifest files for a given platform and parsing them. The result is a list of dependencies.
It works as follows:
let P be a platform
let R be a repository
let T be the file tree of R
for each file F from T:
    if F matches a format specified by P:
        then parse F for a list of runtime and development dependencies
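The matching half of this procedure can be sketched with the standard fnmatch module. This is an illustration under assumptions, not the RepositoryMatcher implementation: find_manifest_paths is a hypothetical helper, the file tree is a plain list of paths, the patterns are made up, and the platform-specific parsing step is left out.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def find_manifest_paths(tree_paths, patterns):
    """Return the paths whose file name matches any of the platform's patterns."""
    return [
        path for path in tree_paths
        if any(fnmatchcase(basename(path), pattern) for pattern in patterns)
    ]


# A made-up repository file tree (T) and made-up patterns for a platform (P).
tree = [
    'README.md',
    'requirements.txt',
    'requirements-test.txt',
    'src/app.py',
    'docs/requirements.txt',
]
python_patterns = ['requirements*.txt', 'Pipfile', 'setup.py']

matches = find_manifest_paths(tree, python_patterns)
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every operating system.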
Each manifest is fed to a DependencyResolver instance, which will query the package registry of the given platform in order to obtain information about package versions. It will then find a set of package versions that match all dependency requirements specified in the manifest.
The result of this step is a tree, represented by the DependencyResolution class.
This is an example:
+ pytest-mock:1.6.3 → MIT
⎮--= [runtime] mock:2.0.0 → BSD-2-Clause
⎮--+ [runtime] pytest:3.2.2 → MIT license
⎮ ⎮--= [runtime] colorama:0.3.9 → BSD
⎮ ⎮--= [runtime] ordereddict:1.1 → <no licenses found>
⎮ ⎮--= [runtime] argparse:1.4.0 → Python Software Foundation License
⎮ ⎮--• [runtime] setuptools:36.5.0 → MIT
⎮ ⎮--= [runtime] py:1.4.34 → MIT license
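A tree like this can be produced by walking dependencies recursively. The sketch below is a simplified illustration, not the DependencyResolver itself: it uses a hypothetical in-memory dictionary in place of a package registry, reuses a few entries from the example above with an assumed parent-child layout, ignores version requirements entirely, and expands each package only once to avoid cycles.

```python
# Hypothetical stand-in for a package registry:
# name -> (version number, licenses, runtime dependency names)
REGISTRY = {
    'pytest-mock': ('1.6.3', ['MIT'], ['mock', 'pytest']),
    'mock': ('2.0.0', ['BSD-2-Clause'], []),
    'pytest': ('3.2.2', ['MIT license'], ['colorama', 'py']),
    'colorama': ('0.3.9', ['BSD'], []),
    'py': ('1.4.34', ['MIT license'], []),
}


def resolve(name, seen=None):
    """Build a nested dict for `name`, expanding each package only once."""
    seen = set() if seen is None else seen
    number, licenses, dependency_names = REGISTRY[name]
    node = {'name': name, 'number': number, 'licenses': licenses, 'children': []}
    if name in seen:
        return node  # already expanded elsewhere, so do not recurse again
    seen.add(name)
    for dependency_name in dependency_names:
        node['children'].append(resolve(dependency_name, seen))
    return node


def render(node, depth=0):
    """Return the tree as a list of indented text lines."""
    licenses = ', '.join(node['licenses']) or '<no licenses found>'
    lines = ['{}{}:{} -> {}'.format('  ' * depth, node['name'], node['number'], licenses)]
    for child in node['children']:
        lines.extend(render(child, depth + 1))
    return lines


tree = resolve('pytest-mock')
print('\n'.join(render(tree)))
```

The seen set is what keeps the walk finite when two packages depend on each other or share a dependency.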
In order to support a platform, you need to do three things:
- Implement the platform's PackageRegistry class.
- Implement the platform's RepositoryMatcher class.
- Register the platform.
Suppose you want to support the Foobar platform. You would then create a FoobarPackageRegistry class inside app/platforms/foobar/package_registry.py.
This class should extend PackageRegistry and implement all of its abstract methods:
def _fetch_version(self, name, number)
def _fetch_latest_version(self, name)
These methods should each return an instance of PackageVersion.
The superclass already defines a _session attribute that contains an instance of a requests session.
You can use this session to make HTTP requests to the platform's package registry.
Most package registry APIs are able to retrieve the license for a given package version. Very often this license information is incomplete or absent. However, if the package registry provides a GitHub repository URL for the given version, we can leverage GitHub's license API to determine the license.
To make this process as easy as possible, the superclass defines the _find_licenses_in_code_repository_urls method, which receives a list of (possible) URLs, checks if they reference valid GitHub repositories, and if they do, retrieves their licenses.
def _find_licenses_in_code_repository_urls(self, urls)
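The first half of that job, deciding which candidate URLs point at a GitHub repository, can be sketched as follows. This is not the superclass's code: github_repository_slug is a hypothetical helper, the accepted host names are an assumption, and the actual license lookup through GitHub is omitted.

```python
from urllib.parse import urlparse


def github_repository_slug(url):
    """Return 'owner/name' if `url` looks like a GitHub repository URL, else None."""
    if not url:
        return None
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https'):
        return None
    if parsed.netloc.lower() not in ('github.com', 'www.github.com'):
        return None
    segments = [segment for segment in parsed.path.split('/') if segment]
    if len(segments) < 2:
        return None  # eg: a user or organization page, not a repository
    owner, name = segments[0], segments[1]
    if name.endswith('.git'):
        name = name[:-len('.git')]
    if not name:
        return None
    return '{}/{}'.format(owner, name)


# Registries often expose several loosely filled URL fields; some may be empty.
candidates = [
    'https://github.com/requests/requests',
    'http://docs.python-requests.org',
    None,
    'https://github.com/toptal',
]
slugs = [slug for slug in map(github_repository_slug, candidates) if slug]
```

Passing every URL field the registry offers (homepage, source, documentation) through a filter like this is cheap, and only the survivors would need a GitHub request.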
Also, please make sure you cover your implementation with tests using pytest and VCR. These tests should be placed under test/platforms/foobar/test_foobar_package_registry.py (it's necessary to include the platform name in the file name because of a pytest limitation).
Likewise, you should create a FoobarRepositoryMatcher class inside app/platforms/foobar/repository_matcher.py.
This class should extend RepositoryMatcher and implement all of its abstract methods. It should be initialized with a list of Unix shell-style wildcard patterns to be matched in the repository's file tree.
Subclasses of RepositoryMatcher should pass to the super's __init__ method a list of patterns that will be matched against a repository. For example:
class FoobarRepositoryMatcher(RepositoryMatcher):
    def __init__(self):
        super().__init__(['Foofile', '*.foospec'])
FoobarRepositoryMatcher should also override the _fetch_manifest method. This method receives the repository and a match object (ManifestMatch).
def _fetch_manifest(self, repository, match)
The match object will have a list of GitNode instances that match one of the specified patterns.
You can then use the repository to fetch the contents of the files you need. This method should return an instance of Manifest.
Don't forget to cover your FoobarRepositoryMatcher with tests. They should be placed in test/platforms/foobar/test_foobar_repository_matcher.py.
First, create an app/platforms/foobar/__init__.py file. Then build an instance of Platform as follows:
from app.platforms.foobar.package_registry import *
from app.platforms.foobar.repository_matcher import *
from app.platform import *
INSTANCE = Platform('Foobar', FoobarRepositoryMatcher(), FoobarPackageRegistry())
Finally, just register this platform instance at the app initialization file, following the structure already in place.