First, thank you for contributing to Vector! The goal of this document is to provide everything you need to start contributing to Vector. The sections below are ordered progressively, starting with the basics and expanding into more specifics. This guide assumes that:
- You're familiar with GitHub and the pull request workflow.
- You've read Vector's docs.
- You know about the Vector community. Please use it for help.
- Ensure your change has an issue! Find an existing issue or open a new issue.
- This is where you can get a feel for whether the change will be accepted. Changes that are questionable will have a `needs: approval` label.
- Once approved, fork the Vector repository in your own GitHub account.
- Create a new Git branch.
- Review the Vector workflow and development.
- Make your changes.
- Submit the branch as a pull request to the main Vector repo.
All changes must be made in a branch and submitted as pull requests. Vector does not enforce any branch naming convention, but please use a name that describes your changes.
Please ensure your commits are small and focused; they should tell a story of your change. This helps reviewers to follow your changes, especially for more complex changes.
Your commits must include a DCO signature. This is simpler than it sounds; it just means that all of your commits must contain:
Signed-off-by: Joe Smith <[email protected]>
Git makes this easy by adding the `-s` or `--signoff` flag when you commit:
git commit -sm 'My commit message'
We also included a `make signoff` target that handles this for you if you forget.
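To make the sign-off requirement concrete, the sketch below checks whether a commit message carries a `Signed-off-by: Name <email>` trailer line. `has_dco_signoff` is a hypothetical helper written for illustration only; the exact rules applied by Vector's CI or by DCO tooling may be stricter or looser.

```rust
/// Hypothetical helper: returns true if any line of `message` is a DCO
/// trailer of the form `Signed-off-by: Name <email>`.
fn has_dco_signoff(message: &str) -> bool {
    message.lines().any(|line| {
        let Some(rest) = line.trim().strip_prefix("Signed-off-by:") else {
            return false;
        };
        let rest = rest.trim();
        // Require a non-empty name followed by an email in angle brackets
        // that ends the line.
        match (rest.find('<'), rest.rfind('>')) {
            (Some(open), Some(close)) if open > 0 && close > open + 1 => {
                let name = rest[..open].trim();
                let email = &rest[open + 1..close];
                !name.is_empty() && email.contains('@') && close == rest.len() - 1
            }
            _ => false,
        }
    })
}

fn main() {
    let signed = "Fix typo in docs\n\nSigned-off-by: Joe Smith <joe@example.com>";
    println!("{}", has_dco_signoff(signed)); // true
    println!("{}", has_dco_signoff("Fix typo in docs")); // false
}
```

`git commit -s` appends exactly this kind of trailer using your configured `user.name` and `user.email`.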
Once your changes are ready, you must submit your branch as a pull request.
The pull request title must follow the format outlined in the Conventional Commits spec. Conventional Commits is a standardized format for commit messages. Vector only requires this format for commits on the `master` branch, and because Vector squashes commits before merging branches, only the pull request title must conform to this format. Vector runs a pull request check to verify the title in case you forget.
A list of allowed sub-categories is defined here.
The following are all good examples of pull request titles:
feat(new sink): new `xyz` sink
feat(tcp source): add foo bar baz feature
fix(tcp source): fix foo bar baz bug
chore: improve build process
docs: fix typos
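As a rough illustration of what a title check looks for, the sketch below accepts `type(scope): description` or `type: description`. The `ALLOWED_TYPES` list is an assumption taken from the examples above, not Vector's authoritative list, and features of the full spec such as the `!` breaking-change marker are deliberately not handled.

```rust
/// Assumed set of types, based only on the example titles in this guide.
const ALLOWED_TYPES: &[&str] = &["feat", "fix", "chore", "docs"];

/// Minimal sketch of a conventional-commit title check.
fn is_valid_pr_title(title: &str) -> bool {
    // Split "type(scope): description" at the first ": ".
    let Some((head, description)) = title.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }
    // The scope is optional; if present it must be non-empty and closed by ')'.
    let kind = match head.split_once('(') {
        Some((kind, rest)) => match rest.strip_suffix(')') {
            Some(scope) if !scope.is_empty() => kind,
            _ => return false,
        },
        None => head,
    };
    ALLOWED_TYPES.contains(&kind)
}

fn main() {
    for title in [
        "feat(new sink): new `xyz` sink",
        "chore: improve build process",
        "feature: add thing",
        "fix(tcp source) missing colon",
    ] {
        println!("{title:?} -> {}", is_valid_pr_title(title));
    }
}
```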
We generally discourage large pull requests that are over 300-500 lines of diff. This is usually a sign that the pull request is addressing multiple concerns. If you would like to propose a larger change, we suggest joining our chat channel to discuss it with one of our engineers. This way we can talk through the solution and discuss whether a change that large is even needed! Overall, this produces a quicker response to the change and likely produces code that aligns better with our process.
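One way to estimate a branch's diff size before opening a pull request is to sum the added and deleted counts reported by `git diff --numstat`. The sketch below does that on the command's text output; the 500-line threshold is a hypothetical value taken from the upper end of the range above, and the file paths in `main` are made up.

```rust
/// Hypothetical threshold: the upper end of the 300-500 line guideline.
const LARGE_PR_THRESHOLD: u64 = 500;

/// Sums added + deleted lines from `git diff --numstat` output
/// ("added<TAB>deleted<TAB>path" per file). Binary files report "-"
/// for both counts and are skipped.
fn diff_size(numstat: &str) -> u64 {
    numstat
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let added = fields.next()?.parse::<u64>().ok()?;
            let deleted = fields.next()?.parse::<u64>().ok()?;
            Some(added + deleted)
        })
        .sum()
}

fn main() {
    // Sample output with made-up paths.
    let numstat = "120\t30\tsrc/sinks/foo.rs\n400\t10\tsrc/sinks/bar.rs\n-\t-\tassets/logo.png\n";
    let size = diff_size(numstat);
    println!("{size} changed lines, large: {}", size > LARGE_PR_THRESHOLD);
}
```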
All pull requests must be reviewed and approved by at least one Vector team member. The review process is outlined in the Review guide.
All pull requests are squashed and merged.
Currently Vector uses CircleCI. The build process is defined in `/.circleci/config.yml`. This delegates heavily to the `distribution/docker` folder, where Docker images are defined for all of our testing, building, verifying, and releasing.
Tests are run for all changes, and CircleCI is responsible for releasing updated versions of Vector through various channels.
- Install Rust via rustup: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
- Install Docker. Docker containers are used for mocking Vector's integrations.
- Install Ruby and Bundler 2. They are used to build Vector's documentation.
- `/benches` - Internal benchmarks.
- `/config` - Public facing Vector config, included in releases.
- `/distribution` - Distribution artifacts for various targets.
- `/lib` - External libraries that do not depend on `vector` but are used within the project.
- `/proto` - Protobuf definitions.
- `/scripts` - Scripts used to generate docs and maintain the repo.
- `/src` - Vector source.
- `/tests` - Various high-level test cases.
- `/website` - Website and documentation files.
Vector includes a `Makefile` in the root of the repo. This serves as a high-level interface for common commands. Running `make` will produce a list of make targets with descriptions. These targets will be referenced throughout this document.
We use `rustfmt` on stable to format our code, and CI will verify that your code follows this style. Before running the following commands, make sure `rustfmt` is installed on your local stable toolchain.
# To install rustfmt
rustup component add rustfmt
# To format the code
make fmt
When a new component (a source, transform, or sink) is added, it has to be put behind a feature flag with the corresponding name. This ensures that it is possible to customize Vector builds. See the `features` section in `Cargo.toml` for examples.
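The gating itself relies on Rust's conditional compilation. The sketch below is a hypothetical, self-contained illustration: the `sinks-console` name mirrors the `cargo test` example in this guide, but the module and the `Cargo.toml` entry shown in the comment are assumptions about shape, not copies of Vector's code or manifest.

```rust
// Assumed Cargo.toml shape (not copied from Vector's manifest):
//
// [features]
// sinks-console = []

// This module is only compiled when the `sinks-console` feature is enabled.
#[cfg(feature = "sinks-console")]
mod console {
    pub fn name() -> &'static str {
        "console"
    }
}

/// Returns the sinks compiled into this build.
fn enabled_sinks() -> Vec<&'static str> {
    #[allow(unused_mut)] // only mutated when a sink feature is enabled
    let mut sinks = Vec::new();
    #[cfg(feature = "sinks-console")]
    sinks.push(console::name());
    sinks
}

fn main() {
    // Built without `--features sinks-console`, this prints an empty list.
    println!("{:?}", enabled_sinks());
}
```

Because disabled components are never compiled, building with `--no-default-features` plus a single component feature is what makes the faster iteration described below possible.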
In addition, during development of a particular component it is useful to disable all other components to speed up compilation. For example, it is possible to build and run tests only for the `console` sink using:
cargo test --lib --no-default-features --features sinks-console sinks::console
If the tests are already built and only the component file has changed, this is around 4 times faster than rebuilding the tests with all features.
Documentation is extremely important to the Vector project. Ideally, all contributions that will change or add behavior to Vector should include the relevant updates to the documentation website.
The project attempts to make documentation updates as easy as possible, reducing most of it down to a few small changes which are outlined in DOCUMENTING.md.
Regardless of whether your changes require documentation updates, you should always run `make generate` before attempting to merge your commits.
Developers do not need to maintain the Changelog. It is automatically generated via the `make release` command. This is made possible by the use of conventional commit titles.
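To show why conventional titles are enough to build a changelog, the sketch below groups squashed commit titles by their type. `group_by_type` is a hypothetical helper for illustration; it is not how `make release` is actually implemented.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: groups commit titles by conventional-commit type,
/// keeping only the description for each entry.
fn group_by_type<'a>(titles: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &title in titles {
        if let Some((head, description)) = title.split_once(": ") {
            // Drop the optional "(scope)" to get the bare type.
            let kind = head.split('(').next().unwrap_or(head);
            groups.entry(kind).or_default().push(description);
        }
    }
    groups
}

fn main() {
    let titles = [
        "feat(new sink): new `xyz` sink",
        "feat(tcp source): add foo bar baz feature",
        "fix(tcp source): fix foo bar baz bug",
        "docs: fix typos",
    ];
    for (kind, entries) in group_by_type(&titles) {
        println!("{kind}:");
        for entry in entries {
            println!("  - {entry}");
        }
    }
}
```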
Dependencies should be carefully selected and avoided if possible. You can see how dependencies are reviewed in the Reviewing guide.
If a dependency is required only by one or more components, but not by Vector's core, make it optional and add it to the list of dependencies of the features corresponding to these components in `Cargo.toml`.
Sinks may implement a health check as a means for validating their configuration against the environment and external systems. Ideally, this allows the system to inform users of problems such as insufficient credentials, unreachable endpoints, non-existent tables, etc. They're not perfect, however, since it's impossible to exhaustively check for issues that may happen at runtime.
When implementing health checks, we prefer false positives to false negatives. This means we would rather have a health check pass and the sink then fail than have the health check fail when the sink would have been able to run successfully.
A common cause of false negatives in health checks is performing an operation that the sink itself does not need. For example, a health check might list all of the available S3 buckets and check that the configured bucket is on that list. The S3 sink doesn't need the ability to list all buckets, and a user who knows that may not have permitted it to do so. In that case, the health check will fail due to bad credentials even though the credentials are sufficient for normal operation.
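The contrast can be sketched with a mock store. Everything here (`Permissions`, `MockStore`, both health check functions) is hypothetical and does not model real AWS or Vector APIs; it only shows how checking with an operation outside the sink's own needs produces a false negative for a least-privilege user.

```rust
/// Hypothetical permission model; not an AWS or Vector API.
struct Permissions {
    list_all_buckets: bool,
    write_to_bucket: bool,
}

/// A stand-in for an object store client.
struct MockStore {
    perms: Permissions,
    buckets: Vec<String>,
}

impl MockStore {
    /// Needs a permission the sink itself never uses.
    fn list_buckets(&self) -> Result<Vec<String>, String> {
        if self.perms.list_all_buckets {
            Ok(self.buckets.clone())
        } else {
            Err("access denied: list buckets".to_string())
        }
    }

    /// Needs only the bucket access the sink already requires to write.
    fn check_bucket(&self, bucket: &str) -> Result<(), String> {
        if !self.perms.write_to_bucket {
            return Err("access denied: bucket".to_string());
        }
        if self.buckets.iter().any(|b| b == bucket) {
            Ok(())
        } else {
            Err(format!("bucket {bucket} not found"))
        }
    }
}

/// Discouraged: performs an operation the sink does not need.
fn healthcheck_via_listing(store: &MockStore, bucket: &str) -> Result<(), String> {
    let buckets = store.list_buckets()?;
    if buckets.iter().any(|b| b == bucket) {
        Ok(())
    } else {
        Err(format!("bucket {bucket} not found"))
    }
}

/// Preferred: stays within what the sink itself does.
fn healthcheck_direct(store: &MockStore, bucket: &str) -> Result<(), String> {
    store.check_bucket(bucket)
}

fn main() {
    // A user who granted only the access the sink needs to write.
    let store = MockStore {
        perms: Permissions { list_all_buckets: false, write_to_bucket: true },
        buckets: vec!["logs".to_string()],
    };
    println!("{:?}", healthcheck_via_listing(&store, "logs")); // Err: a false negative
    println!("{:?}", healthcheck_direct(&store, "logs")); // Ok(())
}
```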
This leads to a general strategy of mimicking what the sink itself does. Unfortunately, the fact that health checks don't have real events available to them leads to some limitations here. The most obvious example of this is with sinks where the exact target of a write depends on the value of some field in the event (e.g. an interpolated Kinesis stream name). It also pops up for sinks where incoming events are expected to conform to a specific schema. In both cases, random test data is reasonably likely to trigger a potentially false-negative result. Even in simpler cases, we need to think about the effects of writing test data and whether the user would find that surprising or invasive. The answer usually depends on the system we're interfacing with.
In some cases, like the Kinesis example above, the right thing to do might be nothing at all. If we require dynamic information to figure out which entity (in this case, which Kinesis stream) we're even dealing with, odds are very low that we'll be able to come up with a way to meaningfully validate that it's in working order. It's perfectly valid to have a health check that falls back to doing nothing when there is a data dependency like this.
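A minimal sketch of that fallback, assuming a hypothetical `StreamSinkConfig` whose `stream_name` may contain a `{{ field }}` template; the template detection here is a simplification, and `stream_exists` stands in for whatever lookup a real sink would perform.

```rust
/// Hypothetical sink config; `stream_name` may contain a `{{ field }}` template.
struct StreamSinkConfig {
    stream_name: String,
}

#[derive(Debug, PartialEq)]
enum Healthcheck {
    /// The target depends on event data, so nothing was checked.
    Skipped,
    /// A static target was checked with the given result.
    Checked(Result<(), String>),
}

/// Simplified check for a templated (event-dependent) name.
fn is_dynamic(template: &str) -> bool {
    template.contains("{{") && template.contains("}}")
}

fn healthcheck(config: &StreamSinkConfig, stream_exists: impl Fn(&str) -> bool) -> Healthcheck {
    if is_dynamic(&config.stream_name) {
        // Without real events we cannot know which stream to validate.
        return Healthcheck::Skipped;
    }
    let result = if stream_exists(&config.stream_name) {
        Ok(())
    } else {
        Err(format!("stream {} not found", config.stream_name))
    };
    Healthcheck::Checked(result)
}

fn main() {
    let existing = |name: &str| name == "app-logs";
    let dynamic = StreamSinkConfig { stream_name: "{{ service }}-logs".to_string() };
    let fixed = StreamSinkConfig { stream_name: "app-logs".to_string() };
    println!("{:?}", healthcheck(&dynamic, existing)); // Skipped
    println!("{:?}", healthcheck(&fixed, existing)); // Checked(Ok(()))
}
```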
With all that in mind, here is a simple checklist to go over when writing a new health check:
- Does this check perform different fallible operations from the sink itself?
- Does this check have side effects the user would consider undesirable (e.g. data pollution)?
- Are there situations where this check would fail but the sink would operate normally?
Not all of the answers need to be a hard "no", but we should think about the likelihood that any "yes" would lead to false negatives and balance that against the usefulness of the check as a whole for finding problems. Because we have the option to disable individual health checks, there's an escape hatch for users that fall into a false negative circumstance. Our goal should be to minimize the likelihood of users needing to pull that lever while still making a good effort to detect common problems.
You can run Vector's tests via the `make test` command. Our tests use Docker Compose to spin up mock services for testing, such as localstack.
We use flog to build a sample set of log files to test sending logs from a file. This can be done on macOS with Homebrew using the following command. Installation instructions for flog can be found here.
flog --bytes $((100 * 1024 * 1024)) > sample.log
This creates a 100 MiB sample log file named `sample.log`.
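If flog is not available, a std-only Rust program can produce a file of similar size. This is a swapped-in alternative, not part of Vector's tooling: the line format below only loosely resembles an access log and is not one of flog's formats. The arithmetic matches the flog command: 100 * 1024 * 1024 = 104,857,600 bytes.

```rust
use std::io::{self, Write};

/// Writes synthetic log lines to `out` until at least `target_bytes`
/// have been written, and returns the number of bytes written.
fn write_sample_log<W: Write>(out: &mut W, target_bytes: u64) -> io::Result<u64> {
    let mut written: u64 = 0;
    let mut i: u64 = 0;
    while written < target_bytes {
        let line = format!("127.0.0.1 - - \"GET /item/{i} HTTP/1.1\" 200 {}\n", 100 + i % 900);
        out.write_all(line.as_bytes())?;
        written += line.len() as u64;
        i += 1;
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    // 100 MiB, the same size as the flog example.
    let target = 100 * 1024 * 1024;
    let file = std::fs::File::create("sample.log")?;
    let mut writer = io::BufWriter::new(file);
    let n = write_sample_log(&mut writer, target)?;
    writer.flush()?;
    println!("wrote {n} bytes to sample.log");
    Ok(())
}
```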
If you are developing a particular component and want to quickly iterate on unit tests related only to this component, the following approach can reduce waiting times:
- Install cargo-watch.
- (Only for GNU/Linux) Install LLVM 9 (for example, package `llvm-9` on Debian) and set the `RUSTFLAGS` environment variable to use `lld` as the linker: `export RUSTFLAGS='-Clinker=clang-9 -Clink-arg=-fuse-ld=lld'`
- Run in the root directory of Vector's source: `cargo watch -s clear -s 'cargo test --lib --no-default-features --features=<component type>-<component name> <component type>::<component name>'`
For example, if the component is the `add_fields` transform, the command above turns into `cargo watch -s clear -s 'cargo test --lib --no-default-features --features=transforms-add_fields transforms::add_fields'`
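The placeholders follow a fixed pattern: the feature is `<type>-<name>` and the test filter is `<type>::<name>`. The hypothetical helper below spells out that mapping; it is only an illustration of the naming convention described in this guide, not a tool shipped with Vector.

```rust
/// Hypothetical helper: builds the cargo test command for one component,
/// using the `<type>-<name>` feature and `<type>::<name>` filter convention.
fn component_test_command(kind: &str, name: &str) -> String {
    format!("cargo test --lib --no-default-features --features={kind}-{name} {kind}::{name}")
}

/// Wraps the test command for cargo-watch, clearing the screen between runs.
fn watch_command(kind: &str, name: &str) -> String {
    format!("cargo watch -s clear -s '{}'", component_test_command(kind, name))
}

fn main() {
    println!("{}", watch_command("transforms", "add_fields"));
}
```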
All benchmarks are placed in the `/benches` folder. You can run benchmarks via the `make benchmarks` command. In addition, Vector maintains a full test harness for complex end-to-end integration and performance testing.
Please see the `SECURITY.md` file.
To protect all users of Vector, the following legal requirements are made.
Vector requires all contributors to agree to the DCO. DCO stands for Developer Certificate of Origin and is maintained by the Linux Foundation. It is an attestation attached to every commit made by every developer. It ensures that all committed code adheres to the Vector license (Apache 2.0).
Trivial changes, such as spelling fixes, do not need to be signed.
It is important to note that the DCO is not a license. The license of the project – in our case the Apache License – is the license under which the contribution is made. However, the DCO in conjunction with the Apache License may be considered an alternate CLA.
The existence of section 5 of the Apache License is proof that the Apache License is intended to be usable without CLAs. Users need the code to be open source, with all the legal rights that implies, but it is the open source license that provides this. The Apache License provides very generous copyright permissions from contributors, and contributors explicitly grant patent licenses as well. These rights are granted to everyone.
It's simpler, clearer, and still protects users of Vector. We believe the DCO more accurately embodies the principles of open-source. More info can be found here:
Nope! The DCO confirms that you are entitled to submit the code, which assumes that you are authorized to do so. It treats you like an adult and relies on your accurate statement about your rights to submit a contribution.
No probs! We made this simple with the `signoff` Makefile target:
make signoff
If you prefer to do this manually:
https://stackoverflow.com/questions/13043357/git-sign-off-previous-commits