Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scripts: zephyr_module: Add CPE support #66495

Conversation

tgagneret-embedded
Copy link
Contributor

This allows the possibility of:

  • Adding CPE for modules when generating the SBOM (SPDX)
  • Monitoring CVE by adding a CPE related to modules (e.g a module is a forked of an other repository)

cpe and version fields are extracted from module.yml and included in generated kernel meta file (making it available for SBOM generation).

This PR is related to issue #53479 and follows PR #66182 that aims to improve the SBOM output and better monitoring of vulnerabilities.


.. code-block:: yaml

vulnerabilities:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure vulnerabilities is the best key at this level.
It can give the impression that it will contain information regarding actual vulnerabilities, which is not the case.

You already propose a cpe entry, so why can we not provide the remote as a sub-field under there ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An evolution to vulnerabilities would be to add the field vex which would point to a vex file allowing to filter CVE.
Do you think it would fit in cpe field or should we find an other name (not sure what it would be though) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for sharing some thoughts for future direction of this.

With a vex (or vdr) field, then vulnerabilities starts to make more sense as name.
I don't have a better suggestion for another name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about simplysecurity:? security_info:?
I do agree with @tejlmand that "vulnerabilities" makes it sound like it lists vulnerabilities or something

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like security_info more.
Broad enough to cover the vex info, while still specific enough to be useful for readers to understand the context.

Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this addition.

Notice some minor parts to adjust before +1, namely version and first time use of abbreviations.

@@ -569,6 +569,36 @@ Build files located in a ``MODULE_EXT_ROOT`` can be described as:
This allows control of the build inclusion to be described externally to the
Zephyr module.

SBOM integration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we turn this around.

Instead of having SBOM integration, then describe what content is (example: Vulnerability monitoring), and then instead describe that this information will be included in the sbom.

The vulnerability monitoring / information can be of interest, even without an SBOM.

(Just a thought, not a request for a change)

The module description file :file:`zephyr/module.yml` can be used to improve the
generated SBOM and vulnerability monitoring.

The version and the CPE associated to the module can be provided using
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when using abbreviations like: CPE and CVE (and soon VEX), we should write them in full once, so that even users unfamiliar with the terms know what they mean.


.. code-block:: yaml

version: <module-version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the version out of SBOM integration.

version has value on its own, and should be considered first class citizen in the module yaml.
Just like name is independent: https://docs.zephyrproject.org/latest/develop/modules.html#module-name

@kartben kartben requested a review from kestewart January 8, 2024 15:34
@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch from c8f98bd to b1ad1b6 Compare January 9, 2024 13:45
@zephyrbot zephyrbot added the area: West West utility label Jan 9, 2024
@tgagneret-embedded
Copy link
Contributor Author

@tejlmand I've taken into account your review. I also added the integration of CPE and version in the SBOM since documentation talk about the SBOM.

I wonder if we should have some testing on the output SBOM ? For example we could generate it for each project built in the pipeline and then run pyspdxtools to validate them. That would be a start. What do you think ?

I'm not familiar with zephyr pipeline, could you point me to the correct files if you think testing is a requirement for this.

Thanks

@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch 2 times, most recently from 39be3c1 to 8fdd25f Compare January 10, 2024 14:33
@tgagneret-embedded
Copy link
Contributor Author

I wonder if I should add remote_purl to the vulnerabilities field ? That would make a lot of sense for project who only have github advisory database support (MCUBoot for example).

Maybe I could rename remote_cpe to external_references and the list could be either purl or cpe. Parsing would detect automatically which one it is.

I would keep cpe field but I don't know if a new purl field is necessary right now, since it would only make sense if someone mirrors modules of the project. Do you think it should be added as well ?

What do you think @tejlmand ?

@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch from d35e433 to 9512f31 Compare January 16, 2024 16:24
@tgagneret-embedded
Copy link
Contributor Author

I've added a new commit with the modification from my previous comment. If it's okay with you, I can squash them at the end.

tejlmand
tejlmand previously approved these changes Jan 19, 2024
Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

So minor doc suggestions but nothing blocking.

Module version
==============

Each Zephyr module should have a version number to improve tracking.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should seems a bit strong in this case, especially as it would indicate that modules are having some form of release cycles. This may be true for some modules, but other modules revision might be considered to be identified (thus indirectly follow the Zephyr version) through the SHA in the manifest, which again is identified by the Zephyr revision / version.

Suggested change
Each Zephyr module should have a version number to improve tracking.
Each Zephyr module may have a version number to improve tracking.


The module description file :file:`zephyr/module.yml` can be used for vulnerability monitoring
and improve the generated SBOM output.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good.

Maybe at this location specify the version: field should be used.
For example:

Suggested change
When a module monitors vulnerabilities then the module should also define the `version` field, :ref:`module_version`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment makes me think this version field creates confusion. There was a misunderstanding from my part in the original issue.

The goal was to provide a version for the module and help complete the SBOM for the module (and not searching vulnerabilities from the forked repository for example). The external-references field must be used for that (and CPE and PURL specifications allow to include the version)

I think we should remove the version field from this PR and I will clarify this for a next PR. Are you okay with this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there are some clarification (raised by your comment) to be done on this PR with the security group. In the meantime, I converted it into draft and I'll update it when I have all the information I need.

I'm sorry for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you believe it clarifies the usage and intentions of the field, then it's fine.

The clearer the intention of a PR, the better.

@ceolin
Copy link
Member

ceolin commented Jan 29, 2024

I tried

version: "3.5"
vulnerabilities:
  external-references: 
    - "cpe:2.3:a:arm:mbed_tls:*:*:*:*:*:*:*:*"
    - "pkg:https://github.com/Mbed-TLS/mbedtls/"
build:
  cmake-ext: True
  kconfig-ext: True 

in mbedTLS module and I couldn't see the cpe in the output, if I do:

version: "3.5"
cpe: "cpe:2.3:a:arm:mbed_tls:*:*:*:*:*:*:*:*"
build:
  cmake-ext: True
  kconfig-ext: True 

I can see the cpe, but my understanding is that we should see cpe for downstream and upstream packages right ? Though we will probably be using (at least, in beginning) the upstream CPE.

@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch from 9512f31 to 391ab83 Compare January 30, 2024 15:05
@tgagneret-embedded
Copy link
Contributor Author

@ceolin, the PR was not updated yet, I was waiting for your approval before updating it.

You can now try it.

To clarify, cpe and version fields were removed because it does not really make sense now. Only external-references has been added to the module.yml and my understanding from our last conversation, it's enough for our current use case.

@tejlmand I think you can use #53479 (comment) for a better understanding of the changes.

@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch from 0e5326c to b25a00c Compare April 15, 2024 10:06
@tgagneret-embedded
Copy link
Contributor Author

@ceolin @kartben It seems the PR is stalled. I think every comments have been taken into account. It's been over a month and there is no new review or comment.

Can you help me with this ?

Thanks

@kartben
Copy link
Collaborator

kartben commented Apr 16, 2024

@tgagneret-embedded Thank you for your patience. It's "unfortunate" you just pushed a new commit as this dismissed the existing reviews and will only contribute to merge taking even longer, as this now needs another (hopefully much faster though) round of reviews. That being said, it doesn't justify why things have been taking so long so I hope we'll get this merged soon.

@tejlmand, please do try to have a look at this PR again as it's been several months since it's been waiting for your approval as the assignee.

Comment on lines 57 to 62
if len(revision) == 0 or url.startswith("git://"):
return url

url = url.replace("https://", "git+https://")
url = url.replace("http://", "git+http://")
url = "@".join([url, revision])
Copy link
Collaborator

@kartben kartben Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this handles properly the case where url already starts with "git+http"? Shouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL si extracted from meta file. The URL is the one for cloning the repository. I don't think it can have the url already starting with "git+".
This function is not generic and takes into account that this script is run in zephyr: only git is supported, and url is parsed from git remote command (and maybe using west list values later)

So, do you think I should add a special case for the URL that starts with "git+" (even though I don't think it can happen in our case) ?
Maybe you see a special case I did not see.

@kartben
Copy link
Collaborator

kartben commented Apr 16, 2024

@tgagneret-embedded #70581 was just merged causing the need for this PR to be rebased.

Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better. Still a couple of observations.

As for the last two commits, can you please squash those into the appropriate parent commits ?
When reviewing commit-by-commit, then it's not nice to suddenly see code that one comment suddenly being changed in later commit in same PR.

Comment on lines 98 to 94
def _normalize_module_name(self, module_name):
# Replace "_" by "-" since it's not allowed in spdx ID
return module_name.replace("_", "-")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the normalized path goes into the cfgPackageZephyrModule.

I understand that if _ is not allowed directly in the spdx IP, then we must normalize, but then I as minimum think we should have a field which provides the module name without normalization, so that tools can process this information.

The problem with _ -> -, is that you cannot go back once the normalization has been performed. (some modules has - in their name).

@@ -216,39 +249,67 @@ def setupZephyrDocument(self, modules):
cfgPackageZephyr.spdxID = "SPDXRef-zephyr-sources"
cfgPackageZephyr.relativeBaseDir = relativeBaseDir

zephyr_url = zephyr.get("remote", "")
if zephyr_url:
cfgPackageZephyr.url = zephyr_url
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if looking at the examples given in 7.7.3, then it states:

For Git:

SPDX supported schemes are: git, git+git, git+https, git+http, and git+ssh. git and git+git are equivalent.

Here are the supported forms:

PackageDownloadLocation: git://git.myproject.org/MyProject

PackageDownloadLocation: git+https://git.myproject.org/MyProject.git

PackageDownloadLocation: git+http://git.myproject.org/MyProject
...

so http:// or https:// without git, doesn't seem to be sufficient.

remotes_name = stdout.rstrip().split('\n')

remote_url = None
if len(remotes_name) == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the approach of looking for a single remote is correct.

west actually uses the url for fetching (although it does create a remote for user convenience if none exists).

This means that the url used by west update may be different than the url described by a single remote.

Thus, a much safer approach is to use the url returned by west list when the Zephyr module is also a west project.

Note, a Zephyr module doesn't have to be a west project, also it doesn't even need to be a git repo.
So those cases must also be properly handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so http:// or https:// without git, doesn't seem to be sufficient.

The URL is transformed in writer.py because it's related to SPDX.
Just like you said that - should be replaced in writer.py and not in walker.py.

Note, a Zephyr module doesn't have to be a west project, also it doesn't even need to be a git repo.
So those cases must also be properly handled.

I agree, but before my PR, only git repository was supported. The only way to get the revision was through the git_revision function. So even if I agree with you, I would prefer to keep the current state of support and extend it later if possible.

I will have a look at how to get the same data returned by west list, however I would need to know how I should handle the case where one or more remotes are added manually to a module.
It echoes a previous comment but:
What if a repository is checked out to a commit presents in a different remote than the one provided by west list. What if the commit is in multiple remotes (and not necessarily in the one listed in west list) ? Which remote should I choose ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a repository is checked out to a commit presents in a different remote than the one provided by west list.

That is why the build output meta appends -off to the commit when the sha used is different from the sha defined in the manifest.
https://docs.zephyrproject.org/latest/kconfig.html#CONFIG_BUILD_OUTPUT_META

Even with your proposal of relying on a single remote will provide wrong information in case a user does a local commit, because in such a case the sha is not available at the remote.

Also users can do a git fetch <url> followed by a git checkout FETCH_HEAD.
In such a case there may be a single remote, but a commit from different url is checked out locally.
Such workflow is very common for testing PRs or branches.

What if the commit is in multiple remotes (and not necessarily in the one listed in west list) ? Which remote should I choose ?

When west is used, then use the west remote if the sha is identical to the sha defined by the current manifest.
Because in that case we know the sha can be reached by that remote.
If it's not identical, then consider the sha -off. No reason to re-invent what CONFIG_BUILD_OUTPUT_META already provides. (Have you looked at this feature at all ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use the -off. Thanks.
So, if I understand correctly, even though there is multiple remotes, I will use the remote returned by west list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tejlmand I don't think there is any remote provided by west for the manifest repository.

If running west init -m https://github.com/zephyrproject-rtos/zephyr.git, west list does not provide any remote for the manifest. I didn't find anything in west code to have this information (command or function), you can only extract the manifest directory. The only solution I have is to run git remote in the manifest directory to get it.

So if I have two remotes (for example, zephyr remotes and the forked zephyr to add my feature), I don't have any clue on which remotes I should use.

What should I do in this case ? No remote ? The first one ?

Thanks.


revision, dirty = git_revision(path)
workspace_dirty |= dirty
remote, dirty = git_remote(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that a revision can be dirty, but what does dirty mean in context of a remote ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have multiple remotes in one module, how you make sure the current commit is from one remote or an other ?

One solution could be to usegit branch -a --contains <commit id>, but I'm not sure it's reliable to determine if it's one commit is from a specific remote.
What happens if the commit is present in multiple remote ? Which one should I choose ?

So if I detect multiple remote, I mark the repository has "dirty" and no remote is selected.

How do you think we should handle this case (where remotes are added manually) ?

Copy link
Collaborator

@tejlmand tejlmand Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have multiple remotes in one module, how you make sure the current commit is from one remote or an other ?

You can't. A given commit SHA is unique and can exists in multiple repositories, that's the nature of git.
But that doesn't make the remote "dirty".

In fact, you should not rely on the remote at all for this.
If you want to rely on something, then you should the remote from the west project, if the module is also a west project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't.

I was thinking of a fork. So it's possible :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't.

I was thinking of a fork. So it's possible :)

Not necessarily, and especially not with github.

If forking a repo using github's fork feature, then you can actully fetch SHAs directly regardless of the fork to which it was pushed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I misread your first message. I completely agree with you.


.. code-block:: yaml

vulnerabilities:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like security_info more.
Broad enough to cover the vex info, while still specific enough to be useful for readers to understand the context.

@kartben
Copy link
Collaborator

kartben commented Apr 23, 2024

re: @tejlmand's comment below (which I can't reply to as it's "outdated)

I like security_info more.
Broad enough to cover the vex info, while still specific enough to be useful for readers to understand the context.

It's now security, actually, but it looks like the commit that introduced it hasn't been squashed properly so the initial name also shows up in the history.

@tgagneret-embedded
Copy link
Contributor Author

@tejlmand I don't think there is any remote provided by west for the manifest repository.

If running west init -m https://github.com/zephyrproject-rtos/zephyr.git, west list does not provide any remote for the >manifest. I didn't find anything in west code to have this information (command or function), you can only extract the >manifest directory. The only solution I have is to run git remote in the manifest directory to get it.

So if I have two remotes (for example, zephyr remotes and the forked zephyr to add my feature), I don't have any clue on >which remotes I should use.

What should I do in this case ? No remote ? The first one ?

Thanks.

On top of this question, should we use projects instead of modules in the following code (for the content of zephyr.spdx) ?

https://github.com/zephyrproject-rtos/zephyr/blob/b306e80e00d71196913228cc96cecb717d5357a1/scripts/west_commands/zspdx/walker.py#L292C44-L292C60

Maybe zephyr.spdx should contains only the modules. Thus should we make a new spdx for the projects ?

For example, when we use an external repository for the manifest, this manifest won't appear in any SPDX/SBOM. So where should it appears ? In a broader sense, if someone add new projects in manifest, it's not listed in any SPDX/SBOM.

By the way it seems that all modules are listed as a project. I don't know if it's possible to have a module not listed as a project ?

I think the creation of a new SPDX projects.spdx makes sense, however I'm not sure zephyr.spdx would still be useful.
Does anyone have a suggestion about this ?

Thanks

@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch 2 times, most recently from afc19fe to b145252 Compare May 15, 2024 11:14
@tgagneret-embedded
Copy link
Contributor Author

@tejlmand

  • I now support the "-off" flag when revision is not the same as the one present in the manifest.
  • I use the URL from the manifest if I have no remote, more than one or a different remote (and add the "-off" flag to the revision)
  • Modules now support the "-off" (Since modules provided by west does not provide the URL, I check if the module is not in the project list to get the required information)
  • I fixed the "normalization of the SPDX field" and move it to writer.py (instead of walker.py). I had to normalize the relationships too, so it's not very pretty in my opinion. It's in a separate commit (b145252) and I will drop it if you are not satisfied.

Improve the SPDX with the current values:
 - URL: extracted from `git remote`. If more than one remote, URL is not
 set.
 - Version: extracted from `git rev-parse` (commit id).
 - PURL and CPE for Zephyr: generated from URL and version.

For zephyr, the tag is extracted, if present, and replace the commit id for
the version field.
Since official modules does not have tags, tags are not yet extracted for
modules.

To track vulnerabilities from modules dependencies, a new SBOM,
`modules-deps.spdx` was created. It contains the `external-references`
provided by the modules. It allows to easily track vulnerabilities from
these external dependencies.

Signed-off-by: Thomas Gagneret <[email protected]>
Since `writer.py` is the one writting the SPDX file, it should normalize
the name field and not `walker.py` which generates the SBOM components.

Signed-off-by: Thomas Gagneret <[email protected]>
@tgagneret-embedded tgagneret-embedded force-pushed the feature/update_meta_file_with_additional_information branch from b145252 to 9799c08 Compare May 28, 2024 11:45
@tgagneret-embedded
Copy link
Contributor Author

Much better. Still a couple of observations.

As for the last two commits, can you please squash those into the appropriate parent commits ? When reviewing commit-by-commit, then it's not nice to suddenly see code that one comment suddenly being changed in later commit in same PR.

I fixed the "normalization of the SPDX field" and move it to writer.py (instead of walker.py). I had to normalize the
relationships too, so it's not very pretty in my opinion. It's in a separate commit (b145252) and I will drop it if you are not > satisfied.

@tejlmand rebase done except for the last commit (after rebase: 9799c08). Let me know what I should do with it.

@tgagneret-embedded
Copy link
Contributor Author

@tejlmand Can you tell me if there anything missing from my part for you to review this PR ? If so, let me know, I would be happy to fix it. This PR is opened for a long time now 😄.

Thanks

@kartben
Copy link
Collaborator

kartben commented Jun 14, 2024

ping @tejlmand

@kartben kartben added this to the v3.7.0 milestone Jun 14, 2024
@nashif nashif dismissed tejlmand’s stale review June 14, 2024 23:07

concerns addressed

@nashif nashif merged commit f5df063 into zephyrproject-rtos:main Jun 14, 2024
31 checks passed
@kartben
Copy link
Collaborator

kartben commented Jun 24, 2024

@tgagneret-embedded although it's getting late in the release cycle for doing this, would it make sense to submit updates to some of the manifests of the modules where this new feature would be useful (i.e CPE/PURLs actually exist)? mbedTLS certainly comes to mind. Thanks!

@tgagneret-embedded
Copy link
Contributor Author

I was thinking of this but I wasn't sure it's the best time. I'm afraid it would break some build. If we add CPE information in mbedTLS module for example and it's used with older version of Zephyr, an error will occurs when parsing the module.yml (because it will find unknown field).

I don't know how modules are handled, and if my use case is possible. If you think it's safe, I can update some module.yml.
What do you think ?

@tgagneret-embedded
Copy link
Contributor Author

I was thinking of this but I wasn't sure it's the best time. I'm afraid it would break some build. If we add CPE information in mbedTLS module for example and it's used with older version of Zephyr, an error will occurs when parsing the module.yml (because it will find unknown field).

I don't know how modules are handled, and if my use case is possible. If you think it's safe, I can update some module.yml. What do you think ?

@kartben

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants