July 10th | Open Source Gardening | Live Stream

:wave: Hello everyone!

We’re back with the Anchore Open Source team running a live stream to discuss issues, pull requests, and future roadmap planning in our SBOM and vulnerability tools.

:alarm_clock: Starts at 2025-07-10T19:00:00Z and runs for about an hour.

Expect engineering and project management discussions, plus a bit of GitHub issue gardening on Syft, Grype, and the rest of the family.

Join us today for a relaxed, educational, and productive live stream.

Topics

Here are the notes from our recent Open Source community stream, where we tackled some interesting issues in Syft and Grype.

It was great to have Will, Chris, Alex, and Keith from the team on the stream. Here’s a summary of what we discussed.


Syft Issue #4031: Extract package-level usage from go binary for more accurate license information

The team kicked off with a complex Go licensing issue. The problem arises when a Go module contains sub-packages with different licenses. Syft currently catalogs Go dependencies at the module level, so it might only report the primary license of the module, potentially missing more restrictive (or permissive) licenses of specific packages used within the binary.

A key part of the discussion, led by Will Murphy, revolved around the terminology clash between Syft/SBOMs and the Go ecosystem. In Syft, a “package” is the discoverable unit of software, which for Go is a “module”. However, Go modules themselves contain Go “packages”.

The team explored two main solutions:

  1. Start cataloging individual Go packages as distinct entries in the SBOM.
  2. Continue cataloging at the Go module level but attach all discovered licenses (from all sub-packages) to that single module entry.

Chris Phillips pointed out that vulnerability matching happens at the module level, which supports keeping the cataloging at that level. The team agreed that creating SBOM entries for every single Go package could lead to an “explosion” in the size of the SBOM, which isn’t ideal.

The root of the user’s issue is likely related to license policies. They may want to use a module that has a generally restrictive license (like AGPL) but contains some sub-packages with a more permissive license (like Apache 2.0). By only identifying the top-level license, their tooling might incorrectly block usage.

Outcome: The consensus was to avoid creating entries for every Go package. Instead, the team will propose enhancing the Go module artifact in Syft to include a list of all licenses found within its sub-packages. Will is going to follow up on the GitHub issue with this proposal.
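
To make the proposal concrete, here is a minimal sketch of rolling package-level licenses up to a single module entry. It is not Syft code: the PackageLicense type, the ModuleLicenses function, and the example.com paths are all hypothetical, and matching packages to a module by import path prefix is a simplification that ignores nested modules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PackageLicense is a hypothetical record pairing one Go package import path
// with the license detected for it. It is not a Syft type.
type PackageLicense struct {
	ImportPath string
	License    string
}

// ModuleLicenses returns the distinct licenses found in any package that
// belongs to modulePath, sorted for stable output.
func ModuleLicenses(modulePath string, pkgs []PackageLicense) []string {
	seen := map[string]bool{}
	for _, p := range pkgs {
		// Assumption: a package belongs to the module if its import path is
		// the module path itself or sits below it.
		if p.ImportPath != modulePath && !strings.HasPrefix(p.ImportPath, modulePath+"/") {
			continue
		}
		seen[p.License] = true
	}
	licenses := make([]string, 0, len(seen))
	for l := range seen {
		licenses = append(licenses, l)
	}
	sort.Strings(licenses)
	return licenses
}

func main() {
	pkgs := []PackageLicense{
		{"example.com/mod", "AGPL-3.0-only"},
		{"example.com/mod/client", "Apache-2.0"},
		{"example.com/mod/internal/util", "AGPL-3.0-only"},
		{"example.com/other", "MIT"},
	}
	// One module entry, carrying every license seen in its packages.
	fmt.Println(ModuleLicenses("example.com/mod", pkgs)) // [AGPL-3.0-only Apache-2.0]
}
```

With a list like this on the module entry, a license policy tool could at least see that the module is not uniformly AGPL, without the SBOM growing by one entry per Go package.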


Grype Issue #2707: Change behaviour in Grype

Next, we looked at an issue where a Grype update caused a significant change in vulnerability scan results.

Will Murphy identified the cause: a change in how Grype handles the distro field in Package URLs (purls). The user’s SBOM contained purls with distro=buster. Grype was taking this literally and searching for vulnerabilities for a distro named “Buster”, instead of resolving it to the official “Debian 10”. Since “Buster” isn’t a known distro ID in the Grype-DB, no matches were found.

The team agreed this was incorrect and that Grype should be smart enough to resolve common distro code names. This capability is data-driven and already exists within Grype’s underlying libraries.

Outcome: Will confirmed the intended behavior is to resolve distro code names. He left a comment on the issue to confirm this and removed the “needs-discussion” label.
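
As an illustration of the intended behavior, the sketch below resolves a distro code name to a version number. It is not Grype's implementation: the hand-written codenames table and the ResolveVersion function are assumptions made for this example, whereas the notes above say Grype's real capability is data-driven.

```go
package main

import (
	"fmt"
	"strings"
)

// codenames maps a distro ID to its release code names. This table is
// hand-written for illustration and covers only a few Debian releases.
var codenames = map[string]map[string]string{
	"debian": {"buster": "10", "bullseye": "11", "bookworm": "12"},
}

// ResolveVersion turns a purl distro qualifier such as "buster" into a
// version number for the given distro ID. ok is false when the code name is
// not in the table.
func ResolveVersion(distroID, qualifier string) (version string, ok bool) {
	version, ok = codenames[strings.ToLower(distroID)][strings.ToLower(qualifier)]
	return version, ok
}

func main() {
	// The qualifier as it might appear in a purl ending in ?distro=buster.
	if v, ok := ResolveVersion("debian", "buster"); ok {
		fmt.Printf("debian %s\n", v) // debian 10
	}
	if _, ok := ResolveVersion("debian", "not-a-release"); !ok {
		fmt.Println("no code name match")
	}
}
```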


Syft Issue #3955: Overriding / skipping of OS detection

This issue proposes an option to bypass OS detection in Syft. This is primarily for scenarios like scanning a source code repository where OS detection isn’t relevant and can produce an empty, unhelpful distro object in the SBOM.

The team discussed whether this was a hypothetical problem or if the current behavior was causing a real-world issue for the user.

Outcome: The team decided more information was needed to understand the user’s specific pain point. Keith Zantow is going to ask for clarification on the issue to understand what problem is caused by the current behavior.


Syft Issue #4005: merge file digests

The final topic was a big one: merging SBOMs and how to handle file digests. When Syft scans a file directly, it generates a SHA256 hash of the content to use as a unique ID. The user who filed the issue was concerned about the performance cost of this hashing.

Will Murphy and Alex Goodman discussed the philosophy here. The content hash is crucial because it makes the artifact ID “content addressable,” which is the only reliable way to know if the file you’re scanning today is the exact same one you scanned yesterday.
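
A minimal sketch of what “content addressable” means in practice: the ID is derived only from the bytes of the file, so identical content always yields the same ID. The FileID function and the "sha256:" prefix are assumptions for this example and do not reflect how Syft formats its artifact IDs. A real implementation would likely stream the file rather than hold it in memory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileID derives an identifier purely from file content. The "sha256:" prefix
// is an illustrative convention, not Syft's actual ID format.
func FileID(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	yesterday := FileID([]byte("binary contents v1"))
	today := FileID([]byte("binary contents v1"))
	changed := FileID([]byte("binary contents v2"))

	fmt.Println(yesterday == today)   // true: same bytes, same ID
	fmt.Println(yesterday == changed) // false: any change yields a new ID
}
```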

Keith Zantow suggested a potential optimization: if the file digest cataloger is enabled, Syft might already be hashing the file for other reasons, and that hash could be reused to avoid re-computing it.
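
One way to picture that optimization is a small per-scan cache, sketched below. DigestCache and its methods are hypothetical and not part of Syft. Keying by path is only safe under the assumption that file content does not change during a single scan.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// DigestCache remembers SHA-256 digests already computed during one scan,
// keyed by file path. Hypothetical type for illustration only.
type DigestCache struct {
	digests map[string]string
	Hashed  int // how many times content was actually hashed
}

// NewDigestCache returns an empty cache.
func NewDigestCache() *DigestCache {
	return &DigestCache{digests: map[string]string{}}
}

// Digest returns the cached digest for path, calling read and hashing the
// content only on the first request for that path.
func (c *DigestCache) Digest(path string, read func() ([]byte, error)) (string, error) {
	if d, ok := c.digests[path]; ok {
		return d, nil
	}
	content, err := read()
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(content)
	d := hex.EncodeToString(sum[:])
	c.digests[path] = d
	c.Hashed++
	return d, nil
}

func main() {
	cache := NewDigestCache()
	read := func() ([]byte, error) { return []byte("file contents"), nil }

	fileDigest, _ := cache.Digest("/bin/app", read) // e.g. a file digest cataloger
	artifactID, _ := cache.Digest("/bin/app", read) // e.g. deriving the artifact ID

	fmt.Println(fileDigest == artifactID, cache.Hashed) // true 1
}
```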

Alex raised a broader point: Syft’s primary goal is to report what it finds during a scan. When we start merging data from other SBOMs (like file lists and digests), we are no longer strictly reporting on the scanned source material. While there is precedent for this with software packages, the bar is higher for files, which should represent direct evidence.

Outcome: This is a complex issue with philosophical implications for Syft’s behavior. The team agreed that the ability to merge SBOM data is powerful but needs to be handled carefully. The discussion will continue on the GitHub issue, with Keith taking the lead on summarizing the conversation.