I’m currently working on an architecture where we generate SBOMs using Syft from a large number of different systems, Docker images, and full operating systems. We may have hundreds of them, and we want to run scans regularly (for example, periodic vulnerability re-evaluation).
From both a performance and correctness perspective, what would be the recommended workflow for large-scale recurring scans?
At the moment, we store part of the SBOM content in a database. We are considering storing more of each SBOM so that we can reconstruct a “global” SBOM and scan it once per day. The idea would be to reduce the number of scans by scanning a single SBOM for the entire system instead of scanning each individual system separately.
However, this raises several questions:
Does Grype rely strictly on a single distribution (distro) context per SBOM?
What happens if a reconstructed SBOM accidentally contains packages from different Linux distributions?
Would Grype misinterpret vulnerability matches if the distro metadata does not align with all OS packages included in the SBOM?
Is there any supported multi-distro SBOM model, or should each distro context always remain strictly isolated?
More generally, do you recommend a completely different approach for handling large numbers of SBOMs efficiently and correctly?
We talked about this on the livestream this week. There’s probably some information I’ve left out here, but to answer your specific questions:
Does Grype rely strictly on a single distribution (distro) context per SBOM?
Is there any supported multi-distro SBOM model, or should each distro context always remain strictly isolated?
Grype does not require a single distro: each package can specify its distro as part of its Package URL (PURL) when Grype reads SBOMs (including PURL lists). A package-level distro should override an SBOM-wide distro, but keep me honest and try it out. If you pass --distro on the command line, it may override anything found in the SBOM; I’m not even sure what the ideal behavior would be in the various cases where different distros are specified at the package, SBOM, and CLI levels.
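To make the per-package distro a bit more concrete: OS package PURLs can carry the distro as a "distro" qualifier after the "?". The sketch below (standard-library Python, a simplified parser rather than a full PURL implementation) pulls that qualifier out of each PURL and buckets packages by it, which is one way to check whether a reconstructed SBOM or PURL list accidentally mixes distros before you scan it. The example PURLs and the "distro=<id>-<version>" value format are illustrative assumptions, so check what your Syft version actually emits; distro_of and group_by_distro are hypothetical helper names, not part of Syft or Grype.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import parse_qs


def distro_of(purl: str) -> str | None:
    """Return the value of the PURL's "distro" qualifier, or None if absent.

    Simplified parsing (assumption): drop any "#subpath", then treat
    everything after the first "?" as URL-encoded qualifiers.
    """
    without_subpath = purl.split("#", 1)[0]
    if "?" not in without_subpath:
        return None
    qualifiers = parse_qs(without_subpath.split("?", 1)[1])
    values = qualifiers.get("distro")
    return values[0] if values else None


def group_by_distro(purls: list[str]) -> dict[str | None, list[str]]:
    """Bucket PURLs by distro qualifier; the None key collects packages without one."""
    groups: dict[str | None, list[str]] = defaultdict(list)
    for purl in purls:
        groups[distro_of(purl)].append(purl)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical PURLs from two different hosts plus a language package.
    purls = [
        "pkg:deb/debian/libc6@2.31-13?arch=amd64&distro=debian-11",
        "pkg:rpm/redhat/openssl@1.1.1k-7.el8?arch=x86_64&distro=rhel-8.6",
        "pkg:npm/lodash@4.17.21",
    ]
    for distro, members in group_by_distro(purls).items():
        print(distro, len(members))
```

More than one non-None bucket in a list you expected to be single-distro is a hint that packages from different systems have been merged together.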
What happens if a reconstructed SBOM accidentally contains packages from different Linux distributions?
Would Grype misinterpret vulnerability matches if the distro metadata does not align with all OS packages included in the SBOM?
Grype will not match accurately if packages are looked up under the wrong distro. Version data differs quite a bit across NVD, GHSA, and even between distros. Always use the correct distro so that the right versions are compared using the right version scheme.
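As a small illustration of why the version scheme matters: Debian-family packages are ordered by dpkg’s rules, which differ from plain string ordering (and from RPM or semver rules). The sketch below is an illustrative Python port of dpkg’s comparison logic, not Grype’s code; it compares a few version pairs under dpkg rules and under naive string comparison. The version strings are made up for the example.

```python
"""Illustrative port of dpkg's version-comparison rules (not Grype's code)."""
from __future__ import annotations

import string


def _char_at(s: str, i: int) -> str:
    """Return s[i], or "" past the end (mirrors dpkg reading the NUL terminator)."""
    return s[i] if i < len(s) else ""


def _is_digit(c: str) -> bool:
    return c != "" and c in string.digits


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run, following dpkg."""
    if c == "" or _is_digit(c):
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1  # "~" sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the leading non-digit run character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac, bc = _order(_char_at(a, i)), _order(_char_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Compare the following digit run numerically, ignoring leading zeros.
        while _char_at(a, i) == "0":
            i += 1
        while _char_at(b, j) == "0":
            j += 1
        first_diff = 0
        while _is_digit(_char_at(a, i)) and _is_digit(_char_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(_char_at(a, i)):
            return 1  # a's number has more digits, so it is larger
        if _is_digit(_char_at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split "[epoch:]upstream[-revision]" into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def dpkg_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1 as Debian version a is lower than, equal to, or higher than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    result = (ea > eb) - (ea < eb) or _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)


if __name__ == "__main__":
    for a, b in [("1.0~rc1-1", "1.0-1"), ("1:1.0-1", "2.0-1"), ("1.10-1", "1.9-1")]:
        plain = (a > b) - (a < b)  # naive lexicographic string comparison
        print(f"{a} vs {b}: dpkg={dpkg_compare(a, b)}, plain string={plain}")
```

For all three pairs the dpkg result and the plain string result disagree (a "~" pre-release, an epoch, and a multi-digit component respectively), which is the kind of mismatch you get when a package is evaluated under the wrong scheme.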
More generally, do you recommend a completely different approach for handling large numbers of SBOMs efficiently and correctly?
The livestream goes into much more detail here, but essentially there are two factors: performance and correctness. If there is a lot of package duplication across your systems, combining and deduplicating the packages would help performance. A simple way to do that is with PURL lists: a file with one PURL per line, which can easily be deduplicated with sort -u or similar (note that uniq on its own only removes adjacent duplicates, so the input must be sorted first). Syft is able to convert SBOMs to a PURL list. However, PURLs are not as good as full SBOMs in all cases: some matching and exclusion logic relies on information about related packages, which presently exists only in the full SBOM. I don’t think much other metadata used in matching today is missing from the PURLs, but in the future there may be binary symbols or other metadata that cannot easily be expressed in a PURL, so using the full SBOM will always be better for accuracy.
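If you would rather run the deduplication step inside a scheduled job than as a shell pipeline, here is a minimal Python sketch of the same idea. It assumes each input is a PURL list with one PURL per line (for example, produced by Syft from each system’s SBOM); the script name, file names, and the helpers read_purl_file and dedupe_purls are hypothetical.

```python
from __future__ import annotations

import sys
from collections.abc import Iterable
from pathlib import Path


def read_purl_file(path: Path) -> list[str]:
    """Read a newline-separated PURL list, ignoring blank lines."""
    return [line.strip() for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]


def dedupe_purls(purl_lists: Iterable[Iterable[str]]) -> tuple[list[str], int]:
    """Merge several PURL lists into one sorted, de-duplicated list.

    Returns the unique PURLs and the total number of non-blank entries read,
    so you can see how much duplication there was. PURLs are compared as exact
    strings, so the same package with a different distro qualifier stays separate.
    """
    unique: set[str] = set()
    total = 0
    for purls in purl_lists:
        for purl in purls:
            purl = purl.strip()
            if not purl:
                continue
            total += 1
            unique.add(purl)
    return sorted(unique), total


if __name__ == "__main__":
    # Hypothetical usage: python merge_purls.py host-a.purls host-b.purls > all.purls
    merged, total = dedupe_purls(read_purl_file(Path(p)) for p in sys.argv[1:])
    sys.stdout.write("".join(f"{p}\n" for p in merged))
    print(f"{total} entries -> {len(merged)} unique", file=sys.stderr)
```

Comparing PURLs as exact strings is deliberate: two hosts running the same package version on different distros keep separate entries, so each is still matched against its own distro’s data. As noted above, the merged list still lacks the package relationships that only a full SBOM carries.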