Hello!
I’m scanning some images, and since the SBOMs can be very heavy for my machine, I sometimes remove some files from the SBOM (only from the files section, not from the artifacts field). I do this with a script after the SBOM is generated.
In some cases, when I then scan the SBOM with Grype, I receive warnings:
[0000] WARN "relationship mapping to key e5e73179f615e551 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
[0000] WARN "relationship mapping to key 1f8c328c4e610285 is not a valid artifact.Identifiable type: <nil>" occurred 1 time(s)
My question is: can this broken mapping affect Grype’s results (number of CVEs, etc.)?
The message itself comes from a Syft function named “Decode”.
It sounds like you’ve removed some files that are referenced in the SBOM. Depending on which files are removed and which distro you’re on, this could reintroduce some false positives in the following way:
Grype detects that an OS package (rpm, dpkg, apk, etc.) “owns” a language package by detecting that the files owned by the OS package overlap with the files that provide evidence of the language package. (For example, if you yum install python3-urllib, Syft will find both an RPM and a PyPI package.)
For some distros, Grype will remove the language package in favor of the OS package in this situation, because the distro feed for the OS package may know about a back-ported fix that the GHSA data about PyPI packages doesn’t have. (For example, Red Hat might have backported a fix to the python3-urllib RPM that’s not present in the PyPI package even at the same version.)
If you removed the files that were part of that overlap, Grype will not be able to remove the PyPI package in favor of the OS package, and you could get false positives if the distro has back-ported fixes that aren’t present in GHSA.
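To make the overlap idea concrete, here is a toy sketch in Python. It is not Grype’s or Syft’s actual code: the function name find_owners, the data layout, and the file paths are all made up for illustration, and the real logic has more rules than a simple set intersection.

```python
# Toy model of "ownership by file overlap"; NOT Grype's or Syft's real code.
# The function name, data layout and paths are hypothetical.

def find_owners(os_pkg_files, lang_pkg_evidence):
    """Map each language package to the first OS package whose owned
    files overlap with the language package's evidence files."""
    owners = {}
    for lang_pkg, evidence in lang_pkg_evidence.items():
        for os_pkg, owned_files in os_pkg_files.items():
            if evidence & owned_files:  # any shared path counts as overlap
                owners[lang_pkg] = os_pkg
                break
    return owners


metadata = "/usr/lib/python3.9/site-packages/urllib-1.0.dist-info/METADATA"
rpm_files = {"python3-urllib": {metadata}}

# Evidence intact: the RPM is seen as the owner of the PyPI package.
print(find_owners(rpm_files, {"urllib": {metadata}}))

# Evidence stripped from the SBOM: no overlap, so nothing is owned and the
# PyPI package would be matched on its own.
print(find_owners(rpm_files, {"urllib": set()}))
```

In the second call the language package has no owner, which is the situation where the back-ported fix known to the distro feed can no longer suppress the GHSA match.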
Which files do you want to include? I think by default, Syft includes only files which are referenced by packages and includes both SHA-1 and SHA-256 hashes of these files, but does not include other files present in the scan source. Wouldn’t this be what you want instead of telling Syft to include all file hashes?
I’m doing some manipulation on the SBOM after it is generated. I want to collect metadata on some files, so I try to remove the other files, which are not always relevant, to reduce the size of the SBOM. So generally I want to get all files and then filter them. Is there any field that indicates which files are connected to a package, so that I don’t remove them?
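One possible approach, sketched below in Python, is to keep every file whose ID is referenced by a relationship and only drop the rest. This assumes the Syft JSON layout has a top-level "files" list whose entries carry an "id", and an "artifactRelationships" list whose entries carry "parent" and "child" IDs; please check this against the schema version your Syft emits. The helper names (referenced_ids, prune_files, keep_extra) and the sample data are hypothetical.

```python
# Sketch only: field names are assumed from the Syft JSON format and should
# be verified against your own SBOMs. Helper names are hypothetical.

def referenced_ids(sbom):
    """Collect every ID used as the parent or child of a relationship."""
    ids = set()
    for rel in sbom.get("artifactRelationships", []):
        ids.add(rel.get("parent"))
        ids.add(rel.get("child"))
    return ids


def prune_files(sbom, keep_extra=lambda file_entry: False):
    """Return a copy of the SBOM whose "files" list keeps only entries that
    a relationship points at, plus any entry keep_extra() asks to keep."""
    keep = referenced_ids(sbom)
    files = [f for f in sbom.get("files", [])
             if f.get("id") in keep or keep_extra(f)]
    return {**sbom, "files": files}


# Tiny made-up SBOM: file-a is tied to a package, file-b and file-c are not.
sample = {
    "artifacts": [{"id": "pkg-1", "name": "urllib"}],
    "artifactRelationships": [
        {"parent": "pkg-1", "child": "file-a", "type": "contains"},
    ],
    "files": [
        {"id": "file-a", "location": {"path": "/usr/lib/python3.9/site.py"}},
        {"id": "file-b", "location": {"path": "/var/log/install.log"}},
        {"id": "file-c", "location": {"path": "/etc/app/config.yaml"}},
    ],
}

# Keep package-related files, plus anything under /etc that I care about.
pruned = prune_files(
    sample, keep_extra=lambda f: f["location"]["path"].startswith("/etc/"))
print([f["id"] for f in pruned["files"]])
```

With a real SBOM you would load the file with json.load, call prune_files, and write the result back with json.dump. Because no referenced file is removed, the relationships should still resolve when Grype decodes the SBOM.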