I am trying to understand why Grype uses the rpmSource / source fields. It seems to duplicate some CVEs in some cases across similar packages.
Hi @TimBrown1611, thanks for the question!
In general, some vendors report vulnerabilities against a source RPM instead of against every individual RPM. For this reason, Grype considers source RPMs when matching. This type of match is called an “exact-indirect-match” in Grype’s JSON output, and an “Indirect match” in the output of grype explain. You can run grype explain on the IDs that surprise you, like this: grype -o json my-image | grype explain --id CVE-1234-1234. That asks Grype to explain why it matched.
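To make the direct vs. indirect distinction concrete, here is a minimal Python sketch, not Grype’s implementation. It assumes vulnerability data is a plain dict keyed by package name and that each binary RPM records its source RPM file name in the usual name-version-release.src.rpm form. It skips version comparison entirely, and RpmPackage, source_name and find_matches are hypothetical names.

```python
"""Hypothetical sketch of direct vs. indirect (source RPM) matching.

Assumptions: vulnerability data is a dict keyed by package name, and each
binary RPM records its source RPM file name. Version checks are omitted.
This is an illustration, not Grype's implementation."""
from dataclasses import dataclass


@dataclass
class RpmPackage:
    name: str
    source_rpm: str  # e.g. "kernel-4.18.0-513.el8.src.rpm"


def source_name(source_rpm: str) -> str:
    """Return the package name from a source RPM file name (N-V-R.src.rpm)."""
    nvr = source_rpm.removesuffix(".src.rpm")
    # The name itself may contain dashes, so split version and release off the right.
    name, _version, _release = nvr.rsplit("-", 2)
    return name


def find_matches(pkg: RpmPackage, vulns_by_package: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (vulnerability ID, match kind) pairs for one installed package."""
    matches = [(vid, "direct") for vid in vulns_by_package.get(pkg.name, [])]
    upstream = source_name(pkg.source_rpm)
    if upstream != pkg.name:
        # Also look up vulnerabilities reported against the source package: an indirect match.
        matches += [(vid, "indirect") for vid in vulns_by_package.get(upstream, [])]
    return matches


vulns = {"kernel": ["CVE-1234-1234"]}
print(find_matches(RpmPackage("kernel-core", "kernel-4.18.0-513.el8.src.rpm"), vulns))
# [('CVE-1234-1234', 'indirect')]
```

This is why a CVE filed only against the kernel source RPM shows up on kernel-core, kernel-modules and the other binary packages built from it.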
If you’d like to discuss this in more detail, could you share some particular examples? Thanks!
Hi, thanks for the answer!
I’m trying to scan some VMs, and sometimes I see the same CVEs appear in different packages. For example:
I have 1481 CVEs on the “kernel” package, and 1481 on each of “kernel-core”, “kernel-modules”, and some others.
This causes a lot of duplication, and I wonder whether it is possible to show each CVE only once, on the main “kernel” package.
This sounds similar to the issue we added the match-upstream-kernel-headers option for. Is that option set to false in your Grype config? You can check with grype config --load | grep match-upstream-kernel-headers.
Hi! I am using the default configuration.
Is it defined by default?
Will it appear in the JSON as an additional field? Can you please elaborate on this option?
This option is false by default (meaning that packages that are kernel header files will not be matched against CVEs reported against the kernel itself), but I think that since you’re scanning a VM, you’re finding the actual kernel-modules and kernel-core packages, not just header files whose source RPM is the kernel.
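A hypothetical sketch of the decision this option controls, to show why kernel-core and kernel-modules are still matched with the default setting. The rule used to spot header packages (an upstream named kernel or linux, plus “headers” in the package name) is an assumption for illustration, not the exact check Grype performs.

```python
"""Hypothetical sketch of what a match-upstream-kernel-headers style switch decides.

Assumption: a package counts as a kernel header package when its upstream is
the kernel and its name contains "headers". Grype's real rule may differ."""

KERNEL_UPSTREAMS = {"kernel", "linux"}  # assumed upstream names for the kernel


def use_upstream_for_matching(pkg_name: str, upstream_name: str,
                              match_upstream_kernel_headers: bool = False) -> bool:
    """Return True if vulnerabilities reported against upstream_name should also apply to pkg_name."""
    is_kernel_headers = upstream_name in KERNEL_UPSTREAMS and "headers" in pkg_name
    if is_kernel_headers:
        # Header-only packages pick up kernel CVEs only when the option is turned on.
        return match_upstream_kernel_headers
    # Everything else, including kernel-core and kernel-modules, is still matched.
    return True
```

With the default of False, only the header package is skipped; the actual kernel pieces found on a VM keep their indirect matches.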
I think in this case Grype’s output is correct, it’s just really noisy because there are a lot of CVEs involving the kernel.
This might be a special case of making Grype’s output more focused. In particular, we’re starting to think about ways to group Grype’s output by CVE, by package, or by something else. (Right now it’s grouped by “match,” that is, by each package<>CVE pair.)
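In the meantime, a per-CVE view can be approximated outside Grype. A minimal sketch, assuming each entry in the JSON report’s matches list has vulnerability.id and artifact.name fields (worth checking against the output of your Grype version); group_by_cve is a hypothetical helper.

```python
"""Sketch: regroup Grype's per-match JSON output by vulnerability ID.

Assumes each entry in report["matches"] has match["vulnerability"]["id"] and
match["artifact"]["name"]; check these against your Grype version's output."""
from collections import defaultdict


def group_by_cve(report: dict) -> dict[str, list[str]]:
    """Map each vulnerability ID to the sorted names of the packages it matched."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for match in report.get("matches", []):
        grouped[match["vulnerability"]["id"]].add(match["artifact"]["name"])
    return {vid: sorted(names) for vid, names in sorted(grouped.items())}


sample = {"matches": [
    {"vulnerability": {"id": "CVE-1234-1234"}, "artifact": {"name": "kernel"}},
    {"vulnerability": {"id": "CVE-1234-1234"}, "artifact": {"name": "kernel-core"}},
]}
print(group_by_cve(sample))  # {'CVE-1234-1234': ['kernel', 'kernel-core']}
```

On a real scan, load the output of grype -o json my-image with json.load and pass the resulting dict in.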
Hi @willmurphy !
Thanks for the answer, it does make sense now.
I agree the results are noisy. I think package is a better way to group the results in this case; however, for RPMs, maybe the grouping should be by rpmSource?
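Grouping by source package could look roughly like this. It is a sketch under an assumption: that each match’s artifact carries an upstreams list whose first entry names the source package (for example kernel for kernel-core), with packages lacking one grouped under their own name. Verify the field name against your own JSON output before relying on it; source_of and group_by_source are hypothetical helpers.

```python
"""Sketch: collapse matches to one row per (source package, vulnerability).

Assumption: match["artifact"]["upstreams"][0]["name"] names the source package
when present; verify against your Grype version. Otherwise the binary name is used."""
from collections import defaultdict


def source_of(artifact: dict) -> str:
    """Return the assumed source package name, falling back to the binary name."""
    upstreams = artifact.get("upstreams") or []
    return upstreams[0]["name"] if upstreams else artifact["name"]


def group_by_source(report: dict) -> dict[tuple[str, str], list[str]]:
    """Map (source name, vulnerability ID) to the binary packages that matched."""
    grouped: dict[tuple[str, str], set[str]] = defaultdict(set)
    for match in report.get("matches", []):
        artifact = match["artifact"]
        key = (source_of(artifact), match["vulnerability"]["id"])
        grouped[key].add(artifact["name"])
    return {key: sorted(names) for key, names in sorted(grouped.items())}


sample = {"matches": [
    {"vulnerability": {"id": "CVE-1234-1234"},
     "artifact": {"name": "kernel-core", "upstreams": [{"name": "kernel"}]}},
    {"vulnerability": {"id": "CVE-1234-1234"},
     "artifact": {"name": "kernel-modules", "upstreams": [{"name": "kernel"}]}},
    {"vulnerability": {"id": "CVE-1234-1234"}, "artifact": {"name": "kernel"}},
]}
print(group_by_source(sample))
# {('kernel', 'CVE-1234-1234'): ['kernel', 'kernel-core', 'kernel-modules']}
```

One row per (source, CVE) pair counts each kernel CVE once while still listing every affected binary package.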
By the way,
I sometimes see a lot of noise caused by the direct / indirect / CPE matching logic.
I wonder whether Grype always needs to report all of them, or whether it could do some prioritization?
Can you help me understand what you mean?
It’s important to have all 3 types of matching, for different reasons:
- Direct matching is the best: if a vulnerability is reported directly against some package, and we find that package at a vulnerable version, we obviously need to report the vulnerability.
- Indirect matching is necessary because many Linux distros report vulnerabilities against upstream dpkgs or source RPMs. So if Grype finds an RPM or dpkg, and its metadata shows that it is downstream of some other RPM/dpkg, we need to look at vulnerabilities reported against the upstream package, because they might be reported only against the upstream package and not against the package we found.
- CPE matching isn’t the best, but it’s the only way to match against the NVD database, and the NVD database is essentially a superset of all the others. So if there’s a package type where we don’t have any better data feed, it’s important to fall back to CPE matching.
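Since all three match types carry real signal, one way to “prioritize” without hiding anything is to order results by the most specific evidence behind each match. A minimal sketch, assuming each match has a matchDetails list with type values like exact-direct-match, exact-indirect-match and cpe-match; the ranking and the helpers best_rank and prioritized are illustrative choices, not something Grype does.

```python
"""Sketch: order matches by the most specific way they were found, dropping none.

Assumes each match has a "matchDetails" list whose entries carry a "type" such
as "exact-direct-match", "exact-indirect-match" or "cpe-match". The ranking is
an illustrative choice, not Grype behavior."""

# Lower rank = more specific evidence (assumed ordering: direct, indirect, CPE).
MATCH_TYPE_RANK = {"exact-direct-match": 0, "exact-indirect-match": 1, "cpe-match": 2}
UNKNOWN_RANK = len(MATCH_TYPE_RANK)


def best_rank(match: dict) -> int:
    """Return the rank of the most specific match type recorded for one match."""
    ranks = [MATCH_TYPE_RANK.get(detail.get("type"), UNKNOWN_RANK)
             for detail in match.get("matchDetails", [])]
    return min(ranks, default=UNKNOWN_RANK)


def prioritized(report: dict) -> list[dict]:
    """Return all matches: direct first, then indirect, then CPE-only (stable order within each)."""
    return sorted(report.get("matches", []), key=best_rank)
```

Sorting rather than filtering keeps the CPE and indirect matches available, which matters because, as described above, some vulnerabilities are only reachable that way.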
When you say “noise,” do you mean excess log messages, or confusing docs, or what?
Recently I started scanning some VMs (most of them Amazon), and I get a lot of duplicate vulnerabilities from different packages. That is, from package A (kernel-a) I get 1000 vulnerabilities, and then from package B (kernel-b) I get the same number and kind of vulnerabilities.
I want to find a way to focus the results and reduce the number of duplicate CVEs, so I am looking for any logic that would reduce them.