Indirect matches (vulnerabilities affecting the upstream packages) are sometimes too broad

Hello,

I want to ask about upstream matches in Grype. Currently Grype results seem to
include all vulnerabilities affecting the upstream package. For example, the
python-perf package, whose upstream is kernel, is reported with all
vulnerabilities affecting the kernel, and php8.3-xml is reported with all
vulnerabilities of php8.3:

grype 'pkg:rpm/centos/python-perf@3.10.0-1160.119.1.el7?arch=x86_64&distro=centos-7&upstream=kernel-3.10.0-1160.119.1.el7.src.rpm'
#  ✔ Scanned for vulnerabilities     [3639 vulnerability matches]
#    ├── by severity: 0 critical, 14 high, 2318 medium, 1294 low, 13 negligible
#    └── by status:   0 fixed, 3639 not-fixed, 0 ignored

grype 'pkg:deb/ubuntu/php8.3-xml@8.3.0-ubuntu0.24.04.5?distro=ubuntu-24.04&upstream=php8.3' -o json | grype explain --id CVE-2025-1219
#       Match explanation(s):
#           - ubuntu:distro:ubuntu:24.04:CVE-2025-1219 Indirect match; this CVE is reported against php8.3 (version 8.3.0-ubuntu0.24.04.5), the origin of this deb package.

Certainly python-perf is not affected by every kernel vulnerability, and
php8.3-xml is not affected by every php8.3 vulnerability.

Specifying upstream packages is necessary – Grype just needs it to work
correctly. But sometimes it produces many false positives and a ton of
vulnerable packages affected by the same CVE (every php8.3-anything would
share all vulnerabilities with the upstream php8.3 package).

So the question, to you as devs and to the broader community, is: how do you deal
with that situation? We cannot turn upstream matching off, but keeping it on
produces FPs, and maintaining a list of exceptions would be a Herculean task.

Thanks for your thoughts.

I certainly have to agree with your assessment that python-perf with an upstream of kernel is probably not going to result in an accurate set of vulnerabilities, and similarly for php8.3-xml. The challenge is that we do need to use upstream in some cases: openssl, for example, is (I believe) the upstream of many libs which embed it, and matching on it results in valid vulnerabilities.

I think we’re definitely open to improving this, I’m just not clear how yet. Without digging in too much, I could think of a couple possibilities that we could investigate:

  • having some sort of data set in GrypeDB that flags certain upstreams as too broad?
  • inspecting the type of the upstream: if it’s a library like openssl, include it, but not if it’s distributed separately? I’m not entirely sure how to determine this…

Any change here will require some research to determine the impact on matching. Suggestions are welcome, keeping in mind that many of the upstreams are important to use!

This seems like a good thing to bring up with the team during our livestream – I’ll try to remember to add it next week :+1:

I think the approach needs to be the opposite -

Most of the time the upstream produces lots of FPs. In some specific cases, like openssh, it does provide relevant CVEs.

So I think the approach needs to be like in the CPE dictionary: whitelist when to use upstream.
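
To make the whitelist idea concrete, here is a rough sketch that post-filters a Grype JSON report instead of changing Grype itself. It is only an illustration, not how Grype works internally: the names `filter_matches` and `UPSTREAM_ALLOWLIST` are made up, the allowlist contents are an example, and it assumes the `grype -o json` layout where each entry in `matches` carries a `matchDetails` list (with a `type` such as `exact-indirect-match`) and an `artifact.upstreams` list of upstream names.

```python
# Hypothetical allowlist: upstreams whose vulnerabilities should still be
# reported against every package built from them.
UPSTREAM_ALLOWLIST = {"openssl"}

# Assumed match type string for upstream-based matches in Grype's JSON output.
INDIRECT = "exact-indirect-match"


def is_indirect_only(match):
    """True when every match detail is an indirect (upstream-based) match."""
    details = match.get("matchDetails") or []
    return bool(details) and all(d.get("type") == INDIRECT for d in details)


def filter_matches(report, allowlist=UPSTREAM_ALLOWLIST):
    """Return a copy of the report without indirect-only matches whose
    upstreams are all outside the allowlist. Direct matches are kept."""
    kept = []
    for match in report.get("matches", []):
        upstreams = {
            u.get("name")
            for u in (match.get("artifact", {}).get("upstreams") or [])
        }
        if is_indirect_only(match) and not (upstreams & allowlist):
            continue  # upstream-only match from a non-allowlisted upstream
        kept.append(match)
    return {**report, "matches": kept}
```

To try it, save the report with `grype <target> -o json > report.json`, load it with `json.load`, and pass the result to `filter_matches`. With an empty allowlist this drops every indirect-only match, so the hard part is still deciding which upstreams belong on the list.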

I disagree that matching by upstream is useless most of the time. If my understanding is correct, CVEs are published against upstream packages, but Linux distributions often rename or split upstream packages. So to correctly map CVEs to packages, Grype needs the upstream information.

So without upstream matches we would get only vulnerabilities from security trackers of distributions and their quality varies.

hi!

A high percentage of the CVEs come from kernel or linux kernel, which is not accurate. Basing the vulnerabilities on NVD results causes a lot of noise and FPs, which makes it difficult to focus on the relevant CVEs (in my experience).

On VMs (like EC2), the results were too noisy.
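
If anyone wants to put numbers on this kind of noise, a small script along these lines could count how many indirect matches each upstream contributes to a report. Again this is a sketch under assumptions: `indirect_matches_by_upstream` is a made-up helper, and it relies on the `matchDetails[].type` and `artifact.upstreams[].name` fields of the `grype -o json` output.

```python
from collections import Counter

# Assumed match type string for upstream-based matches in Grype's JSON output.
INDIRECT = "exact-indirect-match"


def indirect_matches_by_upstream(report):
    """Count matches that have at least one indirect detail, per upstream name."""
    counts = Counter()
    for match in report.get("matches", []):
        details = match.get("matchDetails") or []
        if not any(d.get("type") == INDIRECT for d in details):
            continue  # no upstream-based evidence for this match
        for upstream in match.get("artifact", {}).get("upstreams") or []:
            counts[upstream.get("name", "<unknown>")] += 1
    return counts
```

Calling `indirect_matches_by_upstream(report).most_common(10)` on a loaded report lists the ten upstreams responsible for the most indirect matches, which would show whether kernel really dominates a given scan.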

By the way, it is possible to filter out all indirect matches (matches based on upstream) with a Grype config option. It isn’t documented very well (see the GitHub issue) but it is there for you.