Revisiting ownership-by-file-overlap relationships

kzantow · September 16, 2024, 7:05pm

Today in Syft, a number of catalogers overlap in the packages they surface and the files they read. For example: a user installs curl with a package manager and the RPM cataloger finds this package, but the binary cataloger also finds the curl binary and creates a second package of a different type. When this happens, Syft by default creates an ownership-by-file-overlap relationship between these packages, and a post-cataloging process, also by default, removes the secondary packages based on this relationship. As a result, users don’t typically see the binary version, only the more correct package-manager version.
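To make the two-step flow above concrete, here is a minimal sketch in Python. It is not Syft's code (Syft is written in Go): the `Package` type, its fields, and both function names are hypothetical, and the overlap rule (every file a package was found at is owned by another package) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical model, not Syft's real package type.
    name: str
    type: str                             # illustrative values: "rpm", "binary"
    locations: frozenset                  # files the package was discovered at
    owned_files: frozenset = frozenset()  # files the package manager claims


def overlap_relationships(packages):
    """Step 1: record (parent, child) pairs where the parent owns every
    file the child was discovered at."""
    rels = []
    for parent in packages:
        for child in packages:
            if parent is child or not child.locations:
                continue
            if child.locations <= parent.owned_files:
                rels.append((parent, child))
    return rels


def prune_binary_children(packages, rels):
    """Step 2: drop binary packages that are the child of a non-binary parent."""
    dropped = {
        child
        for parent, child in rels
        if child.type == "binary" and parent.type != "binary"
    }
    return [p for p in packages if p not in dropped]
```

In this sketch the relationships exist only so that the second step can consume them, which is the pattern the rest of this post questions.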

As far as I can tell, this deduplication is useful in order to:

provide the best SBOM to the end user, and by association:
avoid vulnerability false positives due to more accurate information from package manager vulnerability feeds and the versions used

Beyond these purposes, I’m having a hard time finding the value of the relationships. Maybe Syft has an option to disable the deduplication if a user really wants that, but why not just have a specific function to do so without adding relationships only to later remove them?

We really want Grype to match only on the package-manager versions, and there’s already a custom function to filter out the overlapping packages. But that only works because Grype knows which feeds provide better results, and making part of that determination in Syft seems like the wrong place to do it.

So, this is all a pretty long-winded way of asking: should we get rid of the ownership-by-file-overlap relationship altogether and simply have deduplication functionality that operates without these relationships?
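As a rough illustration of what relationship-free deduplication could look like, here is a minimal Python sketch. It is only a sketch under assumptions, not a proposed Syft implementation: the `Package` model and the `dedupe_by_file_overlap` name are hypothetical, and the rule used (drop a binary package when every file it was found at is claimed by some non-binary package) is one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical model, not Syft's real package type.
    name: str
    type: str                             # illustrative values: "rpm", "binary"
    locations: frozenset                  # files the package was discovered at
    owned_files: frozenset = frozenset()  # files the package manager claims


def dedupe_by_file_overlap(packages):
    """Drop binary packages whose files are all claimed by non-binary
    packages, without materializing any relationships."""
    claimed = set()
    for pkg in packages:
        if pkg.type != "binary":
            claimed |= pkg.owned_files

    kept = []
    for pkg in packages:
        covered = bool(pkg.locations) and pkg.locations <= claimed
        if pkg.type == "binary" and covered:
            continue  # a package-manager entry already covers these files
        kept.append(pkg)
    return kept
```

One design difference to note: this version checks coverage against the union of all claimed files, so a binary package whose files are split across two owners is also dropped. Whether that is desirable is an open question, not something the sketch settles.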

willmurphy · November 26, 2024, 10:05pm

As an aside, judging from the topic “Exclude-binary-overlap-by-ownership flag is not working”, it looks like some users expect that excluding binary packages by ownership overlap will already remove other packages.

I think the concern here is that if I’m on a distro that only reports vulns after they’re patched, and I install an OS package that brings with it a Python package, then GHSA on the Python package is the best source of vulnerability data I have until the distro releases a patch or changes their reporting policy.
