Scanning large SBOMs in batches with Grype

I’ve raised a feature request that I think is worth discussing:

For large SBOMs with lots of CVEs, we might consider a way to scan in batches and write intermediate results to disk, so that Grype doesn’t need a lot of memory.

What do you think?

Is there any indication of what is using the memory? You can run Grype in profiling mode using the GRYPE_DEV_PROFILE=mem env var, I think, and get a pprof file that will provide some insight into where allocations happen, etc… I don’t really see a lot in Grype that should inherently be using large amounts of memory if it’s reading an SBOM and not performing a Syft scan.
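For reference, a profiling run could look something like the following; treat the profile file name (mem.pprof here) as an assumption, since the exact path Grype writes may differ by version:

GRYPE_DEV_PROFILE=mem grype sbom:sbom.json -o json > results.json
go tool pprof -top mem.pprof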

I’ll try and send you the relevant files.
I saw this issue - Command terminated by signal 9 due to OOM (Out of Memory) · Issue #1509 · anchore/grype · GitHub
and it seems related (but it is closed).
Can you please re-open it? I think I’m having a similar issue when using Grype in a task.

I’ve tried to run Grype on an SBOM (which I can’t share) on an EC2 instance (with the profiling configuration). We scan the SBOM file with this command:
grype sbom:sbom.json -c config.yaml -o json

1st attempt:
16 GB RAM, 4 CPUs → exit code 137 (killed, likely by the OOM killer)

2nd attempt:
32 GB RAM, 8 CPUs → success

here is the file.

BTW, the env var doesn’t work for me; the profiling only takes effect when I put the setting inside the config file.

more information:
(image attached)

The Grype result size is 1.8 GB.
The SBOM size is ~140 MB.

I can’t share the SBOM, but I wonder if I can filter the results at runtime (maybe keep only High/Critical, maybe remove duplicate CVEs).

This is an example of the top packages by number of vulnerabilities:
3971 linux-modules-5.3.0-1017-aws
3971 linux-image-5.3.0-1017-aws
3971 linux-aws-5.3-headers-5.3.0-1017
3960 linux-modules-5.3.0-1019-aws
3960 linux-image-5.3.0-1019-aws
3960 linux-aws-5.3-headers-5.3.0-1019
3956 linux-modules-5.3.0-1028-aws
3956 linux-modules-5.3.0-1023-aws
3956 linux-image-5.3.0-1028-aws
3956 linux-image-5.3.0-1023-aws

I’m afraid that even after this PR is merged - fix upstream match for linux-.*-headers-.* by barnuri · Pull Request #2320 · anchore/grype · GitHub -

the file will still be big, since we include the ignored matches.
So I have a few suggestions here:

  1. Add an option to filter out ignored matches, to reduce the file size.
  2. Don’t keep the ignored vulnerabilities in memory, since there can be thousands of results.
  3. Add support for more complex ignore rules, so we won’t need to merge changes into Grype (and each user can define their own ignores).
  4. Filter vulnerabilities by severity at runtime (a possible post-processing workaround is sketched below).
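As a stopgap until something like that exists in Grype itself, a small post-processing step over the JSON report could approximate suggestions 1 and 4. This is only a rough sketch based on my reading of the report shape - the field names used here (matches, ignoredMatches, vulnerability.severity) are assumptions, not confirmed Grype schema:

package main

import (
	"encoding/json"
	"os"
)

// Reads a Grype JSON report on stdin, drops ignored matches, keeps only
// High/Critical matches, and writes the smaller report to stdout.
func main() {
	var report map[string]json.RawMessage
	if err := json.NewDecoder(os.Stdin).Decode(&report); err != nil {
		panic(err)
	}

	// drop the ignored matches entirely
	delete(report, "ignoredMatches")

	// keep the raw JSON of each match, decoding only enough to read severity
	var matches []json.RawMessage
	if err := json.Unmarshal(report["matches"], &matches); err != nil {
		panic(err)
	}
	keep := make([]json.RawMessage, 0, len(matches))
	for _, m := range matches {
		var probe struct {
			Vulnerability struct {
				Severity string `json:"severity"`
			} `json:"vulnerability"`
		}
		if err := json.Unmarshal(m, &probe); err != nil {
			panic(err)
		}
		if probe.Vulnerability.Severity == "High" || probe.Vulnerability.Severity == "Critical" {
			keep = append(keep, m)
		}
	}

	filtered, err := json.Marshal(keep)
	if err != nil {
		panic(err)
	}
	report["matches"] = filtered

	if err := json.NewEncoder(os.Stdout).Encode(report); err != nil {
		panic(err)
	}
}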

As discussed in the community chat,
I’ve added the file here - scan a large sbom in batches · Issue #2357 · anchore/grype · GitHub

@wagoodman @kzantow

Thanks @TimBrown1611!

A quick look at this shows that… the Grype scan uses approximately 1 GB and the JSON writing uses over 3 GB!

I don’t think batching the results will help here, since they all need to be aggregated into the final JSON report.

… and if we look, it’s using almost 1 GB just for indenting?

My first suggestion is to just disable pretty JSON somewhere… I thought compact JSON was the default, but alas it is not. It would be interesting to see whether disabling the indent helps noticeably by updating this to:

enc.SetIndent("", "")
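For context, here’s a minimal standalone illustration of the standard-library behavior in question (not Grype’s actual presenter code): a json.Encoder writes compact output unless SetIndent is called with a non-empty indent, and pretty-printing both inflates the report and adds work per encoded token.

package main

import (
	"encoding/json"
	"os"
)

func main() {
	// a toy stand-in for a report document
	doc := map[string]any{
		"matches": []map[string]string{
			{"id": "CVE-2024-0001", "severity": "High"},
		},
	}

	enc := json.NewEncoder(os.Stdout)
	// SetIndent("", "") disables indentation, so the encoder emits compact
	// JSON; SetIndent("", " ") would pretty-print instead, growing the output
	// and re-indenting every token.
	enc.SetIndent("", "")
	if err := enc.Encode(doc); err != nil {
		panic(err)
	}
}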

Thanks for your response @kzantow!
Is it configurable?

Moreover, please notice the difference between the SBOM and the Grype result sizes; at a quick look, the output contains lots of duplicate CVEs which in my opinion could be removed.

It is not currently configurable. We made this the default and configurable in Syft, but it looks like not in Grype. Someone will need to port that change over to Grype.

There are lots of things we can do to improve memory usage and other efficiencies like deduplicating results, but it’s important to understand where the problems are first so we can spend our time adjusting the tools as appropriate.

OK, I can try to open a PR in Grype if similar behavior is already implemented in Syft :slight_smile:

FYI I’m referring to the “pretty” option: syft/cmd/syft/internal/options/format.go at main · anchore/syft · GitHub
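To give a rough idea of the shape of that port, here is a sketch only - the presenter type, field names, and package layout below are assumptions for illustration, not Grype’s actual code. The idea is the same as Syft’s option: a pretty flag that gates the call to SetIndent.

// Hypothetical sketch of a "pretty" option for a JSON presenter.
package presenter

import (
	"encoding/json"
	"io"
)

type Presenter struct {
	document any  // the report document to serialize
	pretty   bool // when true, emit indented ("pretty") JSON
}

func (p Presenter) Present(output io.Writer) error {
	enc := json.NewEncoder(output)
	if p.pretty {
		enc.SetIndent("", " ")
	}
	// when SetIndent is not called, the encoder writes compact JSON
	return enc.Encode(p.document)
}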

Hi!
Please take a look at -

It would be really helpful.

Did you run Grype with this change and profiling enabled to get a pprof file, to see if it actually uses less memory?

I couldn’t check the memory, but I can say the file size was reduced by 30%.

What were the original and new file sizes? Was this 1.8 GB → 1.26 GB? That seems good, though it doesn’t align directly with the memory usage. Getting the memory profile will help us understand whether that change did any good.

Hi @kzantow!

I added 2 files to the GitHub issue, one with the fix and one without.
The final output size was reduced by 15%.

Please let me know if any other details are needed :slight_smile:

It looks like it was a red herring – it does not appear to have helped overall memory usage:


If I’m reading this right: json.Present is still using 4 GB.

But printing JSON in compact form is still a good change to reduce file sizes; I’ll leave more feedback on the PR.

OK, sounds good.
However, I’m still facing the same issues. Unfortunately I can’t share the Grype results file or the SBOM, but if you scan any default AWS machine you will see the duplicate CVEs that make the results very large (and, in my opinion, without any additional value).
So I would like to think about other solutions to reduce the memory :slight_smile: