Adding statistics to a Syft scan

I had an idea to add a “Stats” attribute to the schema, which would include metadata about the scan, for example how much time each cataloger took or how much time the indexing took.
I think this kind of information could make Syft better, since users could see which tasks took longest and tune the scan to their needs (enabling or disabling catalogers, for example).
Do you think this kind of feature could be considered in the future? I ask because it involves adding information to the final schema.

Hi @TimBrown1611 - this is an interesting suggestion!

Are there stats you’d be interested in besides how long each cataloger took to run? It might be interesting to just log “RPM Cataloger finished in 37.32 seconds” or something similar as each cataloger finishes.
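
To make the logging idea concrete, here is a rough sketch in Python. It is not Syft's code (Syft is written in Go); `run_with_timing` and the cataloger names are hypothetical, and each "cataloger" is just a callable standing in for the real work.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger("scan")


def run_with_timing(catalogers: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each cataloger in order and log how long it took.

    `catalogers` maps a display name to a zero-argument callable.
    Both the names and the callables are placeholders for illustration.
    """
    durations: Dict[str, float] = {}
    for name, run in catalogers.items():
        start = time.perf_counter()  # monotonic clock suited to measuring intervals
        run()
        elapsed = time.perf_counter() - start
        durations[name] = elapsed
        logger.info("%s finished in %.2f seconds", name, elapsed)
    return durations


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical catalogers that only sleep briefly.
    run_with_timing({
        "RPM Cataloger": lambda: time.sleep(0.01),
        "Python Cataloger": lambda: time.sleep(0.02),
    })
```

Because the timings go to the log rather than into the output document, the scan result itself stays the same from run to run.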

As for putting the stats in the SBOM, I’m not sure they’d belong there. There are 2 reasons I feel this way: reproducible SBOMs, and keeping the information specific.

By “reproducible SBOM” I mean this: running Syft twice against the same artifact should produce the same document. This would let users do things like store a hash of the SBOM and use it to detect if the artifact had changed.
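
A small sketch of why timing data works against this, assuming the SBOM is JSON and the digest is taken over a canonical serialization. The `stats` field and the package entry are made up for illustration and are not part of the Syft schema.

```python
import hashlib
import json


def sbom_digest(sbom: dict) -> str:
    """Return a SHA-256 hex digest over a canonical JSON form of the document."""
    # Sorting keys and fixing separators makes the byte output deterministic.
    canonical = json.dumps(sbom, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical SBOM content for one artifact.
base = {"artifacts": [{"name": "example-pkg", "version": "1.0.0"}]}

# Two scans of the same artifact, differing only in a hypothetical timing field.
first_scan = dict(base, stats={"rpm_cataloger_seconds": 37.32})
second_scan = dict(base, stats={"rpm_cataloger_seconds": 36.98})

same_without_stats = sbom_digest(base) == sbom_digest(dict(base))
same_with_stats = sbom_digest(first_scan) == sbom_digest(second_scan)
print("without stats, digests match:", same_without_stats)
print("with stats, digests match:", same_with_stats)
```

Without the timing field the two digests match; with it they differ even though the artifact did not change, so a stored hash could no longer tell you whether the artifact had changed.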

By “keeping the information specific,” I mean that an SBOM is really a manifest of what’s in an artifact, and stats about how the scanner performed aren’t really part of this information.

Does that make sense? Would log lines that say how long each cataloger took, emitted as each cataloger finishes, address your use case?

Hi!
It does make sense. However, analyzing logs can be less convenient, since you need to search for a specific pattern across a lot of log lines, and when you run the program with -vvv the performance is worse.
Maybe this attribute could be added only in debug mode?