Version is parsed from JAR file name instead of manifest file

From reading the code, I understood that the version is parsed from the manifest file, and the Implementation-Version attribute is present in this JAR's manifest. Still, Syft is parsing the version from the JAR file name. Is this expected?

I downloaded the JAR from here and renamed the file.

@willmurphy Any idea about this?

Hi @santhosh, great question!

@kzantow can provide better details than I can, but in general, identifying JARs is challenging because Java doesn’t have a single, widely-used standard way of encoding metadata like package name and version into a JAR file.

Right now, Syft tries to get info about a JAR from the following places:

  1. pom.properties
  2. pom.xml
  3. Various manifest keys
  4. The filename

See `syft/pkg/cataloger/java/archive_parser.go` at commit c3619422bb0b39149af2528ea3816abe83aa5995 in anchore/syft on GitHub.
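As a rough illustration of that fallback order, each source can be tried in turn and the first non-empty value wins. This is a hypothetical sketch, not Syft's actual code: `jarEvidence`, `pickVersion`, and the field names are made up for this example. Under this ordering, a renamed JAR that has an Implementation-Version in its manifest would never fall through to the file name:

```go
package main

import "fmt"

// jarEvidence is a hypothetical container for metadata found in one JAR.
// The field names are illustrative and are not Syft's actual types.
type jarEvidence struct {
	PomPropertiesVersion string // from a pom.properties file
	PomXMLVersion        string // from a pom.xml file
	ManifestVersion      string // e.g. Implementation-Version in MANIFEST.MF
	FileNameVersion      string // parsed from the archive file name
}

// source pairs a human-readable description with an extractor.
type source struct {
	name string
	get  func(jarEvidence) string
}

// versionSources lists the sources in the order described above:
// pom.properties, pom.xml, manifest, and the file name as a last resort.
var versionSources = []source{
	{"pom.properties", func(e jarEvidence) string { return e.PomPropertiesVersion }},
	{"pom.xml", func(e jarEvidence) string { return e.PomXMLVersion }},
	{"manifest", func(e jarEvidence) string { return e.ManifestVersion }},
	{"file name", func(e jarEvidence) string { return e.FileNameVersion }},
}

// pickVersion returns the first non-empty version and the source it came from.
func pickVersion(e jarEvidence) (version, from string) {
	for _, s := range versionSources {
		if v := s.get(e); v != "" {
			return v, s.name
		}
	}
	return "", ""
}

func main() {
	// A renamed JAR: no Maven files, a manifest version, and a misleading file name.
	e := jarEvidence{ManifestVersion: "2.3.1", FileNameVersion: "1.0"}
	v, from := pickVersion(e)
	fmt.Printf("version %s (from %s)\n", v, from)
}
```

Keeping the order in one slice also makes it easy to report which source a value came from, which would help when diagnosing reports like this one.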

I think what you’re seeing on that JAR is that we’re falling back to the filename. Based on a little poking around, you might have found a bug where we’re skipping a manifest value we shouldn’t be.

@kzantow, it seems incorrect that `selectName` checks the file path before checking the manifest:

Am I missing something?

@willmurphy Thanks for your response. I did follow the code. I extracted the JAR file and verified that:

  1. No pom.properties file
  2. No pom.xml file
  3. Multiple keys are available in manifest file
  4. The PURL contains the GroupId, which comes from the manifest file, so Syft is definitely parsing the manifest.
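For reference, reading Implementation-Version from the main section of MANIFEST.MF only takes a few lines. This is a minimal sketch based on the JAR File Specification (headers are `Name: value`, a line beginning with a single space continues the previous value, and the main section ends at the first blank line). `parseMainAttributes` is a hypothetical helper, not Syft's parser, and it assumes LF or CRLF line endings and case-sensitive attribute names:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMainAttributes is a hypothetical helper that reads the main section of
// a MANIFEST.MF file into a map. It assumes LF or CRLF line endings and does
// not fold attribute names to a single case.
func parseMainAttributes(manifest string) map[string]string {
	attrs := make(map[string]string)
	lastKey := ""
	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for scanner.Scan() { // ScanLines strips the trailing "\r" of CRLF endings
		line := scanner.Text()
		if line == "" {
			break // a blank line ends the main section
		}
		if strings.HasPrefix(line, " ") {
			if lastKey != "" {
				attrs[lastKey] += line[1:] // continuation of the previous value
			}
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip malformed lines
		}
		lastKey = key
		attrs[key] = strings.TrimPrefix(value, " ") // one space follows the colon
	}
	return attrs
}

func main() {
	manifest := "Manifest-Version: 1.0\r\n" +
		"Implementation-Vendor: Example Software Foun\r\n" +
		" dation\r\n" +
		"Implementation-Version: 2.3.1\r\n" +
		"\r\n" +
		"Name: com/example/\r\n" +
		"Implementation-Version: 9.9.9\r\n"
	attrs := parseMainAttributes(manifest)
	fmt.Println(attrs["Implementation-Version"]) // per-entry sections are ignored
	fmt.Println(attrs["Implementation-Vendor"])
}
```

Per-entry sections after the first blank line can carry their own Implementation-Version, which is one more place where identification rules can quietly diverge.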

A small correction: the code you posted is about parsing the name, but I think the bug is also in `selectVersion`.

So I think you are right: parsing the file name should be a last resort, but it looks like the version is parsed from the file name before the manifest is checked.
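To see why the file name should only be a last resort, here is a hypothetical sketch of a file-name fallback. The regular expression and `nameAndVersionFromFileName` are assumptions for illustration, not what Syft uses: roughly, everything before the first `-<digit>` is treated as the name and the rest (minus `.jar`) as the version. A renamed file yields whatever version the new name happens to contain, or none at all:

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// fileNamePattern is an illustrative pattern (not Syft's): the name is the
// shortest prefix followed by "-<digit>", and the version is the remainder.
var fileNamePattern = regexp.MustCompile(`^(.+?)-(\d[\w.-]*)\.jar$`)

// nameAndVersionFromFileName guesses a package name and version from a JAR path.
// It returns an empty version when the file name carries no version at all.
func nameAndVersionFromFileName(path string) (name, version string) {
	base := filepath.Base(path)
	if m := fileNamePattern.FindStringSubmatch(base); m != nil {
		return m[1], m[2]
	}
	return strings.TrimSuffix(base, ".jar"), ""
}

func main() {
	for _, p := range []string{"lib/commons-lang3-3.12.0.jar", "lib/renamed.jar"} {
		name, version := nameAndVersionFromFileName(p)
		fmt.Printf("%s: name=%q version=%q\n", p, name, version)
	}
}
```

Because the file name is entirely under the control of whoever copied the JAR, any value it produces is weaker evidence than an Implementation-Version baked into the manifest.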

I do think that it is a bug.

Syft attempts to get the identity of a JAR file in various ways, as @willmurphy noted above; `selectName` is the function you would want to look at.

When I’ve looked at this in the past I’ve definitely found it a bit convoluted and hard to follow. I wonder if it would be helpful to write the identification rules in English or as a decision tree, refine them as much as necessary, and make sure the behavior matches. For example: if we find a pom.xml, what do we do? What about with or without a pom.properties, or when we find multiple pom.xml files? IIRC, we don’t use information from a pom.xml if there’s no corresponding pom.properties. But is that right? I don’t really think so, but how do we describe exactly what to do across all the different JARs we find in the wild?

It looks like we’ve already done that; see:

But I think currently the code isn’t behaving in line with this comment.
