Version is parsed from Jar file name instead of Manifest file

santhosh · November 27, 2024, 5:24pm

As I understand the code, the version should be parsed from the manifest file, and the Implementation-Version attribute is present in this JAR's manifest. Still, Syft takes the version from the JAR file name. Is this expected?

I downloaded the JAR from here and renamed the file.

santhosh · December 2, 2024, 9:16pm

@willmurphy Any idea about this?

willmurphy · December 3, 2024, 12:44pm

Hi @santhosh great question!

@kzantow can provide better details than I can, but in general, identifying JARs is challenging because Java doesn’t have a single, widely-used standard way of encoding metadata like package name and version into a JAR file.

Right now, Syft tries to get info about a JAR from the following places:

pom.properties
pom.xml
Various manifest keys
The filename

See syft/pkg/cataloger/java/archive_parser.go at commit c3619422bb0b39149af2528ea3816abe83aa5995 in anchore/syft on GitHub.

I think what you’re seeing on that JAR is that we’re falling back to the filename. Based on a little poking around, you might have found a bug where we’re skipping a manifest value we shouldn’t be.

@kzantow it seems incorrect that selectName checks the filepath before checking the manifest:

anchore/syft/blob/c3619422bb0b39149af2528ea3816abe83aa5995/syft/pkg/cataloger/java/parse_java_manifest.go#L191


      
          	// see https://developer.classpath.org/doc/java/lang/Character-source.html
          	// line 3295
          	r := runes[0]
          	return unicode.Is(unicode.Lu, r) ||
          		unicode.Is(unicode.Ll, r) || unicode.Is(unicode.Lt, r) ||
          		unicode.Is(unicode.Lm, r) || unicode.Is(unicode.Lo, r) ||
          		unicode.Is(unicode.Nl, r) ||
          		unicode.Is(unicode.Sc, r) || unicode.Is(unicode.Pc, r)
          }
          
          func selectName(manifest *pkg.JavaManifest, filenameObj archiveFilename) string {
          	name := extractNameFromApacheMavenBundlePlugin(manifest)
          	if name != "" {
          		return name
          	}
          
          	// the filename tends to be the next-best reference for the package name
          	name = extractNameFromArchiveFilename(filenameObj)
          	if name != "" {
          		return name
          	}

Am I missing something?

santhosh · December 3, 2024, 4:11pm

@willmurphy Thanks for your response. I followed the code, extracted the JAR file, and verified the following:

No pom.properties file
No pom.xml file
Multiple keys are available in the manifest file
The PURL contains a group ID that comes from the manifest file, so Syft is definitely parsing the manifest.
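
For anyone who wants to repeat that check, here is a minimal sketch in Go. It is not Syft code: the names jarSummary, inspectJar, and buildSampleJar are made up for illustration, the sample archive is built in memory, and the manifest parsing is simplified (main section only, continuation lines ignored).

```go
package main

import (
	"archive/zip"
	"bufio"
	"bytes"
	"fmt"
	"path"
	"strings"
)

// jarSummary (hypothetical) records which identity sources an archive contains.
type jarSummary struct {
	HasPomProperties bool
	HasPomXML        bool
	Manifest         map[string]string // main-section attributes only
}

// inspectJar scans a JAR (a zip archive) held in memory.
func inspectJar(data []byte) (jarSummary, error) {
	s := jarSummary{Manifest: map[string]string{}}
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return s, err
	}
	for _, f := range r.File {
		switch {
		case path.Base(f.Name) == "pom.properties":
			s.HasPomProperties = true
		case path.Base(f.Name) == "pom.xml":
			s.HasPomXML = true
		case f.Name == "META-INF/MANIFEST.MF":
			rc, err := f.Open()
			if err != nil {
				return s, err
			}
			sc := bufio.NewScanner(rc)
			for sc.Scan() {
				line := sc.Text()
				if line == "" {
					break // a blank line ends the main section
				}
				if strings.HasPrefix(line, " ") {
					continue // simplification: skip continuation lines
				}
				if k, v, ok := strings.Cut(line, ": "); ok {
					s.Manifest[k] = v
				}
			}
			rc.Close()
			if err := sc.Err(); err != nil {
				return s, err
			}
		}
	}
	return s, nil
}

// buildSampleJar creates a tiny in-memory archive containing only a manifest.
func buildSampleJar() []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create("META-INF/MANIFEST.MF")
	if err != nil {
		panic(err)
	}
	if _, err := f.Write([]byte("Manifest-Version: 1.0\r\nImplementation-Version: 1.2.3\r\n\r\n")); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	s, err := inspectJar(buildSampleJar())
	if err != nil {
		panic(err)
	}
	fmt.Println("pom.properties:", s.HasPomProperties)
	fmt.Println("pom.xml:", s.HasPomXML)
	fmt.Println("Implementation-Version:", s.Manifest["Implementation-Version"])
}
```

To run it against a real file, read the JAR with os.ReadFile and pass the bytes to inspectJar instead of the sample archive.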

A small correction: the code you posted concerns the name-parsing bug, but I think there is also a bug in selectVersion.

anchore/syft/blob/c3619422bb0b39149af2528ea3816abe83aa5995/syft/pkg/cataloger/java/parse_java_manifest.go#L227


      
          			return manifest.Main.MustGet("Extension-Name")
          		case manifest.Main.MustGet("Implementation-Title") != "":
          			// last ditch effort...
          			return manifest.Main.MustGet("Implementation-Title")
          		}
          	}
          	return ""
          }
          
          func selectVersion(manifest *pkg.JavaManifest, filenameObj archiveFilename) string {
          	if v := filenameObj.version; v != "" {
          		return v
          	}
          
          	if manifest == nil {
          		return ""
          	}
          
          	fieldNames := []string{
          		"Implementation-Version",
          		"Specification-Version",

So I think you are right: parsing from the file name should be the last resort, but it looks like the version is taken from the file name before the manifest is checked.

I do think that it is a bug.
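
To make the ordering I would expect concrete, here is a rough sketch. It is not a patch for Syft: selectVersionManifestFirst is a hypothetical name, the manifest is modelled as a plain map rather than pkg.JavaManifest, and only the two field names visible in the excerpt above are listed because the rest of the list is cut off there.

```go
package main

import "fmt"

// selectVersionManifestFirst is a hypothetical variant of selectVersion that
// consults the manifest before falling back to the version parsed from the
// archive filename.
func selectVersionManifestFirst(manifest map[string]string, filenameVersion string) string {
	// Assumption: only the field names shown in the quoted excerpt are used.
	fieldNames := []string{
		"Implementation-Version",
		"Specification-Version",
	}
	for _, name := range fieldNames {
		// Reading from a nil map is safe in Go and yields "".
		if v := manifest[name]; v != "" {
			return v
		}
	}
	// The filename is the last resort.
	return filenameVersion
}

func main() {
	manifest := map[string]string{"Implementation-Version": "2.0.1"}
	fmt.Println(selectVersionManifestFirst(manifest, "9.9.9")) // prints 2.0.1
	fmt.Println(selectVersionManifestFirst(nil, "9.9.9"))      // prints 9.9.9
}
```

With this ordering, renaming the JAR would only change the reported version when the manifest carries none of the listed fields.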

kzantow · December 4, 2024, 7:16pm

Syft attempts to get the identity of a JAR file in various ways, as @willmurphy noted above, and selectName is the function you would want to look at.

When I’ve looked at this in the past I’ve definitely found it a bit convoluted and hard to follow. I wonder if it would be helpful to somehow write the identification rules in English or as a decision tree, refine them as much as necessary, and make sure that the behavior matches. For example: if we find a pom.xml, what do we do? What about with or without a pom.properties, or when we find multiple pom.xml files? IIRC, we don’t use information from a pom.xml if there’s no corresponding pom.properties. But is that right? I don’t really think so, but how do we describe exactly what to do across all the different JARs we find in the wild?

willmurphy · December 5, 2024, 2:25pm

It looks like we’ve already done that, see:

anchore/syft/blob/c3619422bb0b39149af2528ea3816abe83aa5995/syft/pkg/cataloger/java/archive_parser.go#L282


      
          			ArchiveDigests: digests,
          		},
          	}, nil
          }
          
          func (j *archiveParser) discoverNameVersionLicense(ctx context.Context, manifest *pkg.JavaManifest) (string, string, []pkg.License, error) {
          	// we use j.location because we want to associate the license declaration with where we discovered the contents in the manifest
          	// TODO: when we support locations of paths within archives we should start passing the specific manifest location object instead of the top jar
          	lics := pkg.NewLicensesFromLocation(j.location, selectLicenses(manifest)...)
          	/*
          		We should name and version from, in this order:
          		1. pom.properties if we find exactly 1
          		2. pom.xml if we find exactly 1
          		3. manifest
          		4. filename
          	*/
          	groupID, artifactID, version, parsedPom := j.discoverMainPackageFromPomInfo(ctx)
          	if artifactID == "" {
          		artifactID = selectName(manifest, j.fileInfo)
          	}
          	if version == "" {

But I think currently the code isn’t behaving in line with this comment.
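
For reference, the order in that comment could be expressed as a simple ordered walk over sources. This is only a sketch of the documented intent, not how archive_parser.go is structured: identity, source, and resolve are hypothetical names, and the "exactly 1" conditions for pom.properties and pom.xml are assumed to have been applied before the sources are built.

```go
package main

import "fmt"

// identity (hypothetical) is the name/version pair one source can offer.
type identity struct {
	Name    string
	Version string
}

// source (hypothetical) labels where an identity came from.
type source struct {
	Label string
	ID    identity
}

// resolve walks the sources in priority order and fills name and version
// independently from the first source that supplies each one.
func resolve(sources []source) (id identity, nameFrom, versionFrom string) {
	for _, s := range sources {
		if id.Name == "" && s.ID.Name != "" {
			id.Name, nameFrom = s.ID.Name, s.Label
		}
		if id.Version == "" && s.ID.Version != "" {
			id.Version, versionFrom = s.ID.Version, s.Label
		}
	}
	return id, nameFrom, versionFrom
}

func main() {
	// Mirrors the reported case: no pom files, a version in the manifest,
	// and a renamed file whose name carries a different version.
	sources := []source{
		{Label: "pom.properties"},
		{Label: "pom.xml"},
		{Label: "manifest", ID: identity{Version: "1.2.3"}},
		{Label: "filename", ID: identity{Name: "renamed", Version: "0.0.1"}},
	}
	id, nameFrom, versionFrom := resolve(sources)
	fmt.Printf("name=%s (from %s)\n", id.Name, nameFrom)
	fmt.Printf("version=%s (from %s)\n", id.Version, versionFrom)
}
```

Under this ordering the example takes the version from the manifest and only the name from the filename, which is the behavior the comment describes.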

willmurphy · December 11, 2024, 4:29pm

We have a specific test case asserting that Implementation-Title does not override the name from the filename:

anchore/syft/blob/445142886ee62beac34b16eee6fbd6dc89bb3fbd/syft/pkg/cataloger/java/parse_java_manifest_test.go#L218


      
          				Main: []pkg.KeyValue{
          					{
          						Key:   "Implementation-Title",
          						Value: "maven-wrapper",
          					},
          				},
          			},
          			expected: "maven-wrapper",
          		},
          		{
          			desc: "Implementation-Title does not override name from filename",
          			manifest: pkg.JavaManifest{
          				Main: []pkg.KeyValue{
          					{
          						Key:   "Name",
          						Value: "foo",
          					},
          					{
          						Key:   "Implementation-Title",
          						Value: "maven-wrapper",
          					},

I don’t understand why yet, but I will keep looking. @kzantow do you know why? This check seems backwards to me, but the test case has been in place for years.

westonsteimel · December 12, 2024, 9:20am

I know the manifest file is incredibly unreliable, as there is no real consistency in how it is used between projects. We do have some specific logic for the Apache Maven Bundle Plugin because there was a documented spec that those projects seemed to follow, but otherwise it can be very inconsistent. I suspect that if you were to just change this behaviour now, you’d introduce many false positives that were previously prevented.

westonsteimel · December 12, 2024, 9:22am

At a minimum you’d need to re-run the testing against all of the latest maven central artifacts to see what gets better/worse. I think @Christopher_Phillips had some scripts for this once when we were changing some purl generation and package deduplication behaviour
