Syft is not scanning JAR files that are included in a Docker image

I observed that Syft lists the JAR name in the SBOM file but not the components included in the JAR. For example, a.jar contains a pom.xml at ./META-INF/maven/com.inn.a.b/a-b-app/pom.xml. I used the command below to generate the SBOM:

syft scan -o cyclonedx-json

Please let me know if I need to pass any arguments to get all the components.

Note: I tried manually extracting the JAR file and scanning the result. That works, but scanning the Docker image does not.

Hi @anvitha_haviligi, thanks for the report. Can you provide a link to a public JAR that has this problem? There’s a lot of variation in how JARs are packaged, so it’s important that we investigate one where you see the issue. Thanks!

It sounds like maybe this is happening to a JAR you’re building internally?

Syft generally unpacks and finds JARs within JARs, so it would be helpful to know what’s unusual in this case.

@kzantow do you have any ideas what might be going on here?

I’m a little confused about the issue. I haven’t seen many differences scanning a JAR directly vs on the filesystem in an image. Are you able to provide a specific example showing this issue with a JAR you build yourself or something on Maven central?

The file name influences which pom files are matched to represent the JAR, is there some renaming happening between the image and direct scan?

Or, are you talking about a JAR with multiple pom.xml files? Today, Syft is attempting to determine a singular package that represents the JAR, even if there are multiple pom.xml files present. It is looking for nested JARs, but not multiple packages for the JAR directly. I think there are some cases where we should be surfacing more, for example with shading: a JAR could have shaded multiple other libraries in the same jar file. Unfortunately, this is a pretty sensitive area where changes can also easily result in incorrect results, so we will want to be careful about any behavior we change here, but it seems to me every pom.xml / pom.properties pair we find is potentially declaring something present in the JAR, either through shading or just copying. We should probably try to find a number of examples to validate this assertion, and then adjust the logic accordingly.

Hi @willmurphy & @kzantow, it is not a public image. It is a custom image where Maven is used to build the JAR, which is then added in the Dockerfile. The JAR name is the same in the Dockerfile, it has only one pom.xml, and it does not contain multiple JARs.

The following is added in the Dockerfile:

ADD ./$APP_NAME.tar $SERVICE_PATH

Thank you

Hi @anvitha_haviligi,

Are you adding a JAR with a .tar file extension? Is the .tar full of JARs? I’m surprised to see .tar in your example Dockerfile, and I’m trying to figure out what you mean.

I think you are asking for Syft to go into an image, and in the image find a tar, and in the tar find some jars? Is that right?

That’s exactly the problem: Syft does not extract tar files it finds within an image, but if you scan a tar file directly, it does scan the contents.

This has been requested for some time:

Maybe it’s time to add that?

@anvitha_haviligi would you mind confirming that you are hoping Syft will find a JAR inside a tar inside an image? If so, we know what’s going on. If not, we might need to keep investigating :)

Hi @willmurphy, thanks for noticing the .tar. We are verifying Dockerfiles provided by developers, and I didn’t notice the .tar. I may also be pointing to the wrong Dockerfile. I will check again and get back to you on how they are adding .jar files to the image.

Regards, Anvitha

Hi @willmurphy

I tried creating the image again with the Dockerfile provided by the dev team. Instead of copying the tar, I copied the JAR into the image, built the image, and scanned it. I still don’t see the packages mentioned in the pom.xml inside the JAR: ./META-INF/maven/com.inn.r.management/r-management-app/pom.xml

Dockerfile:

COPY workflow-management-app.jar $SERVICE_PATH/workflow-management-app.jar

Example: one of the packages from pom.xml

Hi @kzantow, does Syft expect a pom.xml to be present in the Docker image alongside the JAR, or is the pom.xml inside the JAR sufficient? Below is the structure of the JAR:

├── appconfigs
├── certs
├── com
│   └── inn
│       └── appconfiguration
│           ├── AccessLogsConfiguration.class
│           ├── AppRunner.class
│           ├── CacheConfig.class
│           ├── CacheEventLogger.class
│           ├── ExecutorConfiguration.class
│           ├── MetricsFilterConfig$1.class
│           ├── MetricsFilterConfig.class
│           ├── SecurityConfig.class
│           ├── SwaggerConfiguration.class
│           ├── TomcatConfiguration.class
│           └── WebMvcConfiguration.class
├── DbQuries
│   └── DropConnector.sql
├── META-INF
│   ├── MANIFEST.MF
│   └── maven
│       └── com.inn.rcp.workflowmanagement
│           └── rcp-workflowmanagement-app
│               ├── pom.properties
│               └── pom.xml
├── mybatis
│   └── mapper

└── Swagger
    └── wf_Swagger_3.10.0.json

Syft scans .jar files it finds in the container (and .war, .ear, etc.). It finds all pom.xml files in each JAR it finds. It then has some logic to determine which is applicable, based on the filename.
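To see which Maven descriptors a given JAR actually carries, you can list them yourself. The following is a minimal Python sketch using only the standard library. It is not Syft's implementation, and the function name `list_maven_descriptors` is a hypothetical helper introduced here for illustration.

```python
import zipfile


def list_maven_descriptors(jar_path):
    """Return the sorted entry names of pom.xml / pom.properties files
    found under META-INF/maven/ in a JAR (a JAR is a ZIP archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith("META-INF/maven/")
            and name.rsplit("/", 1)[-1] in ("pom.xml", "pom.properties")
        )
```

If this returns more than one pom.xml, the JAR carries several descriptors, and only the one judged applicable to the JAR is currently cataloged.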

If you scan the JAR file directly, does Syft find the package(s) you are expecting? If you copy that JAR file to the container, it should get found and cataloged, unless $SERVICE_PATH is pointing to something that Syft excludes like /dev, or you have a Syft configuration that for some reason disables the Java Archive cataloging or otherwise uses nonstandard behavior affecting this.

It looks like the path in the JAR is: META-INF/maven/com.inn.rcp.workflowmanagement/rcp-workflowmanagement-app which contains both pom.properties and pom.xml. This should definitely be supported; most JARs put Maven files in similar locations.

“Still i dont see packages mentioned in pom.xml inside the jar”

Are you expecting to see the main package or everything, including dependencies? Syft does not include dependencies from pom.xml files when they are inside an archive, since the archive would not be expected to contain all the packages. Sometimes it does, through shading; in that case, all the pom.xml files should be included in the archive to indicate what is actually included. But as noted above, we are not currently cataloging every pom.xml, only the one we believe to be applicable to the JAR contents.

Location of SERVICE_PATH is /opt/rcp/workflow-mgmt

In the pom.xml below, a package called ‘camunda’ is added. I expect this to be listed in the SBOM, but I see only the main .jar in the SBOM.

One of the packages from pom.xml:

</dependency-->
                <!--dependency>
                        <groupId>org.camunda.bpm</groupId>
                        <artifactId>camunda-engine-plugin-spin</artifactId>
                        <version>7.21.0-alpha2</version>
                </dependency-->
                <!--dependency>

I expect the components mentioned in pom.xml to be present in the SBOM file.

Command used to generate the SBOM: syft scan --enrich all -o cyclonedx-json imagename | jq

I see only the following related to the JAR in the JSON; the component names are not there:

    {
      "name": "syft:location:0:layerID",
      "value": "sha256:9ebe72691838e20f8886ff71e715d5654ebd35bf531f56e568199c1d223b922d"
    },
    {
      "name": "syft:location:0:path",
      "value": "/opt/rcp/workflow-management-app/workflow-management-app.jar"
    },
    {
      "name": "syft:metadata:-:artifactID",
      "value": "rcp-workflowmanagement-app"
    },
    {
      "name": "syft:metadata:-:groupID",
      "value": "com.inn.rcp.workflowmanagement"
    },
    {
      "name": "syft:metadata:virtualPath",
      "value": "/opt/rcp/workflow-management-app/workflow-management-app.jar:com.inn.rcp.workflowmanagement:rcp-workflowmanagement-app"
    }
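A quick way to check which components an SBOM actually lists is to read the top-level `components` array of the CycloneDX JSON. Below is a minimal Python sketch under the assumption that the Syft output has been saved to a file first; the function name `component_names` is a hypothetical helper, and nested components (a `components` array inside a component) are not followed.

```python
import json


def component_names(sbom_text):
    """Return (name, version) pairs for each top-level component
    in a CycloneDX JSON document passed in as a string."""
    sbom = json.loads(sbom_text)
    return [
        (component.get("name"), component.get("version"))
        for component in sbom.get("components", [])
    ]
```

Running this on the image SBOM and on the directory-scan SBOM makes the difference between the two scans easy to compare.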

Yes, when I unzip the JAR file and scan it with the command syft scan -o cyclonedx-json dir:dirname, it lists the components mentioned in pom.xml.

Note: I have not changed any Syft config file; I am using the default config values.

I’m confused, it looks like you were copying the JAR file directly to the container? What happens when you scan the JAR file directly, without unzipping it?

There is a lot going on here, I just want to make sure we’re all on the same page:

  • Syft will not scan a .tar file in the container
  • Syft will scan a .jar file in the container
  • Syft has different behavior if you are scanning a directory vs a container
  • When Syft scans pom.xml files in JAR files, it does not include dependencies, only the main package
  • When Syft scans pom.xml files in directories, it does include dependencies

When you scan an unzipped JAR file, you are scanning a directory. This looks for all pom.xml files and surfaces them, including dependencies, assuming you are performing a source code scan. In a container, Syft does not scan pom.xml files directly, only those found in .jar and similar files, and the pom.xml files in archives do not include dependencies.
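To see what a pom.xml declares, independent of any scanner, you can read its `<dependency>` elements directly. This is a minimal Python sketch using the standard library, not a description of how Syft parses pom files. The helper names `_local` and `declared_dependencies` are hypothetical, and the sketch does not resolve parent poms, properties, or managed versions.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace such as '{http://maven.apache.org/POM/4.0.0}'."""
    return tag.rsplit("}", 1)[-1]


def declared_dependencies(pom_text):
    """Return (groupId, artifactId, version) for every <dependency> element.

    This includes entries under <dependencyManagement> and plugin
    dependencies. A missing <version> is reported as None.
    """
    root = ET.fromstring(pom_text)
    deps = []
    for elem in root.iter():
        if _local(elem.tag) != "dependency":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        deps.append(
            (fields.get("groupId"), fields.get("artifactId"), fields.get("version"))
        )
    return deps
```

Dependencies wrapped in XML comments (`<!-- ... -->`) are not elements, so this sketch does not return them. A declared dependency also says nothing about whether its classes are actually inside the JAR.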

Sorry for the confusion. OK, I understand: only the main package will be displayed.

example:
<sonar.coverage.exclusions>src/main/java/com/inn/appconfiguration/*.java</sonar.coverage.exclusions>
<jacoco.outputDir>${project.basedir}/../target/site/jacoco</jacoco.outputDir>
<sonar.coverage.jacoco.xmlReportPaths>${project.basedir}/target/site/jacoco/jacoco.xml</sonar.coverage.jacoco.xmlReportPaths>
<startup.class>com.inn.AppRunner</startup.class>
<output.dir>target/docker</output.dir>
<docker.tag>v1.8.0</docker.tag>

Is there anything I can configure to also get dependencies from the JAR inside the Docker image, or is scanning the source code the only option?

There is currently no option to include dependencies from pom files within JARs; you will need to scan the extracted files. Are you creating an uber-jar with all the classes combined together? I don’t think we will want to include dependencies in the manner you are asking about, as these would not typically be present in the JAR file, so surfacing them would be misleading. The exception is when there are multiple pom.xml files, in which case we wouldn’t surface dependencies but rather the packages described by each pom.
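Scanning the extracted files can be scripted. The sketch below is a minimal Python example that unpacks a JAR into a temporary directory and builds the directory-scan command already used earlier in this thread (`syft scan -o cyclonedx-json dir:dirname`). The function name `extract_for_dir_scan` and the `jar-extract-` prefix are hypothetical, and the sketch only builds the command without running it.

```python
import tempfile
import zipfile


def extract_for_dir_scan(jar_path, dest_dir=None):
    """Extract a JAR and return (directory, command) for a Syft directory scan.

    If dest_dir is not given, a fresh temporary directory is created.
    """
    dest_dir = dest_dir or tempfile.mkdtemp(prefix="jar-extract-")
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest_dir)
    command = ["syft", "scan", "-o", "cyclonedx-json", f"dir:{dest_dir}"]
    return dest_dir, command
```

The returned command can be passed to `subprocess.run(command, check=True)` on a machine where Syft is installed. The dependencies reported this way are the ones declared in pom.xml, which are not necessarily present in the JAR.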
