Exploring VM Disk Image Scanning: sbom-vm Project


Hi everyone!

Following up on our previous discussions about improvements to scanning whole machines, I wanted to share a project I’ve been working on this weekend that explores how we might generate SBOMs from VM disk images without booting them.

What is sbom-vm?

sbom-vm is an experimental project that demonstrates how we can leverage Linux utilities (qemu-nbd, mount, etc.) to safely mount VM disk images in read-only mode, allowing Syft to scan their contents without requiring the VM to be running.

Key Features:

  • Read-only mounting of VM disk images via qemu-nbd
  • Support for multiple formats (qcow2, vmdk, vhd, raw)
  • Automatic detection and mounting of common filesystems (NTFS, ext4, HFS+, APFS, ZFS)
  • Safe, non-destructive SBOM generation using Syft
  • Test image generation utility for development and testing
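To make the flow concrete, here is a minimal sketch of the command sequence such a tool might run. It is not taken from the sbom-vm source: the function name, the default device `/dev/nbd0`, the partition suffix `p1`, the mount point, and the output path are all assumptions for illustration. The function only builds the commands and never executes them, so it can be inspected without root.

```python
import shlex

# qemu names its VHD driver "vpc"; the other formats listed above map directly.
QEMU_FORMAT = {"qcow2": "qcow2", "vmdk": "vmdk", "vhd": "vpc", "raw": "raw"}


def build_scan_plan(image, fmt="qcow2", nbd_device="/dev/nbd0",
                    partition="p1", mount_point="/mnt/sbom-vm",
                    sbom_path="sbom.json"):
    """Return the ordered commands for a read-only scan. Nothing is executed."""
    return [
        # Load the NBD kernel module with partition support.
        ["modprobe", "nbd", "max_part=8"],
        # Expose the disk image as a block device, read-only.
        ["qemu-nbd", "--read-only", f"--format={QEMU_FORMAT[fmt]}",
         f"--connect={nbd_device}", image],
        ["mkdir", "-p", mount_point],
        # Mount the (assumed) first partition read-only.
        ["mount", "-o", "ro", f"{nbd_device}{partition}", mount_point],
        # Scan the mounted filesystem and write the SBOM to a file.
        ["syft", f"dir:{mount_point}", "-o", f"syft-json={sbom_path}"],
        # Clean up: unmount, then detach the NBD device.
        ["umount", mount_point],
        ["qemu-nbd", "--disconnect", nbd_device],
    ]


if __name__ == "__main__":
    for command in build_scan_plan("disk.qcow2"):
        print(shlex.join(command))
```

A real tool would also need to detect the partition layout and filesystem type rather than assuming `p1`, and to run the cleanup steps even if the scan fails.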

Project Goals

This project is primarily an exploration of techniques for building SBOMs from disk images from outside the running VM. The lessons learned could potentially inform improvements to Syft or additions to stereoscope for handling VM disk images natively.

For those who need to scan VM disk images now, this tool might serve as an interim solution until such functionality is potentially integrated into Syft/stereoscope directly.

Try It Out

The project is available on GitHub: sbom-vm

I’d love to hear your thoughts, experiences, and suggestions if you give it a try. What other use cases should we consider? What challenges have you faced when scanning VM disk images?

What’s Next?

This is very much a prototype to explore the space and start the conversation. I’m particularly interested in:

  1. Understanding different VM disk image scanning needs in the community
  2. Identifying common challenges and edge cases
  3. Exploring how this functionality might best fit into the broader Syft ecosystem

Looking forward to your feedback and discussions!


@popey I have a question. What if I do the following?

  1. I have a qcow2 image.
  2. I extract it to a directory using the 7z tool.
  3. I generate an SBOM using `syft <extracted_dir>`.

How is sbom-vm different from running syft on a qcow2 image that has been extracted to a directory? Does sbom-vm avoid the extraction step? Is that the goal? Or is syft not accurate when scanning an extracted qcow2 image?

Just trying to understand the difference. Sorry if I have misunderstood something.

With Syft today, you need to boot the VM and run syft directly on the running instance. sbom-vm avoids the need to run the VM and lets you scan the disk image's contents more easily.

I wanted to point out one thing, as I’m not sure how sbom-vm works, so maybe this is more for @Alan_Pope: Syft runs a different set of catalogers depending on what it scans. You can see the differences using `syft cataloger list`: the catalogers tagged image run when you scan an image, and those tagged directory run when you scan a directory. There is some overlap, but these are not the same sets.

For example, the current release has these, among many others:

┌─────────────────────────────┬──────────────────────────────────────────────────────────────────┐
│ PACKAGE CATALOGER           │ TAGS                                                             │
├─────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ alpm-db-cataloger           │ alpm, archlinux, directory, image, installed, linux, os, package │
..
│ dart-pubspec-lock-cataloger │ dart, declared, directory, language, package                     │

… you will see alpm-db-cataloger runs for both, but dart-pubspec-lock-cataloger only runs for directory. Why is this? We assume the most common thing you’re scanning when you scan a directory is a source repository checkout, so we look for lock files that indicate what you would get if you ran an installation command. An image, on the other hand, isn’t necessarily going to run any commands; it probably already has, so we only check for things that indicate installed software. Still, as you might imagine, there is a lot of overlap.
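As a small illustration of how the tags drive selection, here is a sketch that filters the two catalogers from the excerpt above by tag. The dictionary holds only those two rows, and the function name is made up for this example; it is not how Syft implements selection internally.

```python
# Tags copied from the two rows of the `syft cataloger list` excerpt above.
CATALOGER_TAGS = {
    "alpm-db-cataloger": {
        "alpm", "archlinux", "directory", "image",
        "installed", "linux", "os", "package",
    },
    "dart-pubspec-lock-cataloger": {
        "dart", "declared", "directory", "language", "package",
    },
}


def catalogers_for(tag):
    """Return the names of catalogers carrying the given tag, sorted."""
    return sorted(name for name, tags in CATALOGER_TAGS.items() if tag in tags)


if __name__ == "__main__":
    for scan_type in ("image", "directory"):
        print(scan_type, catalogers_for(scan_type))
```

Filtering on image leaves only alpm-db-cataloger, while filtering on directory keeps both, which matches the behaviour described above.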

The other issue when scanning a VM with syft in the manner I described, as a running instance, is that it defaults to the directory catalogers, which isn’t ideal for whole systems. Does sbom-vm select the image catalogers? That would probably be more accurate than an extracted directory scan.


Just a quick follow-up, it looks like sbom-vm is, in fact, using image catalogers, which will give you the best results for scanning VMs :+1:
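For anyone scanning an extracted or mounted directory by hand, a rough sketch of requesting the image catalogers could look like this. It assumes Syft 1.x, where `--override-default-catalogers` accepts a tag such as image; whether sbom-vm passes exactly this flag is an assumption, and the helper name and mount point are hypothetical.

```python
import shlex


def syft_dir_scan(mount_point, cataloger_tag="image", output_format="syft-json"):
    """Build a syft command that scans a directory with a chosen cataloger set.

    Assumes Syft 1.x, where --override-default-catalogers replaces the
    default set with the catalogers matching the given tag or name.
    """
    return [
        "syft", f"dir:{mount_point}",
        "--override-default-catalogers", cataloger_tag,
        "-o", output_format,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(syft_dir_scan("/mnt/sbom-vm")))
```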
