Any plans for AIBOM using Syft or Grype

anvitha_haviligi · August 26, 2025, 4:40am

Hi Team,

Are there any plans to support generating an AIBOM using Syft or Grype?

Regards,

Anvitha

spiffcs · August 26, 2025, 2:05pm

Hi Anvitha!

In short, yes! We’re currently working on a spike that will try to catalog a few new package types so they can be added to the SBOM when Syft scans sources that contain them.

Here is a high-level overview of the formats we’re looking to represent in the syft-json output. More work is needed to figure out the CycloneDX and SPDX representations.

GGUF (.gguf): Quantized LLMs (llama.cpp, ollama). Binary format with structured metadata + tensors.
Safetensors (.safetensors): HuggingFace default. Secure, zero-copy, JSON metadata + binary tensors.
OCI Image Source: the Docker default; OCI images now store model information in their layers (image → model).
ONNX (.onnx): Cross-framework standard. Graph + tensors, Protobuf-based. Metadata is available.

Are there specific artifact types you’re looking for support for? We’re happy to take any feedback right now while we gather information and look at what can and should be added.

anvitha_haviligi · August 26, 2025, 3:12pm

Great to hear! Thank you so much for the reply. I’m going through the components used in my organization and will post my requirements based on that exploration. Could you please let me know when this will be released?

spiffcs · August 27, 2025, 6:11pm

No timeline yet, but after today’s discussion we’ll probably post a few issues that outline the initial work we’re committing to deliver. When those go up, I’ll link them here.

spiffcs · September 2, 2025, 6:57pm

Here is the issue that outlines our first pass at cataloging AI Models:

github.com/anchore/syft

feat: add support for cataloging GGUF models
Opened by spiffcs at 06:55PM, 02 Sep 2025 UTC · label: enhancement

## Summary

Add a new ai-artifact cataloger in Syft that detects and parses GGUF files (.gguf). We want to do header-only metadata extraction (fast, no full weights load). Emit results in:

- Syft JSON (native) using a new metadata type for GGUF
- CycloneDX 1.6 (ML-BOM) as machine-learning-model components with basic properties

### Goals / Scope

Detect .gguf files from supported sources. This issue starts with local filesystems and container filesystems. A second issue will focus on OCI media types and on adding a new syft source that parses the Docker layer API for efficient cataloging.

## Notes

- Parse only the GGUF header (magic, version, KV count, KV table) to capture identity and key facts.
- Create a new package type `model` and a new metadata type `gguf-file-metadata`.
- Emit Syft JSON package(s) with:
  - type: "model"
  - metadataType: "gguf-file-metadata"
  - metadata: minimal but stable fields (see below)
- Emit CycloneDX 1.6 with:
  - type: "machine-learning-model"
  - minimal modelCard.modelParameters and properties mapping (see below)
- Make zero network calls for local/container sources.

We're also looking for a stable global identifier that holds across remotes. It will be obtained by taking a hash of the metadata extracted from the model.
## Examples

### Syft JSON example (native):

```jsonc
{
  "name": "Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf",
  "type": "model",
  "foundBy": "ai-artifact-cataloger",
  "locations": [{"path": "/models/Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf"}],
  "licenses": [],
  "purl": "",
  "metadataType": "gguf-file-metadata",
  "metadata": {
    "ModelFormat": "gguf",
    "ModelName": "Qwen3-Coder-30B-A3B-Instruct",
    "ModelVersion": "unknown",
    "FileSize": 0,          // best-effort if available from resolver
    "Hash": "",             // leave blank unless already computed upstream
    "License": "apache-2.0",
    "GGUFVersion": 3,
    "Architecture": "qwen3moe",
    "Quantization": "IQ4_NL",
    "Parameters": 0,        // if present in header
    "TensorCount": 579,     // derived from header tensor entries
    "Header": {             // raw KVs (namespaced)
      "general.architecture": "qwen3moe",
      "general.name": "Qwen3-Coder-30B-A3B-Instruct",
      "general.license": "apache-2.0",
      "general.quantized_by": "Unsloth"
    },
    "TruncatedHeader": false
  }
}
```

### CycloneDX 1.6 (ML-BOM) mapping:

component:
- type = "machine-learning-model"
- name = general.name || filename
- version = header field if available (else "unknown")
- modelCard.modelParameters (best-effort):
  - architectureFamily from general.architecture (map common values: llama/qwen/gemma → "transformer" family)
  - modelArchitecture freeform (e.g., "decoder-only", if inferable; else omit)

properties:
- org.syft/gguf/version = GGUFVersion
- org.syft/gguf/quantization = Quantization
- org.syft/gguf/architecture = Architecture
- org.syft/gguf/tensorCount = TensorCount
- org.syft/license = License (if present)

Note: Keep CycloneDX output minimal and typed; avoid dumping the entire KV bag into properties.

### CLI UX

Works out of the box for local files or a Hugging Face URL:

- `syft dir:./path/to/models -o json`
- `go run cmd/syft/main.go -o json https://huggingface.co/janhq/Jan-v1-4B-GGUF/blob/main/Jan-v1-4B-Q4_K_M.gguf`

Add `--select-catalogers=ai-artifact` to limit runs if needed (optional).
### Follow-ups

- OCI Artifact (local | remote)
- PURL strategy (e.g., pkg:huggingface/...) once we add remote/registry context
- Safetensors and ONNX parsers
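The issue's Notes mention deriving a stable identifier by hashing the metadata extracted from the model, without specifying a canonical form. One possible approach, sketched here under that assumption: sort the keys (Go randomizes map iteration order), write each key and value in quoted form, and hash the result with SHA-256. The `metadataDigest` name and this encoding are hypothetical, not something Syft has committed to.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// metadataDigest hashes header metadata into a hex-encoded SHA-256 digest.
// Keys are sorted first because Go randomizes map iteration order, and each
// key and value is written in Go-quoted form so that separators or newlines
// inside a value cannot make two different inputs produce the same byte stream.
// Values are rendered with fmt.Sprint, so their Go types are not part of the digest.
func metadataDigest(kv map[string]any) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%q=%q\n", k, fmt.Sprint(kv[k]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Header values taken from the issue's Syft JSON example.
	header := map[string]any{
		"general.architecture": "qwen3moe",
		"general.name":         "Qwen3-Coder-30B-A3B-Instruct",
		"general.license":      "apache-2.0",
		"general.quantized_by": "Unsloth",
	}
	fmt.Println(metadataDigest(header))
}
```

If values of different Go types should hash differently (for example `uint32(2)` versus `int64(2)`), adding `%T` to the format string would fold the type into the digest.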

I’ll have another issue ready soon with more information about how we’re approaching models stored in OCI artifacts.
