Hi Team,
Are there any plans to support generating an AIBOM using Syft or Grype?
Regards,
Anvitha
Hi Anvitha!
In short, yes! We’re currently working on a spike that will try to catalog a few new package types so that they can be added to the SBOM when Syft scans sources containing them.
Here is a high-level overview of the formats we’re looking to represent in syft-json. More work is needed to figure out the CycloneDX and SPDX representations.
- GGUF (.gguf): Quantized LLMs (llama.cpp, ollama). Binary format with structured metadata + tensors.
- Safetensors (.safetensors): Hugging Face default. Secure, zero-copy, JSON metadata + binary tensors.
- OCI Image Source: Docker default; OCI images now store model information in their layers (images → model).
- ONNX (.onnx): Cross-framework standard. Graph + tensors, Protobuf-based. Metadata available.
Are there specific artifact types you’re looking for support for? We’re happy to take any feedback right now while we gather information and look at what can or should be added.
Great to hear! Thank you so much for the reply. I’m going through the components used in my organization and will post my requirements based on that exploration. Could you please let me know when this will be released?
No timeline yet, but after today’s discussion we’ll probably post a few issues that outline the initial work we’re committing to deliver. When those go up, I’ll link them here.
spiffcs
September 2, 2025, 6:57pm
Here is the issue that outlines our first pass at cataloging AI Models:
Issue opened 02 Sep 2025, 06:55 PM UTC (label: enhancement)
## Summary
Add a new ai-artifact cataloger in Syft that detects and parses GGUF… files (.gguf). We want to do header-only metadata extraction (fast, no full weights load). Emit results in:
- Syft JSON (native) using a new metadata type for GGUF
- CycloneDX 1.6 (ML-BOM) as machine-learning-model components with basic properties.
### Goals / Scope
Detect .gguf files from supported sources. This issue starts with the local filesystem and container filesystems. A second issue will focus on OCI media types and on adding a new Syft source that uses the Docker layer API for efficient cataloging.
## Notes
- Parse only the GGUF header (magic, version, KV count, KV table) to capture identity & key facts.
- Create a new package type `model` and a new metadata type `gguf-file-metadata`.
- Emit Syft JSON package(s) with:
  - type: "model"
  - metadataType: "gguf-file-metadata"
  - metadata: minimal but stable fields (see below).
- Emit CycloneDX 1.6 with:
  - type: "machine-learning-model"
  - minimal modelCard.modelParameters and properties mapping (see below).
- Zero network calls for local/container sources.
We're also looking for a stable global identifier across remotes. It will be obtained by hashing the metadata extracted from the model.
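The issue doesn't specify which metadata fields are hashed or with which algorithm. One way to make such an identifier independent of map iteration order is to sort the keys and hash a length-prefixed encoding of each key/value pair; the sketch below assumes SHA-256 and a hypothetical `modelID` helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// modelID derives an identifier from header metadata (hypothetical helper).
// Keys are sorted so the result does not depend on map iteration order, and
// each key/value is length-prefixed so different splits cannot collide.
func modelID(kv map[string]string) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(kv[k]), kv[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := map[string]string{"general.architecture": "qwen3moe", "general.name": "Qwen3-Coder-30B-A3B-Instruct"}
	b := map[string]string{"general.name": "Qwen3-Coder-30B-A3B-Instruct", "general.architecture": "qwen3moe"}
	fmt.Println(modelID(a) == modelID(b)) // true
}
```

Which keys go into the hash matters: anything location-specific (paths, or a file size reported by one particular resolver) would defeat the goal of an identifier that stays stable across remotes.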
## Examples
### Syft JSON example (native):
```jsonc
{
  "name": "Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf",
  "type": "model",
  "foundBy": "ai-artifact-cataloger",
  "locations": [{"path": "/models/Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf"}],
  "licenses": [],
  "purl": "",
  "metadataType": "gguf-file-metadata",
  "metadata": {
    "ModelFormat": "gguf",
    "ModelName": "Qwen3-Coder-30B-A3B-Instruct",
    "ModelVersion": "unknown",
    "FileSize": 0,        // best-effort if available from resolver
    "Hash": "",           // leave blank unless already computed upstream
    "License": "apache-2.0",
    "GGUFVersion": 3,
    "Architecture": "qwen3moe",
    "Quantization": "IQ4_NL",
    "Parameters": 0,      // if present in header
    "TensorCount": 579,   // derived from header tensor entries
    "Header": {           // raw KVs (namespaced)
      "general.architecture": "qwen3moe",
      "general.name": "Qwen3-Coder-30B-A3B-Instruct",
      "general.license": "apache-2.0",
      "general.quantized_by": "Unsloth"
    },
    "TruncatedHeader": false
  }
}
```
### CycloneDX 1.6 (ML-BOM) mapping:
component:
- type = "machine-learning-model"
- name = general.name || filename
- version = header field if available (else "unknown")
- modelCard.modelParameters (best-effort):
  - architectureFamily from general.architecture (map common values: llama/qwen/gemma → "transformer" family)
  - modelArchitecture freeform (e.g., "decoder-only", if inferable; else omit)

properties:
- org.syft/gguf/version = GGUFVersion
- org.syft/gguf/quantization = Quantization
- org.syft/gguf/architecture = Architecture
- org.syft/gguf/tensorCount = TensorCount
- org.syft/license = License (if present)
Note: Keep CycloneDX output minimal & typed; avoid dumping the entire KV bag to properties.
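The mapping rules above can be sketched in Go with small local structs that use CycloneDX 1.6 JSON field names; this deliberately avoids a real CycloneDX library, so nothing here reflects that library's API. `ggufMetadata`, `toComponent`, and the prefix match used for the llama/qwen/gemma → "transformer" rule are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ggufMetadata is a hypothetical subset of the gguf-file-metadata fields.
type ggufMetadata struct {
	ModelName, ModelVersion, License, Architecture, Quantization string
	GGUFVersion, TensorCount                                     int
}

// Minimal local structs using CycloneDX 1.6 JSON field names.
type property struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type modelParameters struct {
	ArchitectureFamily string `json:"architectureFamily,omitempty"`
}

type modelCard struct {
	ModelParameters *modelParameters `json:"modelParameters,omitempty"`
}

type component struct {
	Type       string     `json:"type"`
	Name       string     `json:"name"`
	Version    string     `json:"version"`
	ModelCard  *modelCard `json:"modelCard,omitempty"`
	Properties []property `json:"properties"`
}

// toComponent applies the mapping rules; the prefix list is an assumed heuristic.
func toComponent(fileName string, m ggufMetadata) component {
	c := component{Type: "machine-learning-model", Name: m.ModelName, Version: m.ModelVersion}
	if c.Name == "" {
		c.Name = fileName // name = general.name || filename
	}
	if c.Version == "" {
		c.Version = "unknown"
	}
	for _, p := range []string{"llama", "qwen", "gemma"} {
		if strings.HasPrefix(m.Architecture, p) {
			c.ModelCard = &modelCard{ModelParameters: &modelParameters{ArchitectureFamily: "transformer"}}
			break
		}
	}
	// Only typed fields become properties; the raw KV bag is not copied.
	c.Properties = []property{
		{"org.syft/gguf/version", strconv.Itoa(m.GGUFVersion)},
		{"org.syft/gguf/quantization", m.Quantization},
		{"org.syft/gguf/architecture", m.Architecture},
		{"org.syft/gguf/tensorCount", strconv.Itoa(m.TensorCount)},
	}
	if m.License != "" {
		c.Properties = append(c.Properties, property{"org.syft/license", m.License})
	}
	return c
}

func main() {
	m := ggufMetadata{ModelName: "Qwen3-Coder-30B-A3B-Instruct", License: "apache-2.0",
		Architecture: "qwen3moe", Quantization: "IQ4_NL", GGUFVersion: 3, TensorCount: 579}
	out, err := json.MarshalIndent(toComponent("Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf", m), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The sketch leaves out modelArchitecture because nothing in the header reliably indicates "decoder-only"; per the rule above, it is omitted when it can't be inferred.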
### CLI UX
Works out of the box for local files or a Hugging Face URL:
- `syft dir:./path/to/models -o json`
- `go run cmd/syft/main.go -o json https://huggingface.co/janhq/Jan-v1-4B-GGUF/blob/main/Jan-v1-4B-Q4_K_M.gguf`
Optionally, add `--select-catalogers=ai-artifact` to limit the run to this cataloger.
### Follow-ups
- OCI Artifact (local | remote)
- PURL strategy (e.g., pkg:huggingface/...) once we add remote/registry context.
- Safetensors & ONNX parsers.
I’ll have another issue ready soon with more information about how we’re approaching models stored in OCI artifacts.