OWASP-AIBOM-Generator

Sleeping

App Files Files Community

OWASP-AIBOM-Generator / docs /AI_SBOM_API_doc.md

a1c00l

Upload 3 files

550ed76 verified 8 months ago

preview code

raw

history blame contribute delete

15.8 kB

	# AI SBOM Generator API Documentation

	## Overview

	The AI SBOM Generator API provides a comprehensive solution for generating CycloneDX-compliant AI Software Bill of Materials (AI SBOM) for Hugging Face models. This API uses a configurable field registry system to extract and score AI SBOM fields across 5 categories, providing detailed completeness assessment and standards compliance.

	---

	## Table of Contents
	- [Base URL](#base-url)
	- [API Endpoints](#api-endpoints)
	- [API Status](#api-status)
	- [Registry Status](#registry-status)
	- [Generate AI SBOM](#generate-ai-sbom)
	- [Generate AI SBOM with Completeness Score Report](#generate-ai-sbom-with-completeness-score-report)
	- [Get Completeness Score Only](#get-completeness-score-only)
	- [Download Generated AI SBOM](#download-generated-ai-sbom)
	- [Form-Based Generation (Web UI)](#form-based-generation-web-ui)
	- [Web UI](#web-ui)
	- [Security Features](#security-features)
	- [Field Registry System](#field-registry-system)
	- [Completeness Score](#completeness-score)
	- [Notes on Using the API](#notes-on-using-the-api)
	- [Error Handling](#error-handling)

	---

	## Base URL

	When deployed on Hugging Face Spaces, the base URL will be:
	```
	https://aetheris-ai-aibom-generator.hf.space
	```

	Replace this with your actual deployment URL.

	---

	## API Endpoints

	### API Status

	Purpose: Check if the API is operational and get version information.

	Endpoint: `/status`

	Method: GET

	cURL Example:
	```bash
	curl -X GET "https://aetheris-ai-aibom-generator.hf.space/status"
	```

	Expected Response:
	```json
	{
	"status": "operational",
	"version": "1.0.0",
	"generator_version": "1.0.0"
	}
	```

	---

	### Registry Status

	Purpose: Check the field registry configuration status and available fields.

	Endpoint: `/api/registry/status`

	Method: GET

	cURL Example:
	```bash
	curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/registry/status"
	```

	Expected Response:
	```json
	{
	"registry_available": true,
	"total_fields": 29,
	"categories": [
	"required_fields",
	"metadata",
	"component_basic",
	"component_model_card",
	"external_references"
	],
	"field_count_by_category": {
	"required_fields": 4,
	"metadata": 5,
	"component_basic": 5,
	"component_model_card": 14,
	"external_references": 1
	},
	"registry_manager_loaded": true
	}
	```

	---

	### Generate AI SBOM

	Purpose: Generate an AI SBOM for a specified Hugging Face model.

	Endpoint: `/api/generate`

	Method: POST

	Parameters:
	- `model_id` (required): The Hugging Face model ID (e.g., 'meta-llama/Llama-2-7b-chat-hf')
	- `include_inference` (optional): Whether to use AI inference to enhance the AI SBOM (default: true)
	- `use_best_practices` (optional): Whether to use industry best practices for scoring (default: true)
	- `hf_token` (optional): Hugging Face API token for accessing private models

	cURL Example:
	```bash
	curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate" \
	-H "Content-Type: application/json" \
	-d '{
	"model_id": "deepseek-ai/DeepSeek-R1",
	"include_inference": true,
	"use_best_practices": true
	}'
	```

	Expected Response: JSON containing the generated AI SBOM, model ID, timestamp, and download URL.
	```json
	{
	"aibom": {
	"bomFormat": "CycloneDX",
	"specVersion": "1.6",
	"serialNumber": "urn:uuid:deepseek-ai-DeepSeek-R1",
	"version": "1.0.0",
	"metadata": {
	"timestamp": "2025-07-15T18:31:18Z",
	"tools": [
	{
	"vendor": "Aetheris AI",
	"name": "AI SBOM Generator",
	"version": "1.0.0"
	}
	],
	"properties": [
	{
	"name": "primaryPurpose",
	"value": "text-generation"
	},
	{
	"name": "suppliedBy",
	"value": "deepseek-ai"
	}
	]
	},
	"components": [
	{
	"type": "machine-learning-model",
	"name": "DeepSeek-R1",
	"purl": "pkg:huggingface/deepseek-ai/DeepSeek-R1",
	"description": "Advanced reasoning model with enhanced capabilities",
	"licenses": [
	{
	"license": {
	"name": "DeepSeek License"
	}
	}
	],
	"modelCard": {
	"limitation": "Model may have limitations in certain domains"
	}
	}
	],
	"externalReferences": [
	{
	"type": "distribution",
	"url": "https://huggingface.co/deepseek-ai/DeepSeek-R1"
	}
	]
	},
	"model_id": "deepseek-ai/DeepSeek-R1",
	"generated_at": "2025-07-15T18:31:18Z",
	"request_id": "550e8400-e29b-41d4-a716-446655440000",
	"download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json"
	}
	```

	---

	### Generate AI SBOM with Completeness Score Report

	Purpose: Generate an AI SBOM along with a detailed completeness score report.

	Endpoint: `/api/generate-with-report`

	Method: POST

	Parameters: Same as Generate AI SBOM

	cURL Example:
	```bash
	curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate-with-report" \
	-H "Content-Type: application/json" \
	-d '{
	"model_id": "deepseek-ai/DeepSeek-R1",
	"include_inference": true,
	"use_best_practices": true
	}'
	```

	Expected Response: Same as Generate AI SBOM plus completeness score details.
	```json
	{
	"aibom": { ... },
	"model_id": "deepseek-ai/DeepSeek-R1",
	"generated_at": "2025-07-15T18:31:18Z",
	"request_id": "550e8400-e29b-41d4-a716-446655440000",
	"download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json",
	"completeness_score": {
	"total_score": 62.3,
	"section_scores": {
	"required_fields": 20.0,
	"metadata": 8.0,
	"component_basic": 20.0,
	"component_model_card": 4.3,
	"external_references": 10.0
	},
	"max_scores": {
	"required_fields": 20,
	"metadata": 20,
	"component_basic": 20,
	"component_model_card": 30,
	"external_references": 10
	},
	"field_checklist": {
	"bomFormat": "present",
	"specVersion": "present",
	"serialNumber": "present",
	"version": "present",
	"primaryPurpose": "present",
	"suppliedBy": "present",
	"standardCompliance": "missing",
	"domain": "missing",
	"autonomyType": "missing",
	"name": "present",
	"type": "present",
	"purl": "present",
	"description": "present",
	"licenses": "present",
	"energyConsumption": "missing",
	"hyperparameter": "missing",
	"limitation": "present",
	"safetyRiskAssessment": "missing",
	"typeOfModel": "present",
	"modelExplainability": "missing",
	"energyQuantity": "missing",
	"energyUnit": "missing",
	"informationAboutTraining": "missing",
	"informationAboutApplication": "missing",
	"metric": "missing",
	"metricDecisionThreshold": "missing",
	"modelDataPreprocessing": "missing",
	"useSensitivePersonalInformation": "missing",
	"downloadLocation": "present"
	},
	"category_details": {
	"required_fields": {
	"present_fields": 4,
	"total_fields": 4,
	"percentage": 100.0
	},
	"metadata": {
	"present_fields": 2,
	"total_fields": 5,
	"percentage": 40.0
	},
	"component_basic": {
	"present_fields": 5,
	"total_fields": 5,
	"percentage": 100.0
	},
	"component_model_card": {
	"present_fields": 2,
	"total_fields": 14,
	"percentage": 14.3
	},
	"external_references": {
	"present_fields": 1,
	"total_fields": 1,
	"percentage": 100.0
	}
	}
	}
	}
	```

	---

	### Get Completeness Score Only

	Purpose: Get the completeness score for a model without generating a full AI SBOM.

	Endpoint: `/api/models/{model_id}/score`

	Method: GET

	Parameters:
	- `model_id` (path parameter): The Hugging Face model ID
	- `use_best_practices` (query parameter, optional): Whether to use industry best practices for scoring (default: true)
	- `hf_token` (query parameter, optional): Hugging Face API token for accessing private models

	cURL Example:
	```bash
	curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/models/deepseek-ai/DeepSeek-R1/score?use_best_practices=true"
	```

	Expected Response:
	```json
	{
	"model_id": "deepseek-ai/DeepSeek-R1",
	"total_score": 62.3,
	"section_scores": {
	"required_fields": 20.0,
	"metadata": 8.0,
	"component_basic": 20.0,
	"component_model_card": 4.3,
	"external_references": 10.0
	},
	"max_scores": {
	"required_fields": 20,
	"metadata": 20,
	"component_basic": 20,
	"component_model_card": 30,
	"external_references": 10
	},
	"field_checklist": {
	"bomFormat": "present",
	"specVersion": "present",
	"name": "present",
	"downloadLocation": "present"
	},
	"generated_at": "2025-07-15T18:31:18Z",
	"request_id": "550e8400-e29b-41d4-a716-446655440000"
	}
	```

	---

	### Download Generated AI SBOM

	Purpose: Download a previously generated AI SBOM file.

	Endpoint: `/output/{filename}`

	Method: GET

	cURL Example:
	```bash
	curl -X GET "https://aetheris-ai-aibom-generator.hf.space/output/deepseek-ai_DeepSeek-R1_ai_sbom.json" \
	-o "deepseek_r1_aibom.json"
	```

	---

	### Form-Based Generation (Web UI)

	Purpose: Generate AI SBOM through the web interface form submission.

	Endpoint: `/generate`

	Method: POST

	Content-Type: `application/x-www-form-urlencoded`

	Parameters:
	- `model_id` (required): The Hugging Face model ID
	- `g-recaptcha-response` (required): reCAPTCHA response token

	cURL Example:
	```bash
	curl -X POST "https://aetheris-ai-aibom-generator.hf.space/generate" \
	-H "Content-Type: application/x-www-form-urlencoded" \
	-d "model_id=deepseek-ai/DeepSeek-R1&g-recaptcha-response=YOUR_RECAPTCHA_TOKEN"
	```

	---

	## Web UI

	The API also provides a user-friendly web interface accessible at the base URL. The web UI includes:

	- Model ID input field with validation
	- reCAPTCHA protection against automated abuse
	- Real-time generation with progress indicators
	- Downloadable results with completeness scoring
	- Field checklist visualization showing extraction results
	- Category-based scoring breakdown

	---

	## Security Features

	### Rate Limiting
	- 10 requests per minute per IP address
	- 5 concurrent requests maximum
	- 1MB request size limit

	### reCAPTCHA Protection
	- Google reCAPTCHA v2 integration for web UI
	- Automated bot detection and prevention
	- Configurable through environment variables

	### Input Validation
	- Model ID format validation (alphanumeric, hyphens, underscores, forward slashes)
	- XSS protection through HTML escaping
	- SQL injection prevention through parameterized queries

	---

	## Field Registry System

	The AI SBOM Generator uses a configurable field registry system that enables:

	### 29 Configurable Fields across 5 categories:
	- Required Fields (4): bomFormat, specVersion, serialNumber, version
	- Metadata (5): primaryPurpose, suppliedBy, standardCompliance, domain, autonomyType
	- Component Basic (5): name, type, purl, description, licenses
	- Component Model Card (14): energyConsumption, hyperparameter, limitation, safetyRiskAssessment, typeOfModel, modelExplainability, energyQuantity, energyUnit, informationAboutTraining, informationAboutApplication, metric, metricDecisionThreshold, modelDataPreprocessing, useSensitivePersonalInformation
	- External References (1): downloadLocation

	### Multi-Strategy Extraction:
	1. HuggingFace API → Direct metadata extraction (High confidence)
	2. Model Card → Structured documentation parsing (Medium-high confidence)
	3. Config Files → Technical details from JSON files (High confidence)
	4. Text Patterns → Regex extraction from README (Medium confidence)
	5. Intelligent Inference → Smart defaults from context (Medium confidence)
	6. Fallback Values → Placeholders when no data available (Low confidence)

	### SPDX 3.0 Compatibility:
	- 100% field coverage with SPDX 3.0 AI Profile specification
	- 59% exact field name matches with official SPDX 3.0 fields
	- Future dual-format support for both CycloneDX and SPDX output
	- Current limitation does not generate output in SPDX format

	---

	## Completeness Score

	The completeness score is calculated using a weighted scoring system across five categories:

	### Scoring Categories:
	- Required Fields (20%): Essential CycloneDX infrastructure
	- Metadata (20%): AI-specific metadata and provenance
	- Component Basic (20%): Core component identification
	- Component Model Card (30%): Advanced AI model documentation
	- External References (10%): Distribution and reference links

	### Field Tiers:
	- Critical (C): Essential fields with 3x weight multiplier
	- Important (I): Valuable fields with 2x weight multiplier
	- Supplementary (S): Additional fields with 1x weight multiplier

	### Score Interpretation:
	- 90-100: Exceptional documentation quality
	- 80-89: Comprehensive documentation
	- 70-79: Good documentation with minor gaps
	- 60-69: Adequate documentation with some missing elements
	- 50-59: Basic documentation with significant gaps
	- Below 50: Insufficient documentation

	### Confidence-Based Filtering:
	- Only fields extracted with medium or high confidence contribute to the score
	- Low or none confidence extractions are excluded to ensure score reliability
	- Individual field failures don't prevent overall SBOM generation

	---

	## Notes on Using the API

	### Model ID Format
	- Use the exact Hugging Face model identifier (e.g., `meta-llama/Llama-2-7b-chat-hf`)
	- Model IDs are case-sensitive
	- Private models require a valid `hf_token`

	### Response Times
	- Simple models: 5-15 seconds
	- Complex models with inference: 30-60 seconds
	- Large models: Up to 2 minutes

	### File Storage
	- Generated AI SBOMs are stored temporarily (7 days)
	- Download URLs are valid for the file retention period
	- Files are automatically cleaned up to manage storage

	### Best Practices
	- Use `use_best_practices=true` for industry-standard scoring
	- Include `include_inference=true` for enhanced field extraction
	- Cache results locally to avoid repeated API calls for the same model
	- Use the registry status endpoint to verify system configuration

	---

	## Error Handling

	### Common HTTP Status Codes:
	- 200 OK: Successful request
	- 400 Bad Request: Invalid model ID format or missing parameters
	- 404 Not Found: Model not found on Hugging Face
	- 429 Too Many Requests: Rate limit exceeded
	- 500 Internal Server Error: Server-side processing error

	### Error Response Format:
	```json
	{
	"detail": "Error description",
	"error_code": "SPECIFIC_ERROR_CODE",
	"timestamp": "2025-07-15T18:31:18Z"
	}
	```

	### Common Error Scenarios:
	- Invalid Model ID: Check format and existence on Hugging Face
	- Private Model Access: Ensure valid `hf_token` is provided
	- Rate Limiting: Wait before retrying or implement exponential backoff
	- Registry Unavailable: System falls back to basic field extraction

	---

	## Support and Documentation

	For additional support, documentation updates, or feature requests:
	- GitHub Repository: [[Link to GitHub Isuses](https://github.com/aetheris-ai/aibom-generator/issues)]
	- API Status Page: Use `/status` and `/api/registry/status` endpoints
	- Web Interface: Available at the base URL for interactive testing

	This API provides comprehensive AI SBOM generation capabilities with industry-leading field coverage, standards compliance, and configurable scoring systems.