InfraMind-DAPO: Infrastructure-as-Code Model with Decoupled Clip and Dynamic sAmpling Policy Optimization
InfraMind-DAPO is a 0.5B parameter language model fine-tuned for Infrastructure-as-Code (IaC) generation using DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), a reinforcement learning technique that builds on GRPO.
Model Description
| Attribute | Value |
|---|---|
| Base Model | inframind-0.5b-grpo |
| Original Base | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Training Method | DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) |
| Domain | Infrastructure-as-Code |
| License | MIT |
Training Pipeline
Qwen2.5-0.5B-Instruct → GRPO Training (Stage 1) → inframind-grpo → DAPO Training (Stage 2, this model) → inframind-dapo
This model is the second stage of InfraMind training: it starts from the GRPO-trained checkpoint and continues training with DAPO.
What is DAPO?
Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) is a reinforcement learning algorithm that modifies GRPO in four ways:
| Innovation | Description | Benefit |
|---|---|---|
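| Clip-Higher | Asymmetric clipping (ε_low=0.2, ε_high=0.28) | Gives low-probability tokens with positive advantage more room to grow, which helps prevent entropy collapse and maintains exploration |
| Dynamic Sampling | Skip prompt groups whose sampled rewards are all identical | Avoids zero-advantage groups that contribute no gradient |
| Token-Level Loss | Loss averaged over all tokens in the batch rather than per sequence | Long responses contribute in proportion to their length |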
| Overlong Punishment | Soft length penalty | Prevents verbose, repetitive outputs |
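A minimal sketch of how Clip-Higher and the token-level loss fit together, in plain Python. Only the two ε values come from this card; the function names and toy numbers are illustrative assumptions, and this is not the training code used for this model.

```python
# Illustrative sketch only; not the training code used for this model.
EPS_LOW, EPS_HIGH = 0.2, 0.28  # clip bounds listed in the table above


def clipped_term(ratio: float, advantage: float) -> float:
    """PPO-style surrogate for one token with an asymmetric clip range."""
    clipped_ratio = min(max(ratio, 1.0 - EPS_LOW), 1.0 + EPS_HIGH)
    return min(ratio * advantage, clipped_ratio * advantage)


def token_level_loss(ratios, advantages):
    """Average the surrogate over every token in the batch.

    ratios: one list of per-token probability ratios per sampled response.
    advantages: one scalar advantage per response, shared by its tokens.
    """
    terms = [
        clipped_term(r, adv)
        for seq, adv in zip(ratios, advantages)
        for r in seq
    ]
    return -sum(terms) / len(terms)


# A token whose probability rose by 50% with a positive advantage is capped
# at 1.28 rather than at the 1.20 a symmetric clip of 0.2 would allow.
upside = clipped_term(1.5, 1.0)

# Two responses of different lengths: each of the three tokens weighs equally,
# so the longer response dominates the average.
loss = token_level_loss([[1.0, 1.0], [1.0]], [1.0, -1.0])
```

Averaging per sequence first would weight the two responses equally and give a loss of 0 for the same inputs; the token-level average gives -1/3.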
Why DAPO After GRPO?
| Stage | Method | Purpose |
|---|---|---|
| Stage 1 | GRPO | Establish IaC generation capability from base model |
| Stage 2 | DAPO | Refine the Stage 1 policy with Clip-Higher, dynamic sampling, token-level loss, and overlong punishment |
Evaluation Results
| Model | Training Method | Accuracy | Pass Threshold |
|---|---|---|---|
| inframind-grpo | GRPO | 97.3% | 0.6 |
| inframind-dapo | DAPO | 96.4% | 0.6 |
| Base (Qwen2.5-0.5B) | None | ~30% | 0.6 |
Evaluated on InfraMind-Bench (110 held-out test samples) across:
- Terraform (AWS, GCP, Azure)
- Kubernetes (Deployments, Services, Ingress)
- Docker (Dockerfile, docker-compose)
- CI/CD (GitHub Actions, GitLab CI)
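With 110 test samples, the two reported accuracies are about one sample apart. The sketch below assumes accuracy is the fraction of samples whose score meets the 0.6 pass threshold; the pass counts 107 and 106 are inferred from the rounded percentages and are not reported in this card.

```python
TEST_SAMPLES = 110  # size of the held-out set stated above


def accuracy_percent(passed: int, total: int = TEST_SAMPLES) -> float:
    """Percentage of samples that passed, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Inferred pass counts (assumption): they reproduce the reported figures.
grpo = accuracy_percent(107)  # 97.3
dapo = accuracy_percent(106)  # 96.4
```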
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load DAPO model
model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
# Generate Terraform
prompt = """### Instruction:
Create Terraform for AWS EC2 instance
### Input:
t3.micro instance type
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
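The Quick Start prints the prompt together with the generated text. Two small helpers, sketched here as an assumption about how one might wrap the prompt layout shown above, build the prompt from its parts and keep only the text after the response marker. The names `build_prompt` and `extract_response` are hypothetical and not part of any library.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Format a request using the same layout as the Quick Start prompt."""
    return (
        "### Instruction:\n"
        f"{instruction}\n"
        "### Input:\n"
        f"{context}\n"
        "### Response:\n"
    )


def extract_response(decoded: str) -> str:
    """Return only the text after the final '### Response:' marker."""
    return decoded.rsplit("### Response:", 1)[-1].strip()


prompt = build_prompt(
    "Create Terraform for AWS EC2 instance",
    "t3.micro instance type",
)
```

Passing the decoded string from the Quick Start through `extract_response` would drop the echoed instruction and input, provided the generated text does not itself contain the marker.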
Example Output
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  tags = {
    Name = "web-server"
  }
}
Supported IaC Categories
| Category | Examples | Coverage |
|---|---|---|
| Terraform | EC2, S3, VPC, RDS, EKS, Lambda, IAM | AWS, GCP, Azure |
| Kubernetes | Deployment, Service, Ingress, ConfigMap, RBAC | All K8s resources |
| Docker | Dockerfile, docker-compose | Multi-stage builds |
| CI/CD | GitHub Actions, GitLab CI, Jenkins | Workflows, pipelines |
| Ansible | Playbooks, roles | Server configuration |
| Helm | Charts, values.yaml | K8s package management |
Training Details
DAPO Configuration
Training:
  epochs: 2
  batch_size: 16  # effective
  learning_rate: 5e-6
  beta (KL): 0.0  # Pure DAPO - no KL penalty
  generations_per_prompt: 8
DAPO Innovations:
  clip_higher:
    epsilon_low: 0.2
    epsilon_high: 0.28
  dynamic_sampling: true
  token_level_loss: true
  overlong_punishment:
    enabled: true
    soft_penalty: true
LoRA:
  r: 16
  alpha: 32
  target_modules: [q_proj, k_proj, v_proj, o_proj]
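To illustrate what `dynamic_sampling: true` and `generations_per_prompt: 8` imply, here is a rough sketch of group filtering with group-relative advantages. It assumes the usual GRPO-style normalisation (reward minus group mean, divided by group standard deviation); the helper names and reward values are hypothetical.

```python
def keep_group(rewards, tol=1e-8):
    """True if the rewards for one prompt differ, so advantages are non-zero."""
    return max(rewards) - min(rewards) > tol


def group_advantages(rewards, eps=1e-8):
    """Normalise rewards within one prompt's group of generations."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Eight generations per prompt, as in the configuration above.
uniform = [1.0] * 8                                # every sample scored the same
mixed = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]  # two samples failed

kept = [group for group in (uniform, mixed) if keep_group(group)]
advantages = group_advantages(mixed)
```

The uniform group is dropped because every advantage in it would be zero; only the mixed group reaches the policy update.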
Reward Function
Domain-specific reward for IaC quality:
Reward = α × Syntax + β × Correctness + γ × Format
Where:
- Syntax (α=0.4): Valid resource declarations
- Correctness (β=0.3): Correct resource types
- Format (γ=0.3): Proper structure
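The weighted sum can be sketched as follows. The three component scores are assumed to lie in [0, 1]; how each is computed is not specified here, so they are passed in as plain numbers. Reusing the 0.6 pass threshold from the evaluation table as a cut-off on this reward is also an assumption.

```python
ALPHA, BETA, GAMMA = 0.4, 0.3, 0.3  # weights listed above
PASS_THRESHOLD = 0.6                # from the evaluation table (assumed to apply here)


def combined_reward(syntax: float, correctness: float, fmt: float) -> float:
    """Weighted sum of the three component scores, each assumed in [0, 1]."""
    return ALPHA * syntax + BETA * correctness + GAMMA * fmt


def passes(reward: float) -> bool:
    """True if the reward meets the assumed pass threshold."""
    return reward >= PASS_THRESHOLD


perfect = combined_reward(1.0, 1.0, 1.0)
partial = combined_reward(1.0, 0.5, 0.0)  # valid syntax, half-correct, bad format
```

Under these assumptions, valid syntax alone (0.4) is not enough to pass; at least part of the correctness or format score is needed as well.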
GRPO vs DAPO Comparison
| Aspect | GRPO | DAPO |
|---|---|---|
| KL Penalty | β=0.04 | β=0.0 (none) |
| Clipping | Symmetric | Asymmetric (Clip-Higher) |
| Loss Granularity | Sequence-level | Token-level |
| Sampling | All batches | Dynamic (skip uniform) |
| Length Control | None | Overlong punishment |
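The length control in the last row can take the form of a piecewise-linear soft penalty, sketched below. This card does not state the exact form or lengths used, so the function shape, `max_len`, and `buffer` values are assumptions chosen for illustration (512 matches only the Quick Start's `max_new_tokens`).

```python
def soft_overlong_penalty(length: int, max_len: int = 512, buffer: int = 128) -> float:
    """Length penalty added to the reward.

    No penalty up to (max_len - buffer) tokens, then a linear ramp down to -1
    at max_len, and a flat -1 for anything longer.
    """
    threshold = max_len - buffer
    if length <= threshold:
        return 0.0
    if length <= max_len:
        return (threshold - length) / buffer
    return -1.0


penalties = [soft_overlong_penalty(n) for n in (384, 448, 512, 600)]
```

The ramp means a response that slightly overshoots the threshold loses a little reward rather than being treated the same as one that hits the hard limit.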
Hardware Requirements
| Deployment | Memory | GPU |
|---|---|---|
| Training | 16GB+ | A100/A10G |
| Inference | 2GB | Optional |
| Edge (Raspberry Pi 5) | 4GB | None |
The 0.5B model is small enough to run on edge devices, making it suitable for:
- Air-gapped environments
- Local development
- CI/CD pipelines
- IoT/Edge infrastructure
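As a rough check on the inference figure, the weights alone can be sized from the parameter count. This back-of-the-envelope sketch uses the 500M figure from the Model Description table and ignores activations, the KV cache, and framework overhead.

```python
PARAMS = 500e6  # parameter count from the Model Description table


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


fp32_gb = weight_memory_gb(PARAMS, 4)  # 32-bit floats
fp16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit floats
```

About 2 GB at 32-bit precision and 1 GB at 16-bit, which is in line with the 2GB inference row above.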
Limitations
- IaC-specific: Optimized for infrastructure tasks, not general conversation
- English only: Training data is in English
- No execution: Generates code, does not execute or validate against real infrastructure
- Version-sensitive: Generated code may use older API versions
- Security: Always review generated code for security best practices
Out-of-Scope Uses
- Legal or medical advice
- General-purpose chatbot
- Executing infrastructure changes without human review
- Production deployment without validation
Intended Use
Primary Use Cases
- Generating Terraform configurations
- Creating Kubernetes manifests
- Writing Dockerfiles and docker-compose
- Building CI/CD pipelines
- Infrastructure automation scripting
Users
- DevOps engineers
- Platform engineers
- SREs
- Cloud architects
- Infrastructure developers
Training Data
InfraMind-Bench: 2000+ IaC tasks in Alpaca format
| Category | Tasks |
|---|---|
| Terraform | 500+ |
| Kubernetes | 400+ |
| Docker | 300+ |
| CI/CD | 300+ |
| Ansible | 200+ |
| Helm | 150+ |
| Monitoring | 150+ |
Ethical Considerations
- Model may generate insecure configurations if not prompted for security
- Generated infrastructure code should always be reviewed before deployment
- Model does not have access to real infrastructure or credentials
- Users are responsible for validating generated code against their security policies
Citation
@misc{rallabandi2024inframind,
  title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
  author={Rallabandi, Sai Kiran},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/srallabandi0225/inframind-0.5b-dapo}
}
Links
- GitHub: github.com/saikiranrallabandi/inframind
- GRPO Model: srallabandi0225/inframind-0.5b-grpo
- DAPO Model: srallabandi0225/inframind-0.5b-dapo
Acknowledgments
- Qwen Team for the base model
- DeepSeek for GRPO
- NVIDIA NeMo for DAPO reference
- TRL for training infrastructure
Model Card Contact
Author: Sai Kiran Rallabandi
GitHub: @saikiranrallabandi