Update README.md
README.md
---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- infrastructure-as-code
- terraform
- kubernetes
- docker
- devops
- iac
- dapo
- reinforcement-learning
- fine-tuned
base_model: srallabandi0225/inframind-0.5b-grpo
datasets:
- custom
model-index:
- name: inframind-dapo
  results:
  - task:
      type: text-generation
      name: IaC Generation
    dataset:
      name: InfraMind-Bench
      type: custom
    metrics:
    - type: accuracy
      value: 96.4
      name: DAPO Accuracy
---

# InfraMind-DAPO: Infrastructure-as-Code Model with Direct Advantage Policy Optimization

**InfraMind-DAPO** is a 0.5B-parameter language model fine-tuned for Infrastructure-as-Code (IaC) generation using **DAPO (Direct Advantage Policy Optimization)**, an advanced reinforcement learning technique that builds upon GRPO.

## Model Description

| Attribute | Value |
|-----------|-------|
| **Base Model** | [inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo) |
| **Original Base** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | 500M |
| **Training Method** | DAPO (Direct Advantage Policy Optimization) |
| **Domain** | Infrastructure-as-Code |
| **License** | MIT |

### Training Pipeline

```
Qwen2.5-0.5B-Instruct → GRPO Training → inframind-grpo → DAPO Training → inframind-dapo
                          (Stage 1)                        (Stage 2 - this model)
```

This model is the **second stage** of InfraMind training: it starts from the GRPO-trained checkpoint and applies the DAPO innovations described below.

## What is DAPO?

**Direct Advantage Policy Optimization (DAPO)** is an advanced RL algorithm that improves upon GRPO with four key innovations:

| Innovation | Description | Benefit |
|------------|-------------|---------|
| **Clip-Higher** | Asymmetric clipping (ε_low=0.2, ε_high=0.28) | Allows high-advantage tokens to be reinforced more strongly |
| **Dynamic Sampling** | Skip batches with uniform rewards | Prevents entropy collapse, maintains exploration |
| **Token-Level Loss** | Per-token policy gradient | Finer-grained credit assignment |
| **Overlong Punishment** | Soft length penalty | Prevents verbose, repetitive outputs |
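To make **Clip-Higher** and **Token-Level Loss** concrete, here is a minimal sketch of the clipped per-token surrogate objective. It is an illustration under assumptions (function name, tensor layout, no padding mask), not the training code used for this model:

```python
import torch

def dapo_token_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level policy loss with asymmetric (Clip-Higher) clipping.

    logp_new / logp_old: per-token log-probs of the sampled tokens under the
    current and behavior policies; advantages: per-token advantage estimates
    (in GRPO/DAPO, group-normalized rewards broadcast over each sequence).
    Padding masks are omitted for brevity.
    """
    ratio = torch.exp(logp_new - logp_old)  # importance ratio per token
    unclipped = ratio * advantages
    # Asymmetric clip: the upper bound (1 + eps_high) is looser than the lower
    # bound (1 - eps_low), so high-advantage tokens can be pushed up harder.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    # Averaging over tokens (not sequences) gives every token equal weight in
    # the gradient: the "token-level loss" innovation.
    return -torch.minimum(unclipped, clipped).mean()
```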

### Why DAPO After GRPO?

| Stage | Method | Purpose |
|-------|--------|---------|
| Stage 1 | GRPO | Establish IaC generation capability from base model |
| Stage 2 | DAPO | Refine with advanced techniques for quality improvement |
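The other two DAPO innovations (Dynamic Sampling and Overlong Punishment) also reduce to small helpers. This is an illustrative sketch only; the length budget and window are assumed values, not the trained configuration:

```python
def has_signal(group_rewards, tol=1e-6):
    """Dynamic sampling: keep a prompt group only if its rewards differ.

    If every sampled completion earns the same reward, the group-normalized
    advantages are all zero, so the batch is skipped and resampled.
    """
    return max(group_rewards) - min(group_rewards) > tol

def overlong_penalty(length, max_len=512, soft_window=128):
    """Soft overlong punishment: no penalty inside the budget, then a linear
    ramp down to -1 as the completion length approaches max_len."""
    budget = max_len - soft_window
    if length <= budget:
        return 0.0
    return -min((length - budget) / soft_window, 1.0)
```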

## Evaluation Results

| Model | Training Method | Accuracy | Pass Threshold |
|-------|-----------------|----------|----------------|
| **inframind-grpo** | GRPO | **97.3%** | 0.6 |
| **inframind-dapo** | DAPO | **96.4%** | 0.6 |
| Base (Qwen2.5-0.5B) | None | ~30% | 0.6 |

Evaluated on **InfraMind-Bench** (110 held-out test samples) across:
- Terraform (AWS, GCP, Azure)
- Kubernetes (Deployments, Services, Ingress)
- Docker (Dockerfile, docker-compose)
- CI/CD (GitHub Actions, GitLab CI)

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load DAPO model
model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")

# Generate Terraform
prompt = """### Instruction:
Create Terraform for AWS EC2 instance
### Input:
t3.micro instance type
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Example Output

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

## Supported IaC Categories

| Category | Examples | Coverage |
|----------|----------|----------|
| **Terraform** | EC2, S3, VPC, RDS, EKS, Lambda, IAM | AWS, GCP, Azure |
| **Kubernetes** | Deployment, Service, Ingress, ConfigMap, RBAC | All K8s resources |
| **Docker** | Dockerfile, docker-compose | Multi-stage builds |
| **CI/CD** | GitHub Actions, GitLab CI, Jenkins | Workflows, pipelines |
| **Ansible** | Playbooks, roles | Server configuration |
| **Helm** | Charts, values.yaml | K8s package management |

## Training Details

### DAPO Configuration

```yaml
Training:
  epochs: 2
  batch_size: 16              # effective
  learning_rate: 5e-6
  beta_kl: 0.0                # pure DAPO - no KL penalty
  generations_per_prompt: 8

DAPO Innovations:
  clip_higher:
    epsilon_low: 0.2
    epsilon_high: 0.28
  dynamic_sampling: true
  token_level_loss: true
  overlong_punishment:
    enabled: true
    soft_penalty: true

LoRA:
  r: 16
  alpha: 32
  target_modules: [q_proj, k_proj, v_proj, o_proj]
```

### Reward Function

Domain-specific reward for IaC quality:

```
Reward = α × Syntax + β × Correctness + γ × Format

Where:
- Syntax (α=0.4): Valid resource declarations
- Correctness (β=0.3): Correct resource types
- Format (γ=0.3): Proper structure
```
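In code, the weighted sum is a one-liner. The sketch below is illustrative: the component scorers are assumed to return values in [0, 1] and are not shown here:

```python
def iac_reward(syntax_score, correctness_score, format_score,
               alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted IaC reward; with component scores in [0, 1], the total is
    also in [0, 1]."""
    return alpha * syntax_score + beta * correctness_score + gamma * format_score

# Example: perfect syntax and format but the wrong resource type
r = iac_reward(1.0, 0.0, 1.0)  # 0.4 + 0.0 + 0.3 = 0.7
print(r >= 0.6)                # True: clears the 0.6 pass threshold above
```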

## GRPO vs DAPO Comparison

| Aspect | GRPO | DAPO |
|--------|------|------|
| KL Penalty | β=0.04 | β=0.0 (none) |
| Clipping | Symmetric | Asymmetric (Clip-Higher) |
| Loss Granularity | Sequence-level | Token-level |
| Sampling | All batches | Dynamic (skips uniform-reward batches) |
| Length Control | None | Overlong punishment |

## Hardware Requirements

| Deployment | Memory | GPU |
|------------|--------|-----|
| Training | 16GB+ | A100/A10G |
| Inference | 2GB | Optional |
| Edge (Raspberry Pi 5) | 4GB | None |

The 0.5B model is small enough to run on edge devices, making it suitable for:
- Air-gapped environments
- Local development
- CI/CD pipelines
- IoT/Edge infrastructure
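A possible CPU-only load for such environments (the reduced-precision dtype is an assumption to save memory; the Quick Start above uses library defaults):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# bfloat16 roughly halves memory versus float32 where PyTorch supports it on
# CPU; otherwise drop the torch_dtype argument. Generation on CPU works but
# is slower than on a GPU.
model = AutoModelForCausalLM.from_pretrained(
    "srallabandi0225/inframind-0.5b-dapo",
    torch_dtype=torch.bfloat16,
).to("cpu")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
```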

## Limitations

- **IaC-specific**: Optimized for infrastructure tasks, not general conversation
- **English only**: Training data is in English
- **No execution**: Generates code; it does not execute or validate anything against real infrastructure
- **Version-sensitive**: Generated code may use older API versions
- **Security**: Always review generated code for security best practices

### Out-of-Scope Uses

- Legal or medical advice
- General-purpose chatbot
- Executing infrastructure changes without human review
- Production deployment without validation

## Intended Use

### Primary Use Cases

- Generating Terraform configurations
- Creating Kubernetes manifests
- Writing Dockerfiles and docker-compose files
- Building CI/CD pipelines
- Infrastructure automation scripting

### Users

- DevOps engineers
- Platform engineers
- SREs
- Cloud architects
- Infrastructure developers

## Training Data

**InfraMind-Bench**: 2000+ IaC tasks in Alpaca format; the 110-sample evaluation set reported above is held out from this set. A sample record follows the table.

| Category | Tasks |
|----------|-------|
| Terraform | 500+ |
| Kubernetes | 400+ |
| Docker | 300+ |
| CI/CD | 300+ |
| Ansible | 200+ |
| Helm | 150+ |
| Monitoring | 150+ |
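For reference, each Alpaca-format record pairs an instruction and an optional input with a target completion. The values in this sample are adapted from the Quick Start example above, not copied from the dataset:

```python
# One record in the Alpaca-style schema (illustrative values)
example = {
    "instruction": "Create Terraform for AWS EC2 instance",
    "input": "t3.micro instance type",  # optional; may be empty
    "output": """resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}""",
}
```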

## Ethical Considerations

- The model may generate insecure configurations if not prompted for security
- Generated infrastructure code should always be reviewed before deployment
- The model does not have access to real infrastructure or credentials
- Users are responsible for validating generated code against their security policies

## Citation

```bibtex
@misc{rallabandi2024inframind,
  title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
  author={Rallabandi, Sai Kiran},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/srallabandi0225/inframind-0.5b-dapo}
}
```

## Links

- **GitHub**: [github.com/saikiranrallabandi/inframind](https://github.com/saikiranrallabandi/inframind)
- **GRPO Model**: [srallabandi0225/inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo)
- **DAPO Model**: [srallabandi0225/inframind-0.5b-dapo](https://huggingface.co/srallabandi0225/inframind-0.5b-dapo)

## Acknowledgments

- [Qwen Team](https://github.com/QwenLM/Qwen) for the base model
- [DeepSeek](https://github.com/deepseek-ai) for GRPO
- [NVIDIA NeMo](https://docs.nvidia.com/nemo) for the DAPO reference
- [TRL](https://github.com/huggingface/trl) for the training infrastructure

## Model Card Contact

**Author**: Sai Kiran Rallabandi

**GitHub**: [@saikiranrallabandi](https://github.com/saikiranrallabandi)