Instructions for using Salesforce/CoDA-v0-Instruct with libraries and local apps.

Libraries

How to use Salesforce/CoDA-v0-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/CoDA-v0-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Salesforce/CoDA-v0-Instruct", trust_remote_code=True, dtype="auto")
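`AutoModel.from_pretrained` only loads the model. It does not generate anything. CoDA is a diffusion language model, so a completion starts from masked positions and tokens are committed over several denoising steps instead of strictly left to right. The toy sketch below illustrates one confidence rule in the spirit of the `alg="entropy"` setting used with this model: at each step, the masked positions whose predicted distribution has the lowest entropy are committed first. This is a self-contained illustration under stated assumptions and is not CoDA's implementation. `toy_predict` is a hypothetical stand-in for the network, decoding is greedy, and spreading the masked positions evenly over the remaining steps is an assumed schedule.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Dist = Dict[str, float]  # token -> probability at one position


def entropy(dist: Dist) -> float:
    """Shannon entropy (in nats) of one position's token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def entropy_unmask(
    length: int,
    steps: int,
    predict: Callable[[List[Optional[str]]], List[Dist]],
) -> Tuple[List[str], List[List[int]]]:
    """Toy progressive denoising with entropy-ordered (confidence-guided) unmasking.

    None marks a still-masked position. Returns the decoded tokens and the
    positions committed at each step.
    """
    seq: List[Optional[str]] = [None] * length
    order: List[List[int]] = []
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if not masked:
            break
        # Assumed schedule: spread remaining masked positions over remaining steps,
        # so the final step always commits everything that is left.
        k = math.ceil(len(masked) / (steps - step))
        dists = predict(seq)  # one distribution per position
        # Lowest entropy = most confident; sorted() is stable, so ties keep index order.
        chosen = sorted(masked, key=lambda i: entropy(dists[i]))[:k]
        for i in chosen:
            seq[i] = max(dists[i], key=dists[i].get)  # greedy token choice
        order.append(chosen)
    assert all(tok is not None for tok in seq)
    return [tok for tok in seq if tok is not None], order


def toy_predict(seq: List[Optional[str]]) -> List[Dist]:
    """Hypothetical stand-in for the model: fixed per-position distributions."""
    table: List[Dist] = [
        {"def": 0.97, "class": 0.03},
        {"fib": 0.6, "fibo": 0.4},
        {"(": 0.99, "[": 0.01},
        {"n": 0.9, "x": 0.1},
        {")": 0.99, "]": 0.01},
        {":": 0.98, ";": 0.02},
    ]
    return table[: len(seq)]


tokens, order = entropy_unmask(6, 3, toy_predict)
print(tokens)  # ['def', 'fib', '(', 'n', ')', ':']
print(order)   # [[2, 4], [5, 0], [3, 1]]
```

The near-certain brackets are committed first. The uncertain name position (`fib` vs. `fibo`) is committed last. A real model would re-predict at every step and condition on the tokens already committed. The fixed toy table ignores that.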


vLLM

How to use Salesforce/CoDA-v0-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Salesforce/CoDA-v0-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
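The curl call above can also be made from Python. Below is a minimal standard-library sketch. It assumes the server is reachable at `http://localhost:8000` and accepts the OpenAI chat-completions body shown in the curl example. `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names. Extra fields such as `max_tokens` are passed through unchanged for the server to interpret.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_chat_request(model: str, messages: List[Dict[str, str]], **params: Any) -> Dict[str, Any]:
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    payload.update(params)  # e.g. max_tokens, temperature, top_p
    return payload


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from a non-streaming chat-completions response."""
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, payload: Dict[str, Any], timeout: float = 60.0) -> str:
    """POST the payload to {base_url}/v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

For example: `print(chat("http://localhost:8000", build_chat_request("Salesforce/CoDA-v0-Instruct", [{"role": "user", "content": "What is the capital of France?"}], max_tokens=64)))`. Payload construction is kept separate from the network call, so it can be checked without a running server. For the SGLang server, only the port changes (30000).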

Use Docker

docker model run hf.co/Salesforce/CoDA-v0-Instruct

SGLang

How to use Salesforce/CoDA-v0-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/CoDA-v0-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
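Both servers can also stream tokens if `"stream": true` is added to the request body. This assumes the standard OpenAI streaming format. In that format, the response arrives as server-sent-event lines of the form `data: {...}`, each carrying the next text fragment in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The parser below (`iter_stream_text` is a hypothetical name) turns those lines into text fragments. It is written for the usual single-choice request.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role; content may be absent or null.
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

With `urllib.request.urlopen`, the response can be fed in line by line as `(line.decode("utf-8") for line in resp)`, and each fragment can be printed as it arrives.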

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/CoDA-v0-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Salesforce/CoDA-v0-Instruct with Docker Model Runner:
```
docker model run hf.co/Salesforce/CoDA-v0-Instruct
```

Update README.md

by weiranyao - opened Oct 3, 2025

base: refs/heads/main

←

from: refs/pr/1


README.md +159 -64

README.md CHANGED

@@ -4,90 +4,185 @@ language:
 - en
 pipeline_tag: text-generation
 tags:
-- diffusion
-- text generation
 - code generation
 ---
-# CoDA-v0-Instruct
-## Overview 🎯
-CoDA is Salesforce AI Research's open diffusion language model.
-[Technical Report](https://github.com/SalesforceAIResearch/CoDA/blob/main/technical_report.pdf)
-[Code](https://github.com/SalesforceAIResearch/CoDA/)
-The code repo contains a unified training pipeline from pre-training to post-training, evaluation harnesses, and a simple Fast-API based serving backend.
-## Requirements 📦
-```
-torch==2.8.0
-transformers>=4.47.1
-flash-attn==2.8.3
-```
-## Quickstart 🚀
-Here is a code snippet for loading the model and tokenizer and running generation.
 ```python
-import torch
-from transformers import AutoModel, AutoTokenizer
 model_name = "Salesforce/CoDA-v0-Instruct"
-device = "cuda"
-model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model.eval()
-prompt = "Write a python function to find the Fibonacci sequence up to n numbers."
-messages = [
-    {"role": "user", "content": prompt}
-]
-text = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True
-)
-input_ids = tokenizer([text], return_tensors="pt").input_ids.to(model.device)
-generated_ids = model.diffusion_generate(
-    inputs=input_ids,
-    max_new_tokens=256,
-    steps=256,
-    top_p=0.9,
-    temperature=0.2,
-    alg="entropy",
-    alg_temp=0.2,
 )
-generated_ids = [
-    output_ids[len(input_ids):] for input_ids, output_ids in zip(input_ids, generated_ids)
-]
-response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
-## Benchmark 📊
-Comparison of code-generation performance across standard and plus-enhanced benchmarks. Evalplus is computed as the mean pass@1 on enhanced variants. Bold marks results where CoDA produces the strongest diffusion-model performance.
-| Model | Humaneval Instruct | Humaneval Plus | MBPP Instruct | MBPP Plus | Evalplus |
-| --- | --- | --- | --- | --- | --- |
-| CoDA-Base | 29.3 | 23.8 | 35.2 | 46.0 | 34.9 |
-| CoDA-Instruct | 54.3 | 47.6 | 47.2 | **63.2** | **55.4** |
-| Dream-Base | 56.7 | 50.0 | 68.7 | 57.4 | 53.7 |
-| Dream-7B-Instruct | 57.9 | 53.7 | 68.3 | 56.1 | 54.9 |
-| LLaDA-8B-Instruct | 35.4 | 31.7 | 31.5 | 28.6 | 30.2 |
-| Qwen3-1.7B | 66.5 | 61.6 | 46.2 | 65.9 | 63.8 |
-| Qwen2.5-Coder-1.5B | 43.9 | 36.6 | 69.2 | 58.6 | 47.6 |
-| Qwen2.5-Coder-1.5B-Instruct | 70.7 | 66.5 | 69.2 | 59.4 | 62.3 |
-| Gemma-3-1B-it | 39.6 | 35.4 | 39.4 | 63.5 | 49.5 |
-| LLaMA-3.2-1B-Instruct | 35.4 | 31.1 | 24.4 | 53.7 | 42.4 |
-## Deployment 🛠️
-Check out our [Deployment Guide](https://github.com/SalesforceAIResearch/CoDA?tab=readme-ov-file#deployment-guide-%EF%B8%8F)!
-## Citation 📚
 ```
-coming soon
 ```

 - en
 pipeline_tag: text-generation
 tags:
+- text diffusion model
+- language model
 - code generation
 ---
+# CoDA: Coding LM via Diffusion Adaptation
+**CoDA-1.7B** is a lightweight diffusion language model for code generation developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA leverages discrete diffusion processes to enable bidirectional context understanding and efficient code completion.
+- 📄 [Technical Report](https://github.com/SalesforceAIResearch/CoDA/blob/main/technical_report.pdf)
+- 💻 [Code Repository](https://github.com/SalesforceAIResearch/CoDA/)
+## 📊 Model Details
+- **Model Size**: 1.7B parameters
+- **Architecture**: Diffusion-based language model
+- **Training**: TPU-based pre-training with GPU fine-tuning
+- **Primary Use**: Code generation and completion tasks
+## ✨ Key Features
+- **Bidirectional Context**: Diffusion modeling enables understanding of both past and future tokens
+- **Confidence-Guided Sampling**: Keeps inference latency competitive by letting model confidence decide which tokens are decoded at each denoising step
+- **Lightweight Design**: At 1.7B parameters, outperforms the 7B and 8B diffusion models in the table below on MBPP+ and EvalPlus
+- **Open Training Pipeline**: Fully reproducible training from pre-training to fine-tuning
+## 📈 Performance
+CoDA-1.7B-Instruct demonstrates competitive performance on standard code generation benchmarks. Scores are pass@1; EvalPlus is the mean of HumanEval+ and MBPP+, and bold scores mark the best result among the diffusion models listed:
+| Model | HumanEval | HumanEval+ | MBPP | MBPP+ | EvalPlus |
+|-------|-----------|------------|------|-------|----------|
+| **CoDA-Base** | 29.3 | 23.8 | 35.2 | 46.0 | 34.9 |
+| **CoDA-Instruct** | 54.3 | 47.6 | 47.2 | **63.2** | **55.4** |
+| Dream-Base | 56.7 | 50.0 | 68.7 | 57.4 | 53.7 |
+| Dream-7B-Instruct | 57.9 | 53.7 | 68.3 | 56.1 | 54.9 |
+| LLaDA-8B-Instruct | 35.4 | 31.7 | 31.5 | 28.6 | 30.2 |
+**🎯 Key Finding**: CoDA-1.7B-Instruct leads the diffusion models listed on MBPP+ (63.2) and EvalPlus (55.4, vs. 54.9 for Dream-7B-Instruct) with roughly a quarter of Dream-7B's parameters, although it trails both Dream models on HumanEval and MBPP. CoDA offers an advantageous balance between inference speed and accuracy compared to larger diffusion models.
+## 🎓 Training Methodology
+CoDA's pipeline has three stages, two for training and one for inference:
+*Three-stage training: (1) Pre-training with bidirectional masking, (2) Post-training with instruction format, (3) Inference with progressive denoising.*
+## 🛠️ Usage
+### 🚀 Quick Start
 ```python
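+import torch
+from transformers import AutoModel, AutoTokenizer
 model_name = "Salesforce/CoDA-v0-Instruct"
+device = "cuda"
+# CoDA ships custom modeling code, so trust_remote_code=True is required
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device)
+model.eval()
+# Wrap the prompt in the chat template expected by the Instruct model
+prompt = "Write a Python function to calculate fibonacci numbers"
+messages = [{"role": "user", "content": prompt}]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+input_ids = tokenizer([text], return_tensors="pt").input_ids.to(device)
+# Generate code with the model's diffusion sampler rather than model.generate
+outputs = model.diffusion_generate(
+    inputs=input_ids,
+    max_new_tokens=256,
+    steps=128,
+    top_p=0.9,
+    temperature=0.2,
+    alg="entropy",
+    alg_temp=0.2,
 )
+# Drop the prompt tokens and decode only the completion
+print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))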
 ```
+### 🚀 Deployment
+For production deployment, we provide a FastAPI-based serving backend with OpenAI-compatible APIs:
+```bash
+# Clone the repository
+git clone https://github.com/SalesforceAIResearch/CoDA
+cd CoDA
+# Set up environment
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r serving/requirements.txt
+# Export your Hugging Face token
+export HF_TOKEN="hf_..."
+# Start the server
+bash serving/fast-api/start_server.sh
+```
+The server will listen on `http://localhost:8000`.
+### 💬 Interactive CLI
+```bash
+python serving/fast-api/chat_cli.py \
+  --base-url http://localhost:8000 \
+  --model Salesforce/CoDA-v0-Instruct \
+  --stream \
+  --show-meta
 ```
+### ⚙️ Generation Hyperparameters
+Customize generation behavior with environment variables:
+```bash
+export MAX_TOKENS=512    # Maximum tokens to generate
+export TEMPERATURE=0.7   # Sampling temperature
+export TOP_P=0.9         # Nucleus sampling threshold
+export STEPS=128         # Number of diffusion steps
+export ALG="entropy"     # Sampling algorithm
+export ALG_TEMP=0.1      # Algorithm temperature
+export BLOCK_LENGTH=32   # Block size for processing
 ```
+**Recommended Settings**:
+- **Fast inference**: `STEPS=64`, `TEMPERATURE=0.0`
+- **Quality generation**: `STEPS=128`, `TEMPERATURE=0.7`, `TOP_P=0.9`
+- **High quality**: `STEPS=256`, `TEMPERATURE=0.5`, `TOP_P=0.95`
+## 🔧 Training from Scratch
+The complete training pipeline is available in our [repository](https://github.com/SalesforceAIResearch/CoDA):
+```bash
+# Clone the repository
+git clone https://github.com/SalesforceAIResearch/CoDA
+cd CoDA
+```
+### 🧠 Pre-training on TPU
+```bash
+# Configure TPU environment
+cd pre-train
+cp env.example .env  # Add your TPU metadata
+bash setup_tpu.sh
+# Launch pre-training
+bash recipes/midtrain_v4_512.sh
+```
+### 🎯 Supervised Fine-tuning
+```bash
+# Set up fine-tuning environment
+cd post-train/LLaMA-Factory
+pip install -r requirements.txt
+# Configure dataset and run fine-tuning
+bash ../../run_sft.sh
+```
+### 📊 Evaluation
+```bash
+cd evaluation/lm_eval
+bash eval_mbpp_humaneval.sh
+```
+## 📚 Citation
+A citable version of the technical report is coming soon. For now, please cite:
+```bibtex
+@misc{coda2025,
+  title={CoDA: Coding LM via Diffusion Adaptation},
+  author={Chen, Haolin and Wang, Shiyu and Qin, Can and Pang, Bo and Liu, Zuxin and Qiu, Jielin and Zhang, Jianguo and Zhou, Yingbo and Chen, Zeyuan and Xu, Ran and Heinecke, Shelby and Savarese, Silvio and Xiong, Caiming and Wang, Huan and Yao, Weiran},
+  year={2025},
+  publisher={Salesforce AI Research}
+}
+```
+## 🔗 Resources
+- 📄 **Technical Report**: [technical_report.pdf](https://github.com/SalesforceAIResearch/CoDA/blob/main/technical_report.pdf)
+- 💻 **Code Repository**: [github.com/SalesforceAIResearch/CoDA](https://github.com/SalesforceAIResearch/CoDA)
+- 🤗 **Model Hub**: [Salesforce CoDA collection](https://huggingface.co/collections/Salesforce/coda-68d627d87921c0e28a69e340)
+## 🙏 Acknowledgements
+We thank Lingpeng Kong for insightful discussions and Jialei Chen for technical support with TPU infrastructure.
+---
+*🏢 Developed by Salesforce AI Research*
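The proposed README configures generation through environment variables and lists three recommended presets. The sketch below shows how a launcher could apply a preset and read the variables back with basic validation. It assumes that `serving/fast-api/start_server.sh` picks these variables up from its environment, as the README implies. `PRESETS`, `GenerationConfig`, `load_config`, and `with_preset` are hypothetical names. The fallback defaults are the example values from the README's `export` block.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping

# Hypothetical preset names mirroring the README's "Recommended Settings".
PRESETS: Dict[str, Dict[str, str]] = {
    "fast": {"STEPS": "64", "TEMPERATURE": "0.0"},
    "quality": {"STEPS": "128", "TEMPERATURE": "0.7", "TOP_P": "0.9"},
    "high_quality": {"STEPS": "256", "TEMPERATURE": "0.5", "TOP_P": "0.95"},
}


@dataclass(frozen=True)
class GenerationConfig:
    max_tokens: int
    temperature: float
    top_p: float
    steps: int
    alg: str
    alg_temp: float
    block_length: int


def load_config(env: Mapping[str, str] = os.environ) -> GenerationConfig:
    """Parse the README's variables; defaults are the README's example values."""
    cfg = GenerationConfig(
        max_tokens=int(env.get("MAX_TOKENS", "512")),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        top_p=float(env.get("TOP_P", "0.9")),
        steps=int(env.get("STEPS", "128")),
        alg=env.get("ALG", "entropy"),
        alg_temp=float(env.get("ALG_TEMP", "0.1")),
        block_length=int(env.get("BLOCK_LENGTH", "32")),
    )
    if cfg.steps <= 0 or cfg.max_tokens <= 0:
        raise ValueError("STEPS and MAX_TOKENS must be positive")
    if not 0.0 < cfg.top_p <= 1.0:
        raise ValueError("TOP_P must be in (0, 1]")
    return cfg


def with_preset(name: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a copy of env with the named preset's variables applied on top."""
    merged = dict(env)
    merged.update(PRESETS[name])
    return merged
```

A launcher could then start the server with a preset via `subprocess.run(["bash", "serving/fast-api/start_server.sh"], env=with_preset("quality"), check=True)`. Because `with_preset` copies the current environment first, `HF_TOKEN` and `PATH` are preserved.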