SERA-32B
SERA-32B is the first model in Ai2's Open Coding Agents series. It is a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching open-weight models such as Devstral-Small-2 (24B) and the much larger GLM-4.5-Air (110B). SERA-32B was trained with Soft Verified Generation (SVG), a simple and efficient method that reaches equivalent performance at 26x lower cost than reinforcement learning and 57x lower cost than previous synthetic-data methods. The total cost of data generation and training is approximately $2,000 (40 GPU-days).
- Paper: https://allenai.org/papers/opencodingagents
- Code: https://github.com/allenai/SERA
- CLI: https://github.com/allenai/sera-cli (also on PyPI)
- Collection: https://huggingface.co/collections/allenai/open-coding-agents
- Dataset: https://huggingface.co/datasets/allenai/Sera-4.6-Lite-T2
Model Variants
| Model | HuggingFace | Base | Teacher | SWE-bench Verified |
|---|---|---|---|---|
| SERA-32B | allenai/SERA-32B | Qwen 3-32B | GLM-4.6 | 49.5% ± 1.9% |
| SERA-32B-GA | allenai/SERA-32B-GA | Qwen 3-32B | GLM-4.5-Air | 46.6% ± 0.7% |
| SERA-8B | allenai/SERA-8B | Qwen 3-8B | GLM-4.6 | 31.7% ± 0.9% |
| SERA-8B-GA | allenai/SERA-8B-GA | Qwen 3-8B | GLM-4.5-Air | 31.7% ± 0.4% |
All results evaluated at 32K context length. Standard deviations computed over 3 random seeds.
Performance
SWE-bench Verified (32K Context)
| Model | Type | Resolve Rate |
|---|---|---|
| SkyRL-8B | Open-source | 9.4% |
| Nex-N1-8B | Open-source | 20.3% |
| SERA-8B | Open-source | 31.7% |
| Qwen 3-32B (base) | Open-weight | 24.4% |
| SWE-smith | Open-source | 32.6% |
| SkyRL-Agent | Open-source | 39.4% |
| DeepSWE | Open-source | 42.2% |
| SERA-32B | Open-source | 49.5% |
| Devstral-Small-2 (24B) | Open-weight | 50.0% |
| GLM-4.5-Air (110B) | Open-weight | 50.5% |
Open-source: code, model weights, and data publicly available. Open-weight: model weights available but training data/code not fully released.
Quickstart
The easiest way to use SERA is the `sera` CLI, which provides seamless integration with Claude Code:

```shell
# Install the CLI
uv tool install ai2-sera-cli

# Option 1: Deploy on Modal (recommended for trying out)
modal setup  # one-time setup
sera --modal

# Option 2: Use an existing endpoint
export SERA_API_KEY=<your_api_key>
sera --endpoint <endpoint_url>
```
The first run with --modal takes approximately 10 minutes to download the model (~65GB) and compile. Subsequent runs start in 1-2 minutes.
For more deployment options, see the sera-cli documentation.
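A deployed endpoint speaks the standard OpenAI-compatible chat API (this is what vLLM serves). A minimal sketch of building such a request from Python; the endpoint URL, API key, and prompt below are placeholder assumptions, not values shipped with the model:

```python
import json
import urllib.request


def build_chat_request(endpoint: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request for a SERA endpoint."""
    payload = {
        "model": "allenai/SERA-32B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Sending the request requires a running server, e.g.:
# req = build_chat_request("http://localhost:8000", "EMPTY", "Fix the failing test")
# resp = json.load(urllib.request.urlopen(req))
```

Any OpenAI-compatible client library can be substituted for the raw `urllib` call; only the base URL and model name change.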
Model Details
| Field | Value |
|---|---|
| Developer | Allen Institute for AI (Ai2) |
| Authors | Ethan Shen, Daniel Tormoen, Saurabh Shah, Ali Farhadi, Tim Dettmers |
| Base Model | Qwen 3-32B |
| Teacher Model | GLM-4.6 (357B) |
| Model Type | Coding agent / Software engineering agent |
| Training Method | Supervised fine-tuning on synthetic agent trajectories |
| Context Length | 32K tokens |
| License | Apache 2.0 |
Training Configuration
| Setting | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 1e-5 |
| Weight Decay | 0.01 |
| Max Sequence Length | 32,768 tokens |
| Training Framework | Axolotl |
| Inference Framework | vLLM |
| Compute | 40 GPU-days (~$2,000) |
Training Data
SERA-32B is trained on 25,000 synthetic coding agent trajectories generated using Soft Verified Generation (SVG). SVG is a two-rollout pipeline:
- First rollout: A teacher model makes a change to a codebase starting from a randomly selected function
- Synthetic PR: The trajectory is converted into a pull request description
- Second rollout: The teacher attempts to reproduce the change given only the PR description
- Soft verification: Patches are compared using line-level recall (no test execution required)
This approach removes the need for test infrastructure and enables data generation from any repository.
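The soft-verification step above can be sketched as a line-level recall over unified-diff patches: what fraction of the gold patch's changed lines reappear in the reproduced patch. This is a simplified illustration; the actual SVG implementation may normalize whitespace or weight hunks differently (assumption):

```python
def line_recall(gold_patch: str, candidate_patch: str) -> float:
    """Fraction of the gold patch's changed lines also present in the candidate.

    Changed lines are +/- lines of a unified diff, excluding the
    '+++'/'---' file headers. No test execution is required.
    """
    def changed_lines(patch: str) -> set:
        return {
            line for line in patch.splitlines()
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))
        }

    gold = changed_lines(gold_patch)
    if not gold:
        return 1.0  # empty gold patch: trivially reproduced
    return len(gold & changed_lines(candidate_patch)) / len(gold)
```

Trajectories whose second-rollout patch scores high recall against the first-rollout patch are kept as verified training data.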
- Source Repositories: 121 Python codebases
- Teacher Model: GLM-4.6 (357B)
- Dataset: https://huggingface.co/datasets/allenai/Sera-4.6-Lite-T2
Intended Use
- Automated software engineering: Bug fixes, feature implementation, refactoring
- Repository specialization: Fine-tune on private codebases to create specialized coding agents (~8,000 trajectories / $1,300)
- Research: Studying coding agents, data generation methods, and agent behavior
Limitations
- SWE-bench training artifact: The model was trained on SWE-bench-style tasks and may attempt to call a nonexistent `submit` tool when finished editing. The sera-cli proxy handles this automatically.
- Evaluation scope: Only validated on SWE-bench Verified (Python repositories). Performance on other languages or benchmarks is unknown.
- Teacher bound: Performance is largely bounded by the teacher model (GLM-4.6) capability.
- Statistical variance: Results computed over 3 seeds. Effects smaller than 2-3% should be interpreted with caution.
- Model-specific: Experiments use Qwen 3 as the base model. Generalization to other model families is not validated.
Bias, Risks, and Limitations
Like any language model without safety filtering, SERA can be prompted to generate harmful or insecure code. Users should be aware of the following risks:
- Code security: May generate code with security vulnerabilities (e.g., injection attacks, insecure defaults). All generated code should be reviewed before deployment.
- Accuracy: May produce incorrect or buggy code. Outputs should be tested and verified.
- Inherited biases: May reflect biases present in the Qwen 3-32B base model and GLM-4.6 teacher model.
- Misuse potential: Could potentially be used to generate malicious code or identify vulnerabilities for exploitation.
- Prompt injection and data leakage: As an agent that reads repository context, the model can be steered by adversarial instructions embedded in files, and sensitive data in the context window may surface in outputs. Verify outputs and manage context windows accordingly.
Responsible Use
This model is intended for research and educational use. Users should adhere to Ai2's Responsible Use Guidelines. Key principles include:
- Use the model for beneficial purposes
- Review and test all generated code before deployment
- Do not use to generate malicious software or exploit vulnerabilities
- Consider the potential impact of automated code generation in your context
Hardware Requirements
| Configuration | GPU | Notes |
|---|---|---|
| Minimum | 1× 80GB GPU (A100, H100) | 32K context |
| Recommended | 1× H100 | Best performance |
Quantization (AWQ, GPTQ) can reduce memory requirements if needed.
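The 80GB minimum follows from back-of-envelope weight memory: parameter count (in billions) times bytes per parameter approximates gigabytes. A small sketch of that arithmetic (weights only; KV cache and activations at 32K context consume the remaining headroom):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory in GB: params (billions) x bytes each.

    Excludes KV cache and activation overhead, which grow with context length.
    """
    return params_billions * bytes_per_param


# bf16 (2 bytes/param) for 32B params: ~64 GB, consistent with the ~65GB
# download noted in the Quickstart and the 1x 80GB-GPU minimum above.
print(weight_memory_gb(32, 2))    # bf16
print(weight_memory_gb(32, 0.5))  # ~4-bit quantized (AWQ/GPTQ) weights
```

By the same arithmetic, 4-bit quantization brings weights to roughly 16 GB, which is why AWQ/GPTQ can relax the GPU requirement.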
License
This model is licensed under Apache 2.0. It is intended for research and educational use and may be used commercially in accordance with Ai2's Responsible Use Guidelines.
Citation
@article{sera2026,
title={SERA: Soft-Verified Efficient Repository Agents},
author={Shen, Ethan and Tormoen, Daniel and Shah, Saurabh and Farhadi, Ali and Dettmers, Tim},
year={2026},
institution={Allen Institute for AI},
url={https://allenai.org/papers/opencodingagents}
}
Contact
- Email: ethans03@cs.washington.edu, dettmers@cmu.edu
- Issues: GitHub Issues