# NousCoder-14B-AWQ

## Model Description

NousCoder-14B-AWQ is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of NousResearch/NousCoder-14B.

This model specializes in competitive programming and coding tasks, achieving 67.87% Pass@1 on LiveCodeBench v6. It was post-trained from Qwen3-14B with reinforcement learning on 24k verifiable coding problems.
## Key Features

- 🔥 Specialized for coding: trained with RL on competitive programming problems
- ⚡ 4-bit quantized: ~66% smaller (9.4 GB vs. 28 GB) with minimal quality loss
- 🚀 Fast inference: optimized for the AWQ Marlin kernel (2-3x faster than the plain AWQ backend)
- 💻 Production ready: tested and verified for deployment
## Model Stats

| Metric | Value |
|---|---|
| Base model | Qwen3-14B |
| Quantization | 4-bit AWQ |
| Size | 9.4 GB (down from 28 GB) |
| VRAM | ~6 GB per GPU (2 GPUs, tensor parallel) |
| Context length | 16,384 tokens |
| LiveCodeBench v6 Pass@1 | 67.87% |
| Training | RL on 24k coding problems |
## Usage

### With AutoAWQ

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_quantized(
    "froogai/NousCoder-14B-AWQ",
    device_map="auto",
    safetensors=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "froogai/NousCoder-14B-AWQ",
    trust_remote_code=True,
)

# Generate code
prompt = "Write a Python function to implement binary search:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.2,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt
code = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(code)
```
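Since the base model is Qwen3, chat-formatted prompts generally yield better results than raw completion prompts. A minimal sketch using the tokenizer's built-in chat template (the message content is illustrative):

```python
# Build a chat-formatted prompt with the model's own template
messages = [{"role": "user", "content": "Write a Python function to implement binary search."}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    chat_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][chat_inputs.shape[1]:], skip_special_tokens=True))
```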
### With vLLM (Recommended for Production)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model froogai/NousCoder-14B-AWQ \
  --quantization awq_marlin \
  --tensor-parallel-size 2 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.85 \
  --trust-remote-code
```
### With OpenAI-Compatible API

```bash
# Start the vLLM server
vllm serve froogai/NousCoder-14B-AWQ \
  --quantization awq_marlin \
  --tensor-parallel-size 2

# Make API requests
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "froogai/NousCoder-14B-AWQ",
    "prompt": "def quicksort(arr):",
    "max_tokens": 512
  }'
```
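The same server can also be queried from Python with the official `openai` client; a minimal sketch (vLLM's OpenAI-compatible endpoint accepts any placeholder API key by default):

```python
from openai import OpenAI

# Point the client at the local vLLM server; the key is a placeholder
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="froogai/NousCoder-14B-AWQ",
    prompt="def quicksort(arr):",
    max_tokens=512,
    temperature=0.2,
)
print(response.choices[0].text)
```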
## Quantization Details

This model was quantized using AutoAWQ with the following configuration:

```python
{
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}
```
### Calibration

- Dataset: pileval (128 samples)
- Method: Activation-aware Weight Quantization
- Preservation: coding performance maintained through careful calibration
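For reference, a quantization run with this configuration would look roughly like the following sketch using the standard AutoAWQ workflow (paths are illustrative; pileval with 128 samples is AutoAWQ's default calibration setup):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "NousResearch/NousCoder-14B"
quant_path = "NousCoder-14B-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate and quantize (AutoAWQ uses the pileval dataset by default)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```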
## Performance

### Benchmarks

| Benchmark | Score |
|---|---|
| LiveCodeBench v6 Pass@1 | 67.87% |
| Base model (Qwen3-14B) Pass@1 | 60.79% |
| Improvement | +7.08 points |
### Inference Speed

| Hardware | Speed (tokens/sec) |
|---|---|
| 2x RTX 5060 Ti (awq_marlin) | 15-25 |
| 2x RTX 5060 Ti (awq) | 8-12 |
| Single A100 (awq_marlin) | 40-60 |
### Memory Usage

| Configuration | VRAM Usage |
|---|---|
| 2x RTX 5060 Ti (TP=2) | ~6 GB per GPU |
| Single RTX 5060 Ti | ~12 GB |
| Single A100 | ~12 GB |
## Best Use Cases

This model excels at:
- ✅ Competitive programming problems
- ✅ Algorithm implementation
- ✅ Data structure design
- ✅ Code debugging and optimization
- ✅ Technical interview preparation
- ✅ LeetCode-style challenges
## Recommended Generation Parameters

For coding tasks, use these settings (applied in the vLLM sketch below):

- `temperature`: 0.1-0.3 (for near-deterministic code)
- `top_p`: 0.95
- `max_tokens`: 2048+ (for complete solutions)
- `presence_penalty`: 0.0
- `frequency_penalty`: 0.0
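A minimal offline-inference sketch with vLLM that applies these parameters (the prompt and `tensor_parallel_size=2` are illustrative; adjust to your hardware):

```python
from vllm import LLM, SamplingParams

# Load the quantized model with the Marlin backend across two GPUs
llm = LLM(
    model="froogai/NousCoder-14B-AWQ",
    quantization="awq_marlin",
    tensor_parallel_size=2,
    max_model_len=16384,
)

# Sampling settings matching the recommendations above
params = SamplingParams(
    temperature=0.2,
    top_p=0.95,
    max_tokens=2048,
    presence_penalty=0.0,
    frequency_penalty=0.0,
)

outputs = llm.generate(["def quicksort(arr):"], params)
print(outputs[0].outputs[0].text)
```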
## Hardware Requirements

### Minimum Requirements

- VRAM: 12 GB (single GPU) or 6 GB per GPU (2 GPUs with tensor parallelism)
- RAM: 24 GB
- Storage: 10 GB

### Recommended Requirements

- GPUs: 2x NVIDIA RTX 5060 Ti 16GB (32 GB total VRAM)
- RAM: 128 GB
- Storage: 20 GB (for model + cache)

### Compatible Hardware

- NVIDIA GPUs with compute capability 8.0+ (required by the AWQ Marlin kernel)
- CUDA 11.8+ or 12.1+
- Python 3.10+
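To check which backend a given GPU supports, a small PyTorch snippet (the 8.0 threshold reflects the Marlin kernel's Ampere-or-newer requirement):

```python
import torch

# AWQ Marlin requires compute capability >= 8.0 (Ampere or newer);
# older GPUs should fall back to the plain AWQ backend.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    backend = "awq_marlin" if (major, minor) >= (8, 0) else "awq"
    print(f"GPU {i}: {name} (sm_{major}{minor}) -> use --quantization {backend}")
```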
## Limitations

- The quantized model may show slight accuracy degradation compared to FP16
- Requires an AWQ-compatible runtime (AutoAWQ or vLLM)
- Best performance is on NVIDIA GPUs; CPU inference is much slower
## Training Details

### Base Model

- Architecture: Qwen3-14B
- Parameters: 14B
- License: Apache 2.0

### Post-Training

- Method: reinforcement learning
- Dataset: 24k verifiable coding problems
- Hardware: 48 B200 GPUs
- Duration: 4 days
- Framework: Atropos (NousResearch training system)
## Acknowledgments

- Original model: NousResearch/NousCoder-14B
- Base model: Qwen/Qwen3-14B
- Quantization: AutoAWQ library
- Training team: Joe Li (@JoeLi5050) at NousResearch
## Citation

If you use this model, please cite:

```bibtex
@misc{nouscoder_14b_2025,
  title        = {NousCoder-14B: Competitive Programming AI Model},
  author       = {Li, Joe},
  organization = {NousResearch},
  year         = {2025},
  month        = {January},
  url          = {https://huggingface.co/NousResearch/NousCoder-14B}
}
```
## License

This model is licensed under the Apache License 2.0. See the LICENSE file for details.
## Model Card Authors

Quantized by: froogai

For questions or issues:

- Open an issue on the model repository
- Contact: HuggingFace profile

Note: This is a quantized version of the original model. For best performance, use the vLLM inference engine with the `awq_marlin` quantization backend.