---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
base_model_relation: finetune
tags:
- code
- coding
- programming
- algorithms
- systems-programming
- code-generation
- complexity-analysis
- qwen2.5
- fine-tuned
- vanta-research
- vanta-research-entities
- vanta-research-code-models
- wraith
model-index:
- name: wraith-coder-7b
  results:
  - task:
      type: text-generation
      name: Code Generation
    metrics:
    - type: conciseness
      value: 62.6
      name: Response Reduction
    - type: coverage
      value: 60
      name: Complexity Analysis Coverage
library_name: transformers
---

<div align="center">

![vanta_trimmed](https://cdn-uploads.huggingface.co/production/uploads/686c460ba3fc457ad14ab6f8/hcGtMtCIizEZG_OuCvfac.png)
  
  <h1>VANTA Research</h1>
    
  <p><strong>Independent AI research lab building safe, resilient language models optimized for human-AI collaboration</strong></p>
  
  <p>
    <a href="https://vantaresearch.xyz"><img src="https://img.shields.io/badge/Website-vantaresearch.xyz-black" alt="Website"/></a>
    <a href="https://merch.vantaresearch.xyz"><img src="https://img.shields.io/badge/Merch-merch.vantaresearch.xyz-sage" alt="Merch"/></a>
    <a href="https://x.com/vanta_research"><img src="https://img.shields.io/badge/@vanta_research-1DA1F2?logo=x" alt="X"/></a>
    <a href="https://github.com/vanta-research"><img src="https://img.shields.io/badge/GitHub-vanta--research-181717?logo=github" alt="GitHub"/></a>
  </p>
</div>

---

# Wraith Coder 7B

Wraith Coder 7B is a specialized code generation model fine-tuned from Qwen2.5-Coder-7B-Instruct. Through three training iterations focused on algorithmic reasoning, systems programming, and concise technical communication, Wraith produces responses that are 62.6% shorter than the base model's on a 20-question coding assessment while maintaining implementation correctness.

## Model Description

**Developed by:** VANTA Research  
**Base Model:** Qwen/Qwen2.5-Coder-7B-Instruct  
**Model Type:** Causal Language Model  
**Language(s):** English  
**License:** Apache 2.0  
**Fine-tuned from:** Qwen2.5-Coder-7B-Instruct

### Model Architecture

- **Parameters:** 7.6 billion
- **Architecture:** Transformer decoder with 28 layers
- **Hidden Size:** 3584
- **Attention Heads:** 28 (4 key-value heads)
- **Context Length:** 32,768 tokens
- **Vocabulary Size:** 152,064 tokens
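
The grouped-query attention layout listed above (28 query heads sharing 4 key-value heads) largely determines how much memory the KV cache needs at long context. The sketch below estimates that size from the listed figures. It assumes a head dimension of hidden size divided by attention heads (128) and 16-bit cache entries; neither is stated in this card, and real runtimes add their own overhead.

```python
# Architecture figures from the list above
NUM_LAYERS = 28
HIDDEN_SIZE = 3584
NUM_ATTENTION_HEADS = 28
NUM_KV_HEADS = 4
CONTEXT_LENGTH = 32_768

# Assumptions (not stated in this card)
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128
BYTES_PER_VALUE = 2  # 16-bit cache entries


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    # The leading 2 accounts for storing both a key and a value per head
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


gqa = kv_cache_bytes(CONTEXT_LENGTH)
# Hypothetical comparison: the same model with one KV head per query head
mha = kv_cache_bytes(CONTEXT_LENGTH, kv_heads=NUM_ATTENTION_HEADS)

print(f"Per token: {kv_cache_bytes(1):,} bytes")
print(f"Full context, 4 KV heads: {gqa / 2**30:.2f} GiB")
print(f"Full context, 28 KV heads (hypothetical): {mha / 2**30:.2f} GiB")
```

Under these assumptions, sharing key-value heads makes the cache seven times smaller than it would be with one key-value head per query head.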

## Training Methodology

### Iterative Fine-Tuning Strategy

Wraith Coder 7B was developed through three iterations of progressive capability enhancement:

**Iteration 1: Personality Establishment (~4,250 examples)**
- Same personality examples used on Wraith 8B from the VANTA Research Entity Series
- Identity formation and communication style
- Logical reasoning patterns
- Technical terminology usage
- Foundation for signal-dense communication

**Iteration 2: Coding Restoration/Enhancement (~5,500 examples)**
- Conversational coding examples
- Computer science fundamentals
- Mathematical reasoning problems
- Identity reinforcement examples
- Technical communication patterns

**Iteration 3: Advanced Capabilities (~4,450 examples)**
- Architectural design patterns
- Algorithm design and analysis
- Debugging techniques
- Systems programming concepts
- Identity anchors
- Communication pattern reinforcement

### Training Configuration

- **Method:** Low-Rank Adaptation (LoRA)
- **Rank:** 16
- **Alpha:** 32
- **Dropout:** 0.05
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate:** 5e-5
- **Batch Size:** 8 (effective)
- **Epochs:** 2 per iteration
- **Optimizer:** AdamW 8-bit
- **Training Framework:** Unsloth
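
To give a sense of how small the trained adapters are relative to the full model, the sketch below counts the LoRA parameters implied by the rank and target modules above. The MLP intermediate size (18,944) and head dimension (128) are assumptions taken from the base model's published configuration, not from this card, so treat the result as an estimate rather than a reported figure.

```python
# Values from this card
RANK = 16
ALPHA = 32
NUM_LAYERS = 28
HIDDEN = 3584
NUM_KV_HEADS = 4

# Assumptions taken from the base model's configuration, not from this card
HEAD_DIM = 128
INTERMEDIATE = 18_944

KV_DIM = NUM_KV_HEADS * HEAD_DIM  # width of the key and value projections

# (input features, output features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"Trainable LoRA parameters: {total:,}")
print(f"Share of a 7.6B-parameter model: {total / 7.6e9:.2%}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the adapters come to roughly 40 million parameters, about half a percent of the model, with the low-rank update scaled by a factor of 2.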

## Performance Evaluation

### Comprehensive 20-Question Coding Assessment

A 20-question assessment spanning data structures, algorithms, systems design, concurrency, and architecture shows the following differences from the base model:

#### Response Efficiency
- **Base Model:** 57,999 characters total across 20 questions (about 2,900 per question)
- **Wraith Coder:** 21,686 characters total across 20 questions (about 1,084 per question)
- **Improvement:** 62.6% reduction in response length while maintaining correctness
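
The reduction figure can be reproduced from the character counts above. This short check treats 57,999 and 21,686 as totals over the 20 questions, which is what the per-question figures imply.

```python
NUM_QUESTIONS = 20
base_chars = 57_999    # base model, all 20 responses combined
wraith_chars = 21_686  # Wraith Coder, all 20 responses combined

# Relative reduction in total response length
reduction = (base_chars - wraith_chars) / base_chars

print(f"Base per question:   {base_chars / NUM_QUESTIONS:.0f}")
print(f"Wraith per question: {wraith_chars / NUM_QUESTIONS:.0f}")
print(f"Reduction: {reduction:.1%}")
```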

#### Technical Analysis Coverage
- **Base Model:** Complexity analysis in 40% of responses
- **Wraith Coder:** Complexity analysis in 60% of responses
- **Improvement:** 20 percentage points more responses include Big-O analysis (a 50% relative increase)

#### Question-Specific Performance

| Category | Conciseness Gain | Key Strength |
|----------|------------------|--------------|
| Data Structures | 80-90% | Space complexity analysis |
| Algorithms | 75-85% | Time complexity trade-offs |
| Systems Design | 70-80% | Scalability considerations |
| Concurrency | 65-75% | Synchronization patterns |
| Architecture | 50-60% | Design pattern selection |

### Comparative Analysis

**Test Case: LRU Cache Implementation**
- Base Model: 120+ lines with verbose documentation
- Wraith Coder: 45 lines with design rationale
- Result: Equivalent correctness, 62% shorter, includes algorithmic justification

**Test Case: Rate Limiter Design**
- Base Model: 100+ lines, conceptual confusion between algorithms
- Wraith Coder: 25 lines, correct token bucket implementation with edge case analysis
- Result: Superior correctness and clarity
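
For readers unfamiliar with the algorithm named in this test case, the following is a minimal token bucket sketch. It is an illustration written for this card, not the model's actual output; the class name, parameters, and injectable clock are hypothetical choices.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the call is allowed."""
        now = self._clock()
        elapsed = max(0.0, now - self._last)  # guard against a clock stepping back
        self._last = now
        # Refill lazily on each call, never exceeding capacity
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if cost <= self._tokens:
            self._tokens -= cost
            return True
        return False
```

This version is not thread-safe; a concurrent limiter would need a lock around `allow`. The injectable clock exists only to make the refill behavior easy to test.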

**Test Case: Binary Tree Serialization**
- Base Model: Single approach with lengthy explanation
- Wraith Coder: Two approaches (DFS and BFS) with trade-off comparison
- Result: Multiple solutions with selection guidance

## Intended Use

### Primary Applications

**Senior Software Engineering**
- Code review and optimization suggestions
- Algorithm selection and complexity analysis
- Systems design pattern recommendations
- Performance optimization strategies

**Technical Interview Preparation**
- Concise algorithmic explanations
- Multiple solution approaches
- Time and space complexity analysis
- Trade-off articulation

**Production Development**
- Efficient technical documentation
- Design decision rationale
- Scalability considerations
- Edge case identification

### Out-of-Scope Use

This model is optimized for experienced developers who value information density. It may not be suitable for:
- Beginner programming education requiring verbose step-by-step explanations
- Non-technical audiences requiring extensive context
- Applications requiring social conversational patterns
- Domains outside software engineering and computer science

## Limitations and Considerations

### Technical Limitations

1. **Condensed Communication Style**
   - Assumes reader familiarity with computer science fundamentals
   - May omit explanatory context that beginners require
   - Prioritizes technical precision over accessibility

2. **Model Size Constraints**
   - 7B parameter model has inherent knowledge limitations
   - May not match larger models on extremely complex problems
   - Context window limits for very large codebases

3. **Domain Specialization**
   - Optimized for algorithmic and systems programming
   - May have reduced performance on domain-specific applications (e.g., embedded systems, game engines)
   - Training data focused on general-purpose programming

### Deployment Considerations

- **Compute Requirements:** Minimum 8GB VRAM for 4-bit quantization
- **Inference Speed:** Similar to the base Qwen2.5-Coder-7B-Instruct model
- **Quantization:** Tested with 4-bit (Q4_K_M) quantization maintaining quality

## Ethical Considerations

### Training Data

All training data was synthetically generated or derived from publicly available educational resources. No proprietary code or copyrighted material was used in fine-tuning.

### Bias and Fairness

The model inherits biases present in the base Qwen2.5-Coder-7B-Instruct model. Additional fine-tuning focused on technical capabilities and communication style rather than bias mitigation.

### Responsible Use

Users should:
- Validate all generated code before production deployment
- Apply appropriate code review processes
- Consider model outputs as suggestions requiring human verification
- Ensure compliance with relevant licensing for generated code

## Technical Details

### Chat Template

The model uses the Qwen ChatML format:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
```
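
As a rough illustration of the layout above, the sketch below renders a message list by hand. The function name `render_chatml` is hypothetical. In practice `tokenizer.apply_chat_template` is preferable because it applies the template shipped with the tokenizer; this version only shows where the role markers and newlines go.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render role/content messages into the ChatML layout shown above."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Explain binary search in two sentences."},
])
print(prompt)
```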

### Recommended Inference Parameters

```python
{
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 40,
  "repeat_penalty": 1.1,
  "max_tokens": 2048
}
```
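
The key names above (`repeat_penalty`, `max_tokens`) appear to follow the convention of llama.cpp-style runtimes. The Hugging Face `generate` method calls the equivalent settings `repetition_penalty` and `max_new_tokens`, and it needs `do_sample=True` for `temperature`, `top_p`, and `top_k` to apply. The helper below sketches that translation; `to_transformers_kwargs` is a hypothetical name.

```python
RECOMMENDED = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1,
    "max_tokens": 2048,
}

# Names that differ between the block above and transformers' generate()
RENAMES = {
    "repeat_penalty": "repetition_penalty",
    "max_tokens": "max_new_tokens",
}


def to_transformers_kwargs(params: dict) -> dict:
    """Translate the recommended settings into generate() keyword arguments."""
    kwargs = {RENAMES.get(name, name): value for name, value in params.items()}
    kwargs["do_sample"] = True  # sampling must be on for temperature/top_p/top_k
    return kwargs


generate_kwargs = to_transformers_kwargs(RECOMMENDED)
print(generate_kwargs)
# Usage, assuming `model` and `inputs` from the Usage Example section:
# outputs = model.generate(**inputs, **generate_kwargs)
```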

### Quantization Support

Tested and validated quantization formats:
- FP16: Unquantized half-precision baseline
- Q8_0: Minimal quality loss
- Q4_K_M: Recommended balance (4.4GB)
- Q4_0: Maximum compression
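
A back-of-the-envelope lower bound on weight storage for each format follows from the parameter count. The bits-per-weight values below are nominal assumptions that ignore per-block scales and file metadata, so actual files will be somewhat larger.

```python
PARAMS = 7.6e9  # parameter count from the Model Architecture section

# Nominal bits per weight; an assumption that ignores scales and metadata
NOMINAL_BITS = {"FP16": 16, "Q8_0": 8, "Q4_K_M": 4, "Q4_0": 4}


def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Lower-bound weight storage in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name:7s} >= {weight_gb(bits):.1f} GB")
```

The 4-bit bound is consistent with the 4.4GB listed for Q4_K_M, which is larger because each block stores scale values alongside the weights. Runtime memory also includes the KV cache and activations.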

## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vanta-research/wraith-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Implement quicksort with complexity analysis."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

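inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated reply is decoded
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)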
```

## Contact

For questions or issues regarding this model, please open an issue in the model repository or contact us by email.
- **Email:** hello@vantaresearch.xyz

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{wraith-coder-7b,
  author = {VANTA Research},
  title = {Wraith Coder 7B: Signal-Dense Code Generation through Iterative Fine-Tuning},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/vanta-research/wraith-coder-7b}}
}
```

## Acknowledgments

This model builds upon Qwen2.5-Coder-7B-Instruct developed by Alibaba Cloud. We acknowledge their contribution to open-source language model research. Thanks to Unsloth for providing an easy-to-use training framework.

## Version History

- **v1.0.0** (2025-11-19): Initial release with iteration 3 training complete
  - 62.6% response reduction while maintaining correctness
  - 60% complexity analysis coverage across 20-question benchmark
  - Production-ready for senior engineering applications

---
*Proudly developed in Portland, Oregon by VANTA Research*