Friday-35B / README.md
dangell7's picture
Upload folder using huggingface_hub
4ebf654 verified
---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
tags:
- reasoning
- software-engineering
- moe
- code-review
- architecture
model-index:
- name: Friday-35B
results: []
---
# FRIDAY-35B
A reasoning-enhanced 35B parameter Mixture-of-Experts model fine-tuned for senior software engineering. Built on [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (256 experts, 8 active per token, ~3B active parameters per forward pass).
FRIDAY reasons at a staff+ engineer level — architectural thinking, tradeoff analysis, and code review with root-cause depth.
## What FRIDAY Does
- **Code review**: Identifies concurrency bugs, data consistency issues, and architectural anti-patterns
- **System design**: Diagnosis → root causes → short-term/long-term solutions
- **Architectural reasoning**: Evaluates tradeoffs rather than prescribing a single answer
- **Multi-language**: Rust, Python, TypeScript, C++, Go, Java
## Eval
Buggy async Python checkout service with 10 planted bugs:
| | FRIDAY-35B | Competitor (API) |
|---|---|---|
| Bugs found | **10/10** | 7/10 |
| Time | **19.5s** | 53.2s |
| Tokens out | 3,156 | 4,226 |
| Throughput | **~162 tok/s** | ~79 tok/s |
FRIDAY found all 10 bugs across both runs. The competitor missed 3: lock TTL expiration during slow payments, null product row dereference, and Redis type mismatch on `lpush`. FRIDAY also flagged the Redis distributed lock as architecturally redundant given proper DB-level locking.
## Training
| | |
|---|---|
| **Base model** | [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) |
| **Architecture** | MoE — 256 experts, 8 active/token, GDN hybrid attention |
| **Method** | Full fine-tune SFT |
| **Training data** | 2,472 reasoning traces |
| **Sequence length** | 8,192 tokens |
| **Epochs** | 3 |
| **Learning rate** | 2e-5, cosine schedule |
| **Precision** | BF16 + TF32 |
| **Framework** | TRL SFTTrainer + DeepSpeed ZeRO-3 |
| **Hardware** | 8× A100 80GB |
## Usage
### With SGLang
```bash
python -m sglang.launch_server \
--model dangell7/Friday-35B \
--dtype bfloat16 \
--tp 8 \
--trust-remote-code
```
### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"dangell7/Friday-35B",
torch_dtype="bfloat16",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("dangell7/Friday-35B")
```
## Limitations
- Autoregressive LLM; may hallucinate technical details
- MoE architecture requires significant VRAM (~8× A100 or equivalent)
- Not a substitute for human code review in production systems
## Acknowledgements
- [Qwen](https://huggingface.co/Qwen) team for Qwen3.6-35B-A3B
- [SGLang](https://github.com/sgl-project/sglang) for high-performance MoE serving
- [TRL](https://github.com/huggingface/trl) and [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training infrastructure
## Citation
```bibtex
@misc{Friday_35B,
title = {FRIDAY-35B},
author = {dangell7},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/dangell7/Friday-35B}}
}
```