| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen3.6-35B-A3B |
| tags: |
| - reasoning |
| - software-engineering |
| - moe |
| - code-review |
| - architecture |
| model-index: |
| - name: Friday-35B |
| results: [] |
| --- |
| |
| # FRIDAY-35B |
|
|
| A reasoning-enhanced 35B parameter Mixture-of-Experts model fine-tuned for senior software engineering. Built on [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (256 experts, 8 active per token, ~3B active parameters per forward pass). |
|
|
| FRIDAY reasons at a staff+ engineer level — architectural thinking, tradeoff analysis, and code review with root-cause depth. |
|
|
| ## What FRIDAY Does |
|
|
| - **Code review**: Identifies concurrency bugs, data consistency issues, and architectural anti-patterns |
| - **System design**: Diagnosis → root causes → short-term/long-term solutions |
| - **Architectural reasoning**: Evaluates tradeoffs rather than prescribing a single answer |
| - **Multi-language**: Rust, Python, TypeScript, C++, Go, Java |
|
|
| ## Eval |
|
|
| Buggy async Python checkout service with 10 planted bugs: |
|
|
| | | FRIDAY-35B | Competitor (API) | |
| |---|---|---| |
| | Bugs found | **10/10** | 7/10 | |
| | Time | **19.5s** | 53.2s | |
| | Tokens out | 3,156 | 4,226 | |
| | Throughput | **~162 tok/s** | ~79 tok/s | |
|
|
| FRIDAY found all 10 bugs across both runs. The competitor missed 3: lock TTL expiration during slow payments, null product row dereference, and Redis type mismatch on `lpush`. FRIDAY also flagged the Redis distributed lock as architecturally redundant given proper DB-level locking. |
|
|
| ## Training |
|
|
| | | | |
| |---|---| |
| | **Base model** | [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | |
| | **Architecture** | MoE — 256 experts, 8 active/token, GDN hybrid attention | |
| | **Method** | Full fine-tune SFT | |
| | **Training data** | 2,472 reasoning traces | |
| | **Sequence length** | 8,192 tokens | |
| | **Epochs** | 3 | |
| | **Learning rate** | 2e-5, cosine schedule | |
| | **Precision** | BF16 + TF32 | |
| | **Framework** | TRL SFTTrainer + DeepSpeed ZeRO-3 | |
| | **Hardware** | 8× A100 80GB | |
|
|
| ## Usage |
|
|
| ### With SGLang |
|
|
| ```bash |
| python -m sglang.launch_server \ |
| --model dangell7/Friday-35B \ |
| --dtype bfloat16 \ |
| --tp 8 \ |
| --trust-remote-code |
| ``` |
|
|
| ### With Transformers |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model = AutoModelForCausalLM.from_pretrained( |
| "dangell7/Friday-35B", |
| torch_dtype="bfloat16", |
| device_map="auto" |
| ) |
| tokenizer = AutoTokenizer.from_pretrained("dangell7/Friday-35B") |
| ``` |
|
|
| ## Limitations |
|
|
| - Autoregressive LLM; may hallucinate technical details |
| - MoE architecture requires significant VRAM (~8× A100 or equivalent) |
| - Not a substitute for human code review in production systems |
|
|
| ## Acknowledgements |
|
|
| - [Qwen](https://huggingface.co/Qwen) team for Qwen3.6-35B-A3B |
| - [SGLang](https://github.com/sgl-project/sglang) for high-performance MoE serving |
| - [TRL](https://github.com/huggingface/trl) and [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training infrastructure |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{Friday_35B, |
| title = {FRIDAY-35B}, |
| author = {dangell7}, |
| year = {2026}, |
| publisher = {Hugging Face}, |
| howpublished = {\url{https://huggingface.co/dangell7/Friday-35B}} |
| } |
| ``` |
|
|