--- license: apache-2.0 base_model: Qwen/Qwen3.6-35B-A3B tags: - reasoning - software-engineering - moe - code-review - architecture model-index: - name: Friday-35B results: [] --- # FRIDAY-35B A reasoning-enhanced 35B parameter Mixture-of-Experts model fine-tuned for senior software engineering. Built on [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (256 experts, 8 active per token, ~3B active parameters per forward pass). FRIDAY reasons at a staff+ engineer level — architectural thinking, tradeoff analysis, and code review with root-cause depth. ## What FRIDAY Does - **Code review**: Identifies concurrency bugs, data consistency issues, and architectural anti-patterns - **System design**: Diagnosis → root causes → short-term/long-term solutions - **Architectural reasoning**: Evaluates tradeoffs rather than prescribing a single answer - **Multi-language**: Rust, Python, TypeScript, C++, Go, Java ## Eval Buggy async Python checkout service with 10 planted bugs: | | FRIDAY-35B | Competitor (API) | |---|---|---| | Bugs found | **10/10** | 7/10 | | Time | **19.5s** | 53.2s | | Tokens out | 3,156 | 4,226 | | Throughput | **~162 tok/s** | ~79 tok/s | FRIDAY found all 10 bugs across both runs. The competitor missed 3: lock TTL expiration during slow payments, null product row dereference, and Redis type mismatch on `lpush`. FRIDAY also flagged the Redis distributed lock as architecturally redundant given proper DB-level locking. ## Training | | | |---|---| | **Base model** | [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | | **Architecture** | MoE — 256 experts, 8 active/token, GDN hybrid attention | | **Method** | Full fine-tune SFT | | **Training data** | 2,472 reasoning traces | | **Sequence length** | 8,192 tokens | | **Epochs** | 3 | | **Learning rate** | 2e-5, cosine schedule | | **Precision** | BF16 + TF32 | | **Framework** | TRL SFTTrainer + DeepSpeed ZeRO-3 | | **Hardware** | 8× A100 80GB | ## Usage ### With SGLang ```bash python -m sglang.launch_server \ --model dangell7/Friday-35B \ --dtype bfloat16 \ --tp 8 \ --trust-remote-code ``` ### With Transformers ```python from transformers import AutoModelForCausalLM, AutoTokenizer model = AutoModelForCausalLM.from_pretrained( "dangell7/Friday-35B", torch_dtype="bfloat16", device_map="auto" ) tokenizer = AutoTokenizer.from_pretrained("dangell7/Friday-35B") ``` ## Limitations - Autoregressive LLM; may hallucinate technical details - MoE architecture requires significant VRAM (~8× A100 or equivalent) - Not a substitute for human code review in production systems ## Acknowledgements - [Qwen](https://huggingface.co/Qwen) team for Qwen3.6-35B-A3B - [SGLang](https://github.com/sgl-project/sglang) for high-performance MoE serving - [TRL](https://github.com/huggingface/trl) and [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training infrastructure ## Citation ```bibtex @misc{Friday_35B, title = {FRIDAY-35B}, author = {dangell7}, year = {2026}, publisher = {Hugging Face}, howpublished = {\url{https://huggingface.co/dangell7/Friday-35B}} } ```