dangell7
/

Friday-35B

qwen3_5_moe_text

software-engineering

Mixture of Experts

Model card Files Files and versions

Friday-35B / README.md

dangell7's picture

Upload folder using huggingface_hub

4ebf654 verified about 1 month ago

|

history blame contribute delete

3.19 kB

	---
	license: apache-2.0
	base_model: Qwen/Qwen3.6-35B-A3B
	tags:
	- reasoning
	- software-engineering
	- moe
	- code-review
	- architecture
	model-index:
	- name: Friday-35B
	results: []
	---

	# FRIDAY-35B

	A reasoning-enhanced 35B parameter Mixture-of-Experts model fine-tuned for senior software engineering. Built on [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (256 experts, 8 active per token, ~3B active parameters per forward pass).

	FRIDAY reasons at a staff+ engineer level — architectural thinking, tradeoff analysis, and code review with root-cause depth.

	## What FRIDAY Does

	- Code review: Identifies concurrency bugs, data consistency issues, and architectural anti-patterns
	- System design: Diagnosis → root causes → short-term/long-term solutions
	- Architectural reasoning: Evaluates tradeoffs rather than prescribing a single answer
	- Multi-language: Rust, Python, TypeScript, C++, Go, Java

	## Eval

	Buggy async Python checkout service with 10 planted bugs:

	\| \| FRIDAY-35B \| Competitor (API) \|
	\|---\|---\|---\|
	\| Bugs found \| 10/10 \| 7/10 \|
	\| Time \| 19.5s \| 53.2s \|
	\| Tokens out \| 3,156 \| 4,226 \|
	\| Throughput \| ~162 tok/s \| ~79 tok/s \|

	FRIDAY found all 10 bugs across both runs. The competitor missed 3: lock TTL expiration during slow payments, null product row dereference, and Redis type mismatch on `lpush`. FRIDAY also flagged the Redis distributed lock as architecturally redundant given proper DB-level locking.

	## Training

	\| \| \|
	\|---\|---\|
	\| Base model \| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) \|
	\| Architecture \| MoE — 256 experts, 8 active/token, GDN hybrid attention \|
	\| Method \| Full fine-tune SFT \|
	\| Training data \| 2,472 reasoning traces \|
	\| Sequence length \| 8,192 tokens \|
	\| Epochs \| 3 \|
	\| Learning rate \| 2e-5, cosine schedule \|
	\| Precision \| BF16 + TF32 \|
	\| Framework \| TRL SFTTrainer + DeepSpeed ZeRO-3 \|
	\| Hardware \| 8× A100 80GB \|

	## Usage

	### With SGLang

	```bash
	python -m sglang.launch_server \
	--model dangell7/Friday-35B \
	--dtype bfloat16 \
	--tp 8 \
	--trust-remote-code
	```

	### With Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained(
	"dangell7/Friday-35B",
	torch_dtype="bfloat16",
	device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained("dangell7/Friday-35B")
	```

	## Limitations

	- Autoregressive LLM; may hallucinate technical details
	- MoE architecture requires significant VRAM (~8× A100 or equivalent)
	- Not a substitute for human code review in production systems

	## Acknowledgements

	- [Qwen](https://huggingface.co/Qwen) team for Qwen3.6-35B-A3B
	- [SGLang](https://github.com/sgl-project/sglang) for high-performance MoE serving
	- [TRL](https://github.com/huggingface/trl) and [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training infrastructure

	## Citation

	```bibtex
	@misc{Friday_35B,
	title = {FRIDAY-35B},
	author = {dangell7},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/dangell7/Friday-35B}}
	}
	```