--- base_model: Qwen/Qwen2.5-Coder-3B-Instruct datasets: - my-ai-stack/Stack-4.0-Dataset license: apache-2.0 language: - en library_name: transformers pipeline_tag: text-generation tags: - qwen2.5-coder - qwen2.5-coder-3b - code-generation - agentic-ai - tool-use - fine-tuned-llm - stack-4 - stack-ai - sovereign-ai - enterprise - local-inference - 3b-parameter-model model-index: - name: Stack 4.0 Omni-Nexus Merged results: - task: type: text-generation description: HellaSwag commonsense reasoning dataset: name: HellaSwag type: hellaswag metrics: - type: acc_norm value: 74.0% - task: type: text-generation description: ARC-Challenge reasoning dataset: name: ARC-Challenge type: ai2_arc metrics: - type: acc_norm value: 52.0% ---
--- # Stack 4.0 Omni-Nexus — Merged **Model ID:** `my-ai-stack/Stack-4.0-Qwen-3B-Merged` A 3-billion parameter instruction-tuned coding model, fully merged from Qwen2.5-Coder-3B-Instruct with 55,000 agentic tool-use conversations baked in. This is the standalone version — no adapter needed, runs directly on any compatible hardware. ## Performance Benchmarks | Benchmark | Score | Notes | |-----------|-------|-------| | HellaSwag (acc_norm) | **74.0%** | 50-sample eval | | ARC-Challenge (acc_norm) | **52.0%** | 50-sample eval | | Internal coding sample | **10/10** | All valid Python produced | ## Key Metrics | Metric | Value | |--------|-------| | Parameters | **3B** | | Training loss (final) | **0.1411** | | Training steps | 1,000 | | Hardware | GCP Tesla V100 16GB | | Training time | ~10 hours | ## Why Merged? The merged version ships the full model in a single file — no LoRA adapters, no base model dependency. Deploy anywhere that supports Hugging Face Transformers. ## Usage ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch MODEL = "my-ai-stack/Stack-4.0-Qwen-3B-Merged" tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True) tokenizer.pad_token = tokenizer.eos_token model = AutoModelForCausalLM.from_pretrained( MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True ) model.eval() messages = [{"role": "user", "content": "Write a quicksort in Python"}] text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = tokenizer(text, return_tensors="pt").to(model.device) with torch.no_grad(): out = model.generate(**inputs, max_new_tokens=512, temperature=0.7) print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)) ``` ## Training Details | Parameter | Value | |-----------|-------| | Method | QLoRA → Merged | | LoRA rank | 16 | | Trainable params | 7.3M / 3.1B (0.24%) | | Batch size | 1 | | Grad accumulation | 16 | | Max length | 512 | | Learning rate | 2e-4 | | Optimizer | AdamW (bf16) | | Hardware | GCP V100 16GB | ## Limitations - **3B model** — smaller than 7B models; less capable on complex multi-step reasoning - **English-optimized** — other language performance may vary - **Tool execution** — tool calls are generated but actual execution requires an agent loop in your application ## See Also - [LoRA Adapter version](https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Agentic) — smaller, needs base model - [Training dataset](https://huggingface.co/my-ai-stack/Stack-4.0-Dataset) - [Stack 3.0 (7B)](https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus) ## Citation ```bibtex @misc{stack-4-merged-2026, title={Stack 4.0 Omni-Nexus — Merged}, author={Stack AI Team}, year={2026}, url={https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Merged} } ```