---
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
datasets:
- my-ai-stack/Stack-4.0-Dataset
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen2.5-coder
- qwen2.5-coder-3b
- code-generation
- agentic-ai
- tool-use
- fine-tuned-llm
- stack-4
- stack-ai
- sovereign-ai
- enterprise
- local-inference
- 3b-parameter-model
model-index:
- name: Stack 4.0 Omni-Nexus Merged
  results:
  - task:
      type: text-generation
      description: HellaSwag commonsense reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: acc_norm
      value: 74.0%
  - task:
      type: text-generation
      description: ARC-Challenge reasoning
    dataset:
      name: ARC-Challenge
      type: ai2_arc
    metrics:
    - type: acc_norm
      value: 52.0%
---
<div style="background-color: #030406; padding: 60px 40px; border-radius: 40px 40px 0 0; border: 1px solid #111827; border-bottom: none; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; color: #ffffff; text-align: center; position: relative; overflow: hidden;">
<div style="position: absolute; top: -100px; left: -100px; width: 400px; height: 400px; background: radial-gradient(circle, rgba(219, 39, 119, 0.08) 0%, transparent 70%);"></div>
<div style="margin: 0 auto 30px; width: 80px; height: 60px; position: relative;">
<div style="position: absolute; width: 100%; height: 18px; background: linear-gradient(135deg, #c084fc 0%, #db2777 100%); border-radius: 6px; top: 0px; z-index: 3; box-shadow: 0 10px 20px rgba(0,0,0,0.5); border-bottom: 2px solid rgba(0,0,0,0.2);"></div>
<div style="position: absolute; width: 100%; height: 18px; background: linear-gradient(135deg, #c084fc 0%, #db2777 100%); border-radius: 6px; top: 22px; z-index: 2; opacity: 0.7; border-bottom: 2px solid rgba(0,0,0,0.2);"></div>
<div style="position: absolute; width: 100%; height: 18px; background: linear-gradient(135deg, #c084fc 0%, #db2777 100%); border-radius: 6px; top: 44px; z-index: 1; opacity: 0.4; border-bottom: 2px solid rgba(0,0,0,0.2);"></div>
</div>
<h1 style="background: linear-gradient(135deg, #ffffff 0%, #a1a1aa 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; font-size: 3rem; letter-spacing: -1.5px; margin: 10px 0; font-weight: 800;">Stack 4.0 Omni-Nexus</h1>
<p style="color: #db2777; font-weight: 600; letter-spacing: 3px; text-transform: uppercase; font-size: 0.85rem; margin-bottom: 30px; opacity: 0.9;">Merged · 3B Parameters · Sovereign Agentic Infrastructure</p>
<div align="center" style="display: flex; justify-content: center; gap: 10px; flex-wrap: wrap;">
<img src="https://img.shields.io/badge/Release-v4.0_Alpha-db2777?style=for-the-badge" alt="Version">
<img src="https://img.shields.io/badge/Network-Global-111827?style=for-the-badge&border=db2777" alt="Network">
<img src="https://img.shields.io/badge/Security-Sovereign-c084fc?style=for-the-badge" alt="Security">
</div>
<div style="height: 1px; width: 100%; background: linear-gradient(to right, transparent, #111827, #db2777, #111827, transparent); margin-top: 50px; opacity: 0.5;"></div>
</div>
---
# Stack 4.0 Omni-Nexus — Merged
**Model ID:** `my-ai-stack/Stack-4.0-Qwen-3B-Merged`
A 3-billion-parameter instruction-tuned coding model: Qwen2.5-Coder-3B-Instruct fine-tuned on 55,000 agentic tool-use conversations, with the fine-tuned weights merged back into the base model. This is the standalone version: it needs no adapter and loads directly with Hugging Face Transformers.
## Performance Benchmarks
| Benchmark | Score | Notes |
|-----------|-------|-------|
| HellaSwag (acc_norm) | **74.0%** | 50-sample eval |
| ARC-Challenge (acc_norm) | **52.0%** | 50-sample eval |
| Internal coding sample | **10/10** | All valid Python produced |
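
Both benchmark scores come from 50 samples, so the sampling error is large. As a rough guide, the sketch below computes a normal-approximation 95% interval for an accuracy measured on `n` samples. The function name `accuracy_interval` is illustrative and not part of any evaluation harness; treating acc_norm as a simple proportion and using the normal approximation are both assumptions.

```python
import math

def accuracy_interval(acc: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for an accuracy measured on n samples."""
    stderr = math.sqrt(acc * (1.0 - acc) / n)
    margin = z * stderr
    return max(0.0, acc - margin), min(1.0, acc + margin)

for name, acc in [("HellaSwag", 0.74), ("ARC-Challenge", 0.52)]:
    low, high = accuracy_interval(acc, n=50)
    print(f"{name}: {acc:.1%} (95% interval roughly {low:.1%} to {high:.1%})")
```

On 50 samples the margin is about ±12 points for HellaSwag and ±14 points for ARC-Challenge, so the scores are best read as coarse estimates.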
## Key Metrics
| Metric | Value |
|--------|-------|
| Parameters | **3B** |
| Training loss (final) | **0.1411** |
| Training steps | 1,000 |
| Hardware | GCP Tesla V100 16GB |
| Training time | ~10 hours |
## Why Merged?
The merged version folds the LoRA weights into the base model, so the repository holds a complete, standalone checkpoint: there is no adapter to load and no separate base model to download. It deploys anywhere that supports Hugging Face Transformers.
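
Merging adds the low-rank LoRA update to each adapted weight matrix, which is why no adapter is needed at inference time. The toy sketch below illustrates the idea on a 2×2 matrix in plain Python: the merged weight `W + (alpha / r) * B @ A` gives the same output as applying the base weight and the adapter separately. The matrices, the `alpha` value, and the helper names are illustrative assumptions; this is not the script used to produce this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[x * s for x in row] for row in a]

# Toy values: a 2x2 base weight with a rank-1 adapter (r = 1).
W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight
B = [[0.5], [-1.0]]            # LoRA "up" matrix, shape 2 x r
A = [[2.0, 0.0]]               # LoRA "down" matrix, shape r x 2
r, alpha = 1, 2                # alpha is an assumed value for illustration

x = [[1.0], [1.0]]             # a column-vector input

delta = scale(matmul(B, A), alpha / r)
W_merged = add(W, delta)

y_adapter = add(matmul(W, x), matmul(delta, x))  # base output plus adapter output
y_merged = matmul(W_merged, x)                   # single matmul with the merged weight

print(W_merged)
print(y_adapter == y_merged)
```

In practice a library usually does this step; PEFT, for example, exposes it as `merge_and_unload()` on a loaded adapter model.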
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "my-ai-stack/Stack-4.0-Qwen-3B-Merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
# Fall back to EOS for padding only if the tokenizer does not define a pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model.eval()

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Write a quicksort in Python"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # do_sample=True makes the temperature setting take effect.
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
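
The usage example loads the weights in bfloat16. As a rough sizing aid, the sketch below estimates memory for the weights alone, using the 3.1B parameter count listed under Training Details. Activations, the KV cache, and framework overhead come on top. The bytes-per-parameter figures are the standard sizes for each dtype, and the 4-bit row ignores quantization metadata; `weight_gib` is an illustrative helper name.

```python
PARAMS = 3.1e9  # parameter count from the Training Details table

BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16 / float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,  # ignores per-block scales and other quantization metadata
}

def weight_gib(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return params * bytes_per_param / 2**30

for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype:>20}: {weight_gib(PARAMS, size):.1f} GiB")
```

At bfloat16 this comes to roughly 5.8 GiB for the weights before any runtime overhead.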
## Training Details
| Parameter | Value |
|-----------|-------|
| Method | QLoRA → Merged |
| LoRA rank | 16 |
| Trainable params | 7.3M / 3.1B (0.24%) |
| Batch size | 1 |
| Grad accumulation | 16 |
| Max length | 512 |
| Learning rate | 2e-4 |
| Optimizer | AdamW (bf16) |
| Hardware | GCP V100 16GB |
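
The table values imply a few derived quantities. The calculation below is a quick sketch that assumes each of the 1,000 training steps consumes one effective batch and that each conversation is one training sequence; neither is stated in this card.

```python
batch_size = 1
grad_accumulation = 16
steps = 1_000              # from Key Metrics
max_length = 512
dataset_size = 55_000      # conversations, from the model description
trainable, total = 7.3e6, 3.1e9

effective_batch = batch_size * grad_accumulation
sequences_seen = effective_batch * steps
# Upper bound: assumes every sequence fills the full 512-token window.
max_tokens_seen = sequences_seen * max_length
epochs = sequences_seen / dataset_size
trainable_pct = 100 * trainable / total

print(effective_batch, sequences_seen, max_tokens_seen)
print(f"{epochs:.2f} epochs, {trainable_pct:.2f}% trainable")
```

Under these assumptions the run covers 16,000 sequences (at most about 8.2M tokens), which is roughly 0.29 of one pass over the 55,000 conversations.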
## Limitations
- **3B model** — smaller than 7B models; less capable on complex multi-step reasoning
- **English-optimized** — other language performance may vary
- **Tool execution** — tool calls are generated but actual execution requires an agent loop in your application
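
The tool-execution point can be sketched as a minimal agent loop: the application parses tool calls from the model's text, runs the matching function, and feeds the result back as a new message. This assumes the model emits calls in the `<tool_call>{...}</tool_call>` JSON format used by the Qwen2.5 chat template, which this card does not confirm. `fake_generate`, `get_weather`, and `TOOLS` are hypothetical stand-ins; `fake_generate` takes the place of a real call to the model.

```python
import json
import re

# Matches one JSON object wrapped in <tool_call> tags (assumed output format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_generate(messages: list[dict]) -> str:
    """Stand-in for the model: requests a tool once, then answers."""
    if messages[-1]["role"] == "tool":
        return f"The weather report says: {messages[-1]['content']}"
    return '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'

def run_agent(user_prompt: str, generate=fake_generate, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        calls = TOOL_CALL_RE.findall(reply)
        if not calls:
            return reply  # no tool requested: treat as the final answer
        for raw in calls:
            call = json.loads(raw)
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": str(result)})
    return "Stopped: turn limit reached"

print(run_agent("What is the weather in Paris?"))
```

A production loop would also validate tool names and arguments before executing anything, since the model's output is untrusted input.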
## See Also
- [LoRA Adapter version](https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Agentic) — smaller, needs base model
- [Training dataset](https://huggingface.co/my-ai-stack/Stack-4.0-Dataset)
- [Stack 3.0 (7B)](https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus)
## Citation
```bibtex
@misc{stack-4-merged-2026,
title={Stack 4.0 Omni-Nexus — Merged},
author={Stack AI Team},
year={2026},
url={https://huggingface.co/my-ai-stack/Stack-4.0-Qwen-3B-Merged}
}
```