How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline returns the conversation including the generated reply
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:

Install from pip and serve model:

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "my-ai-stack/Stack-2-9-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
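
The same OpenAI-compatible request can be sent from Python using only the standard library. This is a minimal sketch, assuming a server started as above on `localhost:8000` (vLLM); for the SGLang server, pass `base_url="http://localhost:30000"`. The helper names `build_chat_request` and `send_chat` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "my-ai-stack/Stack-2-9-finetuned"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000",
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str, base_url: str = "http://localhost:8000",
              timeout: float = 60.0) -> str:
    """Send the request and return the reply text (requires a running server)."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (with the server running):
# print(send_chat("What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect or reuse against either server.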

Use Docker (Docker Model Runner):

```bash
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```

SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install from pip and serve model:

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "my-ai-stack/Stack-2-9-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images:

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-2-9-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "my-ai-stack/Stack-2-9-finetuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```



# Stack 2.9 — 5-Minute Quick Start

> **Goal:** Get Stack 2.9 running and solving coding tasks in under 5 minutes.

Stack 2.9 is an AI coding assistant powered by **Qwen2.5-Coder-32B** with Pattern Memory: it stores successful patterns from your interactions and reuses them on later tasks.

---

## 📋 Prerequisites

### Required
| Requirement | Version | Check |
|-------------|---------|-------|
| Python | 3.10+ | `python3 --version` |
| Git | Any recent | `git --version` |
| pip | Latest | `pip --version` |
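
To check these prerequisites in one go, here is a small standard-library sketch. The function names are illustrative and not part of Stack 2.9.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 10)) -> bool:
    """Return True if the interpreter's (major, minor) meets the minimum (3.10+)."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=("git", "pip")) -> list:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report() -> None:
    """Print a one-line status for Python and for the required tools."""
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'needs 3.10+'}")
    missing = missing_tools()
    print("Missing tools: " + (", ".join(missing) if missing else "none"))


# Usage: report()
```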

### Optional (Recommended)
| Resource | Why You Need It | Minimum |
|----------|-----------------|---------|
| **GPU** | Fast code generation | RTX 3070 / M1 Pro |
| **VRAM** | Run the model locally | 24GB for 32B (4-bit); 8GB for 7B (4-bit) |

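> **No GPU?** Stack 2.9 can run on CPU via Ollama (slower; a smaller model such as `qwen2.5-coder:7b` helps), or you can use a cloud provider (OpenAI, Together AI, etc.) instead of local inference.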

---

## ⚡ Step 1 — Install in 60 Seconds

```bash
# 1. Clone the repository
git clone https://github.com/my-ai-stack/stack-2.9.git
cd stack-2.9

# 2. Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Copy environment template
cp .env.example .env
```

**That's it.** If you hit errors, see [Troubleshooting](#-troubleshooting) below.

---

## 🔑 Step 2 — Configure Your Model Provider

Stack 2.9 supports multiple LLM providers. **Pick one that matches your setup:**

### Option A: Ollama (Recommended — Local, Private)

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the Qwen model
ollama pull qwen2.5-coder:32b

# Set environment
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:32b
```

Edit your `.env` file:
```env
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:32b
```

### Option B: Together AI (Best for Qwen, Cloud)

```bash
# Get your API key at https://together.ai
export TOGETHER_API_KEY=tog-your-key-here
```

Edit your `.env`:
```env
MODEL_PROVIDER=together
TOGETHER_API_KEY=tog-your-key-here
TOGETHER_MODEL=Qwen/Qwen2.5-Coder-32B-Instruct
```

### Option C: OpenAI (GPT-4o)

```env
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o
```

### Option D: Anthropic (Claude)

```env
MODEL_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-your-key-here
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

### Option E: OpenRouter (Unified Access)

```env
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-your-key-here
OPENROUTER_MODEL=openai/gpt-4o
```
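
Before running Stack 2.9, it can help to confirm that your `.env` contains the variables for the provider you picked. The sketch below parses simple `KEY=VALUE` lines and checks them against the variable names used in Options A to E. It is an illustrative check, not Stack 2.9's own configuration loader, and it does not handle quoted values or `export` prefixes.

```python
from pathlib import Path

# Variables each provider needs, taken from Options A-E above
# (hypothetical check, not Stack 2.9's own loader)
REQUIRED_VARS = {
    "ollama": ["OLLAMA_MODEL"],
    "together": ["TOGETHER_API_KEY", "TOGETHER_MODEL"],
    "openai": ["OPENAI_API_KEY", "OPENAI_MODEL"],
    "anthropic": ["ANTHROPIC_API_KEY", "ANTHROPIC_MODEL"],
    "openrouter": ["OPENROUTER_API_KEY", "OPENROUTER_MODEL"],
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_vars(env: dict) -> list:
    """Return required variables that are absent or empty for MODEL_PROVIDER."""
    provider = env.get("MODEL_PROVIDER", "")
    if provider not in REQUIRED_VARS:
        return ["MODEL_PROVIDER"]
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


def check_env_file(path: str = ".env") -> list:
    """Read a .env file and list what is still missing."""
    return missing_vars(parse_env(Path(path).read_text(encoding="utf-8")))
```

An empty list from `check_env_file()` means every variable for the selected provider has a value.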

---

## 🚀 Step 3 — Run Your First Task

### Interactive Chat Mode

```bash
python stack.py
```

You'll see:
```
╔══════════════════════════════════════════════╗
║         Stack 2.9 — AI Coding Assistant     ║
║  Pattern Memory: Active | Tools: 46          ║
╚══════════════════════════════════════════════╝

You: Write a Python function to reverse a string
```

### Single Query Mode

```bash
python stack.py -c "Write a Python function to reverse a string"
```

**Example output:**
```python
def reverse_string(s):
    """Reverse a string and return it."""
    return s[::-1]

# Equivalent alternative using reversed():
def reverse_string_alt(s):
    return ''.join(reversed(s))
```

### Ask About Your Codebase

```bash
python stack.py -c "Find all Python files modified in the last week and list them"
```

### Generate and Run Code

```bash
python stack.py -c "Create a hello world Flask app with one route"
```
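
Single-query mode also makes it easy to script several requests. A minimal sketch, assuming `stack.py` is in the current directory and prints its answer to standard output; `build_command` and `run_queries` are hypothetical helpers, not part of Stack 2.9.

```python
import subprocess
import sys


def build_command(prompt: str, script: str = "stack.py") -> list:
    """Build the argument list for `python stack.py -c "<prompt>"`."""
    # A list (not a shell string) avoids quoting problems with the prompt text.
    return [sys.executable, script, "-c", prompt]


def run_queries(prompts, script: str = "stack.py", timeout: float = 600.0) -> dict:
    """Run each prompt in single-query mode and collect stdout, keyed by prompt."""
    results = {}
    for prompt in prompts:
        completed = subprocess.run(
            build_command(prompt, script),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError if the script exits non-zero
        )
        results[prompt] = completed.stdout
    return results


# Usage:
# answers = run_queries(["Write a Python function to reverse a string"])
```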

---

## 📊 Step 4 — Run Evaluation (Optional)

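> **Note:** Evaluation requires a GPU with ~16GB VRAM or more. That is roughly enough for a 7B model in 16-bit precision; the 32B model needs considerably more (see the hardware table under Step 5).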

### Prepare Your Fine-Tuned Model

After training Stack 2.9 on your data, your merged model will be in:
```
./output/merged/
```

### Run HumanEval Benchmark

```bash
python evaluate_model.py \
    --model-path ./output/merged \
    --benchmark humaneval \
    --num-samples 10 \
    --output results_humaneval.json
```

### Run MBPP Benchmark

```bash
python evaluate_model.py \
    --model-path ./output/merged \
    --benchmark mbpp \
    --num-samples 10 \
    --output results_mbpp.json
```

### Run Both Benchmarks

```bash
python evaluate_model.py \
    --model-path ./output/merged \
    --benchmark both \
    --num-samples 10 \
    --k-values 1,10 \
    --output results.json
```

**Expected output format:**
```
============================================================
HumanEval Results
============================================================
  pass@1: 65.00%
  pass@10: 82.00%
  Total problems evaluated: 12
============================================================

============================================================
MBPP Results
============================================================
  pass@1: 70.00%
  pass@10: 85.00%
  Total problems evaluated: 12
============================================================
```
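
pass@k is commonly computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021): for a problem with n generated samples of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems. `evaluate_model.py` may implement it differently; the sketch below is the standard formula. With this estimator n must be at least k, so `--num-samples` should be at least the largest value in `--k-values`.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if k > n:
        raise ValueError("need k <= n (generate at least k samples per problem)")
    if n - c < k:
        # Every size-k subset of samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_problem, k: int) -> float:
    """Average pass@k over problems; per_problem is a list of (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    return sum(scores) / len(scores)
```

For k = 1 the formula reduces to c / n, the fraction of correct samples.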

### Quick Evaluation (5 Problems Only)

```bash
python evaluate_model.py \
    --model-path ./output/merged \
    --benchmark humaneval \
    --num-problems 5 \
    --num-samples 5
```

---

## 🐳 Step 5 — Deploy Stack 2.9

### Deploy Locally with Docker

```bash
# Build the image, then start the container
docker build -t stack-2.9 .
docker run -p 7860:7860 \
    -e MODEL_PROVIDER=ollama \
    -e OLLAMA_MODEL=qwen2.5-coder:32b \
    stack-2.9
```

Access at: **http://localhost:7860**
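
The container may take a while to start, especially if it loads a model. A minimal standard-library sketch that waits until something accepts TCP connections on port 7860; it only confirms the port is open, not that Stack 2.9 is fully ready, and `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7860,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: nothing is listening yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage: if wait_for_port(): print("Open http://localhost:7860")
```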

### Deploy to RunPod (Cloud GPU)

```bash
# Edit runpod_deploy.sh with your config first
bash runpod_deploy.sh --gpu a100 --instance hourly
```

### Deploy to Kubernetes

```bash
# 1. Edit k8s/secret.yaml with your Hugging Face token
# 2. Apply the manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/pvc.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

# Check status
kubectl get pods -n stack-29
kubectl logs -n stack-29 deployment/stack-29
```

### Hardware Requirements for Deployment

| Model Size | Minimum GPU | Recommended | Quantized (4-bit) |
|------------|-------------|-------------|-------------------|
| 7B | RTX 3070 (8GB) | A100 40GB | RTX 3060 (6GB) |
| 32B | A100 40GB | A100 80GB | RTX 3090 (24GB) |
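
A rough way to sanity-check these figures is to estimate the memory taken by the weights alone: parameters × bits per parameter ÷ 8. The sketch below does only that; it ignores the KV cache, activations, and framework overhead, so real usage is higher. The helper name is illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate GB needed to hold the weights only (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_param


for size in (7, 32):
    for bits in (16, 8, 4):
        print(f"{size}B @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

For example, 32B at 4-bit works out to about 16 GB of weights, which leaves headroom for the KV cache on the 24GB card listed above; 7B at 16-bit needs about 14 GB.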

---

## 🧠 Pattern Memory Quick Guide

Stack 2.9 stores successful patterns to help with future tasks.

### List Your Patterns

```bash
python stack.py --patterns list
python stack.py --patterns stats
```

### Extract Patterns from Your Git History

```bash
python scripts/extract_patterns_from_git.py \
    --repo-path . \
    --output patterns.jsonl \
    --since-date "2024-01-01"
```
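
To preview which commits fall in that date range, you can list them with plain `git log`. This sketch is not how `extract_patterns_from_git.py` works internally (that is not documented here); it shells out to `git log --since` and parses each commit's hash and subject. The helper names are hypothetical.

```python
import subprocess

# Tab-separated: full commit hash, then subject line (%x09 is a tab in git's format)
LOG_FORMAT = "%H%x09%s"


def parse_log(output: str) -> list:
    """Turn 'hash<TAB>subject' lines into (hash, subject) tuples."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, _, subject = line.partition("\t")
        commits.append((commit_hash, subject))
    return commits


def commits_since(repo_path: str = ".", since: str = "2024-01-01") -> list:
    """List (hash, subject) for commits since `since` in the repo at repo_path."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_log(output)
```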

### Merge LoRA Adapters (Team Sharing)

```bash
python scripts/merge_lora_adapters.py \
    --adapters adapter_a.safetensors adapter_b.safetensors \
    --weights 0.7 0.3 \
    --output merged.safetensors
```
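
One simple reading of `--weights 0.7 0.3` is a weighted sum of matching tensors. The sketch below shows that idea on plain Python lists; it assumes the adapters share the same tensor names and shapes and that the weights sum to 1, and `merge_lora_adapters.py` may use a different method. Averaging LoRA A and B factors separately is not the same as averaging their product B·A, so treat this as an illustration only.

```python
import math


def merge_adapters(adapters, weights):
    """Weighted sum of same-named tensors (flat lists of floats) across adapters."""
    if len(adapters) != len(weights):
        raise ValueError("need one weight per adapter")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    keys = adapters[0].keys()
    if any(adapter.keys() != keys for adapter in adapters):
        raise ValueError("adapters must contain the same tensor names")
    merged = {}
    for key in keys:
        tensors = [adapter[key] for adapter in adapters]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for {key}")
        # Element-wise: merged[i] = sum over j of weights[j] * tensors[j][i]
        merged[key] = [
            sum(w * t[i] for w, t in zip(weights, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged
```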

---

## 🛠️ Troubleshooting

### "Module not found" errors

```bash
pip install -r requirements.txt
```

### "CUDA out of memory" during evaluation

```bash
# Generate fewer samples per problem
python evaluate_model.py --model-path ./output/merged --num-samples 5

# Or use 4-bit quantization
# (See docs/TRAINING_7B.md for quantized training)
```

### "Model not found" with Ollama

```bash
ollama pull qwen2.5-coder:32b
ollama list   # Verify it's installed
```
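
Ollama also exposes its installed models over HTTP (`GET /api/tags` on port 11434 by default), which is handy for scripted checks. A standard-library sketch; `model_names`, `has_model`, and `ollama_models` are hypothetical helper names. Names include the tag, so compare against the full `qwen2.5-coder:32b`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def model_names(tags: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [entry.get("name", "") for entry in tags.get("models", [])]


def has_model(tags: dict, wanted: str) -> bool:
    """True if `wanted` (e.g. 'qwen2.5-coder:32b') is among the installed models."""
    return wanted in model_names(tags)


def ollama_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list:
    """Fetch installed model names from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(json.load(response))


# Usage: print("qwen2.5-coder:32b" in ollama_models())
```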

### "API key not set" errors

```bash
# Double-check your .env file
cat .env

# For testing, you can also export the key in your current shell
export TOGETHER_API_KEY=tog-your-key
```

### Slow inference on CPU

```bash
# Use a smaller model
export OLLAMA_MODEL=qwen2.5-coder:7b

# Or switch to a cloud provider (also requires TOGETHER_API_KEY)
export MODEL_PROVIDER=together
```

### Docker build fails

```bash
# Use Python 3.10 explicitly
docker build --build-arg PYTHON_VERSION=3.10 -t stack-2.9 .
```

### Kubernetes GPU not found

```bash
# Verify your nodes advertise allocatable nvidia.com/gpu resources
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Install NVIDIA GPU Operator if missing
# https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/
```

---

## 📚 What's Next?

| Goal | Go To |
|------|-------|
| Train on my own data | `docs/TRAINING_7B.md` |
| Learn all 46 tools | `TOOLS.md` |
| Set up team pattern sharing | `docs/pattern-moat.md` |
| Understand the architecture | `docs/reference/ARCHITECTURE.md` |
| Report a bug | `SECURITY.md` / GitHub Issues |

---

## ⚡ Quick Reference Card

```bash
# Install
git clone https://github.com/my-ai-stack/stack-2.9.git
cd stack-2.9 && pip install -r requirements.txt

# Configure
cp .env.example .env   # Edit with your API keys

# Run
python stack.py                              # Interactive
python stack.py -c "your code request"        # Single query

# Evaluate
python evaluate_model.py --model-path ./output/merged --benchmark humaneval

# Deploy
docker build -t stack-2.9 . && docker run -p 7860:7860 stack-2.9
```

---

*Stack 2.9 — AI that learns your patterns and grows with you.*