Stack-2-9-finetuned / stack /internal /SETUP.md

walidsobhie-code

refactor: Squeeze folders further - cleaner structure

65888d5 about 2 months ago

	# Stack 2.9 - Detailed Setup Guide

	This guide covers installing and configuring Stack 2.9 on Linux, macOS, and Windows (via WSL2 or Docker), from a local development environment through Docker and Kubernetes deployments.

	## Table of Contents

	- [Prerequisites](#prerequisites)
	- [Hardware Requirements](#hardware-requirements)
	- [Software Prerequisites](#software-prerequisites)
	- [Installation Methods](#installation-methods)
	- [Configuration](#configuration)
	- [Docker Setup](#docker-setup)
	- [Development Setup](#development-setup)
	- [Troubleshooting](#troubleshooting)
	- [Next Steps](#next-steps)

	---

	## Prerequisites

	### Minimum Requirements

	| Component | Minimum | Recommended |
	|-----------|---------|-------------|
	| Python | 3.8+ | 3.11+ |
	| Node.js | 18+ | 20 LTS |
	| RAM | 8 GB | 16 GB |
	| GPU VRAM | 8 GB | 24 GB |
	| Disk Space | 10 GB | 50 GB |
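
The minimums in the table can be checked before installing anything. The following is a minimal sketch using only the Python standard library; `check_minimums` is a hypothetical helper, not part of Stack 2.9, and it only covers the Python version, free disk space, and the presence of `node` on `PATH`.

```python
import shutil
import sys

# Thresholds copied from the "Minimum" column of the requirements table.
MIN_PYTHON = (3, 8)
MIN_DISK_GB = 10


def check_minimums(path="."):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_DISK_GB:
        problems.append(
            f"{MIN_DISK_GB} GB free disk required, found {free_gb:.1f} GB"
        )
    if shutil.which("node") is None:
        problems.append("Node.js not found on PATH (18+ required)")
    return problems


if __name__ == "__main__":
    for problem in check_minimums():
        print("WARNING:", problem)
```

The script does not check RAM or GPU VRAM, because the standard library has no portable way to query them; `free -h` and `nvidia-smi` cover those on Linux.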

	### Operating Systems

	- ✅ Linux (Ubuntu 20.04+, CentOS 8+, Debian 11+)
	- ✅ macOS (12 Monterey or later)
	- ⚠️ Windows (via WSL2 or Docker)

	---

	## Hardware Requirements

	### CPU-Only Inference (Testing/Development)

	For local development and testing without GPU:

	| Component | Specification |
	|-----------|---------------|
	| CPU | 4+ cores |
	| RAM | 8 GB minimum |
	| Storage | 10 GB free space |

	### GPU Inference (Production)

	For production deployment with optimal performance:

	| Component | Specification | Notes |
	|-----------|---------------|-------|
	| GPU | NVIDIA GPU with 24 GB+ VRAM | A100, H100, RTX 3090, RTX 4090 |
	| CPU | 8+ cores | AMD Ryzen 9, Intel i9 |
	| RAM | 32 GB | 64 GB recommended |
	| NVMe Storage | 50 GB+ | For model caching |

	### Multi-GPU Setup

	For high-throughput production workloads:

	```bash
	# Example: 2x A100 80GB setup
	export CUDA_VISIBLE_DEVICES=0,1

	# Stack 2.9 will automatically use tensor parallelism
	python stack.py --num-gpus 2
	```
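
When scripting a launch, the value passed to `--num-gpus` should agree with the devices exported in `CUDA_VISIBLE_DEVICES`. A small sketch of deriving one from the other follows; `visible_gpu_ids` is a hypothetical helper, and whether `stack.py` itself reads the variable this way is an assumption.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device IDs.

    Returns None when the variable is unset (all GPUs visible) and an
    empty list when it is set to an empty string (no GPUs visible).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]


ids = visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,1"})
print(len(ids))  # 2
```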

	---

	## Software Prerequisites

	### 1. Python Installation

	Linux/macOS:

	```bash
	# Using pyenv (recommended)
	curl https://pyenv.run | bash
	pyenv install 3.11.4
	pyenv global 3.11.4

	# Verify installation
	python --version
	```

	Windows (WSL2):

	```bash
	# Install WSL2 first (run in an administrator PowerShell on Windows)
	wsl --install -d Ubuntu-22.04

	# Then install Python (run inside the Ubuntu shell)
	sudo apt update
	sudo apt install python3.11 python3.11-venv python3-pip
	```

	### 2. Node.js Installation

	Linux/macOS:

	```bash
	# Using nvm (recommended)
	curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
	nvm install 20
	nvm use 20

	# Verify installation
	node --version
	```

	Windows:

	Download from [nodejs.org](https://nodejs.org/) or use winget:

	```powershell
	winget install OpenJS.NodeJS.LTS
	```

	### 3. CUDA Setup (GPU Support)

	```bash
	# Check CUDA version
	nvidia-smi

	# Install CUDA Toolkit (if needed)
	# Download from: https://developer.nvidia.com/cuda-downloads

	# Install cuDNN
	sudo apt install libcudnn8 libcudnn8-dev

	# Verify CUDA
	python -c "import torch; print(torch.cuda.is_available())"
	```
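
The one-line check prints only `True` or `False`. A slightly fuller report can help tell a CPU-only PyTorch build apart from a driver problem. This is a sketch, assuming PyTorch is the framework in use; `cuda_report` is a hypothetical helper and runs safely even when `torch` is not installed.

```python
def cuda_report():
    """Summarise what PyTorch can see; safe to run without torch or a GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None`, the installed wheel was built without CUDA and no driver change will make `cuda_available` true.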

	### 4. Docker Installation (Optional)

	```bash
	# Linux
	curl -fsSL https://get.docker.com | bash
	sudo usermod -aG docker $USER

	# macOS
	brew install --cask docker

	# Windows
	# Download Docker Desktop from https://docker.com
	```

	---

	## Installation Methods

	### Method 1: Standard Installation

	```bash
	# Clone repository
	git clone https://github.com/openclaw/stack-2.9.git
	cd stack-2.9

	# Create virtual environment
	python -m venv .venv
	source .venv/bin/activate # Linux/macOS
	# or: .venv\Scripts\activate # Windows

	# Install Python dependencies
	pip install -r requirements.txt

	# Install Node.js dependencies (for voice features)
	npm install

	# Verify installation
	python stack.py --version
	```

	### Method 2: Development Installation

	```bash
	# Clone and setup
	git clone https://github.com/openclaw/stack-2.9.git
	cd stack-2.9

	# Install in editable mode with dev dependencies
	pip install -e ".[dev]"

	# Install pre-commit hooks
	pre-commit install

	# Run tests to verify
	pytest
	```

	### Method 3: Docker Installation

	```bash
	# Build the image
	docker build -t stack-2.9 .

	# Run with GPU support (Linux, requires the NVIDIA Container Toolkit)
	docker run --gpus all -p 3000:3000 \
	  -v "$(pwd)/data:/app/data" \
	  stack-2.9

	# Run on macOS (CPU only: Docker containers cannot use Metal acceleration)
	docker run -p 3000:3000 \
	  -v "$(pwd)/data:/app/data" \
	  stack-2.9
	```

	### Method 4: Kubernetes Deployment

	```yaml
	# stack-2.9-deployment.yaml
	apiVersion: apps/v1
	kind: Deployment
	metadata:
	  name: stack-2-9
	spec:
	  replicas: 2
	  selector:
	    matchLabels:
	      app: stack-2-9
	  template:
	    metadata:
	      labels:
	        app: stack-2-9
	    spec:
	      containers:
	        - name: stack-2-9
	          image: stack-2.9:latest
	          resources:
	            limits:
	              nvidia.com/gpu: 1
	              memory: 32Gi
	          ports:
	            - containerPort: 3000
	          env:
	            - name: API_KEY
	              valueFrom:
	                secretKeyRef:
	                  name: stack-2-9-secrets
	                  key: api-key
	```

	---

	## Configuration

	### Environment Variables

	Create a `.env` file in the project root:

	```bash
	# API Configuration
	API_HOST=0.0.0.0
	API_PORT=3000
	API_KEY=your-secret-api-key-here

	# Model Configuration
	MODEL_NAME=qwen/qwen2.5-coder-32b
	CONTEXT_WINDOW=131072
	QUANTIZATION=awq

	# GPU Configuration
	CUDA_VISIBLE_DEVICES=0
	NUM_GPUS=1

	# Self-Evolution
	ENABLE_SELF_EVOLUTION=true
	EVOLUTION_INTERVAL_HOURS=24

	# Logging
	LOG_LEVEL=INFO
	LOG_FILE=stack-2.9.log

	# Database
	MEMORY_DB_PATH=./self_evolution/data/memory.db
	```
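
Every value in a `.env` file is a string, so numeric and boolean settings need explicit conversion after loading. The sketch below shows one way to read such a file with the standard library; `parse_env` is a hypothetical helper and not necessarily how Stack 2.9 loads its settings.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


sample = """
# API Configuration
API_PORT=3000
ENABLE_SELF_EVOLUTION=true
"""
env = parse_env(sample)
port = int(env["API_PORT"])
evolve = env["ENABLE_SELF_EVOLUTION"].lower() == "true"
print(port, evolve)  # 3000 True
```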

	### Configuration File

	Create `config.yaml` for advanced configuration:

	```yaml
	# Stack 2.9 Configuration
	server:
	  host: 0.0.0.0
	  port: 3000
	  workers: 4
	  timeout: 300

	model:
	  name: qwen/qwen2.5-coder-32b
	  device: cuda
	  quantization: awq
	  context_window: 131072
	  temperature: 0.7
	  max_tokens: 4096

	tools:
	  enabled:
	    - file_operations
	    - git_operations
	    - shell_commands
	    - api_calls
	    - search
	    - voice
	  sandbox:
	    enabled: true
	    timeout: 30

	self_evolution:
	  enabled: true
	  interval_hours: 24
	  min_success_for_pattern: 3
	  min_failure_for_pattern: 2
	  max_memories: 10000

	memory:
	  db_path: ./self_evolution/data/memory.db
	  embedding_dim: 128
	  similarity_threshold: 0.3

	rate_limiting:
	  enabled: true
	  requests_per_minute: 100
	  tokens_per_day: 100000
	  concurrent_requests: 5

	logging:
	  level: INFO
	  file: stack-2.9.log
	  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
	```
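
A quick sanity check on the parsed file can catch typos before the server starts. The sketch below validates a few of the keys shown above on an already-parsed mapping; `validate_config` and its rules are illustrative assumptions, not Stack 2.9's own validation.

```python
def validate_config(cfg):
    """Return a list of problems found in a parsed config mapping."""
    problems = []
    port = cfg.get("server", {}).get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"server.port must be an integer in 1..65535, got {port!r}")
    model = cfg.get("model", {})
    temperature = model.get("temperature")
    if not isinstance(temperature, (int, float)) or temperature < 0:
        problems.append(
            f"model.temperature must be a non-negative number, got {temperature!r}"
        )
    max_tokens = model.get("max_tokens")
    context_window = model.get("context_window")
    if isinstance(max_tokens, int) and isinstance(context_window, int):
        if max_tokens > context_window:
            problems.append("model.max_tokens exceeds model.context_window")
    return problems


example = {
    "server": {"port": 3000},
    "model": {"temperature": 0.7, "max_tokens": 4096, "context_window": 131072},
}
print(validate_config(example))  # []
```

With PyYAML installed, `yaml.safe_load` on the contents of `config.yaml` produces a mapping of this shape.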

	### API Configuration

	#### Authentication

	```bash
	# Generate API key
	openssl rand -hex 32

	# Set in environment
	export API_KEY=your-generated-key
	```
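
The same kind of key can be generated from Python, and a server that checks keys should compare them in constant time. A minimal sketch with the standard library follows; `generate_api_key` and `key_matches` are hypothetical names, and how Stack 2.9 verifies keys internally is not stated in this guide.

```python
import hmac
import secrets


def generate_api_key():
    """Return 64 hex characters (32 random bytes), like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def key_matches(presented, expected):
    """Compare keys in constant time to avoid leaking timing information."""
    return hmac.compare_digest(presented.encode(), expected.encode())


key = generate_api_key()
print(len(key))  # 64
```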

	#### Rate Limiting

	| Tier | Requests/min | Tokens/day | Concurrent |
	|------|--------------|------------|------------|
	| Free | 100 | 100,000 | 5 |
	| Pro | 1,000 | 10,000,000 | 20 |
	| Enterprise | Custom | Custom | Custom |
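
To illustrate what a requests-per-minute limit means in practice, here is a sliding-window limiter sketch. It is one common way to enforce such a limit and is an assumption about the mechanism, not a description of Stack 2.9's implementation; `SlidingWindowLimiter` and `TIER_LIMITS` are hypothetical names.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True


# Requests-per-minute values from the Free and Pro rows of the table.
TIER_LIMITS = {"free": 100, "pro": 1000}
limiter = SlidingWindowLimiter(TIER_LIMITS["free"])
results = [limiter.allow() for _ in range(101)]
print(results.count(True))  # 100
```

The injectable `clock` argument exists so the limiter can be tested without waiting a real minute.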

	---

	## Docker Setup

	### Building the Image

	```bash
	# Build with GPU support
	docker build -t stack-2.9:gpu -f Dockerfile.gpu .

	# Build for CPU only
	docker build -t stack-2.9:cpu -f Dockerfile.cpu .
	```

	### Running with Docker Compose

	```yaml
	# docker-compose.yml
	version: '3.8'

	services:
	  stack-2-9:
	    build: .
	    ports:
	      - "3000:3000"
	    environment:
	      - API_KEY=${API_KEY}
	      - MODEL_NAME=qwen/qwen2.5-coder-32b
	    volumes:
	      - ./data:/app/data
	      - ./config.yaml:/app/config.yaml
	    deploy:
	      resources:
	        reservations:
	          devices:
	            - driver: nvidia
	              count: 1
	              capabilities: [gpu]

	  redis:
	    image: redis:7-alpine
	    ports:
	      - "6379:6379"
	    volumes:
	      - redis_data:/data

	volumes:
	  redis_data:
	```

	```bash
	# Start services
	docker-compose up -d

	# View logs
	docker-compose logs -f stack-2-9
	```

	---

	## Development Setup

	### Setting Up Development Environment

	```bash
	# Clone repository
	git clone https://github.com/openclaw/stack-2.9.git
	cd stack-2.9

	# Create virtual environment
	python -m venv .venv
	source .venv/bin/activate

	# Install with dev dependencies
	pip install -e ".[dev]"

	# Install pre-commit
	pre-commit install

	# Run pre-commit checks
	pre-commit run --all-files

	# Run tests
	pytest -v
	```

	### IDE Setup

	VS Code:

	```json
	// .vscode/settings.json
	{
	  "python.defaultInterpreterPath": ".venv/bin/python",
	  "python.linting.enabled": true,
	  "python.linting.pylintEnabled": true,
	  "python.formatting.provider": "black",
	  "editor.formatOnSave": true,
	  "files.exclude": {
	    "**/__pycache__": true,
	    "**/*.pyc": true
	  }
	}
	```

	PyCharm:

	1. Open Settings → Project → Python Interpreter
	2. Add Interpreter → Existing Environment
	3. Select `.venv/bin/python`
	4. Enable Black for formatting

	### Running Development Server

	```bash
	# Start with auto-reload
	python -m stack_cli.dev

	# Or with debug mode
	DEBUG=1 python stack.py
	```

	---

	## Troubleshooting

	### Common Issues

	#### 1. Import Errors

	Problem: `ModuleNotFoundError: No module named 'stack_cli'`

	Solution:

	```bash
	# Reinstall in editable mode
	pip install -e .

	# Or add to PYTHONPATH
	export PYTHONPATH="${PYTHONPATH}:$(pwd)"
	```
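
Before reinstalling, it can help to confirm which interpreter is running and whether it can locate the package at all, since this error often means the virtual environment is not activated. A small diagnostic sketch follows; `module_location` is a hypothetical helper.

```python
import importlib.util
import sys


def module_location(name):
    """Return where a top-level module would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("interpreter:", sys.executable)
print("stack_cli:", module_location("stack_cli"))
```

If the interpreter path is outside `.venv`, activate the environment first and run the check again.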

	#### 2. CUDA/GPU Issues

	Problem: `CUDA out of memory` or `RuntimeError: CUDA not available`

	Solutions:

	```bash
	# Check GPU availability
	python -c "import torch; print(torch.cuda.is_available())"

	# Reset the GPU (needs root and no processes using the GPU; this is a hardware reset, not a cache flush)
	nvidia-smi --gpu-reset

	# Use smaller batch size
	python stack.py --batch-size 1

	# Use quantization
	python stack.py --quantization awq
	```

	#### 3. Memory Issues

	Problem: `OutOfMemoryError` during inference

	Solutions:

	```bash
	# Increase swap space (Linux)
	sudo fallocate -l 16G /swapfile
	sudo chmod 600 /swapfile
	sudo mkswap /swapfile
	sudo swapon /swapfile

	# Use model quantization
	python stack.py --quantization 4bit

	# Reduce context window
	python stack.py --context-window 32768
	```
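
A rough estimate of weight memory shows why quantization matters for the 32B model named in the configuration. The sketch below counts weights only and assumes 32 billion parameters; the KV cache, activations, and framework overhead add to these figures, so treat the results as lower bounds.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS_32B = 32e9  # assumed parameter count for a 32B model

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS_32B, bits):.1f} GiB")
# fp16: 59.6 GiB
# 8-bit: 29.8 GiB
# 4-bit: 14.9 GiB
```

Under these assumptions, only the 4-bit figure fits on a single 24 GB GPU, and the context window then determines how much of the remainder the KV cache consumes.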

	#### 4. Permission Issues (Linux)

	Problem: `Permission denied` errors

	Solutions:

	```bash
	# Fix script permissions
	chmod +x stack.py install.sh setup.sh

	# Fix directory permissions
	chmod 755 self_evolution/data

	# Add user to docker group
	sudo usermod -aG docker $USER
	newgrp docker
	```

	#### 5. Node.js Issues

	Problem: `npm ERR!` during installation

	Solutions:

	```bash
	# Clear npm cache
	npm cache clean --force

	# Install with legacy peer deps
	npm install --legacy-peer-deps

	# Use specific Node version
	nvm use 20
	```

	#### 6. Port Already in Use

	Problem: `OSError: [Errno 98] Address already in use`

	Solutions:

	```bash
	# Find process using port 3000
	lsof -i :3000

	# Kill the process
	kill -9 <PID>

	# Or use a different port
	python stack.py --port 3001
	```
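
On systems without `lsof`, the same check can be done from Python by trying to bind the port. This is a sketch with hypothetical helper names; it tests `127.0.0.1` by default, so pass `host="0.0.0.0"` to match the `API_HOST` setting shown in the configuration section.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` can be bound on `host` right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start=3000, attempts=20):
    """Return the first bindable port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print("first free port:", first_free_port())
```

The result is only a snapshot: another process can still take the port between the check and the server start.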

	### Diagnostic Commands

	```bash
	# Check system resources
	nvidia-smi
	free -h
	df -h

	# Check Python environment
	python --version
	pip list | grep -E "(torch|transformers|openai)"

	# Verify installation
	python -c "from stack_cli import cli; print('OK')"

	# Run diagnostics
	python scripts/diagnostics.py
	```

	### Getting Help

	If you encounter issues not covered here:

	1. Check existing issues: [GitHub Issues](https://github.com/openclaw/stack-2.9/issues)
	2. Ask in discussions: [GitHub Discussions](https://github.com/openclaw/stack-2.9/discussions)
	3. Email support: support@stack2.9.openclaw.org

	---

	## Next Steps

	- [API Documentation](API.md) - Integrate Stack 2.9 into your applications
	- [Architecture Guide](ARCHITECTURE.md) - Understand the technical internals
	- [Contributing Guide](CONTRIBUTING.md) - Help improve Stack 2.9