Instructions to use my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/my-ai-stack/Stack-2-9-finetuned

SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-2-9-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```

Stack-2-9-finetuned / stack /internal /API.md

walidsobhie-code

refactor: Squeeze folders further - cleaner structure

65888d5 about 2 months ago


	# Stack 2.9 API Documentation

	Complete API reference for integrating Stack 2.9 into your applications.

	## Table of Contents

	- [Overview](#overview)
	- [Authentication](#authentication)
	- [Rate Limits](#rate-limits)
	- [REST Endpoints](#rest-endpoints)
	- [WebSocket Streaming](#websocket-streaming)
	- [Request/Response Formats](#requestresponse-formats)
	- [Error Handling](#error-handling)
	- [SDKs and Examples](#sdks-and-examples)
	- [Webhooks](#webhooks)
	- [Best Practices](#best-practices)
	- [Support](#support)

	---

	## Overview

	Stack 2.9 provides an OpenAI-compatible API for seamless integration with existing tools and workflows.

	### Base URL

	```
	Production: https://api.stack2.9.openclaw.org/v1
	Local: http://localhost:3000/v1
	```

	### API Versioning

	The current API version is `v1`. Version information is included in all responses.

	```json
	{
	  "api_version": "1.0",
	  "deprecation_date": null
	}
	```
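A client can read these two fields to warn about an upcoming deprecation. The following is a minimal sketch, not part of the API itself: `deprecation_warning` is a hypothetical helper, and it assumes that a non-null `deprecation_date` is an ISO 8601 calendar date, which this document does not specify.

```python
from datetime import date
from typing import Optional


def deprecation_warning(payload: dict, today: date) -> Optional[str]:
    """Return a warning string if the payload announces a deprecation date."""
    raw = payload.get("deprecation_date")
    if raw is None:
        return None  # No deprecation announced
    # Assumption: the date is an ISO 8601 calendar date such as "2027-01-01".
    deadline = date.fromisoformat(raw)
    days_left = (deadline - today).days
    version = payload.get("api_version", "unknown")
    return f"API version {version} is deprecated in {days_left} days"
```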

	---

	## Authentication

	### API Key Authentication

	Include your API key in the `Authorization` header:

	```bash
	curl -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  https://api.stack2.9.openclaw.org/v1/chat/completions
	```
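The same headers can be built in Python before handing them to any HTTP client. This is a sketch under stated assumptions: `auth_headers` is a hypothetical helper, and the client-side environment variable name `STACK_API_KEY` is an illustration, not something the API defines.

```python
import os


def auth_headers(api_key=None):
    """Build the request headers used by the curl example above."""
    # Assumption: the key is kept in an environment variable named STACK_API_KEY.
    key = api_key or os.environ.get("STACK_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control.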

	### Obtaining an API Key

	1. Self-hosted: Set `API_KEY` environment variable
	2. Cloud: Sign up at [stack2.9.openclaw.org](https://stack2.9.openclaw.org)

	### Authentication Errors

	| Status | Error Type | Description |
	|--------|------------|-------------|
	| 401 | `invalid_api_key` | API key is missing or invalid |
	| 403 | `account_disabled` | Account has been disabled |
	| 429 | `rate_limit_exceeded` | Too many requests |

	---

	## Rate Limits

	### Tier Limits

	| Tier | Requests/min | Tokens/day | Concurrent | WebSocket |
	|------|--------------|------------|------------|-----------|
	| Free | 100 | 100,000 | 5 | ✅ |
	| Pro | 1,000 | 10,000,000 | 20 | ✅ |
	| Enterprise | Custom | Custom | Custom | ✅ |

	### Rate Limit Headers

	Every response includes rate limit information:

	```
	X-RateLimit-Limit: 100
	X-RateLimit-Remaining: 95
	X-RateLimit-Reset: 1640995200
	X-RateLimit-Used: 5
	```
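These headers let a client pause only as long as needed instead of sleeping for a fixed interval. The sketch below is illustrative: `seconds_until_reset` is a hypothetical helper, and it assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the rate-limit window resets.

    Returns 0.0 while requests remain in the current window.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

The function expects a plain dictionary with the header names spelled as shown; HTTP libraries that normalize header case would need a matching lookup.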

	### Handling Rate Limits

	```python
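	import time
	import openai

	# Point the client at Stack 2.9; without base_url it targets api.openai.com
	client = openai.OpenAI(
	    api_key="your-api-key",
	    base_url="https://api.stack2.9.openclaw.org/v1",
	)

	for i in range(100):
	    try:
	        response = client.chat.completions.create(
	            model="qwen/qwen2.5-coder-32b",
	            messages=[{"role": "user", "content": "Hello"}],
	        )
	    except openai.RateLimitError:
	        time.sleep(60)  # Wait 1 minute before the next request
	        continue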
	```

	---

	## REST Endpoints

	### Chat Completions

	Endpoint: `POST /chat/completions`

	Generate chat completions with optional streaming.

	#### Request Body

	```json
	{
	  "model": "qwen/qwen2.5-coder-32b",
	  "messages": [
	    {
	      "role": "system",
	      "content": "You are a helpful coding assistant."
	    },
	    {
	      "role": "user",
	      "content": "Write a Python function to calculate Fibonacci numbers."
	    }
	  ],
	  "temperature": 0.7,
	  "max_tokens": 1000,
	  "top_p": 1.0,
	  "frequency_penalty": 0.0,
	  "presence_penalty": 0.0,
	  "stream": false,
	  "stop": null,
	  "tools": [],
	  "tool_choice": "auto",
	  "user": "user-identifier"
	}
	```

	#### Parameters

	| Parameter | Type | Required | Default | Description |
	|-----------|------|----------|---------|-------------|
	| `model` | string | ✅ | - | Model identifier |
	| `messages` | array | ✅ | - | Conversation messages |
	| `temperature` | number | ❌ | 0.7 | Sampling temperature (0-2) |
	| `max_tokens` | integer | ❌ | 1000 | Maximum tokens to generate |
	| `top_p` | number | ❌ | 1.0 | Nucleus sampling |
	| `frequency_penalty` | number | ❌ | 0.0 | Frequency penalty (-2 to 2) |
	| `presence_penalty` | number | ❌ | 0.0 | Presence penalty (-2 to 2) |
	| `stream` | boolean | ❌ | false | Enable streaming |
	| `stop` | string/array | ❌ | null | Stop sequences |
	| `tools` | array | ❌ | [] | Available tools |
	| `tool_choice` | string | ❌ | "auto" | Tool selection strategy |
	| `user` | string | ❌ | - | User identifier |
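Checking a request body against these constraints before sending it avoids a round trip that would end in a 400 error. The following is a minimal sketch: `validate_chat_params` is a hypothetical helper, and it assumes the documented ranges are inclusive at both ends.

```python
def validate_chat_params(params):
    """Return a list of problems found in a chat completion request body."""
    problems = []
    for required in ("model", "messages"):
        if required not in params:
            problems.append(f"missing required parameter: {required}")
    # Ranges taken from the parameter table: (low, high), assumed inclusive.
    ranges = {
        "temperature": (0.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

An empty list means the body passed these checks; the server may still reject it for reasons not covered here.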

	#### Response (Non-Streaming)

	```json
	{
	  "id": "chatcmpl-1234567890",
	  "object": "chat.completion",
	  "created": 1234567890,
	  "model": "qwen/qwen2.5-coder-32b",
	  "choices": [
	    {
	      "index": 0,
	      "message": {
	        "role": "assistant",
	        "content": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)"
	      },
	      "finish_reason": "stop"
	    }
	  ],
	  "usage": {
	    "prompt_tokens": 25,
	    "completion_tokens": 35,
	    "total_tokens": 60
	  },
	  "system_fingerprint": "fp_1234567890"
	}
	```

	#### Streaming Response Format

	```bash
	curl -X POST http://localhost:3000/v1/chat/completions \
	  -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  -d '{"model": "qwen/qwen2.5-coder-32b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
	```

	```json
	{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"qwen/qwen2.5-coder-32b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
	{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"qwen/qwen2.5-coder-32b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
	{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"qwen/qwen2.5-coder-32b","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":null}]}
	{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"qwen/qwen2.5-coder-32b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
	```
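To rebuild the full reply, a client concatenates the `delta.content` of each chunk until a chunk arrives with a non-null `finish_reason`. The sketch below assumes each chunk arrives as one JSON object per line, as in the sample above; a server that frames chunks as server-sent events would need the `data: ` prefix stripped first. `collect_stream_text` is a hypothetical helper, and the sample lines are shortened to the fields it reads.

```python
import json


def collect_stream_text(lines):
    """Join the delta content of chat.completion.chunk lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        choice = chunk["choices"][0]
        parts.append(choice["delta"].get("content", ""))
        if choice["finish_reason"] is not None:
            break  # The final chunk carries an empty delta and a finish reason
    return "".join(parts)


sample = [
    '{"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":" How"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
]
print(collect_stream_text(sample))  # Hello! How
```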

	---

	### Models List

	Endpoint: `GET /models`

	List available models.

	#### Response

	```json
	{
	  "object": "list",
	  "data": [
	    {
	      "id": "qwen/qwen2.5-coder-32b",
	      "object": "model",
	      "created": 1234567890,
	      "owned_by": "openclaw",
	      "permission": [],
	      "root": "qwen/qwen2.5-coder-32b",
	      "parent": null,
	      "context_window": 131072,
	      "capabilities": {
	        "streaming": true,
	        "tools": true,
	        "voice": true
	      }
	    },
	    {
	      "id": "qwen/qwen2.5-coder-14b",
	      "object": "model",
	      "created": 1234567890,
	      "owned_by": "openclaw",
	      "permission": [],
	      "root": "qwen/qwen2.5-coder-14b",
	      "parent": null,
	      "context_window": 16384,
	      "capabilities": {
	        "streaming": true,
	        "tools": true,
	        "voice": false
	      }
	    }
	  ]
	}
	```
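A client can filter this listing to find models that support a given feature. This is a sketch, assuming the per-model key is spelled `capabilities` in lowercase; `models_with` is a hypothetical helper and `listing` is a shortened copy of the response above.

```python
def models_with(listing, capability):
    """Return the ids of models whose capabilities include the given flag."""
    ids = []
    for model in listing.get("data", []):
        caps = model.get("capabilities", {})
        if caps.get(capability):
            ids.append(model["id"])
    return ids


listing = {
    "object": "list",
    "data": [
        {
            "id": "qwen/qwen2.5-coder-32b",
            "capabilities": {"streaming": True, "tools": True, "voice": True},
        },
        {
            "id": "qwen/qwen2.5-coder-14b",
            "capabilities": {"streaming": True, "tools": True, "voice": False},
        },
    ],
}
print(models_with(listing, "voice"))  # ['qwen/qwen2.5-coder-32b']
```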

	### Get Model

	Endpoint: `GET /models/{model_id}`

	Get details about a specific model.

	```bash
	curl http://localhost:3000/v1/models/qwen/qwen2.5-coder-32b
	```

	```json
	{
	  "id": "qwen/qwen2.5-coder-32b",
	  "object": "model",
	  "created": 1234567890,
	  "owned_by": "openclaw",
	  "context_window": 131072,
	  "capabilities": {
	    "streaming": true,
	    "tools": true,
	    "voice": true
	  }
	}
	```

	---

	## WebSocket Streaming

	### Connection

	```javascript
	const ws = new WebSocket('wss://api.stack2.9.openclaw.org/v1/ws/chat');

	ws.onopen = () => {
	  ws.send(JSON.stringify({
	    type: 'start',
	    model: 'qwen/qwen2.5-coder-32b',
	    messages: [{role: 'user', content: 'Hello'}],
	    temperature: 0.7
	  }));
	};

	ws.onmessage = (event) => {
	  const data = JSON.parse(event.data);
	  console.log(data.content); // Streamed content
	};

	ws.onerror = (error) => {
	  console.error('WebSocket error:', error);
	};

	ws.onclose = () => {
	  console.log('Connection closed');
	};
	```

	### WebSocket Message Types

	#### Client → Server

	| Type | Description |
	|------|-------------|
	| `start` | Start a new chat session |
	| `stop` | Stop current generation |
	| `ping` | Keep-alive ping |

	#### Server → Client

	| Type | Description |
	|------|-------------|
	| `content` | Streamed content chunk |
	| `tool_call` | Tool invocation request |
	| `tool_result` | Tool execution result |
	| `done` | Generation complete |
	| `error` | Error occurred |
	| `pong` | Keep-alive response |
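On the client side, each server message can be routed by its `type` field. The sketch below shows one way to do that independently of any WebSocket library. `handle_server_message` and the shape of `state` are hypothetical; the `content` and `message` field names follow the JavaScript examples in this section, and treating `error` as the end of the stream is an assumption.

```python
import json


def handle_server_message(raw, state):
    """Update state from one server message; return True when the stream is over."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "content":
        state["text"] += message.get("content", "")
    elif kind in ("tool_call", "tool_result"):
        state["tool_events"].append(message)
    elif kind == "error":
        # Assumption: an error ends the current generation.
        state["error"] = message.get("message")
        return True
    elif kind == "done":
        return True
    # "pong" and unknown types need no action
    return False
```

A receive loop would call this for every incoming frame and close the socket once it returns `True`.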

	### Full WebSocket Example

	```javascript
	class Stack29WebSocket {
	  constructor(apiKey, model = 'qwen/qwen2.5-coder-32b') {
	    this.apiKey = apiKey;
	    this.model = model;
	    this.ws = null;
	  }

	  connect() {
	    return new Promise((resolve, reject) => {
	      this.ws = new WebSocket('wss://api.stack2.9.openclaw.org/v1/ws/chat');

	      this.ws.onopen = () => resolve();
	      this.ws.onerror = (e) => reject(e);
	    });
	  }

	  async sendMessage(messages, onChunk, onComplete) {
	    await this.connect();

	    this.ws.onmessage = (event) => {
	      const data = JSON.parse(event.data);

	      if (data.type === 'content') {
	        onChunk(data.content);
	      } else if (data.type === 'done') {
	        onComplete(data);
	        this.ws.close();
	      } else if (data.type === 'error') {
	        console.error('Error:', data.message);
	      }
	    };

	    this.ws.send(JSON.stringify({
	      type: 'start',
	      model: this.model,
	      messages: messages,
	      temperature: 0.7
	    }));
	  }
	}

	// Usage
	const client = new Stack29WebSocket('your-api-key');
	client.sendMessage(
	  [{role: 'user', content: 'Write a hello world function'}],
	  (chunk) => process.stdout.write(chunk),
	  (final) => console.log('\n\nDone!')
	);
	```

	---

	## Request/Response Formats

	### Message Format

	```json
	{
	  "role": "user|assistant|system",
	  "content": "Message content",
	  "name": "optional-name",
	  "tool_calls": [],
	  "tool_call_id": "optional-id"
	}
	```

	### Tool Call Format

	```json
	{
	  "type": "function",
	  "id": "call_abc123",
	  "function": {
	    "name": "execute_code",
	    "description": "Execute code in a sandboxed environment",
	    "parameters": {
	      "type": "object",
	      "properties": {
	        "code": {
	          "type": "string",
	          "description": "The code to execute"
	        },
	        "language": {
	          "type": "string",
	          "description": "Programming language"
	        }
	      },
	      "required": ["code", "language"]
	    }
	  }
	}
	```

	### Tool Call Response Format

	```json
	{
	  "tool_call_id": "call_abc123",
	  "output": "Hello, World!",
	  "error": null,
	  "execution_time_ms": 150
	}
	```
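A tool runner can produce this structure by timing the call and capturing any exception. The following is a sketch only: `run_tool` is a hypothetical helper, and reporting failures as an `"ExceptionName: message"` string in `error` is an assumption about a field whose format this document does not define.

```python
import time


def run_tool(tool_call_id, func, **arguments):
    """Call func and wrap its result in the tool call response format."""
    started = time.perf_counter()
    output, error = None, None
    try:
        output = str(func(**arguments))
    except Exception as exc:  # Report the failure instead of raising
        error = f"{type(exc).__name__}: {exc}"
    elapsed_ms = int((time.perf_counter() - started) * 1000)
    return {
        "tool_call_id": tool_call_id,
        "output": output,
        "error": error,
        "execution_time_ms": elapsed_ms,
    }
```

Catching the exception keeps one failing tool from aborting the whole conversation; the caller decides what to do with a non-null `error`.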

	### Vision Support (Image Input)

	```json
	{
	  "role": "user",
	  "content": [
	    {
	      "type": "text",
	      "text": "What does this code do?"
	    },
	    {
	      "type": "image_url",
	      "image_url": {
	        "url": "https://example.com/screenshot.png",
	        "detail": "low"
	      }
	    }
	  ]
	}
	```

	---

	## Error Handling

	### Error Response Format

	```json
	{
	  "error": {
	    "message": "Invalid API key",
	    "type": "invalid_request_error",
	    "param": "authorization",
	    "code": 401
	  }
	}
	```

	### Error Codes

	| Code | Type | HTTP Status | Description |
	|------|------|-------------|-------------|
	| `invalid_api_key` | Authentication | 401 | API key is invalid |
	| `account_disabled` | Authentication | 403 | Account disabled |
	| `rate_limit_exceeded` | Rate Limit | 429 | Too many requests |
	| `context_length_exceeded` | Invalid Request | 400 | Context too long |
	| `invalid_request` | Invalid Request | 400 | Malformed request |
	| `model_not_found` | Invalid Request | 404 | Model doesn't exist |
	| `tool_error` | Tool Error | 422 | Tool execution failed |
	| `internal_error` | Server Error | 500 | Server-side error |
	| `service_unavailable` | Server Error | 503 | Service temporarily down |

	### Handling Errors in Code

	```python
	import openai
	from openai import APIError, RateLimitError

	# Point the client at Stack 2.9; without base_url it targets api.openai.com
	client = openai.OpenAI(
	    api_key="your-api-key",
	    base_url="https://api.stack2.9.openclaw.org/v1",
	)

	try:
	    response = client.chat.completions.create(
	        model="qwen/qwen2.5-coder-32b",
	        messages=[{"role": "user", "content": "Hello"}],
	    )
	except RateLimitError:
	    # Must come before APIError, which is its base class
	    print("Rate limit exceeded. Please wait.")
	    # Implement backoff logic
	except APIError as e:
	    print(f"API error: {e}")
	    # Handle error appropriately
	```

	```javascript
	try {
	  const response = await client.chat.completions.create({
	    model: 'qwen/qwen2.5-coder-32b',
	    messages: [{role: 'user', content: 'Hello'}]
	  });
	} catch (error) {
	  if (error.status === 429) {
	    console.log('Rate limit exceeded');
	  } else if (error.status === 401) {
	    console.log('Invalid API key');
	  } else {
	    console.error('API error:', error.message);
	  }
	}
	```

	---

	## SDKs and Examples

	### Python SDK

	```bash
	pip install openai
	```

	```python
	from openai import OpenAI

	client = OpenAI(
	    api_key="your-api-key",
	    base_url="https://api.stack2.9.openclaw.org/v1",
	)

	# Non-streaming
	response = client.chat.completions.create(
	    model="qwen/qwen2.5-coder-32b",
	    messages=[
	        {"role": "system", "content": "You are a coding assistant."},
	        {"role": "user", "content": "Write a Python class for a stack data structure."},
	    ],
	    temperature=0.7,
	    max_tokens=1000,
	)

	print(response.choices[0].message.content)

	# Streaming
	stream = client.chat.completions.create(
	    model="qwen/qwen2.5-coder-32b",
	    messages=[{"role": "user", "content": "Explain recursion"}],
	    stream=True,
	)

	for chunk in stream:
	    # Skip chunks without choices or with an empty delta (e.g. the final one)
	    if chunk.choices and chunk.choices[0].delta.content:
	        print(chunk.choices[0].delta.content, end="", flush=True)
	```

	### JavaScript/Node.js SDK

	```bash
	npm install openai
	```

	```javascript
	import OpenAI from 'openai';

	const client = new OpenAI({
	  apiKey: 'your-api-key',
	  baseURL: 'https://api.stack2.9.openclaw.org/v1'
	});

	// Non-streaming
	const response = await client.chat.completions.create({
	  model: 'qwen/qwen2.5-coder-32b',
	  messages: [
	    {role: 'system', content: 'You are a coding assistant.'},
	    {role: 'user', content: 'Write a Python class for a stack data structure.'}
	  ],
	  temperature: 0.7,
	  max_tokens: 1000
	});

	console.log(response.choices[0].message.content);

	// Streaming
	const stream = await client.chat.completions.create({
	  model: 'qwen/qwen2.5-coder-32b',
	  messages: [{role: 'user', content: 'Explain recursion'}],
	  stream: true
	});

	for await (const chunk of stream) {
	  // The final chunk has an empty delta, so fall back to an empty string
	  process.stdout.write(chunk.choices[0]?.delta?.content || '');
	}
	```

	### cURL Examples

	```bash
	# Basic chat completion
	curl -X POST https://api.stack2.9.openclaw.org/v1/chat/completions \
	  -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "qwen/qwen2.5-coder-32b",
	    "messages": [{"role": "user", "content": "Hello"}]
	  }'

	# Streaming completion
	curl -X POST https://api.stack2.9.openclaw.org/v1/chat/completions \
	  -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  -d '{"model": "qwen/qwen2.5-coder-32b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

	# With system prompt
	curl -X POST https://api.stack2.9.openclaw.org/v1/chat/completions \
	  -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "qwen/qwen2.5-coder-32b",
	    "messages": [
	      {"role": "system", "content": "You are an expert Python programmer."},
	      {"role": "user", "content": "Write a decorator that caches function results."}
	    ]
	  }'

	# With tools
	curl -X POST https://api.stack2.9.openclaw.org/v1/chat/completions \
	  -H "Authorization: Bearer YOUR_API_KEY" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "qwen/qwen2.5-coder-32b",
	    "messages": [{"role": "user", "content": "Create a file called hello.py"}],
	    "tools": [
	      {
	        "type": "function",
	        "function": {
	          "name": "write_file",
	          "description": "Write content to a file",
	          "parameters": {
	            "type": "object",
	            "properties": {
	              "path": {"type": "string"},
	              "content": {"type": "string"}
	            }
	          }
	        }
	      }
	    ]
	  }'
	```

	### OpenAI-Compatible Client Usage

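	Because Stack 2.9 is compatible with the OpenAI client library, frameworks built on that client, such as LangChain, work once they are pointed at the Stack 2.9 base URL: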

	```python
	# Works with LangChain (requires: pip install langchain-openai)
	from langchain_core.messages import HumanMessage
	from langchain_openai import ChatOpenAI

	chat = ChatOpenAI(
	    model="qwen/qwen2.5-coder-32b",
	    base_url="https://api.stack2.9.openclaw.org/v1",
	    api_key="your-api-key",
	)

	# invoke() replaces the deprecated direct call chat([...])
	response = chat.invoke([HumanMessage(content="Hello!")])
	print(response.content)
	```

	---

	## Webhooks

	Configure webhooks for asynchronous events:

	```json
	{
	  "webhook_url": "https://your-server.com/webhook",
	  "events": [
	    "tool_call.started",
	    "tool_call.completed",
	    "tool_call.failed",
	    "generation.done"
	  ]
	}
	```

	### Webhook Payload

	```json
	{
	  "event": "tool_call.completed",
	  "timestamp": "2026-04-01T12:00:00Z",
	  "data": {
	    "tool_call_id": "call_abc123",
	    "tool_name": "execute_code",
	    "execution_time_ms": 150,
	    "result": "Hello, World!"
	  }
	}
	```
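A receiving server parses this body and branches on the `event` name. The sketch below covers only the parsing step and leaves out the HTTP server and any signature verification, which this document does not describe. `parse_webhook` and `describe` are hypothetical helpers.

```python
import json
from datetime import datetime


def parse_webhook(body):
    """Parse a webhook body into (event, timestamp, data)."""
    payload = json.loads(body)
    # Python before 3.11 cannot parse a trailing "Z", so map it to an explicit UTC offset.
    timestamp = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    return payload["event"], timestamp, payload.get("data", {})


def describe(body):
    """Return a one-line summary of a webhook event, e.g. for logging."""
    event, timestamp, data = parse_webhook(body)
    if event.startswith("tool_call."):
        phase = event.split(".", 1)[1]
        return f"{data.get('tool_name')} {phase} at {timestamp.isoformat()}"
    return f"{event} at {timestamp.isoformat()}"
```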

	---

	## Best Practices

	### 1. Use Streaming for Better UX

	Always use streaming for long-form content to provide real-time feedback to users.

	### 2. Implement Proper Error Handling

	Always handle rate limits and authentication errors gracefully with exponential backoff.
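Exponential backoff means doubling the wait after each failed attempt rather than sleeping for a fixed interval. The following is a minimal sketch of that idea: `call_with_backoff` is a hypothetical helper, and the attempt count, base delay, and 10% jitter are illustrative defaults rather than values recommended by the API.

```python
import random
import time


def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying with exponentially growing delays on retry_on errors."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the OpenAI client, `func` would be a lambda wrapping `client.chat.completions.create(...)` and `retry_on` would be `openai.RateLimitError`. The `sleep` parameter exists so the delays can be replaced in tests.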

	### 3. Cache Responses

	Cache frequent queries to reduce API calls and improve response times.

	```python
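	from functools import lru_cache

	from openai import OpenAI

	client = OpenAI(api_key="your-api-key", base_url="https://api.stack2.9.openclaw.org/v1")

	@lru_cache(maxsize=100)
	def cached_completion(prompt: str) -> str:
	    # Identical prompts are answered from the in-process cache, not the API
	    response = client.chat.completions.create(
	        model="qwen/qwen2.5-coder-32b",
	        messages=[{"role": "user", "content": prompt}],
	        temperature=0,
	    )
	    return response.choices[0].message.content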
	```

	### 4. Use Appropriate Temperature

	| Task | Temperature |
	|------|-------------|
	| Code generation | 0.0 - 0.3 |
	| Factual Q&A | 0.0 - 0.2 |
	| Creative writing | 0.7 - 1.0 |
	| Brainstorming | 0.8 - 1.2 |

	### 5. Monitor Token Usage

	Track usage to stay within rate limits:

	```python
	response = client.chat.completions.create(...)
	print(f"Tokens used: {response.usage.total_tokens}")
	```
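To compare usage against a daily allowance, the per-response totals can be accumulated in a small counter. This is a sketch: `TokenBudget` is a hypothetical class, its default limit is the Free tier value from the Tier Limits table, and resetting the counter at the start of each day is left to the caller.

```python
class TokenBudget:
    """Track tokens used against a daily allowance."""

    def __init__(self, daily_limit=100_000):  # Free tier value from the Tier Limits table
        self.daily_limit = daily_limit
        self.used = 0

    def record(self, total_tokens):
        """Add one response's usage and return the tokens left for the day."""
        self.used += total_tokens
        return self.remaining

    @property
    def remaining(self):
        """Tokens still available, never below zero."""
        return max(0, self.daily_limit - self.used)
```

After each call, `budget.record(response.usage.total_tokens)` returns the remaining allowance, which the caller can use to throttle or stop sending requests.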

	---

	## Support

	- Documentation: [docs/index.html](docs/index.html)
	- API Status: [status.stack2.9.openclaw.org](https://status.stack2.9.openclaw.org)
	- Issues: [GitHub Issues](https://github.com/openclaw/stack-2.9/issues)
	- Email: api@stack2.9.openclaw.org

	---

	API Version: 1.0
	Last Updated: 2026-04-01
	Status: Active