Instructions for using my-ai-stack/Stack-2-9-finetuned with libraries and local apps.

Libraries

How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
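
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (without sending) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=60):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running server, so it is left commented out):
# print(chat("http://localhost:8000",
#            "my-ai-stack/Stack-2-9-finetuned",
#            "What is the capital of France?"))
```

Passing a different `base_url` (for example one ending in port 30000) should work for any server that exposes the same `/v1/chat/completions` route.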

Use Docker

docker model run hf.co/my-ai-stack/Stack-2-9-finetuned

SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-2-9-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```

Source file: stack/deploy/README.md in Stack-2-9-finetuned (3.24 kB). Last commit 65888d5 by walidsobhie-code, about 2 months ago: "refactor: Squeeze folders further - cleaner structure"

🚀 Stack 2.9 - Pattern-Based AI Coding Assistant

A HuggingFace Spaces demo for Stack 2.9, a pattern-based AI coding assistant powered by Qwen2.5-Coder-7B.

✨ Features

🤖 Qwen2.5-Coder-7B - State-of-the-art code generation model
🔧 7 Integrated Tools - File operations, git, web search, shell commands
🧠 Pattern Memory - Learns from each interaction
⚡ Fast Streaming - Real-time token-by-token generation
💾 4-bit Quantization - The quantized model uses roughly 4GB of VRAM; a 16GB GPU is recommended
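
As a rough check on the VRAM figure, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per parameter. This is a back-of-the-envelope sketch that assumes a nominal 7 billion parameters (taking "7B" at face value) and ignores quantization metadata, activations, and the KV cache, all of which add to real usage.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 7e9  # assumption: "7B" taken at face value

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 16-bit weights: 14.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 4-bit weights:  3.5 GB
```

The 4-bit estimate is in line with the roughly 4GB quoted above once overhead is added, while unquantized 16-bit weights alone would nearly fill a 16GB card.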

🔧 Available Tools

| Tool | Description |
| --- | --- |
| `file_read` | Read files from the filesystem |
| `file_write` | Write content to files |
| `git_status` | Check git repository status |
| `web_search` | Search the web for information |
| `run_command` | Execute shell commands |
| `create_directory` | Create new directories |
| `list_directory` | List directory contents |
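
One common way to wire tools like these into an assistant is a registry that maps each tool name to a Python callable. The sketch below is hypothetical and is not taken from `app.py`: the function bodies, the `TOOLS` dictionary, and `call_tool` are assumptions for illustration, and only the four filesystem tools are covered (`git_status`, `web_search`, and `run_command` are omitted).

```python
import tempfile
from pathlib import Path


def file_read(path):
    """Return the text content of a file."""
    return Path(path).read_text(encoding="utf-8")


def file_write(path, content):
    """Write text to a file, replacing any existing content."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def create_directory(path):
    """Create a directory, including missing parents."""
    Path(path).mkdir(parents=True, exist_ok=True)
    return f"created {path}"


def list_directory(path="."):
    """Return the sorted entry names of a directory."""
    return sorted(entry.name for entry in Path(path).iterdir())


# Hypothetical registry: tool names from the table mapped to callables.
TOOLS = {
    "file_read": file_read,
    "file_write": file_write,
    "create_directory": create_directory,
    "list_directory": list_directory,
}


def call_tool(name, **kwargs):
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


# Demo in a throwaway directory so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as root:
    call_tool("create_directory", path=f"{root}/notes")
    call_tool("file_write", path=f"{root}/notes/todo.txt", content="ship it")
    print(call_tool("list_directory", path=f"{root}/notes"))  # ['todo.txt']
    print(call_tool("file_read", path=f"{root}/notes/todo.txt"))  # ship it
```

Keeping dispatch in a single function gives one place to validate tool names and, in a real deployment, to restrict which paths or commands a model-issued call may touch.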

🏃‍♂️ Quick Start

Local Development

# Clone the repository
git clone https://github.com/your-repo/stack-2.9.git
cd stack-2.9/space

# Install dependencies
pip install -r requirements.txt

# Run the demo
python app.py --share

HuggingFace Spaces

1. Create a new Space on HuggingFace
2. Select "Gradio" as the SDK
3. Upload the files from this directory:
   - app.py
   - requirements.txt
   - README.md
4. The model will load automatically on startup

💻 Usage

Example Prompts

Hello! What can you help me with?
Check git status of this repository
Search for best practices for Python async programming
List the files in the current directory
Write a simple Python function to calculate fibonacci
How do I use Git to create a new branch?
What's your memory of our conversation?

Python API

from app import StackModel, memory

# Initialize model
model = StackModel()
model.load()

# Generate response
response = model.generate("Write a hello world in Python")
print(response)

# Check memory stats
print(memory.get_stats())

🔐 Environment Variables

`HF_TOKEN` - Your HuggingFace token for private models (optional)
`MODEL_ID` - Override default model (default: `Qwen/Qwen2.5-Coder-7B-Instruct`)
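
A minimal sketch of how these two variables might be read, with the documented default applied when `MODEL_ID` is unset. The `Settings` class and `load_settings` function are illustrative names and are not taken from `app.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct"


@dataclass(frozen=True)
class Settings:
    model_id: str
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, treating empty values as unset."""
    return Settings(
        model_id=env.get("MODEL_ID") or DEFAULT_MODEL_ID,
        hf_token=env.get("HF_TOKEN") or None,
    )


settings = load_settings()
# Print only whether a token is present, never the token itself.
print(settings.model_id, "token set" if settings.hf_token else "no token")
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary.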

📊 Memory System

Stack 2.9 includes a pattern memory system that:

Tracks Interactions - Records every user-assistant exchange
Learns Patterns - Identifies frequently used tools
Stores Code - Saves useful code snippets
Adapts Behavior - Uses learned context to improve responses
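
To make the first three points concrete, here is a small hypothetical sketch of such a memory: it records exchanges, counts tool usage, and stores snippets. The class name `PatternMemory` and every method except `get_stats` (which mirrors the `memory.get_stats()` call in the Python API example) are assumptions, and the real implementation in `app.py` may differ.

```python
from collections import Counter


class PatternMemory:
    """Illustrative in-memory store for interactions, tool usage, and snippets."""

    def __init__(self):
        self.interactions = []        # (user_message, assistant_reply) pairs
        self.tool_counts = Counter()  # how often each tool was used
        self.snippets = []            # saved code snippets

    def record(self, user_message, assistant_reply, tools_used=()):
        """Track one user-assistant exchange and the tools it involved."""
        self.interactions.append((user_message, assistant_reply))
        self.tool_counts.update(tools_used)

    def store_snippet(self, code):
        """Save a code snippet for later reuse."""
        self.snippets.append(code)

    def get_stats(self):
        """Summarize what has been recorded so far."""
        return {
            "interactions": len(self.interactions),
            "snippets": len(self.snippets),
            "top_tools": self.tool_counts.most_common(3),
        }


memory = PatternMemory()
memory.record("Check git status", "Working tree clean", tools_used=["git_status"])
memory.record("List files", "app.py, README.md", tools_used=["list_directory"])
memory.record("Status again?", "Still clean", tools_used=["git_status"])
memory.store_snippet("print('hello world')")
print(memory.get_stats())
```

The "top tools" count is the kind of signal that could feed the adaptive behavior described above, for example by mentioning frequently used tools in the system prompt.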

🛠️ Tech Stack

Model: Qwen2.5-Coder-7B-Instruct
Quantization: 4-bit (bitsandbytes)
Framework: Gradio 4.0+
Backend: Transformers + Accelerate
GPU: 16GB VRAM recommended

📝 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Qwen - Base model
HuggingFace - Spaces hosting
Gradio - UI framework

Made with ❤️ by Stack 2.9