Stack 2.9 - Pattern-Based AI Coding Assistant
A HuggingFace Spaces demo for Stack 2.9, a pattern-based AI coding assistant powered by Qwen2.5-Coder-7B.
Features
- Qwen2.5-Coder-7B-Instruct - Instruction-tuned code generation model
- 7 Integrated Tools - File operations, git, web search, shell commands
- Pattern Memory - Learns from each interaction
- Fast Streaming - Real-time token-by-token generation
- 4-bit Quantization - Weights take roughly 4GB of VRAM; a 16GB GPU is recommended
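As a rough check on the 4-bit memory figure in the list above, weight storage scales with parameter count times bits per weight. The sketch below assumes a nominal 7 billion parameters; the function name and the round parameter count are illustrative and not taken from the project. It ignores activations, the KV cache, and quantization metadata, all of which add to the real footprint.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 7e9 parameters is an assumed nominal size, used for illustration only.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {estimate_weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, 4-bit weights come to about 3.5 GB, which is consistent with the approximate figure quoted above once runtime overhead is included.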
Available Tools
| Tool | Description |
|---|---|
| `file_read` | Read files from the filesystem |
| `file_write` | Write content to files |
| `git_status` | Check git repository status |
| `web_search` | Search the web for information |
| `run_command` | Execute shell commands |
| `create_directory` | Create new directories |
| `list_directory` | List directory contents |
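One common way to expose tools like these to a model is a name-to-function registry with a single dispatch entry point. The following is a minimal sketch using only the standard library. The function bodies, the `TOOLS` dictionary, and `call_tool` are hypothetical and are not taken from the project's `app.py`; `web_search` is left out because it needs an external search backend.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"


def create_directory(path: str) -> str:
    Path(path).mkdir(parents=True, exist_ok=True)
    return f"Created {path}"


def list_directory(path: str = ".") -> str:
    return "\n".join(sorted(p.name for p in Path(path).iterdir()))


def run_command(command: str, timeout: int = 30) -> str:
    # shell=True runs arbitrary input; restrict or sandbox this in real use.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def git_status(path: str = ".") -> str:
    result = subprocess.run(
        ["git", "status", "--short"], cwd=path, capture_output=True, text=True
    )
    return result.stdout + result.stderr


# Hypothetical registry: tool name -> callable returning a string result.
TOOLS: Dict[str, Callable[..., str]] = {
    "file_read": file_read,
    "file_write": file_write,
    "create_directory": create_directory,
    "list_directory": list_directory,
    "run_command": run_command,
    "git_status": git_status,
}


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call by name, returning errors as text for the model."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    try:
        return TOOLS[name](**kwargs)
    except Exception as exc:
        return f"Tool {name} failed: {exc}"
```

Returning errors as strings rather than raising keeps the conversation loop running, so the model can see the failure and try something else.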
Quick Start
Local Development
```bash
# Clone the repository
git clone https://github.com/your-repo/stack-2.9.git
cd stack-2.9/space

# Install dependencies
pip install -r requirements.txt

# Run the demo
python app.py --share
```
HuggingFace Spaces
- Create a new Space on HuggingFace
- Select "Gradio" as the SDK
- Upload the files from this directory:
  - app.py
  - requirements.txt
  - README.md
- The model will load automatically on startup
Usage
Example Prompts
Hello! What can you help me with?
Check git status of this repository
Search for best practices for Python async programming
List the files in the current directory
Write a simple Python function to calculate fibonacci
How do I use Git to create a new branch?
What's your memory of our conversation?
Python API
```python
from app import StackModel, memory

# Initialize model
model = StackModel()
model.load()

# Generate response
response = model.generate("Write a hello world in Python")
print(response)

# Check memory stats
print(memory.get_stats())
```
Environment Variables
- HF_TOKEN - Your HuggingFace token for private models (optional)
- MODEL_ID - Override default model (default: Qwen/Qwen2.5-Coder-7B-Instruct)
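A minimal sketch of how an application might read these two variables, falling back to the documented default model. The `read_settings` helper and the `DEFAULT_MODEL_ID` constant are illustrative names, not part of the project's code.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct"


def read_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (model_id, hf_token); empty or missing values fall back to defaults."""
    model_id = env.get("MODEL_ID") or DEFAULT_MODEL_ID
    hf_token = env.get("HF_TOKEN") or None
    return model_id, hf_token


if __name__ == "__main__":
    model_id, hf_token = read_settings()
    print(f"Model: {model_id}, token set: {hf_token is not None}")
```

Taking the environment as a parameter makes the helper easy to exercise with a plain dictionary instead of mutating `os.environ`.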
Memory System
Stack 2.9 includes a pattern memory system that:
- Tracks Interactions - Records every user-assistant exchange
- Learns Patterns - Identifies frequently used tools
- Stores Code - Saves useful code snippets
- Adapts Behavior - Uses learned context to improve responses
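The four behaviours above could be sketched with a small in-memory structure. This is an illustration only: the class name `PatternMemorySketch`, its fields, and the shape of the dictionary returned by `get_stats` are assumptions, and the project's actual `memory` object may work differently.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PatternMemorySketch:
    interactions: List[Dict[str, str]] = field(default_factory=list)
    tool_counts: Counter = field(default_factory=Counter)
    snippets: List[str] = field(default_factory=list)

    def record(
        self,
        user: str,
        assistant: str,
        tool: Optional[str] = None,
        code: Optional[str] = None,
    ) -> None:
        # Track the exchange, count the tool used, and keep any code snippet.
        self.interactions.append({"user": user, "assistant": assistant})
        if tool:
            self.tool_counts[tool] += 1
        if code:
            self.snippets.append(code)

    def get_stats(self) -> Dict[str, object]:
        return {
            "interactions": len(self.interactions),
            "top_tools": self.tool_counts.most_common(3),
            "snippets": len(self.snippets),
        }

    def context_hint(self) -> str:
        # A short hint that could be prepended to the next prompt.
        if not self.tool_counts:
            return ""
        tool, _ = self.tool_counts.most_common(1)[0]
        return f"The user often relies on the {tool} tool."
```

In this sketch, "adapting behaviour" amounts to feeding `context_hint()` back into the prompt; a persistent version would also need to save the state to disk.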
Tech Stack
- Model: Qwen2.5-Coder-7B-Instruct
- Quantization: 4-bit (bitsandbytes)
- Framework: Gradio 4.0+
- Backend: Transformers + Accelerate
- GPU: 16GB VRAM recommended
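For orientation, loading the model in 4-bit with Transformers, bitsandbytes, and Accelerate typically looks like the sketch below. The `load_quantized` helper is a hypothetical name, and the specific settings (NF4 quantization, float16 compute) are assumptions; the project's `app.py` may configure this differently. Running it requires a CUDA GPU with `torch`, `transformers`, `accelerate`, and `bitsandbytes` installed.

```python
def load_quantized(model_id: str = "Qwen/Qwen2.5-Coder-7B-Instruct"):
    """Load a causal LM with 4-bit weights; returns (tokenizer, model)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumed quantization type
        bnb_4bit_compute_dtype=torch.float16,  # assumed compute precision
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # lets Accelerate place layers on available devices
    )
    return tokenizer, model
```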
License
MIT License - see LICENSE file for details.
Acknowledgments
- Qwen - Base model
- HuggingFace - Spaces hosting
- Gradio - UI framework
Made with ❤️ by Stack 2.9