#!/bin/bash
# Stack 2.9 Quick HumanEval Evaluation Wrapper
# Usage: ./quick_human_eval.sh [provider] [model] [max_problems]
# Example: ./quick_human_eval.sh ollama qwen2.5-coder:32b 20
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
# Defaults
PROVIDER="${1:-ollama}"
MODEL="${2:-qwen2.5-coder:32b}"
MAX_PROBLEMS="${3:-20}"
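# Sketch (not part of the original script): the third argument is passed
# through to the benchmark unchecked, so a typo only surfaces later inside
# Python. Assuming it is meant to be a positive integer, a guard could look
# like this. is_positive_int is a hypothetical helper name.
is_positive_int() {
    # Reject empty strings and anything containing a non-digit.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}
if ! is_positive_int "${MAX_PROBLEMS:-20}"; then
    echo "Error: max_problems must be a positive integer, got '${MAX_PROBLEMS:-}'" >&2
    exit 1
fi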
echo "========================================"
echo "Stack 2.9 HumanEval Quick Evaluation"
echo "========================================"
echo "Provider: $PROVIDER"
echo "Model: $MODEL"
echo "Problems: $MAX_PROBLEMS"
echo ""
# Check if vllm is available
if command -v vllm &> /dev/null; then
    USE_VLLM="--use-vllm"
    echo "✓ vLLM detected - will use for faster inference"
else
    USE_VLLM=""
    echo "⚠ vLLM not found - using standard inference"
fi
# Check provider availability
case "$PROVIDER" in
    ollama)
        if command -v ollama &> /dev/null; then
            echo "✓ Ollama available"
            # Check whether the Ollama server is reachable
            if curl -s http://localhost:11434/api/tags &> /dev/null; then
                echo "✓ Ollama server running"
            else
                echo "⚠ Ollama server not running - start with: ollama serve"
            fi
        else
            echo "⚠ Ollama not installed - will attempt anyway"
        fi
        ;;
    openai)
        if [ -z "${OPENAI_API_KEY:-}" ]; then
            echo "⚠ OPENAI_API_KEY not set"
        else
            echo "✓ OpenAI API key configured"
        fi
        ;;
    anthropic)
        if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
            echo "⚠ ANTHROPIC_API_KEY not set"
        else
            echo "✓ Anthropic API key configured"
        fi
        ;;
    *)
        # No availability check is defined for other providers
        echo "⚠ Unknown provider '$PROVIDER' - no availability check performed"
        ;;
esac
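# Sketch (not part of the original script): the check above only confirms
# that the Ollama server answers; it does not confirm that the requested
# model has been pulled. Assuming /api/tags returns JSON containing entries
# of the form "name":"<model>", a plain-text match is enough for a warning.
# ollama_has_model is a hypothetical helper name.
ollama_has_model() {
    # $1 = model name; stdin = JSON body from /api/tags.
    # Whitespace is stripped so the match does not depend on JSON spacing.
    tr -d ' \n\t' | grep -Fq "\"name\":\"$1\""
}
if [ "${PROVIDER:-}" = "ollama" ] && command -v curl > /dev/null 2>&1; then
    if curl -s --max-time 5 http://localhost:11434/api/tags 2> /dev/null \
        | ollama_has_model "${MODEL:-}"; then
        echo "✓ Model ${MODEL:-} is listed by Ollama"
    else
        echo "⚠ Model ${MODEL:-} not listed (or server unreachable) - try: ollama pull ${MODEL:-}"
    fi
fi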
echo ""
echo "Running evaluation..."
echo "----------------------------------------"
# Run the evaluation
python3 -m benchmarks.human_eval \
    --provider "$PROVIDER" \
    --model "$MODEL" \
    --max-problems "$MAX_PROBLEMS" \
    --timeout 30 \
    $USE_VLLM
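# Sketch (not part of the original script): the summary further down reports
# results/humaneval.json as saved without looking at it. Assuming the
# benchmark writes to that path, a non-fatal check could warn when the file
# is missing or empty. results_present is a hypothetical helper name.
results_present() {
    # True when the given path exists and is non-empty.
    [ -s "$1" ]
}
if ! results_present "results/humaneval.json"; then
    echo "⚠ Expected results file is missing or empty: results/humaneval.json" >&2
fi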
echo ""
echo "========================================"
echo "Evaluation complete!"
echo "========================================"
echo ""
echo "Results saved to: results/humaneval.json"
echo ""
echo "To run full 164-problem benchmark:"
echo " 1. Download full HumanEval dataset"
echo " 2. Use GPU with 80GB VRAM (A100/H100)"
echo " 3. See HUMAN_EVAL_PLAN.md for details"