#!/usr/bin/env bash
# Deploy Stack 2.9 to RunPod
# Requires: runpodctl installed and configured

set -euo pipefail

echo "Deploying Stack 2.9 to RunPod"
echo "================================"
echo ""

# Color codes
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Configuration (can be overridden by environment variables)
# The PyTorch images on Docker Hub are published under pytorch/pytorch, not library/pytorch.
IMAGE="${RUNPOD_IMAGE:-docker.io/pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime}"
TEMPLATE_NAME="${RUNPOD_TEMPLATE_NAME:-stack-2.9-template}"
CONTAINER_NAME="${RUNPOD_CONTAINER_NAME:-stack-2.9-server}"
GPU_TYPE="${RUNPOD_GPU_TYPE:-NVIDIA RTX A6000}"
DISK_SIZE="${RUNPOD_DISK_SIZE:-50}"
MODEL_PATH="${MODEL_PATH:-/workspace/models/stack-2.9-awq}"
VLLM_PORT="${VLLM_PORT:-8000}"

# Check prerequisites
command -v runpodctl >/dev/null 2>&1 || {
    echo -e "${RED}❌ runpodctl not found. Install from: https://github.com/runpod/runpodctl${NC}"
    exit 1
}

echo "Configuration:"
echo "   GPU: $GPU_TYPE"
echo "   Disk: ${DISK_SIZE}GB"
echo "   Image: $IMAGE"
echo "   Model path: $MODEL_PATH"
echo ""

# Step 1: Create template (one-time, may already exist)
echo "📦 Creating/verifying RunPod template..."
if ! runpodctl get template "$TEMPLATE_NAME" &>/dev/null; then
    runpodctl create template \
        --name "$TEMPLATE_NAME" \
        --image "$IMAGE" \
        --docker-run-args "--gpus all -e MODEL_PATH=$MODEL_PATH -e VLLM_PORT=$VLLM_PORT -p $VLLM_PORT:8000" \
        --volume "/workspace/models:$MODEL_PATH:ro" \
        --volume "/workspace/output:/workspace/output" \
        --container-disk-size "${DISK_SIZE}GB"
    echo -e "${GREEN}✅ Template created${NC}"
else
    echo -e "${YELLOW}⚠️  Template already exists, using existing${NC}"
fi

# Step 2: Deploy pod
echo "Deploying pod..."
POD_ID=$(runpodctl create pod \
    --name "$CONTAINER_NAME" \
    --gpu-type "$GPU_TYPE" \
    --disk-size "${DISK_SIZE}GB" \
    --template "$TEMPLATE_NAME" \
    --env "MODEL_PATH=$MODEL_PATH" \
    --env "VLLM_PORT=$VLLM_PORT" \
    --port "$VLLM_PORT" \
    --query id)
echo -e "${GREEN}✅ Pod created: $POD_ID${NC}"
echo "   Waiting 60 seconds for the pod to start..."
sleep 60

# Step 3: Copy deployment files
echo "📤 Copying code to pod..."

# Create deployment package
TEMP_PACKAGE="/tmp/stack-2.9-deployment-$(date +%s).tar.gz"
tar czf "$TEMP_PACKAGE" \
    stack-2.9-deploy/ \
    requirements.txt \
    2>/dev/null || {
    echo -e "${RED}❌ Failed to create deployment package${NC}"
    exit 1
}

# Copy to pod
if ! runpodctl cp "$TEMP_PACKAGE" "$POD_ID:/workspace/"; then
    echo -e "${RED}❌ Failed to copy package to pod${NC}"
    exit 1
fi

# Extract and setup
# The remote script below sits inside local double quotes, so it is expanded
# locally first: $MODEL_PATH is substituted here, while \$-escaped expressions
# (such as \$! and the ls check) are evaluated on the pod.
echo "🔧 Setting up on pod..."
runpodctl ssh "$POD_ID" bash -c "'
set -euo pipefail
cd /workspace
tar xzf stack-2.9-*.tar.gz

# Install system dependencies
apt-get update && apt-get install -y --no-install-recommends \
    python3-pip \
    python3-venv \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip and install requirements
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -r requirements.txt

# Check if the model exists on the pod
if [ ! -d \"$MODEL_PATH\" ] || [ -z \"\$(ls -A \"$MODEL_PATH\" 2>/dev/null)\" ]; then
    echo \"⚠️  Model not found at $MODEL_PATH\"
    echo \"   You have two options:\"
    echo \"   1. Upload your model to: $MODEL_PATH\"
    echo \"   2. Set MODEL_PATH to a Hugging Face model name and it will be downloaded\"
    echo \"      Example: export MODEL_PATH=meta-llama/Llama-3.1-8B-Instruct\"
    echo \"   Note: Downloading large models may take hours and exceed pod disk space.\"
    echo \"   Recommendation: Upload AWQ-quantized model to S3 and download it.\"
fi

echo \"Starting vLLM server...\"
cd /workspace/stack-2.9-deploy
nohup python3 vllm_server.py > vllm.log 2>&1 &
echo \$! > /tmp/vllm.pid
'" || {
    echo -e "${RED}❌ Failed to set up pod${NC}"
    exit 1
}

# Step 4: Wait and check status
echo "⏳ Waiting for vLLM server to start..."
sleep 30

# Get pod status
echo ""
echo "Pod status:"
runpodctl get pod "$POD_ID"

# Get public URL
PUBLIC_URL=$(runpodctl get pod "$POD_ID" --query "url" --output text 2>/dev/null || echo "pending")

echo ""
echo -e "${GREEN}✅ Deployment initiated!${NC}"
echo "   Pod ID: $POD_ID"
echo "   vLLM API: http://$PUBLIC_URL:$VLLM_PORT"
echo "   Health:   http://$PUBLIC_URL:$VLLM_PORT/health"
echo ""
echo "To monitor:"
echo "   runpodctl logs $POD_ID        # View logs"
echo "   runpodctl ssh $POD_ID         # SSH into pod"
echo "   runpodctl stop pod $POD_ID    # Stop (saves disk)"
echo "   runpodctl delete pod $POD_ID  # Delete (you lose data)"
echo ""
echo -e "${YELLOW}⚠️  First server startup may take 5-15 minutes as the model loads${NC}"
echo -e "${YELLOW}⚠️  Monitor logs: runpodctl logs $POD_ID${NC}"
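
The script waits for the server with a fixed `sleep 30`, yet warns that the first startup may take 5-15 minutes. A minimal sketch of a polling alternative follows, assuming the server answers on the `/health` path that the script prints. `wait_for_health` and `probe_url` are hypothetical helper names introduced for illustration; they are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the original script): poll a health URL
# until it responds or a deadline passes, instead of sleeping a fixed time.

# probe_url URL
# Returns 0 if the URL answers with a successful HTTP status.
probe_url() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# wait_for_health URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 as soon as probe_url succeeds, 1 if the timeout elapses first.
wait_for_health() {
    local url="$1"
    local timeout="${2:-900}"    # default: 15 minutes, the upper bound the script mentions
    local interval="${3:-10}"    # seconds between probes
    local deadline=$(( SECONDS + timeout ))

    while (( SECONDS < deadline )); do
        if probe_url "$url"; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}
```

One possible use, once the public URL is known, is `wait_for_health "http://$PUBLIC_URL:$VLLM_PORT/health" 900 10 || echo "Server not ready yet"`. Keeping the probe in its own function makes it easy to swap `curl` for another check.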