Instructions for using my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use my-ai-stack/Stack-2-9-finetuned with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use my-ai-stack/Stack-2-9-finetuned with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
- SGLang
How to use my-ai-stack/Stack-2-9-finetuned with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "my-ai-stack/Stack-2-9-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```bash
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
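Both servers above expose the OpenAI-compatible `/v1/chat/completions` route that the curl commands call, so a single client can talk to either one. Below is a minimal standard-library sketch. It assumes each server runs locally on the port used above (8000 for vLLM, 30000 for SGLang). `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the usual OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "my-ai-stack/Stack-2-9-finetuned"

# Assumed local endpoints, matching the ports in the commands above.
SERVER_URLS = {
    "vllm": "http://localhost:8000/v1/chat/completions",
    "sglang": "http://localhost:30000/v1/chat/completions",
}


def build_chat_request(backend: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the same POST as the curl examples."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        SERVER_URLS[backend],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(backend: str, prompt: str, timeout: float = 60.0) -> str:
    """Hypothetical helper: send the request and return the assistant's reply text."""
    req = build_chat_request(backend, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the OpenAI chat-completion response shape.
    return body["choices"][0]["message"]["content"]
```

For example, `chat("sglang", "What is the capital of France?")` sends the same request as the SGLang curl command. Building the request separately from sending it lets you inspect or test the payload without a running server.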
Together AI Submission Package - Completion Summary
Date: 2025-04-02
Status: ✅ Complete
Deliverables: All 5 tasks fulfilled
1. MODEL_CARD.md ✅
Location: stack-2.9/MODEL_CARD.md
Contents:
- Model Description: Stack 2.9, a fine-tuned Qwen2.5-Coder-32B with Pattern Memory
- Training Data Sources: Detailed breakdown of synthetic examples, pattern memory data, public datasets (OpenAssistant, CodeAct, CodeContests, StarCoder), code-comment pairs
- Training Procedure: LoRA fine-tuning for 3 epochs with gradient accumulation of 16, a learning rate of 1e-4, and 4-bit quantization; training pipeline steps: prepare_data.py, train_lora.py, merge_adapter.py
- Hyperparameters: Complete configuration from train_config.yaml (r=64, alpha=128, dropout=0.05, target modules, max_length=131072, etc.)
- Intended Uses: AI-assisted coding, education, research; NOT for safety-critical or autonomous deployment
- Limitations: quality may degrade near the 128K-token maximum context, hallucinations are possible, human oversight is required, and tool use depends on OpenClaw
- License: MIT License (fine-tuned code + model wrapper), Apache 2.0 (base Qwen model + training data)
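Some arithmetic makes the LoRA settings more concrete. In standard LoRA, each adapted weight matrix W (d_out × d_in) stays frozen and receives a low-rank update scaled as (alpha / r)·B·A, where B is d_out × r and A is r × d_in. With r=64 and alpha=128, the update is therefore scaled by 2.0. Gradient accumulation of 16 multiplies the per-device batch size to give the number of examples per optimizer step. The 5120 × 5120 matrix and the per-device batch size of 1 below are hypothetical inputs for illustration, not values from train_config.yaml.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# Values from the training configuration summarized above.
R, ALPHA, GRAD_ACCUM = 64, 128, 16

# Hypothetical inputs: a square 5120 x 5120 projection and a
# per-device batch size of 1 on a single GPU.
print(lora_scaling(ALPHA, R))                # 2.0
print(lora_trainable_params(5120, 5120, R))  # 655360 trainable
print(5120 * 5120)                           # 26214400 frozen
print(effective_batch_size(1, GRAD_ACCUM))   # 16
```

Under these assumptions, the adapter trains about 2.5% as many parameters as the frozen matrix it modifies.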
2. inference_examples.py ✅
Location: stack-2.9/inference_examples.py
Contents: 15 diverse coding demonstration tasks covering:
- Simple function (factorial recursion)
- Data structure (LRU cache)
- Code explanation (quicksort)
- Debugging (find duplicates bug)
- Refactoring (Pythonic list comprehension)
- API integration (REST with retries)
- File operations (pattern memory)
- Multi-step workflow (project scaffolding)
- System design (task queue)
- Web development (Flask/FastAPI)
- Code translation (JS to Python)
- Unit testing (pytest for binary search)
- Data processing (CSV aggregation)
- Async programming (concurrent URL fetch)
- Pattern retrieval (tree traversal)
Features:
- CLI with `--provider` supporting ollama, openai, anthropic, openrouter, together
- Uses `model_client` to run actual inference, or runs in documentation mode
- Reports token counts and latency
- Can be extended with more examples
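The CLI features listed above could be wired together roughly as follows. This is a hypothetical sketch, not the contents of inference_examples.py. Only the provider names come from the summary. The `ollama` default, `run_example`, and the echo stand-in for a real client are assumptions, and word counts stand in for tokenizer token counts.

```python
import argparse
import time
from typing import Callable, Dict, List, Optional

# Provider names from the summary; the real script's wiring may differ.
PROVIDERS = ["ollama", "openai", "anthropic", "openrouter", "together"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run Stack 2.9 coding examples.")
    # Hypothetical default; argparse rejects names outside PROVIDERS.
    parser.add_argument("--provider", choices=PROVIDERS, default="ollama")
    return parser


def run_example(prompt: str, generate: Callable[[str], str]) -> Dict[str, float]:
    """Time one generation and report rough token counts and latency."""
    start = time.perf_counter()
    output = generate(prompt)
    latency = time.perf_counter() - start
    return {
        # Whitespace word counts are only a rough proxy for tokenizer tokens.
        "prompt_tokens": len(prompt.split()),
        "completion_tokens": len(output.split()),
        "latency_s": latency,
    }


def main(argv: Optional[List[str]] = None) -> Dict[str, float]:
    args = build_parser().parse_args(argv)

    # Stand-in generator; a real run would call the client for args.provider.
    def echo(prompt: str) -> str:
        return f"[{args.provider}] {prompt}"

    stats = run_example("Write a factorial function in Python", echo)
    print(f"provider={args.provider} "
          f"tokens={stats['prompt_tokens']}+{stats['completion_tokens']} "
          f"latency={stats['latency_s'] * 1000:.1f} ms")
    return stats
```

Calling `main(["--provider", "together"])` mirrors `python inference_examples.py --provider together`. Restricting `--provider` with `choices` means argparse reports a typo such as `--provider togther` as a usage error before any request is made.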
3. README Badges ✅
Updated: stack-2.9/README.md
Changes:
- Added Together AI badge: `Together_AI-Supported-green`
- Updated the Multi-Provider feature to include Together AI
- Added Together AI environment variables to Configuration section
- Evaluation badges remain as "Evaluation In Progress" (pending real scores)
Note: Real benchmark scores will replace the placeholder badges once evaluation completes; the infrastructure for reporting them is already in place.
4. Together AI Documentation ✅
Files Created:
- `stack-2.9/TOGETHER_AI.md`: comprehensive guide (350+ lines)
Contents:
- Overview of Together AI integration
- Prerequisites and setup
- Environment variables: `TOGETHER_API_KEY`, `TOGETHER_MODEL`, `MODEL_PROVIDER`
- Recommended models: `togethercomputer/Qwen2.5-Coder-32B-Instruct` (primary), plus alternatives (Llama-3-70B, CodeLlama-34B)
- Usage examples: CLI, Python API, chat mode, tool calls
- Cost estimation table
- Performance considerations
- Error handling and retries
- Comparison with other providers
- Troubleshooting guide
- Security best practices
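Here is how a client might resolve the three environment variables, assuming the environment overrides built-in defaults. `load_together_config` is a hypothetical helper. The default model is the primary one recommended above, and falling back to `"together"` when `MODEL_PROVIDER` is unset is an assumption. Failing fast on a missing key keeps the secret out of source code.

```python
import os
from typing import Mapping, Optional

# Primary model recommended in TOGETHER_AI.md, per the summary above.
DEFAULT_TOGETHER_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"


def load_together_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read Together AI settings from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("TOGETHER_API_KEY")
    if not api_key:
        # Never hard-code the key; require it from the environment.
        raise RuntimeError("TOGETHER_API_KEY is not set")
    return {
        # Assumed fallback when MODEL_PROVIDER is unset.
        "provider": env.get("MODEL_PROVIDER", "together"),
        "model": env.get("TOGETHER_MODEL", DEFAULT_TOGETHER_MODEL),
        "api_key": api_key,
    }
```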
Integration in model_client.py (stack-2.9/stack-2.9-eval/model_client.py):
- Added `TogetherClient` class implementing the OpenAI-compatible API
- Uses base URL `https://api.together.xyz/v1`
- Default model: `togethercomputer/Qwen2.5-Coder-32B-Instruct`
- Updated the `create_model_client` factory to support `provider="together"`
- Updated the CLI parser to include the `together` option
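The model_client.py changes described above follow a common registry-plus-factory pattern. Below is a minimal standard-library sketch. It assumes `TogetherClient` needs only an API key, a model name, and the base URL. The class body, the `chat_request` method, and the `_CLIENTS` registry are hypothetical and may differ from the real module. Together's OpenAI-compatible endpoint takes the key as a bearer token.

```python
import json
import urllib.request
from dataclasses import dataclass

TOGETHER_BASE_URL = "https://api.together.xyz/v1"
TOGETHER_DEFAULT_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"


@dataclass
class TogetherClient:
    """Hypothetical sketch of an OpenAI-compatible Together AI client."""
    api_key: str
    model: str = TOGETHER_DEFAULT_MODEL
    base_url: str = TOGETHER_BASE_URL

    def chat_request(self, prompt: str) -> urllib.request.Request:
        """Build (but do not send) a chat-completion request."""
        # Same /chat/completions route and JSON shape as the OpenAI API.
        payload = {"model": self.model,
                   "messages": [{"role": "user", "content": prompt}]}
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {self.api_key}"},
            method="POST",
        )


# Hypothetical registry consulted by the factory; other providers would be added here.
_CLIENTS = {"together": TogetherClient}


def create_model_client(provider: str, **kwargs):
    """Sketch of a provider factory: look up the client class and construct it."""
    try:
        client_cls = _CLIENTS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return client_cls(**kwargs)
```

With this design, supporting a new provider means adding one registry entry, and the CLI's provider choices can be derived from the same keys.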
5. License Compatibility Verification ✅
File Created: stack-2.9/LICENSES.md
Verification Results:
- ✅ Project Code: MIT License (permissive)
- ✅ Base Model: Qwen2.5-Coder-32B - Apache 2.0 (Alibaba)
- ✅ Training Data: Apache 2.0 (manifest.json)
- ✅ Dependencies: All MIT, Apache 2.0, or BSD (torch, transformers, peft, bitsandbytes, datasets, openai, anthropic, requests, etc.)
- ✅ Public Datasets: OpenAssistant (Apache 2.0), CodeAct (permissive), CodeContests (permissive), StarCoder (permissive)
Conclusion: All components are licensed under permissive terms that allow redistribution, modification, and commercial use. Stack 2.9 can therefore be distributed under MIT for the code and Apache 2.0 for the model and data.
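The verification above amounts to checking every component against an allowlist of permissive licenses. Below is a sketch of that check using SPDX identifiers. The `PERMISSIVE` set is an assumption, and the component entries mirror only those results above that name a specific license. Datasets listed only as "permissive" are left out because no exact license is given.

```python
from typing import Dict, List

# SPDX identifiers treated as permissive for this check (an assumption).
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

# Entries from the verification results that name a specific license.
COMPONENTS: Dict[str, str] = {
    "project code": "MIT",
    "Qwen2.5-Coder-32B (base model)": "Apache-2.0",
    "training data (manifest.json)": "Apache-2.0",
    "OpenAssistant": "Apache-2.0",
}


def non_permissive(components: Dict[str, str]) -> List[str]:
    """Return the names of components whose license is not on the allowlist."""
    return [name for name, lic in components.items() if lic not in PERMISSIVE]


print(non_permissive(COMPONENTS))  # []
```

Encoding the results as data means that a new dependency or dataset with an unexpected license shows up as a non-empty list rather than relying on manual review alone.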
Additional Artifacts
- MODEL_CARD.md conforms to standard model card format (model description, training data, procedure, uses, limitations, license)
- inference_examples.py is executable and runs real inference on the 15 tasks above through any supported provider
- TOGETHER_AI.md provides complete coverage for Together AI deployment
- LICENSES.md provides legal clarity for distribution
File Structure (new files)
```text
stack-2.9/
├── MODEL_CARD.md (NEW)
├── TOGETHER_AI.md (NEW)
├── LICENSES.md (NEW)
├── inference_examples.py (NEW)
├── stack-2.9-eval/
│   └── model_client.py (MODIFIED - added TogetherClient)
├── README.md (MODIFIED - badges, config)
└── ... (existing files)
```
Quick Start for Together AI Users
```bash
# 1. Set environment
export TOGETHER_API_KEY="tog-..."
export MODEL_PROVIDER="together"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# 2. Run inference
python stack.py "Write a function to calculate fibonacci"

# Or run examples
python inference_examples.py --provider together
```
All tasks completed successfully. The submission package is ready for Together AI integration.