Instructions for using my-ai-stack/Stack-2-9-finetuned with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use my-ai-stack/Stack-2-9-finetuned with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use my-ai-stack/Stack-2-9-finetuned with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
- SGLang
How to use my-ai-stack/Stack-2-9-finetuned with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "my-ai-stack/Stack-2-9-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```bash
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
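The vLLM and SGLang servers started above both expose an OpenAI-compatible `/v1/chat/completions` endpoint (port 8000 for `vllm serve`, port 30000 for the SGLang commands), so the same request the curl examples send can also be issued from Python. A minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat` are hypothetical, and the response parsing assumes the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "my-ai-stack/Stack-2-9-finetuned"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat-completions endpoint.

    base_url is an assumption: http://localhost:8000 for `vllm serve`,
    http://localhost:30000 for the SGLang launch_server commands.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply (requires a running server)."""
    req = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Standard OpenAI chat-completions response layout
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat("What is the capital of France?")` returns the reply text; pass `base_url="http://localhost:30000"` to target the SGLang server instead. Separating request construction from sending keeps the payload easy to inspect without a live server.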
# Together AI Submission Package - Completion Summary

**Date**: 2025-04-02

**Status**: ✅ Complete

**Deliverables**: All 5 tasks fulfilled
---

## 1. MODEL_CARD.md ✅

**Location**: `stack-2.9/MODEL_CARD.md`

**Contents**:

- **Model Description**: Stack 2.9, a fine-tune of Qwen2.5-Coder-32B with Pattern Memory
- **Training Data Sources**: Detailed breakdown of synthetic examples, pattern memory data, public datasets (OpenAssistant, CodeAct, CodeContests, StarCoder), and code-comment pairs
- **Training Procedure**: LoRA fine-tuning (3 epochs, gradient accumulation 16, learning rate 1e-4, 4-bit quantization) and the training pipeline steps (`prepare_data.py`, `train_lora.py`, `merge_adapter.py`)
- **Hyperparameters**: Complete configuration from `train_config.yaml` (r=64, alpha=128, dropout=0.05, target modules, max_length=131072, etc.)
- **Intended Uses**: AI-assisted coding, education, and research; NOT for safety-critical or autonomous deployment
- **Limitations**: Quality may degrade near the 128K-token maximum context; hallucinations are possible; human oversight is required; tool use depends on OpenClaw
- **License**: MIT License (fine-tuned code + model wrapper), Apache 2.0 (base Qwen model + training data)
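The LoRA hyperparameters above translate into a few quantities that are easy to check by hand. A minimal sketch, assuming the standard LoRA scaling of `lora_alpha / r` (as used by PEFT); the 5120×5120 projection used for the parameter count is a hypothetical layer size (check the base model's `config.json` for real dimensions), and the per-device batch size of 1 is an assumption, since the card only states gradient accumulation 16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    r: int = 64               # LoRA rank (from train_config.yaml)
    alpha: int = 128          # lora_alpha
    dropout: float = 0.05     # lora_dropout
    grad_accum: int = 16      # gradient accumulation steps
    max_length: int = 131072  # maximum sequence length in tokens


def lora_scaling(hp: LoraHyperparams) -> float:
    """Factor applied to the low-rank update B @ A under standard LoRA scaling."""
    return hp.alpha / hp.r


def lora_params_per_linear(hp: LoraHyperparams, d_in: int, d_out: int) -> int:
    """Trainable parameters added to one d_out x d_in linear layer (A: r x d_in, B: d_out x r)."""
    return hp.r * d_in + d_out * hp.r


def effective_batch_size(hp: LoraHyperparams, per_device_batch: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device_batch * hp.grad_accum * num_devices


hp = LoraHyperparams()
print(lora_scaling(hp))                              # 2.0
print(lora_params_per_linear(hp, 5120, 5120))        # 655360 for a hypothetical 5120x5120 layer
print(effective_batch_size(hp, per_device_batch=1))  # 16, assuming a per-device batch of 1
print(hp.max_length // 1024)                         # 128, i.e. the "128K" context
```

Because alpha is twice the rank, the adapter update is scaled by 2.0; multiplying `lora_params_per_linear` by the number of targeted modules gives a rough trainable-parameter budget.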
---

## 2. inference_examples.py ✅

**Location**: `stack-2.9/inference_examples.py`

**Contents**: 15 diverse coding demonstration tasks covering:

1. Simple function (factorial recursion)
2. Data structure (LRU cache)
3. Code explanation (quicksort)
4. Debugging (find duplicates bug)
5. Refactoring (Pythonic list comprehension)
6. API integration (REST with retries)
7. File operations (pattern memory)
8. Multi-step workflow (project scaffolding)
9. System design (task queue)
10. Web development (Flask/FastAPI)
11. Code translation (JS to Python)
12. Unit testing (pytest for binary search)
13. Data processing (CSV aggregation)
14. Async programming (concurrent URL fetch)
15. Pattern retrieval (tree traversal)

**Features**:

- CLI with a `--provider` flag supporting ollama, openai, anthropic, openrouter, and together
- Uses `model_client` to run actual inference, or runs in documentation mode
- Reports token counts and latency
- Can be extended with more examples
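The `--provider` flag and the latency/token reporting listed above could be wired together roughly as follows. This is a hypothetical sketch, not the actual `inference_examples.py`: the default provider, the `run_example` helper, the `fake_generate` stand-in for a `model_client` call, and the whitespace-based token count are all assumptions for illustration.

```python
import argparse
import time
from typing import Callable, Dict, List, Optional

PROVIDERS = ["ollama", "openai", "anthropic", "openrouter", "together"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run Stack 2.9 coding examples")
    # The default provider is an assumption made for this sketch
    parser.add_argument("--provider", choices=PROVIDERS, default="ollama",
                        help="Model provider to send prompts to")
    return parser.parse_args(argv)


def run_example(prompt: str, generate: Callable[[str], str]) -> Dict[str, object]:
    """Call a generate function and record latency plus rough token counts.

    Token counts here are whitespace-separated word counts, a crude
    stand-in for a real tokenizer.
    """
    start = time.perf_counter()
    output = generate(prompt)
    latency_s = time.perf_counter() - start
    return {
        "output": output,
        "prompt_tokens": len(prompt.split()),
        "completion_tokens": len(output.split()),
        "latency_s": latency_s,
    }


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a model_client call
    return "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)"
```

For example, `parse_args(["--provider", "together"]).provider` yields `"together"`, and `run_example("Write a recursive factorial function", fake_generate)` returns the output with its timing; `argparse` rejects any provider outside the five listed.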
---

## 3. README Badges ✅

**Updated**: `stack-2.9/README.md`

**Changes**:

- Added Together AI badge: `Together_AI-Supported-green`
- Updated the Multi-Provider feature to include Together AI
- Added Together AI environment variables to the Configuration section
- Evaluation badges remain as "Evaluation In Progress" (pending real scores)

**Note**: Real benchmark scores will be added once evaluation completes; the evaluation infrastructure is already in place.
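The badge string `Together_AI-Supported-green` follows the shields.io static-badge pattern `label-message-color`, in which a single underscore renders as a space. A small sketch that builds the corresponding README markdown; the helper names are hypothetical, and the escaping covers only shields.io's rules for literal dashes (`--`), literal underscores (`__`), and spaces (`_`), plus ordinary URL percent-encoding.

```python
from urllib.parse import quote


def escape_badge_text(part: str) -> str:
    """Escape text for one segment of a shields.io static badge path."""
    part = part.replace("-", "--").replace("_", "__")  # literal dash / underscore
    part = part.replace(" ", "_")                       # a single underscore renders as a space
    return quote(part, safe="")


def badge_markdown(label: str, message: str, color: str, alt: str = "") -> str:
    """Return a markdown image tag for a static shields.io badge."""
    url = (f"https://img.shields.io/badge/"
           f"{escape_badge_text(label)}-{escape_badge_text(message)}-{color}")
    return f"![{alt or label}]({url})"


print(badge_markdown("Together AI", "Supported", "green"))
```

This prints the markdown for the badge path `Together_AI-Supported-green` used in the README.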
---

## 4. Together AI Documentation ✅

**Files Created**:

- `stack-2.9/TOGETHER_AI.md` - Comprehensive guide (350+ lines)

**Contents**:

- Overview of Together AI integration
- Prerequisites and setup
- Environment variables: `TOGETHER_API_KEY`, `TOGETHER_MODEL`, `MODEL_PROVIDER`
- Recommended models: `togethercomputer/Qwen2.5-Coder-32B-Instruct` (primary), plus alternatives (Llama-3-70B, CodeLlama-34B)
- Usage examples: CLI, Python API, chat mode, tool calls
- Cost estimation table
- Performance considerations
- Error handling and retries
- Comparison with other providers
- Troubleshooting guide
- Security best practices
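Error handling for a hosted API usually comes down to retrying transient failures with exponential backoff. A minimal sketch, assuming retries on HTTP 429 and 5xx statuses and a hypothetical `RetryableError` exception raised by the caller; the delays and attempt limit are illustrative values, not ones taken from `TOGETHER_AI.md`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient statuses: rate limiting and server-side errors
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RetryableError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_retries(call: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.5,
                 max_delay: float = 8.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RetryableError as err:
            # Give up on non-transient statuses or once attempts are exhausted
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Delay cap doubles each attempt: 0.5s, 1s, 2s, ... bounded by max_delay
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function testable without real waits; full jitter spreads out retries from many clients that hit a rate limit at the same moment.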
**Integration in model_client.py** (`stack-2.9/stack-2.9-eval/model_client.py`):

- Added `TogetherClient` class implementing the OpenAI-compatible API
- Uses base URL `https://api.together.xyz/v1`
- Default model: `togethercomputer/Qwen2.5-Coder-32B-Instruct`
- Updated `create_model_client` factory to support `provider="together"`
- Updated CLI parser to include the "together" option
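The `TogetherClient` and `create_model_client` changes described above can be pictured as follows. This is a hypothetical sketch rather than the code in `model_client.py`: it uses only the standard library, reads the `TOGETHER_API_KEY`, `TOGETHER_MODEL`, and `MODEL_PROVIDER` variables listed for `TOGETHER_AI.md`, shows only the `"together"` branch of the factory, and builds (but does not send) the request. The `Authorization: Bearer` header follows the OpenAI-compatible convention.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional

TOGETHER_BASE_URL = "https://api.together.xyz/v1"
DEFAULT_TOGETHER_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"


class TogetherClient:
    """Minimal OpenAI-compatible chat client for Together AI (sketch)."""

    def __init__(self, api_key: str, model: Optional[str] = None,
                 base_url: str = TOGETHER_BASE_URL):
        if not api_key:
            raise ValueError("TOGETHER_API_KEY is not set")
        self.api_key = api_key
        self.model = model or DEFAULT_TOGETHER_MODEL
        self.base_url = base_url.rstrip("/")

    def build_request(self, prompt: str, max_tokens: int = 512) -> urllib.request.Request:
        """Build a chat-completions POST request; sending it is left to the caller."""
        payload = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
            method="POST",
        )


def create_model_client(provider: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> TogetherClient:
    """Factory sketch: only the "together" provider is handled here."""
    provider = (provider or env.get("MODEL_PROVIDER", "")).lower()
    if provider == "together":
        return TogetherClient(api_key=env.get("TOGETHER_API_KEY", ""),
                              model=env.get("TOGETHER_MODEL"))
    raise ValueError(f"provider {provider!r} is not covered by this sketch")
```

Passing `env` explicitly (instead of always reading `os.environ`) makes the provider selection easy to exercise in tests; `TOGETHER_MODEL` overrides the default model when set.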
---

## 5. License Compatibility Verification ✅

**File Created**: `stack-2.9/LICENSES.md`

**Verification Results**:

- ✅ **Project Code**: MIT License (permissive)
- ✅ **Base Model**: Qwen2.5-Coder-32B - Apache 2.0 (Alibaba)
- ✅ **Training Data**: Apache 2.0 (manifest.json)
- ✅ **Dependencies**: All MIT, Apache 2.0, or BSD (torch, transformers, peft, bitsandbytes, datasets, openai, anthropic, requests, etc.)
- ✅ **Public Datasets**: OpenAssistant (Apache 2.0), CodeAct (permissive), CodeContests (permissive), StarCoder (permissive)

**Conclusion**: All components are licensed under permissive terms that allow redistribution, modification, and commercial use. Stack 2.9 can be distributed under MIT for the code and Apache 2.0 for the model and data.
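The verification above amounts to checking each component's license against an allowlist of permissive licenses. A minimal sketch of that check, with component/license pairs copied from the results list; the SPDX-style identifiers and the allowlist itself are assumptions, and entries recorded only as "permissive" (no specific license named) are routed to manual review instead of passing automatically.

```python
from typing import Dict, List, Tuple

# Assumed allowlist of permissive SPDX identifiers
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

# Component -> license as recorded in LICENSES.md ("permissive" = no specific license named)
COMPONENTS: Dict[str, str] = {
    "Project code": "MIT",
    "Base model (Qwen2.5-Coder-32B)": "Apache-2.0",
    "Training data (manifest.json)": "Apache-2.0",
    "OpenAssistant": "Apache-2.0",
    "CodeAct": "permissive",
    "CodeContests": "permissive",
    "StarCoder": "permissive",
}


def audit(components: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split components into (approved, needs_review) based on the allowlist."""
    approved: List[str] = []
    needs_review: List[str] = []
    for name, lic in components.items():
        (approved if lic in PERMISSIVE else needs_review).append(name)
    return approved, needs_review


approved, needs_review = audit(COMPONENTS)
print("approved:", approved)
print("needs review:", needs_review)
```

Recording an exact license identifier for each dataset would let every entry pass the automated check rather than relying on a manual review step.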
---

## Additional Artifacts

- **MODEL_CARD.md** conforms to the standard model card format (model description, training data, procedure, uses, limitations, license)
- **inference_examples.py** is executable and demonstrates real capabilities
- **TOGETHER_AI.md** provides complete coverage for Together AI deployment
- **LICENSES.md** provides legal clarity for distribution
---

## File Structure (new files)

```
stack-2.9/
├── MODEL_CARD.md (NEW)
├── TOGETHER_AI.md (NEW)
├── LICENSES.md (NEW)
├── inference_examples.py (NEW)
├── stack-2.9-eval/
│   └── model_client.py (MODIFIED - added TogetherClient)
├── README.md (MODIFIED - badges, config)
└── ... (existing files)
```
---

## Quick Start for Together AI Users

```bash
# 1. Set environment
export TOGETHER_API_KEY="tog-..."
export MODEL_PROVIDER="together"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# 2. Run inference
python stack.py "Write a function to calculate fibonacci"

# Or run examples
python inference_examples.py --provider together
```

---

**All tasks completed successfully. The submission package is ready for Together AI integration.**