Tags: Text Generation, Transformers, English, qwen2, code-generation, python, fine-tuning, Qwen, tools, agent-framework, multi-agent, conversational
Instructions to use my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use my-ai-stack/Stack-2-9-finetuned with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use my-ai-stack/Stack-2-9-finetuned with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
- SGLang
How to use my-ai-stack/Stack-2-9-finetuned with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "my-ai-stack/Stack-2-9-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "my-ai-stack/Stack-2-9-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```shell
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
# OpenRouter Submission - Stack 2.9

## Model Information

- **Base Model**: Qwen/Qwen2.5-Coder-32B
- **Fine-Tuned Version**: Stack 2.9 (OpenClaw tool patterns)
- **Context Length**: 131,072 tokens
- **Architecture**: Transformer-based
- **Parameters**: 32 billion
## Capabilities

### Core Capabilities

- **Code Generation**: Multi-language code writing and completion
- **Tool Use**: Native integration with OpenClaw tool patterns
- **Voice Integration Ready**: Compatible with voice cloning systems
- **API Compatibility**: OpenAI-compatible endpoints
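Because the endpoints are OpenAI-compatible, any standard OpenAI-style client can talk to a self-hosted Stack 2.9 server. Below is a minimal sketch using only the Python standard library; the base URL and port are assumptions matching the local vLLM serving instructions in this document, not a verified live deployment.

```python
# Minimal sketch: call a self-hosted, OpenAI-compatible Stack 2.9 endpoint
# using only the standard library. The base URL/port are assumptions matching
# the local vLLM serving instructions elsewhere in this document.
import json
import urllib.request

def build_chat_payload(prompt, model="my-ai-stack/Stack-2-9-finetuned"):
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# `chat("Who are you?")` works once a server is running; the payload itself
# can be inspected offline:
print(build_chat_payload("Who are you?")["model"])
```

The same payload shape works against the vLLM and SGLang servers described in this document, since both expose the OpenAI chat-completions route.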
### Advanced Features

- **Context Understanding**: 128K-token context window
- **Multi-file Operations**: Work across entire codebases
- **Error Detection**: Identify errors and suggest fixes
- **Code Review**: Automated quality analysis
- **Documentation Generation**: Auto-create API docs
## Pricing Proposal

### Free Tier

- **Daily Quota**: 100,000 tokens/day
- **Concurrent Requests**: 5
- **Features**: All core capabilities

### Pay-Per-Use

- **Tier 1**: $0.50 per 1M tokens
- **Tier 2**: $0.40 per 1M tokens (for volumes > 100M tokens)
- **Tier 3**: $0.30 per 1M tokens (for volumes > 500M tokens)
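As a quick sanity check on the tiers, the cost of a given volume can be computed directly. A short sketch using the rates and boundaries listed here; it assumes the tier rate applies to the entire volume (rather than marginally per bracket), which this proposal does not specify:

```python
# Sketch: estimate Stack 2.9 pay-per-use cost from the tier table.
# Rates are USD per 1M tokens. Assumption: the matched tier's rate applies
# to the whole volume, not per-bracket (the proposal leaves this open).

def cost_usd(tokens):
    """Return the cost in USD for a given token volume."""
    if tokens > 500_000_000:      # Tier 3: volumes > 500M tokens
        rate = 0.30
    elif tokens > 100_000_000:    # Tier 2: volumes > 100M tokens
        rate = 0.40
    else:                         # Tier 1
        rate = 0.50
    return tokens / 1_000_000 * rate

print(cost_usd(50_000_000))   # Tier 1: 50M tokens -> 25.0 (USD)
```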
### Enterprise

- **Custom Pricing**: Contact us for volume discounts
- **SLA**: 99.9% uptime guarantee
- **Support**: Priority support included
## Review Process Timeline

### Submission Phase (Week 1)

- Initial submission and documentation review
- Model capabilities verification
- API endpoint testing

### Testing Phase (Weeks 2-3)

- Performance benchmarking
- Safety and bias evaluation
- Integration testing

### Approval Phase (Week 4)

- Final review and approval
- Listing preparation
- Launch planning
## Contact Information

- **Primary Contact**: Stack 2.9 Team
- **Email**: stack29@openclaw.org
- **Website**: https://stack2.9.openclaw.org
- **GitHub**: https://github.com/my-ai-stack/stack-2.9
## Unique Value Proposition

### Why Stack 2.9?

1. **Voice-Enabled Coding**: The only open-source coding assistant with native voice integration
2. **Tool Pattern Excellence**: Fine-tuned on OpenClaw's extensive tool-use patterns
3. **Cost-Effective**: Significantly cheaper than commercial alternatives
4. **Self-Hosting Freedom**: Apache 2.0 license allows unrestricted deployment
5. **Community-Driven**: Developed by the open-source community

### Competitive Advantages

- **Voice Integration**: Unlike Claude Code or GitHub Copilot, Stack 2.9 supports voice commands
- **Open Source**: Fully transparent, with Apache 2.0 licensing
- **Tool Patterns**: Specialized in OpenClaw tool patterns for superior tool use
- **Cost**: Free tier available, plus a pay-per-use model
- **Flexibility**: Self-hosting option for complete control

### Target Markets

- **Individual Developers**: Free tier for hobbyists and students
- **Startups**: Cost-effective alternative to commercial solutions
- **Enterprises**: Self-hosting option for data privacy
- **Educational Institutions**: Open source for learning and research
## Safety and Ethics

### Safety Measures

- **Bias Mitigation**: Fine-tuning includes bias-reduction techniques
- **Content Filtering**: Built-in content safety filters
- **Tool Validation**: All tool calls are validated before execution
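This document does not specify how tool-call validation works; a common approach is to check each call against a declared parameter schema before allowing it to execute. The sketch below is hypothetical: the tool registry, tool names, and schema format are illustrative assumptions, not Stack 2.9 or OpenClaw APIs.

```python
# Hypothetical sketch of pre-execution tool-call validation: each call is
# checked against a declared schema before it may run. The registry, tool
# names, and schema format are illustrative assumptions, not Stack 2.9 APIs.

TOOL_SCHEMAS = {
    "read_file": {"required": {"path"}, "allowed": {"path", "encoding"}},
    "run_shell": {"required": {"command"}, "allowed": {"command", "timeout"}},
}

def validate_tool_call(name, args):
    """Return (ok, reason). Reject unknown tools and malformed arguments."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return False, f"unknown tool: {name}"
    keys = set(args)
    missing = schema["required"] - keys
    if missing:
        return False, f"missing required args: {sorted(missing)}"
    extra = keys - schema["allowed"]
    if extra:
        return False, f"unexpected args: {sorted(extra)}"
    return True, "ok"

print(validate_tool_call("read_file", {"path": "main.py"}))  # (True, 'ok')
```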
### Ethical Considerations

- **Open Source**: Transparent development process
- **Community Governance**: Community-driven development
- **Responsible AI**: Committed to ethical AI development
## Performance Metrics

### Benchmark Results

- **HumanEval**: 75% pass@1 (estimated)
- **MBPP**: 80% pass@1 (estimated)
- **Throughput**: 25-30 tokens/second on an A100 GPU

### Latency

- **Average Response Time**: 2-3 seconds
- **Streaming**: Real-time response generation
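OpenAI-compatible servers such as vLLM and SGLang stream responses as server-sent events when `"stream": true` is set in the request; each `data:` line carries one incremental chunk. A small sketch of parsing those chunks; the chunk shape follows the OpenAI streaming format, and the sample line is an illustrative assumption rather than captured server output:

```python
# Sketch: parsing streamed output from an OpenAI-compatible endpoint.
# With "stream": true in the request, the server emits server-sent events
# ("data: {...}" lines); this follows the OpenAI streaming chunk format.
import json

def delta_from_sse_line(line):
    """Extract the incremental text from one SSE data line, or None."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")

# Illustrative chunk in the shape such servers emit:
sample = 'data: {"choices": [{"delta": {"content": "Hel"}}]}'
print(delta_from_sse_line(sample))  # Hel
```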
---

**Stack 2.9** - Revolutionizing coding with voice and open source. Ready for OpenRouter listing approval.