Instructions for using my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
Local Apps
vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:
Install vLLM from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install SGLang from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "my-ai-stack/Stack-2-9-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "my-ai-stack/Stack-2-9-finetuned",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Docker Model Runner

How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:

```shell
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```
OpenRouter Submission - Stack 2.9
Model Information
- Model Name: Qwen/Qwen2.5-Coder-32B
- Fine-Tuned Version: Stack 2.9 (OpenClaw tool patterns)
- Context Length: 131072 tokens
- Architecture: Transformer-based
- Parameters: 32 billion
Capabilities
Core Capabilities
- Code Generation: Multi-language code writing and completion
- Tool Use: Native integration with OpenClaw tool patterns
- Voice Integration Ready: Compatible with voice cloning systems
- API Compatibility: OpenAI-compatible endpoints
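Because the endpoints are OpenAI-compatible, a standard chat-completions payload works against any of the servers shown earlier. A minimal sketch of building that request body (the model ID is from this card; the payload shape is the generic OpenAI chat-completions schema, nothing Stack-2.9-specific):

```python
import json

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> str:
    """Build an OpenAI-compatible /v1/chat/completions request body as JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("my-ai-stack/Stack-2-9-finetuned", "Write a Python hello world.")
print(body)
```

The same string can be POSTed to the vLLM or SGLang `/v1/chat/completions` endpoint with any HTTP client.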
Advanced Features
- Context Understanding: 128K token context window
- Multi-file Operations: Work across entire codebases
- Error Detection: Identify and suggest fixes
- Code Review: Automated quality analysis
- Documentation Generation: Auto-create API docs
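The 128K window still has to be split between the prompt and the generated reply, so multi-file prompts need a simple budget check. A sketch of that arithmetic (131072 is the context length stated above; the example token count is illustrative):

```python
CONTEXT_LENGTH = 131072  # tokens, per the model information above

def generation_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens can still be generated for a given prompt size."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone exceeds the context window")
    return context_length - prompt_tokens

# e.g. a 100K-token codebase dump leaves ~31K tokens for the reply
print(generation_budget(100_000))  # 31072
```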
Pricing Proposal
Free Tier
- Token Allowance: 100,000 tokens/day
- Concurrent Requests: 5
- Features: All core capabilities
Pay-Per-Use
- Tier 1: $0.50 per 1M tokens
- Tier 2: $0.40 per 1M tokens (for volumes > 100M tokens)
- Tier 3: $0.30 per 1M tokens (for volumes > 500M tokens)
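The proposal does not say whether the tiers are marginal or flat; this sketch assumes the simplest reading, where one rate applies to the entire volume once its threshold is crossed:

```python
def token_cost_usd(tokens: int) -> float:
    """Flat tiered pricing: $0.50/M up to 100M, $0.40/M above 100M, $0.30/M above 500M."""
    millions = tokens / 1_000_000
    if tokens > 500_000_000:
        rate = 0.30
    elif tokens > 100_000_000:
        rate = 0.40
    else:
        rate = 0.50
    return millions * rate

print(token_cost_usd(50_000_000))   # 50M tokens at $0.50/M -> $25
print(token_cost_usd(200_000_000))  # 200M tokens at $0.40/M -> $80
```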
Enterprise
- Custom Pricing: Contact for volume discounts
- SLA: 99.9% uptime guarantee
- Support: Priority support included
Review Process Timeline
Submission Phase (Week 1)
- Initial submission and documentation review
- Model capabilities verification
- API endpoint testing
Testing Phase (Weeks 2-3)
- Performance benchmarking
- Safety and bias evaluation
- Integration testing
Approval Phase (Week 4)
- Final review and approval
- Listing preparation
- Launch planning
Contact Information
- Primary Contact: Stack 2.9 Team
- Email: stack29@openclaw.org
- Website: https://stack2.9.openclaw.org
- GitHub: https://github.com/my-ai-stack/stack-2.9
Unique Value Proposition
Why Stack 2.9?
- Voice-Enabled Coding: The only open-source coding assistant with native voice integration
- Tool Pattern Excellence: Fine-tuned on OpenClaw's extensive tool-use patterns
- Cost-Effective: Significantly cheaper than commercial alternatives
- Self-Hosting Freedom: Apache 2.0 license allows unrestricted deployment
- Community-Driven: Developed by the open-source community
Competitive Advantages
- Voice Integration: Unlike Claude Code or GitHub Copilot, Stack 2.9 supports voice commands
- Open Source: Fully transparent with Apache 2.0 licensing
- Tool Patterns: Specialized in OpenClaw tool patterns for superior tool use
- Cost: Free tier available, pay-per-use model
- Flexibility: Self-hosting option for complete control
Target Markets
- Individual Developers: Free tier for hobbyists and students
- Startups: Cost-effective alternative to commercial solutions
- Enterprises: Self-hosting option for data privacy
- Educational Institutions: Open source for learning and research
Safety and Ethics
Safety Measures
- Bias Mitigation: Fine-tuning includes bias reduction techniques
- Content Filtering: Built-in content safety filters
- Tool Validation: All tool calls are validated before execution
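Validating tool calls before execution typically means a whitelist plus an argument-schema check. The following is an illustrative pattern only, not Stack 2.9's actual validator; the tool names and required-argument sets are hypothetical:

```python
# Hypothetical tool schema: tool name -> set of required argument names
ALLOWED_TOOLS = {
    "read_file": {"path"},
    "run_tests": {"target"},
}

def validate_tool_call(name: str, args: dict) -> bool:
    """Accept a tool call only if the tool is whitelisted and required args are present."""
    required = ALLOWED_TOOLS.get(name)
    if required is None:
        return False  # unknown tool: rejected before execution
    return required.issubset(args.keys())

print(validate_tool_call("read_file", {"path": "main.py"}))  # True
print(validate_tool_call("delete_disk", {}))                 # False: not whitelisted
```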
Ethical Considerations
- Open Source: Transparent development process
- Community Governance: Community-driven development
- Responsible AI: Committed to ethical AI development
Performance Metrics
Benchmark Results
- HumanEval: 75% pass@1 (estimated)
- MBPP: 80% pass@1 (estimated)
- Tokens/Second: 25-30 tokens/second on A100 GPU
Latency
- Average Response Time: 2-3 seconds
- Streaming: Real-time response generation
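The stated decode rate implies the latency figure directly for non-streamed responses; a sketch of that arithmetic (25-30 tokens/second is the throughput from the benchmark results above, and the 75-token reply length is illustrative):

```python
def estimated_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time-to-complete for a non-streamed response at a given decode rate."""
    return output_tokens / tokens_per_second

# A ~75-token answer at 25-30 tok/s lands in the stated 2-3 second range
print(estimated_seconds(75, 25.0))  # 3.0
print(estimated_seconds(75, 30.0))  # 2.5
```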
Stack 2.9 - Revolutionizing coding with voice and open source. Ready for OpenRouter listing approval.