Stack 2.9 Roadmap

Last updated: April 2026

Current Status

What's Working ✅

  • Basic code generation - The model can generate Python, JavaScript, and other code based on prompts
  • CLI interface - Working command-line interface (stack.py, src/cli/)
  • Multi-provider support - Ollama, OpenAI, Anthropic, OpenRouter, Together AI integrations (see the dispatch sketch after this list)
  • 46 built-in tools - File operations, git, shell, web search, memory, task planning
  • Pattern Memory infrastructure - Observer, Learner, Memory components implemented
  • Training pipeline - LoRA fine-tuning scripts, data preparation, model merging
  • Deployment options - Docker, RunPod, Vast.ai, Kubernetes, HuggingFace Spaces
  • 128K context window - Extended from the base model's 32K
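
The multi-provider layer is easiest to picture as a dispatch table keyed by provider name, mirroring the CLI's --provider flag. The sketch below is illustrative only, with stub backends; none of these names come from the actual src/cli/ code.

```python
# Hypothetical sketch of multi-provider dispatch; names are illustrative
# and not the actual Stack 2.9 internals (those live under src/cli/).
from typing import Callable, Dict

def _ollama_generate(prompt: str) -> str:
    # Real code would POST to the local Ollama server; stubbed for illustration.
    return f"[ollama completion for: {prompt}]"

def _openai_generate(prompt: str) -> str:
    # Real code would call the OpenAI SDK; stubbed for illustration.
    return f"[openai completion for: {prompt}]"

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "ollama": _ollama_generate,
    "openai": _openai_generate,
}

def generate(provider: str, prompt: str) -> str:
    """Route a prompt to the selected backend, mirroring the --provider flag."""
    try:
        return PROVIDERS[provider](prompt)
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None

print(generate("ollama", "write a hello world in Python"))
```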

What's Broken or Missing ⚠️

  • Tool calling not trained - The model doesn't reliably use tools; it needs fine-tuning on tool-calling patterns
  • Benchmark scores unverifiable - Previous claims were removed after an audit found that only 20 of 164 HumanEval problems had been tested
  • Self-evolution not functional - Observer/Learner components exist but are not connected to the training pipeline
  • Voice integration incomplete - Coqui XTTS integration is present but untested
  • Evaluation infrastructure in progress - A new evaluation framework has been built but not yet run on the full benchmarks

What Needs Testing 🔧

  • Full HumanEval (164 problems) evaluation (see the sketch after this list)
  • Full MBPP (500 problems) evaluation
  • Tool-calling accuracy with real tasks
  • Pattern Memory retrieval and effectiveness
  • Voice input/output pipeline
  • Multi-provider compatibility
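
For the full HumanEval run, a minimal sketch using OpenAI's human-eval package (pip install human-eval) could look like the following. generate_completion is a placeholder for Stack 2.9's real inference call, which this sketch does not assume.

```python
# Sketch of a full HumanEval run using OpenAI's human-eval package.
from human_eval.data import read_problems, write_jsonl

def generate_completion(prompt: str) -> str:
    # Placeholder: call the model here and return only the code completion.
    raise NotImplementedError

problems = read_problems()  # all 164 problems, keyed by task_id
samples = [
    dict(task_id=task_id, completion=generate_completion(problem["prompt"]))
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)
# Then score with: evaluate_functional_correctness samples.jsonl
```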

What Needs Documentation 📚

  • Tool definitions and schemas (an example schema follows this list)
  • API reference (internal/ARCHITECTURE.md exists but needs updating)
  • Pattern Memory usage guide
  • Deployment troubleshooting
  • Evaluation methodology
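
For the tool-definitions item, here is one illustrative shape a documented schema could take, assuming an OpenAI-function-calling-style JSON schema; the actual Stack 2.9 format should be taken from the tool implementations, not from this sketch.

```python
# Illustrative only: one way a built-in tool could be documented in docs/TOOLS.md.
# The schema style is an assumption, not the project's confirmed format.
read_file_tool = {
    "name": "read_file",
    "description": "Read a text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file to read."},
        },
        "required": ["path"],
    },
}
```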

Timeline with Milestones

Short-Term (1-2 Weeks)

| Milestone | Description | Status |
|-----------|-------------|--------|
| S1.1 | Run full HumanEval (164 problems) with proper inference | Not started |
| S1.2 | Run full MBPP (500 problems) with proper inference | Not started |
| S1.3 | Document all 46 tool definitions in docs/TOOLS.md | In progress |
| S1.4 | Fix evaluation scripts to use real model inference | Needed |
| S1.5 | Create minimal reproducible test for tool calling (see the sketch below) | Not started |

Owner: Community contributions welcome
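
For S1.5, a minimal reproducible test could wrap the CLI command from the Quick Wins section further below. This is a sketch only: the final assertion is a placeholder until the tool-call output format is pinned down.

```python
# Sketch for S1.5: a pytest-style smoke test around the documented CLI command.
import subprocess

def test_list_files_uses_a_tool():
    result = subprocess.run(
        ["python", "stack.py", "-c", "List files in current directory"],
        capture_output=True, text=True, timeout=120,
    )
    assert result.returncode == 0, result.stderr
    # Placeholder assertion: tighten this once the tool-call output format is known.
    assert result.stdout.strip(), "expected some output from the tool run"
```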

Medium-Term (1-3 Months)

| Milestone | Description | Status |
|-----------|-------------|--------|
| M2.1 | Fine-tune model on tool-calling patterns (RTMP data) | Not started |
| M2.2 | Implement and test self-evolution loop (Observer → Learner → Memory → Trainer; see the sketch below) | Not started |
| M2.3 | Run full benchmark evaluation and publish verified scores | Not started |
| M2.4 | Add MCP server support for external tool integration | Partial |
| M2.5 | Voice integration end-to-end testing | Not started |
| M2.6 | Implement pattern extraction from production usage | Not started |

Owner: Requires a training compute budget or community contributions
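
For M2.2, the loop wiring might look like the sketch below. The component names come from this roadmap; their interfaces here are assumptions, since the real Observer/Learner/Memory classes are not yet connected to training.

```python
# Hypothetical wiring for the self-evolution loop. Interfaces are assumptions.
class Observer:
    def collect(self) -> list[dict]:
        """Gather interaction traces from production usage."""
        return []

class Learner:
    def extract_patterns(self, traces: list[dict]) -> list[dict]:
        """Distill traces into reusable patterns."""
        return [t for t in traces if t.get("successful")]

class Memory:
    def __init__(self) -> None:
        self.patterns: list[dict] = []

    def store(self, patterns: list[dict]) -> None:
        self.patterns.extend(patterns)

def evolution_step(observer: Observer, learner: Learner, memory: Memory) -> None:
    # Observer -> Learner -> Memory; the final Memory -> Trainer hop would
    # export memory.patterns as a fine-tuning dataset.
    memory.store(learner.extract_patterns(observer.collect()))
```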

Long-Term (6+ Months)

| Milestone | Description | Status |
|-----------|-------------|--------|
| L3.1 | RLHF training for improved tool selection | Future |
| L3.2 | Team sync infrastructure (PostgreSQL + FastAPI) | Designed, not implemented |
| L3.3 | Federated learning for privacy-preserving updates | Future |
| L3.4 | Multi-modal support (images → code) | Future |
| L3.5 | Real-time voice-to-voice conversation | Future |

Owner: Long-term vision, needs significant resources


How to Contribute

By Priority

  1. Run evaluations - Help us verify benchmark scores by running `python stack_2_9_eval/run_proper_evaluation.py`
  2. Test tool calling - Try the model with various tools and report what works/doesn't
  3. Documentation - Improve docs, especially tool definitions and API reference
  4. Bug reports - Open issues with reproduction steps
  5. Code contributions - See CONTRIBUTING.md for guidelines

Contribution Areas

| Area | Skill Needed | Priority |
|------|--------------|----------|
| Evaluation | Python, ML benchmarking | High |
| Tool calling tests | Python, CLI usage | High |
| Documentation | Technical writing | Medium |
| Training scripts | PyTorch, PEFT | Medium |
| Deployment | Docker, K8s, Cloud | Low |
| Pattern Memory | Vector databases, ML | Low |

Quick Wins for Contributors

  • Run `python stack.py -c "List files in current directory"` and report whether tools work
  • Review `stack/eval/results/` and verify the evaluation logs
  • Check docs/TOOLS.md for accuracy against the actual tool implementations
  • Test with different providers (`--provider ollama|openai|anthropic`)

Technical Notes

Known Limitations

  1. Tool calling is not trained - The base model has tool capabilities, but Stack 2.9 hasn't been fine-tuned to use them reliably
  2. Pattern Memory is read-only - The system stores patterns but doesn't automatically retrain on them yet
  3. Evaluation uses stub data - Some eval scripts return pre-canned answers instead of running the model
  4. Voice integration is untested - The code exists but hasn't been validated end-to-end

Next Training Run Requirements

To fix tool calling, the next training run needs the following (a fine-tuning sketch follows this list):

  • Dataset: `data/rtmp-tools/combined_tools.jsonl` (already generated)
  • Compute: ~1 hour on an A100 for LoRA fine-tuning
  • Configuration: target tool_call logits; use `tool_use_examples.jsonl`
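
As a sketch of that run, using Hugging Face peft and datasets: the base-model name and every hyperparameter below are assumptions, and only the dataset path comes from this roadmap.

```python
# Sketch of the LoRA run described above; hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "your-base-model"  # placeholder: substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Tool-calling dataset already generated for this run.
data = load_dataset("json", data_files="data/rtmp-tools/combined_tools.jsonl")

lora = LoraConfig(
    r=16,                                 # assumed rank; tune to the compute budget
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# Tokenization and the training loop are elided; any standard causal-LM
# trainer (e.g. transformers.Trainer) fits within the ~1 A100-hour budget above.
```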

Contact


This roadmap is a living document; it is updated based on community feedback and project progress.