Instructions to use my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/my-ai-stack/Stack-2-9-finetuned

SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-2-9-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```

Stack-2-9-finetuned / stack /docs /archive /SUBMISSION_PACKAGE_SUMMARY.md

walidsobhie-code

refactor: Squeeze folders further - cleaner structure

65888d5 about 2 months ago

preview code

raw

history blame contribute delete

5.65 kB

Together AI Submission Package - Completion Summary

Date: 2025-04-02 Status: ✅ Complete Deliverables: All 5 tasks fulfilled

1. MODEL_CARD.md ✅

Location: stack-2.9/MODEL_CARD.md

Contents:

Model Description: Stack 2.9 as a fine-tuned Qwen2.5-Coder-32B with Pattern Memory
Training Data Sources: Detailed breakdown of synthetic examples, pattern memory data, public datasets (OpenAssistant, CodeAct, CodeContests, StarCoder), code-comment pairs
Training Procedure: LoRA fine-tuning with 3 epochs, gradient accumulation 16, learning rate 1e-4, 4-bit quantization, training pipeline steps (prepare_data.py, train_lora.py, merge_adapter.py)
Hyperparameters: Complete configuration from train_config.yaml (r=64, alpha=128, dropout=0.05, target modules, max_length=131072, etc.)
Intended Uses: AI-assisted coding, education, research; NOT for safety-critical or autonomous deployment
Limitations: 128K context may degrade at max length, hallucinations possible, requires human oversight, tool dependencies on OpenClaw
License: MIT License (fine-tuned code + model wrapper), Apache 2.0 (base Qwen model + training data)

2. inference_examples.py ✅

Location: stack-2.9/inference_examples.py

Contents: 15 diverse coding demonstration tasks covering:

Simple function (factorial recursion)
Data structure (LRU cache)
Code explanation (quicksort)
Debugging (find duplicates bug)
Refactoring (Pythonic list comprehension)
API integration (REST with retries)
File operations (pattern memory)
Multi-step workflow (project scaffolding)
System design (task queue)
Web development (Flask/FastAPI)
Code translation (JS to Python)
Unit testing (pytest for binary search)
Data processing (CSV aggregation)
Async programming (concurrent URL fetch)
Pattern retrieval (tree traversal)

Features:

CLI with --provider supporting ollama, openai, anthropic, openrouter, together
Uses model_client to run actual inference or documentation mode
Reports token counts and latency
Can be extended with more examples

3. README Badges ✅

Updated: stack-2.9/README.md

Changes:

Added Together AI badge: Together_AI-Supported-green
Updated Multi-Provider feature to include Together AI
Added Together AI environment variables to Configuration section
Evaluation badges remain as "Evaluation In Progress" (pending real scores)

Note: Real benchmark scores will be updated after evaluation completes. The infrastructure is in place.

4. Together AI Documentation ✅

Files Created:

stack-2.9/TOGETHER_AI.md - Comprehensive guide (350+ lines)

Contents:

Overview of Together AI integration
Prerequisites and setup
Environment variables: TOGETHER_API_KEY, TOGETHER_MODEL, MODEL_PROVIDER
Recommended models: togethercomputer/Qwen2.5-Coder-32B-Instruct (primary), plus alternatives (Llama-3-70B, CodeLlama-34B)
Usage examples: CLI, Python API, chat mode, tool calls
Cost estimation table
Performance considerations
Error handling and retries
Comparison with other providers
Troubleshooting guide
Security best practices

Integration in model_client.py (stack-2.9/stack-2.9-eval/model_client.py):

Added TogetherClient class implementing OpenAI-compatible API
Uses base URL https://api.together.xyz/v1
Default model: togethercomputer/Qwen2.5-Coder-32B-Instruct
Updated create_model_client factory to support provider="together"
Updated CLI parser to include "together" option

5. License Compatibility Verification ✅

File Created: stack-2.9/LICENSES.md

Verification Results:

✅ Project Code: MIT License (permissive)
✅ Base Model: Qwen2.5-Coder-32B - Apache 2.0 (Alibaba)
✅ Training Data: Apache 2.0 (manifest.json)
✅ Dependencies: All MIT, Apache 2.0, or BSD (torch, transformers, peft, bitsandbytes, datasets, openai, anthropic, requests, etc.)
✅ Public Datasets: OpenAssistant (Apache 2.0), CodeAct (permissive), CodeContests (permissive), StarCoder (permissive)

Conclusion: All components licensed under permissive terms that allow redistribution, modification, and commercial use. Stack 2.9 can be distributed under MIT for code + Apache 2.0 for model/data.

Additional Artifacts

MODEL_CARD.md conforms to standard model card format (model description, training data, procedure, uses, limitations, license)
inference_examples.py is executable and demonstrates real capabilities
TOGETHER_AI.md provides complete coverage for Together AI deployment
LICENSES.md provides legal clarity for distribution

File Structure (new files)

stack-2.9/
├── MODEL_CARD.md                  (NEW)
├── TOGETHER_AI.md                 (NEW)
├── LICENSES.md                    (NEW)
├── inference_examples.py          (NEW)
├── stack-2.9-eval/
│   └── model_client.py           (MODIFIED - added TogetherClient)
├── README.md                      (MODIFIED - badges, config)
└── ... (existing files)

Quick Start for Together AI Users

# 1. Set environment
export TOGETHER_API_KEY="tog-..."
export MODEL_PROVIDER="together"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# 2. Run inference
python stack.py "Write a function to calculate fibonacci"

# Or run examples
python inference_examples.py --provider together

All tasks completed successfully. The submission package is ready for Together AI integration.