Stack-2-9-finetuned / training /IMPROVEMENTS.md

walidsobhie-code

fix: optimize model card badges and clean YAML frontmatter

2064035 about 2 months ago


	# Training Infrastructure Improvements

	## Status: Audit Complete — Issues Found & Documented

	---

	## 🔴 CRITICAL: Data Format Mismatch (Training Won't Run)

	### The Problem
	All training scripts expect simple text/chat formats, but the actual training data uses a messages-array format with tool calls:

	```python
	# What scripts expect (WRONG):
	{"text": "...", "instruction": "...", "output": "..."}

	# What the data actually contains (CORRECT):
	{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": null, "tool_calls": [...]}, {"role": "tool", ...}], "tools": [...]}
	```

	### Affected Scripts
| Script | Issue |
|--------|-------|
| `train_simple_nobnb.py` | `tokenize_function` looks for `instruction`/`output` fields, which don't exist in the data |
| `train_local.py` | References `./data/final/train.jsonl`: wrong path and wrong format |
| `train_extended_context.py` | Same `text` field assumption, so it won't tokenize properly |
| `t4-qlora.yaml` | `text_field: "text"` and `dataset_path: "./data/final/train_combined.jsonl"` are both wrong |
| `extended-context-128k.yaml` | `dataset_path: "./training-data/final/train.jsonl"` points to a file that doesn't exist |

	### Fix Required
	A proper data loader that converts the `messages` format to training tokens, handling:
	- System message prepending
	- Tool-call turns (skip or flatten)
	- User/assistant turns for language modeling
	- Padding and truncation at `max_length`
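
One way to handle the tool-call turns listed above is to flatten each call into text, so the assistant turn is not left empty. The following is a minimal sketch, assuming the OpenAI-style layout in which every entry of `tool_calls` carries a `function` object with `name` and `arguments`. The `<tool_call>` wrapper, the helper name `render_assistant_turn`, and the `read_file` tool in the demo are illustrative choices, not something the training scripts define.

```python
import json


def render_assistant_turn(msg: dict) -> str:
    """Return the text of an assistant message, with any tool calls flattened."""
    parts = []
    content = msg.get("content") or ""  # content is null on tool-call turns
    if content:
        parts.append(content)
    for call in msg.get("tool_calls") or []:
        fn = call.get("function", {})  # assumed OpenAI-style layout
        args = fn.get("arguments", {})
        if isinstance(args, str):
            # Arguments are often stored as a JSON string; decode when possible.
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                pass
        payload = {"name": fn.get("name"), "arguments": args}
        # <tool_call> is a hypothetical wrapper tag chosen for this sketch.
        parts.append("<tool_call>" + json.dumps(payload, sort_keys=True) + "</tool_call>")
    return "\n".join(parts)


# Hypothetical example record; "read_file" is not a tool name taken from the dataset.
example = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "a.txt"}'}}],
}
print(render_assistant_turn(example))
```

Skipping tool-call turns entirely is simpler, but flattening keeps the tool-use signal in the training text, which is presumably the point of this dataset.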

	---

	## 🔴 train_local.py Issues

	1. Broken import path — `sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stack/training'))` points to a directory that doesn't exist
	2. Wrong data path — `./data/final/train.jsonl` should be `./training-data/tool_examples_combined.jsonl`
	3. Wrong config path — `stack/training/train_config_local.yaml` doesn't exist
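	4. MPS check bug: `torch.backends.mps.is_built()` raises `AttributeError` on PyTorch builds that predate the `torch.backends.mps` module (before 1.12). On newer builds it returns `False` on non-Apple hardware, so the check should be guarded and paired with `torch.backends.mps.is_available()`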
	5. No 4-bit quantization — loads full model in FP32, will OOM on Mac MPS
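
A guarded device check avoids the MPS failure described in item 4. This is a minimal sketch: the helper name `pick_device` is hypothetical, and it takes the `torch` module as an argument only so the logic can be exercised without PyTorch installed.

```python
from types import SimpleNamespace


def pick_device(torch_module) -> str:
    """Choose "cuda", "mps", or "cpu" without assuming torch.backends.mps exists."""
    if torch_module.cuda.is_available():
        return "cuda"
    # getattr guards against builds that have no torch.backends.mps module.
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Stand-in for a PyTorch build without torch.backends.mps (illustration only).
old_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    backends=SimpleNamespace(),
)
print(pick_device(old_torch))  # cpu
```

In `train_local.py` the call would be `pick_device(torch)` after `import torch`.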

	---

	## 🟡 t4-qlora.yaml Issues

	1. Wrong data path: `./data/final/train_combined.jsonl` doesn't exist
	2. Wrong format field: `text_field: "text"` won't work with messages format
	3. Includes `neat_ft: false` — this is not a valid HF TrainingArguments field
	4. No `hub_model_id` (the current name for the deprecated `push_to_hub_model_id`) despite `push_to_hub: true` being templated

	---

	## 🟡 extended-context-128k.yaml Issues

	1. Wrong data path: `./training-data/final/train.jsonl` doesn't exist
	2. File references `Qwen/Qwen2.5-Coder-1.5B` but it's not clear if this model already has extended RoPE config
	3. No verification that the base model actually has `rope_scaling` in its config.json
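
The `rope_scaling` check in item 3 can be done against a local copy of the base model's `config.json`. This is a minimal sketch, assuming the file has already been downloaded. The helper name `describe_rope_scaling` is hypothetical and the demo config values are made up.

```python
import json
import tempfile
from pathlib import Path


def describe_rope_scaling(config_path: str):
    """Return the rope_scaling entry from a model config.json, or None if absent or null."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("rope_scaling")


# Demo with a throwaway config file; the values are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"max_position_embeddings": 32768}), encoding="utf-8")
    print(describe_rope_scaling(str(path)))  # None: no rope_scaling key
```

A `None` result means the extended-context config cannot rely on the base model for RoPE scaling and has to set it explicitly.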

	---

	## 🟡 evaluate_model.py Issues

	1. Wrong HumanEval format: the script expects `test_cases` in the problem dict, but HumanEval problems provide `prompt`, `canonical_solution`, `test`, and `entry_point` strings, and the `test` code has to be executed against the generated completion
	2. Code execution sandbox is limited — only allows specific builtins; many standard library functions missing
	3. No handling of `assert` statements in test code
	4. `calculate_pass_at_k` has a bug: `correct_in_k = sum(correct_flags[:min(k, len(correct_flags))])` only inspects the first k samples. pass@k should be estimated from all n samples generated for a problem, c of which are correct, as `1 - C(n-c, k) / C(n, k)`
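
For item 4, the unbiased pass@k estimator commonly used with HumanEval can be sketched as follows. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and do not come from `evaluate_model.py`. The input is assumed to be one list of boolean correctness flags per problem.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k) for n samples with c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_flags_per_problem, k: int) -> float:
    """Average pass@k over problems, using all samples of each problem."""
    scores = [pass_at_k(len(flags), sum(flags), k) for flags in correct_flags_per_problem]
    return sum(scores) / len(scores)


# Two problems with four samples each: one correct sample in the first, none in the second.
results = [[True, False, False, False], [False, False, False, False]]
print(mean_pass_at_k(results, k=2))  # 0.25
```

Unlike slicing the first k flags, this result does not depend on the order in which samples were generated.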

	---

	## 🟢 What's Working Well

	- `train_simple_nobnb.py` — Good mixed precision logic, proper bf16/fp16 detection, paged AdamW optimizer, gradient checkpointing with `use_reentrant=False`
	- Training configs — Comprehensive hardware-specific settings, well-documented
	- Recipes — Good documentation of GPU requirements and expected runtimes
	- LoRA config — Properly targets all relevant modules for Qwen

	---

	## ✅ Recommended Fixes (Priority Order)

	### 1. Fix Data Loaders (Highest Priority)
	Add a proper `load_chat_data()` function to `train_simple_nobnb.py`:

```python
from datasets import load_dataset


def load_chat_data(data_path: str, tokenizer, max_length: int = 2048, train_split: float = 0.9):
    """Load messages-format dataset and convert to training tokens."""
    raw_dataset = load_dataset("json", data_files=data_path, split="train")

    def tokenize_messages(example):
        messages = example["messages"]
        # Flatten to: system + user + assistant turns
        text = ""
        for msg in messages:
            role = msg["role"]
            content = msg.get("content", "") or ""  # content is null on tool-call turns
            if role == "system":
                text += f"<|system|>\n{content}\n"
            elif role == "user":
                text += f"<|user|>\n{content}\n"
            elif role == "assistant":
                # Skip tool calls in content for now, just use text response
                text += f"<|assistant|>\n{content}\n"
            elif role == "tool":
                text += f"<|tool|>\n{content}\n"
        text += "<|assistant|>"

        result = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
        # Mask padding positions so they do not contribute to the loss
        result["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(result["input_ids"], result["attention_mask"])
        ]
        return result

    tokenized = raw_dataset.map(tokenize_messages, remove_columns=raw_dataset.column_names)
    # Hold out (1 - train_split) of the examples for evaluation
    split = tokenized.train_test_split(test_size=1.0 - train_split, seed=42)
    return split["train"], split["test"]
```

	### 2. Fix All Data Paths
| Config File | Current (Wrong) | Correct |
|-------------|-----------------|---------|
| `t4-qlora.yaml` | `./data/final/train_combined.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `extended-context-128k.yaml` | `./training-data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `train_local.py` | `./data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
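
After updating the paths in the table above, a quick existence check can catch a stale path before a training run starts. This is a minimal sketch. The helper name `missing_paths` is hypothetical, and it assumes the script is run from the `training/` directory so the relative paths resolve.

```python
from pathlib import Path

# Corrected dataset path per file, as listed in the table above.
EXPECTED_PATHS = {
    "t4-qlora.yaml": "./training-data/tool_examples_combined.jsonl",
    "extended-context-128k.yaml": "./training-data/tool_examples_combined.jsonl",
    "train_local.py": "./training-data/tool_examples_combined.jsonl",
}


def missing_paths(paths: dict, root: str = ".") -> dict:
    """Return the subset of entries whose dataset file does not exist under root."""
    return {name: p for name, p in paths.items() if not (Path(root) / p).is_file()}


for name, p in missing_paths(EXPECTED_PATHS).items():
    print(f"{name}: {p} not found")
```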

	### 3. Fix t4-qlora.yaml
	- Remove `neat_ft: false` (not a valid field)
	- Add `output_dir` override or create `training-configs/t4-qlora-data-fix.yaml`

	### 4. Fix evaluate_model.py
	- Add proper HumanEval problem loading (use the `openai/openai_humaneval` dataset from the Hugging Face Hub)
	- Fix pass@k calculation
	- Expand safe builtins for code execution
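
One possible shape for the HumanEval fix is to assemble each problem into a single script and run it in a separate interpreter, so `assert` failures show up as a non-zero exit code. This is a minimal sketch. The helper names `build_test_program` and `passes` are hypothetical, the toy problem is not a real benchmark item, and a subprocess with a timeout is not a security sandbox.

```python
import subprocess
import sys


def build_test_program(problem: dict, completion: str) -> str:
    """Concatenate prompt, completion, and test code into one runnable script."""
    return (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + "\n\n"
        + f"check({problem['entry_point']})\n"
    )


def passes(problem: dict, completion: str, timeout: float = 10.0) -> bool:
    """Run the assembled script in a separate interpreter; exit code 0 means pass."""
    program = build_test_program(problem, completion)
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


# Toy problem in the HumanEval field layout (not a real benchmark item).
toy = {
    "prompt": "def add(a, b):\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}
print(passes(toy, "    return a + b\n"))  # True
print(passes(toy, "    return a - b\n"))  # False
```

Running the generated code in its own process also sidesteps the restricted-builtins problem, at the cost of weaker isolation than a real sandbox.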

	### 5. Fix train_local.py
	- Remove broken `stack/training` import path
	- Add proper 4-bit quantization support for MPS (or detect CUDA availability)
	- Fix data and config paths

	---

	## 📁 Actual Training Data Location

	```
	/Users/walidsobhi/stack-2.9/training/training-data/
	├── tool_examples.jsonl (1000 lines)
	├── tool_examples_combined.jsonl (1500 lines)
	└── tool_examples.json (same data, json format)
	```

	Format: `{"messages": [...], "tools": [...]}` — messages-array with tool calls.
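
A per-line check of this format can flag malformed records before tokenization. This is a minimal sketch, assuming every record must have a non-empty `messages` list whose entries carry a `role`, plus a `tools` list. The helper name `validate_record` is hypothetical.

```python
import json


def validate_record(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("missing or empty 'messages' list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg:
                problems.append(f"message {i} has no 'role'")
    if not isinstance(record.get("tools"), list):
        problems.append("missing 'tools' list")
    return problems


good = '{"messages": [{"role": "user", "content": "hi"}], "tools": []}'
bad = '{"text": "hi"}'
print(validate_record(good))  # []
print(validate_record(bad))   # both the 'messages' and 'tools' checks fail
```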

	---

	## 🚀 Quick Test Command

	To verify training would work after fixes:

	```bash
	cd /Users/walidsobhi/stack-2.9/training
	python -c "
	from datasets import load_dataset
	ds = load_dataset('json', data_files='training-data/tool_examples_combined.jsonl', split='train')
	print(f'Total examples: {len(ds)}')
	print(f'Keys: {ds.column_names}')
	print(f'Example: {ds[0]}')
	"
	```

	Expected output: the `Keys:` line prints `['messages', 'tools']`, not `['text']` or `['instruction', 'output']`.

	---

	## Next Steps

	1. Write a proper `load_chat_data()` function in a shared `data_utils.py`
	2. Update `train_simple_nobnb.py` to use it
	3. Update all YAML configs with correct data paths
	4. Test with 1 epoch on small sample
	5. Then scale to full training on Kaggle/A100