# Training Infrastructure Improvements
## Status: Audit Complete, Issues Found & Documented
---
## 🔴 CRITICAL: Data Format Mismatch (Training Won't Run)
### The Problem
All training scripts expect simple text/chat formats, but the actual training data uses a **messages-array format with tool calls**:
```python
# What scripts expect (WRONG):
{"text": "...", "instruction": "...", "output": "..."}
# What the data actually contains (CORRECT):
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": null, "tool_calls": [...]}, {"role": "tool", ...}], "tools": [...]}
```
### Affected Scripts
| Script | Issue |
|--------|-------|
| `train_simple_nobnb.py` | `tokenize_function` looks for `instruction`/`output` fields, which don't exist in the data |
| `train_local.py` | References `./data/final/train.jsonl`: wrong path and wrong format |
| `train_extended_context.py` | Same `text` field assumption, so examples won't tokenize properly |
| `t4-qlora.yaml` | `text_field: "text"` and `dataset_path: "./data/final/train_combined.jsonl"` are both wrong |
| `extended-context-128k.yaml` | `dataset_path: "./training-data/final/train.jsonl"` points to a file that doesn't exist |
### Fix Required
A proper data loader that converts the `messages` format to training tokens, handling:
- System message prepending
- Tool-call turns (skip or flatten)
- User/assistant turns for language modeling
- Padding and truncation at `max_length`
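
The flattening step in the list above could look roughly like the sketch below. It is only an illustration: `flatten_messages`, `ROLE_TAGS`, and the `keep_tool_turns` switch are hypothetical names, and the role tags are placeholders that may not match the tokens used by the base model's chat template. Padding and truncation are left to the tokenizer call that consumes the returned string.

```python
import json

# Placeholder role tags (an assumption); swap in whatever the chat template uses.
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
    "tool": "<|tool|>",
}


def flatten_messages(messages, keep_tool_turns=True):
    """Render a messages array as a single training string."""
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ROLE_TAGS:
            continue  # ignore roles this sketch does not know about
        is_tool_turn = role == "tool" or bool(msg.get("tool_calls"))
        if is_tool_turn and not keep_tool_turns:
            continue  # "skip" strategy: drop tool calls and tool results
        content = msg.get("content") or ""  # content is null on tool-call turns
        if msg.get("tool_calls"):
            # "flatten" strategy: serialize the structured calls as JSON text
            calls = json.dumps(msg["tool_calls"], sort_keys=True)
            content = f"{content}\n{calls}".strip()
        parts.append(f"{ROLE_TAGS[role]}\n{content}\n")
    return "".join(parts)
```

Passing `keep_tool_turns=False` gives the "skip" behaviour; the default keeps tool calls in the text so the model can still learn the call syntax.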
---
## 🔴 train_local.py Issues
1. **Broken import path**: `sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stack/training'))` points to a directory that doesn't exist
2. **Wrong data path**: `./data/final/train.jsonl` should be `./training-data/tool_examples_combined.jsonl`
3. **Wrong config path**: `stack/training/train_config_local.yaml` doesn't exist
4. **MPS check bug**: `torch.backends.mps.is_built()` is called without a guard, so it raises `AttributeError` on PyTorch builds that lack the `torch.backends.mps` module (releases before 1.12); `is_built()` also reports only compile-time support, not whether an MPS device is usable
5. **No 4-bit quantization**: loads the full model in FP32, which will OOM on Mac MPS
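
A guarded device check along the following lines would avoid the `AttributeError` described in item 4. This is a sketch, not the script's actual code: `pick_device` is a hypothetical helper, and it takes the `torch` module as an argument only so that it can be exercised without PyTorch installed.

```python
from types import SimpleNamespace


def pick_device(torch_module):
    """Return "cuda", "mps", or "cpu" without assuming any backend exists."""
    cuda = getattr(torch_module, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"
    backends = getattr(torch_module, "backends", None)
    mps = getattr(backends, "mps", None)  # missing on old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Real usage (assumes PyTorch is installed):
#     import torch
#     device = pick_device(torch)

# Stand-in for a PyTorch build that has no `backends.mps` attribute.
old_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    backends=SimpleNamespace(),
)
print(pick_device(old_torch))  # prints: cpu
```

Using `is_available()` rather than `is_built()` checks that a device can actually be used at runtime.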
---
## 🟡 t4-qlora.yaml Issues
1. **Wrong data path**: `./data/final/train_combined.jsonl` doesn't exist
2. **Wrong format field**: `text_field: "text"` won't work with messages format
3. **Invalid field**: includes `neat_ft: false`, which is not a valid HF `TrainingArguments` field
4. **No `push_to_hub_model_id`** despite `push_to_hub: true` being templated (current Transformers releases name this argument `hub_model_id`)
---
## 🟡 extended-context-128k.yaml Issues
1. **Wrong data path**: `./training-data/final/train.jsonl` doesn't exist
2. **Unverified base model**: the file references `Qwen/Qwen2.5-Coder-1.5B`, but it's unclear whether this model already ships an extended RoPE config
3. **No verification** that the base model actually has `rope_scaling` in its config.json
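
One way to run the missing verification is to inspect a local copy of the base model's `config.json` before training. The sketch below assumes the file has already been downloaded; `rope_summary`, `needs_rope_scaling`, and `load_config` are hypothetical helpers, and the numbers in the example are illustrative rather than read from the real model.

```python
import json
from pathlib import Path


def load_config(path):
    """Parse a Hugging Face config.json from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def rope_summary(config):
    """Pick the context-length fields out of a parsed config dict."""
    return {
        "max_position_embeddings": config.get("max_position_embeddings"),
        "rope_theta": config.get("rope_theta"),
        "rope_scaling": config.get("rope_scaling"),  # None if absent or null
    }


def needs_rope_scaling(config, target_context):
    """True when the target exceeds the native window and no scaling is set."""
    native = config.get("max_position_embeddings") or 0
    return target_context > native and not config.get("rope_scaling")


# Illustrative values only; load the real file with load_config(...) instead.
example = {"max_position_embeddings": 32768, "rope_theta": 1000000.0}
print(rope_summary(example))
print(needs_rope_scaling(example, target_context=131072))
```

If `needs_rope_scaling` returns `True`, the extended-context config would have to supply its own `rope_scaling` settings rather than rely on the base model.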
---
## 🟡 evaluate_model.py Issues
1. **Wrong HumanEval format**: expects `test_cases` in the problem dict, but HumanEval typically uses `canonical_solution` + `test` strings that need to be executed
2. **Code execution sandbox is limited**: it allows only specific builtins, and many standard library functions are missing
3. **No handling** of `assert` statements in test code
4. **`calculate_pass_at_k`** has a bug: `correct_in_k = sum(correct_flags[:min(k, len(correct_flags))])` only counts the first k samples. pass@k should estimate the probability that at least one of k samples drawn from all generated samples is correct
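
For item 4, a common replacement is the unbiased estimator used with HumanEval: given `n` generated samples of which `c` pass, pass@k = 1 - C(n - c, k) / C(n, k). The sketch below is a drop-in idea, not the existing function's code; the name `pass_at_k` and its signature are assumptions.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=2, c=1, k=1))  # prints: 0.5
```

The per-problem values are then averaged over the benchmark. Unlike slicing the first k flags, the result does not depend on the order in which samples were generated.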
---
## 🟢 What's Working Well
- **`train_simple_nobnb.py`**: Good mixed precision logic, proper bf16/fp16 detection, paged AdamW optimizer, gradient checkpointing with `use_reentrant=False`
- **Training configs**: Comprehensive hardware-specific settings, well-documented
- **Recipes**: Good documentation of GPU requirements and expected runtimes
- **LoRA config**: Properly targets all relevant modules for Qwen
---
## ✅ Recommended Fixes (Priority Order)
### 1. Fix Data Loaders (Highest Priority)
Add a proper `load_chat_data()` function to `train_simple_nobnb.py`:
```python
from datasets import load_dataset


def load_chat_data(data_path: str, tokenizer, max_length: int = 2048, train_split: float = 0.9):
    """Load messages-format dataset and convert to training tokens."""
    raw_dataset = load_dataset("json", data_files=data_path, split="train")

    def tokenize_messages(example):
        messages = example["messages"]
        # Flatten to: system + user + assistant turns
        text = ""
        for msg in messages:
            role = msg["role"]
            content = msg.get("content", "") or ""  # content is null on tool-call turns
            if role == "system":
                text += f"<|system|>\n{content}\n"
            elif role == "user":
                text += f"<|user|>\n{content}\n"
            elif role == "assistant":
                # Skip tool calls in content for now, just use text response
                text += f"<|assistant|>\n{content}\n"
            elif role == "tool":
                text += f"<|tool|>\n{content}\n"
        text += "<|assistant|>"
        result = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
        # Mask padding positions so they do not contribute to the loss
        result["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(result["input_ids"], result["attention_mask"])
        ]
        return result

    tokenized = raw_dataset.map(tokenize_messages, remove_columns=raw_dataset.column_names)
    # Hold out (1 - train_split) of the examples for evaluation
    split = tokenized.train_test_split(test_size=1.0 - train_split, seed=42)
    return split["train"], split["test"]
```
### 2. Fix All Data Paths
| Config File | Current (Wrong) | Correct |
|-------------|-----------------|---------|
| `t4-qlora.yaml` | `./data/final/train_combined.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `extended-context-128k.yaml` | `./training-data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `train_local.py` | `./data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
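
Because every issue in this table is a path that does not resolve, a small pre-flight check can catch regressions before a training run starts. The sketch below is an assumption about how such a check might look; `missing_paths` is a hypothetical helper and the path list is copied from the "Correct" column above.

```python
from pathlib import Path


def missing_paths(paths, root="."):
    """Return the entries of `paths` that are not existing files under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    expected = ["training-data/tool_examples_combined.jsonl"]
    missing = missing_paths(expected)
    if missing:
        raise SystemExit(f"Missing data files: {missing}")
    print("All data paths resolve.")
```

Run it from the training directory so the relative paths resolve the same way they do for the training scripts.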
### 3. Fix t4-qlora.yaml
- Remove `neat_ft: false` (not a valid field)
- Add `output_dir` override or create `training-configs/t4-qlora-data-fix.yaml`
### 4. Fix evaluate_model.py
- Add proper HumanEval problem loading (use the `openai/openai_humaneval` dataset from Hugging Face)
- Fix pass@k calculation
- Expand safe builtins for code execution
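
As a rough illustration of the HumanEval fix, each problem carries `prompt`, `test`, and `entry_point` fields; the `test` string defines a `check(candidate)` function made of `assert` statements. A completion can be scored by concatenating these pieces and running them in a separate interpreter. `build_humaneval_program` and `passes` are hypothetical names, and a subprocess with a timeout is not a security sandbox, so untrusted completions still need real isolation.

```python
import subprocess
import sys


def build_humaneval_program(problem, completion):
    """Assemble prompt + completion + tests into one runnable script."""
    return (
        problem["prompt"] + completion + "\n\n"
        + problem["test"] + "\n\n"
        + f"check({problem['entry_point']})\n"
    )


def passes(problem, completion, timeout_s=10.0):
    """Run the assembled script; a failed assert gives a non-zero exit code."""
    program = build_humaneval_program(problem, completion)
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


# Toy problem in the same shape as a HumanEval record (not a real task).
toy = {
    "prompt": 'def add(a, b):\n    """Return a + b."""\n',
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}
print(passes(toy, "    return a + b\n"))
```

Running the tests in a child process means `assert` failures, exceptions, and infinite loops all reduce to a boolean, which removes the need for a hand-maintained list of safe builtins inside the evaluator process.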
### 5. Fix train_local.py
- Remove broken `stack/training` import path
- Add proper 4-bit quantization support for MPS (or detect CUDA availability)
- Fix data and config paths
---
## 📁 Actual Training Data Location
```
/Users/walidsobhi/stack-2.9/training/training-data/
├── tool_examples.jsonl (1000 lines)
├── tool_examples_combined.jsonl (1500 lines)
└── tool_examples.json (same data, json format)
```
Format: `{"messages": [...], "tools": [...]}`, a messages array with tool calls.
---
## 🚀 Quick Test Command
To verify training would work after fixes:
```bash
cd /Users/walidsobhi/stack-2.9/training
python -c "
from datasets import load_dataset
ds = load_dataset('json', data_files='training-data/tool_examples_combined.jsonl', split='train')
print(f'Total examples: {len(ds)}')
print(f'Keys: {ds.column_names}')
print(f'Example: {ds[0]}')
"
```
Expected `Keys` output: `['messages', 'tools']`, not `['text']` or `['instruction', 'output']`.
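
Beyond checking the column names, a per-record check can confirm that every line really follows the messages format before it reaches the tokenizer. The sketch below is an assumption about what such a check could cover; `validate_record`, `validate_jsonl`, and `ALLOWED_ROLES` are hypothetical names, and the role set is taken from the example record shown earlier in this document.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_record(record):
    """Return a list of problems found in one dataset record (empty if none)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        elif msg.get("content") is None and not msg.get("tool_calls"):
            problems.append(f"message {i}: no content and no tool_calls")
    if "tools" not in record:
        problems.append("missing 'tools' key")
    return problems


def validate_jsonl(path):
    """Map 1-based line numbers to the problems found on that line."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            problems = validate_record(json.loads(line))
            if problems:
                report[line_no] = problems
    return report
```

An empty dict from `validate_jsonl("training-data/tool_examples_combined.jsonl")` would mean every record has the expected shape.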
---
## Next Steps
1. Write a proper `load_chat_data()` function in a shared `data_utils.py`
2. Update `train_simple_nobnb.py` to use it
3. Update all YAML configs with correct data paths
4. Test with 1 epoch on small sample
5. Then scale to full training on Kaggle/A100