Training Infrastructure Improvements
Status: Audit Complete (Issues Found & Documented)
🔴 CRITICAL: Data Format Mismatch (Training Won't Run)
The Problem
All training scripts expect simple text/chat formats, but the actual training data uses a messages-array format with tool calls:
```
# What scripts expect (WRONG):
{"text": "...", "instruction": "...", "output": "..."}
# What the data actually contains (CORRECT):
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": null, "tool_calls": [...]}, {"role": "tool", ...}], "tools": [...]}
```
Affected Scripts
| Script | Issue |
|---|---|
| `train_simple_nobnb.py` | `tokenize_function` looks for `instruction`/`output` fields, which don't exist |
| `train_local.py` | References `./data/final/train.jsonl`: wrong path and wrong format |
| `train_extended_context.py` | Same `text` field assumption; won't tokenize properly |
| `t4-qlora.yaml` | `text_field: "text"` and `dataset_path: "./data/final/train_combined.jsonl"` are both wrong |
| `extended-context-128k.yaml` | `dataset_path: "./training-data/final/train.jsonl"`: file doesn't exist |
Fix Required
A proper data loader that converts the messages format into training tokens, handling:
- System message prepending
- Tool-call turns (skip or flatten)
- User/assistant turns for language modeling
- Padding and truncation at `max_length`
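If tool-call turns are flattened rather than skipped, one option is to serialize `tool_calls` as JSON text inside the assistant turn. The sketch below makes three assumptions. First, tool calls follow the OpenAI-style shape `{"function": {"name": ..., "arguments": ...}}`. Second, it reuses the plain-text `<|role|>` tags from the loader in fix 1. Third, the tool name `read_file` is made up. `render_turn` is a hypothetical helper.

```python
import json


def render_turn(msg: dict) -> str:
    """Render one message as tagged text; assistant tool calls become a JSON list."""
    role = msg["role"]
    content = msg.get("content") or ""
    if role == "assistant" and msg.get("tool_calls"):
        calls = []
        for call in msg["tool_calls"]:
            fn = call.get("function") or {}
            args = fn.get("arguments") or {}
            # Arguments are often a JSON-encoded string; decode them so the output is uniform
            if isinstance(args, str):
                try:
                    args = json.loads(args)
                except json.JSONDecodeError:
                    pass  # keep the raw string if it is not valid JSON
            calls.append({"name": fn.get("name"), "arguments": args})
        # Keep any text content, then append the serialized calls
        content = (content + "\n" if content else "") + json.dumps(calls, ensure_ascii=False)
    return f"<|{role}|>\n{content}\n"


msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"type": "function",
                    "function": {"name": "read_file", "arguments": '{"path": "a.py"}'}}],
}
print(render_turn(msg))
```

Serializing the calls keeps the model's tool-use decisions in the training text. Skipping them would leave an empty assistant turn.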
🔴 `train_local.py` Issues
- Broken import path: `sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stack/training'))` points to a directory that doesn't exist
- Wrong data path: `./data/final/train.jsonl` should be `./training-data/tool_examples_combined.jsonl`
- Wrong config path: `stack/training/train_config_local.yaml` doesn't exist
- MPS check bug: `torch.backends.mps` only exists in PyTorch 1.12 and later, so an unguarded `torch.backends.mps.is_built()` raises `AttributeError` on older installs. On current builds it simply returns `False` on non-Apple hardware.
- No 4-bit quantization: the script loads the full model in FP32, which will OOM on Mac MPS
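A guarded device check along these lines only calls `torch.backends.mps` when the module exists, and falls back to CPU otherwise. It also degrades gracefully when PyTorch is not installed. `pick_device` is a hypothetical helper, not code from `train_local.py`.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" without assuming any backend exists."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch at all
    if torch.cuda.is_available():
        return "cuda"
    # getattr guard: older PyTorch builds have no torch.backends.mps module
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The same result could gate the 4-bit path mentioned in fix 5, for example by requesting quantization only when it returns `"cuda"`.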
🟡 `t4-qlora.yaml` Issues
- Wrong data path: `./data/final/train_combined.jsonl` doesn't exist
- Wrong format field: `text_field: "text"` won't work with the messages format
- Includes `neat_ft: false`, which is not a valid HF `TrainingArguments` field
- No `push_to_hub_model_id` is set, even though `push_to_hub: true` is templated (current `TrainingArguments` names this field `hub_model_id`)
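One way to catch fields like `neat_ft` before training starts is to compare config keys against the dataclass fields of `transformers.TrainingArguments`. This sketch assumes the YAML has a section whose keys are passed directly to `TrainingArguments`. Script-level keys such as `dataset_path` or `text_field` would also be flagged, so run it only on that section. `unknown_keys` and `training_argument_names` are hypothetical helpers.

```python
import dataclasses


def unknown_keys(config: dict, valid_names: set) -> list:
    """Return config keys that are not in valid_names, sorted for stable output."""
    return sorted(key for key in config if key not in valid_names)


def training_argument_names() -> set:
    """Field names accepted by transformers.TrainingArguments (a dataclass)."""
    from transformers import TrainingArguments  # lazy import; needs transformers installed

    return {f.name for f in dataclasses.fields(TrainingArguments)}


# Illustrative section and name set; in practice pass training_argument_names()
section = {"learning_rate": 2e-4, "neat_ft": False, "push_to_hub": True}
print(unknown_keys(section, {"learning_rate", "push_to_hub", "hub_model_id"}))  # ['neat_ft']
```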
🟡 `extended-context-128k.yaml` Issues
- Wrong data path: `./training-data/final/train.jsonl` doesn't exist
- The file references `Qwen/Qwen2.5-Coder-1.5B`, but it's not clear whether this model already has an extended RoPE configuration
- No verification that the base model actually has `rope_scaling` in its `config.json`
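One way to close this gap is to read the base model's `config.json` and inspect `max_position_embeddings` and `rope_scaling` before launching a 128K run. The sketch below works on an already-loaded config dict. The `example` values are made up for illustration and are not Qwen's actual settings. `supports_context` is a rough heuristic (native window times scaling factor), not an official compatibility check.

```python
def rope_summary(config: dict) -> dict:
    """Pull the context-length settings out of a model's config.json contents."""
    return {
        "max_position_embeddings": config.get("max_position_embeddings"),
        "rope_theta": config.get("rope_theta"),
        # None/absent means no RoPE scaling (e.g. YaRN) is configured
        "rope_scaling": config.get("rope_scaling"),
    }


def supports_context(config: dict, target_len: int) -> bool:
    """Rough check: native window times any RoPE scaling factor must reach target_len."""
    native = config.get("max_position_embeddings") or 0
    scaling = config.get("rope_scaling") or {}
    factor = scaling.get("factor", 1.0)
    return native * factor >= target_len


# In practice, load the real file, e.g. with huggingface_hub:
#   import json
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download("Qwen/Qwen2.5-Coder-1.5B", "config.json")
#   config = json.load(open(path))
example = {"max_position_embeddings": 32768, "rope_scaling": None}
print(rope_summary(example))
print(supports_context(example, 131072))  # False: 32768 * 1.0 < 131072
```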
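🟡 `evaluate_model.py` Issues
- Wrong HumanEval format: the script expects `test_cases` in the problem dict. HumanEval problems instead provide a `test` string (defining `check(candidate)`) and an `entry_point`, alongside the reference `canonical_solution`, and that test code has to be executed against each completion.
- Code execution sandbox is limited: only specific builtins are allowed, so many standard library functions are missing
- No handling of `assert` statements in test code
- `calculate_pass_at_k` has a bug: `correct_in_k = sum(correct_flags[:min(k, len(correct_flags))])` counts correct samples among the first k generated, which is not pass@k. pass@k is the probability that at least one of k samples drawn from the n generated is correct. With c correct samples, it is estimated as `1 - C(n - c, k) / C(n, k)`.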
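For reference, the unbiased estimator published with HumanEval (Chen et al., 2021) computes pass@k per problem from n generated samples, of which c passed, and then averages across problems. Below is a minimal pure-Python sketch. The function names are illustrative and do not come from `evaluate_model.py`.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sample budget being scored.
    Equals 1 - C(n - c, k) / C(n, k): the chance that k samples drawn without
    replacement from the n include at least one correct one.
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list, k: int) -> float:
    """Average pass@k over problems; results holds one list of bool flags per problem."""
    return sum(pass_at_k(len(flags), sum(flags), k) for flags in results) / len(results)


flags = [False, True, False, False, True]  # 5 samples, 2 correct
print(pass_at_k(len(flags), sum(flags), 1))  # 0.4
```

Unlike slicing the first k flags, this result does not depend on the order in which samples were generated.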
🟢 What's Working Well
- `train_simple_nobnb.py`: good mixed-precision logic, proper bf16/fp16 detection, paged AdamW optimizer, and gradient checkpointing with `use_reentrant=False`
- Training configs: comprehensive hardware-specific settings, well documented
- Recipes: good documentation of GPU requirements and expected runtimes
- LoRA config: properly targets all relevant modules for Qwen
Recommended Fixes (Priority Order)
1. Fix Data Loaders (Highest Priority)
Add a proper `load_chat_data()` function to `train_simple_nobnb.py`:
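```python
from datasets import load_dataset


def load_chat_data(data_path: str, tokenizer, max_length: int = 2048, train_split: float = 0.9):
    """Load messages-format dataset and convert to training tokens."""
    raw_dataset = load_dataset("json", data_files=data_path, split="train")
    # padding="max_length" needs a pad token; fall back to EOS if the tokenizer has none
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    def tokenize_messages(example):
        messages = example["messages"]
        # Flatten to: system + user + assistant (+ tool) turns.
        # Note: these role tags are plain text, not Qwen's ChatML special tokens (<|im_start|>/<|im_end|>)
        text = ""
        for msg in messages:
            role = msg["role"]
            content = msg.get("content", "") or ""
            if role == "system":
                text += f"<|system|>\n{content}\n"
            elif role == "user":
                text += f"<|user|>\n{content}\n"
            elif role == "assistant":
                # Skip tool calls in content for now, just use text response
                text += f"<|assistant|>\n{content}\n"
            elif role == "tool":
                text += f"<|tool|>\n{content}\n"
        # End with EOS instead of a dangling "<|assistant|>" generation prompt
        text += tokenizer.eos_token or ""
        result = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
        # Exclude padding positions from the loss (-100 is ignored by the HF loss)
        result["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(result["input_ids"], result["attention_mask"])
        ]
        return result

    tokenized = raw_dataset.map(tokenize_messages, remove_columns=raw_dataset.column_names)
    # Hold out (1 - train_split) of the examples for evaluation
    split = tokenized.train_test_split(test_size=1 - train_split, seed=42)
    return split["train"], split["test"]
```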
2. Fix All Data Paths
| File | Current (Wrong) | Correct |
|---|---|---|
| `t4-qlora.yaml` | `./data/final/train_combined.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `extended-context-128k.yaml` | `./training-data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `train_local.py` | `./data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
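After updating the paths, a short check run from the `training/` directory can confirm that each corrected path resolves to a real file. `missing_paths` is a hypothetical helper. It assumes the paths are relative to the working directory the scripts are launched from.

```python
from pathlib import Path


def missing_paths(paths: dict, root: Path) -> dict:
    """Keep only the entries whose data path does not exist as a file under root."""
    return {name: p for name, p in paths.items() if not (root / p).is_file()}


# Paths from the "Correct" column, resolved against the assumed working directory
corrected = {
    "t4-qlora.yaml": "./training-data/tool_examples_combined.jsonl",
    "extended-context-128k.yaml": "./training-data/tool_examples_combined.jsonl",
    "train_local.py": "./training-data/tool_examples_combined.jsonl",
}
print(missing_paths(corrected, Path(".")))
```

An empty dict means every configured path exists.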
3. Fix `t4-qlora.yaml`
- Remove `neat_ft: false` (not a valid field)
- Add an `output_dir` override, or create `training-configs/t4-qlora-data-fix.yaml`
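If the second option is chosen, the new file could be generated from the existing one rather than edited by hand. The sketch makes three assumptions. `t4-qlora.yaml` is a flat mapping with top-level `dataset_path`, `neat_ft`, and `output_dir` keys. It lives in `training-configs/`. PyYAML is installed. `patch_t4_config`, `write_patched`, and the `output_dir` value are hypothetical. `text_field` is left alone because its correct value depends on how the new loader is wired in.

```python
def patch_t4_config(cfg: dict, dataset_path: str, output_dir: str) -> dict:
    """Return a copy of a flat t4-qlora config with the data-path fixes applied."""
    patched = dict(cfg)  # shallow copy; the original dict is left untouched
    patched.pop("neat_ft", None)  # not a TrainingArguments field
    patched["dataset_path"] = dataset_path
    patched["output_dir"] = output_dir
    return patched


def write_patched(src: str, dst: str, dataset_path: str, output_dir: str) -> None:
    """Read src YAML, patch it, and write the result to dst (requires PyYAML)."""
    import yaml  # lazy import so patch_t4_config works without PyYAML

    with open(src, encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    with open(dst, "w", encoding="utf-8") as f:
        yaml.safe_dump(patch_t4_config(cfg, dataset_path, output_dir), f, sort_keys=False)


# Example call (paths assumed; output_dir is a placeholder):
# write_patched("training-configs/t4-qlora.yaml",
#               "training-configs/t4-qlora-data-fix.yaml",
#               "./training-data/tool_examples_combined.jsonl",
#               "./output/t4-qlora-data-fix")
```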
4. Fix `evaluate_model.py`
- Add proper HumanEval problem loading (use the `openai/openai_humaneval` dataset from Hugging Face)
- Fix the pass@k calculation
- Expand safe builtins for code execution
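Each HumanEval problem's `test` string defines `check(candidate)`. One way to address the format, sandbox, and `assert` issues together is to run `prompt + completion + test + check(entry_point)` in a separate Python process. A failing `assert` then becomes a non-zero exit code, and the full standard library is available. This swaps the restricted-builtins approach for process isolation. Process isolation is not a security sandbox, so untrusted completions should also run in a container or under resource limits. `run_humaneval_check` and the `toy` problem are hypothetical. Real problems could come from `load_dataset("openai/openai_humaneval", split="test")`.

```python
import subprocess
import sys


def run_humaneval_check(problem: dict, completion: str, timeout: float = 10.0) -> bool:
    """Run one HumanEval-style problem's tests against a completion in a fresh interpreter.

    The program is prompt + completion + test + a call to check(entry_point), so a
    failing assert (or any exception) shows up as a non-zero exit code.
    Note: a subprocess isolates the evaluator's state; it is not a security sandbox.
    """
    program = (
        problem["prompt"] + completion + "\n\n"
        + problem["test"] + "\n\n"
        + f"check({problem['entry_point']})\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # infinite loops count as failures
    return proc.returncode == 0


toy = {
    "prompt": "def add(a, b):\n",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
print(run_humaneval_check(toy, "    return a + b\n"))  # True
print(run_humaneval_check(toy, "    return a - b\n"))  # False
```

The per-sample booleans this returns are the correctness flags that feed the pass@k calculation.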
5. Fix `train_local.py`
- Remove the broken `stack/training` import path
- Add proper 4-bit quantization support for MPS (or detect CUDA availability)
- Fix data and config paths
Actual Training Data Location
```
/Users/walidsobhi/stack-2.9/training/training-data/
├── tool_examples.jsonl (1000 lines)
├── tool_examples_combined.jsonl (1500 lines)
└── tool_examples.json (same data, json format)
```
Format: `{"messages": [...], "tools": [...]}` (a messages array with tool calls).
Quick Test Command
To verify training would work after fixes:
```bash
cd /Users/walidsobhi/stack-2.9/training
python -c "
from datasets import load_dataset
ds = load_dataset('json', data_files='training-data/tool_examples_combined.jsonl', split='train')
print(f'Total examples: {len(ds)}')
print(f'Keys: {ds.column_names}')
print(f'Example: {ds[0]}')
"
```
Expected keys: `['messages', 'tools']`, not `['text']` or `['instruction', 'output']`.
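`ds[0]` only shows the first record. A stdlib-only pass over every line, sketched below, reports how many records share each set of top-level keys, so a file that mixes formats stands out immediately. It also does not need `datasets` installed. `key_sets` is a hypothetical helper that accepts any iterable of JSONL lines, such as an open file.

```python
import json
from collections import Counter


def key_sets(lines) -> Counter:
    """Count the distinct sets of top-level keys across JSONL lines."""
    counts = Counter()
    for line in lines:
        if line.strip():  # ignore blank lines
            counts[tuple(sorted(json.loads(line)))] += 1
    return counts


# From the training/ directory:
# with open("training-data/tool_examples_combined.jsonl", encoding="utf-8") as f:
#     print(key_sets(f))
```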
Next Steps
- Write a proper `load_chat_data()` function in a shared `data_utils.py`
- Update `train_simple_nobnb.py` to use it
- Update all YAML configs with correct data paths
- Test with 1 epoch on a small sample
- Then scale to full training on Kaggle/A100
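For the small-sample step, one option is a fixed-seed random subset written to its own JSONL file. That way the 1-epoch run sees a representative slice rather than just the first lines of the file. `sample_lines` is a hypothetical helper that uses reservoir sampling. The output file name in the usage comment is a placeholder.

```python
import random


def sample_lines(lines, n: int, seed: int = 0) -> list:
    """Pick n non-blank lines uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    reservoir = []
    non_blank = (ln.rstrip("\n") + "\n" for ln in lines if ln.strip())
    for i, line in enumerate(non_blank):
        if i < n:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # replace a kept line with probability n / (i + 1)
            if j < n:
                reservoir[j] = line
    return reservoir


# with open("training-data/tool_examples_combined.jsonl", encoding="utf-8") as src:
#     subset = sample_lines(src, 100)
# with open("training-data/sample_100.jsonl", "w", encoding="utf-8") as dst:
#     dst.writelines(subset)
```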