# Training Infrastructure Improvements

## Status: Audit Complete, Issues Found and Documented

---

## 🔴 CRITICAL: Data Format Mismatch (Training Won't Run)

### The Problem

All training scripts expect flat `text` or `instruction`/`output` records, but the actual training data uses a **messages-array format with tool calls**:

```python
# What the scripts expect (does not match the data):
{"text": "...", "instruction": "...", "output": "..."}

# What the data actually contains:
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": null, "tool_calls": [...]}, {"role": "tool", ...}], "tools": [...]}
```

### Affected Scripts

| Script | Issue |
|--------|-------|
| `train_simple_nobnb.py` | `tokenize_function` looks for `instruction`/`output` fields, which don't exist in the data |
| `train_local.py` | References `./data/final/train.jsonl`: wrong path and wrong format |
| `train_extended_context.py` | Same `text` field assumption, so examples won't tokenize properly |
| `t4-qlora.yaml` | `text_field: "text"` and `dataset_path: "./data/final/train_combined.jsonl"` are both wrong |
| `extended-context-128k.yaml` | `dataset_path: "./training-data/final/train.jsonl"` points to a file that doesn't exist |

### Fix Required

A proper data loader that converts the `messages` format to training tokens, handling:

- System message prepending
- Tool-call turns (skip or flatten)
- User/assistant turns for language modeling
- Padding and truncation at `max_length`
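
The "skip or flatten" choice for tool-call turns can be sketched as a pure-Python step that runs before tokenization. This is only an illustration under stated assumptions: the helper name `flatten_messages`, the `<|role|>` markers, and the decision to serialize each tool call as JSON text are hypothetical choices made here, and the sample conversation is hand-written rather than taken from the dataset.

```python
import json


def flatten_messages(messages, include_tool_calls=True):
    """Render a messages array as one training string (hypothetical helper)."""
    parts = []
    for msg in messages:
        role = msg["role"]
        content = msg.get("content") or ""  # tool-call turns often have content=None
        tool_calls = msg.get("tool_calls") or []
        if role == "assistant" and tool_calls and include_tool_calls:
            # "Flatten": keep each call as JSON text after any prose content
            calls = "\n".join(json.dumps(call, sort_keys=True) for call in tool_calls)
            content = f"{content}\n{calls}".strip()
        if role == "tool" and not include_tool_calls:
            continue  # "Skip": drop tool results together with the calls
        if role == "assistant" and not content:
            continue  # nothing left to train on in this turn
        parts.append(f"<|{role}|>\n{content}\n")
    return "".join(parts)


# Hand-written example conversation (illustrative only)
example = [
    {"role": "system", "content": "You can call tools."},
    {"role": "user", "content": "List files"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"function": {"name": "ls", "arguments": "{}"}}]},
    {"role": "tool", "content": "a.txt"},
    {"role": "assistant", "content": "There is one file: a.txt"},
]
flat = flatten_messages(example)                               # tool turns kept as text
skipped = flatten_messages(example, include_tool_calls=False)  # tool turns dropped
```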

---

## 🔴 train_local.py Issues

1. **Broken import path**: `sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stack/training'))` points to a directory that doesn't exist
2. **Wrong data path**: `./data/final/train.jsonl` should be `./training-data/tool_examples_combined.jsonl`
3. **Wrong config path**: `stack/training/train_config_local.yaml` doesn't exist
4. **MPS check bug**: `torch.backends.mps.is_built()` only reports whether PyTorch was compiled with MPS support, not whether an MPS device is usable, and on PyTorch releases older than 1.12 `torch.backends.mps` is missing entirely, so the call raises `AttributeError`
5. **No 4-bit quantization**: loads the full model in FP32, which is likely to run out of memory on Mac MPS
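
A guarded device check for issue 4 might look like the sketch below. The function name `select_device` is hypothetical, and the PyTorch module is passed in as an argument purely so the example can run without PyTorch installed; in the real script it would simply be `import torch` followed by `select_device(torch)`.

```python
from types import SimpleNamespace


def select_device(torch_module):
    """Return "cuda", "mps", or "cpu" without assuming torch.backends.mps exists."""
    if torch_module.cuda.is_available():
        return "cuda"
    # getattr guards against builds that have no torch.backends.mps attribute
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_built() and mps.is_available():
        return "mps"
    return "cpu"


# Real usage (requires PyTorch):
#   import torch
#   device = select_device(torch)

# Stand-in for a build without MPS support, used only to show the fallback path
old_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    backends=SimpleNamespace(),
)
device = select_device(old_torch)
```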

---

## 🟡 t4-qlora.yaml Issues

1. **Wrong data path**: `./data/final/train_combined.jsonl` doesn't exist
2. **Wrong format field**: `text_field: "text"` won't work with the messages format
3. **Includes `neat_ft: false`**: this is not a valid HF `TrainingArguments` field
4. **No `push_to_hub_model_id`** despite `push_to_hub: true` being templated (current `transformers` releases use `hub_model_id`; `push_to_hub_model_id` is deprecated)
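
Stray keys such as `neat_ft` can be caught before training starts by comparing the config against the fields of the arguments dataclass. The sketch below assumes the training section of the YAML has already been loaded into a flat dict; `unknown_keys` and `DemoArguments` are hypothetical names, and the stand-in dataclass exists only so the example runs without `transformers` installed.

```python
from dataclasses import dataclass, fields


def unknown_keys(config: dict, args_type) -> list:
    """Return config keys that are not fields of the given dataclass type."""
    valid = {f.name for f in fields(args_type)}
    return sorted(key for key in config if key not in valid)


# Tiny stand-in so the sketch is self-contained.
# Real usage would be:
#   from transformers import TrainingArguments
#   unknown_keys(cfg, TrainingArguments)
@dataclass
class DemoArguments:
    output_dir: str = "out"
    push_to_hub: bool = False
    hub_model_id: str = ""


cfg = {"output_dir": "./out", "push_to_hub": True, "neat_ft": False}
bad = unknown_keys(cfg, DemoArguments)  # keys the dataclass does not define
```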

---

## 🟡 extended-context-128k.yaml Issues

1. **Wrong data path**: `./training-data/final/train.jsonl` doesn't exist
2. **Unclear base-model support**: the file references `Qwen/Qwen2.5-Coder-1.5B`, but it's not clear whether this model already ships with an extended RoPE config
3. **No verification** that the base model actually has `rope_scaling` in its `config.json`
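
The missing verification could be a small pre-flight check on a locally downloaded `config.json`. This is a sketch under assumptions: `read_rope_settings` is a hypothetical helper, and the demo config below is made up for illustration and does not reflect the actual values shipped with any Qwen model.

```python
import json
import tempfile
from pathlib import Path


def read_rope_settings(config_path):
    """Return the context-length related fields from a model's config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {
        "rope_scaling": config.get("rope_scaling"),  # None when the key is absent
        "max_position_embeddings": config.get("max_position_embeddings"),
        "rope_theta": config.get("rope_theta"),
    }


# Demo with a made-up config file (illustrative values only)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(json.dumps({"max_position_embeddings": 4096}), encoding="utf-8")
    settings = read_rope_settings(demo_path)

# True here means the config would need rope_scaling added before long-context training
needs_attention = settings["rope_scaling"] is None
```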

---

## 🟡 evaluate_model.py Issues

1. **Wrong HumanEval format**: expects `test_cases` in the problem dict, but HumanEval problems provide `prompt`, `canonical_solution`, `test`, and `entry_point` strings; the `test` string defines a `check(candidate)` function that has to be executed
2. **Code execution sandbox is limited**: only specific builtins are allowed, and many standard library functions are missing
3. **No handling** of `assert` statements in test code
4. **`calculate_pass_at_k` bug**: `correct_in_k = sum(correct_flags[:min(k, len(correct_flags))])` only inspects the first `k` samples. pass@k is the probability that at least one of `k` samples drawn from all `n` generated samples is correct, so it has to be estimated from the total number of correct samples `c` out of `n`
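
For reference, the commonly used unbiased estimator computes pass@k for one problem as `1 - C(n - c, k) / C(n, k)`, where `n` is the number of generated samples and `c` the number that pass. The sketch below is one way to write it; the function names are chosen here and are not taken from `evaluate_model.py`.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect, so every
    # size-k draw then contains a correct sample and the estimate is 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


one_of_ten = pass_at_k(n=10, c=3, k=1)  # reduces to c / n when k == 1
score = mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=5)
```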

---

## 🟢 What's Working Well

- **`train_simple_nobnb.py`**: good mixed-precision logic, proper bf16/fp16 detection, paged AdamW optimizer, gradient checkpointing with `use_reentrant=False`
- **Training configs**: comprehensive hardware-specific settings, well documented
- **Recipes**: good documentation of GPU requirements and expected runtimes
- **LoRA config**: properly targets all relevant modules for Qwen

---

## Recommended Fixes (Priority Order)

### 1. Fix Data Loaders (Highest Priority)

Add a proper `load_chat_data()` function to `train_simple_nobnb.py` (or to a shared `data_utils.py`, as proposed under Next Steps):

```python
from datasets import load_dataset


def load_chat_data(data_path: str, tokenizer, max_length: int = 2048, train_split: float = 0.9):
    """Load messages-format dataset and convert to training tokens."""
    raw_dataset = load_dataset("json", data_files=data_path, split="train")

    def tokenize_messages(example):
        # Flatten to: system + user + assistant (+ tool) turns.
        # Note: the "<|role|>" markers are plain-text placeholders;
        # tokenizer.apply_chat_template() would render the model's own chat format.
        text = ""
        for msg in example["messages"]:
            role = msg["role"]
            # Assistant tool-call turns carry content=None; treat it as empty text
            content = msg.get("content") or ""
            if role in ("system", "user", "assistant", "tool"):
                # Tool-call payloads are skipped for now; only text content is kept
                text += f"<|{role}|>\n{content}\n"
        # No trailing "<|assistant|>" marker: that is a generation prompt for inference
        result = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
        # Labels mirror input_ids, with padding masked as -100 so the loss ignores it
        result["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(result["input_ids"], result["attention_mask"])
        ]
        return result

    tokenized = raw_dataset.map(tokenize_messages, remove_columns=raw_dataset.column_names)
    # Hold out (1 - train_split) of the examples for evaluation
    split = tokenized.train_test_split(test_size=1.0 - train_split, seed=42)
    return split["train"], split["test"]
```

### 2. Fix All Data Paths

| Config File | Current (Wrong) | Correct |
|-------------|-----------------|---------|
| `t4-qlora.yaml` | `./data/final/train_combined.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `extended-context-128k.yaml` | `./training-data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `train_local.py` | `./data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |

### 3. Fix t4-qlora.yaml

- Remove `neat_ft: false` (not a valid field)
- Add `output_dir` override or create `training-configs/t4-qlora-data-fix.yaml`

### 4. Fix evaluate_model.py

- Add proper HumanEval problem loading (use the `openai/openai_humaneval` dataset from the Hugging Face Hub)
- Fix the pass@k calculation
- Expand the safe builtins for code execution
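
Running a HumanEval-style problem amounts to concatenating the prompt, the model's completion, and the `test` string, then calling `check` on the entry point. The sketch below shows that assembly with hypothetical helper names (`build_check_program`, `passes`) and a hand-written toy problem in the same field layout. It uses a bare `exec` with no isolation or timeout, so it is only an illustration of the data flow; a real harness would run this in a separate, sandboxed process.

```python
def build_check_program(problem: dict, completion: str) -> str:
    """Assemble prompt + completion + tests into one runnable source string."""
    return (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + f"\n\ncheck({problem['entry_point']})\n"
    )


def passes(problem: dict, completion: str) -> bool:
    """Run the assembled program; a failing assert or any error counts as a failure."""
    try:
        # WARNING: exec runs model-generated code in this process with no isolation
        exec(build_check_program(problem, completion), {"__name__": "__check__"})
    except Exception:  # includes AssertionError raised by the test code
        return False
    return True


# Hand-written problem in the HumanEval field layout (not taken from the dataset)
toy = {
    "prompt": "def add(a, b):\n    \"\"\"Return a + b.\"\"\"\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}
good = passes(toy, "    return a + b\n")
bad = passes(toy, "    return a - b\n")
```

Because the `assert` statements live inside `check`, a failed assertion surfaces as an `AssertionError`, which is why the sketch treats any exception as a failed sample. The per-sample booleans can then be counted into `n` and `c` for the pass@k calculation.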

### 5. Fix train_local.py

- Remove the broken `stack/training` import path
- Add proper 4-bit quantization support for MPS, or detect CUDA availability (bitsandbytes 4-bit loading has historically targeted CUDA, so a half-precision load may be the practical fallback on MPS)
- Fix data and config paths

---

## Actual Training Data Location

```
/Users/walidsobhi/stack-2.9/training/training-data/
├── tool_examples.jsonl (1000 lines)
├── tool_examples_combined.jsonl (1500 lines)
└── tool_examples.json (same data, json format)
```

Format: `{"messages": [...], "tools": [...]}` (messages array with tool calls).

---

## Quick Test Command

To verify that training would work after the fixes:

```bash
cd /Users/walidsobhi/stack-2.9/training
python -c "
from datasets import load_dataset
ds = load_dataset('json', data_files='training-data/tool_examples_combined.jsonl', split='train')
print(f'Total examples: {len(ds)}')
print(f'Keys: {ds.column_names}')
print(f'Example: {ds[0]}')
"
```

Expected output: the `Keys` line should show `['messages', 'tools']`, not `['text']` or `['instruction', 'output']`.
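
A per-record check can complement the column-name test above by flagging individual lines that are not in the messages format. This is a minimal sketch using only the standard library; `check_records` and the three sample lines are hypothetical, and the rules it enforces (a non-empty `messages` list whose items each have a `role`) are assumptions about what the loader needs rather than a documented schema.

```python
import json


def check_records(lines):
    """Count valid messages-format records and collect (line_no, reason) problems."""
    ok, problems = 0, []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((line_no, "invalid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((line_no, "missing or empty 'messages'"))
        elif any(not isinstance(m, dict) or "role" not in m for m in messages):
            problems.append((line_no, "message without 'role'"))
        else:
            ok += 1
    return ok, problems


# Hand-written sample lines (illustrative only)
sample = [
    '{"messages": [{"role": "user", "content": "hi"}], "tools": []}',
    '{"text": "old format"}',
    "not json",
]
ok_count, issues = check_records(sample)

# Real usage would pass an open file handle, for example:
#   with open("training-data/tool_examples_combined.jsonl") as f:
#       ok_count, issues = check_records(f)
```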

---

## Next Steps

1. Write a proper `load_chat_data()` function in a shared `data_utils.py`
2. Update `train_simple_nobnb.py` to use it
3. Update all YAML configs with correct data paths
4. Test with 1 epoch on a small sample
5. Then scale to full training on Kaggle/A100