Stack-2-9-finetuned / docs /TOOL_DATA_ANALYSIS.md

walidsobhie-code

feat: add inference API, quickstart guide, roadmap, and combined tool data

b03a8a0 about 2 months ago


	# Tool Calling Training Data Analysis

	Generated: 2026-04-06
	Files Analyzed:
	- `training-data/tool_examples.jsonl` (original)
	- `training-data_v2/tool_examples.jsonl` (regenerated)

	---

	## Executive Summary

	The original tool calling training data had significant quality issues that limited its usefulness for training a production AI coding assistant. The data was synthetically generated with systematic errors.

	Key Findings on Original Data:
	- ❌ 10.7% of tool calls (107 of 1,000) use incorrect parameters (105 mismatched search queries, 2 wrong file paths)
	- ❌ Heavy prompt duplication (7.5x average)
	- ❌ No multi-step tool chains (only 1 tool per example)
	- ❌ All examples use identical tool definitions

	Action Taken: Generated 500 new examples using the project's generator script.

	Recommendation: The original data needs substantial improvements before use in training.

	---

	## 1. Statistics Overview

	### Original Data (tool_examples.jsonl)

	| Metric | Value |
	|--------|-------|
	| Total Examples | 1,000 |
	| Unique Prompts | 133 |
	| Average Duplication | 7.52x |
	| Unique Tool Sequences | 5 |
	| Examples with Issues | ~107 (10.7%) |
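
The duplication figures in this table follow from counting distinct prompts. Below is a minimal sketch of that calculation. It assumes each JSONL record stores its user prompt under a hypothetical `prompt` key, since the actual schema of `tool_examples.jsonl` is not shown in this analysis.

```python
import json
from collections import Counter


def duplication_stats(jsonl_lines, prompt_key="prompt"):
    """Return (total, unique, average duplication) for an iterable of JSONL lines.

    `prompt_key` is a hypothetical field name; adjust it to the real schema.
    """
    prompts = [json.loads(line)[prompt_key] for line in jsonl_lines if line.strip()]
    counts = Counter(prompts)
    total = len(prompts)
    unique = len(counts)
    average = total / unique if unique else 0.0
    return total, unique, average


# Tiny in-memory sample standing in for the real file.
sample = [
    json.dumps({"prompt": "Run the tests with pytest"}),
    json.dumps({"prompt": "Run the tests with pytest"}),
    json.dumps({"prompt": "Run npm install to install dependencies"}),
]
total, unique, average = duplication_stats(sample)
print(f"{total} examples, {unique} unique prompts, {average:.2f}x duplication")
```

With the figures reported above, 1,000 examples over 133 unique prompts gives 1000 / 133 ≈ 7.52. To run it on the real data, pass an open file handle instead of `sample`.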

	### New Data (training-data_v2/tool_examples.jsonl)

	| Metric | Value |
	|--------|-------|
	| Total Examples | 500 |
	| File Size | 1.9 MB |
	| Tools per Example | 5 (static definition) |

	### Tool Call Distribution (Original)

	| Tool | Call Count |
	|------|------------|
	| Bash | 200 |
	| FileRead | 200 |
	| FileWrite | 200 |
	| WebSearch | 200 |
	| Grep | 200 |

	All examples have exactly one tool call - no multi-step chains exist.
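
A distribution like the one above, together with the single-call observation, could be reproduced with a short counting pass. This is a sketch only. It assumes each example carries a `tool_calls` list of dicts with a `name` field, which is a hypothetical layout rather than the confirmed schema.

```python
from collections import Counter


def tool_call_summary(examples):
    """Count calls per tool and list indices of examples with more than one call.

    Each example is assumed (hypothetically) to carry a `tool_calls` list of
    dicts with a `name` field.
    """
    per_tool = Counter()
    multi_step = []
    for index, example in enumerate(examples):
        calls = example.get("tool_calls", [])
        per_tool.update(call["name"] for call in calls)
        if len(calls) > 1:
            multi_step.append(index)
    return per_tool, multi_step


# Illustrative records; the third one is a made-up two-step chain.
examples = [
    {"tool_calls": [{"name": "Bash"}]},
    {"tool_calls": [{"name": "Grep"}]},
    {"tool_calls": [{"name": "FileRead"}, {"name": "FileWrite"}]},
]
per_tool, multi_step = tool_call_summary(examples)
print(dict(per_tool))
print("multi-step examples:", multi_step)
```

On the original data, the second return value would be expected to come back empty, matching the finding that no multi-step chains exist.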

	---

	## 2. Prompt Diversity Analysis (Original Data)

	### Prompt Categories

	| Category | Count | Percentage |
	|----------|-------|------------|
	| Python | 207 | 20.7% |
	| React | 149 | 14.9% |
	| File Read | 134 | 13.4% |
	| File Write | 119 | 11.9% |
	| Other | 114 | 11.4% |
	| Run Command | 80 | 8.0% |
	| Docker/K8s | 67 | 6.7% |
	| Search | 50 | 5.0% |
	| Git | 40 | 4.0% |
	| Testing | 31 | 3.1% |
	| Package Management | 9 | 0.9% |

	### Most Duplicated Prompts

	| Prompt | Occurrences |
	|--------|-------------|
	| "Write a simple React component to src/components/Button.jsx" | 67 |
	| "Run the tests with pytest" | 40 |
	| "Run npm install to install dependencies" | 40 |

	---

	## 3. Tool Usage Breakdown

	### Tool Definitions

	All 1,000 original examples use identical tool definitions with 5 tools:
	- `Bash` - Execute bash commands
	- `FileRead` - Read file contents
	- `FileWrite` - Create/overwrite files
	- `WebSearch` - Search the web
	- `Grep` - Search for patterns in files

	### Tool Call Issues Found (Original Data)

	#### Wrong Search Queries (105 instances, 10.5% of all examples)

	The `WebSearch` tool frequently uses queries that don't match the user's question:

	| User Question | Actual Search Query |
	|--------------|---------------------|
	| "How do I use async/await in Python?" | "AWS Lambda cold start optimization" |
	| "How do I use React hooks properly?" | "SQL join types explained" |
	| "What's the difference between Docker and Kubernetes?" | "Git rebase vs merge" |
	| "How do I use React hooks properly?" | "TypeScript generics tutorial" |
	| "What's the difference between Docker and Kubernetes?" | "TypeScript generics tutorial" |
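
Mismatches like these share no topical keyword between question and query, so even a crude overlap test would flag them. The sketch below is one possible heuristic, not the check used to produce this analysis. The stopword list and the function names are illustrative assumptions.

```python
import re

# Small illustrative stopword list; a real check would likely need a larger one.
STOPWORDS = {
    "how", "do", "i", "use", "in", "the", "what", "s", "is", "a", "an",
    "and", "between", "vs", "to", "of", "properly", "difference",
    "explained", "tutorial", "types",
}


def keywords(text):
    """Lowercase the text and return its non-stopword alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in STOPWORDS}


def is_relevant(question, query):
    """Treat a query as relevant if it shares at least one keyword with the question."""
    return bool(keywords(question) & keywords(query))


# Pairs taken from the table above.
print(is_relevant("How do I use async/await in Python?",
                  "AWS Lambda cold start optimization"))
print(is_relevant("How do I use React hooks properly?",
                  "SQL join types explained"))
# A made-up matching query, for contrast.
print(is_relevant("How do I use React hooks properly?",
                  "React hooks best practices"))
```

Keyword overlap misses paraphrases and synonyms, so it is best read as a cheap first filter that catches the gross mismatches listed here.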

	#### Wrong File Paths (2 instances)

	The `FileWrite` tool sometimes writes to a different path and file type than the one requested:

	| User Request | Written Path |
	|-------------|--------------|
	| "Create a src/components/Header.jsx file" | `config.json` |
	| "Create a src/middleware.py file with settings" | `config.yaml` |
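
When the request names a path explicitly, as both rows above do, the written path can be compared against it directly. The following is a rough sketch under that assumption. The regular expression and the helper names are hypothetical and would need tuning against real prompts.

```python
import re

# Heuristic: a run of path characters ending in ".<extension>".
PATH_RE = re.compile(r"[\w./-]+\.\w+")


def requested_path(prompt):
    """Return the first path-like token in the prompt, or None if there is none."""
    match = PATH_RE.search(prompt)
    return match.group(0) if match else None


def path_matches(prompt, written_path):
    """True when the prompt names no path, or names exactly the written path."""
    wanted = requested_path(prompt)
    return wanted is None or wanted == written_path


# The first row of the table above, then the same request with the expected path.
prompt = "Create a src/components/Header.jsx file"
print(path_matches(prompt, "config.json"))
print(path_matches(prompt, "src/components/Header.jsx"))
```

The pattern would also match tokens such as version numbers, so a production check would probably restrict it to known file extensions.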

	#### Pattern/File Type Mismatches (Grep)

	The `Grep` tool sometimes searches with mismatched patterns:

	| Pattern | File Pattern | Issue |
	|---------|-------------|-------|
	| `class ` | `*.ts` | Python pattern in TypeScript files |
	| `SELECT ` | `*.js` | SQL pattern in JavaScript files |
	| `TODO` | `*.md` | Searching TODO in markdown files |

	---

	## 4. Data Quality Issues

	### Critical Issues

	1. No Multi-Step Tool Chains
	- All 1,000 examples use exactly one tool call
	- Real coding tasks typically require 2-5+ tool calls
	- Example: "Read file → Find pattern → Search docs → Write fix"

	2. Search Query Mismatches
	- 105 of the 200 `WebSearch` calls (52.5%, or 10.5% of all examples) have irrelevant queries
	- Indicates the generator script has logic errors

	3. Heavy Prompt Duplication
	- 133 unique prompts duplicated to 1,000 examples
	- "Write a simple React component" appears 67 times
	- This risks overfitting the model to a small set of specific prompts

	4. Identical Tool Definitions
	- All examples use the same 5 tools with identical descriptions
	- No variation in tool schemas or parameter structures
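
The missing multi-step chains are easier to picture with a concrete record. The sketch below shows what a chained example might look like for a "Read file → Find pattern → Write fix" task. It assumes an OpenAI-style chat message layout. The file paths, file contents, and tool argument names (`path`, `pattern`, `content`) are invented for illustration and are not taken from the dataset.

```python
import json


def tool_call(call_id, name, arguments):
    """Build one assistant tool call in an OpenAI-style layout (assumed format)."""
    return {
        "id": call_id,
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }


# Hypothetical three-step chain: read the file, locate the typo, write the fix.
chained_example = {
    "messages": [
        {"role": "user", "content": "Fix the failing import in src/app.py"},
        {"role": "assistant",
         "tool_calls": [tool_call("call_1", "FileRead", {"path": "src/app.py"})]},
        {"role": "tool", "tool_call_id": "call_1", "content": "import utlis\n"},
        {"role": "assistant",
         "tool_calls": [tool_call("call_2", "Grep", {"pattern": "utlis", "path": "src/"})]},
        {"role": "tool", "tool_call_id": "call_2", "content": "src/app.py:1:import utlis"},
        {"role": "assistant",
         "tool_calls": [tool_call("call_3", "FileWrite",
                                  {"path": "src/app.py", "content": "import utils\n"})]},
        {"role": "tool", "tool_call_id": "call_3", "content": "ok"},
        {"role": "assistant",
         "content": "Renamed the misspelled import `utlis` to `utils` in src/app.py."},
    ]
}


def chain_of(example):
    """Return the ordered tool names called in an example."""
    return [
        call["function"]["name"]
        for message in example["messages"]
        for call in message.get("tool_calls", [])
    ]


print(chain_of(chained_example))  # ['FileRead', 'Grep', 'FileWrite']
```

Each tool result feeds the next call, which is the dependency structure that single-call examples cannot teach.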

	### Moderate Issues

	5. File Path Hallucination
	- Tool calls reference files that don't exist in the actual codebase, or read a different file than the one requested
	- Example: asking for `tests/test_main.py` but reading `src/app.js`

	6. Response Fabrication
	- Assistant responses sometimes claim to show content that wasn't actually read
	- Example: "Here's the README.md" when README.md wasn't the file requested

	---

	## 5. Recommendations for Improvement

	### Immediate Actions (Completed)

	1. ✅ Regenerated Data
	```
	Generated 500 new examples in training-data_v2/tool_examples.jsonl
	```

	### Script Fixes Needed

	The generator script (`scripts/generate_tool_data.py`) needs:

	1. Fix `TOOL_CALL_PAIRS` mapping - queries don't match questions
	2. Fix `FILE_PATTERNS` - wrong file types for requested content
	3. Add multi-step chain generation
	4. Add prompt variation templates
	5. Add validation to check query/content relevance

	### Future Improvements

	1. Add Multi-Step Examples
	- Real tasks require reading files, searching, editing
	- Generate chains of 2-4 tool calls per example

	2. Increase Prompt Diversity
	- Target 500+ unique prompts instead of duplicating
	- Use template variations and paraphrasing

	3. Vary Tool Definitions
	- Different tools per example
	- Add tool variations (e.g., different Bash commands)
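
Template variation, as suggested for prompt diversity, can be as simple as crossing phrasing templates with slot values. This is a minimal sketch. The templates and the `expand_prompts` helper are hypothetical and do not come from `scripts/generate_tool_data.py`.

```python
import itertools

# Hypothetical templates and slot values; none come from the project's generator.
TEMPLATES = [
    "Run {command}",
    "Please run {command} for me",
    "Can you execute {command}?",
]
COMMANDS = ["pytest", "npm install"]


def expand_prompts(templates, commands):
    """Return the de-duplicated prompts from every template/command pairing, in order."""
    seen = {}
    for template, command in itertools.product(templates, commands):
        seen.setdefault(template.format(command=command), None)
    return list(seen)


prompts = expand_prompts(TEMPLATES, COMMANDS)
print(len(prompts))
for prompt in prompts:
    print(prompt)
```

Three templates and two commands already yield six distinct prompts, so a modest template set gets close to the 500+ target without resorting to duplication.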

	---

	## 6. Conclusion

	The original `tool_examples.jsonl` data is NOT suitable for production training without significant improvements:

	- ~10.7% of examples have incorrect tool parameters
	- Heavy duplication risks overfitting
	- The absence of multi-step chains means the data does not represent real coding workflows
	- The synthetic generation errors are systematic

	Action Completed: Generated 500 new examples via the project's generator script.

	Remaining Work: Fix the underlying generator script to eliminate the systematic errors before full-scale regeneration.

	---

	## Appendix: Quick Stats

	### Original Data
	```
	Total examples: 1,000
	Unique prompts: 133
	Tool call issues: 107 (10.7%)
	Multi-tool chains: 0 (0%)
	Identical tool defs: 100%
	Average duplication: 7.52x
	```

	### New Data (Generated)
	```
	Total examples: 500
	File size: 1.9 MB
	Location: training-data_v2/tool_examples.jsonl
	```