walidsobhie-code

feat: add production infrastructure - CI/CD, Docker, code quality, and monitoring

b5998ff 2 months ago

Stack 2.9 Training Data Format

This document describes the format and structure of training data for Stack 2.9.

Overview

Training data is stored in JSONL format (JSON Lines), where each line is a valid JSON object representing a single training example.
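
As a minimal sketch of working with this layout, the helpers below read and write JSONL with Python's standard library. The function names `read_jsonl` and `write_jsonl` are illustrative and are not part of the Stack 2.9 scripts.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one training example per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def write_jsonl(path: str, examples: list[dict]) -> None:
    """Write each example as a single line of JSON."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            # json.dumps never emits raw newlines, so one example stays on one line.
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Because each line parses independently, a file can be streamed example by example without loading the whole dataset into memory.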

File Structure

training-data/
├── tool_examples.jsonl          # Original examples (1000)
├── augmented_tool_examples.jsonl # Augmented examples (2-5x)
└── scaled/                      # Processed datasets
    ├── train.jsonl
    └── val.jsonl

Example Format

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant that can use tools to help users solve problems."
    },
    {
      "role": "user",
      "content": "Can you show me the README.md file?"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_$1180",
          "type": "function",
          "function": {
            "name": "FileRead",
            "arguments": "{\"path\": \"README.md\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Successfully read file: README.md\n```markdown\n# My Project\n\nA sample project for Stack 2.9.\n```",
      "tool_call_id": "call_$1180",
      "name": "FileRead"
    },
    {
      "role": "assistant",
      "content": "Here's the README.md:\n\n```markdown\n# My Project\n\nA sample project for Stack 2.9.\n```"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "Bash",
        "description": "Execute bash commands in the terminal.",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string", "description": "The bash command to execute"},
            "timeout": {"type": "integer", "description": "Timeout in seconds"}
          },
          "required": ["command"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "FileRead",
        "description": "Read the contents of a file.",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string", "description": "Path to the file to read"},
            "offset": {"type": "integer", "description": "Line number to start from"},
            "limit": {"type": "integer", "description": "Max lines to read"}
          },
          "required": ["path"]
        }
      }
    }
  ]
}

Field Definitions

Top-Level Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `messages` | array | Yes | Array of message objects |
| `tools` | array | Yes | Available tools/functions |
| `source` | string | No | Data source identifier |
Message Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `role` | string | Yes | One of: system, user, assistant, tool |
| `content` | string or null | Yes* | Message content (null if `tool_calls` is present) |
| `tool_calls` | array | No* | Tool call requests |
| `tool_call_id` | string | No* | ID of the tool call that a tool message responds to |
| `name` | string | No* | Tool name (for tool messages) |

*`content` is required unless `tool_calls` is present, in which case it may be null. `tool_call_id` and `name` are required when `role` is "tool".
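
The conditional rules in the footnote can be sketched as a small check. This is an illustration of the rules as stated here, not the logic of `scripts/validate_training_data.py`; the name `message_errors` is hypothetical.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def message_errors(message: dict) -> list[str]:
    """Return the rule violations for one message object (empty list if valid)."""
    errors = []
    role = message.get("role")
    if role not in VALID_ROLES:
        errors.append(f"invalid role: {role!r}")

    # content may be null only when tool_calls is present and non-empty.
    has_tool_calls = bool(message.get("tool_calls"))
    if message.get("content") is None and not has_tool_calls:
        errors.append("content is required when tool_calls is absent")

    # Tool messages must identify the call they answer and the tool that ran.
    if role == "tool":
        for field in ("tool_call_id", "name"):
            if not message.get(field):
                errors.append(f"{field} is required for tool messages")
    return errors
```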

Tool Call Object

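| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | Yes | Unique call identifier |
| `type` | string | Yes | Always "function" |
| `function` | object | Yes | Function name and arguments |
| `function.name` | string | Yes | Tool/function name |
| `function.arguments` | object or string | Yes | Arguments, as a JSON object or a JSON-encoded string |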
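
Since `function.arguments` may hold either an object or a JSON-encoded string, consumers usually normalize it before use. A minimal sketch, with `parse_arguments` as a hypothetical helper name:

```python
import json


def parse_arguments(tool_call: dict) -> dict:
    """Return the call's arguments as a dict, whichever form they were stored in."""
    arguments = tool_call["function"]["arguments"]
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # decode the JSON-encoded string form
    if not isinstance(arguments, dict):
        raise ValueError("arguments must decode to a JSON object")
    return arguments


call = {
    "id": "call_1",  # hypothetical ID for illustration
    "type": "function",
    "function": {"name": "FileRead", "arguments": "{\"path\": \"README.md\"}"},
}
print(parse_arguments(call))  # {'path': 'README.md'}
```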

Data Sources

random_synthetic: Auto-generated with random parameters
synthetic_template: Template-based synthetic examples
augmented_*: Augmented from other sources
original: Human-curated examples
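
To see how a dataset is distributed across these sources, the optional `source` field can be tallied. The sketch below assumes that every `augmented_*` value should be grouped together and that examples without a `source` count as "unlabeled"; both choices, and the value `augmented_original` in the usage line, are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional


def source_category(source: Optional[str]) -> str:
    """Map a raw source value to a reporting category."""
    if source is None:
        return "unlabeled"  # assumption: `source` is optional, so it may be missing
    if source.startswith("augmented_"):
        return "augmented"  # collapse all augmented_* variants into one bucket
    return source


def count_sources(examples: Iterable[dict]) -> Counter:
    """Count examples per source category."""
    return Counter(source_category(example.get("source")) for example in examples)


print(count_sources([{"source": "original"}, {"source": "augmented_original"}, {}]))
```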

Augmentation

The augmentation script applies these transformations:

Paraphrasing: Reword user prompts (70% chance)
Difficulty scaling: Add complexity modifiers
Parameter variation: Change file paths, commands
Filler words: Add "please", "thanks" (30% chance)
Edge cases: Empty input, multi-step, error handling

Run augmentation:

python scripts/augment_training_data.py \
  --input training-data/tool_examples.jsonl \
  --output training-data/augmented.jsonl \
  --multiplier 3
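
To illustrate how a probabilistic transformation such as the filler-word step might work, here is a minimal sketch. It is not the code in `scripts/augment_training_data.py`: the function names and the filler phrasings are assumptions, and only the 30% probability comes from the list above.

```python
import copy
import random

FILLER_PROBABILITY = 0.3  # the "30% chance" from the transformation list
FILLERS = (" Please.", " Thanks!")  # hypothetical phrasings


def add_filler(prompt: str, rng: random.Random) -> str:
    """Append a politeness filler with probability FILLER_PROBABILITY."""
    if rng.random() < FILLER_PROBABILITY:
        return prompt + rng.choice(FILLERS)
    return prompt


def augment_example(example: dict, rng: random.Random) -> dict:
    """Return a copy of the example with fillers applied to user messages."""
    augmented = copy.deepcopy(example)  # never mutate the source example
    for message in augmented["messages"]:
        if message["role"] == "user" and message.get("content"):
            message["content"] = add_filler(message["content"], rng)
    return augmented
```

Passing an explicit `random.Random(seed)` keeps augmentation runs reproducible. Under the same assumptions, a paraphrasing step would follow the same pattern with a probability of 0.7.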

Validation

Run validation to check data quality:

python scripts/validate_training_data.py --input training-data/tool_examples.jsonl

Checks include:

Required fields present
Valid JSON syntax
Message role ordering
Tool call structure
No empty entries
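
The checks above could be sketched per JSONL line as follows. This is an illustration and not the actual validation script; in particular, the two ordering rules (a system message only in first position, and every tool message answering an earlier tool call) are assumptions about what "message role ordering" covers.

```python
import json

REQUIRED_FIELDS = ("messages", "tools")


def entry_errors(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    if not line.strip():
        return ["empty entry"]
    try:
        example = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(example, dict):
        return ["entry is not a JSON object"]

    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in example]

    open_calls = set()  # IDs of tool calls that have not been answered yet
    for index, message in enumerate(example.get("messages") or []):
        if not isinstance(message, dict):
            errors.append(f"message {index}: not an object")
            continue
        role = message.get("role")
        # Assumed ordering rule: a system message may only appear first.
        if role == "system" and index != 0:
            errors.append(f"message {index}: system message is not first")
        # Assumed ordering rule: a tool message must answer an earlier tool call.
        if role == "tool":
            call_id = message.get("tool_call_id")
            if call_id in open_calls:
                open_calls.remove(call_id)
            else:
                errors.append(f"message {index}: no matching tool call for {call_id!r}")
        # Tool call structure: id, type "function", and a function name.
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            if not call.get("id") or call.get("type") != "function" or not function.get("name"):
                errors.append(f"message {index}: malformed tool call")
            else:
                open_calls.add(call["id"])
    return errors
```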

Converting to Training Format

For training, convert the data to the target chat format (ChatML in this example):

# Example conversion
python scripts/combine_datasets.py \
  --input training-data/augmented.jsonl \
  --output data/final/train.jsonl \
  --format chatml
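
For orientation, ChatML wraps each message in `<|im_start|>` and `<|im_end|>` markers with the role on the first line. The sketch below renders a `messages` array that way. It is not the implementation in `scripts/combine_datasets.py`; the function name and the handling of tool calls (serialized as JSON when `content` is null) are assumptions made for illustration.

```python
import json


def to_chatml(messages: list[dict]) -> str:
    """Render a list of message objects as ChatML text."""
    parts = []
    for message in messages:
        content = message.get("content")
        if content is None:
            # Assumption: represent tool calls as JSON when content is null.
            content = json.dumps(message.get("tool_calls", []))
        parts.append(f"<|im_start|>{message['role']}\n{content}<|im_end|>")
    return "\n".join(parts) + "\n"


print(to_chatml([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Who are you?"},
]))
```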