fyodor-agentic-v1.1 / README.md

Kiy-K

Update README.md

7d012c4 verified 6 months ago

preview code

raw

history blame contribute delete

2.96 kB

metadata

license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
tags:
  - mixture-of-experts
  - moe
  - code
  - function-calling
  - agentic
  - qwen2.5
language:
  - en
pipeline_tag: text-generation
library_name: transformers

Fyodor Agentic v1.1

A Mixture-of-Experts (MoE) enhanced version of Qwen2.5-Coder-3B-Instruct, optimized for agentic AI workflows and function calling.

Model Details

Base Model: Qwen/Qwen2.5-Coder-3B-Instruct
Architecture: Sparse MoE with 4 experts, top-2 routing
Parameters: 6-7B total (~3B base + ~3-4B MoE experts)
Training: Function calling, Python code, conversational data
Format: SafeTensors (ready to use!)

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kiy-K/fyodor-agentic-v1.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Kiy-K/fyodor-agentic-v1.1",
    trust_remote_code=True
)

# Generate
prompt = "Write a Python function to calculate Fibonacci numbers:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Use Cases

Function Calling: Built for tool usage and API interactions
Code Generation: Python, JavaScript, and more
Agentic Workflows: Multi-step reasoning and planning
Conversational AI: Natural multi-turn dialogue
Instruction Following: Clear and precise responses

Training Details

Total Steps: 5000
Final Loss: 5.5744
Best Loss: 5.5627
Training Time: 0.31h
Platform: Lightning.AI A100

Training Data:

xLAM Function Calling: 5,000 samples
Python Code: 5,000 samples
UltraChat: 5,000 samples

Architecture

Sparse MoE implementation:

MoE layers added every 3 transformer layers
4 experts per layer
Top-2 routing (2 experts active per token)
Load balancing for efficient utilization
Base model frozen, MoE trained

Tips

Temperature: 0.7-0.8 for creative, 0.3-0.5 for precise
Context Window: 2048 tokens
Batch Size: Supports efficient batch inference
Precision: Works with fp16/bf16/fp32

Advanced Usage

# With custom generation config
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.1,
    do_sample=True
)

License

Apache 2.0 (inherited from base model)

Acknowledgments

Qwen Team for the excellent base model
Lightning.AI for compute infrastructure
HuggingFace for model hosting

Built with love for the agentic AI community