SkillFlow-Model / README.md
beita6969's picture
Upload merged SkillFlow supervisor checkpoint
375ef3a verified
---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3.5-9B
datasets:
- beita6969/SkillFlow-Dataset
tags:
- skillflow
- agentic-ai
- llm-agents
- tool-use
- lora-merged
---
# SkillFlow Merged Supervisor
This repository contains the merged SkillFlow Supervisor model weights.
## Source
- Code: https://github.com/beita6969/SkillFlow
- Dataset: https://huggingface.co/datasets/beita6969/SkillFlow-Dataset
## Merge details
- Base model: Qwen/Qwen3.5-9B
- Adapter type: LoRA
- Adapter role: Supervisor forward policy `theta`
- Checkpoint: `checkpoint_step_0110`
- LoRA rank: 64
- LoRA alpha: 128
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Merge dtype: bfloat16
The training-time backward policy adapter is not merged into this inference model.