Upload FS-DFM-1.3B-SFT checkpoint
- README.md +37 -0
- lora_adapter/lora_weights.pt +3 -0
- lora_adapter/tokenizer.json +0 -0
- lora_adapter/tokenizer_config.json +12 -0
README.md
ADDED
@@ -0,0 +1,37 @@
+---
+tags:
+- discrete-flow-matching
+- web-action-planning
+- formfactory
+- reinforcement-learning
+- openbrowser
+license: apache-2.0
+---
+
+# FS-DFM-1.3B-SFT
+
+FS-DFM 1.3B (Apple) fine-tuned with SFT on FormFactory web form-filling tasks. It uses LoRA adapters on the DiT architecture with Poisson jump sampling, and achieves a 68.5% nonzero-reward rate and a 0.146 average reward on 124 test tasks. Part of the STAD80 project: Generative Action Planning via Discrete Flow Matching.
+
+## Paper
+
+**Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning**
+- Authors: Muhammad Enrizky Brillian, Qiang Sun
+- Institution: University of Toronto Scarborough
+
+## Training Details
+
+- **Dataset**: FormFactory (992 train / 124 val / 124 test tasks, 25 form types, 8 domains)
+- **Infrastructure**: Single NVIDIA A10G GPU (24GB VRAM) on Anyscale
+- **Framework**: PyTorch + PEFT (LoRA/QLoRA)
+
+## Citation
+
+If you use this model, please cite:
+
+```bibtex
+@article{brillian2026flowgrpo,
+  title={Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning},
+  author={Brillian, Muhammad Enrizky and Sun, Qiang},
+  year={2026}
+}
+```
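The Poisson jump sampling named in the model card can be illustrated with a toy sketch. Everything here is a hypothetical simplification of ours (function names, the jump rate, and the uniform resampling stand-in); the actual FS-DFM sampler draws replacement tokens from the DiT model's logits rather than uniformly:

```python
# Toy sketch of Poisson jump sampling over a discrete token sequence
# (hypothetical simplification; NOT the FS-DFM implementation).
import math
import random

def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) via Knuth's inversion method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def poisson_jump_sampling(seq_len, vocab_size, steps, rate, seed=0):
    """Start from an all-masked sequence; at each step, resample a
    Poisson-distributed number of positions (here uniformly at random,
    where the real model would use its denoiser's predictions)."""
    rng = random.Random(seed)
    mask_id = vocab_size  # extra id for the masked/absorbing state
    seq = [mask_id] * seq_len
    for _ in range(steps):
        n_jumps = min(poisson_sample(rng, rate), seq_len)
        for pos in rng.sample(range(seq_len), n_jumps):
            seq[pos] = rng.randrange(vocab_size)  # stand-in for DiT logits
    return seq

tokens = poisson_jump_sampling(seq_len=16, vocab_size=50257, steps=8, rate=4.0)
print(len(tokens))  # → 16
```

Few-step samplers of this family trade the one-token-per-step decoding of autoregressive models for a small, fixed number of parallel refinement steps.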
lora_adapter/lora_weights.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09b482151a0c5a66205220d27fd8e31e21976aefff5f8f497b3004f9132f8a1c
+size 11061461
lora_adapter/tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
lora_adapter/tokenizer_config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "is_local": false,
+  "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
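The config above follows the GPT-2 convention of reusing `<|endoftext|>` for every special token (BOS, EOS, pad, and unk) and caps inputs at 1024 tokens. A quick sanity-check sketch over the same JSON:

```python
import json

# Sanity-check the tokenizer config: GPT-2 style reuses <|endoftext|>
# for bos/eos/pad/unk, and model_max_length caps the input length.
config = json.loads("""{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}""")

special = {config[k] for k in ("bos_token", "eos_token", "pad_token", "unk_token")}
assert special == {"<|endoftext|>"}        # one shared special token
assert config["model_max_length"] == 1024  # GPT-2 context length
print(config["tokenizer_class"])  # → GPT2Tokenizer
```

Reusing one special token is harmless here because padding and sequence boundaries are distinguished by position, not by token identity.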