billyenrizky committed on
Commit 0ebac1b · verified · 1 Parent(s): 0ad4070

Upload FS-DFM-1.3B-SFT checkpoint

README.md ADDED
---
tags:
- discrete-flow-matching
- web-action-planning
- formfactory
- reinforcement-learning
- openbrowser
license: apache-2.0
---

# FS-DFM-1.3B-SFT

FS-DFM 1.3B (Apple) fine-tuned with SFT on FormFactory web form-filling tasks. It uses LoRA adapters on the DiT architecture with Poisson jump sampling, and achieves a 68.5% nonzero-reward rate and 0.146 average reward on 124 test tasks. Part of the STAD80 project: Generative Action Planning via Discrete Flow Matching.

## Paper

**Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning**
- Authors: Muhammad Enrizky Brillian, Qiang Sun
- Institution: University of Toronto Scarborough

## Training Details

- **Dataset**: FormFactory (992 train / 124 val / 124 test tasks, 25 form types, 8 domains)
- **Infrastructure**: Single NVIDIA A10G GPU (24 GB VRAM) on Anyscale
- **Framework**: PyTorch + PEFT (LoRA/QLoRA)
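Since the checkpoint ships LoRA weights separately from the base model, the adapter contributes the standard low-rank update W' = W + (alpha / r) · B · A at merge time. A minimal NumPy sketch of that merge follows; the shapes, rank, and alpha here are illustrative assumptions, not values read from `lora_weights.pt`:

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into a base weight: W' = W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

# Illustrative shapes only (assumed, not taken from this checkpoint):
d_out, d_in, r, alpha = 8, 8, 4, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # down-projection (rank r)
B = np.zeros((d_out, r))                # up-projection, zero-initialized

W_merged = merge_lora(W, A, B, alpha, r)
# With B zero-initialized (the usual LoRA init), merging is a no-op
print(np.allclose(W_merged, W))  # True
```

Zero-initializing B is the conventional LoRA choice: the adapter starts as an identity update and only deviates from the base weights as training progresses.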

## Citation

If you use this model, please cite:

```bibtex
@article{brillian2026flowgrpo,
  title={Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning},
  author={Brillian, Muhammad Enrizky and Sun, Qiang},
  year={2026}
}
```
lora_adapter/lora_weights.pt ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:09b482151a0c5a66205220d27fd8e31e21976aefff5f8f497b3004f9132f8a1c
size 11061461
```
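Because `lora_weights.pt` is stored via Git LFS, a raw download without LFS installed yields the short pointer text above rather than the ~11 MB tensor file. A small sketch of detecting that case before attempting a load (the helper name is ours, not part of any library):

```python
def parse_lfs_pointer(text):
    """Return {'oid', 'size'} if text is a Git LFS pointer file, else None."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("version https://git-lfs.github.com/spec/"):
        return None
    fields = dict(line.split(" ", 1) for line in lines)
    return {
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:09b482151a0c5a66205220d27fd8e31e21976aefff5f8f497b3004f9132f8a1c
size 11061461
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 11061461
```

If `parse_lfs_pointer` returns a dict instead of None, the file on disk is a pointer and the real weights still need to be fetched (e.g. via `git lfs pull` or the Hub download tooling).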
lora_adapter/tokenizer.json ADDED
The diff for this file is too large to render.
lora_adapter/tokenizer_config.json ADDED

```json
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
```