Upload FS-DFM-1.3B-SFT checkpoint
- README.md +37 -0
- lora_adapter/lora_weights.pt +3 -0
- lora_adapter/tokenizer.json +0 -0
- lora_adapter/tokenizer_config.json +12 -0
README.md
ADDED
@@ -0,0 +1,37 @@
+---
+tags:
+- discrete-flow-matching
+- web-action-planning
+- formfactory
+- reinforcement-learning
+- openbrowser
+license: apache-2.0
+---
+
+# FS-DFM-1.3B-SFT
+
+FS-DFM 1.3B (Apple) fine-tuned with SFT on FormFactory web form-filling tasks. It uses LoRA adapters on the DiT architecture with Poisson jump sampling, and achieves a 68.5% nonzero-reward rate and a 0.146 average reward on 124 test tasks. Part of the STAD80 project: Generative Action Planning via Discrete Flow Matching.
+
+## Paper
+
+**Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning**
+- Authors: Muhammad Enrizky Brillian, Qiang Sun
+- Institution: University of Toronto Scarborough
+
+## Training Details
+
+- **Dataset**: FormFactory (992 train / 124 val / 124 test tasks, 25 form types, 8 domains)
+- **Infrastructure**: Single NVIDIA A10G GPU (24GB VRAM) on Anyscale
+- **Framework**: PyTorch + PEFT (LoRA/QLoRA)
+
+## Citation
+
+If you use this model, please cite:
+
+```bibtex
+@article{brillian2026flowgrpo,
+  title={Generative Action Planning via Discrete Flow Matching with Online Reinforcement Fine-Tuning},
+  author={Brillian, Muhammad Enrizky and Sun, Qiang},
+  year={2026}
+}
+```
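The Poisson jump sampling named in the model card can be illustrated with a toy sketch. Everything here is a hypothetical simplification of ours (function names, the jump rate, and the uniform resampling stand-in); the actual FS-DFM sampler draws replacement tokens from the DiT model's logits rather than uniformly:

```python
# Toy sketch of Poisson jump sampling over a discrete token sequence
# (hypothetical simplification; NOT the FS-DFM implementation).
import math
import random

def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) via Knuth's inversion method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def poisson_jump_sampling(seq_len, vocab_size, steps, rate, seed=0):
    """Start from an all-masked sequence; at each step, resample a
    Poisson-distributed number of positions (here uniformly at random,
    where the real model would use its denoiser's predictions)."""
    rng = random.Random(seed)
    mask_id = vocab_size  # extra id for the masked/absorbing state
    seq = [mask_id] * seq_len
    for _ in range(steps):
        n_jumps = min(poisson_sample(rng, rate), seq_len)
        for pos in rng.sample(range(seq_len), n_jumps):
            seq[pos] = rng.randrange(vocab_size)  # stand-in for DiT logits
    return seq

tokens = poisson_jump_sampling(seq_len=16, vocab_size=50257, steps=8, rate=4.0)
print(len(tokens))  # → 16
```

Few-step samplers of this family trade the one-token-per-step decoding of autoregressive models for a small, fixed number of parallel refinement steps.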
lora_adapter/lora_weights.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09b482151a0c5a66205220d27fd8e31e21976aefff5f8f497b3004f9132f8a1c
+size 11061461
lora_adapter/tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
lora_adapter/tokenizer_config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "is_local": false,
+  "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
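The config above follows the GPT-2 convention of reusing `<|endoftext|>` for every special token (BOS, EOS, pad, and unk) and caps inputs at 1024 tokens. A quick sanity-check sketch over the same JSON:

```python
import json

# Sanity-check the tokenizer config: GPT-2 style reuses <|endoftext|>
# for bos/eos/pad/unk, and model_max_length caps the input length.
config = json.loads("""{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}""")

special = {config[k] for k in ("bos_token", "eos_token", "pad_token", "unk_token")}
assert special == {"<|endoftext|>"}        # one shared special token
assert config["model_max_length"] == 1024  # GPT-2 context length
print(config["tokenizer_class"])  # → GPT2Tokenizer
```

Reusing one special token is harmless here because padding and sequence boundaries are distinguished by position, not by token identity.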