dmgcsilva/VIGiA-8B


dmgcsilva committed on Feb 21

Commit 96a610a · verified · 1 Parent(s): 01907a6

Create README.md

Files changed (1)

README.md +38 -0

README.md ADDED

@@ -0,0 +1,38 @@

+---
+datasets:
+- dmgcsilva/InstructionVidDial
+language:
+- en
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+---
+# VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval
+VIGiA is a multimodal dialogue model for conversational instructional plan guidance, accepted at **EACL 2026**.
+It assists users through complex multi-step procedures (e.g., cooking, DIY) via interactive dialogue, supporting plan-grounded answer generation, plan-aware VQA, visually-informed step generation, and conversational video moment retrieval.
+## Model Details
+- **Language Model**: Llama 3.1 8B Instruct
+- **Visual Encoder**: SigLIP SO400M (224x224, patch size 14)
+- **Connector**: 2-layer MLP
+- **Training**: 4-stage progressive training with LoRA
+## Links
+- **Code & full documentation**: [github.com/dmgcsilva/vigia](https://github.com/dmgcsilva/vigia)
+- **Dataset**: [InstructionVidDial](#) <!-- TODO: add link -->
+- **Paper**: *VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval* (EACL 2026)
+## Citation
+```bibtex
+@inproceedings{gloria-silva-etal-2026-vigia,
+    title     = "{VIG}i{A}: Instructional Video Guidance via Dialogue Reasoning and Retrieval",
+    author    = "Gl{\'o}ria-Silva, Diogo and Semedo, David and Magalh{\~a}es, Jo{\~a}o",
+    booktitle = "Proceedings of the 2026 Conference of the European Chapter of the Association for Computational Linguistics",
+    year      = "2026",
+    publisher = "Association for Computational Linguistics",
+}
+```
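
The Model Details list in the README names a SigLIP SO400M encoder (224x224 input, patch size 14), a 2-layer MLP connector, and a Llama 3.1 8B language model. As a rough sketch of what those numbers imply, the snippet below works out the visual token count and the connector size. Only the image size and patch size come from the README. The hidden sizes (1152 for the encoder, 4096 for the language model), the connector's inner width, and the use of biases are assumptions, as is the idea that every patch token is passed to the language model. `ConnectorShape` is a hypothetical helper, not part of the VIGiA code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorShape:
    """Hypothetical helper for back-of-the-envelope shape arithmetic."""

    image_size: int = 224   # from the README
    patch_size: int = 14    # from the README
    vision_dim: int = 1152  # assumption: SigLIP SO400M hidden size
    llm_dim: int = 4096     # assumption: Llama 3.1 8B hidden size

    @property
    def num_patches(self) -> int:
        # Patches per side, squared; assumes a square image and no extra tokens.
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    @property
    def mlp_params(self) -> int:
        # Assumed layout: Linear(vision_dim -> llm_dim), activation,
        # Linear(llm_dim -> llm_dim), both with biases.
        first = self.vision_dim * self.llm_dim + self.llm_dim
        second = self.llm_dim * self.llm_dim + self.llm_dim
        return first + second


shape = ConnectorShape()
print(shape.num_patches)  # 256 visual tokens per image
print(shape.mlp_params)   # 21504000 parameters under the assumptions above
```

Under these assumptions each image contributes 256 tokens to the language model's input; the actual connector layout should be checked against the linked repository.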