Update README.md
README.md CHANGED
@@ -1,11 +1,11 @@
---
-title:
+title: Gemma-3-4B-PT Full-Model Reasoning Research
-emoji:
+emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
-short_description:
+short_description: Researching multimodal SFT logic on Gemma-3-4B-PT
hf_oauth: true
hf_oauth_expiration_minutes: 36000
hf_oauth_scopes:
@@ -16,14 +16,40 @@ hf_oauth_scopes:
- read-billing
tags:
- autotrain
+- gemma
+- multimodal
+- reasoning
+- sft
---

-#
-
-

+# 🎯 Project Objective: Improving Multimodal Logic in Gemma 3
+
+This Space is dedicated to an educational research project focused on **Full-Model Supervised Fine-Tuning (SFT)** of the `google/gemma-3-4b-pt` architecture.
+
+The goal is to move beyond standard Low-Rank Adaptation (LoRA) and observe how full-parameter updates affect the model's ability to handle complex chain-of-thought reasoning across multimodal inputs.
+
+## 🛠️ Hardware Requirements & Grant Justification
+* **Baseline:** Nvidia A10G-small (24GB VRAM)
+* **Preferred:** Nvidia A10G-large (additional CPU RAM for sharding/preprocessing)
+
+Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient and optimizer states require the 24GB VRAM capacity of the A10G. An A10G-large will additionally allow faster dataset tokenization and more efficient model sharding during the "Push to Hub" phase, reducing the total grant time used.
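+
+As a back-of-envelope sanity check (an editorial estimate, not measured numbers), the static training state alone already approaches the 24GB budget, which is why the 8-bit optimizer listed under Methodology below matters:
+
+```python
+# Rough VRAM estimate for full-model SFT of a ~4B-parameter model.
+# Assumptions: bf16 weights and gradients (2 bytes/param) and an 8-bit
+# AdamW keeping two 1-byte states per parameter; activations, the CUDA
+# context, and vision-tower buffers all come on top of this.
+params = 4e9
+weights_gb = params * 2 / 1e9       # ~8 GB of bf16 weights
+grads_gb = params * 2 / 1e9         # ~8 GB of bf16 gradients
+optim_gb = params * 2 * 1 / 1e9     # ~8 GB for two 8-bit Adam states
+print(f"static training state: ~{weights_gb + grads_gb + optim_gb:.0f} GB")
+# A 32-bit AdamW would need ~32 GB for its optimizer states alone.
+```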
+
+## 🧪 Methodology
+- **Training Type:** Full-Model SFT (Supervised Fine-Tuning)
+- **Precision:** `bf16` with the `adamw_bnb_8bit` optimizer
+- **Data:** A curated reasoning dataset formatted in ChatML for logical consistency (see the sketches after this list).
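+
+The dataset itself is not published in this Space, so the following record is purely illustrative of the role-tagged ChatML layout (contents and field names are hypothetical, not drawn from the actual data):
+
+```python
+# Hypothetical ChatML-style training record; illustrative only.
+record = {
+    "messages": [
+        {"role": "user", "content": "Is 1001 divisible by 7? Think step by step."},
+        {"role": "assistant", "content": "7 * 143 = 1001, so yes: 1001 / 7 = 143."},
+    ]
+}
+```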
+
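+Putting the settings above together, here is a minimal sketch of the training arguments (an editorial illustration, not the Space's actual AutoTrain configuration; the output directory name, batch size, and accumulation steps are assumptions):
+
+```python
+from trl import SFTConfig
+
+# Sketch of an SFT run matching the stated precision and optimizer.
+args = SFTConfig(
+    output_dir="gemma3-4b-reasoning-sft",  # hypothetical name
+    bf16=True,                             # bf16 precision, as stated above
+    optim="adamw_bnb_8bit",                # 8-bit AdamW via bitsandbytes
+    per_device_train_batch_size=1,         # stay inside the 24GB budget
+    gradient_accumulation_steps=8,         # recover a usable effective batch
+    gradient_checkpointing=True,           # trade compute for activation memory
+)
+```
+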
+## 🤝 Community Commitment
+As per the grant request, once training is finalized:
+1. The **full model weights** will be pushed to the Hub (see the sketch after this list).
+2. Training logs (loss curves/perplexity) will be made public.
+3. **The Space will be manually reverted to the free CPU tier to release resources back to the community.**
+
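+For reference, a sketch of that release step (the repo id and local path are placeholders); `max_shard_size` controls the checkpoint sharding mentioned in the hardware section:
+
+```python
+from transformers import AutoModelForImageTextToText
+
+# Load the fine-tuned checkpoint from the (hypothetical) output directory
+# and push the full weights to the Hub in ~5GB shards.
+model = AutoModelForImageTextToText.from_pretrained("gemma3-4b-reasoning-sft")
+model.push_to_hub("your-username/gemma3-4b-reasoning-sft", max_shard_size="5GB")
+```
+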
+## 📜 Docs & Citation
+
+Official Documentation: [AutoTrain Docs](https://huggingface.co/docs/autotrain)
+
+```bibtex
@misc{thakur2024autotrainnocodetrainingstateoftheart,
  title={AutoTrain: No-code training for state-of-the-art models},
  author={Abhishek Thakur},
@@ -31,5 +57,5 @@ https://huggingface.co/docs/autotrain
  eprint={2410.15735},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
-  url={https://arxiv.org/abs/2410.15735},
-}
+  url={https://arxiv.org/abs/2410.15735},
+}