turtle170 committed
Commit 9e68007 · verified · 1 parent: f939316

Update README.md

Files changed (1): README.md +34 -8
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
-title: AutoTrain Advanced
-emoji: 🚀
+title: Gemma-3-4B-PT Full-Model Reasoning Research
+emoji: 🧠
 colorFrom: blue
 colorTo: green
 sdk: docker
 pinned: false
-short_description: Create powerful AI models without code
+short_description: Researching multimodal SFT logic on Gemma-3-4B-PT
 hf_oauth: true
 hf_oauth_expiration_minutes: 36000
 hf_oauth_scopes:
@@ -16,14 +16,40 @@ hf_oauth_scopes:
 - read-billing
 tags:
 - autotrain
+- gemma
+- multimodal
+- reasoning
+- sft
 ---
 
-# Docs
+# 🎯 Project Objective: Improving Multimodal Logic in Gemma 3
 
-https://huggingface.co/docs/autotrain
+This Space is dedicated to an educational research project focused on **Full-Model Supervised Fine-Tuning (SFT)** of the `google/gemma-3-4b-pt` checkpoint.
 
-# Citation
+The goal is to move beyond standard Low-Rank Adaptation (LoRA) and observe how full-parameter updates affect the model's ability to handle complex chain-of-thought reasoning across multimodal inputs.
 
+## 🛠️ Hardware Requirements & Grant Justification
+* **Baseline:** Nvidia A10G-small (24 GB VRAM)
+* **Preferred:** **Nvidia A10G-large** (additional CPU RAM for sharding/preprocessing)
+
+Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient states require the 24 GB VRAM capacity of the A10G. An A10G-large additionally allows faster dataset tokenization and more efficient model sharding during the "Push to Hub" phase, reducing the total grant time used.
+
+## 🧪 Methodology
+- **Training Type:** Full-Model SFT (Supervised Fine-Tuning)
+- **Precision:** `bf16` with the `adamw_bnb_8bit` optimizer
+- **Data:** A curated reasoning dataset formatted in ChatML for logical consistency
+
+## 🤝 Community Commitment
+As per the grant request, once training is finalized:
+1. The **full model weights** will be pushed to the Hub.
+2. Training logs (loss curves, perplexity) will be made public.
+3. **The Space will be manually reverted to the free CPU tier to release resources back to the community.**
+
+# 📜 Docs & Citation
+
+Official documentation: [AutoTrain Docs](https://huggingface.co/docs/autotrain)
+
+```bibtex
 @misc{thakur2024autotrainnocodetrainingstateoftheart,
       title={AutoTrain: No-code training for state-of-the-art models},
       author={Abhishek Thakur},
@@ -31,5 +57,5 @@ https://huggingface.co/docs/autotrain
       eprint={2410.15735},
       archivePrefix={arXiv},
       primaryClass={cs.AI},
       url={https://arxiv.org/abs/2410.15735},
 }
+```
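As a rough sanity check on the hardware justification above: with full-parameter SFT, merely holding a ~4B-parameter model's training state already saturates a 24 GB card. The back-of-envelope estimate below is an editor's sketch, assuming a ~4.3B parameter count; activations, vision-tower buffers, and allocator overhead are excluded.

```python
# Back-of-envelope VRAM estimate for full-model SFT in bf16 with an
# 8-bit Adam optimizer such as adamw_bnb_8bit. Approximate figures only:
# activations, vision-tower buffers, and allocator overhead are excluded.
params = 4.3e9                # rough Gemma-3-4B-PT parameter count (assumed)
GiB = 1024**3

weights = params * 2 / GiB    # bf16 weights: 2 bytes per parameter
grads = params * 2 / GiB      # bf16 gradients: 2 bytes per parameter
optimizer = params * 2 / GiB  # 8-bit Adam: two 1-byte moment states per parameter

print(f"weights ≈ {weights:.0f} GiB, grads ≈ {grads:.0f} GiB, "
      f"optimizer ≈ {optimizer:.0f} GiB, "
      f"total ≈ {weights + grads + optimizer:.0f} GiB")
# ≈ 8 + 8 + 8 = 24 GiB of state alone, which is why gradient checkpointing
# and tiny micro-batches are unavoidable on a 24 GB A10G.
```

This arithmetic also shows why the README pairs `bf16` with an 8-bit optimizer: plain fp32 AdamW would add roughly 32 GiB of optimizer state on its own.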
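On the data format: ChatML frames each conversation turn with `<|im_start|>` / `<|im_end|>` markers. A single hypothetical record of the kind the Methodology section describes (the file name and contents are illustrative, not taken from the commit), pre-rendered into a plain `text` field:

```python
# One illustrative ChatML-formatted training record (hypothetical data),
# stored as one JSON object per line in e.g. reasoning_chatml.jsonl with
# the conversation pre-rendered into the "text" field.
example = {
    "text": (
        "<|im_start|>user\n"
        "A crate holds 12 apples. I have 5 crates, but 3 are only half "
        "full. How many apples do I have?<|im_end|>\n"
        "<|im_start|>assistant\n"
        "Two full crates: 2 × 12 = 24. Three half-full crates: "
        "3 × 6 = 18. Total: 24 + 18 = 42 apples.<|im_end|>"
    )
}
```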
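The Methodology bullets map naturally onto a TRL `SFTTrainer` run. The sketch below is a minimal illustration under stated assumptions, not the Space's actual AutoTrain configuration: the dataset file and hyperparameters are hypothetical, and it is the absence of any `peft_config` (the LoRA adapter config) that makes this full-model SFT rather than LoRA.

```python
# Minimal full-model SFT sketch matching the Methodology section:
# bf16 precision, 8-bit AdamW, ChatML records in a "text" field.
# Hyperparameters and the dataset file are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import Gemma3ForConditionalGeneration
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="reasoning_chatml.jsonl")["train"]

model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-pt", torch_dtype=torch.bfloat16
)

config = SFTConfig(
    output_dir="gemma-3-4b-reasoning-sft",
    bf16=True,                     # bf16 precision, per the README
    optim="adamw_bnb_8bit",        # 8-bit AdamW via bitsandbytes
    gradient_checkpointing=True,   # trades compute for the VRAM headroom above
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,            # illustrative value
    num_train_epochs=1,
)

# No peft_config is passed, so every parameter is trainable (full SFT).
trainer = SFTTrainer(model=model, args=config, train_dataset=dataset)
trainer.train()
```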
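For the first Community Commitment item, the same trainer can publish the full weights (not adapters) once training completes. Continuing the sketch above, with a hypothetical output directory and commit message:

```python
# Continuing the sketch: persist and publish the full fine-tuned weights.
trainer.save_model("gemma-3-4b-reasoning-sft")  # full checkpoint on disk
trainer.push_to_hub(commit_message="Full-model SFT of gemma-3-4b-pt")
```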