---
title: Gemma-3-4B-PT Full-Model Reasoning Research
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
short_description: Researching multimodal SFT logic on Gemma-3-4B-PT
hf_oauth: true
hf_oauth_expiration_minutes: 36000
hf_oauth_scopes:
 - read-repos
 - write-repos
 - manage-repos
 - inference-api
 - read-billing
tags:
 - autotrain
 - gemma
 - multimodal
 - reasoning
 - sft
---

# 🎯 Project Objective: Improving Multimodal Logic in Gemma 3

This Space is dedicated to an educational research project focused on **Full-Model Supervised Fine-Tuning (SFT)** of the `google/gemma-3-4b-pt` architecture. 

The goal is to move beyond standard Low-Rank Adaptation (LoRA) to observe how full-parameter updates affect the model's ability to handle complex chain-of-thought reasoning across multimodal inputs.

## 🛠️ Hardware Requirements & Grant Justification
**Requested hardware:** NVIDIA L40S (48 GB VRAM)

Because Gemma 3 is a multimodal model, the vision-language alignment layers plus the full-parameter gradient and optimizer states require the **48 GB VRAM of the L40S**. This memory headroom is essential for keeping the SFT run stable and avoiding out-of-memory (OOM) errors when computing multimodal attention gradients at the 4B scale. The L40S also supports larger batches and faster training steps, significantly reducing the total grant time used.
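
A rough back-of-envelope estimate illustrates the requirement. This is a sketch only: it assumes 16-bit weights and gradients with the 8-bit optimizer listed in the methodology below, and it ignores activations, the vision tower, and framework overhead, all of which add to the total.

```python
# Rough memory estimate for full-parameter SFT of a ~4B-parameter model.
# Assumes bf16 weights and gradients plus an 8-bit Adam optimizer (two 1-byte
# states per parameter); activations and the vision tower come on top of this.
params = 4e9
weights_gb = params * 2 / 1e9       # bf16 weights
grads_gb = params * 2 / 1e9         # bf16 gradients
optim_gb = params * 2 * 1 / 1e9     # two 8-bit optimizer states
total_gb = weights_gb + grads_gb + optim_gb
print(f"~{total_gb:.0f} GB before activations")  # ≈ 24 GB, leaving headroom on a 48 GB L40S
```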

## 🧪 Methodology
- **Training Type:** Full-Model SFT (Supervised Fine-Tuning)
- **Precision:** `FP8` compute with the `adamw_bnb_8bit` optimizer and `Unsloth` kernels for memory efficiency
- **Data:** A curated reasoning dataset formatted in ChatML for logical consistency (see the configuration sketch below).
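
To make the setup concrete, here is a minimal sketch of what the full-model SFT run could look like with TRL's `SFTTrainer` (AutoTrain drives a similar pipeline under the hood). The dataset name and hyperparameters are placeholders, the FP8/Unsloth pieces are omitted, and a real multimodal run would typically add a custom collator for image batches; it assumes a recent `transformers`/`trl` release. Treat it as an illustration of the configuration, not the final recipe.

```python
# Sketch of full-model SFT on Gemma-3-4B-PT (placeholder dataset and hyperparameters).
from datasets import load_dataset
from transformers import AutoProcessor, AutoModelForImageTextToText
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-3-4b-pt"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="bfloat16")

# Hypothetical ChatML-style reasoning dataset with a "messages" column, e.g.:
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
dataset = load_dataset("my-org/multimodal-reasoning-chatml", split="train")  # placeholder name

config = SFTConfig(
    output_dir="gemma-3-4b-reasoning-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-5,              # conservative LR for full-parameter updates
    optim="adamw_bnb_8bit",          # 8-bit optimizer states to stay within 48 GB
    bf16=True,
    gradient_checkpointing=True,     # trade compute for activation memory
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```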

## 🤝 Community Commitment
As per the grant request, once training is finalized:
1. The **full model weights** will be pushed to the Hub.
2. Training logs (Loss curves/Perplexity) will be made public.
3. **The Space will be manually reverted to the Free CPU tier to release resources back to the community.**

# 📜 Docs & Citation

Official Documentation: [AutoTrain Docs](https://huggingface.co/docs/autotrain)

```bibtex
@misc{thakur2024autotrainnocodetrainingstateoftheart,
      title={AutoTrain: No-code training for state-of-the-art models}, 
      author={Abhishek Thakur},
      year={2024},
      eprint={2410.15735},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.15735},
}
```