beta3 committed on
Commit 17ec7c0 · verified · 1 Parent(s): b0bd7ed

Upload fine-tuned model directly from Google Drive

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,289 @@
1
- ---
2
- license: gemma
3
- ---
1
+ ---
2
+ base_model: unsloth/gemma-3-1b-it
3
+ library_name: transformers
4
+ tags:
5
+ - gemma-3
6
+ - fine-tuning
7
+ - sft
8
+ - unsloth
9
+ - academic-title-generation
10
+ - lora
11
+ - 4bit
12
+ - chat-template
13
+ model_name: gemma3_1b_title_generator
14
+ ---
15
+
16
+ <center>
17
+
18
+ # **Gemma 3 — 1B Academic Title Generator**
19
+
20
+ <img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>
21
+
22
+ </center>
23
+
24
+ ---
25
+
26
+ ## Overview
27
+
28
+ **gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts.
29
+
30
+ The training process adapts Gemma-3's chat-format behavior to perform highly focused title generation. The model was fine-tuned using a **multi-batch training pipeline** due to hardware limitations, leveraging Unsloth’s efficient 4-bit loading and LoRA adapters.
31
+
32
+ This results in a lightweight, fast, and domain-specialized model capable of producing concise, coherent, and academically accurate titles.
33
+
34
+ ---
35
+
36
+ ## Dataset & Preprocessing
37
+
38
+ Training data consists of scientific **abstract → title** pairs.
39
+ Because of memory constraints, the dataset was split into **sequential batches**. Each batch was trained in turn, resuming from the checkpoint saved after the previous one. **Unsloth’s lightweight fine-tuning tools** made this incremental approach practical.
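
As a rough illustration of the sequential scheme described above, the sketch below splits a dataset into consecutive index ranges and records which checkpoint each round would resume from. The chunk size, the helper names `chunk_ranges` and `plan_training`, and the checkpoint directory names are all hypothetical; the card does not state the values that were actually used.

```python
from typing import Iterator, List, Optional, Tuple


def chunk_ranges(num_examples: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover the dataset in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_examples, chunk_size):
        yield start, min(start + chunk_size, num_examples)


def plan_training(num_examples: int, chunk_size: int) -> List[dict]:
    """Describe each training round and the checkpoint it resumes from."""
    plan = []
    previous_checkpoint: Optional[str] = None  # first round starts from the base model
    for round_index, (start, end) in enumerate(chunk_ranges(num_examples, chunk_size)):
        checkpoint = f"checkpoint-round-{round_index}"  # hypothetical directory name
        plan.append(
            {"rows": (start, end), "resume_from": previous_checkpoint, "saves_to": checkpoint}
        )
        previous_checkpoint = checkpoint
    return plan


plan = plan_training(num_examples=10, chunk_size=4)
for step in plan:
    print(step)
```

Each round would load the adapter saved by the previous round, train on its own slice of rows, and save a new checkpoint.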
40
+
41
+ Each data sample was converted into a **Gemma-3 style chat conversation** in which the abstract is the user turn and the title is the model's response:
42
+
43
+ ```python
44
+ def format_dataset_for_chat(example):
45
+     messages = [
46
+         {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
47
+         {"role": "model", "content": example["title"]}
48
+     ]
49
+
50
+     example["text"] = tokenizer.apply_chat_template(
51
+         messages,
52
+         tokenize=False,  # return the rendered string, not token ids
53
+         add_generation_prompt=False  # the model turn is already in the messages
54
+     ).removeprefix("<bos>")  # drop <bos>; the tokenizer adds it again when the text is tokenized
55
+
56
+     return example
57
+ ```
58
+
59
+ ## Chat Format
60
+
61
+ Gemma-3 uses a structured multi-turn dialog format.
62
+ Each training example is converted into a conversation where:
63
+
64
+ - The **user** provides the abstract.
65
+ - The **model** outputs the title.
66
+
67
+ The structure follows the Gemma-3 chat template:
68
+
69
+ <bos><start_of_turn>user
70
+ ... user content ...
71
+ <end_of_turn>
72
+ <start_of_turn>model
73
+ ... model content ...
74
+ <end_of_turn>
75
+
76
+ This formatting is produced by the tokenizer's
77
+ `apply_chat_template()` method from Hugging Face Transformers.
78
+
79
+ Below is the preprocessing function used during fine-tuning:
80
+
81
+ ```python
82
+ def format_dataset_for_chat(example):
83
+     messages = [
84
+         {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
85
+         {"role": "model", "content": example["title"]}
86
+     ]
87
+
88
+     example["text"] = tokenizer.apply_chat_template(
89
+         messages,
90
+         tokenize=False,
91
+         add_generation_prompt=False
92
+     ).removeprefix("<bos>")
93
+
94
+     return example
95
+ ```
96
+ ## Training Configuration
97
+
98
+ Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s
99
+ efficient 4-bit loading and LoRA adaptation layers. The training process followed
100
+ a multi-batch strategy due to hardware limitations, with incremental checkpoint
101
+ loading supported by Unsloth.
102
+
103
+ ### Key Training Settings
104
+
105
+ - Model: unsloth/gemma-3-1b-it
106
+ - Precision: 4-bit (QLoRA)
107
+ - Method: Supervised Fine-Tuning (SFT)
108
+ - LoRA: Enabled for attention and MLP modules
109
+ - Sequence length: 2048 tokens
110
+ - Optimizer: AdamW (8-bit)
111
+ - Scheduler: cosine
112
+ - Strategy: multi-batch training with checkpoint continuation
113
+ - Tokenizer: Gemma-3 chat template applied through Unsloth
114
+
115
+ ### Response-Only Learning
116
+
117
+ To ensure the loss is computed **only on the title** (the model output) and not
118
+ on the user prompt (the abstract), Unsloth's `train_on_responses_only` helper (from `unsloth.chat_templates`) was applied:
119
+
120
+ ```python
121
+ trainer = train_on_responses_only(
122
+     trainer,
123
+     instruction_part="<start_of_turn>user\n",  # User turn with the abstract
124
+     response_part="<start_of_turn>model\n",  # Model turn with the generated title
125
+ )
126
+ ```
127
+
128
+ This restricts the training loss to the tokens of the model's turn; prompt
129
+ tokens are masked out of the labels. The LoRA adapters are therefore trained
130
+ to generate academic titles rather than to reproduce the abstract or the
131
+ instruction text.
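
To make the masking idea concrete, here is a minimal sketch of response-only labels on plain Python lists. It is an illustration under assumptions, not Unsloth's implementation: the token ids are toy values, and `mask_non_response` and `find_subsequence` are hypothetical helper names. The only convention taken as given is that a label of `-100` is ignored by PyTorch's cross-entropy loss by default.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(tokens: List[int], pattern: List[int], start: int = 0) -> int:
    """Return the first index >= start where pattern occurs in tokens, or -1."""
    for i in range(start, len(tokens) - len(pattern) + 1):
        if tokens[i:i + len(pattern)] == pattern:
            return i
    return -1


def mask_non_response(
    input_ids: List[int], instruction_ids: List[int], response_ids: List[int]
) -> List[int]:
    """Keep labels only for tokens that follow a response marker."""
    if not instruction_ids or not response_ids:
        raise ValueError("marker sequences must be non-empty")
    labels = [IGNORE_INDEX] * len(input_ids)
    position = 0
    while True:
        marker = find_subsequence(input_ids, response_ids, position)
        if marker == -1:
            break
        begin = marker + len(response_ids)  # first token of the model's answer
        next_instruction = find_subsequence(input_ids, instruction_ids, begin)
        end = len(input_ids) if next_instruction == -1 else next_instruction
        labels[begin:end] = input_ids[begin:end]
        position = end
    return labels


# Toy ids: [1, 2] stands for the user marker, [1, 3] for the model marker.
sequence = [1, 2, 10, 11, 9, 1, 3, 20, 21, 9]
labels = mask_non_response(sequence, instruction_ids=[1, 2], response_ids=[1, 3])
print(labels)
```

Only the positions after the model marker keep their token ids as labels, so only those positions contribute to the loss.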
132
+
133
+ ### Training Behavior
134
+
135
+ - LoRA significantly reduces VRAM usage while maintaining strong output quality.
136
+ - Unsloth manages efficient 4-bit quantization, chat-template formatting, and
137
+ checkpoint handling.
138
+ - Multi-batch training allows large datasets to be processed even with limited
139
+ hardware resources.
140
+ - Validation steps are used to monitor loss and adjust training dynamics.
141
+
142
+ ## 🚀 Quick Usage Example
143
+
144
+ Before running inference, make sure all required libraries are installed (in a notebook cell, prefix each command with `!`). `peft` is included because the repository stores a LoRA adapter:
145
+
146
+ ```bash
147
+ pip install -q transformers accelerate torch peft
148
+ pip install -q -U bitsandbytes
149
+ # Only if your setup or model requires Unsloth for loading:
150
+ pip install -q unsloth
151
+ ```
152
+
153
+ Below is a clean and ready-to-run example demonstrating how to generate an
154
+ academic title using the Gemma-3 chat template:
155
+
156
+ ```python
157
+ from transformers import pipeline
158
+ import torch
159
+
160
+ pipe = pipeline(
161
+     "text-generation",
162
+     model="beta3/gemma3_1b_title_generator",
163
+     dtype=torch.bfloat16
164
+ )
165
+
166
+ # Example abstract for title generation
167
+ abstract = """
168
+ Transformer-based architectures have demonstrated strong performance in tasks
169
+ involving reasoning, scientific understanding, and text generation. Producing
170
+ concise academic titles from long abstracts, however, remains a non-trivial task.
171
+ """
172
+
173
+ # Construct the Gemma-3 chat-format prompt manually
174
+ chat_template_prompt = (
175
+     # No leading "<bos>": the tokenizer is expected to add it, as in the training preprocessing
176
+     "<start_of_turn>user\n"
177
+     "Generate a title for the following abstract:\n"  # same instruction as in training
178
+     f"{abstract.strip()}"  # the chat template trims content before <end_of_turn>
179
+     "<end_of_turn>\n"
180
+     "<start_of_turn>model\n"
181
+ )
182
+
183
+ # Generate the title
184
+ result = pipe(
185
+     chat_template_prompt,
186
+     max_new_tokens=32,  # Maximum number of tokens to generate
187
+     do_sample=True,  # Enables sampling for more varied outputs
188
+     temperature=0.7,  # Controls generation randomness
189
+     top_p=0.9,  # Nucleus sampling
190
+     return_full_text=False  # Return only the newly generated text
191
+ )[0]["generated_text"]
192
+
193
+ print("Generated title:", result)
194
+ ```
195
+
196
+ The prompt follows the Gemma-3 chat format used during fine-tuning, so the
197
+ model is expected to respond with the title alone.
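
Generated text may still carry an end-of-turn marker, surrounding whitespace, or stray quoting. The helper below is a small post-processing sketch; `clean_title` is a hypothetical name, and the assumption that the output can contain quotes or Markdown emphasis is ours rather than something the card reports.

```python
def clean_title(generated_text: str, end_marker: str = "<end_of_turn>") -> str:
    """Return the first non-empty line before the end marker, lightly normalized."""
    text = generated_text.split(end_marker, 1)[0]  # drop anything after the marker
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ""
    title = lines[0].strip("\"'*# ")  # remove quotes and Markdown emphasis, if any
    return " ".join(title.split())  # collapse repeated whitespace


print(clean_title('  **A Study of   Titles**\n<end_of_turn>\nextra text'))
```

It could be applied to the `generated_text` string returned by the pipeline before the title is displayed or stored.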
198
+
199
+ ## Capabilities & Limitations
200
+
201
+ ### Capabilities
202
+
203
+ - Generates concise, publication-ready academic titles from scientific abstracts.
204
+ - Learns to identify the core idea of long, complex abstracts.
205
+ - Follows structured, instruction-based prompts using the Gemma-3 chat format.
206
+ - Small footprint: the repository ships LoRA adapter weights (about 52 MB) that are applied on top of a 4-bit quantized base model.
207
+ - Intended for use across a variety of scientific domains; no per-domain evaluation is reported in this card.
208
+
209
+ ### Limitations
210
+
211
+ - Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.
212
+ - The model does not verify factual accuracy or scientific correctness.
213
+ - Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology.
214
+ - This model has only **1B parameters**, far fewer than the larger Gemma or Llama variants, so it may miss deep semantic details or produce less accurate titles than bigger models.
215
+ - The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.
216
+
217
+ ## Credits
218
+
219
+ This project was made possible thanks to several key open-source tools,
220
+ frameworks, and community contributors:
221
+
222
+ - **Unsloth** — for enabling efficient 4-bit training, LoRA integration,
223
+ memory-optimized model loading, and the Gemma-3 chat template utilities.
224
+ Their tooling was essential for making multi-batch fine-tuning feasible
225
+ under limited hardware conditions.
226
+
227
+ - **Hugging Face TRL** — for providing the SFTTrainer and the
228
+ training loop that Unsloth's `train_on_responses_only` helper wraps, allowing the model to focus exclusively
229
+ on generating high-quality titles.
230
+
231
+ - **Google DeepMind** — for releasing the Gemma-3 family of models,
232
+ offering a powerful instruction-tuned foundation suitable for scientific
233
+ summarization and academic tasks.
234
+
235
+ - **Hugging Face Transformers / Datasets** — for model loading,
236
+ tokenization pipelines, and large-scale dataset management.
237
+
238
+ - **Google Colab** — for generously providing free access to high-performance
239
+ GPUs to the community. Their platform makes it possible for independent
240
+ researchers, students, and developers to experiment with advanced
241
+ large-language-model training workflows without requiring specialized
242
+ hardware.
243
+
244
+ Special appreciation goes to the broader open-source community for maintaining
245
+ the tools, documentation, and shared knowledge that make projects like this
246
+ possible.
247
+
248
+ ## License
249
+
250
+ This model follows the licensing terms of its upstream foundation models and
251
+ tooling:
252
+
253
+ - **Base Model License:** Inherits the license of
254
+ `unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3*
255
+ licensing terms.
256
+
257
+ - **Gemma 3 License:** Usage must comply with the Gemma family license
258
+ provided by Google DeepMind. For details, refer to the official documentation
259
+ and license terms published by Google.
260
+
261
+ - **Training Frameworks:**
262
+ - Unsloth (training optimizations, LoRA, 4-bit loading)
263
+ - Hugging Face TRL (SFTTrainer)
264
+ - Hugging Face Transformers & Datasets
265
+
266
+ All these tools are used under their respective open-source licenses.
267
+
268
+ **Important:**
269
+ This fine-tuned model is provided *as-is* with no additional warranties. Users
270
+ are responsible for ensuring compliance with applicable licenses and usage
271
+ restrictions when deploying or redistributing the model.
272
+
273
+ For complete details, please consult:
274
+
275
+ - Google Gemma License
276
+ - Unsloth Documentation & License
277
+ - Hugging Face Transformers License
278
+
279
+ ## Intended Use
280
+
281
+ This model is intended for generating concise academic titles from research
282
+ abstracts. It is **not** designed for general conversation, creative writing,
283
+ or factual verification.
284
+
285
+ ## Safety
286
+
287
+ The model may reflect biases present in academic text sources. Outputs should
288
+ be reviewed by humans before publication.
289
+
adapter_config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": {
4
+ "base_model_class": "Gemma3ForCausalLM",
5
+ "parent_library": "transformers.models.gemma3.modeling_gemma3",
6
+ "unsloth_fixed": true
7
+ },
8
+ "base_model_name_or_path": "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
9
+ "bias": "none",
10
+ "corda_config": null,
11
+ "eva_config": null,
12
+ "exclude_modules": null,
13
+ "fan_in_fan_out": false,
14
+ "inference_mode": true,
15
+ "init_lora_weights": true,
16
+ "layer_replication": null,
17
+ "layers_pattern": null,
18
+ "layers_to_transform": null,
19
+ "loftq_config": {},
20
+ "lora_alpha": 16,
21
+ "lora_bias": false,
22
+ "lora_dropout": 0,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": null,
26
+ "peft_type": "LORA",
27
+ "qalora_group_size": 16,
28
+ "r": 16,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": "(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj).*?)|(?:\\bmodel\\.layers\\.[\\d]{1,}\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\.(?:(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)))",
32
+ "target_parameters": null,
33
+ "task_type": "CAUSAL_LM",
34
+ "trainable_token_indices": null,
35
+ "use_dora": false,
36
+ "use_qalora": false,
37
+ "use_rslora": false
38
+ }
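
The values in this adapter configuration can be read with a few lines of arithmetic. The sketch below computes the LoRA scaling factor from `r`, `lora_alpha`, and `use_rslora` as they appear above, using the standard conventions (`alpha / r`, or `alpha / sqrt(r)` for rank-stabilized LoRA), and counts the parameters a rank-`r` adapter adds to one linear layer. The helper names are hypothetical, and the layer dimensions in the example call are illustrative, not Gemma-3's.

```python
import json
import math

# Subset of the fields shown in adapter_config.json above.
adapter_config = json.loads(
    '{"r": 16, "lora_alpha": 16, "lora_dropout": 0, "use_rslora": false}'
)


def lora_scaling(config: dict) -> float:
    """Scaling applied to the low-rank update B @ A."""
    rank = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)  # rank-stabilized LoRA convention
    return alpha / rank  # standard LoRA convention


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank) for one linear layer."""
    return rank * (d_in + d_out)


print("scaling:", lora_scaling(adapter_config))
print("params for a 1024x1024 layer:", lora_param_count(1024, 1024, adapter_config["r"]))
```

With `r` equal to `lora_alpha`, the low-rank update is added to the frozen weights without rescaling.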
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:22ec271e5e1a0942d81e43f4e7a960909d144fb209154fdbb87c70bcdc36a53f
3
+ size 52231312
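
The three lines above are a Git LFS pointer, not the weights themselves: each line is a key followed by a space and a value. A minimal parsing sketch, with `parse_lfs_pointer` as a hypothetical helper name:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:22ec271e5e1a0942d81e43f4e7a960909d144fb209154fdbb87c70bcdc36a53f
size 52231312
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"] / 1e6, "MB")
```

The `size` field gives the byte length of the real file, here roughly 52 MB of adapter weights.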
added_tokens.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "<image_soft_token>": 262144
3
+ }
chat_template.jinja ADDED
@@ -0,0 +1,47 @@
1
+ {{ bos_token }}
2
+ {%- if messages[0]['role'] == 'system' -%}
3
+ {%- if messages[0]['content'] is string -%}
4
+ {%- set first_user_prefix = messages[0]['content'] + '
5
+
6
+ ' -%}
7
+ {%- else -%}
8
+ {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
9
+
10
+ ' -%}
11
+ {%- endif -%}
12
+ {%- set loop_messages = messages[1:] -%}
13
+ {%- else -%}
14
+ {%- set first_user_prefix = "" -%}
15
+ {%- set loop_messages = messages -%}
16
+ {%- endif -%}
17
+ {%- for message in loop_messages -%}
18
+ {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
19
+ {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
20
+ {%- endif -%}
21
+ {%- if (message['role'] == 'assistant') -%}
22
+ {%- set role = "model" -%}
23
+ {%- else -%}
24
+ {%- set role = message['role'] -%}
25
+ {%- endif -%}
26
+ {{ '<start_of_turn>' + role + '
27
+ ' + (first_user_prefix if loop.first else "") }}
28
+ {%- if message['content'] is string -%}
29
+ {{ message['content'] | trim }}
30
+ {%- elif message['content'] is iterable -%}
31
+ {%- for item in message['content'] -%}
32
+ {%- if item['type'] == 'image' -%}
33
+ {{ '<start_of_image>' }}
34
+ {%- elif item['type'] == 'text' -%}
35
+ {{ item['text'] | trim }}
36
+ {%- endif -%}
37
+ {%- endfor -%}
38
+ {%- else -%}
39
+ {{ raise_exception("Invalid content type") }}
40
+ {%- endif -%}
41
+ {{ '<end_of_turn>
42
+ ' }}
43
+ {%- endfor -%}
44
+ {%- if add_generation_prompt -%}
45
+ {{ '<start_of_turn>model
46
+ ' }}
47
+ {%- endif -%}
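
For readers less familiar with Jinja, the template above can be approximated in plain Python. This is a sketch that covers only string content (the image branch is omitted), and `render_gemma3_chat` is a hypothetical name; it is meant to show the resulting string layout, not to replace `tokenizer.apply_chat_template()`.

```python
from typing import Dict, List


def render_gemma3_chat(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = False,
    bos_token: str = "<bos>",
) -> str:
    """Approximate the chat template for messages whose content is a string."""
    first_user_prefix = ""
    if messages and messages[0]["role"] == "system":
        # The system message is folded into the first user turn.
        first_user_prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]

    parts = [bos_token]
    for index, message in enumerate(messages):
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append("<start_of_turn>" + role + "\n")
        if index == 0:
            parts.append(first_user_prefix)
        parts.append(message["content"].strip())  # mirrors the `trim` filter
        parts.append("<end_of_turn>\n")

    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


rendered = render_gemma3_chat(
    [
        {"role": "user", "content": "Generate a title for the following abstract:\nSome abstract. "},
        {"role": "assistant", "content": "A Title"},
    ]
)
print(rendered)
```

Note that the content is trimmed and `<end_of_turn>` follows it directly, with the newline coming after the marker.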
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:332eec556c7e5f1c4ec9a720eef60f6eafe555a76b82fcd1af9e1d10008a8993
3
+ size 27861739
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6657ccd5ba73eb2588fe6c69638f02621253e47f1271867fd3af0b8ff5c9b2a
3
+ size 14645
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a72d1abae8c55fdedc7f6e855fb3939aba7f6d9e09baa4306b2f5553739814c3
3
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "boi_token": "<start_of_image>",
3
+ "bos_token": {
4
+ "content": "<bos>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ "eoi_token": "<end_of_image>",
11
+ "eos_token": {
12
+ "content": "<end_of_turn>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "image_token": "<image_soft_token>",
19
+ "pad_token": {
20
+ "content": "<pad>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false
25
+ },
26
+ "unk_token": {
27
+ "content": "<unk>",
28
+ "lstrip": false,
29
+ "normalized": false,
30
+ "rstrip": false,
31
+ "single_word": false
32
+ }
33
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
3
+ size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
3
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:81ed17072d6b7a89e259fb73c1864f355ddf518c5e2491afe950701b9fff8f3e
3
+ size 6289