---
base_model: unsloth/gemma-3-1b-it
library_name: transformers
tags:
- gemma-3
- fine-tuning
- sft
- unsloth
- academic-title-generation
- lora
- 4bit
- chat-template
model_name: gemma3_1b_title_generator
---

<center>

# **Gemma 3 — 1B Academic Title Generator**

<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>

</center>

---

## Overview

**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts.

The training process adapts Gemma-3's chat-format behavior to highly focused title generation. Because of hardware limitations, the model was fine-tuned with a **multi-batch training pipeline** that leverages Unsloth's efficient 4-bit loading and LoRA adapters.

This results in a lightweight, fast, and domain-specialized model capable of producing concise, coherent, and academically accurate titles.

---

## Dataset & Preprocessing

Training data consists of scientific **abstract → title** pairs.  
Because of memory constraints, the dataset was processed in **sequential batches**, with each batch's updates merged into the model through incremental checkpoints. This incremental batch-training approach was made practical by **Unsloth’s lightweight fine-tuning tools**.

Each data sample was converted into a **Gemma-3 style chat conversation**, so that the model learns to produce the title as its response:

```python
def format_dataset_for_chat(example):
    messages = [
        {"role": "user",  "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]}
    ]

    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    ).removeprefix("<bos>")

    return example
```
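
The card does not name the underlying dataset. As an illustration only (the file name and slice size below are hypothetical), the formatter can be applied to each sequential batch with 🤗 Datasets' `map`:

```python
from datasets import load_dataset

# Hypothetical source: any dataset exposing "abstract" and "title" columns works.
raw = load_dataset("csv", data_files="abstract_title_pairs.csv", split="train")

# Process one sequential slice ("batch") at a time to stay within memory limits.
batch = raw.select(range(0, 10_000))
batch = batch.map(format_dataset_for_chat)

print(batch[0]["text"][:200])  # formatted Gemma-3 chat string
```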

## Chat Format

Gemma-3 uses a structured multi-turn dialog format.  
Each training example is converted into a conversation where:

- The **user** provides the abstract.
- The **model** outputs the title.

The structure follows the Gemma-3 chat template:

```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```

This formatting is produced automatically by Unsloth’s `tokenizer.apply_chat_template()`, as shown in the preprocessing function above.

## Training Configuration

Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s
efficient 4-bit loading and LoRA adaptation layers. The training process followed
a multi-batch strategy due to hardware limitations, with incremental checkpoint
loading supported by Unsloth.

### Key Training Settings

- Model: unsloth/gemma-3-1b-it  
- Precision: 4-bit (QLoRA)  
- Method: Supervised Fine-Tuning (SFT)  
- LoRA: Enabled for attention and MLP modules  
- Sequence length: 2048 tokens  
- Optimizer: AdamW (8-bit)  
- Scheduler: cosine  
- Strategy: multi-batch training with checkpoint continuation  
- Tokenizer: Gemma-3 chat template applied through Unsloth  
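
The exact training script is not included in this card. As a rough sketch only, assuming the standard Unsloth + TRL APIs used in Unsloth's published notebooks (all hyperparameter values below are illustrative, not the ones actually used), the settings above would typically be wired together like this:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit (QLoRA) and apply the Gemma-3 chat template.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)
tokenizer = get_chat_template(tokenizer, chat_template="gemma-3")

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # illustrative LoRA rank
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=batch,  # a formatted dataset slice with a "text" column
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        lr_scheduler_type="cosine",
        max_steps=100,    # illustrative; one slice of the multi-batch run
        output_dir="outputs",
    ),
)

# Later slices resume from the checkpoint written by the previous run.
trainer.train(resume_from_checkpoint=False)
```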

### Response-Only Learning

To ensure the model learns **only the title** (the model output) and does not 
memorize the user prompt (the abstract), response-only loss masking was applied:

```python
from unsloth.chat_templates import train_on_responses_only  # Unsloth helper that masks non-response tokens

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",   # User turn with the abstract
    response_part    = "<start_of_turn>model\n",  # Model turn with the generated title
)
```

This enforces that gradients flow exclusively through the model's output portion
of the chat sequence, improving instruction-following consistency and ensuring
that the LoRA adapters specialize in generating high-quality academic titles
instead of learning or reproducing the user prompt.
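
A quick sanity check, following the pattern in Unsloth's example notebooks, is to decode one training sample and confirm that everything except the model's turn is masked out of the labels:

```python
# Decode one sample to verify the response-only mask.
sample = trainer.train_dataset[0]

print(tokenizer.decode(sample["input_ids"]))  # full chat sequence (abstract + title)

# Tokens with label -100 are ignored by the loss; replace them to visualize the mask.
unmasked = [tok if lab != -100 else tokenizer.pad_token_id
            for tok, lab in zip(sample["input_ids"], sample["labels"])]
print(tokenizer.decode(unmasked))  # should show only the title turn (rest is padding)
```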

### Training Behavior

- LoRA significantly reduces VRAM usage while maintaining strong output quality.  
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and
  checkpoint handling.  
- Multi-batch training allows large datasets to be processed even with limited
  hardware resources.  
- Validation steps are used to monitor loss and adjust training dynamics.  

## 🚀 Quick Usage Example

Before running inference, make sure all required libraries are installed:

```bash
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
```

Below is a clean and ready-to-run example demonstrating how to generate an
academic title using the Gemma-3 chat template:

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="beta3/gemma3_1b_title_generator",  
    dtype=torch.bfloat16  
)

# Example abstract for title generation
abstract = """
Transformer-based architectures have demonstrated strong performance in tasks
involving reasoning, scientific understanding, and text generation. Producing
concise academic titles from long abstracts, however, remains a non-trivial task.
"""

# Construct the Gemma-3 chat-format prompt manually
chat_template_prompt = (
    "<bos>"
    "<start_of_turn>user\n"
    "Generate a simple title for the following abstract:\n"
    f"{abstract}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Generate the title
result = pipe(
    chat_template_prompt,
    max_new_tokens=32,   # Number of tokens to generate
    do_sample=True,      # Enables sampling for more creative outputs
    temperature=0.7,     # Controls generation randomness
    top_p=0.9,           # Nucleus sampling
    return_full_text=False
)[0]["generated_text"]

print("Generated title:", result)
```

This example reproduces the Gemma-3 chat format used during training and produces clean,
publication-ready academic titles.
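
If you prefer not to write the chat markers by hand, the same prompt can be built from a messages list with the tokenizer's chat template (a sketch assuming the fine-tuned tokenizer ships the standard Gemma-3 template):

```python
# Build the prompt with apply_chat_template instead of hand-written chat markers.
messages = [
    {"role": "user",
     "content": "Generate a simple title for the following abstract:\n" + abstract},
]

prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,   # appends the "<start_of_turn>model" turn
)

result = pipe(
    prompt,
    max_new_tokens=32,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)[0]["generated_text"]

print("Generated title:", result)
```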

## Capabilities & Limitations

### Capabilities

- Generates concise, publication-ready academic titles from scientific abstracts.  
- Learns to identify the core idea of long, complex abstracts.  
- Follows structured, instruction-based prompts using the Gemma-3 chat format.  
- Efficient inference thanks to 4-bit quantization and LoRA adaptation.  
- Performs reliably across a wide variety of scientific domains.

### Limitations

- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.  
- The model does not verify factual accuracy or scientific correctness.  
- Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology.  
- This model is only **1B parameters**, significantly smaller than larger Gemma or Llama variants, which means it may not always capture deep semantic details or produce titles as accurate as bigger models.  
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.

## Credits

This project was made possible thanks to several key open-source tools,
frameworks, and community contributors:

- **Unsloth** — for enabling efficient 4-bit training, LoRA integration,
  memory-optimized model loading, and the Gemma-3 chat template utilities.
  Their tooling was essential for making multi-batch fine-tuning feasible
  under limited hardware conditions.

- **Hugging Face TRL** — for providing the SFTTrainer and the
  response-only training workflow, allowing the model to focus exclusively
  on generating high-quality titles.

- **Google DeepMind** — for releasing the Gemma-3 family of models,
  offering a powerful instruction-tuned foundation suitable for scientific
  summarization and academic tasks.

- **Hugging Face Transformers / Datasets** — for model loading,
  tokenization pipelines, and large-scale dataset management.

- **Google Colab** — for generously providing free access to high-performance
  GPUs to the community. Their platform makes it possible for independent
  researchers, students, and developers to experiment with advanced
  large-language-model training workflows without requiring specialized
  hardware.

Special appreciation goes to the broader open-source community for maintaining
the tools, documentation, and shared knowledge that make projects like this
possible.

## License

This model follows the licensing terms of its upstream foundation models and
tooling:

- **Base Model License:** Inherits the license of  
  `unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3*
  licensing terms.

- **Gemma 3 License:** Usage must comply with the Gemma family license
  provided by Google DeepMind. For details, refer to the official documentation
  and license terms published by Google.

- **Training Frameworks:**  
  - Unsloth (training optimizations, LoRA, 4-bit loading)  
  - Hugging Face TRL (SFTTrainer)  
  - Hugging Face Transformers & Datasets  

All these tools are used under their respective open-source licenses.

**Important:**  
This fine-tuned model is provided *as-is* with no additional warranties. Users
are responsible for ensuring compliance with applicable licenses and usage
restrictions when deploying or redistributing the model.

For complete details, please consult:

- Google Gemma License  
- Unsloth Documentation & License  
- Hugging Face Transformers License  

## Intended Use

This model is intended for generating concise academic titles from research
abstracts. It is **not** designed for general conversation, creative writing,
or factual verification.

## Safety

The model may reflect biases present in academic text sources. Outputs should
be reviewed by humans before publication.