---
base_model: unsloth/gemma-3-1b-it
library_name: transformers
tags:
- gemma-3
- fine-tuning
- sft
- unsloth
- academic-title-generation
- lora
- 4bit
- chat-template
model_name: gemma3_1b_title_generator
---
<div align="center">

# **Gemma 3 — 1B Academic Title Generator**

<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>

</div>
---
## Overview
**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts.
The training process adapts Gemma-3's chat-format behavior to perform highly focused title generation. The model was fine-tuned using a **multi-batch training pipeline** due to hardware limitations, leveraging Unsloth’s efficient 4-bit loading and LoRA adapters.
This results in a lightweight, fast, and domain-specialized model capable of producing concise, coherent, and academically accurate titles.
---
## Dataset & Preprocessing
Training data consists of scientific **abstract → title** pairs.
Because of memory constraints, the dataset was processed in **sequential batches**, each integrated into the model through incremental checkpoints. This incremental batch-training approach was made practical by **Unsloth's lightweight fine-tuning tools**.
Each data sample was converted into a **Gemma-3 style chat conversation**, allowing the model to learn the title as the model's response:
```python
def format_dataset_for_chat(example):
    messages = [
        {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]},
    ]
    # Render the conversation with the Gemma-3 chat template; the leading
    # <bos> is stripped because the tokenizer adds it again during training
    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False,
    ).removeprefix("<bos>")
    return example
```
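As a quick illustration, here is how this function can be applied over a Hugging Face dataset (the toy rows below are invented for demonstration; the function assumes the `tokenizer` from the model-loading step is in scope):

```python
from datasets import Dataset

# Toy rows for illustration; the real training set uses scientific papers
dataset = Dataset.from_dict({
    "abstract": ["Transformer models have advanced automatic summarization ..."],
    "title": ["Advances in Transformer-Based Summarization"],
})

# Build the "text" field later consumed by the SFT trainer
dataset = dataset.map(format_dataset_for_chat)
print(dataset[0]["text"])
```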
## Chat Format
Gemma-3 uses a structured multi-turn dialog format.
Each training example is converted into a conversation where:
- The **user** provides the abstract.
- The **model** outputs the title.
The structure follows the Gemma-3 chat template:
```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```
This formatting is produced automatically by the tokenizer's
`apply_chat_template()` method.
The `format_dataset_for_chat` function shown in **Dataset & Preprocessing** above produces exactly this structure for every training example.
## Training Configuration
Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s
efficient 4-bit loading and LoRA adaptation layers. The training process followed
a multi-batch strategy due to hardware limitations, with incremental checkpoint
loading supported by Unsloth.
### Key Training Settings
- Model: unsloth/gemma-3-1b-it
- Precision: 4-bit (QLoRA)
- Method: Supervised Fine-Tuning (SFT)
- LoRA: Enabled for attention and MLP modules
- Sequence length: 2048 tokens
- Optimizer: AdamW (8-bit)
- Scheduler: cosine
- Strategy: multi-batch training with checkpoint continuation
- Tokenizer: Gemma-3 chat template applied through Unsloth
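A minimal sketch of how these settings map onto Unsloth and TRL is shown below; the LoRA rank, batch sizes, and other hyperparameter values are illustrative assumptions, not the exact configuration used:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the 4-bit base model with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP modules
model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # LoRA rank (assumed)
    lora_alpha=16,      # LoRA scaling factor (assumed)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,               # the "text"-formatted dataset from above
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,   # illustrative values
        gradient_accumulation_steps=4,
        optim="adamw_8bit",
        lr_scheduler_type="cosine",
        output_dir="outputs",
    ),
)
```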
### Response-Only Learning
To ensure the model learns **only the title** (the model output) and does not
memorize the user prompt (the abstract), response-only loss masking was applied:
```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",  # User turn with the abstract
    response_part="<start_of_turn>model\n",    # Model turn with the generated title
)
```
This enforces that gradients flow exclusively through the model's output portion
of the chat sequence, improving instruction-following consistency and ensuring
that the LoRA adapters specialize in generating high-quality academic titles
instead of learning or reproducing the user prompt.
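A quick way to sanity-check the masking, following the pattern used in Unsloth's notebooks (a sketch; it assumes the trainer has already tokenized the dataset):

```python
# Labels of -100 are ignored by the cross-entropy loss
labels = trainer.train_dataset[0]["labels"]
kept = [tok_id for tok_id in labels if tok_id != -100]
print(tokenizer.decode(kept))  # Should print only the model turn (the title)
```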
### Training Behavior
- LoRA significantly reduces VRAM usage while maintaining strong output quality.
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and
checkpoint handling.
- Multi-batch training allows large datasets to be processed even with limited
hardware resources.
- Validation steps are used to monitor loss and adjust training dynamics.
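The checkpoint-continuation pattern behind the multi-batch strategy can be sketched as follows (the save path and shard split are illustrative assumptions):

```python
# Session 1: train on the current data shard, then save the LoRA adapter
trainer.train()
model.save_pretrained("gemma3-title-batch-1")      # hypothetical path
tokenizer.save_pretrained("gemma3-title-batch-1")

# Session 2: reload the 4-bit base plus the saved adapter and continue
# training on the next shard of abstract/title pairs
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="gemma3-title-batch-1",
    max_seq_length=2048,
    load_in_4bit=True,
)
```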
## 🚀 Quick Usage Example
Before running inference, make sure all required libraries are installed:
```bash
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
```
Below is a clean and ready-to-run example demonstrating how to generate an
academic title using the Gemma-3 chat template:
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="beta3/gemma3_1b_title_generator",
    torch_dtype=torch.bfloat16,
)

# Example abstract for title generation
abstract = """
Transformer-based architectures have demonstrated strong performance in tasks
involving reasoning, scientific understanding, and text generation. Producing
concise academic titles from long abstracts, however, remains a non-trivial task.
"""

# Construct the Gemma-3 chat-format prompt manually, using the same
# instruction the model saw during fine-tuning
chat_template_prompt = (
    "<bos>"
    "<start_of_turn>user\n"
    "Generate a title for the following abstract:\n"
    f"{abstract}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Generate the title
result = pipe(
    chat_template_prompt,
    max_new_tokens=32,        # Title length budget
    do_sample=True,           # Enables sampling for more varied outputs
    temperature=0.7,          # Controls generation randomness
    top_p=0.9,                # Nucleus sampling
    return_full_text=False,   # Return only the newly generated text
)[0]["generated_text"]

print("Generated title:", result.strip())
```
This example mirrors the chat format used during fine-tuning and typically
produces clean, publication-ready academic titles.
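Alternatively, the tokenizer can build the prompt for you via the standard `apply_chat_template()` API; in this sketch the leading `<bos>` is stripped because the pipeline's tokenizer adds it again:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("beta3/gemma3_1b_title_generator")

messages = [
    {"role": "user",
     "content": "Generate a title for the following abstract:\n" + abstract},
]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,   # appends the model-turn header
).removeprefix("<bos>")

result = pipe(prompt, max_new_tokens=32, return_full_text=False)[0]["generated_text"]
print("Generated title:", result.strip())
```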
## Capabilities & Limitations
### Capabilities
- Generates concise, publication-ready academic titles from scientific abstracts.
- Learns to identify the core idea of long, complex abstracts.
- Follows structured, instruction-based prompts using the Gemma-3 chat format.
- Efficient inference thanks to 4-bit quantization and LoRA adaptation.
- Performs reliably across a wide variety of scientific domains.
### Limitations
- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.
- The model does not verify factual accuracy or scientific correctness.
- Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology.
- This model is only **1B parameters**, significantly smaller than larger Gemma or Llama variants, which means it may not always capture deep semantic details or produce titles as accurate as bigger models.
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.
## Credits
This project was made possible thanks to several key open-source tools,
frameworks, and community contributors:
- **Unsloth** — for enabling efficient 4-bit training, LoRA integration,
memory-optimized model loading, and the Gemma-3 chat template utilities.
Their tooling was essential for making multi-batch fine-tuning feasible
under limited hardware conditions.
- **Hugging Face TRL** — for providing the SFTTrainer and the
response-only training workflow, allowing the model to focus exclusively
on generating high-quality titles.
- **Google DeepMind** — for releasing the Gemma-3 family of models,
offering a powerful instruction-tuned foundation suitable for scientific
summarization and academic tasks.
- **Hugging Face Transformers / Datasets** — for model loading,
tokenization pipelines, and large-scale dataset management.
- **Google Colab** — for generously providing free access to high-performance
GPUs to the community. Their platform makes it possible for independent
researchers, students, and developers to experiment with advanced
large-language-model training workflows without requiring specialized
hardware.
Special appreciation goes to the broader open-source community for maintaining
the tools, documentation, and shared knowledge that make projects like this
possible.
## License
This model follows the licensing terms of its upstream foundation models and
tooling:
- **Base Model License:** Inherits the license of
`unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3*
licensing terms.
- **Gemma 3 License:** Usage must comply with the Gemma family license
provided by Google DeepMind. For details, refer to the official documentation
and license terms published by Google.
- **Training Frameworks:**
- Unsloth (training optimizations, LoRA, 4-bit loading)
- Hugging Face TRL (SFTTrainer)
- Hugging Face Transformers & Datasets
All these tools are used under their respective open-source licenses.
**Important:**
This fine-tuned model is provided *as-is* with no additional warranties. Users
are responsible for ensuring compliance with applicable licenses and usage
restrictions when deploying or redistributing the model.
For complete details, please consult:
- Google Gemma License
- Unsloth Documentation & License
- Hugging Face Transformers License
## Intended Use
This model is intended for generating concise academic titles from research
abstracts. It is **not** designed for general conversation, creative writing,
or factual verification.
## Safety
The model may reflect biases present in academic text sources. Outputs should
be reviewed by humans before publication.