|
|
--- |
|
|
base_model: unsloth/gemma-3-1b-it |
|
|
library_name: transformers |
|
|
tags: |
|
|
- gemma-3 |
|
|
- fine-tuning |
|
|
- sft |
|
|
- unsloth |
|
|
- academic-title-generation |
|
|
- lora |
|
|
- 4bit |
|
|
- chat-template |
|
|
model_name: gemma3_1b_title_generator |
|
|
--- |
|
|
|
|
|
<center> |
|
|
|
|
|
# **Gemma 3 — 1B Academic Title Generator** |
|
|
|
|
|
<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/> |
|
|
|
|
|
</center> |
|
|
|
|
|
--- |
|
|
|
|
|
## Overview |
|
|
|
|
|
**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts. |
|
|
|
|
|
The training process adapts Gemma-3's chat-format behavior to perform highly focused title generation. The model was fine-tuned using a **multi-batch training pipeline** due to hardware limitations, leveraging Unsloth’s efficient 4-bit loading and LoRA adapters. |
|
|
|
|
|
The result is a lightweight, fast, domain-specialized model that produces concise, coherent titles in an academic register.
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset & Preprocessing |
|
|
|
|
|
Training data consists of scientific **abstract → title** pairs. |
|
|
Because of memory constraints, the dataset was processed in **sequential batches**, each folded into the model through incremental checkpoints. This incremental batch-training approach was made practical by **Unsloth’s lightweight fine-tuning tools**.
|
|
|
|
|
Each data sample was converted into a **Gemma-3 style chat conversation**, allowing the model to learn the title as the model's response: |
|
|
|
|
|
```python |
|
|
def format_dataset_for_chat(example):
    # Build a single-turn conversation: the abstract goes in the user turn,
    # the title becomes the model's response.
    messages = [
        {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]}
    ]

    # Render with the Gemma-3 chat template; strip the leading <bos> so it
    # is not duplicated when the trainer tokenizes the text.
    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    ).removeprefix("<bos>")

    return example
|
|
``` |
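Applied over the full dataset with `datasets`' `map`, this might look like the following sketch (the CSV source is an illustrative placeholder; any dataset exposing `abstract` and `title` columns works):

```python
from datasets import load_dataset

# Illustrative source: replace with your own abstract/title pairs.
dataset = load_dataset("csv", data_files="abstract_title_pairs.csv", split="train")
dataset = dataset.map(format_dataset_for_chat)

print(dataset[0]["text"][:200])  # inspect one formatted training example
```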
|
|
|
|
|
## Chat Format |
|
|
|
|
|
Gemma-3 uses a structured multi-turn dialog format. |
|
|
Each training example is converted into a conversation where: |
|
|
|
|
|
- The **user** provides the abstract. |
|
|
- The **model** outputs the title. |
|
|
|
|
|
The structure follows the Gemma-3 chat template: |
|
|
|
|
|
```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```
|
|
|
|
|
This formatting is produced automatically by the tokenizer's `apply_chat_template()` method, which Unsloth configures for Gemma-3.
|
|
|
|
|
The `format_dataset_for_chat` function shown in **Dataset & Preprocessing** above produces exactly this structure for every training example.
|
|
## Training Configuration |
|
|
|
|
|
Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s |
|
|
efficient 4-bit loading and LoRA adaptation layers. The training process followed |
|
|
a multi-batch strategy due to hardware limitations, with incremental checkpoint |
|
|
loading supported by Unsloth. |
|
|
|
|
|
### Key Training Settings |
|
|
|
|
|
- Model: unsloth/gemma-3-1b-it |
|
|
- Precision: 4-bit (QLoRA) |
|
|
- Method: Supervised Fine-Tuning (SFT) |
|
|
- LoRA: Enabled for attention and MLP modules |
|
|
- Sequence length: 2048 tokens |
|
|
- Optimizer: AdamW (8-bit) |
|
|
- Scheduler: cosine |
|
|
- Strategy: multi-batch training with checkpoint continuation |
|
|
- Tokenizer: Gemma-3 chat template applied through Unsloth |
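Putting these settings together, a minimal sketch of the training setup might look like the following. The LoRA rank, batch size, learning rate, and epoch count are illustrative assumptions, not the exact values used:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit; max_seq_length matches the setting above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                    # illustrative rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,                           # illustrative
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,                   # formatted with format_dataset_for_chat
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,       # illustrative
        gradient_accumulation_steps=4,       # illustrative
        learning_rate=2e-4,                  # illustrative
        optim="adamw_8bit",
        lr_scheduler_type="cosine",
        num_train_epochs=1,                  # illustrative
    ),
)
# Response-only masking is applied next (see below), then trainer.train().
```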
|
|
|
|
|
### Response-Only Learning |
|
|
|
|
|
To ensure the model learns **only the title** (the model output) and does not |
|
|
memorize the user prompt (the abstract), response-only loss masking was applied: |
|
|
|
|
|
```python |
|
|
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
|
|
trainer, |
|
|
instruction_part = "<start_of_turn>user\n", # User turn with the abstract |
|
|
response_part = "<start_of_turn>model\n", # Model turn with the generated title |
|
|
) |
|
|
``` |
|
|
|
|
|
This ensures that gradients flow only through the model's output portion of the
chat sequence, improving instruction-following consistency and letting the LoRA
adapters specialize in generating high-quality academic titles rather than
reproducing the user prompt.
|
|
|
|
|
### Training Behavior |
|
|
|
|
|
- LoRA significantly reduces VRAM usage while maintaining strong output quality. |
|
|
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and |
|
|
checkpoint handling. |
|
|
- Multi-batch training allows large datasets to be processed even with limited |
|
|
hardware resources. |
|
|
- Validation steps are used to monitor loss and adjust training dynamics. |
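A sketch of how the checkpoint continuation between data batches might work is below; the checkpoint path and the exact resume flow are illustrative assumptions rather than the verbatim training script, and `model`/`tokenizer` are the objects from the setup sketch above:

```python
from unsloth import FastLanguageModel

# After finishing one data batch, save the LoRA adapters and tokenizer...
model.save_pretrained("ckpt_after_batch_01")       # illustrative path
tokenizer.save_pretrained("ckpt_after_batch_01")

# ...then, in the next run, reload them on top of the 4-bit base model
# and continue fine-tuning on the next batch of data.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ckpt_after_batch_01",
    max_seq_length=2048,
    load_in_4bit=True,
)
```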
|
|
|
|
|
## 🚀 Quick Usage Example |
|
|
|
|
|
Before running inference, make sure all required libraries are installed: |
|
|
|
|
|
```bash |
|
|
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
|
|
``` |
|
|
|
|
|
Below is a clean and ready-to-run example demonstrating how to generate an |
|
|
academic title using the Gemma-3 chat template: |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
import torch |
|
|
|
|
|
pipe = pipeline( |
|
|
"text-generation", |
|
|
model="beta3/gemma3_1b_title_generator", |
|
|
dtype=torch.bfloat16 |
|
|
) |
|
|
|
|
|
# Example abstract for title generation |
|
|
abstract = """ |
|
|
Transformer-based architectures have demonstrated strong performance in tasks |
|
|
involving reasoning, scientific understanding, and text generation. Producing |
|
|
concise academic titles from long abstracts, however, remains a non-trivial task. |
|
|
""" |
|
|
|
|
|
# Construct the Gemma-3 chat-format prompt manually.
# The tokenizer adds <bos> automatically when encoding, so it is omitted
# here, mirroring the .removeprefix("<bos>") step used during training.
chat_template_prompt = (
|
|
"<start_of_turn>user\n" |
|
|
"Generate a simple title for the following abstract:\n" |
|
|
f"{abstract}\n" |
|
|
"<end_of_turn>\n" |
|
|
"<start_of_turn>model\n" |
|
|
) |
|
|
|
|
|
# Generate the title |
|
|
result = pipe( |
|
|
chat_template_prompt, |
|
|
max_new_tokens=32, # Number of tokens to generate |
|
|
do_sample=True, # Enables sampling for more creative outputs |
|
|
temperature=0.7, # Controls generation randomness |
|
|
top_p=0.9, # Nucleus sampling |
|
|
return_full_text=False |
|
|
)[0]["generated_text"] |
|
|
|
|
|
print("Generated title:", result) |
|
|
``` |
|
|
|
|
|
This example follows the same Gemma-3 chat format used during fine-tuning, so
the model receives prompts in exactly the structure it was trained on.
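As an alternative to writing the control tokens by hand, the prompt can be built with the tokenizer's own chat template; a sketch under the same sampling settings:

```python
# Build the same prompt via the chat template instead of manual strings.
messages = [
    {"role": "user",
     "content": "Generate a title for the following abstract:\n" + abstract},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,   # appends "<start_of_turn>model\n"
).removeprefix("<bos>")          # the tokenizer re-adds <bos> when encoding

result = pipe(
    prompt,
    max_new_tokens=32,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)[0]["generated_text"]
print("Generated title:", result)
```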
|
|
|
|
|
## Capabilities & Limitations |
|
|
|
|
|
### Capabilities |
|
|
|
|
|
- Generates concise, publication-ready academic titles from scientific abstracts. |
|
|
- Learns to identify the core idea of long, complex abstracts. |
|
|
- Follows structured, instruction-based prompts using the Gemma-3 chat format. |
|
|
- Efficient inference thanks to 4-bit quantization and LoRA adaptation. |
|
|
- Handles abstracts from a broad range of scientific domains, within the coverage of its training data.
|
|
|
|
|
### Limitations |
|
|
|
|
|
- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles. |
|
|
- The model does not verify factual accuracy or scientific correctness. |
|
|
- Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology. |
|
|
- This model is only **1B parameters**, significantly smaller than larger Gemma or Llama variants, which means it may not always capture deep semantic details or produce titles as accurate as bigger models. |
|
|
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks. |
|
|
|
|
|
## Credits |
|
|
|
|
|
This project was made possible thanks to several key open-source tools, |
|
|
frameworks, and community contributors: |
|
|
|
|
|
- **Unsloth** — for enabling efficient 4-bit training, LoRA integration, |
|
|
memory-optimized model loading, and the Gemma-3 chat template utilities. |
|
|
Their tooling was essential for making multi-batch fine-tuning feasible |
|
|
under limited hardware conditions. |
|
|
|
|
|
- **Hugging Face TRL** — for providing the SFTTrainer and the |
|
|
response-only training workflow, allowing the model to focus exclusively |
|
|
on generating high-quality titles. |
|
|
|
|
|
- **Google DeepMind** — for releasing the Gemma-3 family of models, |
|
|
offering a powerful instruction-tuned foundation suitable for scientific |
|
|
summarization and academic tasks. |
|
|
|
|
|
- **Hugging Face Transformers / Datasets** — for model loading, |
|
|
tokenization pipelines, and large-scale dataset management. |
|
|
|
|
|
- **Google Colab** — for generously providing free access to high-performance |
|
|
GPUs to the community. Their platform makes it possible for independent |
|
|
researchers, students, and developers to experiment with advanced |
|
|
large-language-model training workflows without requiring specialized |
|
|
hardware. |
|
|
|
|
|
Special appreciation goes to the broader open-source community for maintaining |
|
|
the tools, documentation, and shared knowledge that make projects like this |
|
|
possible. |
|
|
|
|
|
## License |
|
|
|
|
|
This model follows the licensing terms of its upstream foundation models and |
|
|
tooling: |
|
|
|
|
|
- **Base Model License:** Inherits the license of |
|
|
`unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3* |
|
|
licensing terms. |
|
|
|
|
|
- **Gemma 3 License:** Usage must comply with the Gemma family license |
|
|
provided by Google DeepMind. For details, refer to the official documentation |
|
|
and license terms published by Google. |
|
|
|
|
|
- **Training Frameworks:** |
|
|
- Unsloth (training optimizations, LoRA, 4-bit loading) |
|
|
- Hugging Face TRL (SFTTrainer) |
|
|
- Hugging Face Transformers & Datasets |
|
|
|
|
|
All these tools are used under their respective open-source licenses. |
|
|
|
|
|
**Important:** |
|
|
This fine-tuned model is provided *as-is* with no additional warranties. Users |
|
|
are responsible for ensuring compliance with applicable licenses and usage |
|
|
restrictions when deploying or redistributing the model. |
|
|
|
|
|
For complete details, please consult: |
|
|
|
|
|
- Google Gemma License |
|
|
- Unsloth Documentation & License |
|
|
- Hugging Face Transformers License |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is intended for generating concise academic titles from research |
|
|
abstracts. It is **not** designed for general conversation, creative writing, |
|
|
or factual verification. |
|
|
|
|
|
## Safety |
|
|
|
|
|
The model may reflect biases present in academic text sources. Outputs should |
|
|
be reviewed by humans before publication. |
|
|
|
|
|
|