---
base_model: unsloth/gemma-3-1b-it
library_name: transformers
tags:
- gemma-3
- fine-tuning
- sft
- unsloth
- academic-title-generation
- lora
- 4bit
- chat-template
model_name: gemma3_1b_title_generator
---
<div align="center">

# **Gemma 3 — 1B Academic Title Generator**

<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>

</div>
---
## Overview
**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts.
The training process adapts Gemma-3's chat-format behavior to perform highly focused title generation. The model was fine-tuned using a **multi-batch training pipeline** due to hardware limitations, leveraging Unsloth’s efficient 4-bit loading and LoRA adapters.
This results in a lightweight, fast, and domain-specialized model capable of producing concise, coherent, and academically accurate titles.
---
## Dataset & Preprocessing
Training data consists of scientific **abstract → title** pairs.
Because of memory constraints, the dataset was processed in **sequential batches**, each integrated into the model through incremental checkpoints. This incremental batch-training approach was made practical by **Unsloth's lightweight fine-tuning tools**.
Each data sample was converted into a **Gemma-3 style chat conversation**, allowing the model to learn the title as the model's response:
```python
def format_dataset_for_chat(example):
    messages = [
        {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]},
    ]
    # Render the conversation with the Gemma-3 chat template; the leading
    # <bos> is stripped because the tokenizer adds it again during training
    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False,
    ).removeprefix("<bos>")
    return example
```
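As a quick illustration, here is how this function can be applied over a Hugging Face dataset (the toy rows below are invented for demonstration; the function assumes the `tokenizer` from the model-loading step is in scope):

```python
from datasets import Dataset

# Toy rows for illustration; the real training set uses scientific papers
dataset = Dataset.from_dict({
    "abstract": ["Transformer models have advanced automatic summarization ..."],
    "title": ["Advances in Transformer-Based Summarization"],
})

# Build the "text" field later consumed by the SFT trainer
dataset = dataset.map(format_dataset_for_chat)
print(dataset[0]["text"])
```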
## Chat Format
Gemma-3 uses a structured multi-turn dialog format.
Each training example is converted into a conversation where:
- The **user** provides the abstract.
- The **model** outputs the title.
The structure follows the Gemma-3 chat template:
```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```
This formatting is produced automatically by the tokenizer's
`apply_chat_template()` method.
The `format_dataset_for_chat` function shown in **Dataset & Preprocessing** above produces exactly this structure for every training example.
## Training Configuration
Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s
efficient 4-bit loading and LoRA adaptation layers. The training process followed
a multi-batch strategy due to hardware limitations, with incremental checkpoint
loading supported by Unsloth.
### Key Training Settings
- Model: unsloth/gemma-3-1b-it
- Precision: 4-bit (QLoRA)
- Method: Supervised Fine-Tuning (SFT)
- LoRA: Enabled for attention and MLP modules
- Sequence length: 2048 tokens
- Optimizer: AdamW (8-bit)
- Scheduler: cosine
- Strategy: multi-batch training with checkpoint continuation
- Tokenizer: Gemma-3 chat template applied through Unsloth
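A minimal sketch of how these settings map onto Unsloth and TRL is shown below; the LoRA rank, batch sizes, and other hyperparameter values are illustrative assumptions, not the exact configuration used:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the 4-bit base model with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP modules
model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # LoRA rank (assumed)
    lora_alpha=16,      # LoRA scaling factor (assumed)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,               # the "text"-formatted dataset from above
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,   # illustrative values
        gradient_accumulation_steps=4,
        optim="adamw_8bit",
        lr_scheduler_type="cosine",
        output_dir="outputs",
    ),
)
```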
### Response-Only Learning
To ensure the model learns **only the title** (the model output) and does not
memorize the user prompt (the abstract), response-only loss masking was applied:
```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",  # User turn with the abstract
    response_part="<start_of_turn>model\n",    # Model turn with the generated title
)
```
This enforces that gradients flow exclusively through the model's output portion
of the chat sequence, improving instruction-following consistency and ensuring
that the LoRA adapters specialize in generating high-quality academic titles
instead of learning or reproducing the user prompt.
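A quick way to sanity-check the masking, following the pattern used in Unsloth's notebooks (a sketch; it assumes the trainer has already tokenized the dataset):

```python
# Labels of -100 are ignored by the cross-entropy loss
labels = trainer.train_dataset[0]["labels"]
kept = [tok_id for tok_id in labels if tok_id != -100]
print(tokenizer.decode(kept))  # Should print only the model turn (the title)
```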
### Training Behavior
- LoRA significantly reduces VRAM usage while maintaining strong output quality.
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and
checkpoint handling.
- Multi-batch training allows large datasets to be processed even with limited
hardware resources.
- Validation steps are used to monitor loss and adjust training dynamics.
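The checkpoint-continuation pattern behind the multi-batch strategy can be sketched as follows (the save path and shard split are illustrative assumptions):

```python
# Session 1: train on the current data shard, then save the LoRA adapter
trainer.train()
model.save_pretrained("gemma3-title-batch-1")      # hypothetical path
tokenizer.save_pretrained("gemma3-title-batch-1")

# Session 2: reload the 4-bit base plus the saved adapter and continue
# training on the next shard of abstract/title pairs
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="gemma3-title-batch-1",
    max_seq_length=2048,
    load_in_4bit=True,
)
```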
## 🚀 Quick Usage Example
Before running inference, make sure all required libraries are installed:
```bash
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
```
Below is a clean and ready-to-run example demonstrating how to generate an
academic title using the Gemma-3 chat template:
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="beta3/gemma3_1b_title_generator",
    torch_dtype=torch.bfloat16,
)

# Example abstract for title generation
abstract = """
Transformer-based architectures have demonstrated strong performance in tasks
involving reasoning, scientific understanding, and text generation. Producing
concise academic titles from long abstracts, however, remains a non-trivial task.
"""

# Construct the Gemma-3 chat-format prompt manually, using the same
# instruction the model saw during fine-tuning
chat_template_prompt = (
    "<bos>"
    "<start_of_turn>user\n"
    "Generate a title for the following abstract:\n"
    f"{abstract}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Generate the title
result = pipe(
    chat_template_prompt,
    max_new_tokens=32,        # Title length budget
    do_sample=True,           # Enables sampling for more varied outputs
    temperature=0.7,          # Controls generation randomness
    top_p=0.9,                # Nucleus sampling
    return_full_text=False,   # Return only the newly generated text
)[0]["generated_text"]

print("Generated title:", result.strip())
```
This example mirrors the chat format used during fine-tuning and typically
produces clean, publication-ready academic titles.
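Alternatively, the tokenizer can build the prompt for you via the standard `apply_chat_template()` API; in this sketch the leading `<bos>` is stripped because the pipeline's tokenizer adds it again:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("beta3/gemma3_1b_title_generator")

messages = [
    {"role": "user",
     "content": "Generate a title for the following abstract:\n" + abstract},
]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,   # appends the model-turn header
).removeprefix("<bos>")

result = pipe(prompt, max_new_tokens=32, return_full_text=False)[0]["generated_text"]
print("Generated title:", result.strip())
```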
## Capabilities & Limitations
### Capabilities
- Generates concise, publication-ready academic titles from scientific abstracts.
- Learns to identify the core idea of long, complex abstracts.
- Follows structured, instruction-based prompts using the Gemma-3 chat format.
- Efficient inference thanks to 4-bit quantization and LoRA adaptation.
- Performs reliably across a wide variety of scientific domains.
### Limitations
- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.
- The model does not verify factual accuracy or scientific correctness.
- Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology.
- This model is only **1B parameters**, significantly smaller than larger Gemma or Llama variants, which means it may not always capture deep semantic details or produce titles as accurate as bigger models.
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.
## Credits
This project was made possible thanks to several key open-source tools,
frameworks, and community contributors:
- **Unsloth** — for enabling efficient 4-bit training, LoRA integration,
memory-optimized model loading, and the Gemma-3 chat template utilities.
Their tooling was essential for making multi-batch fine-tuning feasible
under limited hardware conditions.
- **Hugging Face TRL** — for providing the SFTTrainer and the
response-only training workflow, allowing the model to focus exclusively
on generating high-quality titles.
- **Google DeepMind** — for releasing the Gemma-3 family of models,
offering a powerful instruction-tuned foundation suitable for scientific
summarization and academic tasks.
- **Hugging Face Transformers / Datasets** — for model loading,
tokenization pipelines, and large-scale dataset management.
- **Google Colab** — for generously providing free access to high-performance
GPUs to the community. Their platform makes it possible for independent
researchers, students, and developers to experiment with advanced
large-language-model training workflows without requiring specialized
hardware.
Special appreciation goes to the broader open-source community for maintaining
the tools, documentation, and shared knowledge that make projects like this
possible.
## License
This model follows the licensing terms of its upstream foundation models and
tooling:
- **Base Model License:** Inherits the license of
`unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3*
licensing terms.
- **Gemma 3 License:** Usage must comply with the Gemma family license
provided by Google DeepMind. For details, refer to the official documentation
and license terms published by Google.
- **Training Frameworks:**
- Unsloth (training optimizations, LoRA, 4-bit loading)
- Hugging Face TRL (SFTTrainer)
- Hugging Face Transformers & Datasets
All these tools are used under their respective open-source licenses.
**Important:**
This fine-tuned model is provided *as-is* with no additional warranties. Users
are responsible for ensuring compliance with applicable licenses and usage
restrictions when deploying or redistributing the model.
For complete details, please consult:
- Google Gemma License
- Unsloth Documentation & License
- Hugging Face Transformers License
## Intended Use
This model is intended for generating concise academic titles from research
abstracts. It is **not** designed for general conversation, creative writing,
or factual verification.
## Safety
The model may reflect biases present in academic text sources. Outputs should
be reviewed by humans before publication.