---
base_model: unsloth/gemma-3-1b-it
library_name: transformers
tags:
- gemma-3
- fine-tuning
- sft
- unsloth
- academic-title-generation
- lora
- 4bit
- chat-template
model_name: gemma3_1b_title_generator
---
<center>
<h1><b>Gemma 3 — 1B Academic Title Generator</b></h1>
<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>
</center>
---
## Overview
**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, optimized specifically for generating **academic paper titles** from scientific abstracts.
Training adapts Gemma-3's chat-format behavior to a single focused task: generating titles. Because of hardware limitations, the model was fine-tuned with a **multi-batch training pipeline**, leveraging Unsloth’s efficient 4-bit loading and LoRA adapters.
The result is a lightweight, fast, and domain-specialized model that produces concise, coherent titles in an academic register.
---
## Dataset & Preprocessing
Training data consists of scientific **abstract → title** pairs.
Because of memory constraints, the dataset was processed in **sequential batches**, each integrated into the model through incremental checkpoints. This incremental batch-training approach was made possible by **Unsloth’s lightweight fine-tuning tools**.
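The card does not include the data-loading code; as a rough sketch, sequential batching with the `datasets` library could look like the following (the file name and shard count are illustrative assumptions, not the actual values used):

```python
from datasets import load_dataset

# Hypothetical file of abstract -> title pairs; the real data source is not listed.
dataset = load_dataset("json", data_files="abstract_title_pairs.json", split="train")

# Split into sequential shards so each training pass fits the available GPU memory.
num_shards = 4  # illustrative
shards = [dataset.shard(num_shards=num_shards, index=i) for i in range(num_shards)]
```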
Each data sample was converted into a **Gemma-3 style chat conversation**, allowing the model to learn the title as the model's response:
```python
def format_dataset_for_chat(example):
    # Wrap each abstract -> title pair in a Gemma-3 chat conversation.
    messages = [
        {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]}
    ]
    # Render with the Gemma-3 chat template; the leading <bos> is stripped
    # because the tokenizer adds it again at training time.
    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    ).removeprefix("<bos>")
    return example
```
## Chat Format
Gemma-3 uses a structured multi-turn dialog format.
Each training example is converted into a conversation where:
- The **user** provides the abstract.
- The **model** outputs the title.
The structure follows the Gemma-3 chat template:
```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```
This formatting is produced automatically by `tokenizer.apply_chat_template()`, as shown in the `format_dataset_for_chat` function above.
## Training Configuration
Fine-tuning was performed using the SFTTrainer from TRL, combined with Unsloth’s
efficient 4-bit loading and LoRA adaptation layers. The training process followed
a multi-batch strategy due to hardware limitations, with incremental checkpoint
loading supported by Unsloth.
### Key Training Settings
- Model: unsloth/gemma-3-1b-it
- Precision: 4-bit (QLoRA)
- Method: Supervised Fine-Tuning (SFT)
- LoRA: Enabled for attention and MLP modules
- Sequence length: 2048 tokens
- Optimizer: AdamW (8-bit)
- Scheduler: cosine
- Strategy: multi-batch training with checkpoint continuation
- Tokenizer: Gemma-3 chat template applied through Unsloth
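The card does not include the full training script; the sketch below shows how these settings typically combine with Unsloth and TRL. The LoRA rank, batch size, learning rate, and the `formatted_dataset` variable are illustrative assumptions, not values taken from the actual run:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit (QLoRA-style) with Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection modules.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # LoRA rank (illustrative)
    lora_alpha=16,  # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=formatted_dataset,  # output of format_dataset_for_chat
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,   # illustrative
        gradient_accumulation_steps=4,   # illustrative
        learning_rate=2e-4,              # illustrative
        lr_scheduler_type="cosine",
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
```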
### Response-Only Learning
To ensure the model learns **only the title** (the model output) and does not
memorize the user prompt (the abstract), response-only loss masking was applied:
```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<start_of_turn>user\n",  # user turn containing the abstract
    response_part="<start_of_turn>model\n",    # model turn containing the title
)
```
This enforces that gradients flow exclusively through the model's output portion
of the chat sequence, improving instruction-following consistency and ensuring
that the LoRA adapters specialize in generating high-quality academic titles
instead of learning or reproducing the user prompt.
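To confirm the mask behaves as intended, one can decode a processed training example and compare inputs against labels (an illustrative check in the style of the Unsloth notebooks, not part of the original card):

```python
sample = trainer.train_dataset[0]

# The input contains the full conversation: user abstract + model title.
print(tokenizer.decode(sample["input_ids"]))

# In the labels, every non-response token is masked with -100, so only the
# title survives; replace the mask so the result can be decoded and inspected.
visible = [tok if tok != -100 else tokenizer.pad_token_id for tok in sample["labels"]]
print(tokenizer.decode(visible))
```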
### Training Behavior
- LoRA significantly reduces VRAM usage while maintaining strong output quality.
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and
checkpoint handling.
- Multi-batch training allows large datasets to be processed even with limited
  hardware resources (see the checkpoint-continuation sketch below).
- Validation steps are used to monitor loss and adjust training dynamics.
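A sketch of how checkpoint continuation might be wired up (the `shards` list comes from the sharding sketch above; `resume_from_checkpoint` is a standard `transformers` Trainer argument, and the loop structure and output paths are illustrative assumptions):

```python
# Fine-tune one shard at a time, resuming from the latest checkpoint
# in output_dir after the first round.
for i, shard in enumerate(shards):
    trainer.train_dataset = shard
    trainer.train(resume_from_checkpoint=(i > 0))  # first shard starts fresh
    trainer.save_model(f"outputs/shard-{i}")       # illustrative output path
```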
## 🚀 Quick Usage Example
Before running inference, make sure all required libraries are installed:
```bash
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
```
Below is a clean and ready-to-run example demonstrating how to generate an
academic title using the Gemma-3 chat template:
```python
from transformers import pipeline
import torch

# Load the fine-tuned model as a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="beta3/gemma3_1b_title_generator",
    dtype=torch.bfloat16
)

# Example abstract for title generation
abstract = """
Transformer-based architectures have demonstrated strong performance in tasks
involving reasoning, scientific understanding, and text generation. Producing
concise academic titles from long abstracts, however, remains a non-trivial task.
"""

# Construct the Gemma-3 chat-format prompt manually
chat_template_prompt = (
    "<bos>"
    "<start_of_turn>user\n"
    "Generate a simple title for the following abstract:\n"
    f"{abstract}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Generate the title
result = pipe(
    chat_template_prompt,
    max_new_tokens=32,       # number of tokens to generate
    do_sample=True,          # enable sampling for more varied outputs
    temperature=0.7,         # controls generation randomness
    top_p=0.9,               # nucleus sampling
    return_full_text=False   # return only the newly generated text
)[0]["generated_text"]

print("Generated title:", result)
```
This example reproduces the Gemma-3 chat format used during fine-tuning and
should yield clean, publication-style academic titles.
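Alternatively, recent versions of `transformers` let the pipeline apply the chat template for you when given a list of messages (a minimal sketch reusing `pipe` and `abstract` from above; the output indexing assumes the chat-style return format of current `transformers`):

```python
messages = [
    {"role": "user",
     "content": "Generate a simple title for the following abstract:\n" + abstract}
]

# The pipeline applies the Gemma-3 chat template internally.
output = pipe(
    messages,
    max_new_tokens=32,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)[0]["generated_text"]

# With chat input, generated_text is the conversation including the reply;
# the model's answer is the content of the last message.
print("Generated title:", output[-1]["content"])
```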
## Capabilities & Limitations
### Capabilities
- Generates concise, publication-ready academic titles from scientific abstracts.
- Learns to identify the core idea of long, complex abstracts.
- Follows structured, instruction-based prompts using the Gemma-3 chat format.
- Efficient inference thanks to 4-bit quantization and LoRA adaptation.
- Generalizes across the scientific domains represented in the training data.
### Limitations
- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.
- The model does not verify factual accuracy or scientific correctness.
- Performance may vary for highly domain-specific or expert-level fields requiring specialized terminology.
- This model is only **1B parameters**, significantly smaller than larger Gemma or Llama variants, which means it may not always capture deep semantic details or produce titles as accurate as bigger models.
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.
## Credits
This project was made possible thanks to several key open-source tools,
frameworks, and community contributors:
- **Unsloth** — for enabling efficient 4-bit training, LoRA integration,
memory-optimized model loading, and the Gemma-3 chat template utilities.
Their tooling was essential for making multi-batch fine-tuning feasible
under limited hardware conditions.
- **Hugging Face TRL** — for providing the SFTTrainer and the
response-only training workflow, allowing the model to focus exclusively
on generating high-quality titles.
- **Google DeepMind** — for releasing the Gemma-3 family of models,
offering a powerful instruction-tuned foundation suitable for scientific
summarization and academic tasks.
- **Hugging Face Transformers / Datasets** — for model loading,
tokenization pipelines, and large-scale dataset management.
- **Google Colab** — for generously providing free access to high-performance
GPUs to the community. Their platform makes it possible for independent
researchers, students, and developers to experiment with advanced
large-language-model training workflows without requiring specialized
hardware.
Special appreciation goes to the broader open-source community for maintaining
the tools, documentation, and shared knowledge that make projects like this
possible.
## License
This model follows the licensing terms of its upstream foundation models and
tooling:
- **Base Model License:** Inherits the license of
`unsloth/gemma-3-1b-it`, which itself is based on Google’s *Gemma 3*
licensing terms.
- **Gemma 3 License:** Usage must comply with the Gemma family license
provided by Google DeepMind. For details, refer to the official documentation
and license terms published by Google.
- **Training Frameworks:**
- Unsloth (training optimizations, LoRA, 4-bit loading)
- Hugging Face TRL (SFTTrainer)
- Hugging Face Transformers & Datasets
All these tools are used under their respective open-source licenses.
**Important:**
This fine-tuned model is provided *as-is* with no additional warranties. Users
are responsible for ensuring compliance with applicable licenses and usage
restrictions when deploying or redistributing the model.
For complete details, please consult:
- Google Gemma License
- Unsloth Documentation & License
- Hugging Face Transformers License
## Intended Use
This model is intended for generating concise academic titles from research
abstracts. It is **not** designed for general conversation, creative writing,
or factual verification.
## Safety
The model may reflect biases present in academic text sources. Outputs should
be reviewed by humans before publication.