Upload fine-tuned model directly from Google Drive
- .gitattributes +1 -0
- README.md +289 -3
- adapter_config.json +38 -0
- adapter_model.safetensors +3 -0
- added_tokens.json +3 -0
- chat_template.jinja +47 -0
- optimizer.pt +3 -0
- rng_state.pth +3 -0
- scheduler.pt +3 -0
- special_tokens_map.json +33 -0
- tokenizer.json +3 -0
- tokenizer.model +3 -0
- tokenizer_config.json +0 -0
- trainer_state.json +0 -0
- training_args.bin +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,289 @@
----
-
-
---
base_model: unsloth/gemma-3-1b-it
library_name: transformers
tags:
- gemma-3
- fine-tuning
- sft
- unsloth
- academic-title-generation
- lora
- 4bit
- chat-template
model_name: gemma3_1b_title_generator
---

<center>

# **Gemma 3 — 1B Academic Title Generator**

<img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp" width="600"/>

</center>

---

## Overview

**gemma3_1b_title_generator** is a fine-tuned version of `unsloth/gemma-3-1b-it`, specialized in generating **academic paper titles** from scientific abstracts.

Fine-tuning adapts Gemma-3's chat-format behavior to this single, focused task. Because of hardware limitations, training used a **multi-batch pipeline** built on Unsloth's efficient 4-bit loading and LoRA adapters.

The result is a lightweight, fast, domain-specialized model that produces concise, coherent, and academically appropriate titles.

---
## Dataset & Preprocessing

Training data consists of scientific **abstract → title** pairs.
Because of memory constraints, the dataset was processed in **sequential batches**, each folded into the model through incremental checkpoints. This incremental batch-training approach was made practical by **Unsloth's lightweight fine-tuning tools**.

Each data sample was converted into a **Gemma-3 style chat conversation**, so the model learns the title as the model's response:

```python
def format_dataset_for_chat(example):
    # One user turn (the abstract) and one model turn (the target title).
    messages = [
        {"role": "user", "content": "Generate a title for the following abstract:\n" + example["abstract"]},
        {"role": "model", "content": example["title"]}
    ]

    # Render the conversation to a single training string. The template
    # emits <bos>, and the tokenizer adds it again during tokenization,
    # so it is stripped here to avoid a double <bos>.
    example["text"] = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    ).removeprefix("<bos>")

    return example
```
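The formatter is then mapped over the whole dataset in the usual `datasets` fashion (a minimal sketch; `dataset` is assumed to be a loaded `datasets.Dataset` with `abstract` and `title` columns):

```python
# Apply the formatter to every example, producing the "text" column
# that the SFT trainer consumes.
dataset = dataset.map(format_dataset_for_chat)
```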
## Chat Format

Gemma-3 uses a structured multi-turn dialog format.
Each training example is converted into a conversation where:

- The **user** provides the abstract.
- The **model** outputs the title.

The structure follows the Gemma-3 chat template:

```
<bos><start_of_turn>user
... user content ...
<end_of_turn>
<start_of_turn>model
... model content ...
<end_of_turn>
```

This formatting is produced automatically by the `format_dataset_for_chat` function shown above, via Unsloth's `tokenizer.apply_chat_template()`.
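To see the template output directly, one can render a single user message with the generation prompt enabled (an illustrative sketch; assumes the Gemma-3 `tokenizer` is loaded, and `<abstract text>` is a placeholder):

```python
demo = [{"role": "user", "content": "Generate a title for the following abstract:\n<abstract text>"}]

# add_generation_prompt=True appends the opening "<start_of_turn>model"
# so the model knows to produce the title next; the printed string
# follows exactly the structure shown above.
print(tokenizer.apply_chat_template(demo, tokenize=False, add_generation_prompt=True))
```
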
## Training Configuration

Fine-tuning was performed with the SFTTrainer from TRL, combined with Unsloth's efficient 4-bit loading and LoRA adaptation layers. Because of hardware limitations, training followed a multi-batch strategy, with incremental checkpoint loading supported by Unsloth.

### Key Training Settings

- Model: unsloth/gemma-3-1b-it
- Precision: 4-bit (QLoRA)
- Method: Supervised Fine-Tuning (SFT)
- LoRA: enabled for attention and MLP modules
- Sequence length: 2048 tokens
- Optimizer: AdamW (8-bit)
- Scheduler: cosine
- Strategy: multi-batch training with checkpoint continuation
- Tokenizer: Gemma-3 chat template applied through Unsloth

A minimal sketch of how these pieces fit together is shown below.
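The full training script is not included in this repository; the following is a sketch of such a setup with Unsloth and TRL. The hyperparameter values are illustrative placeholders (not the values used here), and argument names vary slightly across TRL versions (e.g. `tokenizer=` vs. `processing_class=`):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# Load the 4-bit base model and tokenizer (QLoRA-style loading).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # the "text" column built earlier
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,   # placeholder
        gradient_accumulation_steps=4,   # placeholder
        learning_rate=2e-4,              # placeholder
        lr_scheduler_type="cosine",
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
```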
### Response-Only Learning

To ensure the model learns **only the title** (the model output) and does not memorize the user prompt (the abstract), response-only loss masking was applied:

```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",   # user turn with the abstract
    response_part = "<start_of_turn>model\n",     # model turn with the generated title
)
```

This ensures that gradients flow only through the model's portion of the chat sequence, improving instruction-following consistency and letting the LoRA adapters specialize in generating high-quality academic titles rather than reproducing the user prompt.
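As an optional sanity check (following the pattern used in Unsloth's notebooks), decoding the tokens whose labels are `-100` should reproduce the user turn, confirming that only the title contributes to the loss:

```python
sample = trainer.train_dataset[0]

# Positions labeled -100 are ignored by the loss; after
# train_on_responses_only these should cover the user prompt.
masked = [tok for tok, lab in zip(sample["input_ids"], sample["labels"]) if lab == -100]
print(tokenizer.decode(masked))
```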
### Training Behavior

- LoRA significantly reduces VRAM usage while maintaining strong output quality.
- Unsloth manages efficient 4-bit quantization, chat-template formatting, and checkpoint handling.
- Multi-batch training allows large datasets to be processed even with limited hardware; each round resumes from the previous checkpoint, as sketched below.
- Validation steps are used to monitor loss and adjust training dynamics.
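Concretely, checkpoint continuation between rounds can be as simple as pointing the trainer at the last saved checkpoint. A minimal sketch, assuming checkpoints are written to the trainer's `output_dir` and `next_batch` is a hypothetical next dataset slice:

```python
# First round: train on the first dataset slice from the base weights.
trainer.train()

# Later rounds: swap in the next slice and resume from the most
# recent checkpoint in output_dir instead of starting over.
trainer.train_dataset = next_batch
trainer.train(resume_from_checkpoint=True)
```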
## 🚀 Quick Usage Example

Before running inference, make sure the required libraries are installed:

```bash
pip install -q transformers accelerate torch
pip install -q -U bitsandbytes
# Only if your setup or model requires Unsloth for loading:
pip install -q unsloth
```

Below is a ready-to-run example that generates an academic title using the Gemma-3 chat template:
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="beta3/gemma3_1b_title_generator",
    dtype=torch.bfloat16  # on older transformers versions, use torch_dtype= instead
)

# Example abstract for title generation
abstract = """
Transformer-based architectures have demonstrated strong performance in tasks
involving reasoning, scientific understanding, and text generation. Producing
concise academic titles from long abstracts, however, remains a non-trivial task.
"""

# Construct the Gemma-3 chat-format prompt manually
chat_template_prompt = (
    "<bos>"
    "<start_of_turn>user\n"
    "Generate a simple title for the following abstract:\n"
    f"{abstract}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Generate the title
result = pipe(
    chat_template_prompt,
    max_new_tokens=32,        # number of tokens to generate
    do_sample=True,           # enable sampling for more varied outputs
    temperature=0.7,          # controls generation randomness
    top_p=0.9,                # nucleus sampling
    return_full_text=False    # return only the newly generated text
)[0]["generated_text"]

print("Generated title:", result)
```

This example reproduces the exact Gemma-3 chat behavior and produces clean, publication-ready academic titles.
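Alternatively, on recent transformers versions the text-generation pipeline can apply the chat template itself when given a list of messages instead of a raw prompt string; a sketch under that assumption:

```python
messages = [
    {"role": "user",
     "content": f"Generate a simple title for the following abstract:\n{abstract}"}
]

# The pipeline renders `messages` through the model's chat template and
# returns the conversation with the model's reply appended at the end.
out = pipe(messages, max_new_tokens=32, do_sample=True, temperature=0.7, top_p=0.9)
print("Generated title:", out[0]["generated_text"][-1]["content"])
```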
## Capabilities & Limitations

### Capabilities

- Generates concise, publication-ready academic titles from scientific abstracts.
- Identifies the core idea of long, complex abstracts.
- Follows structured, instruction-based prompts using the Gemma-3 chat format.
- Efficient inference thanks to 4-bit quantization and LoRA adaptation.
- Performs consistently across a wide range of scientific domains.

### Limitations

- Output quality depends heavily on the clarity and structure of the abstract; vague inputs may produce generic titles.
- The model does not verify factual accuracy or scientific correctness.
- Performance may vary in highly domain-specific or expert-level fields requiring specialized terminology.
- At only **1B parameters**, the model is much smaller than larger Gemma or Llama variants and may not always capture deep semantic details or produce titles as accurate as bigger models.
- The model is optimized for academic summarization and may not generalize well to creative or conversational tasks.
## Credits

This project was made possible thanks to several key open-source tools, frameworks, and community contributors:

- **Unsloth** — for enabling efficient 4-bit training, LoRA integration, memory-optimized model loading, and the Gemma-3 chat template utilities. Their tooling was essential for making multi-batch fine-tuning feasible under limited hardware conditions.

- **Hugging Face TRL** — for providing the SFTTrainer and the response-only training workflow, allowing the model to focus exclusively on generating high-quality titles.

- **Google DeepMind** — for releasing the Gemma-3 family of models, offering a powerful instruction-tuned foundation suitable for scientific summarization and academic tasks.

- **Hugging Face Transformers / Datasets** — for model loading, tokenization pipelines, and large-scale dataset management.

- **Google Colab** — for providing free access to high-performance GPUs, making it possible for independent researchers, students, and developers to experiment with advanced large-language-model training workflows without specialized hardware.

Special appreciation goes to the broader open-source community for maintaining the tools, documentation, and shared knowledge that make projects like this possible.
## License

This model follows the licensing terms of its upstream foundation models and tooling:

- **Base Model License:** Inherits the license of `unsloth/gemma-3-1b-it`, which is itself based on Google's *Gemma 3* licensing terms.

- **Gemma 3 License:** Usage must comply with the Gemma family license provided by Google DeepMind. For details, refer to the official documentation and license terms published by Google.

- **Training Frameworks:**
  - Unsloth (training optimizations, LoRA, 4-bit loading)
  - Hugging Face TRL (SFTTrainer)
  - Hugging Face Transformers & Datasets

All of these tools are used under their respective open-source licenses.

**Important:**
This fine-tuned model is provided *as-is*, with no additional warranties. Users are responsible for ensuring compliance with applicable licenses and usage restrictions when deploying or redistributing the model.

For complete details, please consult:

- Google Gemma License
- Unsloth Documentation & License
- Hugging Face Transformers License
## Intended Use

This model is intended for generating concise academic titles from research abstracts. It is **not** designed for general conversation, creative writing, or factual verification.

## Safety

The model may reflect biases present in academic text sources. Outputs should be reviewed by humans before publication.
adapter_config.json
ADDED
@@ -0,0 +1,38 @@
{
  "alpha_pattern": {},
  "auto_mapping": {
    "base_model_class": "Gemma3ForCausalLM",
    "parent_library": "transformers.models.gemma3.modeling_gemma3",
    "unsloth_fixed": true
  },
  "base_model_name_or_path": "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": "(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj).*?)|(?:\\bmodel\\.layers\\.[\\d]{1,}\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\.(?:(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)))",
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
|
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22ec271e5e1a0942d81e43f4e7a960909d144fb209154fdbb87c70bcdc36a53f
size 52231312
added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.jinja
ADDED
@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{ '<start_of_turn>model
' }}
{%- endif -%}
optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:332eec556c7e5f1c4ec9a720eef60f6eafe555a76b82fcd1af9e1d10008a8993
size 27861739
rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6657ccd5ba73eb2588fe6c69638f02621253e47f1271867fd3af0b8ff5c9b2a
size 14645
scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a72d1abae8c55fdedc7f6e855fb3939aba7f6d9e09baa4306b2f5553739814c3
size 1465
special_tokens_map.json
ADDED
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json
ADDED
The diff for this file is too large to render.
trainer_state.json
ADDED
The diff for this file is too large to render.
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81ed17072d6b7a89e259fb73c1864f355ddf518c5e2491afe950701b9fff8f3e
size 6289