Update app_gpu.py

app_gpu.py CHANGED (+164 -0)
@@ -541,6 +541,170 @@ print(f"[UPLOAD] Pushing adapter to {hf_repo_id}")
# -> Uploads model to Hugging Face Hub
# [UPLOAD] adapter_model.safetensors (67.7 MB)
# [SUCCESS] LoRA uploaded successfully 🚀

### 🧩 Universal Dynamic LoRA Trainer & Inference: Code Explanation

This project provides an **end-to-end LoRA fine-tuning and inference system** for language models such as **Gemma**, built with **Gradio**, **PEFT**, and **Accelerate**.
It supports both **training new LoRAs** and **generating text** with existing ones, all in a single interface.

---
#### **1️⃣ Imports Overview**
- **Core libs:** `os`, `torch`, `gradio`, `numpy`, `pandas`
- **Training libs:** `peft` (`LoraConfig`, `get_peft_model`), `accelerate` (`Accelerator`)
- **Modeling:** `transformers` (for the Gemma base model)
- **Hub integration:** `huggingface_hub` (for uploading adapters)
- **Spaces:** `spaces`, for execution within Hugging Face Spaces
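
Taken together, these bullets correspond to an import block roughly like the following (a sketch, not copied from `app_gpu.py`; the exact Hub helper, `upload_folder` here, is an assumption):

```python
import os

import gradio as gr
import numpy as np
import pandas as pd
import torch
import spaces

from accelerate import Accelerator                    # device / precision handling
from peft import LoraConfig, get_peft_model           # LoRA injection
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import upload_folder             # pushing the trained adapter
```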

---
#### **2️⃣ Dataset Loading**
- Uses a lightweight **MediaTextDataset** class to load:
  - CSV / Parquet files
  - or data pulled directly from a Hugging Face dataset repo
- Expects two columns:
  - `short_prompt` → input text
  - `long_prompt` → target expanded text
- Supports batching, missing-column checks, and a configurable cap on the number of records (see the sketch after this list).
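
The class itself is not shown in this diff; a minimal sketch of what a loader with these behaviors might look like (names and defaults are illustrative):

```python
import pandas as pd
from torch.utils.data import Dataset

class MediaTextDataset(Dataset):
    REQUIRED = ("short_prompt", "long_prompt")

    def __init__(self, source, max_records=None):
        if source.endswith(".parquet"):
            df = pd.read_parquet(source)
        elif source.endswith(".csv"):
            df = pd.read_csv(source)
        else:
            # Otherwise treat `source` as a Hugging Face dataset repo id
            from datasets import load_dataset
            df = load_dataset(source, split="train").to_pandas()

        missing = [c for c in self.REQUIRED if c not in df.columns]
        if missing:
            raise ValueError(f"Dataset is missing columns: {missing}")
        if max_records:
            df = df.head(max_records)
        self.rows = df[list(self.REQUIRED)].to_dict("records")

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        return row["short_prompt"], row["long_prompt"]
```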

---
#### **3️⃣ Model Loading & Preparation**
- Loads the **Gemma model and tokenizer** via `AutoModelForCausalLM` and `AutoTokenizer`.
- Automatically detects **target modules** (e.g. `q_proj`, `v_proj`) for LoRA injection (see the sketch below).
- Supports `float16` or `bfloat16` precision with `Accelerator` to reduce memory usage.
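
A plausible sketch of the detection step (assumed, not taken from `app_gpu.py`): scan the model for `nn.Linear` sub-layers whose names match the usual projection candidates:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it", torch_dtype=torch.bfloat16
)

candidates = {"q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "up_proj", "down_proj"}
found = sorted({name.split(".")[-1]
                for name, module in model.named_modules()
                if isinstance(module, nn.Linear)
                and name.split(".")[-1] in candidates})
print(found)  # e.g. ['down_proj', 'gate_proj', 'k_proj', ...]
```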

---
#### **4️⃣ LoRA Training Logic**
- Core formula:

\[
W_{\text{eff}} = W + \alpha \times (B A)
\]

- Only the **A** and **B** matrices are trainable; the base model weights remain frozen.
- Configurable parameters:
  `r` (rank), `alpha` (scaling), `epochs`, `lr`, `batch_size`
- Training logs stream live in the UI, showing step-by-step loss values.
- After training, the adapter is **saved locally** and **uploaded to the Hugging Face Hub** (a minimal loop sketch follows this list).
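
A minimal sketch of what such a training loop might look like (assumed shape, not the app's actual code; `peft_model` comes from `get_peft_model`, and `train_loader` yields tokenized batches containing `labels`). Yielding one log line per step matches the streaming-logs behavior described above:

```python
import torch
from accelerate import Accelerator

def train_lora(peft_model, train_loader, epochs=3, lr=2e-4):
    accelerator = Accelerator()
    # Only LoRA parameters require gradients; everything else is frozen
    optimizer = torch.optim.AdamW(
        (p for p in peft_model.parameters() if p.requires_grad), lr=lr
    )
    peft_model, optimizer, train_loader = accelerator.prepare(
        peft_model, optimizer, train_loader
    )
    peft_model.train()
    for epoch in range(epochs):
        for step, batch in enumerate(train_loader):
            loss = peft_model(**batch).loss   # causal-LM loss from `labels`
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
            yield f"epoch {epoch} step {step} loss {loss.item():.4f}"
```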

---
#### **5️⃣ CPU Inference Mode**
- Runs entirely on **CPU**; no GPU required.
- Loads the base Gemma model plus the trained LoRA weights (`PeftModel.from_pretrained`).
- Optionally merges the LoRA weights into the base model.
- Expands a short prompt into long descriptive text using standard generation parameters (e.g., top-p / top-k sampling); a sketch follows this list.
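
A sketch of this inference path under those assumptions (`adapter_repo` is a placeholder for whatever repo the trainer uploaded to, and the merge step is optional exactly as described):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2b-it"
adapter_repo = "your-username/your-lora-adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)
model = PeftModel.from_pretrained(base, adapter_repo)
model = model.merge_and_unload()   # optional: fold LoRA into the base weights

inputs = tokenizer("a cat on a beach", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                     top_p=0.9, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```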

---
#### **6️⃣ 🧠 What LoRA Does (A & B Injection Explained)**

When you fine-tune a large model (like Gemma or Llama), you are adjusting **billions** of parameters in large weight matrices.
LoRA avoids this by **injecting two small low-rank matrices (A and B)** into selected layers instead of modifying the full weight matrices.

---
##### **Step 1: Regular Linear Layer**

\[
y = W x
\]

Here, **W** is a huge matrix (e.g., 4096 × 4096).

---
##### **Step 2: LoRA Layer Modification**

Instead of updating W directly, LoRA adds a lightweight update:

\[
W' = W + \Delta W
\]
\[
\Delta W = B A
\]

Where:
- **A** ∈ ℝ^(r × d)
- **B** ∈ ℝ^(d × r)
- and **r ≪ d** (e.g., r = 8 instead of 4096)

So you're training only a *tiny fraction* of the parameters.
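
For a concrete sense of scale (illustrative numbers, not taken from the app): with d = 4096 and r = 8, the full matrix W holds 4096 × 4096 ≈ 16.8M values, while A and B together hold only 2 × 8 × 4096 = 65,536 trainable values, roughly 0.4% of the layer.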

---
##### **Step 3: Where LoRA Gets Injected**

LoRA targets critical sub-layers such as:
- **q_proj, k_proj, v_proj** → query, key, and value projections in attention
- **o_proj / out_proj** → the attention output projection
- **gate_proj, up_proj, down_proj** → feed-forward layers

When you see:
> `Adapter (90)`

it means 90 linear sub-layers drawn from these modules were wrapped with LoRA adapters (a small counting sketch follows).
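
A small sketch (assumed, not from the diff) of where a count like 90 comes from: tally the `nn.Linear` sub-layers whose names match the target list for the loaded model:

```python
import torch.nn as nn

def count_lora_targets(model, targets=("q_proj", "k_proj", "v_proj", "o_proj",
                                       "gate_proj", "up_proj", "down_proj")):
    # Count every Linear sub-layer whose final name component is a target
    return sum(1 for name, m in model.named_modules()
               if isinstance(m, nn.Linear) and name.split(".")[-1] in targets)

# e.g. count_lora_targets(model) might print a value like 90,
# depending on the model architecture and the target list used.
```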

---
##### **Step 4: Training Efficiency**

- Base weights (`W`) stay **frozen**
- Only `(A, B)` are **trainable**
- Compute and memory are drastically reduced

Illustrative ballpark figures for a ~2B-parameter model:

| Metric | Full Fine-Tune | LoRA Fine-Tune |
|---------|----------------|----------------|
| Trainable Params | 2B+ | ~3M |
| GPU Memory | 40GB+ | <6GB |
| Time | 10–20 hrs | <1 hr |

---
##### **Step 5: Inference Equation**

At inference time:

\[
y = (W + \alpha \times B A) x
\]

Where **α** controls the strength of the adapter's influence (in PEFT specifically, the scale applied to BA is `lora_alpha / r`).
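
A tiny runnable demo of this equation with random tensors (illustrative only; note that standard LoRA initializes **B** to zero, so the adapter is a no-op before training):

```python
import torch

d, r, alpha = 16, 2, 1.0
W = torch.randn(d, d)
A = torch.randn(r, d) * 0.01   # A: small random init
B = torch.zeros(d, r)          # B: zero init, so B @ A == 0 at the start
x = torch.randn(d)

y_base = W @ x
y_lora = (W + alpha * (B @ A)) @ x
print(torch.allclose(y_base, y_lora))  # True until B is trained away from zero
```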

---
##### **Step 6: Visualization**

```
Base layer:
    y = W @ x

LoRA layer:
    y = (W + B @ A) @ x
             ↑   ↑
             |   └── low-rank matrix A (trainable)
             └────── low-rank matrix B (trainable)
```

---
##### **Step 7: Example in Code**

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
```

Expected output:

```
trainable params: 3,278,848 || all params: 2,040,000,000 || trainable%: 0.16%
```

""")
return demo