rahul7star committed on
Commit abc41be · verified · 1 Parent(s): b65b846

Update app_gpu.py

Files changed (1)
  1. app_gpu.py +164 -0
app_gpu.py CHANGED
@@ -541,6 +541,170 @@ print(f"[UPLOAD] Pushing adapter to {hf_repo_id}")
  # -> Uploads model to Hugging Face Hub
  # [UPLOAD] adapter_model.safetensors (67.7 MB)
  # [SUCCESS] LoRA uploaded successfully 🚀
+
+ ### 🧩 Universal Dynamic LoRA Trainer & Inference — Code Explanation
+ This project provides an **end-to-end LoRA fine-tuning and inference system** for language models like **Gemma**, built with **Gradio**, **PEFT**, and **Accelerate**.
+ It supports both **training new LoRAs** and **generating text** with existing ones — all in a single interface.
+
+ ---
+
+ #### **1️⃣ Imports Overview**
+ - **Core libs:** `os`, `torch`, `gradio`, `numpy`, `pandas`
+ - **Training libs:** `peft` (`LoraConfig`, `get_peft_model`), `accelerate` (`Accelerator`)
+ - **Modeling:** `transformers` (for Gemma base model)
+ - **Hub integration:** `huggingface_hub` (for uploading adapters)
+ - **Spaces:** `spaces` — for execution within Hugging Face Spaces
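
A minimal import block matching this stack might look like the following sketch (the exact imports and aliases in `app_gpu.py` may differ):

```python
import os

import gradio as gr
import numpy as np
import pandas as pd
import torch
from accelerate import Accelerator
from huggingface_hub import HfApi
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

import spaces  # only resolvable when running inside a Hugging Face Space
```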
+
+ ---
+
+ #### **2️⃣ Dataset Loading**
+ - Uses a lightweight **MediaTextDataset** class to load:
+   - CSV / Parquet files
+   - or directly from a Hugging Face dataset repo
+ - Expects two columns:
+   - `short_prompt` → Input text
+   - `long_prompt` → Target expanded text
+ - Supports batching, missing-column checks, and configurable max record limits.
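
A minimal sketch of such a loader, assuming the two columns described above; the real `MediaTextDataset` in `app_gpu.py` may differ in signature and error handling:

```python
import pandas as pd
from torch.utils.data import Dataset

class MediaTextDataset(Dataset):
    """Loads (short_prompt, long_prompt) pairs from CSV/Parquet or a Hub dataset."""

    def __init__(self, path: str, max_records: int | None = None):
        if path.endswith(".parquet"):
            df = pd.read_parquet(path)
        elif path.endswith(".csv"):
            df = pd.read_csv(path)
        else:
            # Assume a Hugging Face dataset repo id, e.g. "user/prompt-pairs" (placeholder)
            from datasets import load_dataset
            df = load_dataset(path, split="train").to_pandas()

        missing = {"short_prompt", "long_prompt"} - set(df.columns)
        if missing:
            raise ValueError(f"Dataset is missing required columns: {missing}")

        if max_records is not None:
            df = df.head(max_records)
        self.records = df[["short_prompt", "long_prompt"]].to_dict("records")

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        return row["short_prompt"], row["long_prompt"]
```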
+
+ ---
+
+ #### **3️⃣ Model Loading & Preparation**
+ - Loads **Gemma model and tokenizer** via `AutoModelForCausalLM` and `AutoTokenizer`.
+ - Automatically detects **target modules** (e.g. `q_proj`, `v_proj`) for LoRA injection.
+ - Supports `float16` or `bfloat16` precision with `Accelerator` for optimal memory usage.
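
A hedged sketch of this step, assuming the usual Gemma/Llama projection-layer names; the detection logic in `app_gpu.py` may be more involved:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_base_model(model_id: str = "google/gemma-2b-it"):
    # Prefer bfloat16 on hardware that supports it, otherwise fall back to float16
    use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    dtype = torch.bfloat16 if use_bf16 else torch.float16
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    return model, tokenizer

def find_lora_targets(model):
    """Collect the names of linear sub-layers commonly targeted by LoRA."""
    candidates = {"q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"}
    found = set()
    for name, module in model.named_modules():
        leaf = name.split(".")[-1]
        if leaf in candidates and isinstance(module, torch.nn.Linear):
            found.add(leaf)
    return sorted(found)
```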
+
+ ---
+
+ #### **4️⃣ LoRA Training Logic**
+ - Core formula:
+   \[
+   W_{\text{eff}} = W + \alpha \, (B A)
+   \]
+ - Only the **A** and **B** matrices are trainable; the base model weights remain frozen.
+ - Configurable parameters:
+   `r` (rank), `alpha` (scaling), `epochs`, `lr`, `batch_size`
+ - Training logs stream live in the UI, showing step-by-step loss values.
+ - After training, the adapter is **saved locally** and **uploaded to the Hugging Face Hub**.
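
The training flow above can be condensed into a short sketch. The prompt formatting, hyperparameter defaults, and output directory below are illustrative assumptions, not the exact logic of `app_gpu.py`:

```python
import torch
from torch.utils.data import DataLoader
from peft import LoraConfig, get_peft_model

def train_lora(model, tokenizer, dataset, r=8, alpha=16, epochs=1, lr=2e-4, batch_size=2):
    config = LoraConfig(r=r, lora_alpha=alpha, lora_dropout=0.05,
                        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)  # base weights stay frozen, only A/B train
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    model.train()
    for epoch in range(epochs):
        for step, (short_prompts, long_prompts) in enumerate(loader):
            texts = [f"{s}\n{l}" for s, l in zip(short_prompts, long_prompts)]
            batch = tokenizer(texts, return_tensors="pt", padding=True,
                              truncation=True, max_length=512)
            batch["labels"] = batch["input_ids"].clone()  # causal-LM loss target
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f"[TRAIN] epoch {epoch} step {step} loss {loss.item():.4f}")

    # Writes adapter_config.json + adapter_model.safetensors locally;
    # huggingface_hub.upload_folder(...) can then push the adapter to the Hub.
    model.save_pretrained("lora_adapter")
    return model
```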
+
+ ---
+
+ #### **5️⃣ CPU Inference Mode**
+ - Runs entirely on **CPU**; no GPU required.
+ - Loads the base Gemma model plus the trained LoRA weights (`PeftModel.from_pretrained`).
+ - Optionally merges the LoRA weights into the base model.
+ - Expands the short prompt into long descriptive text using standard generation parameters (e.g., top-p / top-k sampling).
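
A minimal sketch of that inference path; `adapter_id` is a placeholder for whichever Hub repo the trained adapter was pushed to:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

def expand_prompt(short_prompt: str,
                  base_id: str = "google/gemma-2b-it",
                  adapter_id: str = "your-username/your-lora-adapter"):  # placeholder repo id
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)  # CPU-friendly dtype
    model = PeftModel.from_pretrained(base, adapter_id)
    model = model.merge_and_unload()  # optional: fold the LoRA update into the base weights
    model.eval()

    inputs = tokenizer(short_prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=200,
                             do_sample=True, top_p=0.9, top_k=50)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```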
+
+ ---
+
+ #### **6️⃣ 🧠 What LoRA Does (A & B Injection Explained)**
+
+ When you fine-tune a large model (like Gemma or Llama), you’re adjusting **billions** of parameters in large weight matrices.
+ LoRA avoids this by **injecting two small low-rank matrices (A and B)** into selected layers instead of modifying the full weight.
+
+ ---
+
+ ##### **Step 1: Regular Linear Layer**
+
+ \[
+ y = W x
+ \]
+
+ Here, **W** is a huge matrix (e.g., 4096×4096).
+
+ ---
+
+ ##### **Step 2: LoRA Layer Modification**
+
+ Instead of updating W directly, LoRA adds a lightweight update:
+
+ \[
+ W' = W + \Delta W
+ \]
+ \[
+ \Delta W = B A
+ \]
+
+ Where:
+ - **A** ∈ ℝ^(r × d)
+ - **B** ∈ ℝ^(d × r)
+ - and **r ≪ d** (e.g., r=8 instead of 4096)
+
+ So you’re training only a *tiny fraction* of parameters.
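
A quick sanity check with the numbers above (d = 4096, r = 8):

```python
d, r = 4096, 8
full_update = d * d               # 16,777,216 parameters if W were updated directly
lora_update = d * r + r * d       # 65,536 parameters: B is d×r, A is r×d
print(lora_update / full_update)  # 0.00390625 → roughly 0.4% of one weight matrix
```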
+
+ ---
+
+ ##### **Step 3: Where LoRA Gets Injected**
+
+ It targets critical sub-layers such as:
+ - **q_proj, k_proj, v_proj** → Query, Key, Value projections in attention
+ - **o_proj / out_proj** → Output projection
+ - **gate_proj, up_proj, down_proj** → Feed-forward layers
+
+ When you see:
+ > `Adapter (90)`
+
+ it means that 90 linear sub-layers (drawn from these modules) were wrapped with LoRA adapters.
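
The exact count depends on the model and the chosen `target_modules`. One way to verify it for a given setup (a sketch that relies on PEFT naming each adapter's A matrix `lora_A`):

```python
def count_lora_layers(peft_model):
    """Count sub-layers wrapped with LoRA by counting their A matrices."""
    return sum(1 for name, _ in peft_model.named_parameters() if "lora_A" in name)

# After get_peft_model(...), count_lora_layers(model) should match the "Adapter (N)" figure.
```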
+
+ ---
+
+ ##### **Step 4: Training Efficiency**
+
+ - Base weights (`W`) stay **frozen**
+ - Only `(A, B)` are **trainable**
+ - Compute and memory are drastically reduced
+
+ | Metric | Full Fine-Tune | LoRA Fine-Tune |
+ |---------|----------------|----------------|
+ | Trainable Params | 2B+ | ~3M |
+ | GPU Memory | 40GB+ | <6GB |
+ | Time | 10–20 hrs | <1 hr |
+
+ ---
+
+ ##### **Step 5: Inference Equation**
+
+ At inference time:
+ \[
+ y = (W + \alpha \, B A)\, x
+ \]
+
+ Where **α** controls the strength of the adapter’s influence.
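
A tiny numerical illustration of that equation (in PEFT the scale actually applied is `lora_alpha / r`); with B initialized to zero, the adapter starts as a no-op:

```python
import torch

d, r, alpha = 16, 4, 8.0
x = torch.randn(d)
W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01   # LoRA A, randomly initialized
B = torch.zeros(d, r)          # LoRA B starts at zero

y_base = W @ x
y_lora = (W + alpha * (B @ A)) @ x     # identical to y_base until B is trained
print(torch.allclose(y_base, y_lora))  # True
```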
+
+ ---
+
+ ##### **Step 6: Visualization**
+ ```
+ Base Layer:
+ y = W * x
+
+ LoRA Layer:
+ y = (W + B@A) * x
+          ↑ ↑
+          │ └── low-rank matrix A (trainable)
+          └──── low-rank matrix B (trainable)
+ ```
+
+ ---
+
+ ##### **Step 7: Example in Code**
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
+
+ config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
+     lora_dropout=0.05,
+ )
+
+ model = get_peft_model(model, config)
+ model.print_trainable_parameters()
+ ```
+
+ Expected output:
+ ```
+ trainable params: 3,278,848 || all params: 2,040,000,000 || trainable%: 0.16%
+ ```
+
  """)
  return demo