Update app_gpu.py

app_gpu.py CHANGED (+164 -0)
@@ -541,6 +541,170 @@ print(f"[UPLOAD] Pushing adapter to {hf_repo_id}")
# -> Uploads model to Hugging Face Hub
# [UPLOAD] adapter_model.safetensors (67.7 MB)
# [SUCCESS] LoRA uploaded successfully 🚀

### 🧩 Universal Dynamic LoRA Trainer & Inference: Code Explanation

This project provides an **end-to-end LoRA fine-tuning and inference system** for language models such as **Gemma**, built with **Gradio**, **PEFT**, and **Accelerate**.
It supports both **training new LoRAs** and **generating text** with existing ones, all in a single interface.

---
#### **1️⃣ Imports Overview**
- **Core libs:** `os`, `torch`, `gradio`, `numpy`, `pandas`
- **Training libs:** `peft` (`LoraConfig`, `get_peft_model`), `accelerate` (`Accelerator`)
- **Modeling:** `transformers` (for the Gemma base model)
- **Hub integration:** `huggingface_hub` (for uploading adapters)
- **Spaces:** `spaces`, for execution within Hugging Face Spaces
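
Taken together, these bullets correspond to an import block roughly like the following (a sketch, not copied from `app_gpu.py`; the exact Hub helper, `upload_folder` here, is an assumption):

```python
import os

import gradio as gr
import numpy as np
import pandas as pd
import torch
import spaces

from accelerate import Accelerator                    # device / precision handling
from peft import LoraConfig, get_peft_model           # LoRA injection
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import upload_folder             # pushing the trained adapter
```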

---
#### **2️⃣ Dataset Loading**
- Uses a lightweight **MediaTextDataset** class to load:
  - CSV / Parquet files
  - or data pulled directly from a Hugging Face dataset repo
- Expects two columns:
  - `short_prompt` → input text
  - `long_prompt` → target expanded text
- Supports batching, missing-column checks, and a configurable cap on the number of records (see the sketch after this list).
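
The class itself is not shown in this diff; a minimal sketch of what a loader with these behaviors might look like (names and defaults are illustrative):

```python
import pandas as pd
from torch.utils.data import Dataset

class MediaTextDataset(Dataset):
    REQUIRED = ("short_prompt", "long_prompt")

    def __init__(self, source, max_records=None):
        if source.endswith(".parquet"):
            df = pd.read_parquet(source)
        elif source.endswith(".csv"):
            df = pd.read_csv(source)
        else:
            # Otherwise treat `source` as a Hugging Face dataset repo id
            from datasets import load_dataset
            df = load_dataset(source, split="train").to_pandas()

        missing = [c for c in self.REQUIRED if c not in df.columns]
        if missing:
            raise ValueError(f"Dataset is missing columns: {missing}")
        if max_records:
            df = df.head(max_records)
        self.rows = df[list(self.REQUIRED)].to_dict("records")

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        return row["short_prompt"], row["long_prompt"]
```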

---
#### **3️⃣ Model Loading & Preparation**
- Loads the **Gemma model and tokenizer** via `AutoModelForCausalLM` and `AutoTokenizer`.
- Automatically detects **target modules** (e.g. `q_proj`, `v_proj`) for LoRA injection (see the sketch below).
- Supports `float16` or `bfloat16` precision with `Accelerator` to reduce memory usage.
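
A plausible sketch of the detection step (assumed, not taken from `app_gpu.py`): scan the model for `nn.Linear` sub-layers whose names match the usual projection candidates:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it", torch_dtype=torch.bfloat16
)

candidates = {"q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "up_proj", "down_proj"}
found = sorted({name.split(".")[-1]
                for name, module in model.named_modules()
                if isinstance(module, nn.Linear)
                and name.split(".")[-1] in candidates})
print(found)  # e.g. ['down_proj', 'gate_proj', 'k_proj', ...]
```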

---
#### **4️⃣ LoRA Training Logic**
- Core formula:

\[
W_{\text{eff}} = W + \alpha \times (B A)
\]

- Only the **A** and **B** matrices are trainable; the base model weights remain frozen.
- Configurable parameters:
  `r` (rank), `alpha` (scaling), `epochs`, `lr`, `batch_size`
- Training logs stream live in the UI, showing step-by-step loss values.
- After training, the adapter is **saved locally** and **uploaded to the Hugging Face Hub** (a minimal loop sketch follows this list).
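
A minimal sketch of what such a training loop might look like (assumed shape, not the app's actual code; `peft_model` comes from `get_peft_model`, and `train_loader` yields tokenized batches containing `labels`). Yielding one log line per step matches the streaming-logs behavior described above:

```python
import torch
from accelerate import Accelerator

def train_lora(peft_model, train_loader, epochs=3, lr=2e-4):
    accelerator = Accelerator()
    # Only LoRA parameters require gradients; everything else is frozen
    optimizer = torch.optim.AdamW(
        (p for p in peft_model.parameters() if p.requires_grad), lr=lr
    )
    peft_model, optimizer, train_loader = accelerator.prepare(
        peft_model, optimizer, train_loader
    )
    peft_model.train()
    for epoch in range(epochs):
        for step, batch in enumerate(train_loader):
            loss = peft_model(**batch).loss   # causal-LM loss from `labels`
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
            yield f"epoch {epoch} step {step} loss {loss.item():.4f}"
```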

---
#### **5️⃣ CPU Inference Mode**
- Runs entirely on **CPU**; no GPU required.
- Loads the base Gemma model plus the trained LoRA weights (`PeftModel.from_pretrained`).
- Optionally merges the LoRA weights into the base model.
- Expands a short prompt into long descriptive text using standard generation parameters (e.g., top-p / top-k sampling); a sketch follows this list.
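
A sketch of this inference path under those assumptions (`adapter_repo` is a placeholder for whatever repo the trainer uploaded to, and the merge step is optional exactly as described):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2b-it"
adapter_repo = "your-username/your-lora-adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)
model = PeftModel.from_pretrained(base, adapter_repo)
model = model.merge_and_unload()   # optional: fold LoRA into the base weights

inputs = tokenizer("a cat on a beach", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                     top_p=0.9, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```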

---
#### **6️⃣ 🧠 What LoRA Does (A & B Injection Explained)**

When you fine-tune a large model (like Gemma or Llama), you are adjusting **billions** of parameters in large weight matrices.
LoRA avoids this by **injecting two small low-rank matrices (A and B)** into selected layers instead of modifying the full weight matrices.

---
##### **Step 1: Regular Linear Layer**

\[
y = W x
\]

Here, **W** is a huge matrix (e.g., 4096 × 4096).

---
##### **Step 2: LoRA Layer Modification**

Instead of updating W directly, LoRA adds a lightweight update:

\[
W' = W + \Delta W
\]
\[
\Delta W = B A
\]

Where:
- **A** ∈ ℝ^(r × d)
- **B** ∈ ℝ^(d × r)
- and **r ≪ d** (e.g., r = 8 instead of 4096)

So you're training only a *tiny fraction* of the parameters.
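
For a concrete sense of scale (illustrative numbers, not taken from the app): with d = 4096 and r = 8, the full matrix W holds 4096 × 4096 ≈ 16.8M values, while A and B together hold only 2 × 8 × 4096 = 65,536 trainable values, roughly 0.4% of the layer.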

---
##### **Step 3: Where LoRA Gets Injected**

LoRA targets critical sub-layers such as:
- **q_proj, k_proj, v_proj** → query, key, and value projections in attention
- **o_proj / out_proj** → the attention output projection
- **gate_proj, up_proj, down_proj** → feed-forward layers

When you see:
> `Adapter (90)`

it means 90 linear sub-layers drawn from these modules were wrapped with LoRA adapters (a small counting sketch follows).
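
A small sketch (assumed, not from the diff) of where a count like 90 comes from: tally the `nn.Linear` sub-layers whose names match the target list for the loaded model:

```python
import torch.nn as nn

def count_lora_targets(model, targets=("q_proj", "k_proj", "v_proj", "o_proj",
                                       "gate_proj", "up_proj", "down_proj")):
    # Count every Linear sub-layer whose final name component is a target
    return sum(1 for name, m in model.named_modules()
               if isinstance(m, nn.Linear) and name.split(".")[-1] in targets)

# e.g. count_lora_targets(model) might print a value like 90,
# depending on the model architecture and the target list used.
```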

---
##### **Step 4: Training Efficiency**

- Base weights (`W`) stay **frozen**
- Only `(A, B)` are **trainable**
- Compute and memory are drastically reduced

Illustrative ballpark figures for a ~2B-parameter model:

| Metric | Full Fine-Tune | LoRA Fine-Tune |
|---------|----------------|----------------|
| Trainable Params | 2B+ | ~3M |
| GPU Memory | 40GB+ | <6GB |
| Time | 10–20 hrs | <1 hr |

---
##### **Step 5: Inference Equation**

At inference time:

\[
y = (W + \alpha \times B A) x
\]

Where **α** controls the strength of the adapter's influence (in PEFT specifically, the scale applied to BA is `lora_alpha / r`).
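
A tiny runnable demo of this equation with random tensors (illustrative only; note that standard LoRA initializes **B** to zero, so the adapter is a no-op before training):

```python
import torch

d, r, alpha = 16, 2, 1.0
W = torch.randn(d, d)
A = torch.randn(r, d) * 0.01   # A: small random init
B = torch.zeros(d, r)          # B: zero init, so B @ A == 0 at the start
x = torch.randn(d)

y_base = W @ x
y_lora = (W + alpha * (B @ A)) @ x
print(torch.allclose(y_base, y_lora))  # True until B is trained away from zero
```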

---
##### **Step 6: Visualization**

```
Base layer:
    y = W @ x

LoRA layer:
    y = (W + B @ A) @ x
             ↑   ↑
             |   └── low-rank matrix A (trainable)
             └────── low-rank matrix B (trainable)
```

---
##### **Step 7: Example in Code**

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
```

Expected output:

```
trainable params: 3,278,848 || all params: 2,040,000,000 || trainable%: 0.16%
```

""")
return demo