---
model_name: Mistral-7B-LoRA-Merged
repo: clarkkitchen22/mistral-7b-lora-merged
author: clarkkitchen22
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B
quantization:
  supported:
    - fp16
    - 4bit-bnb
    - 8bit-bnb
  recommended: 4bit-bnb
  notes: Runs on RTX 2070 (8GB) with bitsandbytes.
training:
  approach: LoRA (Low-Rank Adaptation)
  lora:
    rank_r: 16
    alpha: 32
    dropout: 0.05
    target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - gate_proj
      - up_proj
      - down_proj
  hardware:
    gpu: RTX 2070 (8GB)
    cpu: Intel i7-9750H
    ram_gb: 16
  timeframe: Developed over a single weekend (self-taught; no prior Python experience)
chat_template:
  style: '[INST] ... [/INST]'
  bos_token: <s>
  eos_token: </s>
metrics:
  - name: qualitative_instruction_following
    value: good
    notes: Tested manually across diverse prompts; no formal benchmark.
  - name: latency
    value: device-dependent
    notes: Merged weights enable faster load times and simplified inference.
usage:
  quickstart: |
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    model_id = "clarkkitchen22/mistral-7b-lora-merged"
    tok = AutoTokenizer.from_pretrained(model_id)
    mdl = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.float16
    )

    x = tok("Explain LoRA in simple terms.", return_tensors="pt").to(mdl.device)
    y = mdl.generate(**x, max_new_tokens=150)
    print(tok.decode(y[0], skip_special_tokens=True))
contact:
  profile: https://huggingface.co/clarkkitchen22
  note: Open for collaboration and AI engineering opportunities.
disclaimer: >
  This is an experimental, educational model created on consumer hardware.
  Outputs may vary or hallucinate; please verify responses for critical tasks.
---

# 🧠 Mistral-7B-LoRA-Merged

**Author:** [clarkkitchen22](https://huggingface.co/clarkkitchen22)


## 🚀 Overview

Mistral-7B-LoRA-Merged is a fully merged, fine-tuned variant of Mistral-7B.
Developed by @clarkkitchen22 in a single weekend, this project demonstrates how open-source frameworks make it possible to fine-tune and deploy large models on consumer hardware, and how those skills translate into a practical, working understanding of model internals.

The project highlights hands-on AI engineering, optimization, and problem-solving skills, all learned and applied independently.


## 🧩 Model Summary

| Field | Details |
|---|---|
| Base Model | Mistral-7B |
| Fine-Tuning Method | LoRA (Low-Rank Adaptation) |
| Merge Process | Custom `merge_lora.py` script |
| Hardware Used | RTX 2070 (8 GB VRAM), i7-9750H, 16 GB RAM |
| Precision | FP16 / 4-bit (bitsandbytes compatible) |
| Training Time | One weekend |
| Frameworks | 🤗 Transformers, PEFT, bitsandbytes |
| Use Case | Instruction following, reasoning, creative text generation |
| License | Apache 2.0 |

## 💡 Highlights

- **Merged weights:** no LoRA adapter required for inference.
- **Lightweight deployment:** optimized for local GPUs (8 GB+).
- **Fully reproducible:** uses standard Hugging Face tools and scripts.
- **Built self-taught:** demonstrates accessible AI development using open resources.
- **Custom tooling:** includes a hand-written Python merge script for model consolidation.
- **Optimized inference:** reduced load time and memory overhead by merging weights directly.

## ⚙️ Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "clarkkitchen22/mistral-7b-lora-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Explain how LoRA works in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
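
The metadata above recommends 4-bit bitsandbytes quantization for 8 GB GPUs such as the RTX 2070. A minimal sketch of a quantized load, assuming `bitsandbytes` is installed; the NF4/fp16 settings below are common illustrative defaults, not a configuration confirmed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "clarkkitchen22/mistral-7b-lora-merged"

# Illustrative 4-bit settings; NF4 with fp16 compute is a common choice,
# but the author's exact quantization settings are not specified here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Per the `chat_template` metadata, prompts follow Mistral's `[INST] ... [/INST]` format; wrapping instructions that way should match the formatting the model saw during fine-tuning.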

## 🧠 How It Works: The LoRA Merge Explained

### Fine-Tuning Phase

LoRA fine-tuning modifies only a subset of weights, typically the projection layers in the transformer blocks.

Instead of retraining all 7B parameters, LoRA freezes the base weights and learns small low-rank update matrices (here r=16) that capture task-specific changes. For a 4096×4096 projection, that replaces roughly 16.8M trainable weights with two matrices of shape 16×4096 and 4096×16, about 131k parameters per layer.

This is why large models can be fine-tuned with minimal GPU memory, as in the configuration sketched below.
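
A PEFT configuration matching the hyperparameters in the metadata block would look roughly like this. It is a sketch, not the author's actual training script (which is not included in this repo), and the base checkpoint ID `mistralai/Mistral-7B-v0.1` is an assumption, since the card lists the base only as Mistral-7B:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hyperparameters taken from the model-card metadata above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Assumed base checkpoint; the card says only "Mistral-7B".
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # prints the small trainable fraction
```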

### Merging Phase

The trained LoRA update (ΔW = BA) is mathematically added back to the base weights (W₀), scaled by the adapter's scaling factor (α/r in the PEFT implementation):

$$W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, BA$$

After merging, the model behaves as if the adapters were permanently installed — no extra files, wrappers, or configuration needed.

The final checkpoint contains all learned improvements in a single, easy-to-deploy model file.
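
The card describes the merge as a custom `merge_lora.py`; the script itself isn't shown, but PEFT's `merge_and_unload()` performs the same operation. A minimal sketch, assuming the adapter was saved locally (the paths and base checkpoint ID are illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumed locations; the card does not specify where the adapter lives.
base_id = "mistralai/Mistral-7B-v0.1"
adapter_dir = "./lora-adapter"
output_dir = "./mistral-7b-lora-merged"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# Folds W0 + (alpha/r) * B @ A into the base weights and removes the
# adapter wrappers, leaving a plain Mistral checkpoint.
merged = model.merge_and_unload()

merged.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
```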

### Result

Faster load times, fewer dependencies, and stable inference performance.

Because merging is mathematically exact (up to floating-point rounding), the merged model runs on mid-range GPUs with output quality matching the adapter-based version.
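
One way to sanity-check that equivalence is to compare logits from the adapter model and the merged model on the same input; the difference should be near machine precision. A sketch, with the base ID and adapter path as placeholder assumptions:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder IDs/paths; adjust to the actual base and adapter locations.
base_id = "mistralai/Mistral-7B-v0.1"
adapter_dir = "./lora-adapter"
merged_id = "clarkkitchen22/mistral-7b-lora-merged"

tok = AutoTokenizer.from_pretrained(merged_id)
x = tok("Explain LoRA in simple terms.", return_tensors="pt")

adapter_model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base_id), adapter_dir
)
merged_model = AutoModelForCausalLM.from_pretrained(merged_id)

with torch.no_grad():
    a = adapter_model(**x).logits
    b = merged_model(**x).logits

# Expect a tiny value (fp16 merges round slightly more than fp32 ones).
print(torch.max(torch.abs(a - b)).item())
```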

## 🧰 Technical Skills Demonstrated

| Category | Skills & Concepts |
|---|---|
| Model Engineering | In-depth understanding of transformer internals, LoRA architecture, and PEFT fine-tuning techniques. |
| Python Development | Wrote a custom `merge_lora.py` to automate model consolidation using the PEFT and Transformers APIs. |
| Systems Optimization | Applied 4-bit and 8-bit quantization for efficient training/inference on consumer GPUs. |
| Experiment Design | Planned and executed an end-to-end fine-tuning experiment; validated output quality manually. |
| Model Deployment | Created a single self-contained model ready for inference on Hugging Face and local hardware. |
| Documentation & Reproducibility | Produced structured metadata and README documentation for clarity and collaboration. |
| Self-Learning | Learned Python, PEFT, and LoRA concepts from scratch and implemented them within days. |
## 🧩 Why This Matters

This project is a proof of initiative, adaptability, and technical execution. It demonstrates the ability to:

- Independently research, implement, and validate advanced ML techniques.
- Bridge the gap between research concepts and deployable systems.
- Optimize large models for real-world use cases on constrained hardware.
- Communicate the technical process clearly to both technical and non-technical stakeholders.

## 📬 Contact

**Profile:** [huggingface.co/clarkkitchen22](https://huggingface.co/clarkkitchen22)

Open to collaboration and AI/ML engineering roles.

## ⚠️ Disclaimer

This is an educational and experimental project created on consumer hardware.
Outputs may contain inaccuracies; please verify results for important use cases.


---