---
title: LoRA Model Merger
emoji: πŸ”—
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l4x4
---

# πŸ”— LoRA Model Merger

A Hugging Face Space for merging fine-tuned LoRA adapters with base models.

## Overview

This Space provides an easy-to-use interface for merging LoRA (Low-Rank Adaptation) fine-tuned models with their base models. It is specifically designed for:

- **Base Model**: moonshotai/Kimi-Linear-48B-A3B-Instruct
- **LoRA Adapters**: Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned
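
For context, LoRA represents the fine-tuning update to each adapted weight matrix as a low-rank product, and merging simply folds that update back into the base weights:

$$W' = W + \frac{\alpha}{r} B A$$

where $W$ is a base weight matrix, $B$ and $A$ are the learned low-rank factors, $r$ is the LoRA rank, and $\alpha$ is the LoRA scaling factor.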

## Features

- βœ… **Easy Model Merging** - Simple UI to merge LoRA adapters with the base model
- βœ… **Built-in Testing** - Test your merged model with custom prompts
- βœ… **Hub Integration** - Upload merged models directly to the Hugging Face Hub
- βœ… **GPU Optimized** - Designed for a 4x L40S GPU setup
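
A minimal sketch of the kind of Gradio interface behind these features; the function body and component names here are hypothetical placeholders, not the Space's actual code:

```python
import gradio as gr

def start_merge(hf_token: str) -> str:
    # Placeholder: the real Space runs the merge pipeline described
    # under "Technical Details" and reports progress back to the UI.
    return "Merge complete."

with gr.Blocks(title="LoRA Model Merger") as demo:
    token = gr.Textbox(label="Hugging Face token", type="password")
    merge_btn = gr.Button("Start Merge Process")
    status = gr.Textbox(label="Status", interactive=False)
    merge_btn.click(start_merge, inputs=token, outputs=status)

# app_port in the Space metadata is 7860, so the server binds there
demo.launch(server_name="0.0.0.0", server_port=7860)
```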

## Usage

1. **Merge Models**: Provide your Hugging Face token and click "Start Merge Process"
2. **Test Inference**: Test the merged model with sample prompts
3. **Upload to Hub**: Optionally upload the merged model to your Hugging Face account (see the sketch below)
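
A hypothetical sketch of steps 2 and 3 done programmatically, assuming the merged checkpoint has been saved locally to `merged-model/` (see Technical Details); the token and repository name are placeholders:

```python
from transformers import pipeline
from huggingface_hub import HfApi

# Step 2: quick smoke test of the merged model
pipe = pipeline(
    "text-generation", model="merged-model",
    device_map="auto", trust_remote_code=True,
)
print(pipe("Explain LoRA in one sentence.", max_new_tokens=64)[0]["generated_text"])

# Step 3: upload the merged weights to your own Hub repository
api = HfApi(token="hf_...")  # placeholder: your Hugging Face token
repo_id = "your-username/kimi-linear-48b-merged"  # hypothetical repo name
api.create_repo(repo_id, exist_ok=True)
api.upload_folder(folder_path="merged-model", repo_id=repo_id)
```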

## Requirements

- **Hardware**: 4x NVIDIA L40S GPUs (or equivalent with ~192 GB total VRAM)
- **Software**: Docker, CUDA 12.1+
- **Access**: Valid Hugging Face token for model access

## Technical Details

The merge process (sketched below):

1. Downloads the base model (~48B parameters)
2. Loads the LoRA adapter weights
3. Merges the adapters into the base model using PEFT
4. Saves the unified model for inference
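
A hedged sketch of these four steps using the standard Transformers + PEFT APIs; the exact loading arguments the Space uses (dtype, quantization settings, output path) are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "moonshotai/Kimi-Linear-48B-A3B-Instruct"
ADAPTER = "Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned"

# 1. Download and load the base model, sharded across available GPUs
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # assumption: the Kimi architecture ships custom code
)
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)

# 2. Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base, ADAPTER)

# 3. Fold the low-rank updates into the base weights: W' = W + (alpha/r) * B @ A
model = model.merge_and_unload()

# 4. Save the unified model for inference
model.save_pretrained("merged-model", safe_serialization=True)
tokenizer.save_pretrained("merged-model")
```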

## Notes

- The merge process can take 10-30 minutes depending on network speed
- The merged model will be approximately the same size as the base model, since the low-rank updates are folded into existing weight matrices rather than stored alongside them
- Ensure you have appropriate access rights to both the base and LoRA models

## Support

For issues or questions, open a discussion in this Space's Community tab.

Built with ❀️ using Transformers, PEFT, and Gradio