---
title: LoRA Model Merger
emoji: πŸ”—
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l4x4
---

# πŸ”— LoRA Model Merger

A Hugging Face Space for merging fine-tuned LoRA adapters with base models.

## Overview

This Space provides an easy-to-use interface for merging LoRA (Low-Rank Adaptation) fine-tuned models with their base models. It is specifically designed for:

- **Base Model**: moonshotai/Kimi-Linear-48B-A3B-Instruct
- **LoRA Adapters**: Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned
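
For context, LoRA represents the fine-tuning update to each adapted weight matrix as a low-rank product, and merging simply folds that update back into the base weights:

$$W' = W + \frac{\alpha}{r} B A$$

where $W$ is a base weight matrix, $B$ and $A$ are the learned low-rank factors, $r$ is the LoRA rank, and $\alpha$ is the LoRA scaling factor.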

## Features

- βœ… **Easy Model Merging** - Simple UI to merge LoRA adapters with the base model
- βœ… **Built-in Testing** - Test your merged model with custom prompts
- βœ… **Hub Integration** - Upload merged models directly to the Hugging Face Hub
- βœ… **GPU Optimized** - Designed for a 4x L40S GPU setup
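
A minimal sketch of the kind of Gradio interface behind these features; the function body and component names here are hypothetical placeholders, not the Space's actual code:

```python
import gradio as gr

def start_merge(hf_token: str) -> str:
    # Placeholder: the real Space runs the merge pipeline described
    # under "Technical Details" and reports progress back to the UI.
    return "Merge complete."

with gr.Blocks(title="LoRA Model Merger") as demo:
    token = gr.Textbox(label="Hugging Face token", type="password")
    merge_btn = gr.Button("Start Merge Process")
    status = gr.Textbox(label="Status", interactive=False)
    merge_btn.click(start_merge, inputs=token, outputs=status)

# app_port in the Space metadata is 7860, so the server binds there
demo.launch(server_name="0.0.0.0", server_port=7860)
```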

## Usage

1. **Merge Models**: Provide your Hugging Face token and click "Start Merge Process"
2. **Test Inference**: Test the merged model with sample prompts
3. **Upload to Hub**: Optionally upload the merged model to your Hugging Face account (see the sketch below)
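
A hypothetical sketch of steps 2 and 3 done programmatically, assuming the merged checkpoint has been saved locally to `merged-model/` (see Technical Details); the token and repository name are placeholders:

```python
from transformers import pipeline
from huggingface_hub import HfApi

# Step 2: quick smoke test of the merged model
pipe = pipeline(
    "text-generation", model="merged-model",
    device_map="auto", trust_remote_code=True,
)
print(pipe("Explain LoRA in one sentence.", max_new_tokens=64)[0]["generated_text"])

# Step 3: upload the merged weights to your own Hub repository
api = HfApi(token="hf_...")  # placeholder: your Hugging Face token
repo_id = "your-username/kimi-linear-48b-merged"  # hypothetical repo name
api.create_repo(repo_id, exist_ok=True)
api.upload_folder(folder_path="merged-model", repo_id=repo_id)
```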

## Requirements

- **Hardware**: 4x NVIDIA L40S GPUs (or equivalent with ~192 GB total VRAM)
- **Software**: Docker, CUDA 12.1+
- **Access**: Valid Hugging Face token for model access

## Technical Details

The merge process (sketched below):

1. Downloads the base model (~48B parameters)
2. Loads the LoRA adapter weights
3. Merges the adapters into the base model using PEFT
4. Saves the unified model for inference
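
A hedged sketch of these four steps using the standard Transformers + PEFT APIs; the exact loading arguments the Space uses (dtype, quantization settings, output path) are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "moonshotai/Kimi-Linear-48B-A3B-Instruct"
ADAPTER = "Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned"

# 1. Download and load the base model, sharded across available GPUs
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # assumption: the Kimi architecture ships custom code
)
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)

# 2. Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base, ADAPTER)

# 3. Fold the low-rank updates into the base weights: W' = W + (alpha/r) * B @ A
model = model.merge_and_unload()

# 4. Save the unified model for inference
model.save_pretrained("merged-model", safe_serialization=True)
tokenizer.save_pretrained("merged-model")
```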

## Notes

- The merge process can take 10-30 minutes depending on network speed
- The merged model will be approximately the same size as the base model, since the low-rank updates are folded into existing weight matrices rather than stored alongside them
- Ensure you have appropriate access rights to both the base and LoRA models

## Support

For issues or questions, open a discussion in this Space's Community tab.

Built with ❀️ using Transformers, PEFT, and Gradio