---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---

# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

Professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.

## Model Information

- **Model:** [optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 billion
- **Fine-tuning Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Architecture:** Mixture-of-Experts (MoE) Transformer

## Features

✨ **Professional Chat Interface**

- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization

⚙️ **Advanced Generation Settings**

- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length

🎮 **Optimized Performance**

- Multi-GPU support (4x L40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96 GB VRAM requirement

## Usage

1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed

## Generation Parameters

### Temperature (0.0 - 2.0)

- **Low (0.1-0.5):** Focused, deterministic responses
- **Medium (0.6-0.9):** Balanced creativity
- **High (1.0-2.0):** More creative and diverse outputs

### Top P (0.0 - 1.0)

- **0.9 (recommended):** Good balance
- Lower values: more focused
- Higher values: more diverse

### Max New Tokens

- Maximum length of the generated response
- **1024 (default):** Good for most use cases
- Increase for longer responses

## Hardware Requirements

- **Recommended:** 4x NVIDIA L40S GPUs (192 GB total VRAM)
- **Minimum:** 4x NVIDIA L4 GPUs (96 GB total VRAM)
- **Memory:** ~96 GB VRAM in bfloat16 precision

## Fine-tuning Details

This model was fine-tuned using QLoRA with the following configuration:

- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj (attention layers only)
- **Dropout:** 0.05

## Support

For issues or questions:

- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)

---

Built with ❤️ using Transformers and Gradio
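## Appendix: How the Sampling Knobs Work

Temperature, Top-K, and Top-P are the standard sampling controls exposed by the Space's generation settings. To make their effect concrete, here is a small, self-contained sketch (illustrative only — not the Space's actual code) of how each knob reshapes a toy probability distribution before a token is drawn:

```python
import math

def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the probability distribution after temperature scaling,
    top-k, and top-p (nucleus) filtering. temperature must be > 0;
    top_k <= 0 disables top-k."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens.
    keep = set(order if top_k <= 0 else order[:top_k])
    # Top-p: keep the smallest prefix of sorted tokens whose
    # cumulative probability reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus
    # Zero out filtered tokens and renormalise the survivors.
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]
```

For example, with logits `[2.0, 1.0, 0.1]`, a low temperature (0.1) concentrates nearly all mass on the top token, while `top_p=0.9` or `top_k=2` removes the least likely token entirely before sampling.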
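## Appendix: Where the ~96 GB Figure Comes From

The memory requirement in the hardware section is just the parameter count times two bytes per bfloat16 weight; activations, KV cache, and framework overhead come on top, which is why 4x L40S (192 GB total) is the comfortable choice:

```python
params = 48e9            # 48 billion parameters
bytes_per_param = 2      # bfloat16 = 16 bits = 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)  # 96.0 GB for the weights alone
```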
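## Appendix: The QLoRA Configuration as Code

The fine-tuning hyperparameters listed above correspond to a PEFT `LoraConfig` along these lines (a sketch reconstructed from the listed values — the actual training script is not part of this Space, and `bias`/`task_type` are assumed defaults):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,         # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
    lora_dropout=0.05,
    bias="none",           # assumed; not stated in the README
    task_type="CAUSAL_LM", # assumed; standard for instruct models
)
```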