---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---
# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned
Professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.
## Model Information
- **Model:** [optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 billion total (~3 billion activated per token, per the "A3B" naming)
- **Fine-tuning Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Architecture:** Mixture of Experts (MoE) Transformer
## Features
✨ **Professional Chat Interface**
- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization
โš™๏ธ **Advanced Generation Settings**
- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length
🎮 **Optimized Performance**
- Multi-GPU support (4xL40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96GB VRAM requirement
## Usage
1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed
## Generation Parameters
### Temperature (0.0 - 2.0)
- **Low (0.1-0.5):** Focused, near-deterministic responses
- **Medium (0.6-0.9):** Balanced creativity
- **High (1.0-2.0):** More creative and diverse outputs
### Top P (0.0 - 1.0)
- **0.9 (recommended):** Good balance
- Lower values: More focused
- Higher values: More diverse
### Max New Tokens
- Maximum length of generated response
- **1024 (default):** Good for most use cases
- Increase for longer responses
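Taken together, these settings map onto the keyword arguments of `model.generate()`. A sketch with illustrative values; the `top_k` and `repetition_penalty` numbers are assumed defaults, not settings documented for this Space:

```python
# Example generate() kwargs combining the parameters described above.
generation_kwargs = dict(
    do_sample=True,          # enable sampling so temperature/top_p/top_k apply
    temperature=0.7,         # "medium" creativity range
    top_p=0.9,               # recommended nucleus-sampling cutoff
    top_k=50,                # assumed value; caps the candidate token pool
    repetition_penalty=1.1,  # assumed mild penalty against repetitive loops
    max_new_tokens=1024,     # default response-length cap
)
```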
## Hardware Requirements
- **Recommended:** 4x NVIDIA L40S GPUs (192GB total VRAM)
- **Minimum:** 4x NVIDIA L4 GPUs (96GB total VRAM)
- **Memory:** ~96GB VRAM in bfloat16 precision
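The ~96GB figure follows directly from the parameter count, since bfloat16 stores each weight in 2 bytes:

```python
# Back-of-the-envelope VRAM estimate for the model weights alone.
total_params = 48e9    # 48 billion parameters
bytes_per_param = 2    # bfloat16 = 16 bits = 2 bytes
weights_gb = total_params * bytes_per_param / 1e9
print(weights_gb)      # 96.0 GB, before KV cache and activations
```

Note this covers weights only; the KV cache and activations need additional headroom on top, which is why 4x L40S (192GB) is the comfortable choice.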
## Fine-tuning Details
This model was fine-tuned using QLoRA with the following configuration:
- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj (attention layers only)
- **Dropout:** 0.05
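Expressed as keyword arguments for `peft.LoraConfig`, the configuration above would look roughly like this. This is a reconstruction from the values listed, not the actual training script; `bias` and `task_type` are assumed common defaults:

```python
# Hypothetical reconstruction of the QLoRA adapter settings listed above,
# shaped as the kwargs one would pass to peft.LoraConfig.
lora_kwargs = dict(
    r=16,                    # LoRA rank
    lora_alpha=32,           # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",             # assumed; common QLoRA default
    task_type="CAUSAL_LM",   # assumed; causal language modeling
)
```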
## Support
For issues or questions:
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
---
Built with โค๏ธ using Transformers and Gradio