---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---

# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

Professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.

## Model Information

- **Model:** [optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 billion
- **Fine-tuning Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Architecture:** Mixture-of-Experts (MoE) Transformer

## Features

✨ **Professional Chat Interface**

- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization

⚙️ **Advanced Generation Settings**

- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length

🎮 **Optimized Performance**

- Multi-GPU support (4x L40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96 GB VRAM requirement

## Usage

1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed

## Generation Parameters

### Temperature (0.0 - 2.0)

- **Low (0.1-0.5):** Focused, deterministic responses
- **Medium (0.6-0.9):** Balanced creativity
- **High (1.0-2.0):** More creative and diverse outputs

### Top P (0.0 - 1.0)

- **0.9 (recommended):** Good balance
- Lower values: more focused
- Higher values: more diverse

### Max New Tokens

- Maximum length of the generated response
- **1024 (default):** Good for most use cases
- Increase for longer responses

## Hardware Requirements

- **Recommended:** 4x NVIDIA L40S GPUs (192 GB total VRAM)
- **Minimum:** 4x NVIDIA L4 GPUs (96 GB total VRAM)
- **Memory:** ~96 GB VRAM in bfloat16 precision

## Fine-tuning Details

This model was fine-tuned using QLoRA with the following configuration:

- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj (attention layers only)
- **Dropout:** 0.05

## Support

For issues or questions:

- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)

---

Built with ❤️ using Transformers and Gradio
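## Appendix: How the Sampling Knobs Work

Temperature, Top-K, and Top-P are the standard sampling controls exposed by the Space's generation settings. To make their effect concrete, here is a small, self-contained sketch (illustrative only — not the Space's actual code) of how each knob reshapes a toy probability distribution before a token is drawn:

```python
import math

def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the probability distribution after temperature scaling,
    top-k, and top-p (nucleus) filtering. temperature must be > 0;
    top_k <= 0 disables top-k."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens.
    keep = set(order if top_k <= 0 else order[:top_k])
    # Top-p: keep the smallest prefix of sorted tokens whose
    # cumulative probability reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus
    # Zero out filtered tokens and renormalise the survivors.
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]
```

For example, with logits `[2.0, 1.0, 0.1]`, a low temperature (0.1) concentrates nearly all mass on the top token, while `top_p=0.9` or `top_k=2` removes the least likely token entirely before sampling.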
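## Appendix: Where the ~96 GB Figure Comes From

The memory requirement in the hardware section is just the parameter count times two bytes per bfloat16 weight; activations, KV cache, and framework overhead come on top, which is why 4x L40S (192 GB total) is the comfortable choice:

```python
params = 48e9            # 48 billion parameters
bytes_per_param = 2      # bfloat16 = 16 bits = 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)  # 96.0 GB for the weights alone
```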
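## Appendix: The QLoRA Configuration as Code

The fine-tuning hyperparameters listed above correspond to a PEFT `LoraConfig` along these lines (a sketch reconstructed from the listed values — the actual training script is not part of this Space, and `bias`/`task_type` are assumed defaults):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,         # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
    lora_dropout=0.05,
    bias="none",           # assumed; not stated in the README
    task_type="CAUSAL_LM", # assumed; standard for instruct models
)
```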