---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---

# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

A professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.

## Model Information

- **Model:** [optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 billion total (~3B active per token, per the "A3B" in the name)
- **Fine-tuning Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Architecture:** Mixture of Experts (MoE) Transformer

## Features

✨ **Professional Chat Interface**
- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization

โš™๏ธ **Advanced Generation Settings**
- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length

🎮 **Optimized Performance**
- Multi-GPU support (4xL40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96GB VRAM requirement

## Usage

1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed
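The same model can also be queried programmatically, outside the chat UI. A minimal sketch with `transformers` (untested here; it assumes the multi-GPU hardware listed below, and that the checkpoint loads with `trust_remote_code=True`, which custom Kimi architectures typically require):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the Space's precision setting
    device_map="auto",           # shard automatically across all visible GPUs
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```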

## Generation Parameters

### Temperature (0.0 - 2.0)
- **Low (0.1-0.5):** Focused, deterministic responses
- **Medium (0.6-0.9):** Balanced creativity
- **High (1.0-2.0):** More creative and diverse outputs

### Top P (0.0 - 1.0)
- **0.9 (recommended):** Good balance
- Lower values: More focused
- Higher values: More diverse

### Max New Tokens
- Maximum length of generated response
- **1024 (default):** Good for most use cases
- Increase for longer responses
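To make these two knobs concrete, here is a small self-contained illustration (not the Space's actual sampling code) of how temperature and top-p interact: temperature rescales the logits before the softmax, and top-p then keeps only the smallest set of tokens whose cumulative probability reaches the threshold.

```python
import math

def sample_filter(logits, temperature=0.7, top_p=0.9):
    """Temperature-scale logits, softmax, then apply nucleus (top-p) filtering.

    Returns the renormalized probabilities of the surviving tokens
    as {token_index: probability}.
    """
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of tokens whose cumulative prob >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    z = sum(probs[i] for i in kept)
    return {i: probs[i] / z for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]
# Low temperature concentrates mass on the top token, so the nucleus shrinks:
print(len(sample_filter(logits, temperature=0.3, top_p=0.9)))  # 1 token survives
# High temperature flattens the distribution, so more tokens survive:
print(len(sample_filter(logits, temperature=1.5, top_p=0.9)))  # 3 tokens survive
```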

## Hardware Requirements

- **Recommended:** 4x NVIDIA L40S GPUs (192GB total VRAM)
- **Minimum:** 4x NVIDIA L4 GPUs (96GB total VRAM)
- **Memory:** ~96GB VRAM in bfloat16 precision
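The ~96GB figure follows directly from the parameter count: bfloat16 stores each weight in 2 bytes, and this covers the weights alone (KV cache and activations add more on top).

```python
params = 48e9        # 48 billion parameters
bytes_per_param = 2  # bfloat16 = 16 bits = 2 bytes

weights_gb = params * bytes_per_param / 1e9
print(weights_gb)  # 96.0 GB for the weights alone
```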

## Fine-tuning Details

This model was fine-tuned using QLoRA with the following configuration:
- **LoRA Rank (r):** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj (attention layers only)
- **Dropout:** 0.05
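The hyperparameters above map one-to-one onto a PEFT `LoraConfig`. A sketch, assuming the `peft` library was used (the README does not name it, and `bias`/`task_type` below are conventional defaults, not stated in the source):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,       # scaling factor (effective scale = alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
    lora_dropout=0.05,
    bias="none",         # assumption: biases left untrained
    task_type="CAUSAL_LM",
)
```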

## Support

For issues or questions:
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)

---

Built with โค๏ธ using Transformers and Gradio