---
title: Kimi 48B Fine-tuned - Evaluation
emoji: 📊
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l4x4
---

# 📊 Kimi Linear 48B A3B Instruct - Evaluation

Model evaluation Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.

**Chat/inference functionality is currently disabled** - this Space focuses on running benchmarks and evaluations only.

## Model Information

- **Model:** [optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 billion
- **Fine-tuning:** QLoRA on attention layers
- **Evaluation Framework:** LM Evaluation Harness

## Features

📊 **Model Evaluation**

- LM Evaluation Harness integration
- Multiple benchmark support (ARC-Challenge, TruthfulQA, Winogrande)
- Automated testing and reporting
- Results saved for analysis

⚡ **High Performance**

- Multi-GPU model loading
- Optimized memory distribution
- bfloat16 precision
- Supports 48B-parameter models

⚙️ **Easy to Use**

- Simple Gradio interface
- One-click model loading
- Benchmark selection via checkboxes
- Real-time progress updates

## Usage

### Quick Start

**Option 1: Direct Evaluation (Recommended)**

1. Go directly to the "📊 Evaluation" tab
2. Select the benchmarks to run (ARC-Challenge, TruthfulQA, Winogrande)
3. Click "🚀 Start Evaluation"
4. lm_eval will automatically load and evaluate the model
5. Wait 30-60 minutes for results
6. Results will be displayed and saved to `/tmp/eval_results_[timestamp]/`

**Option 2: With Model Verification**

1. **(Optional)** Click "🚀 Load Model" in the Controls tab to verify the setup (5-10 min)
2. Go to the "📊 Evaluation" tab
3. Select benchmarks and click "🚀 Start Evaluation"
4. The pre-loaded model will be automatically unloaded to free VRAM
5. lm_eval will load its own fresh instance for evaluation
6. Wait 30-60 minutes for results

**View Results**

- Evaluation results include metrics for each benchmark
- Results are automatically formatted and displayed
- Full results JSON files are saved for detailed analysis

## Why LM Evaluation Harness?

The LM Evaluation Harness is a standard framework for evaluating language models:

- **Standardized:** Consistent benchmarks across models
- **Comprehensive:** Wide variety of tasks and metrics
- **Reproducible:** Deterministic evaluation results
- **Trusted:** Used by major research organizations

## Hardware Requirements

- **Recommended:** 4x NVIDIA L40S (192GB VRAM)
- **Minimum:** 4x NVIDIA L4 (96GB VRAM)
- **Model Size:** ~96GB in bfloat16

### Memory Management

This Space is optimized for limited VRAM (92GB across 4x L4):

- **Direct Evaluation:** Skip model pre-loading and go straight to evaluation (recommended)
- **Automatic Cleanup:** Any pre-loaded model is unloaded before evaluation starts
- **Aggressive Memory Clearing:** Multiple garbage-collection passes plus a 5s wait
- **Single Instance:** Only lm_eval's model instance runs during evaluation
- **Batch Size:** Set to 1 to minimize memory usage during evaluation
- **Device Mapping:** Automatic distribution across available GPUs
- **Memory Fragmentation:** `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` set by default

## Technical Details

### Fine-tuning Configuration

- **Method:** QLoRA
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Training:** Attention layers only

### Benchmark Details

**ARC-Challenge**

- AI2 Reasoning Challenge
- 1,172 multiple-choice science questions
- Tests complex reasoning and knowledge
- Metrics: accuracy, accuracy_norm

**TruthfulQA**

- Tests the model's truthfulness
- Multiple-choice format (mc2)
- Evaluates factual correctness
- Metrics: accuracy, bleu, rouge

**Winogrande**

- Common-sense reasoning
- Pronoun resolution tasks
- 1,267 test questions
- Metrics: accuracy

## Support & Resources

- [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- [Base Model Page](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- [Transformers Documentation](https://huggingface.co/docs/transformers)

---

**Powered by LM Evaluation Harness** 📊 | Built with ❤️
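
## Appendix: Equivalent CLI Invocation

For reference, an evaluation like the Quick Start flow above can also be approximated from the command line with the Harness's `lm_eval` CLI. This is a minimal sketch, not the Space's actual code: `arc_challenge`, `truthfulqa_mc2`, and `winogrande` are the Harness's standard task identifiers for the three benchmarks, `parallelize=True` stands in for the Space's automatic multi-GPU device mapping, and the output directory is a hypothetical example.

```python
import os

# Mitigate CUDA memory fragmentation, as described under "Memory Management"
# (must be set before torch initializes CUDA).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

MODEL_ID = "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune"
# Standard lm_eval task names for the three benchmarks above.
TASKS = ["arc_challenge", "truthfulqa_mc2", "winogrande"]

def build_lm_eval_command(model_id, tasks, output_dir):
    """Assemble an `lm_eval` CLI call mirroring the Space's settings:
    bfloat16 weights, multi-GPU sharding, batch size 1."""
    model_args = ",".join([
        f"pretrained={model_id}",
        "dtype=bfloat16",
        "parallelize=True",        # shard the model across all visible GPUs
        "trust_remote_code=True",  # Kimi-Linear uses custom modeling code
    ])
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", "1",
        "--output_path", output_dir,
    ]

cmd = build_lm_eval_command(MODEL_ID, TASKS, "/tmp/eval_results_demo")
print(" ".join(cmd))  # inspect, then hand the list to subprocess.run(cmd)
```

Building the command as a list (rather than a shell string) keeps it safe to pass directly to `subprocess.run`. Expect the run itself to take 30-60 minutes on the hardware described above.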