Commit 69cd0c5 · aeb56 committed · 1 Parent(s): 3e60f36

Disable chat/inference, focus on evaluation only

Files changed (2):
  1. README.md +61 -66
  2. app.py +17 -6
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- title: Kimi 48B Fine-tuned - Inference
- emoji: 🚀
  colorFrom: purple
  colorTo: blue
  sdk: docker
@@ -10,9 +10,9 @@ app_port: 7860
  suggested_hardware: l4x4
  ---

- # 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

- High-performance inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model, powered by **vLLM**.

  ## Model Information

@@ -20,53 +20,57 @@ High-performance inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct
  - **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - **Parameters:** 48 Billion
  - **Fine-tuning:** QLoRA on attention layers
- - **Inference Engine:** vLLM

  ## Features

- ⚡ **High-Performance Inference**
- - Powered by vLLM for maximum throughput
- - Optimized memory usage with PagedAttention
- - Multi-GPU support (automatic)

- 💬 **Professional Chat Interface**
- - Clean Gradio UI
- - Real-time responses
- - Chat history
- - Copy button for responses

- ⚙️ **Configurable Generation**
- - Temperature control
- - Top-P sampling
- - Max tokens setting
- - System prompt support

  ## Usage

  ### Quick Start

- 1. **Start vLLM Server**
- - Click "🚀 Start vLLM Server" button
- - Wait 2-5 minutes for initialization
- - Look for "✅ Server started successfully"

- 2. **Chat**
- - Type your message
- - Click "Send" or press Enter
- - Get fast, high-quality responses

- 3. **Customize**
- - Set a system prompt (optional)
- - Adjust temperature for creativity
- - Modify max tokens for response length

- ## Why vLLM?

- vLLM is a high-throughput and memory-efficient inference engine:
- - **Faster:** Optimized CUDA kernels
- - **Efficient:** PagedAttention for KV cache
- - **Scalable:** Multi-GPU support
- - **Compatible:** OpenAI API format

  ## Hardware Requirements

@@ -83,42 +87,33 @@ vLLM is a high-throughput and memory-efficient inference engine:
  - **Target Modules:** q_proj, k_proj, v_proj, o_proj
  - **Training:** Attention layers only

- ### Generation Parameters

- **Temperature (0.0-2.0)**
- - 0.1-0.5: Focused, deterministic
- - 0.6-0.9: Balanced (recommended)
- - 1.0-2.0: Creative, diverse

- **Top P (0.0-1.0)**
- - Controls nucleus sampling
- - 0.9 recommended for most use cases

- **Max Tokens**
- - Maximum response length
- - 1024 default, up to 4096

- ## API Access

- vLLM provides OpenAI-compatible API:
-
- ```bash
- curl -X POST "http://localhost:8000/v1/chat/completions" \
-   -H "Content-Type: application/json" \
-   --data '{
-     "model": "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune",
-     "messages": [
-       {"role": "user", "content": "Hello!"}
-     ]
-   }'
- ```
-
- ## Support
-
- - [vLLM Documentation](https://docs.vllm.ai/)
  - [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
  - [Transformers Documentation](https://huggingface.co/docs/transformers)

  ---

- **Powered by vLLM** 🚀 | Built with ❤️
 
  ---
+ title: Kimi 48B Fine-tuned - Evaluation
+ emoji: 📊
  colorFrom: purple
  colorTo: blue
  sdk: docker

  suggested_hardware: l4x4
  ---

+ # 📊 Kimi Linear 48B A3B Instruct - Evaluation

+ Model evaluation Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model. **Chat/inference functionality is currently disabled**; this Space focuses on running benchmarks and evaluations only.

  ## Model Information

  - **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - **Parameters:** 48 Billion
  - **Fine-tuning:** QLoRA on attention layers
+ - **Evaluation Framework:** LM Evaluation Harness

  ## Features

+ 📊 **Model Evaluation**
+ - LM Evaluation Harness integration
+ - Multiple benchmark support (ARC-Challenge, TruthfulQA, Winogrande)
+ - Automated testing and reporting
+ - Results saved for analysis

+ ⚡ **High-Performance**
+ - Multi-GPU model loading
+ - Optimized memory distribution
+ - bfloat16 precision
+ - Supports 48B parameter models

+ ⚙️ **Easy to Use**
+ - Simple Gradio interface
+ - One-click model loading
+ - Select benchmarks via checkboxes
+ - Real-time progress updates
45
  ## Usage

  ### Quick Start

+ 1. **Load Model**
+ - Click the "🚀 Load Model" button in the Controls tab
+ - Wait 5-10 minutes for model initialization
+ - The model will be distributed across available GPUs
+ - Look for "✅ Model loaded successfully"

+ 2. **Run Evaluation**
+ - Go to the "📊 Evaluation" tab
+ - Select the benchmarks to run (ARC-Challenge, TruthfulQA, Winogrande)
+ - Click "🚀 Start Evaluation"
+ - Wait 30-60 minutes for results
+ - Results will be displayed and saved to `/tmp/eval_results_[timestamp]/`

+ 3. **View Results**
+ - Evaluation results include metrics for each benchmark
+ - Results are automatically formatted and displayed
+ - Full results JSON files are saved for detailed analysis
67
+ ## Why LM Evaluation Harness?

+ The LM Evaluation Harness is a standard framework for evaluating language models:
+ - **Standardized:** Consistent benchmarks across models
+ - **Comprehensive:** Wide variety of tasks and metrics
+ - **Reproducible:** Deterministic evaluation results
+ - **Trusted:** Used by major research organizations

  ## Hardware Requirements

  - **Target Modules:** q_proj, k_proj, v_proj, o_proj
  - **Training:** Attention layers only
+ ### Benchmark Details

+ **ARC-Challenge**
+ - AI2 Reasoning Challenge
+ - 1,172 multiple-choice science questions
+ - Tests complex reasoning and knowledge
+ - Metrics: accuracy, accuracy_norm

+ **TruthfulQA**
+ - Tests model's truthfulness
+ - Multiple-choice format (mc2)
+ - Evaluates factual correctness
+ - Metrics: accuracy (mc2 score)

+ **Winogrande**
+ - Common sense reasoning
+ - Pronoun resolution tasks
+ - 1,267 evaluation questions
+ - Metrics: accuracy
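The benchmark names above map to LM Evaluation Harness task ids roughly as follows. The checkbox-to-task translation is illustrative, not the Space's actual code:

```python
# Mapping from the benchmark names above to LM Evaluation Harness task ids.
TASK_IDS = {
    "ARC-Challenge": "arc_challenge",
    "TruthfulQA": "truthfulqa_mc2",   # multiple-choice (mc2) variant
    "Winogrande": "winogrande",
}

checked = ["ARC-Challenge", "Winogrande"]         # e.g. boxes ticked in the UI
tasks = [TASK_IDS[name] for name in checked]
```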
+ ## Support & Resources

+ - [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
  - [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
+ - [Base Model Page](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - [Transformers Documentation](https://huggingface.co/docs/transformers)

  ---

+ **Powered by LM Evaluation Harness** 📊 | Built with ❤️
app.py CHANGED
@@ -65,7 +65,7 @@ class ChatBot:
          else:
              device_info = ""

-         yield f"✅ **Model loaded successfully!**{device_info}\n\nYou can now use Chat or Evaluation tabs."

      except Exception as e:
          self.loaded = False
@@ -220,11 +220,13 @@ class ChatBot:
  bot = ChatBot()

  # UI with Tabs
- with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      gr.Markdown("""
-     # 🚀 Kimi Linear 48B A3B - Fine-tuned

      **Model:** `optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune`
      """)

      # Show GPU info
@@ -244,11 +246,14 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
          gr.Markdown("""
          ### ℹ️ Instructions
          1. **Click "Load Model"** - Takes 5-10 minutes
-         2. **Use Chat tab** - For conversations
-         3. **Use Evaluation tab** - To run benchmarks
          """)

-     # Tab 2: Chat
      with gr.Tab("💬 Chat"):
          with gr.Row():
              with gr.Column(scale=1):
@@ -272,6 +277,7 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
              send = gr.Button("Send", variant="primary", scale=1)

          clear = gr.Button("Clear Chat")

      # Tab 3: Evaluation
      with gr.Tab("📊 Evaluation"):
@@ -319,6 +325,9 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      # Events
      load_btn.click(bot.load_model, outputs=status)

      def respond(message, history, system, max_tok, temp, top):
          bot_message = bot.chat(message, history, system, max_tok, temp, top)
          history.append((message, bot_message))
@@ -327,7 +336,9 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      msg.submit(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      send.click(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      clear.click(lambda: None, None, chatbot)

      eval_btn.click(bot.run_evaluation, inputs=tasks, outputs=eval_results)

  if __name__ == "__main__":
 
          else:
              device_info = ""

+         yield f"✅ **Model loaded successfully!**{device_info}\n\nYou can now use the Evaluation tab."

      except Exception as e:
          self.loaded = False

  bot = ChatBot()

  # UI with Tabs
+ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned - Evaluation") as demo:
      gr.Markdown("""
+     # 📊 Kimi Linear 48B A3B - Evaluation

      **Model:** `optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune`
+
+     **This Space is configured for model evaluation only. Chat/inference is disabled.**
      """)

      # Show GPU info

          gr.Markdown("""
          ### ℹ️ Instructions
          1. **Click "Load Model"** - Takes 5-10 minutes
+         2. **Use Evaluation tab** - To run benchmarks
+
+         **Note:** Chat/inference functionality is currently disabled. This Space focuses on model evaluation only.
          """)

+     # Tab 2: Chat - DISABLED
+     # Remove the surrounding triple quotes to re-enable chat functionality
+     """
      with gr.Tab("💬 Chat"):
          with gr.Row():
              with gr.Column(scale=1):

              send = gr.Button("Send", variant="primary", scale=1)

          clear = gr.Button("Clear Chat")
+     """

      # Tab 3: Evaluation
      with gr.Tab("📊 Evaluation"):

      # Events
      load_btn.click(bot.load_model, outputs=status)

+     # Chat event handlers - DISABLED
+     # Remove the surrounding triple quotes to re-enable chat functionality
+     """
      def respond(message, history, system, max_tok, temp, top):
          bot_message = bot.chat(message, history, system, max_tok, temp, top)
          history.append((message, bot_message))

      msg.submit(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      send.click(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      clear.click(lambda: None, None, chatbot)
+     """

+     # Evaluation event handler
      eval_btn.click(bot.run_evaluation, inputs=tasks, outputs=eval_results)

  if __name__ == "__main__":
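The body of `bot.run_evaluation`, which `eval_btn.click` wires up above, is not shown in this diff. Below is a minimal sketch of what such a handler might do, with the heavy harness call stubbed out and the timestamped results directory taken from the README; every name beyond `run_evaluation` is an assumption.

```python
import json
import pathlib
import tempfile
import time

def run_evaluation(tasks):
    """Sketch of an evaluation handler: run the harness, save results JSON."""
    # import lm_eval                                   # heavy dependency, stubbed here
    # results = lm_eval.simple_evaluate(model="hf", tasks=tasks)["results"]
    results = {t: {"acc": None} for t in tasks}        # placeholder result shape
    out_dir = pathlib.Path(tempfile.gettempdir()) / f"eval_results_{int(time.time())}"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(results, indent=2))
    return f"✅ Saved results to {out_dir}"
```

Returning a status string fits Gradio's `click(fn, inputs, outputs)` contract, where the return value populates the `eval_results` output component.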