Commit 69cd0c5 · aeb56 committed · 1 Parent(s): 3e60f36

Disable chat/inference, focus on evaluation only

Files changed (2):
  1. README.md +61 -66
  2. app.py +17 -6
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- title: Kimi 48B Fine-tuned - Inference
- emoji: 🚀
  colorFrom: purple
  colorTo: blue
  sdk: docker
@@ -10,9 +10,9 @@ app_port: 7860
  suggested_hardware: l4x4
  ---

- # 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned

- High-performance inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model, powered by **vLLM**.

  ## Model Information

@@ -20,53 +20,57 @@ High-performance inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct
  - **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - **Parameters:** 48 Billion
  - **Fine-tuning:** QLoRA on attention layers
- - **Inference Engine:** vLLM

  ## Features

- ⚡ **High-Performance Inference**
- - Powered by vLLM for maximum throughput
- - Optimized memory usage with PagedAttention
- - Multi-GPU support (automatic)

- 💬 **Professional Chat Interface**
- - Clean Gradio UI
- - Real-time responses
- - Chat history
- - Copy button for responses

- ⚙️ **Configurable Generation**
- - Temperature control
- - Top-P sampling
- - Max tokens setting
- - System prompt support

  ## Usage

  ### Quick Start

- 1. **Start vLLM Server**
- - Click "🚀 Start vLLM Server" button
- - Wait 2-5 minutes for initialization
- - Look for "✅ Server started successfully"

- 2. **Chat**
- - Type your message
- - Click "Send" or press Enter
- - Get fast, high-quality responses

- 3. **Customize**
- - Set a system prompt (optional)
- - Adjust temperature for creativity
- - Modify max tokens for response length

- ## Why vLLM?

- vLLM is a high-throughput and memory-efficient inference engine:
- - **Faster:** Optimized CUDA kernels
- - **Efficient:** PagedAttention for KV cache
- - **Scalable:** Multi-GPU support
- - **Compatible:** OpenAI API format

  ## Hardware Requirements

@@ -83,42 +87,33 @@ vLLM is a high-throughput and memory-efficient inference engine:
  - **Target Modules:** q_proj, k_proj, v_proj, o_proj
  - **Training:** Attention layers only

- ### Generation Parameters

- **Temperature (0.0-2.0)**
- - 0.1-0.5: Focused, deterministic
- - 0.6-0.9: Balanced (recommended)
- - 1.0-2.0: Creative, diverse

- **Top P (0.0-1.0)**
- - Controls nucleus sampling
- - 0.9 recommended for most use cases

- **Max Tokens**
- - Maximum response length
- - 1024 default, up to 4096

- ## API Access

- vLLM provides OpenAI-compatible API:
-
- ```bash
- curl -X POST "http://localhost:8000/v1/chat/completions" \
-   -H "Content-Type: application/json" \
-   --data '{
-     "model": "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune",
-     "messages": [
-       {"role": "user", "content": "Hello!"}
-     ]
-   }'
- ```
-
- ## Support
-
- - [vLLM Documentation](https://docs.vllm.ai/)
  - [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
  - [Transformers Documentation](https://huggingface.co/docs/transformers)

  ---

- **Powered by vLLM** 🚀 | Built with ❤️
 
  ---
+ title: Kimi 48B Fine-tuned - Evaluation
+ emoji: 📊
  colorFrom: purple
  colorTo: blue
  sdk: docker

  suggested_hardware: l4x4
  ---

+ # 📊 Kimi Linear 48B A3B Instruct - Evaluation

+ Model evaluation Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model. **Chat/inference functionality is currently disabled**; this Space focuses on running benchmarks and evaluations only.

  ## Model Information

  - **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - **Parameters:** 48 Billion
  - **Fine-tuning:** QLoRA on attention layers
+ - **Evaluation Framework:** LM Evaluation Harness

  ## Features

+ 📊 **Model Evaluation**
+ - LM Evaluation Harness integration
+ - Multiple benchmark support (ARC-Challenge, TruthfulQA, Winogrande)
+ - Automated testing and reporting
+ - Results saved for analysis

+ ⚡ **High-Performance**
+ - Multi-GPU model loading
+ - Optimized memory distribution
+ - bfloat16 precision
+ - Supports 48B parameter models

+ ⚙️ **Easy to Use**
+ - Simple Gradio interface
+ - One-click model loading
+ - Select benchmarks via checkboxes
+ - Real-time progress updates
45
  ## Usage

  ### Quick Start

+ 1. **Load Model**
+ - Click the "🚀 Load Model" button in the Controls tab
+ - Wait 5-10 minutes for model initialization
+ - The model will be distributed across available GPUs
+ - Look for "✅ Model loaded successfully"

+ 2. **Run Evaluation**
+ - Go to the "📊 Evaluation" tab
+ - Select the benchmarks to run (ARC-Challenge, TruthfulQA, Winogrande)
+ - Click "🚀 Start Evaluation"
+ - Wait 30-60 minutes for results
+ - Results will be displayed and saved to `/tmp/eval_results_[timestamp]/`

+ 3. **View Results**
+ - Evaluation results include metrics for each benchmark
+ - Results are automatically formatted and displayed
+ - Full results JSON files are saved for detailed analysis
67
+ ## Why LM Evaluation Harness?

+ The LM Evaluation Harness is a standard framework for evaluating language models:
+ - **Standardized:** Consistent benchmarks across models
+ - **Comprehensive:** Wide variety of tasks and metrics
+ - **Reproducible:** Deterministic evaluation results
+ - **Trusted:** Used by major research organizations

  ## Hardware Requirements

  - **Target Modules:** q_proj, k_proj, v_proj, o_proj
  - **Training:** Attention layers only
+ ### Benchmark Details

+ **ARC-Challenge**
+ - AI2 Reasoning Challenge
+ - 1,172 multiple-choice science questions
+ - Tests complex reasoning and knowledge
+ - Metrics: accuracy, accuracy_norm

+ **TruthfulQA**
+ - Tests model's truthfulness
+ - Multiple-choice format (mc2)
+ - Evaluates factual correctness
+ - Metrics: accuracy (mc2 score)

+ **Winogrande**
+ - Common sense reasoning
+ - Pronoun resolution tasks
+ - 1,267 evaluation questions
+ - Metrics: accuracy
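The benchmark names above map to LM Evaluation Harness task ids roughly as follows. The checkbox-to-task translation is illustrative, not the Space's actual code:

```python
# Mapping from the benchmark names above to LM Evaluation Harness task ids.
TASK_IDS = {
    "ARC-Challenge": "arc_challenge",
    "TruthfulQA": "truthfulqa_mc2",   # multiple-choice (mc2) variant
    "Winogrande": "winogrande",
}

checked = ["ARC-Challenge", "Winogrande"]         # e.g. boxes ticked in the UI
tasks = [TASK_IDS[name] for name in checked]
```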
+ ## Support & Resources

+ - [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
  - [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
+ - [Base Model Page](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
  - [Transformers Documentation](https://huggingface.co/docs/transformers)

  ---

+ **Powered by LM Evaluation Harness** 📊 | Built with ❤️
app.py CHANGED
@@ -65,7 +65,7 @@ class ChatBot:
          else:
              device_info = ""

-         yield f"✅ **Model loaded successfully!**{device_info}\n\nYou can now use Chat or Evaluation tabs."

      except Exception as e:
          self.loaded = False
@@ -220,11 +220,13 @@ class ChatBot:
  bot = ChatBot()

  # UI with Tabs
- with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      gr.Markdown("""
-     # 🚀 Kimi Linear 48B A3B - Fine-tuned

      **Model:** `optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune`
      """)

      # Show GPU info
@@ -244,11 +246,14 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
          gr.Markdown("""
          ### ℹ️ Instructions
          1. **Click "Load Model"** - Takes 5-10 minutes
-         2. **Use Chat tab** - For conversations
-         3. **Use Evaluation tab** - To run benchmarks
          """)

-     # Tab 2: Chat
      with gr.Tab("💬 Chat"):
          with gr.Row():
              with gr.Column(scale=1):
@@ -272,6 +277,7 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
              send = gr.Button("Send", variant="primary", scale=1)

          clear = gr.Button("Clear Chat")

      # Tab 3: Evaluation
      with gr.Tab("📊 Evaluation"):
@@ -319,6 +325,9 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      # Events
      load_btn.click(bot.load_model, outputs=status)

      def respond(message, history, system, max_tok, temp, top):
          bot_message = bot.chat(message, history, system, max_tok, temp, top)
          history.append((message, bot_message))
@@ -327,7 +336,9 @@ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned") as demo:
      msg.submit(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      send.click(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      clear.click(lambda: None, None, chatbot)

      eval_btn.click(bot.run_evaluation, inputs=tasks, outputs=eval_results)

  if __name__ == "__main__":
 
          else:
              device_info = ""

+         yield f"✅ **Model loaded successfully!**{device_info}\n\nYou can now use the Evaluation tab."

      except Exception as e:
          self.loaded = False

  bot = ChatBot()

  # UI with Tabs
+ with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned - Evaluation") as demo:
      gr.Markdown("""
+     # 📊 Kimi Linear 48B A3B - Evaluation

      **Model:** `optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune`
+
+     **This Space is configured for model evaluation only. Chat/inference is disabled.**
      """)

      # Show GPU info

          gr.Markdown("""
          ### ℹ️ Instructions
          1. **Click "Load Model"** - Takes 5-10 minutes
+         2. **Use Evaluation tab** - To run benchmarks
+
+         **Note:** Chat/inference functionality is currently disabled. This Space focuses on model evaluation only.
          """)

+     # Tab 2: Chat - DISABLED
+     # Remove the surrounding triple quotes to re-enable chat functionality
+     """
      with gr.Tab("💬 Chat"):
          with gr.Row():
              with gr.Column(scale=1):

              send = gr.Button("Send", variant="primary", scale=1)

          clear = gr.Button("Clear Chat")
+     """

      # Tab 3: Evaluation
      with gr.Tab("📊 Evaluation"):

      # Events
      load_btn.click(bot.load_model, outputs=status)

+     # Chat event handlers - DISABLED
+     # Remove the surrounding triple quotes to re-enable chat functionality
+     """
      def respond(message, history, system, max_tok, temp, top):
          bot_message = bot.chat(message, history, system, max_tok, temp, top)
          history.append((message, bot_message))

      msg.submit(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      send.click(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
      clear.click(lambda: None, None, chatbot)
+     """

+     # Evaluation event handler
      eval_btn.click(bot.run_evaluation, inputs=tasks, outputs=eval_results)

  if __name__ == "__main__":
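The body of `bot.run_evaluation`, which `eval_btn.click` wires up above, is not shown in this diff. Below is a minimal sketch of what such a handler might do, with the heavy harness call stubbed out and the timestamped results directory taken from the README; every name beyond `run_evaluation` is an assumption.

```python
import json
import pathlib
import tempfile
import time

def run_evaluation(tasks):
    """Sketch of an evaluation handler: run the harness, save results JSON."""
    # import lm_eval                                   # heavy dependency, stubbed here
    # results = lm_eval.simple_evaluate(model="hf", tasks=tasks)["results"]
    results = {t: {"acc": None} for t in tasks}        # placeholder result shape
    out_dir = pathlib.Path(tempfile.gettempdir()) / f"eval_results_{int(time.time())}"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(results, indent=2))
    return f"✅ Saved results to {out_dir}"
```

Returning a status string fits Gradio's `click(fn, inputs, outputs)` contract, where the return value populates the `eval_results` output component.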