Commit b7c73ba (verified), committed by ASTERIZER
1 parent: 6d3cf92

Upload Base/Datasets/rag_mcp_sft/BUILD_REPORT.md with huggingface_hub

Base/Datasets/rag_mcp_sft/BUILD_REPORT.md ADDED
@@ -0,0 +1,42 @@
+ # RAG + MCP SFT Build Report
+
+ - Retrieved on: 2026-04-03
+ - Target tokens: 10,000,000
+ - Realized tokens: 10,000,168
+ - Train samples: 60,647
+ - Val samples: 1,237
+ - Total samples: 61,884
+ - Average formatted tokens per sample: 161.6
+ - Max window enforced: 1024 tokens
+
+ ## Breakdown by kind
+
+ - checklist: 9,050
+ - clarification: 6,162
+ - comparison: 9,368
+ - description: 12,424
+ - qna: 15,639
+ - scenario: 9,241
+
+ ## Breakdown by topic
+
+ - Bridge: 6,252
+ - Bridge+Bridge: 96
+ - Bridge+MCP: 592
+ - Bridge+RAG: 447
+ - MCP: 25,242
+ - MCP+Bridge: 561
+ - MCP+MCP: 2,087
+ - MCP+RAG: 1,824
+ - RAG: 21,022
+ - RAG+Bridge: 467
+ - RAG+MCP: 1,887
+ - RAG+RAG: 1,407
+
+ ## Files
+
+ - train.json
+ - val.json
+ - all.json
+ - sample_preview.json
+ - source_manifest.json
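
The figures in the report can be cross-checked against each other: the train and val counts should add up to the total, each breakdown should sum to the same total, and the average should follow from realized tokens divided by total samples. A minimal sketch of that check, with the counts copied by hand from the report; the variable and function names are illustrative and are not part of the dataset's build tooling.

```python
# Counts copied by hand from the build report above.
# All names here are hypothetical; the report does not define any code.
realized_tokens = 10_000_168
train_samples = 60_647
val_samples = 1_237
total_samples = 61_884

by_kind = {
    "checklist": 9_050,
    "clarification": 6_162,
    "comparison": 9_368,
    "description": 12_424,
    "qna": 15_639,
    "scenario": 9_241,
}

by_topic = {
    "Bridge": 6_252,
    "Bridge+Bridge": 96,
    "Bridge+MCP": 592,
    "Bridge+RAG": 447,
    "MCP": 25_242,
    "MCP+Bridge": 561,
    "MCP+MCP": 2_087,
    "MCP+RAG": 1_824,
    "RAG": 21_022,
    "RAG+Bridge": 467,
    "RAG+MCP": 1_887,
    "RAG+RAG": 1_407,
}


def check_report() -> float:
    """Verify the report's totals and return the average tokens per sample."""
    # The two splits should partition the full sample set.
    assert train_samples + val_samples == total_samples
    # Each breakdown should account for every sample exactly once.
    assert sum(by_kind.values()) == total_samples
    assert sum(by_topic.values()) == total_samples
    # Average formatted tokens per sample, rounded as in the report.
    return round(realized_tokens / total_samples, 1)


avg_tokens = check_report()
val_share = val_samples / total_samples

print(f"average tokens per sample: {avg_tokens}")
print(f"validation share: {val_share:.2%}")
```

All three sums match the reported total of 61,884, and the computed average agrees with the reported 161.6. The validation split works out to roughly 2% of all samples.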