Nekochu committed · Commit c17ef80 · verified · 1 Parent(s): 0248d77

validate using anthropic skill-creator

mcp-builder/LICENSE.txt ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
mcp-builder/SKILL.md ADDED
@@ -0,0 +1,236 @@
1
+ ---
2
+ name: mcp-builder
3
+ description: Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
4
+ license: Complete terms in LICENSE.txt
5
+ ---
6
+
7
+ # MCP Server Development Guide
8
+
9
+ ## Overview
10
+
11
+ Create MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks.
12
+
13
+ ---
14
+
15
+ # Process
16
+
17
+ ## 🚀 High-Level Workflow
18
+
19
+ Creating a high-quality MCP server involves four main phases:
20
+
21
+ ### Phase 1: Deep Research and Planning
22
+
23
+ #### 1.1 Understand Modern MCP Design
24
+
25
+ **API Coverage vs. Workflow Tools:**
26
+ Balance comprehensive API endpoint coverage with specialized workflow tools. Workflow tools can be more convenient for specific tasks, while comprehensive coverage gives agents flexibility to compose operations. Performance varies by client—some clients benefit from code execution that combines basic tools, while others work better with higher-level workflows. When uncertain, prioritize comprehensive API coverage.
27
+
28
+ **Tool Naming and Discoverability:**
29
+ Clear, descriptive tool names help agents find the right tools quickly. Use consistent prefixes (e.g., `github_create_issue`, `github_list_repos`) and action-oriented naming.
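The naming advice above can be checked mechanically. The following is a minimal sketch, assuming a convention of lowercase snake_case names that share a service prefix; the pattern and the helper name `find_badly_named_tools` are illustrative assumptions, not part of any MCP SDK.

```python
import re

# Assumed convention for illustration: <service>_<action>_<object>, lowercase snake_case.
TOOL_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def find_badly_named_tools(tool_names, prefix):
    """Return names that break the snake_case pattern or lack the shared prefix."""
    return [
        name
        for name in tool_names
        if not TOOL_NAME_PATTERN.match(name) or not name.startswith(prefix + "_")
    ]


if __name__ == "__main__":
    names = ["github_create_issue", "github_list_repos", "createIssue"]
    print(find_badly_named_tools(names, "github"))
```

A check like this could run in a unit test so that a newly added tool with an inconsistent name is caught before agents ever see it.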
30
+
31
+ **Context Management:**
32
+ Agents benefit from concise tool descriptions and the ability to filter/paginate results. Design tools that return focused, relevant data. Some clients support code execution which can help agents filter and process data efficiently.
33
+
34
+ **Actionable Error Messages:**
35
+ Error messages should guide agents toward solutions with specific suggestions and next steps.
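As a rough illustration of an actionable error message, the sketch below builds a string that states the problem, suggests a fix, and optionally names a tool to call next. The helper `actionable_error` and the message layout are assumptions made for this example; the guide does not prescribe a specific format.

```python
def actionable_error(problem, suggestion, next_tool=None):
    """Build an error string that states the problem and a concrete next step."""
    message = f"Error: {problem}. Suggestion: {suggestion}."
    if next_tool:
        # Pointing at a specific tool gives the agent an obvious recovery path.
        message += f" Try calling `{next_tool}` first."
    return message


if __name__ == "__main__":
    print(
        actionable_error(
            "repository 'acme/widgets' not found",
            "check the owner and repository name",
            next_tool="github_list_repos",
        )
    )
```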
36
+
37
+ #### 1.2 Study MCP Protocol Documentation
38
+
39
+ **Navigate the MCP specification:**
40
+
41
+ Start with the sitemap to find relevant pages: `https://modelcontextprotocol.io/sitemap.xml`
42
+
43
+ Then fetch specific pages with `.md` suffix for markdown format (e.g., `https://modelcontextprotocol.io/specification/draft.md`).
44
+
45
+ Key pages to review:
46
+ - Specification overview and architecture
47
+ - Transport mechanisms (streamable HTTP, stdio)
48
+ - Tool, resource, and prompt definitions
49
+
50
+ #### 1.3 Study Framework Documentation
51
+
52
+ **Recommended stack:**
53
+ - **Language**: TypeScript (high-quality SDK support and good compatibility across many execution environments, e.g., MCPB; AI models also generate TypeScript well, thanks to its broad usage, static typing, and good linting tools)
54
+ - **Transport**: Streamable HTTP for remote servers, using stateless JSON (simpler to scale and maintain, as opposed to stateful sessions and streaming responses). stdio for local servers.
55
+
56
+ **Load framework documentation:**
57
+
58
+ - **MCP Best Practices**: [📋 View Best Practices](./reference/mcp_best_practices.md) - Core guidelines
59
+
60
+ **For TypeScript (recommended):**
61
+ - **TypeScript SDK**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
62
+ - [⚡ TypeScript Guide](./reference/node_mcp_server.md) - TypeScript patterns and examples
63
+
64
+ **For Python:**
65
+ - **Python SDK**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
66
+ - [🐍 Python Guide](./reference/python_mcp_server.md) - Python patterns and examples
67
+
68
+ #### 1.4 Plan Your Implementation
69
+
70
+ **Understand the API:**
71
+ Review the service's API documentation to identify key endpoints, authentication requirements, and data models. Use web search and WebFetch as needed.
72
+
73
+ **Tool Selection:**
74
+ Prioritize comprehensive API coverage. List endpoints to implement, starting with the most common operations.
75
+
76
+ ---
77
+
78
+ ### Phase 2: Implementation
79
+
80
+ #### 2.1 Set Up Project Structure
81
+
82
+ See language-specific guides for project setup:
83
+ - [⚡ TypeScript Guide](./reference/node_mcp_server.md) - Project structure, package.json, tsconfig.json
84
+ - [🐍 Python Guide](./reference/python_mcp_server.md) - Module organization, dependencies
85
+
86
+ #### 2.2 Implement Core Infrastructure
87
+
88
+ Create shared utilities:
89
+ - API client with authentication
90
+ - Error handling helpers
91
+ - Response formatting (JSON/Markdown)
92
+ - Pagination support
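One of the shared utilities listed above, pagination support, could look roughly like the sketch below. It assumes offset-based pagination over an in-memory list; the field names (`has_more`, `next_offset`, and so on) are illustrative choices, and a real server would usually forward `limit` and `offset` (or a cursor) to the upstream API instead of slicing locally.

```python
def paginate(items, limit=20, offset=0):
    """Return one page of items plus the metadata needed to request the next page."""
    page = items[offset:offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(items)
    return {
        "items": page,
        "total": len(items),
        "count": len(page),
        "offset": offset,
        "has_more": has_more,
        # None signals that there is nothing left to fetch.
        "next_offset": next_offset if has_more else None,
    }


if __name__ == "__main__":
    result = paginate(list(range(25)), limit=10, offset=20)
    print(result["items"], result["has_more"], result["next_offset"])
```

Returning `has_more` and `next_offset` explicitly spares the agent from inferring whether another call is needed.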
93
+
94
+ #### 2.3 Implement Tools
95
+
96
+ For each tool:
97
+
98
+ **Input Schema:**
99
+ - Use Zod (TypeScript) or Pydantic (Python)
100
+ - Include constraints and clear descriptions
101
+ - Add examples in field descriptions
102
+
103
+ **Output Schema:**
104
+ - Define `outputSchema` where possible for structured data
105
+ - Use `structuredContent` in tool responses (TypeScript SDK feature)
106
+ - Helps clients understand and process tool outputs
107
+
108
+ **Tool Description:**
109
+ - Concise summary of functionality
110
+ - Parameter descriptions
111
+ - Return type schema
112
+
113
+ **Implementation:**
114
+ - Async/await for I/O operations
115
+ - Proper error handling with actionable messages
116
+ - Support pagination where applicable
117
+ - Return both text content and structured data when using modern SDKs
118
+
119
+ **Annotations:**
120
+ - `readOnlyHint`: true/false
121
+ - `destructiveHint`: true/false
122
+ - `idempotentHint`: true/false
123
+ - `openWorldHint`: true/false
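To make the four annotation hints concrete, here is a small sketch that writes them as plain dictionaries for two hypothetical tools and flags one contradictory combination. The tool names and the `annotation_conflicts` helper are assumptions for illustration; how annotations are actually passed depends on the SDK in use, so consult its documentation for the real registration call.

```python
# Hypothetical annotations for a read-only listing tool.
LIST_REPOS_ANNOTATIONS = {
    "readOnlyHint": True,
    "destructiveHint": False,
    "idempotentHint": True,
    "openWorldHint": True,
}

# Hypothetical annotations for a tool that deletes data.
DELETE_ISSUE_ANNOTATIONS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": True,
    "openWorldHint": True,
}


def annotation_conflicts(annotations):
    """Return human-readable problems found in a set of annotation hints."""
    problems = []
    # A tool cannot sensibly be both read-only and destructive.
    if annotations.get("readOnlyHint") and annotations.get("destructiveHint"):
        problems.append("readOnlyHint and destructiveHint are both true")
    return problems


if __name__ == "__main__":
    print(annotation_conflicts(LIST_REPOS_ANNOTATIONS))
    print(annotation_conflicts({"readOnlyHint": True, "destructiveHint": True}))
```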
124
+
125
+ ---
126
+
127
+ ### Phase 3: Review and Test
128
+
129
+ #### 3.1 Code Quality
130
+
131
+ Review for:
132
+ - No duplicated code (DRY principle)
133
+ - Consistent error handling
134
+ - Full type coverage
135
+ - Clear tool descriptions
136
+
137
+ #### 3.2 Build and Test
138
+
139
+ **TypeScript:**
140
+ - Run `npm run build` to verify compilation
141
+ - Test with MCP Inspector: `npx @modelcontextprotocol/inspector`
142
+
143
+ **Python:**
144
+ - Verify syntax: `python -m py_compile your_server.py`
145
+ - Test with MCP Inspector
146
+
147
+ See language-specific guides for detailed testing approaches and quality checklists.
148
+
149
+ ---
150
+
151
+ ### Phase 4: Create Evaluations
152
+
153
+ After implementing your MCP server, create comprehensive evaluations to test its effectiveness.
154
+
155
+ **Load [✅ Evaluation Guide](./reference/evaluation.md) for complete evaluation guidelines.**
156
+
157
+ #### 4.1 Understand Evaluation Purpose
158
+
159
+ Use evaluations to test whether LLMs can effectively use your MCP server to answer realistic, complex questions.
160
+
161
+ #### 4.2 Create 10 Evaluation Questions
162
+
163
+ To create effective evaluations, follow the process outlined in the evaluation guide:
164
+
165
+ 1. **Tool Inspection**: List available tools and understand their capabilities
166
+ 2. **Content Exploration**: Use READ-ONLY operations to explore available data
167
+ 3. **Question Generation**: Create 10 complex, realistic questions
168
+ 4. **Answer Verification**: Solve each question yourself to verify answers
169
+
170
+ #### 4.3 Evaluation Requirements
171
+
172
+ Ensure each question is:
173
+ - **Independent**: Not dependent on other questions
174
+ - **Read-only**: Only non-destructive operations required
175
+ - **Complex**: Requiring multiple tool calls and deep exploration
176
+ - **Realistic**: Based on real use cases humans would care about
177
+ - **Verifiable**: Single, clear answer that can be verified by string comparison
178
+ - **Stable**: Answer won't change over time
179
+
180
+ #### 4.4 Output Format
181
+
182
+ Create an XML file with this structure:
183
+
184
+ ```xml
185
+ <evaluation>
186
+ <qa_pair>
187
+ <question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
188
+ <answer>3</answer>
189
+ </qa_pair>
190
+ <!-- More qa_pairs... -->
191
+ </evaluation>
192
+ ```
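A file in this format can be read back with the Python standard library. The sketch below is a minimal example, assuming the `<evaluation>` / `<qa_pair>` / `<question>` / `<answer>` layout shown above; `load_qa_pairs` is a hypothetical helper and not one of the scripts shipped with the skill.

```python
import xml.etree.ElementTree as ET


def load_qa_pairs(xml_text):
    """Parse evaluation XML into a list of (question, answer) tuples."""
    root = ET.fromstring(xml_text)
    pairs = []
    for qa in root.findall("qa_pair"):
        # Surrounding whitespace is stripped so later comparisons are not affected by layout.
        question = (qa.findtext("question") or "").strip()
        answer = (qa.findtext("answer") or "").strip()
        pairs.append((question, answer))
    return pairs


if __name__ == "__main__":
    sample = """
    <evaluation>
      <qa_pair>
        <question>Which release introduced the feature?</question>
        <answer>3</answer>
      </qa_pair>
      <!-- More qa_pairs... -->
    </evaluation>
    """
    for question, answer in load_qa_pairs(sample):
        print(question, "->", answer)
```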
193
+
194
+ ---
195
+
196
+ # Reference Files
197
+
198
+ ## 📚 Documentation Library
199
+
200
+ Load these resources as needed during development:
201
+
202
+ ### Core MCP Documentation (Load First)
203
+ - **MCP Protocol**: Start with sitemap at `https://modelcontextprotocol.io/sitemap.xml`, then fetch specific pages with `.md` suffix
204
+ - [📋 MCP Best Practices](./reference/mcp_best_practices.md) - Universal MCP guidelines including:
205
+ - Server and tool naming conventions
206
+ - Response format guidelines (JSON vs Markdown)
207
+ - Pagination best practices
208
+ - Transport selection (streamable HTTP vs stdio)
209
+ - Security and error handling standards
210
+
211
+ ### SDK Documentation (Load During Phase 1/2)
212
+ - **Python SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
213
+ - **TypeScript SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
214
+
215
+ ### Language-Specific Implementation Guides (Load During Phase 2)
216
+ - [🐍 Python Implementation Guide](./reference/python_mcp_server.md) - Complete Python/FastMCP guide with:
217
+ - Server initialization patterns
218
+ - Pydantic model examples
219
+ - Tool registration with `@mcp.tool`
220
+ - Complete working examples
221
+ - Quality checklist
222
+
223
+ - [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) - Complete TypeScript guide with:
224
+ - Project structure
225
+ - Zod schema patterns
226
+ - Tool registration with `server.registerTool`
227
+ - Complete working examples
228
+ - Quality checklist
229
+
230
+ ### Evaluation Guide (Load During Phase 4)
231
+ - [✅ Evaluation Guide](./reference/evaluation.md) - Complete evaluation creation guide with:
232
+ - Question creation guidelines
233
+ - Answer verification strategies
234
+ - XML format specifications
235
+ - Example questions and answers
236
+ - Running an evaluation with the provided scripts
mcp-builder/reference/evaluation.md ADDED
@@ -0,0 +1,602 @@
1
+ # MCP Server Evaluation Guide
2
+
3
+ ## Overview
4
+
5
+ This document provides guidance on creating comprehensive evaluations for MCP servers. Evaluations test whether LLMs can effectively use your MCP server to answer realistic, complex questions using only the tools provided.
6
+
7
+ ---
8
+
9
+ ## Quick Reference
10
+
11
+ ### Evaluation Requirements
12
+ - Create 10 human-readable questions
13
+ - Questions must be READ-ONLY, INDEPENDENT, NON-DESTRUCTIVE
14
+ - Each question requires multiple tool calls (potentially dozens)
15
+ - Answers must be single, verifiable values
16
+ - Answers must be STABLE (won't change over time)
17
+
18
+ ### Output Format
19
+ ```xml
20
+ <evaluation>
21
+ <qa_pair>
22
+ <question>Your question here</question>
23
+ <answer>Single verifiable answer</answer>
24
+ </qa_pair>
25
+ </evaluation>
26
+ ```
27
+
28
+ ---
29
+
30
+ ## Purpose of Evaluations
31
+
32
+ The measure of quality of an MCP server is NOT how well or comprehensively the server implements tools, but how well these implementations (input/output schemas, docstrings/descriptions, functionality) enable LLMs with no other context and access ONLY to the MCP servers to answer realistic and difficult questions.
33
+
34
+ ## Evaluation Overview
35
+
36
+ Create 10 human-readable questions requiring ONLY READ-ONLY, INDEPENDENT, NON-DESTRUCTIVE, and IDEMPOTENT operations to answer. Each question should be:
37
+ - Realistic
38
+ - Clear and concise
39
+ - Unambiguous
40
+ - Complex, requiring potentially dozens of tool calls or steps
41
+ - Answerable with a single, verifiable value that you identify in advance
42
+
43
+ ## Question Guidelines
44
+
45
+ ### Core Requirements
46
+
47
+ 1. **Questions MUST be independent**
48
+ - Each question should NOT depend on the answer to any other question
49
+ - Should not assume prior write operations from processing another question
50
+
51
+ 2. **Questions MUST require ONLY NON-DESTRUCTIVE AND IDEMPOTENT tool use**
52
+ - Should not instruct or require modifying state to arrive at the correct answer
53
+
54
+ 3. **Questions must be REALISTIC, CLEAR, CONCISE, and COMPLEX**
55
+ - Must require another LLM to use multiple (potentially dozens of) tools or steps to answer
56
+
57
+ ### Complexity and Depth
58
+
59
+ 4. **Questions must require deep exploration**
60
+ - Consider multi-hop questions requiring multiple sub-questions and sequential tool calls
61
+ - Each step should benefit from information found in previous steps
62
+
63
+ 5. **Questions may require extensive paging**
64
+ - May need paging through multiple pages of results
65
+ - May require querying old data (1-2 years out-of-date) to find niche information
66
+ - The questions must be DIFFICULT
67
+
68
+ 6. **Questions must require deep understanding**
69
+ - Rather than surface-level knowledge
70
+ - May pose complex ideas as True/False questions requiring evidence
71
+ - May use multiple-choice format where LLM must search different hypotheses
72
+
73
+ 7. **Questions must not be solvable with straightforward keyword search**
74
+ - Do not include specific keywords from the target content
75
+ - Use synonyms, related concepts, or paraphrases
76
+ - Require multiple searches, analyzing multiple related items, extracting context, then deriving the answer
77
+
78
+ ### Tool Testing
79
+
80
+ 8. **Questions should stress-test tool return values**
81
+ - May elicit tools returning large JSON objects or lists, overwhelming the LLM
82
+ - Should require understanding multiple modalities of data:
83
+ - IDs and names
84
+ - Timestamps and datetimes (months, days, years, seconds)
85
+ - File IDs, names, extensions, and mimetypes
86
+ - URLs, GIDs, etc.
87
+ - Should probe the tool's ability to return all useful forms of data
88
+
89
+ 9. **Questions should MOSTLY reflect real human use cases**
90
+ - The kinds of information retrieval tasks that HUMANS assisted by an LLM would care about
91
+
92
+ 10. **Questions may require dozens of tool calls**
93
+ - This challenges LLMs with limited context
94
+ - Encourages MCP server tools to reduce information returned
95
+
96
+ 11. **Include ambiguous questions**
97
+ - May be ambiguous OR require difficult decisions on which tools to call
98
+ - Force the LLM to potentially make mistakes or misinterpret
99
+ - Ensure that despite AMBIGUITY, there is STILL A SINGLE VERIFIABLE ANSWER
100
+
101
+ ### Stability
102
+
103
+ 12. **Questions must be designed so the answer DOES NOT CHANGE**
104
+ - Do not ask questions that rely on "current state" which is dynamic
105
+ - For example, do not count:
106
+ - Number of reactions to a post
107
+ - Number of replies to a thread
108
+ - Number of members in a channel
109
+
110
+ 13. **DO NOT let the MCP server RESTRICT the kinds of questions you create**
111
+ - Create challenging and complex questions
112
+ - Some may not be solvable with the available MCP server tools
113
+ - Questions may require specific output formats (datetime vs. epoch time, JSON vs. MARKDOWN)
114
+ - Questions may require dozens of tool calls to complete
115
+
116
+ ## Answer Guidelines
117
+
118
+ ### Verification
119
+
120
+ 1. **Answers must be VERIFIABLE via direct string comparison**
121
+ - If the answer can be re-written in many formats, clearly specify the output format in the QUESTION
122
+ - Examples: "Use YYYY/MM/DD.", "Respond True or False.", "Answer A, B, C, or D and nothing else."
123
+ - Answer should be a single VERIFIABLE value such as:
124
+ - User ID, user name, display name, first name, last name
125
+ - Channel ID, channel name
126
+ - Message ID, string
127
+ - URL, title
128
+ - Numerical quantity
129
+ - Timestamp, datetime
130
+ - Boolean (for True/False questions)
131
+ - Email address, phone number
132
+ - File ID, file name, file extension
133
+ - Multiple choice answer
134
+ - Answers must not require special formatting or complex, structured output
135
+ - Answer will be verified using DIRECT STRING COMPARISON
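Direct string comparison is strict, which is why the output format has to be stated in the question. The sketch below shows one possible scorer, assuming answers are keyed by question text and that trimming surrounding whitespace before comparing is acceptable; both are assumptions of this example, and `score_answers` is a hypothetical name rather than part of the provided evaluation scripts.

```python
def score_answers(expected, actual):
    """Return the fraction of questions whose answer matches exactly.

    expected and actual map question text to answer strings. Only surrounding
    whitespace is ignored; case and formatting differences count as mismatches.
    """
    if not expected:
        return 0.0
    correct = sum(
        1
        for question, answer in expected.items()
        if actual.get(question, "").strip() == answer.strip()
    )
    return correct / len(expected)


if __name__ == "__main__":
    expected = {"q1": "3", "q2": "True"}
    actual = {"q1": "3 ", "q2": "true"}  # "true" differs from "True" in case
    print(score_answers(expected, actual))
```

In this example the second answer is counted as wrong purely because of capitalization, which is the failure mode that an instruction such as "Respond True or False." in the question is meant to prevent.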
136
+
137
+ ### Readability
138
+
139
+ 2. **Answers should generally prefer HUMAN-READABLE formats**
140
+ - Examples: names, first name, last name, datetime, file name, message string, URL, yes/no, true/false, a/b/c/d
141
+ - Rather than opaque IDs (though IDs are acceptable)
142
+ - The VAST MAJORITY of answers should be human-readable
143
+
144
+ ### Stability
145
+
146
+ 3. **Answers must be STABLE/STATIONARY**
147
+ - Look at old content (e.g., conversations that have ended, projects that have launched, questions answered)
148
+ - Create QUESTIONS based on "closed" concepts that will always return the same answer
149
+ - Questions may ask to consider a fixed time window to insulate from non-stationary answers
150
+ - Rely on context UNLIKELY to change
151
+ - Example: if finding a paper name, be SPECIFIC enough so answer is not confused with papers published later
152
+
153
+ 4. **Answers must be CLEAR and UNAMBIGUOUS**
154
+ - Questions must be designed so there is a single, clear answer
155
+ - Answer can be derived from using the MCP server tools
156
+
157
+ ### Diversity
158
+
159
+ 5. **Answers must be DIVERSE**
160
+ - Answer should be a single VERIFIABLE value in diverse modalities and formats
161
+ - User concept: user ID, user name, display name, first name, last name, email address, phone number
162
+ - Channel concept: channel ID, channel name, channel topic
163
+ - Message concept: message ID, message string, timestamp, month, day, year
164
+
165
+ 6. **Answers must NOT be complex structures**
166
+ - Not a list of values
167
+ - Not a complex object
168
+ - Not a list of IDs or strings
169
+ - Not natural language text
170
+ - UNLESS the answer can be straightforwardly verified using DIRECT STRING COMPARISON
171
+ - And can be realistically reproduced
172
+ - It should be unlikely that an LLM would return the same list in any other order or format
173
+
174
+ ## Evaluation Process
175
+
176
+ ### Step 1: Documentation Inspection
177
+
178
+ Read the documentation of the target API to understand:
179
+ - Available endpoints and functionality
180
+ - If ambiguity exists, fetch additional information from the web
181
+ - Parallelize this step AS MUCH AS POSSIBLE
182
+ - Ensure each subagent is ONLY examining documentation from the file system or on the web
183
+
184
+ ### Step 2: Tool Inspection
185
+
186
+ List the tools available in the MCP server:
187
+ - Inspect the MCP server directly
188
+ - Understand input/output schemas, docstrings, and descriptions
189
+ - WITHOUT calling the tools themselves at this stage
190
+
191
+ ### Step 3: Developing Understanding
192
+
193
+ Repeat steps 1 & 2 until you have a good understanding:
194
+ - Iterate multiple times
195
+ - Think about the kinds of tasks you want to create
196
+ - Refine your understanding
197
+ - At NO stage should you READ the code of the MCP server implementation itself
198
+ - Use your intuition and understanding to create reasonable, realistic, but VERY challenging tasks
199
+
200
+ ### Step 4: Read-Only Content Inspection
201
+
202
+ After understanding the API and tools, USE the MCP server tools:
203
+ - Inspect content using READ-ONLY and NON-DESTRUCTIVE operations ONLY
204
+ - Goal: identify specific content (e.g., users, channels, messages, projects, tasks) for creating realistic questions
205
+ - Do NOT call any tools that modify state
205
+ - Do NOT read the code of the MCP server implementation itself
207
+ - Parallelize this step with individual sub-agents pursuing independent explorations
208
+ - Ensure each subagent is only performing READ-ONLY, NON-DESTRUCTIVE, and IDEMPOTENT operations
209
+ - BE CAREFUL: SOME TOOLS may return LOTS OF DATA which would cause you to run out of CONTEXT
210
+ - Make INCREMENTAL, SMALL, AND TARGETED tool calls for exploration
211
+ - In all tool call requests, use the `limit` parameter to limit results (<10)
212
+ - Use pagination
213
+
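The "small, targeted, paginated" exploration described in Step 4 might look like the sketch below. Everything here is hypothetical: `fake_list_channels` stands in for a real read-only MCP tool, and the page keys (`items`, `has_more`, `next_offset`) are assumed rather than taken from any particular server.

```python
from typing import Callable, Iterator

Page = dict  # assumed keys: "items", "has_more", "next_offset"


def explore(list_tool: Callable[[int, int], Page], limit: int = 5, max_pages: int = 3) -> Iterator[dict]:
    """Yield items from a few small pages instead of one large request."""
    offset = 0
    for _ in range(max_pages):
        page = list_tool(limit, offset)
        yield from page["items"]
        if not page.get("has_more"):
            break
        offset = page["next_offset"]


def fake_list_channels(limit: int, offset: int) -> Page:
    """Hypothetical stand-in for a read-only listing tool."""
    data = [{"name": f"channel-{i}"} for i in range(12)]
    chunk = data[offset:offset + limit]
    end = offset + len(chunk)
    return {"items": chunk, "has_more": end < len(data), "next_offset": end}


names = [c["name"] for c in explore(fake_list_channels, limit=5, max_pages=2)]
print(len(names))  # 10
```

Capping both `limit` and `max_pages` bounds how much tool output can land in context during a single exploration.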
214
+ ### Step 5: Task Generation
215
+
216
+ After inspecting the content, create 10 human-readable questions:
217
+ - An LLM should be able to answer these with the MCP server
218
+ - Follow all question and answer guidelines above
219
+
220
+ ## Output Format
221
+
222
+ Each QA pair consists of a question and an answer. The output should be an XML file with this structure:
223
+
224
+ ```xml
225
+ <evaluation>
226
+ <qa_pair>
227
+ <question>Find the project created in Q2 2024 with the highest number of completed tasks. What is the project name?</question>
228
+ <answer>Website Redesign</answer>
229
+ </qa_pair>
230
+ <qa_pair>
231
+ <question>Search for issues labeled as "bug" that were closed in March 2024. Which user closed the most issues? Provide their username.</question>
232
+ <answer>sarah_dev</answer>
233
+ </qa_pair>
234
+ <qa_pair>
235
+ <question>Look for pull requests that modified files in the /api directory and were merged between January 1 and January 31, 2024. How many different contributors worked on these PRs?</question>
236
+ <answer>7</answer>
237
+ </qa_pair>
238
+ <qa_pair>
239
+ <question>Find the repository with the most stars that was created before 2023. What is the repository name?</question>
240
+ <answer>data-pipeline</answer>
241
+ </qa_pair>
242
+ </evaluation>
243
+ ```
244
+
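As a quick sanity check of a file in this format, the QA pairs can be read back with the Python standard library. This is a sketch only; the function name `load_qa_pairs` is an assumption and the evaluation harness may parse the file differently.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<evaluation>
  <qa_pair>
    <question>Find the repository with the most stars that was created before 2023. What is the repository name?</question>
    <answer>data-pipeline</answer>
  </qa_pair>
</evaluation>"""


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (question, answer) tuples from an <evaluation> document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for qa in root.findall("qa_pair"):
        question = (qa.findtext("question") or "").strip()
        answer = (qa.findtext("answer") or "").strip()
        pairs.append((question, answer))
    return pairs


pairs = load_qa_pairs(SAMPLE)
print(pairs[0][1])  # data-pipeline
```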
245
+ ## Evaluation Examples
246
+
247
+ ### Good Questions
248
+
249
+ **Example 1: Multi-hop question requiring deep exploration (GitHub MCP)**
250
+ ```xml
251
+ <qa_pair>
252
+ <question>Find the repository that was archived in Q3 2023 and had previously been the most forked project in the organization. What was the primary programming language used in that repository?</question>
253
+ <answer>Python</answer>
254
+ </qa_pair>
255
+ ```
256
+
257
+ This question is good because:
258
+ - Requires multiple searches to find archived repositories
259
+ - Needs to identify which had the most forks before archival
260
+ - Requires examining repository details for the language
261
+ - Answer is a simple, verifiable value
262
+ - Based on historical (closed) data that won't change
263
+
264
+ **Example 2: Requires understanding context without keyword matching (Project Management MCP)**
265
+ ```xml
266
+ <qa_pair>
267
+ <question>Locate the initiative focused on improving customer onboarding that was completed in late 2023. The project lead created a retrospective document after completion. What was the lead's role title at that time?</question>
268
+ <answer>Product Manager</answer>
269
+ </qa_pair>
270
+ ```
271
+
272
+ This question is good because:
273
+ - Doesn't use specific project name ("initiative focused on improving customer onboarding")
274
+ - Requires finding completed projects from specific timeframe
275
+ - Needs to identify the project lead and their role
276
+ - Requires understanding context from retrospective documents
277
+ - Answer is human-readable and stable
278
+ - Based on completed work (won't change)
279
+
280
+ **Example 3: Complex aggregation requiring multiple steps (Issue Tracker MCP)**
281
+ ```xml
282
+ <qa_pair>
283
+ <question>Among all bugs reported in January 2024 that were marked as critical priority, which assignee resolved the highest percentage of their assigned bugs within 48 hours? Provide the assignee's username.</question>
284
+ <answer>alex_eng</answer>
285
+ </qa_pair>
286
+ ```
287
+
288
+ This question is good because:
289
+ - Requires filtering bugs by date, priority, and status
290
+ - Needs to group by assignee and calculate resolution rates
291
+ - Requires understanding timestamps to determine 48-hour windows
292
+ - Tests pagination (potentially many bugs to process)
293
+ - Answer is a single username
294
+ - Based on historical data from specific time period
295
+
296
+ **Example 4: Requires synthesis across multiple data types (CRM MCP)**
297
+ ```xml
298
+ <qa_pair>
299
+ <question>Find the account that upgraded from the Starter to Enterprise plan in Q4 2023 and had the highest annual contract value. What industry does this account operate in?</question>
300
+ <answer>Healthcare</answer>
301
+ </qa_pair>
302
+ ```
303
+
304
+ This question is good because:
305
+ - Requires understanding subscription tier changes
306
+ - Needs to identify upgrade events in specific timeframe
307
+ - Requires comparing contract values
308
+ - Must access account industry information
309
+ - Answer is simple and verifiable
310
+ - Based on completed historical transactions
311
+
312
+ ### Poor Questions
313
+
314
+ **Example 1: Answer changes over time**
315
+ ```xml
316
+ <qa_pair>
317
+ <question>How many open issues are currently assigned to the engineering team?</question>
318
+ <answer>47</answer>
319
+ </qa_pair>
320
+ ```
321
+
322
+ This question is poor because:
323
+ - The answer will change as issues are created, closed, or reassigned
324
+ - Not based on stable/stationary data
325
+ - Relies on "current state" which is dynamic
326
+
327
+ **Example 2: Too easy with keyword search**
328
+ ```xml
329
+ <qa_pair>
330
+ <question>Find the pull request with title "Add authentication feature" and tell me who created it.</question>
331
+ <answer>developer123</answer>
332
+ </qa_pair>
333
+ ```
334
+
335
+ This question is poor because:
336
+ - Can be solved with a straightforward keyword search for exact title
337
+ - Doesn't require deep exploration or understanding
338
+ - No synthesis or analysis needed
339
+
340
+ **Example 3: Ambiguous answer format**
341
+ ```xml
342
+ <qa_pair>
343
+ <question>List all the repositories that have Python as their primary language.</question>
344
+ <answer>repo1, repo2, repo3, data-pipeline, ml-tools</answer>
345
+ </qa_pair>
346
+ ```
347
+
348
+ This question is poor because:
349
+ - Answer is a list that could be returned in any order
350
+ - Difficult to verify with direct string comparison
351
+ - LLM might format differently (JSON array, comma-separated, newline-separated)
352
+ - Better to ask for a specific aggregate (count) or superlative (most stars)
353
+
354
+ ## Verification Process
355
+
356
+ After creating evaluations:
357
+
358
+ 1. **Examine the XML file** to understand the schema
359
+ 2. **Load each task instruction** and, in parallel, use the MCP server tools to identify the correct answer by attempting to solve the task YOURSELF
360
+ 3. **Flag any tasks** that require WRITE or DESTRUCTIVE operations
361
+ 4. **Accumulate all CORRECT answers** and replace any incorrect answers in the document
362
+ 5. **Remove any `<qa_pair>`** that requires WRITE or DESTRUCTIVE operations
363
+
364
+ Remember to parallelize solving tasks to avoid running out of context, then accumulate all answers and make changes to the file at the end.
365
+
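The final "accumulate, then edit the file once" step could be sketched like this. The function `apply_verification` and its two inputs (a map of verified answers and a set of questions flagged as needing writes) are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def apply_verification(xml_text: str, verified: dict[str, str], needs_write: set[str]) -> str:
    """Replace answers with verified ones and drop flagged <qa_pair> elements."""
    root = ET.fromstring(xml_text)
    for qa in list(root.findall("qa_pair")):
        question = (qa.findtext("question") or "").strip()
        answer_el = qa.find("answer")
        if question in needs_write:
            root.remove(qa)  # task requires WRITE/DESTRUCTIVE operations
        elif question in verified and answer_el is not None:
            answer_el.text = verified[question]
    return ET.tostring(root, encoding="unicode")


xml_in = (
    "<evaluation>"
    "<qa_pair><question>Q1</question><answer>old</answer></qa_pair>"
    "<qa_pair><question>Q2</question><answer>x</answer></qa_pair>"
    "</evaluation>"
)
xml_out = apply_verification(xml_in, {"Q1": "new"}, {"Q2"})
print(xml_out)
```

Collecting all corrections first and applying them in one pass keeps the file edit small and avoids re-reading the document after every solved task.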
366
+ ## Tips for Creating Quality Evaluations
367
+
368
+ 1. **Think Hard and Plan Ahead** before generating tasks
369
+ 2. **Parallelize Where Opportunity Arises** to speed up the process and manage context
370
+ 3. **Focus on Realistic Use Cases** that humans would actually want to accomplish
371
+ 4. **Create Challenging Questions** that test the limits of the MCP server's capabilities
372
+ 5. **Ensure Stability** by using historical data and closed concepts
373
+ 6. **Verify Answers** by solving the questions yourself using the MCP server tools
374
+ 7. **Iterate and Refine** based on what you learn during the process
375
+
376
+ ---
377
+
378
+ # Running Evaluations
379
+
380
+ After creating your evaluation file, you can use the provided evaluation harness to test your MCP server.
381
+
382
+ ## Setup
383
+
384
+ 1. **Install Dependencies**
385
+
386
+ ```bash
387
+ pip install -r scripts/requirements.txt
388
+ ```
389
+
390
+ Or install manually:
391
+ ```bash
392
+ pip install anthropic mcp
393
+ ```
394
+
395
+ 2. **Set API Key**
396
+
397
+ ```bash
398
+ export ANTHROPIC_API_KEY=your_api_key_here
399
+ ```
400
+
401
+ ## Evaluation File Format
402
+
403
+ Evaluation files use XML format with `<qa_pair>` elements:
404
+
405
+ ```xml
406
+ <evaluation>
407
+ <qa_pair>
408
+ <question>Find the project created in Q2 2024 with the highest number of completed tasks. What is the project name?</question>
409
+ <answer>Website Redesign</answer>
410
+ </qa_pair>
411
+ <qa_pair>
412
+ <question>Search for issues labeled as "bug" that were closed in March 2024. Which user closed the most issues? Provide their username.</question>
413
+ <answer>sarah_dev</answer>
414
+ </qa_pair>
415
+ </evaluation>
416
+ ```
417
+
418
+ ## Running Evaluations
419
+
420
+ The evaluation script (`scripts/evaluation.py`) supports three transport types: stdio, SSE, and streamable HTTP.
421
+
422
+ **Important:**
423
+ - **stdio transport**: The evaluation script automatically launches and manages the MCP server process for you. Do not run the server manually.
424
+ - **sse/http transports**: You must start the MCP server separately before running the evaluation. The script connects to the already-running server at the specified URL.
425
+
426
+ ### 1. Local STDIO Server
427
+
428
+ For locally-run MCP servers (script launches the server automatically):
429
+
430
+ ```bash
431
+ python scripts/evaluation.py \
432
+ -t stdio \
433
+ -c python \
434
+ -a my_mcp_server.py \
435
+ evaluation.xml
436
+ ```
437
+
438
+ With environment variables:
439
+ ```bash
440
+ python scripts/evaluation.py \
441
+ -t stdio \
442
+ -c python \
443
+ -a my_mcp_server.py \
444
+ -e API_KEY=abc123 \
445
+ -e DEBUG=true \
446
+ evaluation.xml
447
+ ```
448
+
449
+ ### 2. Server-Sent Events (SSE)
450
+
451
+ For SSE-based MCP servers (you must start the server first):
452
+
453
+ ```bash
454
+ python scripts/evaluation.py \
455
+ -t sse \
456
+ -u https://example.com/mcp \
457
+ -H "Authorization: Bearer token123" \
458
+ -H "X-Custom-Header: value" \
459
+ evaluation.xml
460
+ ```
461
+
462
+ ### 3. HTTP (Streamable HTTP)
463
+
464
+ For HTTP-based MCP servers (you must start the server first):
465
+
466
+ ```bash
467
+ python scripts/evaluation.py \
468
+ -t http \
469
+ -u https://example.com/mcp \
470
+ -H "Authorization: Bearer token123" \
471
+ evaluation.xml
472
+ ```
473
+
474
+ ## Command-Line Options
475
+
476
+ ```
477
+ usage: evaluation.py [-h] [-t {stdio,sse,http}] [-m MODEL] [-c COMMAND]
478
+ [-a ARGS [ARGS ...]] [-e ENV [ENV ...]] [-u URL]
479
+ [-H HEADERS [HEADERS ...]] [-o OUTPUT]
480
+ eval_file
481
+
482
+ positional arguments:
483
+ eval_file Path to evaluation XML file
484
+
485
+ optional arguments:
486
+ -h, --help Show help message
487
+ -t, --transport Transport type: stdio, sse, or http (default: stdio)
488
+ -m, --model Claude model to use (default: claude-3-7-sonnet-20250219)
489
+ -o, --output Output file for report (default: print to stdout)
490
+
491
+ stdio options:
492
+ -c, --command Command to run MCP server (e.g., python, node)
493
+ -a, --args Arguments for the command (e.g., server.py)
494
+ -e, --env Environment variables in KEY=VALUE format
495
+
496
+ sse/http options:
497
+ -u, --url MCP server URL
498
+ -H, --header HTTP headers in 'Key: Value' format
499
+ ```
500
+
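The `KEY=VALUE` and `'Key: Value'` formats accepted by `-e` and `-H` can be illustrated with two small parsers. This is a sketch of the input formats only; it is not the code in `scripts/evaluation.py`, and the helper names are assumptions.

```python
def parse_env(pairs: list[str]) -> dict[str, str]:
    """Turn ['API_KEY=abc123', ...] into a dict, splitting on the first '='."""
    env = {}
    for item in pairs:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        env[key] = value
    return env


def parse_headers(lines: list[str]) -> dict[str, str]:
    """Turn ['Authorization: Bearer token123', ...] into a dict."""
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Key: Value', got {line!r}")
        headers[name.strip()] = value.strip()
    return headers


print(parse_env(["API_KEY=abc123", "DEBUG=true"]))
print(parse_headers(["Authorization: Bearer token123"]))
```

Splitting on the first separator only means values that themselves contain `=` or `:` (such as URLs) are preserved intact.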
501
+ ## Output
502
+
503
+ The evaluation script generates a detailed report including:
504
+
505
+ - **Summary Statistics**:
506
+ - Accuracy (correct/total)
507
+ - Average task duration
508
+ - Average tool calls per task
509
+ - Total tool calls
510
+
511
+ - **Per-Task Results**:
512
+ - Prompt and expected response
513
+ - Actual response from the agent
514
+ - Whether the answer was correct (✅/❌)
515
+ - Duration and tool call details
516
+ - Agent's summary of its approach
517
+ - Agent's feedback on the tools
518
+
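To make the summary statistics concrete, here is a minimal sketch of how they could be computed from per-task results. The `TaskResult` fields and the `summarize` function are hypothetical; the real report may track more detail.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    correct: bool
    duration_s: float
    tool_calls: int


def summarize(results: list[TaskResult]) -> dict:
    """Compute accuracy, average duration, and tool-call totals."""
    total = len(results)
    if total == 0:
        return {"accuracy": 0.0, "avg_duration_s": 0.0, "avg_tool_calls": 0.0, "total_tool_calls": 0}
    total_calls = sum(r.tool_calls for r in results)
    return {
        "accuracy": sum(r.correct for r in results) / total,
        "avg_duration_s": sum(r.duration_s for r in results) / total,
        "avg_tool_calls": total_calls / total,
        "total_tool_calls": total_calls,
    }


report = summarize([TaskResult(True, 12.0, 4), TaskResult(False, 8.0, 6)])
print(report)
```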
519
+ ### Save Report to File
520
+
521
+ ```bash
522
+ python scripts/evaluation.py \
523
+ -t stdio \
524
+ -c python \
525
+ -a my_server.py \
526
+ -o evaluation_report.md \
527
+ evaluation.xml
528
+ ```
529
+
530
+ ## Complete Example Workflow
531
+
532
+ Here's a complete example of creating and running an evaluation:
533
+
534
+ 1. **Create your evaluation file** (`my_evaluation.xml`):
535
+
536
+ ```xml
537
+ <evaluation>
538
+ <qa_pair>
539
+ <question>Find the user who created the most issues in January 2024. What is their username?</question>
540
+ <answer>alice_developer</answer>
541
+ </qa_pair>
542
+ <qa_pair>
543
+ <question>Among all pull requests merged in Q1 2024, which repository had the highest number? Provide the repository name.</question>
544
+ <answer>backend-api</answer>
545
+ </qa_pair>
546
+ <qa_pair>
547
+ <question>Find the project that was completed in December 2023 and had the longest duration from start to finish. How many days did it take?</question>
548
+ <answer>127</answer>
549
+ </qa_pair>
550
+ </evaluation>
551
+ ```
552
+
553
+ 2. **Install dependencies**:
554
+
555
+ ```bash
556
+ pip install -r scripts/requirements.txt
557
+ export ANTHROPIC_API_KEY=your_api_key
558
+ ```
559
+
560
+ 3. **Run evaluation**:
561
+
562
+ ```bash
563
+ python scripts/evaluation.py \
564
+ -t stdio \
565
+ -c python \
566
+ -a github_mcp_server.py \
567
+ -e GITHUB_TOKEN=ghp_xxx \
568
+ -o github_eval_report.md \
569
+ my_evaluation.xml
570
+ ```
571
+
572
+ 4. **Review the report** in `github_eval_report.md` to:
573
+ - See which questions passed/failed
574
+ - Read the agent's feedback on your tools
575
+ - Identify areas for improvement
576
+ - Iterate on your MCP server design
577
+
578
+ ## Troubleshooting
579
+
580
+ ### Connection Errors
581
+
582
+ If you get connection errors:
583
+ - **STDIO**: Verify the command and arguments are correct
584
+ - **SSE/HTTP**: Check the URL is accessible and headers are correct
585
+ - Ensure any required API keys are set in environment variables or headers
586
+
587
+ ### Low Accuracy
588
+
589
+ If many evaluations fail:
590
+ - Review the agent's feedback for each task
591
+ - Check if tool descriptions are clear and comprehensive
592
+ - Verify input parameters are well-documented
593
+ - Consider whether tools return too much or too little data
594
+ - Ensure error messages are actionable
595
+
596
+ ### Timeout Issues
597
+
598
+ If tasks are timing out:
599
+ - Use a more capable model (e.g., `claude-3-7-sonnet-20250219`)
600
+ - Check if tools are returning too much data
601
+ - Verify pagination is working correctly
602
+ - Consider simplifying complex questions
mcp-builder/reference/mcp_best_practices.md ADDED
@@ -0,0 +1,249 @@
1
+ # MCP Server Best Practices
2
+
3
+ ## Quick Reference
4
+
5
+ ### Server Naming
6
+ - **Python**: `{service}_mcp` (e.g., `slack_mcp`)
7
+ - **Node/TypeScript**: `{service}-mcp-server` (e.g., `slack-mcp-server`)
8
+
9
+ ### Tool Naming
10
+ - Use snake_case with service prefix
11
+ - Format: `{service}_{action}_{resource}`
12
+ - Example: `slack_send_message`, `github_create_issue`
13
+
14
+ ### Response Formats
15
+ - Support both JSON and Markdown formats
16
+ - JSON for programmatic processing
17
+ - Markdown for human readability
18
+
19
+ ### Pagination
20
+ - Always respect `limit` parameter
21
+ - Return `has_more`, `next_offset`, `total_count`
22
+ - Default to 20-50 items
23
+
24
+ ### Transport
25
+ - **Streamable HTTP**: For remote servers, multi-client scenarios
26
+ - **stdio**: For local integrations, command-line tools
27
+ - Avoid SSE (deprecated in favor of streamable HTTP)
28
+
29
+ ---
30
+
31
+ ## Server Naming Conventions
32
+
33
+ Follow these standardized naming patterns:
34
+
35
+ **Python**: Use format `{service}_mcp` (lowercase with underscores)
36
+ - Examples: `slack_mcp`, `github_mcp`, `jira_mcp`
37
+
38
+ **Node/TypeScript**: Use format `{service}-mcp-server` (lowercase with hyphens)
39
+ - Examples: `slack-mcp-server`, `github-mcp-server`, `jira-mcp-server`
40
+
41
+ The name should be general, descriptive of the service being integrated, easy to infer from the task description, and without version numbers.
42
+
43
+ ---
44
+
45
+ ## Tool Naming and Design
46
+
47
+ ### Tool Naming
48
+
49
+ 1. **Use snake_case**: `search_users`, `create_project`, `get_channel_info`
50
+ 2. **Include service prefix**: Anticipate that your MCP server may be used alongside other MCP servers
51
+ - Use `slack_send_message` instead of just `send_message`
52
+ - Use `github_create_issue` instead of just `create_issue`
53
+ 3. **Be action-oriented**: Start with verbs (get, list, search, create, etc.)
54
+ 4. **Be specific**: Avoid generic names that could conflict with other servers
55
+
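The naming rules above can be checked mechanically. The sketch below is one possible lint, assuming a tool name is lowercase snake_case with at least two segments and starts with the service prefix; the regular expression and function name are illustrative choices, not part of any SDK.

```python
import re

# snake_case with at least two segments, e.g. "slack_send_message"
TOOL_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def is_valid_tool_name(name: str, service: str) -> bool:
    """True if the name is snake_case and carries the service prefix."""
    return bool(TOOL_NAME.match(name)) and name.startswith(service + "_")


print(is_valid_tool_name("slack_send_message", "slack"))  # True
print(is_valid_tool_name("send_message", "slack"))        # False
print(is_valid_tool_name("slackSendMessage", "slack"))    # False
```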
56
+ ### Tool Design
57
+
58
+ - Tool descriptions must narrowly and unambiguously describe functionality
59
+ - Descriptions must precisely match actual functionality
60
+ - Provide tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
61
+ - Keep tool operations focused and atomic
62
+
63
+ ---
64
+
65
+ ## Response Formats
66
+
67
+ All tools that return data should support multiple formats:
68
+
69
+ ### JSON Format (`response_format="json"`)
70
+ - Machine-readable structured data
71
+ - Include all available fields and metadata
72
+ - Consistent field names and types
73
+ - Use for programmatic processing
74
+
75
+ ### Markdown Format (`response_format="markdown"`, typically default)
76
+ - Human-readable formatted text
77
+ - Use headers, lists, and formatting for clarity
78
+ - Convert timestamps to human-readable format
79
+ - Show display names with IDs in parentheses
80
+ - Omit verbose metadata
81
+
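A Markdown rendering that follows these bullets might look like the sketch below. The record shape (`id`, `name`, `created_ts` as a Unix timestamp) is an assumption made for the example.

```python
from datetime import datetime, timezone


def format_user_markdown(user: dict) -> str:
    """Render one user as a Markdown bullet: display name, ID in parentheses, readable time."""
    created = datetime.fromtimestamp(user["created_ts"], tz=timezone.utc)
    return f"- **{user['name']}** ({user['id']}), created {created:%Y-%m-%d %H:%M} UTC"


line = format_user_markdown({"id": "U123", "name": "John Doe", "created_ts": 0})
print(line)  # - **John Doe** (U123), created 1970-01-01 00:00 UTC
```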
82
+ ---
83
+
84
+ ## Pagination
85
+
86
+ For tools that list resources:
87
+
88
+ - **Always respect the `limit` parameter**
89
+ - **Implement pagination**: Use `offset` or cursor-based pagination
90
+ - **Return pagination metadata**: Include `has_more`, `next_offset`/`next_cursor`, `total_count`
91
+ - **Never load all results into memory**: Especially important for large datasets
92
+ - **Default to reasonable limits**: 20-50 items is typical
93
+
94
+ Example pagination response:
95
+ ```json
96
+ {
97
+ "total": 150,
98
+ "count": 20,
99
+ "offset": 0,
100
+ "items": [...],
101
+ "has_more": true,
102
+ "next_offset": 20
103
+ }
104
+ ```
105
+
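A response of the shape shown above could be produced by a helper like this one. It is a sketch: it slices an in-memory list purely for illustration, whereas a real server would normally pass `limit` and `offset` through to the upstream API instead of loading everything first.

```python
def paginate(items: list, limit: int = 20, offset: int = 0) -> dict:
    """Build a page plus pagination metadata from a sequence (illustration only)."""
    page = items[offset:offset + limit]
    end = offset + len(page)
    has_more = end < len(items)
    return {
        "total": len(items),
        "count": len(page),
        "offset": offset,
        "items": page,
        "has_more": has_more,
        "next_offset": end if has_more else None,
    }


first = paginate(list(range(150)), limit=20, offset=0)
print(first["count"], first["has_more"], first["next_offset"])  # 20 True 20
```

Returning `next_offset` only when `has_more` is true gives the caller an unambiguous stop condition.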
106
+ ---
107
+
108
+ ## Transport Options
109
+
110
+ ### Streamable HTTP
111
+
112
+ **Best for**: Remote servers, web services, multi-client scenarios
113
+
114
+ **Characteristics**:
115
+ - Bidirectional communication over HTTP
116
+ - Supports multiple simultaneous clients
117
+ - Can be deployed as a web service
118
+ - Enables server-to-client notifications
119
+
120
+ **Use when**:
121
+ - Serving multiple clients simultaneously
122
+ - Deploying as a cloud service
123
+ - Integration with web applications
124
+
125
+ ### stdio
126
+
127
+ **Best for**: Local integrations, command-line tools
128
+
129
+ **Characteristics**:
130
+ - Standard input/output stream communication
131
+ - Simple setup, no network configuration needed
132
+ - Runs as a subprocess of the client
133
+
134
+ **Use when**:
135
+ - Building tools for local development environments
136
+ - Integrating with desktop applications
137
+ - Single-user, single-session scenarios
138
+
139
+ **Note**: stdio servers should NOT log to stdout (use stderr for logging)
140
+
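For a Python stdio server, the note above amounts to routing all logging to stderr so that stdout carries only protocol messages. A minimal sketch using the standard `logging` module (the logger name `example_mcp` is a placeholder):

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send log records to stderr so stdout stays reserved for MCP traffic."""
    logger = logging.getLogger("example_mcp")
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


logger = configure_logging()
logger.info("server starting")
```

The same rule applies to stray `print()` calls: anything written to stdout can corrupt the message stream the client is reading.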
141
+ ### Transport Selection
142
+
143
+ | Criterion | stdio | Streamable HTTP |
144
+ |-----------|-------|-----------------|
145
+ | **Deployment** | Local | Remote |
146
+ | **Clients** | Single | Multiple |
147
+ | **Complexity** | Low | Medium |
148
+ | **Real-time** | No | Yes |
149
+
150
+ ---
151
+
152
+ ## Security Best Practices
153
+
154
+ ### Authentication and Authorization
155
+
156
+ **OAuth 2.1**:
157
+ - Use secure OAuth 2.1 with certificates from recognized authorities
158
+ - Validate access tokens before processing requests
159
+ - Only accept tokens specifically intended for your server
160
+
161
+ **API Keys**:
162
+ - Store API keys in environment variables, never in code
163
+ - Validate keys on server startup
164
+ - Provide clear error messages when authentication fails
165
+
166
+ ### Input Validation
167
+
168
+ - Sanitize file paths to prevent directory traversal
169
+ - Validate URLs and external identifiers
170
+ - Check parameter sizes and ranges
171
+ - Prevent command injection in system calls
172
+ - Use schema validation (Pydantic/Zod) for all inputs
173
+
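The directory traversal bullet can be illustrated with a small path check. This sketch assumes Python 3.9+ (for `Path.is_relative_to`) and a single allowed base directory; `resolve_inside` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the allowed directory")
    return candidate


print(resolve_inside("/tmp/data", "reports/2024.txt"))

try:
    resolve_inside("/tmp/data", "../../etc/passwd")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Resolving before comparing matters: checking the raw string for `..` misses symlinks and absolute paths, both of which this check rejects.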
174
+ ### Error Handling
175
+
176
+ - Don't expose internal errors to clients
177
+ - Log security-relevant errors server-side
178
+ - Provide helpful but not revealing error messages
179
+ - Clean up resources after errors
180
+
181
+ ### DNS Rebinding Protection
182
+
183
+ For streamable HTTP servers running locally:
184
+ - Enable DNS rebinding protection
185
+ - Validate the `Origin` header on all incoming connections
186
+ - Bind to `127.0.0.1` rather than `0.0.0.0`
187
+
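`Origin` validation for a locally running server might be sketched as below. The allowlist contents and the choice to reject requests without an `Origin` header are assumptions; some deployments may need to admit non-browser clients that omit the header.

```python
from typing import Optional
from urllib.parse import urlparse

ALLOWED_HOSTS = {"localhost", "127.0.0.1"}  # assumed allowlist for a local server


def is_allowed_origin(origin: Optional[str]) -> bool:
    """Accept only http(s) origins whose host is on the allowlist."""
    if not origin:
        return False  # strict policy choice; adjust for non-browser clients
    parsed = urlparse(origin)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


print(is_allowed_origin("http://localhost:3000"))  # True
print(is_allowed_origin("http://evil.example"))    # False
```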
188
+ ---
189
+
190
+ ## Tool Annotations
191
+
192
+ Provide annotations to help clients understand tool behavior:
193
+
194
+ | Annotation | Type | Default | Description |
195
+ |-----------|------|---------|-------------|
196
+ | `readOnlyHint` | boolean | false | Tool does not modify its environment |
197
+ | `destructiveHint` | boolean | true | Tool may perform destructive updates |
198
+ | `idempotentHint` | boolean | false | Repeated calls with same args have no additional effect |
199
+ | `openWorldHint` | boolean | true | Tool interacts with external entities |
200
+
201
+ **Important**: Annotations are hints, not security guarantees. Clients should not make security-critical decisions based solely on annotations.
202
+
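The defaults in the table are worth noticing: a tool with no annotations is treated as potentially destructive and open-world. The sketch below mirrors the table in a local dataclass to show which hints a read-only search tool would override; it is a hypothetical stand-in, not an SDK type.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolAnnotations:
    """Local illustration of the annotation defaults listed in the table."""
    readOnlyHint: bool = False
    destructiveHint: bool = True
    idempotentHint: bool = False
    openWorldHint: bool = True


defaults = ToolAnnotations()
search_tool = ToolAnnotations(readOnlyHint=True, destructiveHint=False, idempotentHint=True)

print(asdict(defaults))
print(asdict(search_tool))
```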
203
+ ---
204
+
205
+ ## Error Handling
206
+
207
+ - Use standard JSON-RPC error codes
208
+ - Report tool errors within result objects (not protocol-level errors)
209
+ - Provide helpful, specific error messages with suggested next steps
210
+ - Don't expose internal implementation details
211
+ - Clean up resources properly on errors
212
+
213
+ Example error handling:
214
+ ```typescript
215
+ try {
216
+ const result = performOperation();
217
+ return { content: [{ type: "text", text: result }] };
218
+ } catch (error) {
219
+ return {
220
+ isError: true,
221
+ content: [{
222
+ type: "text",
223
+ text: `Error: ${error.message}. Try using filter='active_only' to reduce results.`
224
+ }]
225
+ };
226
+ }
227
+ ```
228
+
229
+ ---
230
+
231
+ ## Testing Requirements
232
+
233
+ Comprehensive testing should cover:
234
+
235
+ - **Functional testing**: Verify correct execution with valid/invalid inputs
236
+ - **Integration testing**: Test interaction with external systems
237
+ - **Security testing**: Validate auth, input sanitization, rate limiting
238
+ - **Performance testing**: Check behavior under load, timeouts
239
+ - **Error handling**: Ensure proper error reporting and cleanup
240
+
241
+ ---
242
+
243
+ ## Documentation Requirements
244
+
245
+ - Provide clear documentation of all tools and capabilities
246
+ - Include working examples (at least 3 per major feature)
247
+ - Document security considerations
248
+ - Specify required permissions and access levels
249
+ - Document rate limits and performance characteristics
mcp-builder/reference/node_mcp_server.md ADDED
@@ -0,0 +1,970 @@
1
+ # Node/TypeScript MCP Server Implementation Guide
2
+
3
+ ## Overview
4
+
5
+ This document provides Node/TypeScript-specific best practices and examples for implementing MCP servers using the MCP TypeScript SDK. It covers project structure, server setup, tool registration patterns, input validation with Zod, error handling, and complete working examples.
6
+
7
+ ---
8
+
9
+ ## Quick Reference
10
+
11
+ ### Key Imports
12
+ ```typescript
13
+ import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
14
+ import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
15
+ import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
16
+ import express from "express";
17
+ import { z } from "zod";
18
+ ```
19
+
20
+ ### Server Initialization
21
+ ```typescript
22
+ const server = new McpServer({
23
+ name: "service-mcp-server",
24
+ version: "1.0.0"
25
+ });
26
+ ```
27
+
28
+ ### Tool Registration Pattern
29
+ ```typescript
30
+ server.registerTool(
31
+ "tool_name",
32
+ {
33
+ title: "Tool Display Name",
34
+ description: "What the tool does",
35
+ inputSchema: { param: z.string() },
36
+ outputSchema: { result: z.string() }
37
+ },
38
+ async ({ param }) => {
39
+ const output = { result: `Processed: ${param}` };
40
+ return {
41
+ content: [{ type: "text", text: JSON.stringify(output) }],
42
+ structuredContent: output // Modern pattern for structured data
43
+ };
44
+ }
45
+ );
46
+ ```
47
+
48
+ ---
49
+
50
+ ## MCP TypeScript SDK
51
+
52
+ The official MCP TypeScript SDK provides:
53
+ - `McpServer` class for server initialization
54
+ - `registerTool` method for tool registration
55
+ - Zod schema integration for runtime input validation
56
+ - Type-safe tool handler implementations
57
+
58
+ **IMPORTANT - Use Modern APIs Only:**
59
+ - **DO use**: `server.registerTool()`, `server.registerResource()`, `server.registerPrompt()`
60
+ - **DO NOT use**: Old deprecated APIs such as `server.tool()`, `server.setRequestHandler(ListToolsRequestSchema, ...)`, or manual handler registration
61
+ - The `register*` methods provide better type safety, automatic schema handling, and are the recommended approach
62
+
63
+ See the MCP SDK documentation in the references for complete details.
64
+
65
+ ## Server Naming Convention
66
+
67
+ Node/TypeScript MCP servers must follow this naming pattern:
68
+ - **Format**: `{service}-mcp-server` (lowercase with hyphens)
69
+ - **Examples**: `github-mcp-server`, `jira-mcp-server`, `stripe-mcp-server`
70
+
71
+ The name should be:
72
+ - General (not tied to specific features)
73
+ - Descriptive of the service/API being integrated
74
+ - Easy to infer from the task description
75
+ - Without version numbers or dates
76
+
77
+ ## Project Structure
78
+
79
+ Create the following structure for Node/TypeScript MCP servers:
80
+
81
+ ```
82
+ {service}-mcp-server/
83
+ ├── package.json
84
+ ├── tsconfig.json
85
+ ├── README.md
86
+ ├── src/
87
+ │ ├── index.ts # Main entry point with McpServer initialization
88
+ │ ├── types.ts # TypeScript type definitions and interfaces
89
+ │ ├── tools/ # Tool implementations (one file per domain)
90
+ │ ├── services/ # API clients and shared utilities
91
+ │ ├── schemas/ # Zod validation schemas
92
+ │ └── constants.ts # Shared constants (API_URL, CHARACTER_LIMIT, etc.)
93
+ └── dist/ # Built JavaScript files (entry point: dist/index.js)
94
+ ```
95
+
96
+ ## Tool Implementation
97
+
98
+ ### Tool Naming
99
+
100
+ Use snake_case for tool names (e.g., "search_users", "create_project", "get_channel_info") with clear, action-oriented names.
101
+
102
+ **Avoid Naming Conflicts**: Include the service context to prevent overlaps:
103
+ - Use "slack_send_message" instead of just "send_message"
104
+ - Use "github_create_issue" instead of just "create_issue"
105
+ - Use "asana_list_tasks" instead of just "list_tasks"
106
+
107
+ ### Tool Structure
108
+
109
+ Tools are registered using the `registerTool` method with the following requirements:
110
+ - Use Zod schemas for runtime input validation and type safety
111
+ - The `description` field must be explicitly provided - JSDoc comments are NOT automatically extracted
112
+ - Explicitly provide `title`, `description`, `inputSchema`, and `annotations`
113
+ - The `inputSchema` must be a Zod schema object (not a JSON schema)
114
+ - Type all parameters and return values explicitly
115
+
116
+ ```typescript
117
+ import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
118
+ import { z } from "zod";
119
+
120
+ const server = new McpServer({
121
+ name: "example-mcp-server",
122
+ version: "1.0.0"
123
+ });
124
+
125
+ // Zod schema for input validation
126
+ const UserSearchInputSchema = z.object({
127
+ query: z.string()
128
+ .min(2, "Query must be at least 2 characters")
129
+ .max(200, "Query must not exceed 200 characters")
130
+ .describe("Search string to match against names/emails"),
131
+ limit: z.number()
132
+ .int()
133
+ .min(1)
134
+ .max(100)
135
+ .default(20)
136
+ .describe("Maximum results to return"),
137
+ offset: z.number()
138
+ .int()
139
+ .min(0)
140
+ .default(0)
141
+ .describe("Number of results to skip for pagination"),
142
+ response_format: z.nativeEnum(ResponseFormat)
143
+ .default(ResponseFormat.MARKDOWN)
144
+ .describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
145
+ }).strict();
146
+
147
+ // Type definition from Zod schema
148
+ type UserSearchInput = z.infer<typeof UserSearchInputSchema>;
149
+
150
+ server.registerTool(
151
+ "example_search_users",
152
+ {
153
+ title: "Search Example Users",
154
+ description: `Search for users in the Example system by name, email, or team.
155
+
156
+ This tool searches across all user profiles in the Example platform, supporting partial matches and various search filters. It does NOT create or modify users, only searches existing ones.
157
+
158
+ Args:
159
+ - query (string): Search string to match against names/emails
160
+ - limit (number): Maximum results to return, between 1-100 (default: 20)
161
+ - offset (number): Number of results to skip for pagination (default: 0)
162
+ - response_format ('markdown' | 'json'): Output format (default: 'markdown')
163
+
164
+ Returns:
165
+ For JSON format: Structured data with schema:
166
+ {
167
+ "total": number, // Total number of matches found
168
+ "count": number, // Number of results in this response
169
+ "offset": number, // Current pagination offset
170
+ "users": [
171
+ {
172
+ "id": string, // User ID (e.g., "U123456789")
173
+ "name": string, // Full name (e.g., "John Doe")
174
+ "email": string, // Email address
175
+ "team": string, // Team name (optional)
176
+ "active": boolean // Whether user is active
177
+ }
178
+ ],
179
+ "has_more": boolean, // Whether more results are available
180
+ "next_offset": number // Offset for next page (if has_more is true)
181
+ }
182
+
183
+ Examples:
184
+ - Use when: "Find all marketing team members" -> params with query="team:marketing"
185
+ - Use when: "Search for John's account" -> params with query="john"
186
+ - Don't use when: You need to create a user (use example_create_user instead)
187
+
188
+ Error Handling:
189
+ - Returns "Error: Rate limit exceeded" if too many requests (429 status)
190
+ - Returns "No users found matching '<query>'" if search returns empty`,
191
+ inputSchema: UserSearchInputSchema,
192
+ annotations: {
193
+ readOnlyHint: true,
194
+ destructiveHint: false,
195
+ idempotentHint: true,
196
+ openWorldHint: true
197
+ }
198
+ },
199
+ async (params: UserSearchInput) => {
200
+ try {
201
+ // Input validation is handled by Zod schema
202
+ // Make API request using validated parameters
203
+ const data = await makeApiRequest<any>(
204
+ "users/search",
205
+ "GET",
206
+ undefined,
207
+ {
208
+ q: params.query,
209
+ limit: params.limit,
210
+ offset: params.offset
211
+ }
212
+ );
213
+
214
+ const users = data.users || [];
215
+ const total = data.total || 0;
216
+
217
+ if (!users.length) {
218
+ return {
219
+ content: [{
220
+ type: "text",
221
+ text: `No users found matching '${params.query}'`
222
+ }]
223
+ };
224
+ }
225
+
226
+ // Prepare structured output
227
+ const output = {
228
+ total,
229
+ count: users.length,
230
+ offset: params.offset,
231
+ users: users.map((user: any) => ({
232
+ id: user.id,
233
+ name: user.name,
234
+ email: user.email,
235
+ ...(user.team ? { team: user.team } : {}),
236
+ active: user.active ?? true
237
+ })),
238
+ has_more: total > params.offset + users.length,
239
+ ...(total > params.offset + users.length ? {
240
+ next_offset: params.offset + users.length
241
+ } : {})
242
+ };
243
+
244
+ // Format text representation based on requested format
245
+ let textContent: string;
246
+ if (params.response_format === ResponseFormat.MARKDOWN) {
247
+ const lines = [`# User Search Results: '${params.query}'`, "",
248
+ `Found ${total} users (showing ${users.length})`, ""];
249
+ for (const user of users) {
250
+ lines.push(`## ${user.name} (${user.id})`);
251
+ lines.push(`- **Email**: ${user.email}`);
252
+ if (user.team) lines.push(`- **Team**: ${user.team}`);
253
+ lines.push("");
254
+ }
255
+ textContent = lines.join("\n");
256
+ } else {
257
+ textContent = JSON.stringify(output, null, 2);
258
+ }
259
+
260
+ return {
261
+ content: [{ type: "text", text: textContent }],
262
+ structuredContent: output // Modern pattern for structured data
263
+ };
264
+ } catch (error) {
265
+ return {
266
+ content: [{
267
+ type: "text",
268
+ text: handleApiError(error)
269
+ }]
270
+ };
271
+ }
272
+ }
273
+ );
274
+ ```
275
+
276
+ ## Zod Schemas for Input Validation
277
+
278
+ Zod provides runtime type validation:
279
+
280
+ ```typescript
281
+ import { z } from "zod";
282
+
283
+ // Basic schema with validation
284
+ const CreateUserSchema = z.object({
285
+ name: z.string()
286
+ .min(1, "Name is required")
287
+ .max(100, "Name must not exceed 100 characters"),
288
+ email: z.string()
289
+ .email("Invalid email format"),
290
+ age: z.number()
291
+ .int("Age must be a whole number")
292
+ .min(0, "Age cannot be negative")
293
+ .max(150, "Age cannot be greater than 150")
294
+ }).strict(); // Use .strict() to forbid extra fields
295
+
296
+ // Enums
297
+ enum ResponseFormat {
298
+ MARKDOWN = "markdown",
299
+ JSON = "json"
300
+ }
301
+
302
+ const SearchSchema = z.object({
303
+ response_format: z.nativeEnum(ResponseFormat)
304
+ .default(ResponseFormat.MARKDOWN)
305
+ .describe("Output format")
306
+ });
307
+
308
+ // Optional fields with defaults
309
+ const PaginationSchema = z.object({
310
+ limit: z.number()
311
+ .int()
312
+ .min(1)
313
+ .max(100)
314
+ .default(20)
315
+ .describe("Maximum results to return"),
316
+ offset: z.number()
317
+ .int()
318
+ .min(0)
319
+ .default(0)
320
+ .describe("Number of results to skip")
321
+ });
322
+ ```
323
+
324
+ ## Response Format Options
325
+
326
+ Support multiple output formats for flexibility:
327
+
328
+ ```typescript
329
+ enum ResponseFormat {
330
+ MARKDOWN = "markdown",
331
+ JSON = "json"
332
+ }
333
+
334
+ const inputSchema = z.object({
335
+ query: z.string(),
336
+ response_format: z.nativeEnum(ResponseFormat)
337
+ .default(ResponseFormat.MARKDOWN)
338
+ .describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
339
+ });
340
+ ```
341
+
342
+ **Markdown format**:
343
+ - Use headers, lists, and formatting for clarity
344
+ - Convert timestamps to human-readable format
345
+ - Show display names with IDs in parentheses
346
+ - Omit verbose metadata
347
+ - Group related information logically
348
+
349
+ **JSON format**:
350
+ - Return complete, structured data suitable for programmatic processing
351
+ - Include all available fields and metadata
352
+ - Use consistent field names and types
353
+
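The two format guidelines above can be sketched as a single formatter. This is an illustrative example only: the `ChannelRecord` shape, its field names, and the `formatChannels` function are assumptions, not part of any real API.

```typescript
// Hypothetical record shape used only for this illustration.
interface ChannelRecord {
  id: string;
  name: string;
  createdAt: number;       // Unix timestamp in seconds
  memberCount: number;
  internalFlags: string[]; // verbose metadata, omitted from markdown output
}

type OutputFormat = "markdown" | "json";

function formatChannels(channels: ChannelRecord[], format: OutputFormat): string {
  if (format === "json") {
    // JSON: complete data with consistent field names
    return JSON.stringify({ count: channels.length, channels }, null, 2);
  }
  // Markdown: headers, readable dates, display name with ID in parentheses
  const lines: string[] = [`# Channels (${channels.length})`, ""];
  for (const channel of channels) {
    const created = new Date(channel.createdAt * 1000).toISOString().slice(0, 10);
    lines.push(`## ${channel.name} (${channel.id})`);
    lines.push(`- **Created**: ${created}`);
    lines.push(`- **Members**: ${channel.memberCount}`);
    lines.push("");
  }
  return lines.join("\n");
}
```

The markdown branch deliberately drops `internalFlags`, while the JSON branch returns every field so that a programmatic consumer loses nothing.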
354
+ ## Pagination Implementation
355
+
356
+ For tools that list resources:
357
+
358
+ ```typescript
359
+ const ListSchema = z.object({
360
+ limit: z.number().int().min(1).max(100).default(20),
361
+ offset: z.number().int().min(0).default(0)
362
+ });
363
+
364
+ async function listItems(params: z.infer<typeof ListSchema>) {
365
+ const data = await apiRequest(params.limit, params.offset);
366
+
367
+ const response = {
368
+ total: data.total,
369
+ count: data.items.length,
370
+ offset: params.offset,
371
+ items: data.items,
372
+ has_more: data.total > params.offset + data.items.length,
373
+ next_offset: data.total > params.offset + data.items.length
374
+ ? params.offset + data.items.length
375
+ : undefined
376
+ };
377
+
378
+ return JSON.stringify(response, null, 2);
379
+ }
380
+ ```
381
+
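To show how a caller is expected to consume `has_more` and `next_offset`, here is a self-contained sketch. It assumes an in-memory array as a stand-in for the real API, and the `paginate` and `collectAll` names are hypothetical:

```typescript
interface Page<T> {
  total: number;
  count: number;
  offset: number;
  items: T[];
  has_more: boolean;
  next_offset?: number;
}

// Builds one page from an in-memory array (stand-in for a real API call).
function paginate<T>(all: T[], limit: number, offset: number): Page<T> {
  const items = all.slice(offset, offset + limit);
  const end = offset + items.length;
  const hasMore = all.length > end;
  return {
    total: all.length,
    count: items.length,
    offset,
    items,
    has_more: hasMore,
    // Only include next_offset when another page exists
    ...(hasMore ? { next_offset: end } : {})
  };
}

// Follows next_offset until has_more is false.
function collectAll<T>(all: T[], limit: number): T[] {
  const collected: T[] = [];
  let offset = 0;
  while (true) {
    const page = paginate(all, limit, offset);
    collected.push(...page.items);
    if (!page.has_more || page.next_offset === undefined) break;
    offset = page.next_offset;
  }
  return collected;
}
```

Omitting `next_offset` on the last page, rather than setting it to `undefined`, keeps the serialized response free of a key the caller should not follow.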
382
+ ## Character Limits and Truncation
383
+
384
+ Add a CHARACTER_LIMIT constant to prevent overwhelming responses:
385
+
386
+ ```typescript
387
+ // At module level in constants.ts
388
+ export const CHARACTER_LIMIT = 25000; // Maximum response size in characters
389
+
390
+ async function searchTool(params: SearchInput) {
391
+ let result = generateResponse(data);
392
+
393
+ // Check character limit and truncate if needed
394
+ if (result.length > CHARACTER_LIMIT) {
395
+ const truncatedData = data.slice(0, Math.max(1, Math.floor(data.length / 2)));
396
+ response.data = truncatedData;
397
+ response.truncated = true;
398
+ response.truncation_message =
399
+ `Response truncated from ${data.length} to ${truncatedData.length} items. ` +
400
+ `Use 'offset' parameter or add filters to see more results.`;
401
+ result = JSON.stringify(response, null, 2);
402
+ }
403
+
404
+ return result;
405
+ }
406
+ ```
407
+
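The snippet above leaves `data` and `response` undefined. A self-contained variant might look like the following sketch, which assumes a hypothetical `buildResponse` helper and halves the item list repeatedly until the serialized output fits:

```typescript
const CHARACTER_LIMIT = 25000;

interface TruncatedResponse<T> {
  data: T[];
  truncated: boolean;
  truncation_message?: string;
}

// Halves the item list until the serialized response fits within the limit.
// A single item larger than the limit is still returned as-is.
function buildResponse<T>(data: T[], limit: number = CHARACTER_LIMIT): string {
  let items = data;
  let response: TruncatedResponse<T> = { data: items, truncated: false };
  let result = JSON.stringify(response, null, 2);

  while (result.length > limit && items.length > 1) {
    items = items.slice(0, Math.max(1, Math.floor(items.length / 2)));
    response = {
      data: items,
      truncated: true,
      truncation_message:
        `Response truncated from ${data.length} to ${items.length} items. ` +
        `Use 'offset' parameter or add filters to see more results.`
    };
    result = JSON.stringify(response, null, 2);
  }
  return result;
}
```

Looping rather than halving once is a design choice for this sketch: a single halving may still exceed the limit when individual items are large.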
408
+ ## Error Handling
409
+
410
+ Provide clear, actionable error messages:
411
+
412
+ ```typescript
413
+ import axios, { AxiosError } from "axios";
414
+
415
+ function handleApiError(error: unknown): string {
416
+ if (error instanceof AxiosError) {
417
+ if (error.response) {
418
+ switch (error.response.status) {
419
+ case 404:
420
+ return "Error: Resource not found. Please check the ID is correct.";
421
+ case 403:
422
+ return "Error: Permission denied. You don't have access to this resource.";
423
+ case 429:
424
+ return "Error: Rate limit exceeded. Please wait before making more requests.";
425
+ default:
426
+ return `Error: API request failed with status ${error.response.status}`;
427
+ }
428
+ } else if (error.code === "ECONNABORTED") {
429
+ return "Error: Request timed out. Please try again.";
430
+ }
431
+ }
432
+ return `Error: Unexpected error occurred: ${error instanceof Error ? error.message : String(error)}`;
433
+ }
434
+ ```
435
+
436
+ ## Shared Utilities
437
+
438
+ Extract common functionality into reusable functions:
439
+
440
+ ```typescript
441
+ // Shared API request function
442
+ async function makeApiRequest<T>(
443
+ endpoint: string,
444
+ method: "GET" | "POST" | "PUT" | "DELETE" = "GET",
445
+ data?: any,
446
+ params?: any
447
+ ): Promise<T> {
448
+ try {
449
+ const response = await axios({
450
+ method,
451
+ url: `${API_BASE_URL}/${endpoint}`,
452
+ data,
453
+ params,
454
+ timeout: 30000,
455
+ headers: {
456
+ "Content-Type": "application/json",
457
+ "Accept": "application/json"
458
+ }
459
+ });
460
+ return response.data;
461
+ } catch (error) {
462
+ throw error;
463
+ }
464
+ }
465
+ ```
466
+
467
+ ## Async/Await Best Practices
468
+
469
+ Always use async/await for network requests and I/O operations:
470
+
471
+ ```typescript
472
+ // Good: Async network request
473
+ async function fetchData(resourceId: string): Promise<ResourceData> {
474
+ const response = await axios.get(`${API_URL}/resource/${resourceId}`);
475
+ return response.data;
476
+ }
477
+
478
+ // Bad: Promise chains
479
+ function fetchData(resourceId: string): Promise<ResourceData> {
480
+ return axios.get(`${API_URL}/resource/${resourceId}`)
481
+ .then(response => response.data); // Harder to read and maintain
482
+ }
483
+ ```
484
+
485
+ ## TypeScript Best Practices
486
+
487
+ 1. **Use Strict TypeScript**: Enable strict mode in tsconfig.json
488
+ 2. **Define Interfaces**: Create clear interface definitions for all data structures
489
+ 3. **Avoid `any`**: Use proper types or `unknown` instead of `any`
490
+ 4. **Zod for Runtime Validation**: Use Zod schemas to validate external data
491
+ 5. **Type Guards**: Create type guard functions for complex type checking
492
+ 6. **Error Handling**: Always use try-catch with proper error type checking
493
+ 7. **Null Safety**: Use optional chaining (`?.`) and nullish coalescing (`??`)
494
+
495
+ ```typescript
496
+ // Good: Type-safe with Zod and interfaces
497
+ interface UserResponse {
498
+ id: string;
499
+ name: string;
500
+ email: string;
501
+ team?: string;
502
+ active: boolean;
503
+ }
504
+
505
+ const UserSchema = z.object({
506
+ id: z.string(),
507
+ name: z.string(),
508
+ email: z.string().email(),
509
+ team: z.string().optional(),
510
+ active: z.boolean()
511
+ });
512
+
513
+ type User = z.infer<typeof UserSchema>;
514
+
515
+ async function getUser(id: string): Promise<User> {
516
+ const data = await apiCall(`/users/${id}`);
517
+ return UserSchema.parse(data); // Runtime validation
518
+ }
519
+
520
+ // Bad: Using any
521
+ async function getUser(id: string): Promise<any> {
522
+ return await apiCall(`/users/${id}`); // No type safety
523
+ }
524
+ ```
525
+
526
+ ## Package Configuration
527
+
528
+ ### package.json
529
+
530
+ ```json
531
+ {
532
+ "name": "{service}-mcp-server",
533
+ "version": "1.0.0",
534
+ "description": "MCP server for {Service} API integration",
535
+ "type": "module",
536
+ "main": "dist/index.js",
537
+ "scripts": {
538
+ "start": "node dist/index.js",
539
+ "dev": "tsx watch src/index.ts",
540
+ "build": "tsc",
541
+ "clean": "rm -rf dist"
542
+ },
543
+ "engines": {
544
+ "node": ">=18"
545
+ },
546
+ "dependencies": {
547
+ "@modelcontextprotocol/sdk": "^1.6.1",
548
+ "axios": "^1.7.9",
549
+ "zod": "^3.23.8"
550
+ },
551
+ "devDependencies": {
552
+ "@types/node": "^22.10.0",
553
+ "tsx": "^4.19.2",
554
+ "typescript": "^5.7.2"
555
+ }
556
+ }
557
+ ```
558
+
559
+ ### tsconfig.json
560
+
561
+ ```json
562
+ {
563
+ "compilerOptions": {
564
+ "target": "ES2022",
565
+ "module": "Node16",
566
+ "moduleResolution": "Node16",
567
+ "lib": ["ES2022"],
568
+ "outDir": "./dist",
569
+ "rootDir": "./src",
570
+ "strict": true,
571
+ "esModuleInterop": true,
572
+ "skipLibCheck": true,
573
+ "forceConsistentCasingInFileNames": true,
574
+ "declaration": true,
575
+ "declarationMap": true,
576
+ "sourceMap": true,
577
+ "allowSyntheticDefaultImports": true
578
+ },
579
+ "include": ["src/**/*"],
580
+ "exclude": ["node_modules", "dist"]
581
+ }
582
+ ```
583
+
584
+ ## Complete Example
585
+
586
+ ```typescript
587
+ #!/usr/bin/env node
588
+ /**
589
+ * MCP Server for Example Service.
590
+ *
591
+ * This server provides tools to interact with Example API, including user search,
592
+ * project management, and data export capabilities.
593
+ */
594
+
595
+ import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
596
+ import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
597
+ import { z } from "zod";
598
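+ import axios, { AxiosError } from "axios";
+ import express from "express";
+ import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";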
599
+
600
+ // Constants
601
+ const API_BASE_URL = "https://api.example.com/v1";
602
+ const CHARACTER_LIMIT = 25000;
603
+
604
+ // Enums
605
+ enum ResponseFormat {
606
+ MARKDOWN = "markdown",
607
+ JSON = "json"
608
+ }
609
+
610
+ // Zod schemas
611
+ const UserSearchInputSchema = z.object({
612
+ query: z.string()
613
+ .min(2, "Query must be at least 2 characters")
614
+ .max(200, "Query must not exceed 200 characters")
615
+ .describe("Search string to match against names/emails"),
616
+ limit: z.number()
617
+ .int()
618
+ .min(1)
619
+ .max(100)
620
+ .default(20)
621
+ .describe("Maximum results to return"),
622
+ offset: z.number()
623
+ .int()
624
+ .min(0)
625
+ .default(0)
626
+ .describe("Number of results to skip for pagination"),
627
+ response_format: z.nativeEnum(ResponseFormat)
628
+ .default(ResponseFormat.MARKDOWN)
629
+ .describe("Output format: 'markdown' for human-readable or 'json' for machine-readable")
630
+ }).strict();
631
+
632
+ type UserSearchInput = z.infer<typeof UserSearchInputSchema>;
633
+
634
+ // Shared utility functions
635
+ async function makeApiRequest<T>(
636
+ endpoint: string,
637
+ method: "GET" | "POST" | "PUT" | "DELETE" = "GET",
638
+ data?: any,
639
+ params?: any
640
+ ): Promise<T> {
641
+ try {
642
+ const response = await axios({
643
+ method,
644
+ url: `${API_BASE_URL}/${endpoint}`,
645
+ data,
646
+ params,
647
+ timeout: 30000,
648
+ headers: {
649
+ "Content-Type": "application/json",
650
+ "Accept": "application/json"
651
+ }
652
+ });
653
+ return response.data;
654
+ } catch (error) {
655
+ throw error;
656
+ }
657
+ }
658
+
659
+ function handleApiError(error: unknown): string {
660
+ if (error instanceof AxiosError) {
661
+ if (error.response) {
662
+ switch (error.response.status) {
663
+ case 404:
664
+ return "Error: Resource not found. Please check the ID is correct.";
665
+ case 403:
666
+ return "Error: Permission denied. You don't have access to this resource.";
667
+ case 429:
668
+ return "Error: Rate limit exceeded. Please wait before making more requests.";
669
+ default:
670
+ return `Error: API request failed with status ${error.response.status}`;
671
+ }
672
+ } else if (error.code === "ECONNABORTED") {
673
+ return "Error: Request timed out. Please try again.";
674
+ }
675
+ }
676
+ return `Error: Unexpected error occurred: ${error instanceof Error ? error.message : String(error)}`;
677
+ }
678
+
679
+ // Create MCP server instance
680
+ const server = new McpServer({
681
+ name: "example-mcp",
682
+ version: "1.0.0"
683
+ });
684
+
685
+ // Register tools
686
+ server.registerTool(
687
+ "example_search_users",
688
+ {
689
+ title: "Search Example Users",
690
+ description: `[Full description as shown above]`,
691
+ inputSchema: UserSearchInputSchema,
692
+ annotations: {
693
+ readOnlyHint: true,
694
+ destructiveHint: false,
695
+ idempotentHint: true,
696
+ openWorldHint: true
697
+ }
698
+ },
699
+ async (params: UserSearchInput) => {
700
+ // Implementation as shown above
701
+ }
702
+ );
703
+
704
+ // Main function
705
+ // For stdio (local):
706
+ async function runStdio() {
707
+ if (!process.env.EXAMPLE_API_KEY) {
708
+ console.error("ERROR: EXAMPLE_API_KEY environment variable is required");
709
+ process.exit(1);
710
+ }
711
+
712
+ const transport = new StdioServerTransport();
713
+ await server.connect(transport);
714
+ console.error("MCP server running via stdio");
715
+ }
716
+
717
+ // For streamable HTTP (remote):
718
+ async function runHTTP() {
719
+ if (!process.env.EXAMPLE_API_KEY) {
720
+ console.error("ERROR: EXAMPLE_API_KEY environment variable is required");
721
+ process.exit(1);
722
+ }
723
+
724
+ const app = express();
725
+ app.use(express.json());
726
+
727
+ app.post('/mcp', async (req, res) => {
728
+ const transport = new StreamableHTTPServerTransport({
729
+ sessionIdGenerator: undefined,
730
+ enableJsonResponse: true
731
+ });
732
+ res.on('close', () => transport.close());
733
+ await server.connect(transport);
734
+ await transport.handleRequest(req, res, req.body);
735
+ });
736
+
737
+ const port = parseInt(process.env.PORT || '3000');
738
+ app.listen(port, () => {
739
+ console.error(`MCP server running on http://localhost:${port}/mcp`);
740
+ });
741
+ }
742
+
743
+ // Choose transport based on environment
744
+ const transport = process.env.TRANSPORT || 'stdio';
745
+ if (transport === 'http') {
746
+ runHTTP().catch(error => {
747
+ console.error("Server error:", error);
748
+ process.exit(1);
749
+ });
750
+ } else {
751
+ runStdio().catch(error => {
752
+ console.error("Server error:", error);
753
+ process.exit(1);
754
+ });
755
+ }
756
+ ```
757
+
758
+ ---
759
+
760
+ ## Advanced MCP Features
761
+
762
+ ### Resource Registration
763
+
764
+ Expose data as resources for efficient, URI-based access:
765
+
766
+ ```typescript
767
+ import { ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
+
+ // Register a resource with a URI template
+ server.registerResource(
+ "document",
+ new ResourceTemplate("file://documents/{name}", {
+ // List available resources dynamically
+ list: async () => {
+ const documents = await getAvailableDocuments();
+ return {
+ resources: documents.map(doc => ({
+ uri: `file://documents/${doc.name}`,
+ name: doc.name,
+ mimeType: "text/plain",
+ description: doc.description
+ }))
+ };
+ }
+ }),
+ {
+ title: "Document Resource",
+ description: "Access documents by name",
+ mimeType: "text/plain"
+ },
+ async (uri, { name }) => {
+ // The SDK extracts template variables from the URI
+ const documentName = Array.isArray(name) ? name[0] : name;
+ const content = await loadDocument(documentName);
+
+ return {
+ contents: [{
+ uri: uri.href,
+ mimeType: "text/plain",
+ text: content
+ }]
+ };
+ }
+ );
809
+ ```
810
+
811
+ **When to use Resources vs Tools:**
812
+ - **Resources**: For data access with simple URI-based parameters
813
+ - **Tools**: For complex operations requiring validation and business logic
814
+ - **Resources**: When data is relatively static or template-based
815
+ - **Tools**: When operations have side effects or complex workflows
816
+
817
+ ### Transport Options
818
+
819
+ The TypeScript SDK supports two main transport mechanisms:
820
+
821
+ #### Streamable HTTP (Recommended for Remote Servers)
822
+
823
+ ```typescript
824
+ import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
825
+ import express from "express";
826
+
827
+ const app = express();
828
+ app.use(express.json());
829
+
830
+ app.post('/mcp', async (req, res) => {
831
+ // Create new transport for each request (stateless, prevents request ID collisions)
832
+ const transport = new StreamableHTTPServerTransport({
833
+ sessionIdGenerator: undefined,
834
+ enableJsonResponse: true
835
+ });
836
+
837
+ res.on('close', () => transport.close());
838
+
839
+ await server.connect(transport);
840
+ await transport.handleRequest(req, res, req.body);
841
+ });
842
+
843
+ app.listen(3000);
844
+ ```
845
+
846
+ #### stdio (For Local Integrations)
847
+
848
+ ```typescript
849
+ import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
850
+
851
+ const transport = new StdioServerTransport();
852
+ await server.connect(transport);
853
+ ```
854
+
855
+ **Transport selection:**
856
+ - **Streamable HTTP**: Web services, remote access, multiple clients
857
+ - **stdio**: Command-line tools, local development, subprocess integration
858
+
859
+ ### Notification Support
860
+
861
+ Notify clients when server state changes:
862
+
863
+ ```typescript
864
+ // Notify when tools list changes
+ server.sendToolListChanged();
+
+ // Notify when resources change
+ server.sendResourceListChanged();
873
+ ```
874
+
875
+ Use notifications sparingly - only when server capabilities genuinely change.
876
+
877
+ ---
878
+
879
+ ## Code Best Practices
880
+
881
+ ### Code Composability and Reusability
882
+
883
+ Your implementation MUST prioritize composability and code reuse:
884
+
885
+ 1. **Extract Common Functionality**:
886
+ - Create reusable helper functions for operations used across multiple tools
887
+ - Build shared API clients for HTTP requests instead of duplicating code
888
+ - Centralize error handling logic in utility functions
889
+ - Extract business logic into dedicated functions that can be composed
890
+ - Extract shared markdown or JSON field selection & formatting functionality
891
+
892
+ 2. **Avoid Duplication**:
893
+ - NEVER copy-paste similar code between tools
894
+ - If you find yourself writing similar logic twice, extract it into a function
895
+ - Common operations like pagination, filtering, field selection, and formatting should be shared
896
+ - Authentication/authorization logic should be centralized
897
+
898
+ ## Building and Running
899
+
900
+ Always build your TypeScript code before running:
901
+
902
+ ```bash
903
+ # Build the project
904
+ npm run build
905
+
906
+ # Run the server
907
+ npm start
908
+
909
+ # Development with auto-reload
910
+ npm run dev
911
+ ```
912
+
913
+ Always ensure `npm run build` completes successfully before considering the implementation complete.
914
+
915
+ ## Quality Checklist
916
+
917
+ Before finalizing your Node/TypeScript MCP server implementation, ensure:
918
+
919
+ ### Strategic Design
920
+ - [ ] Tools enable complete workflows, not just API endpoint wrappers
921
+ - [ ] Tool names reflect natural task subdivisions
922
+ - [ ] Response formats optimize for agent context efficiency
923
+ - [ ] Human-readable identifiers used where appropriate
924
+ - [ ] Error messages guide agents toward correct usage
925
+
926
+ ### Implementation Quality
927
+ - [ ] FOCUSED IMPLEMENTATION: Most important and valuable tools implemented
928
+ - [ ] All tools registered using `registerTool` with complete configuration
929
+ - [ ] All tools include `title`, `description`, `inputSchema`, and `annotations`
930
+ - [ ] Annotations correctly set (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
931
+ - [ ] All tools use Zod schemas for runtime input validation with `.strict()` enforcement
932
+ - [ ] All Zod schemas have proper constraints and descriptive error messages
933
+ - [ ] All tools have comprehensive descriptions with explicit input/output types
934
+ - [ ] Descriptions include return value examples and complete schema documentation
935
+ - [ ] Error messages are clear, actionable, and educational
936
+
937
+ ### TypeScript Quality
938
+ - [ ] TypeScript interfaces are defined for all data structures
939
+ - [ ] Strict TypeScript is enabled in tsconfig.json
940
+ - [ ] No use of `any` type - use `unknown` or proper types instead
941
+ - [ ] All async functions have explicit Promise<T> return types
942
+ - [ ] Error handling uses proper type guards (e.g., `axios.isAxiosError`, `z.ZodError`)
943
+
944
+ ### Advanced Features (where applicable)
945
+ - [ ] Resources registered for appropriate data endpoints
946
+ - [ ] Appropriate transport configured (stdio or streamable HTTP)
947
+ - [ ] Notifications implemented for dynamic server capabilities
948
+ - [ ] Type-safe with SDK interfaces
949
+
950
+ ### Project Configuration
951
+ - [ ] Package.json includes all necessary dependencies
952
+ - [ ] Build script produces working JavaScript in dist/ directory
953
+ - [ ] Main entry point is properly configured as dist/index.js
954
+ - [ ] Server name follows format: `{service}-mcp-server`
955
+ - [ ] tsconfig.json properly configured with strict mode
956
+
957
+ ### Code Quality
958
+ - [ ] Pagination is properly implemented where applicable
959
+ - [ ] Large responses check CHARACTER_LIMIT constant and truncate with clear messages
960
+ - [ ] Filtering options are provided for potentially large result sets
961
+ - [ ] All network operations handle timeouts and connection errors gracefully
962
+ - [ ] Common functionality is extracted into reusable functions
963
+ - [ ] Return types are consistent across similar operations
964
+
965
+ ### Testing and Build
966
+ - [ ] `npm run build` completes successfully without errors
967
+ - [ ] dist/index.js created and executable
968
+ - [ ] Server runs: `node dist/index.js --help`
969
+ - [ ] All imports resolve correctly
970
+ - [ ] Sample tool calls work as expected
mcp-builder/reference/python_mcp_server.md ADDED
@@ -0,0 +1,719 @@
1
+ # Python MCP Server Implementation Guide
2
+
3
+ ## Overview
4
+
5
+ This document provides Python-specific best practices and examples for implementing MCP servers using the MCP Python SDK. It covers server setup, tool registration patterns, input validation with Pydantic, error handling, and complete working examples.
6
+
7
+ ---
8
+
9
+ ## Quick Reference
10
+
11
+ ### Key Imports
12
+ ```python
13
+ from mcp.server.fastmcp import FastMCP
14
+ from pydantic import BaseModel, Field, field_validator, ConfigDict
15
+ from typing import Optional, List, Dict, Any
16
+ from enum import Enum
17
+ import httpx
18
+ ```
19
+
20
+ ### Server Initialization
21
+ ```python
22
+ mcp = FastMCP("service_mcp")
23
+ ```
24
+
25
+ ### Tool Registration Pattern
26
+ ```python
27
+ @mcp.tool(name="tool_name", annotations={...})
28
+ async def tool_function(params: InputModel) -> str:
29
+ # Implementation
30
+ pass
31
+ ```
32
+
33
+ ---
34
+
35
+ ## MCP Python SDK and FastMCP
36
+
37
+ The official MCP Python SDK provides FastMCP, a high-level framework for building MCP servers. It provides:
38
+ - Automatic description and inputSchema generation from function signatures and docstrings
39
+ - Pydantic model integration for input validation
40
+ - Decorator-based tool registration with `@mcp.tool`
41
+
42
+ **For complete SDK documentation, use WebFetch to load:**
43
+ `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
44
+
45
+ ## Server Naming Convention
46
+
47
+ Python MCP servers must follow this naming pattern:
48
+ - **Format**: `{service}_mcp` (lowercase with underscores)
49
+ - **Examples**: `github_mcp`, `jira_mcp`, `stripe_mcp`
50
+
51
+ The name should be:
52
+ - General (not tied to specific features)
53
+ - Descriptive of the service/API being integrated
54
+ - Easy to infer from the task description
55
+ - Without version numbers or dates
56
+
57
+ ## Tool Implementation
58
+
59
+ ### Tool Naming
60
+
61
+ Use snake_case for tool names (e.g., "search_users", "create_project", "get_channel_info") with clear, action-oriented names.
62
+
63
+ **Avoid Naming Conflicts**: Include the service context to prevent overlaps:
64
+ - Use "slack_send_message" instead of just "send_message"
65
+ - Use "github_create_issue" instead of just "create_issue"
66
+ - Use "asana_list_tasks" instead of just "list_tasks"
67
+
68
+ ### Tool Structure with FastMCP
69
+
70
+ Tools are defined using the `@mcp.tool` decorator with Pydantic models for input validation:
71
+
72
+ ```python
+ from typing import List, Optional
+
+ from pydantic import BaseModel, ConfigDict, Field
+ from mcp.server.fastmcp import FastMCP
+
+ # Initialize the MCP server
+ mcp = FastMCP("example_mcp")
+
+ # Define Pydantic model for input validation
+ class ServiceToolInput(BaseModel):
+     '''Input model for service tool operation.'''
+     model_config = ConfigDict(
+         str_strip_whitespace=True,  # Auto-strip whitespace from strings
+         validate_assignment=True,   # Validate on assignment
+         extra='forbid'              # Forbid extra fields
+     )
+
+     param1: str = Field(..., description="First parameter description (e.g., 'user123', 'project-abc')", min_length=1, max_length=100)
+     param2: Optional[int] = Field(default=None, description="Optional integer parameter with constraints", ge=0, le=1000)
+     # Pydantic v2 uses max_length for list size (max_items is the v1 name)
+     tags: Optional[List[str]] = Field(default_factory=list, description="List of tags to apply", max_length=10)
+
+ @mcp.tool(
+     name="service_tool_name",
+     annotations={
+         "title": "Human-Readable Tool Title",
+         "readOnlyHint": True,      # Tool does not modify environment
+         "destructiveHint": False,  # Tool does not perform destructive operations
+         "idempotentHint": True,    # Repeated calls have no additional effect
+         "openWorldHint": False     # Tool does not interact with external entities
+     }
+ )
+ async def service_tool_name(params: ServiceToolInput) -> str:
+     '''Tool description automatically becomes the 'description' field.
+
+     This tool performs a specific operation on the service. It validates all inputs
+     using the ServiceToolInput Pydantic model before processing.
+
+     Args:
+         params (ServiceToolInput): Validated input parameters containing:
+             - param1 (str): First parameter description
+             - param2 (Optional[int]): Optional parameter with default
+             - tags (Optional[List[str]]): List of tags
+
+     Returns:
+         str: JSON-formatted response containing operation results
+     '''
+     # Implementation here
+     pass
+ ```
120
+
121
+ ## Pydantic v2 Key Features
122
+
123
+ - Use `model_config` instead of nested `Config` class
124
+ - Use `field_validator` instead of deprecated `validator`
125
+ - Use `model_dump()` instead of deprecated `dict()`
126
+ - Validators require `@classmethod` decorator
127
+ - Type hints are required for validator methods
128
+
129
+ ```python
130
+ from pydantic import BaseModel, Field, field_validator, ConfigDict
131
+
132
+ class CreateUserInput(BaseModel):
133
+ model_config = ConfigDict(
134
+ str_strip_whitespace=True,
135
+ validate_assignment=True
136
+ )
137
+
138
+ name: str = Field(..., description="User's full name", min_length=1, max_length=100)
139
+ email: str = Field(..., description="User's email address", pattern=r'^[\w\.-]+@[\w\.-]+\.\w+$')
140
+ age: int = Field(..., description="User's age", ge=0, le=150)
141
+
142
+ @field_validator('email')
143
+ @classmethod
144
+ def validate_email(cls, v: str) -> str:
145
+ if not v.strip():
146
+ raise ValueError("Email cannot be empty")
147
+ return v.lower()
148
+ ```
149
+
150
+ ## Response Format Options
151
+
152
+ Support multiple output formats for flexibility:
153
+
154
+ ```python
155
+ from enum import Enum
156
+
157
+ class ResponseFormat(str, Enum):
158
+ '''Output format for tool responses.'''
159
+ MARKDOWN = "markdown"
160
+ JSON = "json"
161
+
162
+ class UserSearchInput(BaseModel):
163
+ query: str = Field(..., description="Search query")
164
+ response_format: ResponseFormat = Field(
165
+ default=ResponseFormat.MARKDOWN,
166
+ description="Output format: 'markdown' for human-readable or 'json' for machine-readable"
167
+ )
168
+ ```
169
+
170
+ **Markdown format**:
171
+ - Use headers, lists, and formatting for clarity
172
+ - Convert timestamps to human-readable format (e.g., "2024-01-15 10:30:00 UTC" instead of epoch)
173
+ - Show display names with IDs in parentheses (e.g., "@john.doe (U123456)")
174
+ - Omit verbose metadata (e.g., show only one profile image URL, not all sizes)
175
+ - Group related information logically
176
+
177
+ **JSON format**:
178
+ - Return complete, structured data suitable for programmatic processing
179
+ - Include all available fields and metadata
180
+ - Use consistent field names and types
181
+
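The two formats can be produced by one shared formatter. The sketch below assumes a user record shaped like `{"id", "name", "created_at"}` with `created_at` as a Unix epoch in seconds; the function name `format_user` and the record shape are illustrative assumptions, not taken from any real API.

```python
import json
from datetime import datetime, timezone


def format_user(user: dict, response_format: str = "markdown") -> str:
    """Render a user record as Markdown (for humans) or JSON (for programs)."""
    if response_format == "json":
        # JSON: return the complete record, unmodified.
        return json.dumps(user, indent=2)

    # Markdown: display name with the ID in parentheses, readable timestamp.
    created = datetime.fromtimestamp(user["created_at"], tz=timezone.utc)
    lines = [
        f"## @{user['name']} ({user['id']})",
        f"- **Created**: {created.strftime('%Y-%m-%d %H:%M:%S UTC')}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"id": "U123456", "name": "john.doe", "created_at": 1705314600}
    print(format_user(sample))
    print(format_user(sample, "json"))
```

Keeping both branches in one function means every tool that returns users formats them the same way.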
182
+ ## Pagination Implementation
183
+
184
+ For tools that list resources:
185
+
186
+ ```python
+ import json
+ from typing import Optional
+
+ from pydantic import BaseModel, Field
+
+ class ListInput(BaseModel):
+     limit: Optional[int] = Field(default=20, description="Maximum results to return", ge=1, le=100)
+     offset: Optional[int] = Field(default=0, description="Number of results to skip for pagination", ge=0)
+
+ async def list_items(params: ListInput) -> str:
+     # Make API request with pagination (api_request stands in for your API client call)
+     data = await api_request(limit=params.limit, offset=params.offset)
+
+     # Return pagination info
+     next_offset = params.offset + len(data["items"])
+     has_more = data["total"] > next_offset
+     response = {
+         "total": data["total"],
+         "count": len(data["items"]),
+         "offset": params.offset,
+         "items": data["items"],
+         "has_more": has_more,
+         "next_offset": next_offset if has_more else None
+     }
+     return json.dumps(response, indent=2)
+ ```
206
+
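The pagination arithmetic can be isolated in a pure function so it is easy to test and reuse across list tools. This is a sketch under the same field names used above; `build_page` is a hypothetical helper name.

```python
from typing import Any


def build_page(items: list[Any], total: int, offset: int) -> dict[str, Any]:
    """Compute pagination metadata for one page of results."""
    next_offset = offset + len(items)
    has_more = total > next_offset
    return {
        "total": total,
        "count": len(items),
        "offset": offset,
        "items": items,
        "has_more": has_more,
        # next_offset is only meaningful when more results remain.
        "next_offset": next_offset if has_more else None,
    }


if __name__ == "__main__":
    print(build_page(["a", "b", "c"], total=10, offset=0))
    print(build_page(["i", "j"], total=10, offset=8))
```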
207
+ ## Error Handling
208
+
209
+ Provide clear, actionable error messages:
210
+
211
+ ```python
212
+ def _handle_api_error(e: Exception) -> str:
213
+ '''Consistent error formatting across all tools.'''
214
+ if isinstance(e, httpx.HTTPStatusError):
215
+ if e.response.status_code == 404:
216
+ return "Error: Resource not found. Please check the ID is correct."
217
+ elif e.response.status_code == 403:
218
+ return "Error: Permission denied. You don't have access to this resource."
219
+ elif e.response.status_code == 429:
220
+ return "Error: Rate limit exceeded. Please wait before making more requests."
221
+ return f"Error: API request failed with status {e.response.status_code}"
222
+ elif isinstance(e, httpx.TimeoutException):
223
+ return "Error: Request timed out. Please try again."
224
+ return f"Error: Unexpected error occurred: {type(e).__name__}"
225
+ ```
226
+
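As a possible variation on the handler above, the status-to-message mapping can live in a lookup table so that new status codes are added in one place. The names `STATUS_MESSAGES` and `message_for_status` are assumptions for this sketch; it uses only the standard library and leaves the `httpx` exception checks to the caller.

```python
# Hypothetical lookup table mirroring the messages used in the handler above.
STATUS_MESSAGES: dict[int, str] = {
    404: "Error: Resource not found. Please check the ID is correct.",
    403: "Error: Permission denied. You don't have access to this resource.",
    429: "Error: Rate limit exceeded. Please wait before making more requests.",
}


def message_for_status(status_code: int) -> str:
    """Return an actionable message for an HTTP status code."""
    fallback = f"Error: API request failed with status {status_code}"
    return STATUS_MESSAGES.get(status_code, fallback)


if __name__ == "__main__":
    print(message_for_status(404))
    print(message_for_status(500))
```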
227
+ ## Shared Utilities
228
+
229
+ Extract common functionality into reusable functions:
230
+
231
+ ```python
232
+ # Shared API request function
233
+ async def _make_api_request(endpoint: str, method: str = "GET", **kwargs) -> dict:
234
+ '''Reusable function for all API calls.'''
235
+ async with httpx.AsyncClient() as client:
236
+ response = await client.request(
237
+ method,
238
+ f"{API_BASE_URL}/{endpoint}",
239
+ timeout=30.0,
240
+ **kwargs
241
+ )
242
+ response.raise_for_status()
243
+ return response.json()
244
+ ```
245
+
246
+ ## Async/Await Best Practices
247
+
248
+ Always use async/await for network requests and I/O operations:
249
+
250
+ ```python
251
+ # Good: Async network request
252
+ async def fetch_data(resource_id: str) -> dict:
253
+ async with httpx.AsyncClient() as client:
254
+ response = await client.get(f"{API_URL}/resource/{resource_id}")
255
+ response.raise_for_status()
256
+ return response.json()
257
+
258
+ # Bad: Synchronous request
259
+ def fetch_data(resource_id: str) -> dict:
260
+ response = requests.get(f"{API_URL}/resource/{resource_id}") # Blocks
261
+ return response.json()
262
+ ```
263
+
264
+ ## Type Hints
265
+
266
+ Use type hints throughout:
267
+
268
+ ```python
269
+ from typing import Optional, List, Dict, Any
270
+
271
+ async def get_user(user_id: str) -> Dict[str, Any]:
272
+ data = await fetch_user(user_id)
273
+ return {"id": data["id"], "name": data["name"]}
274
+ ```
275
+
276
+ ## Tool Docstrings
277
+
278
+ Every tool must have comprehensive docstrings with explicit type information:
279
+
280
+ ```python
281
+ async def search_users(params: UserSearchInput) -> str:
282
+ '''
283
+ Search for users in the Example system by name, email, or team.
284
+
285
+ This tool searches across all user profiles in the Example platform,
286
+ supporting partial matches and various search filters. It does NOT
287
+ create or modify users, only searches existing ones.
288
+
289
+ Args:
290
+ params (UserSearchInput): Validated input parameters containing:
291
+ - query (str): Search string to match against names/emails (e.g., "john", "@example.com", "team:marketing")
292
+ - limit (Optional[int]): Maximum results to return, between 1-100 (default: 20)
293
+ - offset (Optional[int]): Number of results to skip for pagination (default: 0)
294
+
295
+ Returns:
296
+ str: JSON-formatted string containing search results with the following schema:
297
+
298
+ Success response:
299
+ {
300
+ "total": int, # Total number of matches found
301
+ "count": int, # Number of results in this response
302
+ "offset": int, # Current pagination offset
303
+ "users": [
304
+ {
305
+ "id": str, # User ID (e.g., "U123456789")
306
+ "name": str, # Full name (e.g., "John Doe")
307
+ "email": str, # Email address (e.g., "john@example.com")
308
+ "team": str # Team name (e.g., "Marketing") - optional
309
+ }
310
+ ]
311
+ }
312
+
313
+ Error response:
314
+ "Error: <error message>" or "No users found matching '<query>'"
315
+
316
+ Examples:
317
+ - Use when: "Find all marketing team members" -> params with query="team:marketing"
318
+ - Use when: "Search for John's account" -> params with query="john"
319
+ - Don't use when: You need to create a user (use example_create_user instead)
320
+ - Don't use when: You have a user ID and need full details (use example_get_user instead)
321
+
322
+ Error Handling:
323
+ - Input validation errors are handled by Pydantic model
324
+ - Returns "Error: Rate limit exceeded" if too many requests (429 status)
325
+ - Returns "Error: Invalid API authentication" if API key is invalid (401 status)
326
+ - Returns formatted list of results or "No users found matching 'query'"
327
+ '''
328
+ ```
329
+
330
+ ## Complete Example
331
+
332
+ See below for a complete Python MCP server example:
333
+
334
+ ```python
335
+ #!/usr/bin/env python3
336
+ '''
337
+ MCP Server for Example Service.
338
+
339
+ This server provides tools to interact with Example API, including user search,
340
+ project management, and data export capabilities.
341
+ '''
342
+
343
+ from typing import Optional, List, Dict, Any
344
+ from enum import Enum
345
+ import httpx
346
+ from pydantic import BaseModel, Field, field_validator, ConfigDict
347
+ from mcp.server.fastmcp import FastMCP
348
+
349
+ # Initialize the MCP server
350
+ mcp = FastMCP("example_mcp")
351
+
352
+ # Constants
353
+ API_BASE_URL = "https://api.example.com/v1"
354
+
355
+ # Enums
356
+ class ResponseFormat(str, Enum):
357
+ '''Output format for tool responses.'''
358
+ MARKDOWN = "markdown"
359
+ JSON = "json"
360
+
361
+ # Pydantic Models for Input Validation
362
+ class UserSearchInput(BaseModel):
363
+ '''Input model for user search operations.'''
364
+ model_config = ConfigDict(
365
+ str_strip_whitespace=True,
366
+ validate_assignment=True
367
+ )
368
+
369
+ query: str = Field(..., description="Search string to match against names/emails", min_length=2, max_length=200)
370
+ limit: Optional[int] = Field(default=20, description="Maximum results to return", ge=1, le=100)
371
+ offset: Optional[int] = Field(default=0, description="Number of results to skip for pagination", ge=0)
372
+ response_format: ResponseFormat = Field(default=ResponseFormat.MARKDOWN, description="Output format")
373
+
374
+ @field_validator('query')
375
+ @classmethod
376
+ def validate_query(cls, v: str) -> str:
377
+ if not v.strip():
378
+ raise ValueError("Query cannot be empty or whitespace only")
379
+ return v.strip()
380
+
381
+ # Shared utility functions
382
+ async def _make_api_request(endpoint: str, method: str = "GET", **kwargs) -> dict:
383
+ '''Reusable function for all API calls.'''
384
+ async with httpx.AsyncClient() as client:
385
+ response = await client.request(
386
+ method,
387
+ f"{API_BASE_URL}/{endpoint}",
388
+ timeout=30.0,
389
+ **kwargs
390
+ )
391
+ response.raise_for_status()
392
+ return response.json()
393
+
394
+ def _handle_api_error(e: Exception) -> str:
395
+ '''Consistent error formatting across all tools.'''
396
+ if isinstance(e, httpx.HTTPStatusError):
397
+ if e.response.status_code == 404:
398
+ return "Error: Resource not found. Please check the ID is correct."
399
+ elif e.response.status_code == 403:
400
+ return "Error: Permission denied. You don't have access to this resource."
401
+ elif e.response.status_code == 429:
402
+ return "Error: Rate limit exceeded. Please wait before making more requests."
403
+ return f"Error: API request failed with status {e.response.status_code}"
404
+ elif isinstance(e, httpx.TimeoutException):
405
+ return "Error: Request timed out. Please try again."
406
+ return f"Error: Unexpected error occurred: {type(e).__name__}"
407
+
408
+ # Tool definitions
409
+ @mcp.tool(
410
+ name="example_search_users",
411
+ annotations={
412
+ "title": "Search Example Users",
413
+ "readOnlyHint": True,
414
+ "destructiveHint": False,
415
+ "idempotentHint": True,
416
+ "openWorldHint": True
417
+ }
418
+ )
419
+ async def example_search_users(params: UserSearchInput) -> str:
420
+ '''Search for users in the Example system by name, email, or team.
421
+
422
+ [Full docstring as shown above]
423
+ '''
424
+ try:
425
+ # Make API request using validated parameters
426
+ data = await _make_api_request(
427
+ "users/search",
428
+ params={
429
+ "q": params.query,
430
+ "limit": params.limit,
431
+ "offset": params.offset
432
+ }
433
+ )
434
+
435
+ users = data.get("users", [])
436
+ total = data.get("total", 0)
437
+
438
+ if not users:
439
+ return f"No users found matching '{params.query}'"
440
+
441
+ # Format response based on requested format
442
+ if params.response_format == ResponseFormat.MARKDOWN:
443
+ lines = [f"# User Search Results: '{params.query}'", ""]
444
+ lines.append(f"Found {total} users (showing {len(users)})")
445
+ lines.append("")
446
+
447
+ for user in users:
448
+ lines.append(f"## {user['name']} ({user['id']})")
449
+ lines.append(f"- **Email**: {user['email']}")
450
+ if user.get('team'):
451
+ lines.append(f"- **Team**: {user['team']}")
452
+ lines.append("")
453
+
454
+ return "\n".join(lines)
455
+
456
+ else:
457
+ # Machine-readable JSON format
458
+ import json
459
+ response = {
460
+ "total": total,
461
+ "count": len(users),
462
+ "offset": params.offset,
463
+ "users": users
464
+ }
465
+ return json.dumps(response, indent=2)
466
+
467
+ except Exception as e:
468
+ return _handle_api_error(e)
469
+
470
+ if __name__ == "__main__":
471
+ mcp.run()
472
+ ```
473
+
474
+ ---
475
+
476
+ ## Advanced FastMCP Features
477
+
478
+ ### Context Parameter Injection
479
+
480
+ FastMCP can automatically inject a `Context` parameter into tools for advanced capabilities like logging, progress reporting, resource reading, and user interaction:
481
+
482
+ ```python
+ from mcp.server.fastmcp import FastMCP, Context
+ from pydantic import BaseModel, Field
+
+ mcp = FastMCP("example_mcp")
+
+ # Method names below follow the Context class in the MCP Python SDK;
+ # check them against the SDK version you have installed.
+
+ @mcp.tool()
+ async def advanced_search(query: str, ctx: Context) -> str:
+     '''Advanced tool with context access for logging and progress.'''
+
+     # Report progress for long operations (progress out of total)
+     await ctx.report_progress(progress=0.25, total=1.0, message="Starting search...")
+
+     # Log information for debugging
+     await ctx.info(f"Processing query: {query}")
+
+     # Perform search (search_api is a placeholder for your API call)
+     results = await search_api(query)
+     await ctx.report_progress(progress=0.75, total=1.0, message="Formatting results...")
+
+     # Access server configuration
+     server_name = ctx.fastmcp.name
+
+     return format_results(results)
+
+ class ApiKeyInput(BaseModel):
+     '''Schema describing the input requested from the user.'''
+     api_key: str = Field(description="API key for the Example service")
+
+ @mcp.tool()
+ async def interactive_tool(resource_id: str, ctx: Context) -> str:
+     '''Tool that can request additional input from users.'''
+
+     # Request additional information when needed
+     result = await ctx.elicit(message="Please provide your API key:", schema=ApiKeyInput)
+     if result.action != "accept":
+         return "Error: No API key provided."
+
+     # Use the provided key
+     return await api_call(resource_id, result.data.api_key)
+ ```
+
+ **Context capabilities:**
+ - `ctx.report_progress(progress, total, message)` - Report progress for long operations
+ - `ctx.info(message)` / `ctx.warning()` / `ctx.error()` / `ctx.debug()` - Logging
+ - `ctx.elicit(message, schema)` - Request input from users
+ - `ctx.fastmcp.name` - Access server configuration
+ - `ctx.read_resource(uri)` - Read MCP resources
526
+
527
+ ### Resource Registration
528
+
529
+ Expose data as resources for efficient, template-based access:
530
+
531
+ ```python
532
+ @mcp.resource("file://documents/{name}")
533
+ async def get_document(name: str) -> str:
534
+ '''Expose documents as MCP resources.
535
+
536
+ Resources are useful for static or semi-static data that doesn't
537
+ require complex parameters. They use URI templates for flexible access.
538
+ '''
539
+ document_path = f"./docs/{name}"
540
+ with open(document_path, "r") as f:
541
+ return f.read()
542
+
543
+ @mcp.resource("config://settings/{key}")
544
+ async def get_setting(key: str, ctx: Context) -> str:
545
+ '''Expose configuration as resources with context.'''
546
+ settings = await load_settings()
547
+ return json.dumps(settings.get(key, {}))
548
+ ```
549
+
550
+ **When to use Resources vs Tools:**
551
+ - **Resources**: For data access with simple parameters (URI templates)
552
+ - **Tools**: For complex operations with validation and business logic
553
+
554
+ ### Structured Output Types
555
+
556
+ FastMCP supports multiple return types beyond strings:
557
+
558
+ ```python
559
+ from datetime import datetime
+ from typing import Any, Dict, TypedDict
560
+ from dataclasses import dataclass
561
+ from pydantic import BaseModel
562
+
563
+ # TypedDict for structured returns
564
+ class UserData(TypedDict):
565
+ id: str
566
+ name: str
567
+ email: str
568
+
569
+ @mcp.tool()
570
+ async def get_user_typed(user_id: str) -> UserData:
571
+ '''Returns structured data - FastMCP handles serialization.'''
572
+ return {"id": user_id, "name": "John Doe", "email": "john@example.com"}
573
+
574
+ # Pydantic models for complex validation
575
+ class DetailedUser(BaseModel):
576
+ id: str
577
+ name: str
578
+ email: str
579
+ created_at: datetime
580
+ metadata: Dict[str, Any]
581
+
582
+ @mcp.tool()
583
+ async def get_user_detailed(user_id: str) -> DetailedUser:
584
+ '''Returns Pydantic model - automatically generates schema.'''
585
+ user = await fetch_user(user_id)
586
+ return DetailedUser(**user)
587
+ ```
588
+
589
+ ### Lifespan Management
590
+
591
+ Initialize resources that persist across requests:
592
+
593
+ ```python
+ from contextlib import asynccontextmanager
+
+ from mcp.server.fastmcp import FastMCP, Context
+
+ @asynccontextmanager
+ async def app_lifespan(server: FastMCP):
+     '''Manage resources that live for the server's lifetime.'''
+     # Initialize connections, load config, etc.
+     db = await connect_to_database()
+     config = load_configuration()
+
+     try:
+         # Make available to all tools
+         yield {"db": db, "config": config}
+     finally:
+         # Cleanup on shutdown, even if the server exits with an error
+         await db.close()
+
+ mcp = FastMCP("example_mcp", lifespan=app_lifespan)
+
+ @mcp.tool()
+ async def query_data(query: str, ctx: Context) -> str:
+     '''Access lifespan resources through context.'''
+     # The value yielded by the lifespan is exposed as lifespan_context
+     db = ctx.request_context.lifespan_context["db"]
+     results = await db.query(query)
+     return format_results(results)
+ ```
618
+
619
+ ### Transport Options
620
+
621
+ FastMCP supports two main transport mechanisms:
622
+
623
+ ```python
+ # stdio transport (for local tools) - default
+ if __name__ == "__main__":
+     mcp.run()
+
+ # Streamable HTTP transport (for remote servers)
+ # Host and port are server settings, e.g. FastMCP("example_mcp", port=8000),
+ # rather than arguments to run()
+ if __name__ == "__main__":
+     mcp.run(transport="streamable-http")
+ ```
632
+
633
+ **Transport selection:**
634
+ - **stdio**: Command-line tools, local integrations, subprocess execution
635
+ - **Streamable HTTP**: Web services, remote access, multiple clients
636
+
637
+ ---
638
+
639
+ ## Code Best Practices
640
+
641
+ ### Code Composability and Reusability
642
+
643
+ Your implementation MUST prioritize composability and code reuse:
644
+
645
+ 1. **Extract Common Functionality**:
646
+ - Create reusable helper functions for operations used across multiple tools
647
+ - Build shared API clients for HTTP requests instead of duplicating code
648
+ - Centralize error handling logic in utility functions
649
+ - Extract business logic into dedicated functions that can be composed
650
+ - Extract shared markdown or JSON field selection & formatting functionality
651
+
652
+ 2. **Avoid Duplication**:
653
+ - NEVER copy-paste similar code between tools
654
+ - If you find yourself writing similar logic twice, extract it into a function
655
+ - Common operations like pagination, filtering, field selection, and formatting should be shared
656
+ - Authentication/authorization logic should be centralized
657
+
658
+ ### Python-Specific Best Practices
659
+
660
+ 1. **Use Type Hints**: Always include type annotations for function parameters and return values
661
+ 2. **Pydantic Models**: Define clear Pydantic models for all input validation
662
+ 3. **Avoid Manual Validation**: Let Pydantic handle input validation with constraints
663
+ 4. **Proper Imports**: Group imports (standard library, third-party, local)
664
+ 5. **Error Handling**: Use specific exception types (httpx.HTTPStatusError, not generic Exception)
665
+ 6. **Async Context Managers**: Use `async with` for resources that need cleanup
666
+ 7. **Constants**: Define module-level constants in UPPER_CASE
667
+
668
+ ## Quality Checklist
669
+
670
+ Before finalizing your Python MCP server implementation, ensure:
671
+
672
+ ### Strategic Design
673
+ - [ ] Tools enable complete workflows, not just API endpoint wrappers
674
+ - [ ] Tool names reflect natural task subdivisions
675
+ - [ ] Response formats optimize for agent context efficiency
676
+ - [ ] Human-readable identifiers used where appropriate
677
+ - [ ] Error messages guide agents toward correct usage
678
+
679
+ ### Implementation Quality
680
+ - [ ] FOCUSED IMPLEMENTATION: Most important and valuable tools implemented
681
+ - [ ] All tools have descriptive names and documentation
682
+ - [ ] Return types are consistent across similar operations
683
+ - [ ] Error handling is implemented for all external calls
684
+ - [ ] Server name follows format: `{service}_mcp`
685
+ - [ ] All network operations use async/await
686
+ - [ ] Common functionality is extracted into reusable functions
687
+ - [ ] Error messages are clear, actionable, and educational
688
+ - [ ] Outputs are properly validated and formatted
689
+
690
+ ### Tool Configuration
691
+ - [ ] All tools implement 'name' and 'annotations' in the decorator
692
+ - [ ] Annotations correctly set (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
693
+ - [ ] All tools use Pydantic BaseModel for input validation with Field() definitions
694
+ - [ ] All Pydantic Fields have explicit types and descriptions with constraints
695
+ - [ ] All tools have comprehensive docstrings with explicit input/output types
696
+ - [ ] Docstrings include complete schema structure for dict/JSON returns
697
+ - [ ] Pydantic models handle input validation (no manual validation needed)
698
+
699
+ ### Advanced Features (where applicable)
700
+ - [ ] Context injection used for logging, progress, or elicitation
701
+ - [ ] Resources registered for appropriate data endpoints
702
+ - [ ] Lifespan management implemented for persistent connections
703
+ - [ ] Structured output types used (TypedDict, Pydantic models)
704
+ - [ ] Appropriate transport configured (stdio or streamable HTTP)
705
+
706
+ ### Code Quality
707
+ - [ ] File includes proper imports including Pydantic imports
708
+ - [ ] Pagination is properly implemented where applicable
709
+ - [ ] Filtering options are provided for potentially large result sets
710
+ - [ ] All async functions are properly defined with `async def`
711
+ - [ ] HTTP client usage follows async patterns with proper context managers
712
+ - [ ] Type hints are used throughout the code
713
+ - [ ] Constants are defined at module level in UPPER_CASE
714
+
715
+ ### Testing
716
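+ - [ ] Server starts without import or syntax errors (a stdio server started with `python your_server.py` blocks waiting for a client, so run it under a timeout or through an MCP client)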
717
+ - [ ] All imports resolve correctly
718
+ - [ ] Sample tool calls work as expected
719
+ - [ ] Error scenarios handled gracefully
mcp-builder/scripts/connections.py ADDED
@@ -0,0 +1,151 @@
1
+ """Lightweight connection handling for MCP servers."""
2
+
3
+ from abc import ABC, abstractmethod
4
+ from contextlib import AsyncExitStack
5
+ from typing import Any
6
+
7
+ from mcp import ClientSession, StdioServerParameters
8
+ from mcp.client.sse import sse_client
9
+ from mcp.client.stdio import stdio_client
10
+ from mcp.client.streamable_http import streamablehttp_client
11
+
12
+
13
+ class MCPConnection(ABC):
14
+ """Base class for MCP server connections."""
15
+
16
+ def __init__(self):
17
+ self.session = None
18
+ self._stack = None
19
+
20
+ @abstractmethod
21
+ def _create_context(self):
22
+ """Create the connection context based on connection type."""
23
+
24
+ async def __aenter__(self):
25
+ """Initialize MCP server connection."""
26
+ self._stack = AsyncExitStack()
27
+ await self._stack.__aenter__()
28
+
29
+ try:
30
+ ctx = self._create_context()
31
+ result = await self._stack.enter_async_context(ctx)
32
+
33
+ if len(result) == 2:
34
+ read, write = result
35
+ elif len(result) == 3:
36
+ read, write, _ = result
37
+ else:
38
+ raise ValueError(f"Unexpected context result: {result}")
39
+
40
+ session_ctx = ClientSession(read, write)
41
+ self.session = await self._stack.enter_async_context(session_ctx)
42
+ await self.session.initialize()
43
+ return self
44
+ except BaseException:
45
+ await self._stack.__aexit__(None, None, None)
46
+ raise
47
+
48
+ async def __aexit__(self, exc_type, exc_val, exc_tb):
49
+ """Clean up MCP server connection resources."""
50
+ if self._stack:
51
+ await self._stack.__aexit__(exc_type, exc_val, exc_tb)
52
+ self.session = None
53
+ self._stack = None
54
+
55
+ async def list_tools(self) -> list[dict[str, Any]]:
56
+ """Retrieve available tools from the MCP server."""
57
+ response = await self.session.list_tools()
58
+ return [
59
+ {
60
+ "name": tool.name,
61
+ "description": tool.description,
62
+ "input_schema": tool.inputSchema,
63
+ }
64
+ for tool in response.tools
65
+ ]
66
+
67
+ async def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
68
+ """Call a tool on the MCP server with provided arguments."""
69
+ result = await self.session.call_tool(tool_name, arguments=arguments)
70
+ return result.content
71
+
72
+
73
+ class MCPConnectionStdio(MCPConnection):
74
+ """MCP connection using standard input/output."""
75
+
76
+ def __init__(self, command: str, args: list[str] | None = None, env: dict[str, str] | None = None):
77
+ super().__init__()
78
+ self.command = command
79
+ self.args = args or []
80
+ self.env = env
81
+
82
+ def _create_context(self):
83
+ return stdio_client(
84
+ StdioServerParameters(command=self.command, args=self.args, env=self.env)
85
+ )
86
+
87
+
88
+ class MCPConnectionSSE(MCPConnection):
89
+ """MCP connection using Server-Sent Events."""
90
+
91
+ def __init__(self, url: str, headers: dict[str, str] | None = None):
92
+ super().__init__()
93
+ self.url = url
94
+ self.headers = headers or {}
95
+
96
+ def _create_context(self):
97
+ return sse_client(url=self.url, headers=self.headers)
98
+
99
+
100
+ class MCPConnectionHTTP(MCPConnection):
101
+ """MCP connection using Streamable HTTP."""
102
+
103
+ def __init__(self, url: str, headers: dict[str, str] | None = None):
104
+ super().__init__()
105
+ self.url = url
106
+ self.headers = headers or {}
107
+
108
+ def _create_context(self):
109
+ return streamablehttp_client(url=self.url, headers=self.headers)
110
+
111
+
112
+ def create_connection(
113
+ transport: str,
114
+ command: str = None,
115
+ args: list[str] = None,
116
+ env: dict[str, str] = None,
117
+ url: str = None,
118
+ headers: dict[str, str] = None,
119
+ ) -> MCPConnection:
120
+ """Factory function to create the appropriate MCP connection.
121
+
122
+ Args:
123
+ transport: Connection type ("stdio", "sse", or "http")
124
+ command: Command to run (stdio only)
125
+ args: Command arguments (stdio only)
126
+ env: Environment variables (stdio only)
127
+ url: Server URL (sse and http only)
128
+ headers: HTTP headers (sse and http only)
129
+
130
+ Returns:
131
+ MCPConnection instance
132
+ """
133
+ transport = transport.lower()
134
+
135
+ if transport == "stdio":
136
+ if not command:
137
+ raise ValueError("Command is required for stdio transport")
138
+ return MCPConnectionStdio(command=command, args=args, env=env)
139
+
140
+ elif transport == "sse":
141
+ if not url:
142
+ raise ValueError("URL is required for sse transport")
143
+ return MCPConnectionSSE(url=url, headers=headers)
144
+
145
+ elif transport in ["http", "streamable_http", "streamable-http"]:
146
+ if not url:
147
+ raise ValueError("URL is required for http transport")
148
+ return MCPConnectionHTTP(url=url, headers=headers)
149
+
150
+ else:
151
+ raise ValueError(f"Unsupported transport type: {transport}. Use 'stdio', 'sse', or 'http'")
mcp-builder/scripts/evaluation.py ADDED
@@ -0,0 +1,373 @@
1
+ """MCP Server Evaluation Harness
2
+
3
+ This script evaluates MCP servers by running test questions against them using Claude.
4
+ """
5
+
6
+ import argparse
7
+ import asyncio
8
+ import json
9
+ import re
10
+ import sys
11
+ import time
12
+ import traceback
13
+ import xml.etree.ElementTree as ET
14
+ from pathlib import Path
15
+ from typing import Any
16
+
17
+ from anthropic import Anthropic
18
+
19
+ from connections import create_connection
20
+
21
+ EVALUATION_PROMPT = """You are an AI assistant with access to tools.
22
+
23
+ When given a task, you MUST:
24
+ 1. Use the available tools to complete the task
25
+ 2. Provide a summary of each step in your approach, wrapped in <summary> tags
26
+ 3. Provide feedback on the tools provided, wrapped in <feedback> tags
27
+ 4. Provide your final response, wrapped in <response> tags
28
+
29
+ Summary Requirements:
30
+ - In your <summary> tags, you must explain:
31
+ - The steps you took to complete the task
32
+ - Which tools you used, in what order, and why
33
+ - The inputs you provided to each tool
34
+ - The outputs you received from each tool
35
+ - A summary of how you arrived at the response
36
+
37
+ Feedback Requirements:
38
+ - In your <feedback> tags, provide constructive feedback on the tools:
39
+ - Comment on tool names: Are they clear and descriptive?
40
+ - Comment on input parameters: Are they well-documented? Are required vs optional parameters clear?
41
+ - Comment on descriptions: Do they accurately describe what the tool does?
42
+ - Comment on any errors encountered during tool usage: Did the tool fail to execute? Did the tool return too many tokens?
43
+ - Identify specific areas for improvement and explain WHY they would help
44
+ - Be specific and actionable in your suggestions
45
+
46
+ Response Requirements:
47
+ - Your response should be concise and directly address what was asked
48
+ - Always wrap your final response in <response> tags
49
+ - If you cannot solve the task return <response>NOT_FOUND</response>
50
+ - For numeric responses, provide just the number
51
+ - For IDs, provide just the ID
52
+ - For names or text, provide the exact text requested
53
+ - Your response should go last"""
54
+
55
+
56
+ def parse_evaluation_file(file_path: Path) -> list[dict[str, Any]]:
57
+ """Parse XML evaluation file with qa_pair elements."""
58
+ try:
59
+ tree = ET.parse(file_path)
60
+ root = tree.getroot()
61
+ evaluations = []
62
+
63
+ for qa_pair in root.findall(".//qa_pair"):
64
+ question_elem = qa_pair.find("question")
65
+ answer_elem = qa_pair.find("answer")
66
+
67
+ if question_elem is not None and answer_elem is not None:
68
+ evaluations.append({
69
+ "question": (question_elem.text or "").strip(),
70
+ "answer": (answer_elem.text or "").strip(),
71
+ })
72
+
73
+ return evaluations
74
+ except Exception as e:
75
+ print(f"Error parsing evaluation file {file_path}: {e}")
76
+ return []
77
+
78
+
79
+ def extract_xml_content(text: str, tag: str) -> str | None:
80
+ """Extract content from XML tags."""
81
+ pattern = rf"<{tag}>(.*?)</{tag}>"
82
+ matches = re.findall(pattern, text or "", re.DOTALL)  # text is None when the final message has no text block
83
+ return matches[-1].strip() if matches else None
84
+
85
+
86
+ async def agent_loop(
87
+ client: Anthropic,
88
+ model: str,
89
+ question: str,
90
+ tools: list[dict[str, Any]],
91
+ connection: Any,
92
+ ) -> tuple[str, dict[str, Any]]:
93
+ """Run the agent loop with MCP tools."""
94
+ messages = [{"role": "user", "content": question}]
95
+
96
+ response = await asyncio.to_thread(
97
+ client.messages.create,
98
+ model=model,
99
+ max_tokens=4096,
100
+ system=EVALUATION_PROMPT,
101
+ messages=messages,
102
+ tools=tools,
103
+ )
104
+
105
+ messages.append({"role": "assistant", "content": response.content})
106
+
107
+ tool_metrics = {}
108
+
109
+ while response.stop_reason == "tool_use":
110
+ tool_use = next(block for block in response.content if block.type == "tool_use")
111
+ tool_name = tool_use.name
112
+ tool_input = tool_use.input
113
+
114
+ tool_start_ts = time.time()
115
+ try:
116
+ tool_result = await connection.call_tool(tool_name, tool_input)
117
+ tool_response = json.dumps(tool_result) if isinstance(tool_result, (dict, list)) else str(tool_result)
118
+ except Exception as e:
119
+ tool_response = f"Error executing tool {tool_name}: {str(e)}\n"
120
+ tool_response += traceback.format_exc()
121
+ tool_duration = time.time() - tool_start_ts
122
+
123
+ if tool_name not in tool_metrics:
124
+ tool_metrics[tool_name] = {"count": 0, "durations": []}
125
+ tool_metrics[tool_name]["count"] += 1
126
+ tool_metrics[tool_name]["durations"].append(tool_duration)
127
+
128
+ messages.append({
129
+ "role": "user",
130
+ "content": [{
131
+ "type": "tool_result",
132
+ "tool_use_id": tool_use.id,
133
+ "content": tool_response,
134
+ }]
135
+ })
136
+
137
+ response = await asyncio.to_thread(
138
+ client.messages.create,
139
+ model=model,
140
+ max_tokens=4096,
141
+ system=EVALUATION_PROMPT,
142
+ messages=messages,
143
+ tools=tools,
144
+ )
145
+ messages.append({"role": "assistant", "content": response.content})
146
+
147
+ response_text = next(
148
+ (block.text for block in response.content if hasattr(block, "text")),
149
+ None,
150
+ )
151
+ return response_text, tool_metrics
152
+
153
+
154
+ async def evaluate_single_task(
155
+ client: Anthropic,
156
+ model: str,
157
+ qa_pair: dict[str, Any],
158
+ tools: list[dict[str, Any]],
159
+ connection: Any,
160
+ task_index: int,
161
+ ) -> dict[str, Any]:
162
+ """Evaluate a single QA pair with the given tools."""
163
+ start_time = time.time()
164
+
165
+ print(f"Task {task_index + 1}: Running task with question: {qa_pair['question']}")
166
+ response, tool_metrics = await agent_loop(client, model, qa_pair["question"], tools, connection)
167
+
168
+ response_value = extract_xml_content(response, "response")
169
+ summary = extract_xml_content(response, "summary")
170
+ feedback = extract_xml_content(response, "feedback")
171
+
172
+ duration_seconds = time.time() - start_time
173
+
174
+ return {
175
+ "question": qa_pair["question"],
176
+ "expected": qa_pair["answer"],
177
+ "actual": response_value,
178
+ "score": int(response_value == qa_pair["answer"]) if response_value else 0,
179
+ "total_duration": duration_seconds,
180
+ "tool_calls": tool_metrics,
181
+ "num_tool_calls": sum(len(metrics["durations"]) for metrics in tool_metrics.values()),
182
+ "summary": summary,
183
+ "feedback": feedback,
184
+ }
185
+
186
+
187
+ REPORT_HEADER = """
188
+ # Evaluation Report
189
+
190
+ ## Summary
191
+
192
+ - **Accuracy**: {correct}/{total} ({accuracy:.1f}%)
193
+ - **Average Task Duration**: {average_duration_s:.2f}s
194
+ - **Average Tool Calls per Task**: {average_tool_calls:.2f}
195
+ - **Total Tool Calls**: {total_tool_calls}
196
+
197
+ ---
198
+ """
199
+
200
+ TASK_TEMPLATE = """
201
+ ### Task {task_num}
202
+
203
+ **Question**: {question}
204
+ **Ground Truth Answer**: `{expected_answer}`
205
+ **Actual Answer**: `{actual_answer}`
206
+ **Correct**: {correct_indicator}
207
+ **Duration**: {total_duration:.2f}s
208
+ **Tool Calls**: {tool_calls}
209
+
210
+ **Summary**
211
+ {summary}
212
+
213
+ **Feedback**
214
+ {feedback}
215
+
216
+ ---
217
+ """
218
+
219
+
220
+ async def run_evaluation(
221
+ eval_path: Path,
222
+ connection: Any,
223
+ model: str = "claude-3-7-sonnet-20250219",
224
+ ) -> str:
225
+ """Run evaluation with MCP server tools."""
226
+ print("🚀 Starting Evaluation")
227
+
228
+ client = Anthropic()
229
+
230
+ tools = await connection.list_tools()
231
+ print(f"📋 Loaded {len(tools)} tools from MCP server")
232
+
233
+ qa_pairs = parse_evaluation_file(eval_path)
234
+ print(f"📋 Loaded {len(qa_pairs)} evaluation tasks")
235
+
236
+ results = []
237
+ for i, qa_pair in enumerate(qa_pairs):
238
+ print(f"Processing task {i + 1}/{len(qa_pairs)}")
239
+ result = await evaluate_single_task(client, model, qa_pair, tools, connection, i)
240
+ results.append(result)
241
+
242
+ correct = sum(r["score"] for r in results)
243
+ accuracy = (correct / len(results)) * 100 if results else 0
244
+ average_duration_s = sum(r["total_duration"] for r in results) / len(results) if results else 0
245
+ average_tool_calls = sum(r["num_tool_calls"] for r in results) / len(results) if results else 0
246
+ total_tool_calls = sum(r["num_tool_calls"] for r in results)
247
+
248
+ report = REPORT_HEADER.format(
249
+ correct=correct,
250
+ total=len(results),
251
+ accuracy=accuracy,
252
+ average_duration_s=average_duration_s,
253
+ average_tool_calls=average_tool_calls,
254
+ total_tool_calls=total_tool_calls,
255
+ )
256
+
257
+ report += "".join([
258
+ TASK_TEMPLATE.format(
259
+ task_num=i + 1,
260
+ question=qa_pair["question"],
261
+ expected_answer=qa_pair["answer"],
262
+ actual_answer=result["actual"] or "N/A",
263
+ correct_indicator="✅" if result["score"] else "❌",
264
+ total_duration=result["total_duration"],
265
+ tool_calls=json.dumps(result["tool_calls"], indent=2),
266
+ summary=result["summary"] or "N/A",
267
+ feedback=result["feedback"] or "N/A",
268
+ )
269
+ for i, (qa_pair, result) in enumerate(zip(qa_pairs, results))
270
+ ])
271
+
272
+ return report
273
+
274
+
275
+ def parse_headers(header_list: list[str]) -> dict[str, str]:
276
+ """Parse header strings in format 'Key: Value' into a dictionary."""
277
+ headers = {}
278
+ if not header_list:
279
+ return headers
280
+
281
+ for header in header_list:
282
+ if ":" in header:
283
+ key, value = header.split(":", 1)
284
+ headers[key.strip()] = value.strip()
285
+ else:
286
+ print(f"Warning: Ignoring malformed header: {header}")
287
+ return headers
288
+
289
+
290
+ def parse_env_vars(env_list: list[str]) -> dict[str, str]:
291
+ """Parse environment variable strings in format 'KEY=VALUE' into a dictionary."""
292
+ env = {}
293
+ if not env_list:
294
+ return env
295
+
296
+ for env_var in env_list:
297
+ if "=" in env_var:
298
+ key, value = env_var.split("=", 1)
299
+ env[key.strip()] = value.strip()
300
+ else:
301
+ print(f"Warning: Ignoring malformed environment variable: {env_var}")
302
+ return env
303
+
304
+
305
+ async def main():
306
+ parser = argparse.ArgumentParser(
307
+ description="Evaluate MCP servers using test questions",
308
+ formatter_class=argparse.RawDescriptionHelpFormatter,
309
+ epilog="""
310
+ Examples:
311
+ # Evaluate a local stdio MCP server
312
+ python evaluation.py -t stdio -c python -a my_server.py eval.xml
313
+
314
+ # Evaluate an SSE MCP server
315
+ python evaluation.py -t sse -u https://example.com/mcp -H "Authorization: Bearer token" eval.xml
316
+
317
+ # Evaluate an HTTP MCP server with custom model
318
+ python evaluation.py -t http -u https://example.com/mcp -m claude-3-5-sonnet-20241022 eval.xml
319
+ """,
320
+ )
321
+
322
+ parser.add_argument("eval_file", type=Path, help="Path to evaluation XML file")
323
+ parser.add_argument("-t", "--transport", choices=["stdio", "sse", "http"], default="stdio", help="Transport type (default: stdio)")
324
+ parser.add_argument("-m", "--model", default="claude-3-7-sonnet-20250219", help="Claude model to use (default: claude-3-7-sonnet-20250219)")
325
+
326
+ stdio_group = parser.add_argument_group("stdio options")
327
+ stdio_group.add_argument("-c", "--command", help="Command to run MCP server (stdio only)")
328
+ stdio_group.add_argument("-a", "--args", nargs="+", help="Arguments for the command (stdio only)")
329
+ stdio_group.add_argument("-e", "--env", nargs="+", help="Environment variables in KEY=VALUE format (stdio only)")
330
+
331
+ remote_group = parser.add_argument_group("sse/http options")
332
+ remote_group.add_argument("-u", "--url", help="MCP server URL (sse/http only)")
333
+ remote_group.add_argument("-H", "--header", nargs="+", dest="headers", help="HTTP headers in 'Key: Value' format (sse/http only)")
334
+
335
+ parser.add_argument("-o", "--output", type=Path, help="Output file for evaluation report (default: stdout)")
336
+
337
+ args = parser.parse_args()
338
+
339
+ if not args.eval_file.exists():
340
+ print(f"Error: Evaluation file not found: {args.eval_file}")
341
+ sys.exit(1)
342
+
343
+ headers = parse_headers(args.headers) if args.headers else None
344
+ env_vars = parse_env_vars(args.env) if args.env else None
345
+
346
+ try:
347
+ connection = create_connection(
348
+ transport=args.transport,
349
+ command=args.command,
350
+ args=args.args,
351
+ env=env_vars,
352
+ url=args.url,
353
+ headers=headers,
354
+ )
355
+ except ValueError as e:
356
+ print(f"Error: {e}")
357
+ sys.exit(1)
358
+
359
+ print(f"🔗 Connecting to MCP server via {args.transport}...")
360
+
361
+ async with connection:
362
+ print("✅ Connected successfully")
363
+ report = await run_evaluation(args.eval_file, connection, args.model)
364
+
365
+ if args.output:
366
+ args.output.write_text(report, encoding="utf-8")  # the report contains emoji; do not rely on the locale encoding
367
+ print(f"\n✅ Report saved to {args.output}")
368
+ else:
369
+ print("\n" + report)
370
+
371
+
372
+ if __name__ == "__main__":
373
+ asyncio.run(main())
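The scoring path in `evaluation.py` can be illustrated with a small standalone sketch. It assumes the same convention as the script above: the last `<response>` block in the model output is the answer, and a task scores 1 only on an exact string match. The names `extract_last_tag` and `score_answer` are hypothetical and are not part of the script.

```python
from __future__ import annotations

import re


def extract_last_tag(text: str | None, tag: str) -> str | None:
    """Return the stripped content of the last <tag>...</tag> pair, or None."""
    if not text:
        return None
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return matches[-1].strip() if matches else None


def score_answer(model_output: str | None, expected: str) -> int:
    """Score 1 only when the extracted response equals the expected string."""
    actual = extract_last_tag(model_output, "response")
    return int(actual == expected) if actual else 0


if __name__ == "__main__":
    output = "<summary>Used the calculator tool.</summary>\n<response> 4.46 </response>"
    print(score_answer(output, "4.46"))   # exact match after stripping whitespace
    print(score_answer(output, "4.460"))  # same number, different string
    print(score_answer(None, "4.46"))     # no text block came back from the model
```

Because the comparison is a plain string match, an answer such as `4.460` would not match an expected `4.46`, which is presumably why the prompt asks for "just the number" and the evaluation questions specify a rounding rule.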
mcp-builder/scripts/example_evaluation.xml ADDED
@@ -0,0 +1,22 @@
1
+ <evaluation>
2
+ <qa_pair>
3
+ <question>Calculate the compound interest on $10,000 invested at 5% annual interest rate, compounded monthly for 3 years. What is the final amount in dollars (rounded to 2 decimal places)?</question>
4
+ <answer>11614.72</answer>
5
+ </qa_pair>
6
+ <qa_pair>
7
+ <question>A projectile is launched at a 45-degree angle with an initial velocity of 50 m/s. Calculate the total distance (in meters) it has traveled from the launch point after 2 seconds, assuming g=9.8 m/s². Round to 2 decimal places.</question>
8
+ <answer>87.25</answer>
9
+ </qa_pair>
10
+ <qa_pair>
11
+ <question>A sphere has a volume of 500 cubic meters. Calculate its surface area in square meters. Round to 2 decimal places.</question>
12
+ <answer>304.65</answer>
13
+ </qa_pair>
14
+ <qa_pair>
15
+ <question>Calculate the population standard deviation of this dataset: [12, 15, 18, 22, 25, 30, 35]. Round to 2 decimal places.</question>
16
+ <answer>7.61</answer>
17
+ </qa_pair>
18
+ <qa_pair>
19
+ <question>Calculate the pH of a solution with a hydrogen ion concentration of 3.5 × 10^-5 M. Round to 2 decimal places.</question>
20
+ <answer>4.46</answer>
21
+ </qa_pair>
22
+ </evaluation>
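The expected answers in this file can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the commit. It assumes that "total distance from the launch point" in the projectile question means straight-line displacement rather than arc length, which is the reading that reproduces the listed answer.

```python
import math
import statistics

# 1. Compound interest: A = P * (1 + r/n) ** (n * t), compounded monthly
compound = 10_000 * (1 + 0.05 / 12) ** (12 * 3)

# 2. Projectile: straight-line distance from the launch point at t = 2 s
v0, angle, g, t = 50.0, math.radians(45), 9.8, 2.0
x = v0 * math.cos(angle) * t
y = v0 * math.sin(angle) * t - 0.5 * g * t**2
distance = math.hypot(x, y)

# 3. Sphere: recover the radius from the volume, then A = 4 * pi * r**2
radius = (3 * 500 / (4 * math.pi)) ** (1 / 3)
surface_area = 4 * math.pi * radius**2

# 4. Population standard deviation (divides by N, not N - 1)
std_dev = statistics.pstdev([12, 15, 18, 22, 25, 30, 35])

# 5. pH = -log10([H+])
ph = -math.log10(3.5e-5)

answers = [f"{value:.2f}" for value in (compound, distance, surface_area, std_dev, ph)]
print(answers)
```

Formatting each value to two decimals gives the same strings as the `<answer>` elements, which matters because the evaluation script compares answers as exact strings.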
mcp-builder/scripts/requirements.txt ADDED
@@ -0,0 +1,2 @@
1
+ anthropic>=0.39.0
2
+ mcp>=1.1.0
skill-creator/LICENSE.txt ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
skill-creator/SKILL.md ADDED
@@ -0,0 +1,356 @@
1
+ ---
2
+ name: skill-creator
3
+ description: Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations.
4
+ license: Complete terms in LICENSE.txt
5
+ ---
6
+
7
+ # Skill Creator
8
+
9
+ This skill provides guidance for creating effective skills.
10
+
11
+ ## About Skills
12
+
13
+ Skills are modular, self-contained packages that extend Claude's capabilities by providing
14
+ specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific
15
+ domains or tasks—they transform Claude from a general-purpose agent into a specialized agent
16
+ equipped with procedural knowledge that no model can fully possess.
17
+
18
+ ### What Skills Provide
19
+
20
+ 1. Specialized workflows - Multi-step procedures for specific domains
21
+ 2. Tool integrations - Instructions for working with specific file formats or APIs
22
+ 3. Domain expertise - Company-specific knowledge, schemas, business logic
23
+ 4. Bundled resources - Scripts, references, and assets for complex and repetitive tasks
24
+
25
+ ## Core Principles
26
+
27
+ ### Concise is Key
28
+
29
+ The context window is a public good. Skills share the context window with everything else Claude needs: system prompt, conversation history, other Skills' metadata, and the actual user request.
30
+
31
+ **Default assumption: Claude is already very smart.** Only add context Claude doesn't already have. Challenge each piece of information: "Does Claude really need this explanation?" and "Does this paragraph justify its token cost?"
32
+
33
+ Prefer concise examples over verbose explanations.
34
+
35
+ ### Set Appropriate Degrees of Freedom
36
+
37
+ Match the level of specificity to the task's fragility and variability:
38
+
39
+ **High freedom (text-based instructions)**: Use when multiple approaches are valid, decisions depend on context, or heuristics guide the approach.
40
+
41
+ **Medium freedom (pseudocode or scripts with parameters)**: Use when a preferred pattern exists, some variation is acceptable, or configuration affects behavior.
42
+
43
+ **Low freedom (specific scripts, few parameters)**: Use when operations are fragile and error-prone, consistency is critical, or a specific sequence must be followed.
44
+
45
+ Think of Claude as exploring a path: a narrow bridge with cliffs needs specific guardrails (low freedom), while an open field allows many routes (high freedom).
46
+
47
+ ### Anatomy of a Skill
48
+
49
+ Every skill consists of a required SKILL.md file and optional bundled resources:
50
+
51
+ ```
52
+ skill-name/
53
+ ├── SKILL.md (required)
54
+ │ ├── YAML frontmatter metadata (required)
55
+ │ │ ├── name: (required)
56
+ │ │ └── description: (required)
57
+ │ └── Markdown instructions (required)
58
+ └── Bundled Resources (optional)
59
+ ├── scripts/ - Executable code (Python/Bash/etc.)
60
+ ├── references/ - Documentation intended to be loaded into context as needed
61
+ └── assets/ - Files used in output (templates, icons, fonts, etc.)
62
+ ```
63
+
64
+ #### SKILL.md (required)
65
+
66
+ Every SKILL.md consists of:
67
+
68
+ - **Frontmatter** (YAML): Contains `name` and `description` fields. These are the only fields that Claude reads to determine when the skill gets used, so it is very important to describe clearly and comprehensively what the skill is and when it should be used.
69
+ - **Body** (Markdown): Instructions and guidance for using the skill. Only loaded AFTER the skill triggers (if at all).
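As a rough illustration of the frontmatter/body split described above, the sketch below separates the two parts and checks for the required `name` and `description` fields. It assumes flat `key: value` lines between the `---` fences and is not a YAML parser; `split_skill_md` is a hypothetical helper, not something shipped with the skill.

```python
from __future__ import annotations


def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        # Index of the closing fence, searching from the second line onward.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---' fence") from None

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)  # split once so values may contain ':'
            fields[key.strip()] = value.strip()

    missing = [name for name in ("name", "description") if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required frontmatter fields: {missing}")
    return fields, "\n".join(lines[end + 1:]).strip()


if __name__ == "__main__":
    sample = "---\nname: demo-skill\ndescription: Does X. Use when Y.\n---\n\n# Demo\n\nSteps go here."
    fields, body = split_skill_md(sample)
    print(fields["name"], "|", fields["description"])
    print(body)
```

A real loader would more likely use a proper YAML library; the point here is only that the metadata can be read without touching the body.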
70
+
71
+ #### Bundled Resources (optional)
72
+
73
+ ##### Scripts (`scripts/`)
74
+
75
+ Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
76
+
77
+ - **When to include**: When the same code is being rewritten repeatedly or deterministic reliability is needed
78
+ - **Example**: `scripts/rotate_pdf.py` for PDF rotation tasks
79
+ - **Benefits**: Token efficient, deterministic, may be executed without loading into context
80
+ - **Note**: Scripts may still need to be read by Claude for patching or environment-specific adjustments
81
+
82
+ ##### References (`references/`)
83
+
84
+ Documentation and reference material intended to be loaded as needed into context to inform Claude's process and thinking.
85
+
86
+ - **When to include**: For documentation that Claude should reference while working
87
+ - **Examples**: `references/finance.md` for financial schemas, `references/mnda.md` for company NDA template, `references/policies.md` for company policies, `references/api_docs.md` for API specifications
88
+ - **Use cases**: Database schemas, API documentation, domain knowledge, company policies, detailed workflow guides
89
+ - **Benefits**: Keeps SKILL.md lean, loaded only when Claude determines it's needed
90
+ - **Best practice**: If files are large (>10k words), include grep search patterns in SKILL.md
91
+ - **Avoid duplication**: Information should live in either SKILL.md or references files, not both. Prefer references files for detailed information unless it's truly core to the skill—this keeps SKILL.md lean while making information discoverable without hogging the context window. Keep only essential procedural instructions and workflow guidance in SKILL.md; move detailed reference material, schemas, and examples to references files.
92
+
93
+ ##### Assets (`assets/`)
94
+
95
+ Files not intended to be loaded into context, but rather used within the output Claude produces.
96
+
97
+ - **When to include**: When the skill needs files that will be used in the final output
98
+ - **Examples**: `assets/logo.png` for brand assets, `assets/slides.pptx` for PowerPoint templates, `assets/frontend-template/` for HTML/React boilerplate, `assets/font.ttf` for typography
99
+ - **Use cases**: Templates, images, icons, boilerplate code, fonts, sample documents that get copied or modified
100
+ - **Benefits**: Separates output resources from documentation, enables Claude to use files without loading them into context
101
+
102
+ #### What Not to Include in a Skill
103
+
104
+ A skill should only contain essential files that directly support its functionality. Do NOT create extraneous documentation or auxiliary files, including:
105
+
106
+ - README.md
107
+ - INSTALLATION_GUIDE.md
108
+ - QUICK_REFERENCE.md
109
+ - CHANGELOG.md
110
+ - etc.
111
+
112
+ The skill should only contain the information needed for an AI agent to do the job at hand. It should not contain auxiliary context about the process that went into creating it, setup and testing procedures, user-facing documentation, etc. Creating additional documentation files just adds clutter and confusion.
113
+
114
+ ### Progressive Disclosure Design Principle
115
+
116
+ Skills use a three-level loading system to manage context efficiently:
117
+
118
+ 1. **Metadata (name + description)** - Always in context (~100 words)
119
+ 2. **SKILL.md body** - When skill triggers (<5k words)
120
+ 3. **Bundled resources** - As needed by Claude (effectively unlimited, because scripts can be executed without being read into the context window)
121
+
122
+ #### Progressive Disclosure Patterns
123
+
124
+ Keep SKILL.md body to the essentials and under 500 lines to minimize context bloat. Split content into separate files when approaching this limit. When splitting out content into other files, it is very important to reference them from SKILL.md and describe clearly when to read them, to ensure the reader of the skill knows they exist and when to use them.
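The size guidance above (body under 500 lines and under roughly 5k words) is easy to check mechanically. The following is a minimal sketch under those two limits; `check_body_budget` and its default thresholds are assumptions taken from the numbers in this section, not an official validator.

```python
def check_body_budget(body: str, max_lines: int = 500, max_words: int = 5000) -> list[str]:
    """Return human-readable warnings; an empty list means the body fits."""
    warnings = []
    line_count = len(body.splitlines())
    word_count = len(body.split())  # whitespace-separated tokens as a rough word count
    if line_count >= max_lines:
        warnings.append(f"body has {line_count} lines (limit: under {max_lines})")
    if word_count >= max_words:
        warnings.append(f"body has {word_count} words (limit: under {max_words})")
    return warnings


if __name__ == "__main__":
    short_body = "# Demo\n\nUse the bundled script for rotation.\n"
    long_body = "Another line of detail.\n" * 600
    print(check_body_budget(short_body))
    print(check_body_budget(long_body))
```

A warning from a check like this would be a prompt to move detail into reference files and link to them from SKILL.md, as the surrounding text recommends.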
125
+
126
+ **Key principle:** When a skill supports multiple variations, frameworks, or options, keep only the core workflow and selection guidance in SKILL.md. Move variant-specific details (patterns, examples, configuration) into separate reference files.
127
+
128
+ **Pattern 1: High-level guide with references**
129
+
130
+ ```markdown
131
+ # PDF Processing
132
+
133
+ ## Quick start
134
+
135
+ Extract text with pdfplumber:
136
+ [code example]
137
+
138
+ ## Advanced features
139
+
140
+ - **Form filling**: See [FORMS.md](FORMS.md) for complete guide
141
+ - **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
142
+ - **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
143
+ ```
144
+
145
+ Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
146
+
147
+ **Pattern 2: Domain-specific organization**
148
+
149
+ For Skills with multiple domains, organize content by domain to avoid loading irrelevant context:
150
+
151
+ ```
152
+ bigquery-skill/
+ ├── SKILL.md (overview and navigation)
+ └── references/
+     ├── finance.md (revenue, billing metrics)
+     ├── sales.md (opportunities, pipeline)
+     ├── product.md (API usage, features)
+     └── marketing.md (campaigns, attribution)
159
+ ```
160
+
161
+ When a user asks about sales metrics, Claude only reads sales.md.
162
+
163
+ Similarly, for skills supporting multiple frameworks or variants, organize by variant:
164
+
165
+ ```
166
+ cloud-deploy/
+ ├── SKILL.md (workflow + provider selection)
+ └── references/
+     ├── aws.md (AWS deployment patterns)
+     ├── gcp.md (GCP deployment patterns)
+     └── azure.md (Azure deployment patterns)
172
+ ```
173
+
174
+ When the user chooses AWS, Claude only reads aws.md.
175
+
176
+ **Pattern 3: Conditional details**
177
+
178
+ Show basic content, link to advanced content:
179
+
180
+ ```markdown
181
+ # DOCX Processing
182
+
183
+ ## Creating documents
184
+
185
+ Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
186
+
187
+ ## Editing documents
188
+
189
+ For simple edits, modify the XML directly.
190
+
191
+ **For tracked changes**: See [REDLINING.md](REDLINING.md)
192
+ **For OOXML details**: See [OOXML.md](OOXML.md)
193
+ ```
194
+
195
+ Claude reads REDLINING.md or OOXML.md only when the user needs those features.
196
+
197
+ **Important guidelines:**
198
+
199
+ - **Avoid deeply nested references** - Keep references one level deep from SKILL.md. All reference files should link directly from SKILL.md.
200
+ - **Structure longer reference files** - For files longer than 100 lines, include a table of contents at the top so Claude can see the full scope when previewing.
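The second guideline lends itself to a quick automated check. The sketch below is a heuristic only: `needs_table_of_contents` is a hypothetical helper, and it assumes a table of contents is introduced by a heading containing the word "contents" within the first 20 lines, which is a convention chosen for this example.

```python
def needs_table_of_contents(text, max_lines=100, preview_lines=20):
    """Heuristic: flag long files whose first lines lack a 'contents' heading."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return False
    head = lines[:preview_lines]
    has_toc = any(
        line.lstrip().startswith("#") and "contents" in line.lower()
        for line in head
    )
    return not has_toc


short_doc = "# Notes\n" + "line\n" * 10
long_doc = "# API\n" + "line\n" * 150
long_doc_with_toc = "# API\n\n## Table of contents\n- Auth\n" + "line\n" * 150

print(needs_table_of_contents(short_doc))
print(needs_table_of_contents(long_doc))
print(needs_table_of_contents(long_doc_with_toc))
```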
201
+
202
+ ## Skill Creation Process
203
+
204
+ Skill creation involves these steps:
205
+
206
+ 1. Understand the skill with concrete examples
207
+ 2. Plan reusable skill contents (scripts, references, assets)
208
+ 3. Initialize the skill (run init_skill.py)
209
+ 4. Edit the skill (implement resources and write SKILL.md)
210
+ 5. Package the skill (run package_skill.py)
211
+ 6. Iterate based on real usage
212
+
213
+ Follow these steps in order, skipping only if there is a clear reason why they are not applicable.
214
+
215
+ ### Step 1: Understanding the Skill with Concrete Examples
216
+
217
+ Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
218
+
219
+ To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
220
+
221
+ For example, when building an image-editor skill, relevant questions include:
222
+
223
+ - "What functionality should the image-editor skill support? Editing, rotating, anything else?"
224
+ - "Can you give some examples of how this skill would be used?"
225
+ - "I can imagine users asking for things like 'Remove the red-eye from this image' or 'Rotate this image'. Are there other ways you imagine this skill being used?"
226
+ - "What would a user say that should trigger this skill?"
227
+
228
+ To avoid overwhelming users, do not ask too many questions in a single message. Start with the most important questions and follow up as needed.
229
+
230
+ Conclude this step when there is a clear sense of the functionality the skill should support.
231
+
232
+ ### Step 2: Planning the Reusable Skill Contents
233
+
234
+ To turn concrete examples into an effective skill, analyze each example by:
235
+
236
+ 1. Considering how to execute on the example from scratch
237
+ 2. Identifying what scripts, references, and assets would be helpful when executing these workflows repeatedly
238
+
239
+ Example: When building a `pdf-editor` skill to handle queries like "Help me rotate this PDF," the analysis shows:
240
+
241
+ 1. Rotating a PDF requires re-writing the same code each time
242
+ 2. A `scripts/rotate_pdf.py` script would be helpful to store in the skill
243
+
244
+ Example: When designing a `frontend-webapp-builder` skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
245
+
246
+ 1. Writing a frontend webapp requires the same boilerplate HTML/React each time
247
+ 2. An `assets/hello-world/` template containing the boilerplate HTML/React project files would be helpful to store in the skill
248
+
249
+ Example: When building a `big-query` skill to handle queries like "How many users have logged in today?" the analysis shows:
250
+
251
+ 1. Querying BigQuery requires re-discovering the table schemas and relationships each time
252
+ 2. A `references/schema.md` file documenting the table schemas would be helpful to store in the skill
253
+
254
+ To establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
255
+
256
+ ### Step 3: Initializing the Skill
257
+
258
+ At this point, it is time to actually create the skill.
259
+
260
+ Skip this step only if the skill already exists and only iteration or packaging is needed. In that case, continue to the next step.
261
+
262
+ When creating a new skill from scratch, always run the `init_skill.py` script. The script conveniently generates a new template skill directory that automatically includes everything a skill requires, making the skill creation process much more efficient and reliable.
263
+
264
+ Usage:
265
+
266
+ ```bash
267
+ scripts/init_skill.py <skill-name> --path <output-directory>
268
+ ```
269
+
270
+ The script:
271
+
272
+ - Creates the skill directory at the specified path
273
+ - Generates a SKILL.md template with proper frontmatter and TODO placeholders
274
+ - Creates example resource directories: `scripts/`, `references/`, and `assets/`
275
+ - Adds example files in each directory that can be customized or deleted
276
+
277
+ After initialization, customize or remove the generated SKILL.md and example files as needed.
278
+
279
+ ### Step 4: Edit the Skill
280
+
281
+ When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of Claude to use. Include information that would be beneficial and non-obvious to Claude. Consider what procedural knowledge, domain-specific details, or reusable assets would help another Claude instance execute these tasks more effectively.
282
+
283
+ #### Learn Proven Design Patterns
284
+
285
+ Consult these helpful guides based on your skill's needs:
286
+
287
+ - **Multi-step processes**: See references/workflows.md for sequential workflows and conditional logic
288
+ - **Specific output formats or quality standards**: See references/output-patterns.md for template and example patterns
289
+
290
+ These files contain established best practices for effective skill design.
291
+
292
+ #### Start with Reusable Skill Contents
293
+
294
+ To begin implementation, start with the reusable resources identified above: `scripts/`, `references/`, and `assets/` files. Note that this step may require user input. For example, when implementing a `brand-guidelines` skill, the user may need to provide brand assets or templates to store in `assets/`, or documentation to store in `references/`.
295
+
296
+ Added scripts must be tested by actually running them to confirm that they have no bugs and that the output matches expectations. If there are many similar scripts, testing a representative sample is enough to build confidence that they all work without spending excessive time.
297
+
298
+ Any example files and directories not needed for the skill should be deleted. The initialization script creates example files in `scripts/`, `references/`, and `assets/` to demonstrate structure, but most skills won't need all of them.
299
+
300
+ #### Update SKILL.md
301
+
302
+ **Writing Guidelines:** Always use imperative/infinitive form.
303
+
304
+ ##### Frontmatter
305
+
306
+ Write the YAML frontmatter with `name` and `description`:
307
+
308
+ - `name`: The skill name
309
+ - `description`: This is the primary triggering mechanism for your skill, and helps Claude understand when to use the skill.
310
+   - Include both what the Skill does and specific triggers/contexts for when to use it.
+   - Include all "when to use" information here, not in the body. The body is only loaded after triggering, so "When to Use This Skill" sections in the body are not helpful to Claude.
+   - Example description for a `docx` skill: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
313
+
314
+ Do not include any other fields in YAML frontmatter.
315
+
316
+ ##### Body
317
+
318
+ Write instructions for using the skill and its bundled resources.
319
+
320
+ ### Step 5: Packaging a Skill
321
+
322
+ Once development of the skill is complete, it must be packaged into a distributable .skill file that gets shared with the user. The packaging process automatically validates the skill first to ensure it meets all requirements:
323
+
324
+ ```bash
325
+ scripts/package_skill.py <path/to/skill-folder>
326
+ ```
327
+
328
+ Optional output directory specification:
329
+
330
+ ```bash
331
+ scripts/package_skill.py <path/to/skill-folder> ./dist
332
+ ```
333
+
334
+ The packaging script will:
335
+
336
+ 1. **Validate** the skill automatically, checking:
337
+
338
+ - YAML frontmatter format and required fields
339
+ - Skill naming conventions and directory structure
340
+ - Description completeness and quality
341
+ - File organization and resource references
342
+
343
+ 2. **Package** the skill if validation passes, creating a .skill file named after the skill (e.g., `my-skill.skill`) that includes all files and maintains the proper directory structure for distribution. The .skill file is a zip file with a .skill extension.
344
+
345
+ If validation fails, the script will report the errors and exit without creating a package. Fix any validation errors and run the packaging command again.
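Because a .skill file is an ordinary zip archive, its contents can be inspected with Python's standard `zipfile` module. The sketch below is illustrative only: `list_skill_archive` is a hypothetical helper, and the stand-in archive assumes the skill folder is stored as the top-level directory inside the package.

```python
import tempfile
import zipfile
from pathlib import Path


def list_skill_archive(skill_file):
    """Return the sorted member names of a .skill (zip) file."""
    with zipfile.ZipFile(skill_file) as archive:
        return sorted(archive.namelist())


# Build a tiny stand-in archive so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    skill_file = Path(tmp) / "my-skill.skill"
    with zipfile.ZipFile(skill_file, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("my-skill/SKILL.md", "---\nname: my-skill\n---\n")
        archive.writestr("my-skill/scripts/example.py", "print('hi')\n")
    members = list_skill_archive(skill_file)

print(members)
```

Listing the members this way is a quick sanity check that the expected files were packaged.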
346
+
347
+ ### Step 6: Iterate
348
+
349
+ After testing the skill, users may request improvements. Often this happens right after using the skill, with fresh context of how the skill performed.
350
+
351
+ **Iteration workflow:**
352
+
353
+ 1. Use the skill on real tasks
354
+ 2. Notice struggles or inefficiencies
355
+ 3. Identify how SKILL.md or bundled resources should be updated
356
+ 4. Implement changes and test again
skill-creator/references/output-patterns.md ADDED
@@ -0,0 +1,82 @@
1
+ # Output Patterns
2
+
3
+ Use these patterns when skills need to produce consistent, high-quality output.
4
+
5
+ ## Template Pattern
6
+
7
+ Provide templates for output format. Match the level of strictness to your needs.
8
+
9
+ **For strict requirements (like API responses or data formats):**
10
+
11
+ ```markdown
12
+ ## Report structure
13
+
14
+ ALWAYS use this exact template structure:
15
+
16
+ # [Analysis Title]
17
+
18
+ ## Executive summary
19
+ [One-paragraph overview of key findings]
20
+
21
+ ## Key findings
22
+ - Finding 1 with supporting data
23
+ - Finding 2 with supporting data
24
+ - Finding 3 with supporting data
25
+
26
+ ## Recommendations
27
+ 1. Specific actionable recommendation
28
+ 2. Specific actionable recommendation
29
+ ```
30
+
31
+ **For flexible guidance (when adaptation is useful):**
32
+
33
+ ```markdown
34
+ ## Report structure
35
+
36
+ Here is a sensible default format, but use your best judgment:
37
+
38
+ # [Analysis Title]
39
+
40
+ ## Executive summary
41
+ [Overview]
42
+
43
+ ## Key findings
44
+ [Adapt sections based on what you discover]
45
+
46
+ ## Recommendations
47
+ [Tailor to the specific context]
48
+
49
+ Adjust sections as needed for the specific analysis type.
50
+ ```
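When the strict variant is used, conformance can be checked mechanically. The following is a minimal sketch, assuming the three section headings from the strict template above are the required ones; `missing_headings` is a hypothetical helper, not part of the skill tooling.

```python
REQUIRED_HEADINGS = ["## Executive summary", "## Key findings", "## Recommendations"]


def missing_headings(report, required=REQUIRED_HEADINGS):
    """Return required headings that are absent or out of order in the report."""
    lines = [line.strip() for line in report.splitlines()]
    missing = []
    position = 0
    for heading in required:
        try:
            # Search only after the previous match so order is enforced.
            position = lines.index(heading, position) + 1
        except ValueError:
            missing.append(heading)
    return missing


good_report = """# Churn analysis

## Executive summary
Churn rose in Q3.

## Key findings
- Finding 1 with supporting data

## Recommendations
1. Specific actionable recommendation
"""

bad_report = """# Churn analysis

## Executive summary
Churn rose in Q3.

## Recommendations
1. Specific actionable recommendation
"""

print(missing_headings(good_report))
print(missing_headings(bad_report))
```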
51
+
52
+ ## Examples Pattern
53
+
54
+ For skills where output quality depends on seeing examples, provide input/output pairs:
55
+
56
+ ```markdown
57
+ ## Commit message format
58
+
59
+ Generate commit messages following these examples:
60
+
61
+ **Example 1:**
62
+ Input: Added user authentication with JWT tokens
63
+ Output:
64
+ ```
65
+ feat(auth): implement JWT-based authentication
66
+
67
+ Add login endpoint and token validation middleware
68
+ ```
69
+
70
+ **Example 2:**
71
+ Input: Fixed bug where dates displayed incorrectly in reports
72
+ Output:
73
+ ```
74
+ fix(reports): correct date formatting in timezone conversion
75
+
76
+ Use UTC timestamps consistently across report generation
77
+ ```
78
+
79
+ Follow this style: type(scope): brief description, then detailed explanation.
80
+ ```
81
+
82
+ Examples help Claude understand the desired style and level of detail more clearly than descriptions alone.
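If the output style needs to be verified as well as demonstrated, a small check can complement the examples. The sketch below assumes a lowercase `type(scope): description` subject line like the ones in the examples above; the regular expression and the helper `is_valid_subject` are illustrative choices, not an official commit-message grammar.

```python
import re

# Hypothetical pattern for "type(scope): brief description" subject lines.
SUBJECT_PATTERN = re.compile(r"^[a-z]+\([a-z0-9-]+\): \S.*$")


def is_valid_subject(message):
    """Check only the first line of a commit message against the pattern."""
    first_line = message.splitlines()[0] if message else ""
    return bool(SUBJECT_PATTERN.match(first_line))


good = (
    "feat(auth): implement JWT-based authentication\n\n"
    "Add login endpoint and token validation middleware"
)
bad = "Fixed bug where dates displayed incorrectly in reports"

print(is_valid_subject(good))
print(is_valid_subject(bad))
```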
skill-creator/references/workflows.md ADDED
@@ -0,0 +1,28 @@
1
+ # Workflow Patterns
2
+
3
+ ## Sequential Workflows
4
+
5
+ For complex tasks, break operations into clear, sequential steps. It is often helpful to give Claude an overview of the process towards the beginning of SKILL.md:
6
+
7
+ ```markdown
8
+ Filling a PDF form involves these steps:
9
+
10
+ 1. Analyze the form (run analyze_form.py)
11
+ 2. Create field mapping (edit fields.json)
12
+ 3. Validate mapping (run validate_fields.py)
13
+ 4. Fill the form (run fill_form.py)
14
+ 5. Verify output (run verify_output.py)
15
+ ```
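A sequential workflow like the one above can also be driven by a small runner that stops at the first failing step. This is a sketch under stated assumptions: `run_steps` is a hypothetical helper, and the demo uses stand-in commands rather than the PDF scripts named in the list, whose interfaces are not shown here.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (label, command) pairs in order; return the labels that completed."""
    completed = []
    for label, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop at the first failure so later steps never see bad input.
            print(f"Step failed: {label}")
            break
        completed.append(label)
    return completed


# Stand-in commands so the sketch runs anywhere; a real skill would call its
# own scripts instead.
demo_steps = [
    ("analyze", [sys.executable, "-c", "print('analyzing')"]),
    ("validate", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("fill", [sys.executable, "-c", "print('filling')"]),
]
finished = run_steps(demo_steps)
print(finished)
```

Here the second step fails on purpose, so the third step is never run.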
16
+
17
+ ## Conditional Workflows
18
+
19
+ For tasks with branching logic, guide Claude through decision points:
20
+
21
+ ```markdown
22
+ 1. Determine the modification type:
+    **Creating new content?** → Follow "Creation workflow" below
+    **Editing existing content?** → Follow "Editing workflow" below
25
+
26
+ 2. Creation workflow: [steps]
27
+ 3. Editing workflow: [steps]
28
+ ```
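The decision point above amounts to a lookup from modification type to workflow. The sketch below shows one way to express that; the `WORKFLOWS` table, its step descriptions, and `choose_workflow` are all hypothetical placeholders, since the actual steps depend on the skill.

```python
# Hypothetical workflows; the step text is placeholder content for illustration.
WORKFLOWS = {
    "create": ["Draft the new content", "Review the draft"],
    "edit": ["Read the existing content", "Apply the change", "Review the result"],
}


def choose_workflow(modification_type):
    """Return the steps for a modification type, or raise for unknown types."""
    try:
        return WORKFLOWS[modification_type]
    except KeyError:
        raise ValueError(f"Unknown modification type: {modification_type!r}") from None


print(choose_workflow("edit"))
```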
skill-creator/scripts/init_skill.py ADDED
@@ -0,0 +1,303 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Skill Initializer - Creates a new skill from template
4
+
5
+ Usage:
6
+ init_skill.py <skill-name> --path <path>
7
+
8
+ Examples:
9
+ init_skill.py my-new-skill --path skills/public
10
+ init_skill.py my-api-helper --path skills/private
11
+ init_skill.py custom-skill --path /custom/location
12
+ """
13
+
14
+ import sys
15
+ from pathlib import Path
16
+
17
+
18
+ SKILL_TEMPLATE = """---
19
+ name: {skill_name}
20
+ description: [TODO: Complete and informative explanation of what the skill does and when to use it. Include WHEN to use this skill - specific scenarios, file types, or tasks that trigger it.]
21
+ ---
22
+
23
+ # {skill_title}
24
+
25
+ ## Overview
26
+
27
+ [TODO: 1-2 sentences explaining what this skill enables]
28
+
29
+ ## Structuring This Skill
30
+
31
+ [TODO: Choose the structure that best fits this skill's purpose. Common patterns:
32
+
33
+ **1. Workflow-Based** (best for sequential processes)
34
+ - Works well when there are clear step-by-step procedures
35
+ - Example: DOCX skill with "Workflow Decision Tree" → "Reading" → "Creating" → "Editing"
36
+ - Structure: ## Overview → ## Workflow Decision Tree → ## Step 1 → ## Step 2...
37
+
38
+ **2. Task-Based** (best for tool collections)
39
+ - Works well when the skill offers different operations/capabilities
40
+ - Example: PDF skill with "Quick Start" → "Merge PDFs" → "Split PDFs" → "Extract Text"
41
+ - Structure: ## Overview → ## Quick Start → ## Task Category 1 → ## Task Category 2...
42
+
43
+ **3. Reference/Guidelines** (best for standards or specifications)
44
+ - Works well for brand guidelines, coding standards, or requirements
45
+ - Example: Brand styling with "Brand Guidelines" → "Colors" → "Typography" → "Features"
46
+ - Structure: ## Overview → ## Guidelines → ## Specifications → ## Usage...
47
+
48
+ **4. Capabilities-Based** (best for integrated systems)
49
+ - Works well when the skill provides multiple interrelated features
50
+ - Example: Product Management with "Core Capabilities" → numbered capability list
51
+ - Structure: ## Overview → ## Core Capabilities → ### 1. Feature → ### 2. Feature...
52
+
53
+ Patterns can be mixed and matched as needed. Most skills combine patterns (e.g., start with task-based, add workflow for complex operations).
54
+
55
+ Delete this entire "Structuring This Skill" section when done - it's just guidance.]
56
+
57
+ ## [TODO: Replace with the first main section based on chosen structure]
58
+
59
+ [TODO: Add content here. See examples in existing skills:
60
+ - Code samples for technical skills
61
+ - Decision trees for complex workflows
62
+ - Concrete examples with realistic user requests
63
+ - References to scripts/templates/references as needed]
64
+
65
+ ## Resources
66
+
67
+ This skill includes example resource directories that demonstrate how to organize different types of bundled resources:
68
+
69
+ ### scripts/
70
+ Executable code (Python/Bash/etc.) that can be run directly to perform specific operations.
71
+
72
+ **Examples from other skills:**
73
+ - PDF skill: `fill_fillable_fields.py`, `extract_form_field_info.py` - utilities for PDF manipulation
74
+ - DOCX skill: `document.py`, `utilities.py` - Python modules for document processing
75
+
76
+ **Appropriate for:** Python scripts, shell scripts, or any executable code that performs automation, data processing, or specific operations.
77
+
78
+ **Note:** Scripts may be executed without loading into context, but can still be read by Claude for patching or environment adjustments.
79
+
80
+ ### references/
81
+ Documentation and reference material intended to be loaded into context to inform Claude's process and thinking.
82
+
83
+ **Examples from other skills:**
84
+ - Product management: `communication.md`, `context_building.md` - detailed workflow guides
85
+ - BigQuery: API reference documentation and query examples
86
+ - Finance: Schema documentation, company policies
87
+
88
+ **Appropriate for:** In-depth documentation, API references, database schemas, comprehensive guides, or any detailed information that Claude should reference while working.
89
+
90
+ ### assets/
91
+ Files not intended to be loaded into context, but rather used within the output Claude produces.
92
+
93
+ **Examples from other skills:**
94
+ - Brand styling: PowerPoint template files (.pptx), logo files
95
+ - Frontend builder: HTML/React boilerplate project directories
96
+ - Typography: Font files (.ttf, .woff2)
97
+
98
+ **Appropriate for:** Templates, boilerplate code, document templates, images, icons, fonts, or any files meant to be copied or used in the final output.
99
+
100
+ ---
101
+
102
+ **Any unneeded directories can be deleted.** Not every skill requires all three types of resources.
103
+ """
104
+
105
+ EXAMPLE_SCRIPT = '''#!/usr/bin/env python3
106
+ """
107
+ Example helper script for {skill_name}
108
+
109
+ This is a placeholder script that can be executed directly.
110
+ Replace with actual implementation or delete if not needed.
111
+
112
+ Example real scripts from other skills:
113
+ - pdf/scripts/fill_fillable_fields.py - Fills PDF form fields
114
+ - pdf/scripts/convert_pdf_to_images.py - Converts PDF pages to images
115
+ """
116
+
117
+ def main():
+     print("This is an example script for {skill_name}")
+     # TODO: Add actual script logic here
+     # This could be data processing, file conversion, API calls, etc.
+
+ if __name__ == "__main__":
+     main()
124
+ '''
125
+
126
+ EXAMPLE_REFERENCE = """# Reference Documentation for {skill_title}
127
+
128
+ This is a placeholder for detailed reference documentation.
129
+ Replace with actual reference content or delete if not needed.
130
+
131
+ Example real reference docs from other skills:
132
+ - product-management/references/communication.md - Comprehensive guide for status updates
133
+ - product-management/references/context_building.md - Deep-dive on gathering context
134
+ - bigquery/references/ - API references and query examples
135
+
136
+ ## When Reference Docs Are Useful
137
+
138
+ Reference docs are ideal for:
139
+ - Comprehensive API documentation
140
+ - Detailed workflow guides
141
+ - Complex multi-step processes
142
+ - Information too lengthy for main SKILL.md
143
+ - Content that's only needed for specific use cases
144
+
145
+ ## Structure Suggestions
146
+
147
+ ### API Reference Example
148
+ - Overview
149
+ - Authentication
150
+ - Endpoints with examples
151
+ - Error codes
152
+ - Rate limits
153
+
154
+ ### Workflow Guide Example
155
+ - Prerequisites
156
+ - Step-by-step instructions
157
+ - Common patterns
158
+ - Troubleshooting
159
+ - Best practices
160
+ """
161
+
162
+ EXAMPLE_ASSET = """# Example Asset File
163
+
164
+ This placeholder represents where asset files would be stored.
165
+ Replace with actual asset files (templates, images, fonts, etc.) or delete if not needed.
166
+
167
+ Asset files are NOT intended to be loaded into context, but rather used within
168
+ the output Claude produces.
169
+
170
+ Example asset files from other skills:
171
+ - Brand guidelines: logo.png, slides_template.pptx
172
+ - Frontend builder: hello-world/ directory with HTML/React boilerplate
173
+ - Typography: custom-font.ttf, font-family.woff2
174
+ - Data: sample_data.csv, test_dataset.json
175
+
176
+ ## Common Asset Types
177
+
178
+ - Templates: .pptx, .docx, boilerplate directories
179
+ - Images: .png, .jpg, .svg, .gif
180
+ - Fonts: .ttf, .otf, .woff, .woff2
181
+ - Boilerplate code: Project directories, starter files
182
+ - Icons: .ico, .svg
183
+ - Data files: .csv, .json, .xml, .yaml
184
+
185
+ Note: This is a text placeholder. Actual assets can be any file type.
186
+ """
187
+
188
+
189
+ def title_case_skill_name(skill_name):
+     """Convert hyphenated skill name to Title Case for display."""
+     return ' '.join(word.capitalize() for word in skill_name.split('-'))
+
+
+ def init_skill(skill_name, path):
+     """
+     Initialize a new skill directory with template SKILL.md.
+
+     Args:
+         skill_name: Name of the skill
+         path: Path where the skill directory should be created
+
+     Returns:
+         Path to created skill directory, or None if error
+     """
+     # Determine skill directory path
+     skill_dir = Path(path).resolve() / skill_name
+
+     # Check if directory already exists
+     if skill_dir.exists():
+         print(f"❌ Error: Skill directory already exists: {skill_dir}")
+         return None
+
+     # Create skill directory
+     try:
+         skill_dir.mkdir(parents=True, exist_ok=False)
+         print(f"✅ Created skill directory: {skill_dir}")
+     except Exception as e:
+         print(f"❌ Error creating directory: {e}")
+         return None
+
+     # Create SKILL.md from template
+     skill_title = title_case_skill_name(skill_name)
+     skill_content = SKILL_TEMPLATE.format(
+         skill_name=skill_name,
+         skill_title=skill_title
+     )
+
+     skill_md_path = skill_dir / 'SKILL.md'
+     try:
+         skill_md_path.write_text(skill_content)
+         print("✅ Created SKILL.md")
+     except Exception as e:
+         print(f"❌ Error creating SKILL.md: {e}")
+         return None
+
+     # Create resource directories with example files
+     try:
+         # Create scripts/ directory with example script
+         scripts_dir = skill_dir / 'scripts'
+         scripts_dir.mkdir(exist_ok=True)
+         example_script = scripts_dir / 'example.py'
+         example_script.write_text(EXAMPLE_SCRIPT.format(skill_name=skill_name))
+         example_script.chmod(0o755)
+         print("✅ Created scripts/example.py")
+
+         # Create references/ directory with example reference doc
+         references_dir = skill_dir / 'references'
+         references_dir.mkdir(exist_ok=True)
+         example_reference = references_dir / 'api_reference.md'
+         example_reference.write_text(EXAMPLE_REFERENCE.format(skill_title=skill_title))
+         print("✅ Created references/api_reference.md")
+
+         # Create assets/ directory with example asset placeholder
+         assets_dir = skill_dir / 'assets'
+         assets_dir.mkdir(exist_ok=True)
+         example_asset = assets_dir / 'example_asset.txt'
+         example_asset.write_text(EXAMPLE_ASSET)
+         print("✅ Created assets/example_asset.txt")
+     except Exception as e:
+         print(f"❌ Error creating resource directories: {e}")
+         return None
+
+     # Print next steps
+     print(f"\n✅ Skill '{skill_name}' initialized successfully at {skill_dir}")
+     print("\nNext steps:")
+     print("1. Edit SKILL.md to complete the TODO items and update the description")
+     print("2. Customize or delete the example files in scripts/, references/, and assets/")
+     print("3. Run the validator when ready to check the skill structure")
+
+     return skill_dir
+
+
+ def main():
+     if len(sys.argv) < 4 or sys.argv[2] != '--path':
+         print("Usage: init_skill.py <skill-name> --path <path>")
+         print("\nSkill name requirements:")
+         print(" - Hyphen-case identifier (e.g., 'data-analyzer')")
+         print(" - Lowercase letters, digits, and hyphens only")
+         print(" - Max 40 characters")
+         print(" - Must match directory name exactly")
+         print("\nExamples:")
+         print(" init_skill.py my-new-skill --path skills/public")
+         print(" init_skill.py my-api-helper --path skills/private")
+         print(" init_skill.py custom-skill --path /custom/location")
+         sys.exit(1)
+
+     skill_name = sys.argv[1]
+     path = sys.argv[3]
+
+     print(f"🚀 Initializing skill: {skill_name}")
+     print(f" Location: {path}")
+     print()
+
+     result = init_skill(skill_name, path)
+
+     if result:
+         sys.exit(0)
+     else:
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
skill-creator/scripts/package_skill.py ADDED
@@ -0,0 +1,110 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Skill Packager - Creates a distributable .skill file of a skill folder
4
+
5
+ Usage:
6
+     python scripts/package_skill.py <path/to/skill-folder> [output-directory]
+
+ Example:
+     python scripts/package_skill.py skills/public/my-skill
+     python scripts/package_skill.py skills/public/my-skill ./dist
11
+ """
12
+
13
+ import sys
14
+ import zipfile
15
+ from pathlib import Path
16
+ from quick_validate import validate_skill
17
+
18
+
19
+ def package_skill(skill_path, output_dir=None):
+     """
+     Package a skill folder into a .skill file.
+
+     Args:
+         skill_path: Path to the skill folder
+         output_dir: Optional output directory for the .skill file (defaults to current directory)
+
+     Returns:
+         Path to the created .skill file, or None if error
+     """
+     skill_path = Path(skill_path).resolve()
+
+     # Validate skill folder exists
+     if not skill_path.exists():
+         print(f"❌ Error: Skill folder not found: {skill_path}")
+         return None
+
+     if not skill_path.is_dir():
+         print(f"❌ Error: Path is not a directory: {skill_path}")
+         return None
+
+     # Validate SKILL.md exists
+     skill_md = skill_path / "SKILL.md"
+     if not skill_md.exists():
+         print(f"❌ Error: SKILL.md not found in {skill_path}")
+         return None
+
+     # Run validation before packaging
+     print("🔍 Validating skill...")
+     valid, message = validate_skill(skill_path)
+     if not valid:
+         print(f"❌ Validation failed: {message}")
+         print(" Please fix the validation errors before packaging.")
+         return None
+     print(f"✅ {message}\n")
+
+     # Determine output location
+     skill_name = skill_path.name
+     if output_dir:
+         output_path = Path(output_dir).resolve()
+         output_path.mkdir(parents=True, exist_ok=True)
+     else:
+         output_path = Path.cwd()
+
+     skill_filename = output_path / f"{skill_name}.skill"
+
+     # Create the .skill file (zip format)
+     try:
+         with zipfile.ZipFile(skill_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
+             # Walk through the skill directory
+             for file_path in skill_path.rglob('*'):
+                 if file_path.is_file():
+                     # Store paths relative to the parent so the archive keeps the skill folder as its root
+                     arcname = file_path.relative_to(skill_path.parent)
+                     zipf.write(file_path, arcname)
+                     print(f" Added: {arcname}")
+
+         print(f"\n✅ Successfully packaged skill to: {skill_filename}")
+         return skill_filename
+
+     except Exception as e:
+         print(f"❌ Error creating .skill file: {e}")
+         return None
+
+
+ def main():
+     if len(sys.argv) < 2:
+         print("Usage: python scripts/package_skill.py <path/to/skill-folder> [output-directory]")
+         print("\nExample:")
+         print(" python scripts/package_skill.py skills/public/my-skill")
+         print(" python scripts/package_skill.py skills/public/my-skill ./dist")
+         sys.exit(1)
+
+     skill_path = sys.argv[1]
+     output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+     print(f"📦 Packaging skill: {skill_path}")
+     if output_dir:
+         print(f" Output directory: {output_dir}")
+     print()
+
+     result = package_skill(skill_path, output_dir)
+
+     if result:
+         sys.exit(0)
+     else:
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
skill-creator/scripts/quick_validate.py ADDED
@@ -0,0 +1,95 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Quick validation script for skills - minimal version
4
+ """
5
+
6
+ import sys
7
+ import os
8
+ import re
9
+ import yaml
10
+ from pathlib import Path
11
+
12
+ def validate_skill(skill_path):
+     """Basic validation of a skill"""
+     skill_path = Path(skill_path)
+
+     # Check SKILL.md exists
+     skill_md = skill_path / 'SKILL.md'
+     if not skill_md.exists():
+         return False, "SKILL.md not found"
+
+     # Read and validate frontmatter
+     content = skill_md.read_text()
+     if not content.startswith('---'):
+         return False, "No YAML frontmatter found"
+
+     # Extract frontmatter
+     match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
+     if not match:
+         return False, "Invalid frontmatter format"
+
+     frontmatter_text = match.group(1)
+
+     # Parse YAML frontmatter
+     try:
+         frontmatter = yaml.safe_load(frontmatter_text)
+         if not isinstance(frontmatter, dict):
+             return False, "Frontmatter must be a YAML dictionary"
+     except yaml.YAMLError as e:
+         return False, f"Invalid YAML in frontmatter: {e}"
+
+     # Define allowed properties
+     ALLOWED_PROPERTIES = {'name', 'description', 'license', 'allowed-tools', 'metadata'}
+
+     # Check for unexpected properties (excluding nested keys under metadata)
+     unexpected_keys = set(frontmatter.keys()) - ALLOWED_PROPERTIES
+     if unexpected_keys:
+         return False, (
+             f"Unexpected key(s) in SKILL.md frontmatter: {', '.join(sorted(unexpected_keys))}. "
+             f"Allowed properties are: {', '.join(sorted(ALLOWED_PROPERTIES))}"
+         )
+
+     # Check required fields
+     if 'name' not in frontmatter:
+         return False, "Missing 'name' in frontmatter"
+     if 'description' not in frontmatter:
+         return False, "Missing 'description' in frontmatter"
+
+     # Extract name for validation
+     name = frontmatter.get('name', '')
+     if not isinstance(name, str):
+         return False, f"Name must be a string, got {type(name).__name__}"
+     name = name.strip()
+     if name:
+         # Check naming convention (hyphen-case: lowercase with hyphens)
+         if not re.match(r'^[a-z0-9-]+$', name):
+             return False, f"Name '{name}' should be hyphen-case (lowercase letters, digits, and hyphens only)"
+         if name.startswith('-') or name.endswith('-') or '--' in name:
+             return False, f"Name '{name}' cannot start/end with hyphen or contain consecutive hyphens"
69
+ # Check name length (max 64 characters per spec)
70
+ if len(name) > 64:
71
+ return False, f"Name is too long ({len(name)} characters). Maximum is 64 characters."
72
+
73
+ # Extract and validate description
74
+ description = frontmatter.get('description', '')
75
+ if not isinstance(description, str):
76
+ return False, f"Description must be a string, got {type(description).__name__}"
77
+ description = description.strip()
78
+ if description:
79
+ # Check for angle brackets
80
+ if '<' in description or '>' in description:
81
+ return False, "Description cannot contain angle brackets (< or >)"
82
+ # Check description length (max 1024 characters per spec)
83
+ if len(description) > 1024:
84
+ return False, f"Description is too long ({len(description)} characters). Maximum is 1024 characters."
85
+
86
+ return True, "Skill is valid!"
87
+
88
+ if __name__ == "__main__":
89
+ if len(sys.argv) != 2:
90
+ print("Usage: python quick_validate.py <skill_directory>")
91
+ sys.exit(1)
92
+
93
+ valid, message = validate_skill(sys.argv[1])
94
+ print(message)
95
+ sys.exit(0 if valid else 1)
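
The name rules in `validate_skill` can be exercised on their own, without a `SKILL.md` on disk. The following is a minimal sketch and not part of the commit: `check_name` and `MAX_NAME_LENGTH` are hypothetical names introduced here, and the short messages are illustrative. It mirrors the three checks above (hyphen-case pattern, hyphen placement, 64-character limit) and, like the original, skips them when the name is empty after stripping.

```python
import re

MAX_NAME_LENGTH = 64  # same limit as the length check in validate_skill


def check_name(name):
    """Hypothetical helper mirroring the name checks in validate_skill."""
    name = name.strip()
    if not name:
        # validate_skill only runs the checks below for a non-empty name
        return True, "empty (checks skipped)"
    # Lowercase letters, digits, and hyphens only
    if not re.match(r'^[a-z0-9-]+$', name):
        return False, "not hyphen-case"
    # No leading, trailing, or consecutive hyphens
    if name.startswith('-') or name.endswith('-') or '--' in name:
        return False, "bad hyphen placement"
    if len(name) > MAX_NAME_LENGTH:
        return False, "too long"
    return True, "ok"


if __name__ == "__main__":
    for candidate in ["mcp-builder", "MCP_Builder", "-draft", "a--b", "x" * 65]:
        print(candidate[:20], check_name(candidate))
```

Because the checks run in this order, a name such as `-draft` passes the character-class pattern and is only rejected by the hyphen-placement check.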