Nicolai-Rhesis-AI commited on
Commit
4371926
·
verified ·
1 Parent(s): daa3803

Update README.md

Browse files

Aligned HF description with GH.

Files changed (1) hide show
  1. README.md +256 -36
README.md CHANGED
@@ -1,71 +1,291 @@
1
- # Rhesis: Open-Source Gen AI Testing 🦫
2
 
3
  <p align="center">
4
- <img src="https://cdn.prod.website-files.com/68c3e3b148a4fd9bcf76eb6a/68c95daec03defb40e24fca4_Rhesis%20AI_Logo_RGB_Website%20logo-p-500.png" alt="Rhesis Logo" width="200"/>
5
  </p>
6
 
7
  <p align="center">
8
  <a href="https://github.com/rhesis-ai/rhesis/blob/main/LICENSE">
9
- <img src="https://img.shields.io/badge/license-MIT%20%2B%20Commercial-blue" alt="License" style="display:inline-block;">
10
  </a>
11
  <a href="https://pypi.org/project/rhesis-sdk/">
12
- <img src="https://img.shields.io/pypi/v/rhesis-sdk" alt="PyPI Version" style="display:inline-block;">
13
  </a>
14
  <a href="https://pypi.org/project/rhesis-sdk/">
15
- <img src="https://img.shields.io/pypi/pyversions/rhesis-sdk" alt="Python Versions" style="display:inline-block;">
 
 
 
16
  </a>
17
  <a href="https://discord.rhesis.ai">
18
- <img src="https://img.shields.io/discord/1340989671601209408?color=7289da&label=Discord&logo=discord&logoColor=white" alt="Discord" style="display:inline-block;">
19
  </a>
20
  <a href="https://www.linkedin.com/company/rhesis-ai">
21
- <img src="https://img.shields.io/badge/LinkedIn-Rhesis_AI-blue?logo=linkedin" alt="LinkedIn" style="display:inline-block;">
 
 
 
22
  </a>
23
  <a href="https://docs.rhesis.ai">
24
- <img src="https://img.shields.io/badge/docs-rhesis.ai-blue" alt="Documentation" style="display:inline-block;">
25
  </a>
26
  </p>
27
 
28
- > Your team defines expectations, Rhesis generates and executes thousands of test scenarios. So that you know what you ship.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29
 
30
- Open-source Gen AI testing platform. Collaborative test management that turns domain expertise into comprehensive automated testing. These datasets help teams assess Gen AI application robustness, reliability, and compliance across real-world scenarios.
31
 
32
- ### Using our datasets
33
 
34
- Our datasets are designed to test various aspects of LLM application behavior, from reliability to safety and bias detection. To get started:
35
 
36
- 1. Browse the available test sets here on Hugging Face.
37
- 2. Select the dataset that aligns with your evaluation needs.
38
- 3. Load and apply the test cases to assess your application’s behavior.
 
 
 
39
 
40
- For more advanced testing and seamless integration, the [Rhesis SDK](https://github.com/rhesis-ai/rhesis-sdk) provides tools to automate dataset handling, generate structured test cases, and streamline evaluation workflows.
41
 
42
- ## Key features
 
 
 
43
 
44
- - **Curated Test Sets** – Pre-built datasets covering diverse evaluation criteria.
45
- - **Dynamic Test Generation** – Generate custom test sets tailored to specific use cases.
46
- - **Scalability** – Use datasets for one-off evaluations or integrate them into automated testing pipelines.
47
 
48
- For questions or custom datasets, reach out at **hello@rhesis.ai**.
 
49
 
50
- ### Example use cases:
 
 
 
 
 
 
 
51
 
52
- - **AI Financial Advisor**:
53
- Evaluate the reliability and accuracy of financial guidance provided by LLM applications, ensuring sound advice for users.
54
-
55
- - **AI Claim Processing**:
56
- Test for and eliminate biases in LLM-supported claim decisions, ensuring fair and compliant processing of insurance claims.
57
 
58
- - **AI Sales Advisor**:
59
- Validate the accuracy of product recommendations, enhancing customer satisfaction and driving more successful sales.
60
 
61
- - **AI Support Chatbot**:
62
- Ensure that your chatbot consistently delivers helpful, accurate, and empathetic responses across various scenarios.
63
 
64
- ### Disclaimer
65
 
66
- Some test cases may contain sensitive, challenging, or potentially upsetting content. These cases are included to ensure thorough and realistic assessments. Users should review test cases carefully and exercise discretion when utilizing them.
67
 
68
- ### Connect with us
69
 
70
- For more details about our testing platform, datasets, and solutions, including the Rhesis AI SDK, visit [Rhesis AI](https://www.rhesis.ai/).
71
- Join our **[Discord community](https://discord.rhesis.ai)** to connect with other AI engineers, discuss best practices, and stay updated on new test sets.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Rhesis: Collaborative Testing for LLM & Agentic Applications
2
 
3
  <p align="center">
4
+ <img src="https://github.com/user-attachments/assets/ff43ca6a-ffde-4aff-9ff9-eec3897d0d02" alt="Rhesis AI Logo" height="80">
5
  </p>
6
 
7
  <p align="center">
8
  <a href="https://github.com/rhesis-ai/rhesis/blob/main/LICENSE">
9
+ <img src="https://img.shields.io/badge/license-MIT%20%2B%20Enterprise-blue" alt="License">
10
  </a>
11
  <a href="https://pypi.org/project/rhesis-sdk/">
12
+ <img src="https://img.shields.io/pypi/v/rhesis-sdk" alt="PyPI Version">
13
  </a>
14
  <a href="https://pypi.org/project/rhesis-sdk/">
15
+ <img src="https://img.shields.io/pypi/pyversions/rhesis-sdk" alt="Python Versions">
16
+ </a>
17
+ <a href="https://codecov.io/gh/rhesis-ai/rhesis">
18
+ <img src="https://codecov.io/gh/rhesis-ai/rhesis/graph/badge.svg?token=1XQV983JEJ" alt="codecov">
19
  </a>
20
  <a href="https://discord.rhesis.ai">
21
+ <img src="https://img.shields.io/discord/1340989671601209408?color=7289da&label=Discord&logo=discord&logoColor=white" alt="Discord">
22
  </a>
23
  <a href="https://www.linkedin.com/company/rhesis-ai">
24
+ <img src="https://img.shields.io/badge/LinkedIn-Rhesis_AI-blue?logo=linkedin" alt="LinkedIn">
25
+ </a>
26
+ <a href="https://huggingface.co/rhesis">
27
+ <img src="https://img.shields.io/badge/🤗-Rhesis-yellow" alt="Hugging Face">
28
  </a>
29
  <a href="https://docs.rhesis.ai">
30
+ <img src="https://img.shields.io/badge/docs-rhesis.ai-blue" alt="Documentation">
31
  </a>
32
  </p>
33
 
34
+ <p align="center">
35
+ <a href="https://rhesis.ai"><strong>Website</strong></a> ·
36
+ <a href="https://docs.rhesis.ai"><strong>Docs</strong></a> ·
37
+ <a href="https://discord.rhesis.ai"><strong>Discord</strong></a> ·
38
+ <a href="https://github.com/rhesis-ai/rhesis/blob/main/CHANGELOG.md"><strong>Changelog</strong></a>
39
+ </p>
40
+
41
+ <h3 align="center">More than just evals.<br><strong>Collaborative agent testing for teams.</strong></h3>
42
+
43
+ <p align="center">
44
+ Generate tests from requirements, simulate conversation flows, detect adversarial behaviors, evaluate with 60+ metrics, and trace failures with OpenTelemetry. Engineers and domain experts, working together.
45
+ </p>
46
+
47
+ <p align="center">
48
+ <a href="https://rhesis.ai/?video=open" target="_blank">
49
+ <img src="https://github.com/rhesis-ai/rhesis/blob/main/.github/images/GH_Short_Demo.png"
50
+ loading="lazy"
51
+ width="1080"
52
+ alt="Rhesis Platform Overview - Click to watch demo">
53
+ </a>
54
+ </p>
55
+
56
+ ---
57
+
58
+ ## Core features
59
+
60
+ <p align="center">
61
+ <img src="https://github.com/rhesis-ai/rhesis/blob/main/.github/images/GH_Features.png"
62
+ loading="lazy"
63
+ width="1080"
64
+ alt="Rhesis Core Features">
65
+ </p>
66
+
67
+ ### Test generation
68
+
69
+ **AI-Powered Synthesis** - Describe requirements in plain language. Rhesis generates hundreds of test scenarios including edge cases and adversarial prompts.
70
+
71
+ **Knowledge-Aware** - Connect context sources via file upload or MCP (Notion, GitHub, Jira, Confluence) for better test generation.
72
+
73
+ ### Single-turn & conversation simulation
74
+
75
+ **Single-turn** for Q&A validation. **Conversation simulation** for dialogue flows.
76
+
77
+ **Penelope Agent** simulates realistic conversations to test context retention, role adherence, and dialogue coherence across extended interactions.
78
+
79
+ ### Adversarial testing (red-teaming)
80
+
81
+ **Polyphemus Agent** proactively finds vulnerabilities:
82
+
83
+ - Jailbreak attempts and prompt injection
84
+ - PII leakage and data extraction
85
+ - Harmful content generation
86
+ - Role violation and instruction bypassing
87
+
88
+ **Garak Integration** - Built-in support for [garak](https://github.com/leondz/garak), the LLM vulnerability scanner, for comprehensive security testing.
89
+
90
+ ### 60+ pre-built metrics
91
+
92
+ | Framework | Example Metrics |
93
+ |-----------|-----------------|
94
+ | **RAGAS** | Context relevance, faithfulness, answer accuracy |
95
+ | **DeepEval** | Bias, toxicity, PII leakage, role violation, turn relevancy, knowledge retention |
96
+ | **Garak** | Jailbreak detection, prompt injection, XSS, malware generation, data leakage |
97
+ | **Custom** | NumericJudge, CategoricalJudge for domain-specific evaluation |
98
+
99
+ All metrics include LLM-as-Judge reasoning explanations.
100
+
101
+ ### Traces & observability
102
+
103
+ Monitor your LLM applications with OpenTelemetry-based tracing:
104
+
105
+ ```python
106
+ from rhesis.sdk.decorators import observe
107
+
108
+ @observe.llm(model="gpt-4")
109
+ def generate_response(prompt: str) -> str:
110
+ # Your LLM call here
111
+ return response
112
+ ```
113
+
114
+ Track LLM calls, latency, token usage, and link traces to test results for debugging.
115
+
116
+ ### Bring your own model
117
+
118
+ Use any LLM provider for test generation and evaluation:
119
+
120
+ **Cloud:** OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Groq, Together AI
121
+
122
+ **Local/Self-hosted:** Ollama, vLLM, LiteLLM
123
+
124
+ See [Model Configuration Docs](https://docs.rhesis.ai/sdk/models) for setup instructions.
125
+
126
+ ---
127
 
128
+ ## Curated Test Sets on Hugging Face
129
 
130
+ We publish curated test datasets on [Hugging Face](https://huggingface.co/rhesis) to help teams assess their LLM applications. These test sets cover diverse evaluation scenarios across conversational AI, agentic systems, RAG applications, and more—helping you validate robustness, reliability, safety, and compliance.
131
 
132
+ ### What's available
133
 
134
+ Test sets designed for:
135
+ - **Conversational AI** - Multi-turn dialogue, context retention, role adherence
136
+ - **Agentic Systems** - Tool selection, goal achievement, multi-agent coordination
137
+ - **RAG Systems** - Context relevance, faithfulness, hallucination detection
138
+ - **Adversarial Testing** - Jailbreak resistance, prompt injection, PII leakage
139
+ - **Domain-Specific Applications** - Finance, healthcare, customer support, sales, and more
140
 
141
+ ### Using our test sets
142
 
143
+ **Option 1: Rhesis Platform**
144
+ 1. Download a test set from [Hugging Face](https://huggingface.co/rhesis)
145
+ 2. In the Rhesis platform, navigate to **Test Sets** → **Import from file**
146
+ 3. Upload the downloaded CSV file
147
 
148
+ **Option 2: Python SDK**
 
 
149
 
150
+ ```python
151
+ from rhesis.sdk import TestSet
152
 
153
+ # Load tests from a CSV file downloaded from Hugging Face
154
+ test_set = TestSet.from_csv(
155
+ "tests.csv",
156
+ name="Imported Tests",
157
+ description="Tests imported from Hugging Face"
158
+ )
159
+ print(f"Loaded {len(test_set.tests)} tests")
160
+ ```
161
 
162
+ > **Disclaimer:** Some test cases may contain sensitive or challenging content included for thorough realistic assessment. Review test cases carefully and exercise discretion when utilizing them.
 
 
 
 
163
 
164
+ ---
 
165
 
166
+ ## Why Rhesis?
 
167
 
168
+ **Platform for teams. SDK for developers.**
169
 
170
+ Use the collaborative platform for team-based testing: product managers define requirements, domain experts review results, engineers integrate via CI/CD. Or integrate directly with the Python SDK for code-first workflows.
171
 
172
+ ### The testing lifecycle
173
 
174
+ Six integrated phases from project setup to team collaboration:
175
+
176
+ | Phase | What You Do |
177
+ |--------------------------------|-------------|
178
+ | **[1. Projects](https://docs.rhesis.ai/platform/projects)** | Configure your AI application, upload & connect context sources (files, docs), set up SDK connectors |
179
+ | **[2. Requirements](https://docs.rhesis.ai/platform/behaviors)** | Define expected behaviors (what your app should and shouldn't do), cover all relevant aspects from product, marketing, customer support, legal and compliance teams |
180
+ | **[3. Metrics](https://docs.rhesis.ai/platform/metrics)** | Select from 60+ pre-built metrics or create custom LLM-as-Judge evaluations to assess whether your requirements are met |
181
+ | **[4. Tests](https://docs.rhesis.ai/platform/tests)** | Generate single-turn and conversation simulation test scenarios. Organize in test sets and understand your test coverage |
182
+ | **[5. Execution](https://docs.rhesis.ai/platform/test-execution)** | Run tests via UI, SDK, or API; integrate into CI/CD pipelines; collect traces during execution |
183
+ | **[6. Collaboration](https://docs.rhesis.ai/platform/test-runs)** | Review results with your team through comments, tasks, workflows, and side-by-side comparisons |
184
+
185
+ ### Rhesis vs...
186
+
187
+ | Instead of... | Rhesis gives you... |
188
+ |---------------|---------------------|
189
+ | **Manual testing** | AI-generated test cases based on your context, hundreds in minutes |
190
+ | **Traditional test frameworks** | Non-deterministic output handling built-in |
191
+ | **LLM observability tools** | Pre-production validation, not post-production monitoring |
192
+ | **Red-teaming services** | Continuous, self-service adversarial testing, not one-time audits |
193
+
194
+ ---
195
+
196
+ ## Deployment options
197
+
198
+ | Option | Best For | Setup Time |
199
+ |--------|----------|------------|
200
+ | **[Rhesis Cloud](https://app.rhesis.ai)** | Teams wanting managed deployment | Instant |
201
+ | **Docker** | Local development and testing | 5 minutes |
202
+ | **Kubernetes** | Production self-hosting | [See docs](https://docs.rhesis.ai/getting-started/self-hosting) |
203
+
204
+ ### Quick Start
205
+
206
+ **Option 1: Cloud (fastest)** - [app.rhesis.ai](https://app.rhesis.ai) - Managed service, just connect your app
207
+
208
+ **Option 2: Self-host with Docker**
209
+ ```bash
210
+ git clone https://github.com/rhesis-ai/rhesis.git && cd rhesis && ./rh start
211
+ ```
212
+
213
+ **Access:** Frontend at `localhost:3000`, API at `localhost:8080/docs`
214
+
215
+ **Commands:** `./rh logs` · `./rh stop` · `./rh restart` · `./rh delete`
216
+
217
+ > **Note:** This setup enables auto-login for local testing. For production, see [Self-hosting Documentation](https://docs.rhesis.ai/getting-started/self-hosting).
218
+
219
+ **Option 3: Python SDK**
220
+ ```bash
221
+ pip install rhesis-sdk
222
+ ```
223
+
224
+ ---
225
+
226
+ ## Integrations
227
+
228
+ Connect Rhesis to your LLM stack:
229
+
230
+ | Integration | Languages | Description |
231
+ |-------------|-----------|-------------|
232
+ | **Rhesis SDK** | Python, JS/TS | Native SDK with decorators for endpoints and observability. Full control over test execution and tracing. |
233
+ | **OpenAI** | Python | Drop-in replacement for OpenAI SDK. Automatic instrumentation with zero code changes. |
234
+ | **Anthropic** | Python | Native support for Claude models with automatic tracing. |
235
+ | **LangChain** | Python | Add Rhesis callback handler to your LangChain app for automatic tracing and test execution. |
236
+ | **LangGraph** | Python | Built-in integration for LangGraph agent workflows with full observability. |
237
+ | **AutoGen** | Python | Automatic instrumentation for Microsoft AutoGen multi-agent conversations. |
238
+ | **LiteLLM** | Python | Unified interface for 100+ LLMs (OpenAI, Azure, Anthropic, Cohere, Ollama, vLLM, HuggingFace, Replicate). |
239
+ | **Google Gemini** | Python | Native integration for Google's Gemini models. |
240
+ | **Ollama** | Python | Local LLM deployment with Ollama integration. |
241
+ | **OpenRouter** | Python | Access to multiple LLM providers through OpenRouter. |
242
+ | **Vertex AI** | Python | Google Cloud Vertex AI model support. |
243
+ | **HuggingFace** | Python | Direct integration with HuggingFace models. |
244
+ | **REST API** | Any | Direct API access for custom integrations. [OpenAPI spec available](https://api.rhesis.ai/docs). |
245
+
246
+ See [Integration Docs](https://docs.rhesis.ai/development) for setup instructions.
247
+
248
+ ---
249
+
250
+ ## Open source
251
+
252
+ [MIT licensed](LICENSE). No plans to relicense core features. Enterprise version live in `ee/` folders and remain separate.
253
+
254
+ We built Rhesis because existing LLM testing tools didn't meet our needs. If you face the same challenges, contributions are welcome.
255
+
256
+ ---
257
+
258
+ ## Contributing
259
+
260
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
261
+
262
+ **Ways to contribute:** Fix bugs or add features · Contribute test sets for common failure modes · Improve documentation · Help others in Discord or GitHub discussions
263
+
264
+ ---
265
+
266
+ ## Support
267
+
268
+ - **[Documentation](https://docs.rhesis.ai)** - Guides and API reference
269
+ - **[Discord](https://discord.rhesis.ai)** - Community support
270
+ - **[GitHub Issues](https://github.com/rhesis-ai/rhesis/issues)** - Bug reports and feature requests
271
+
272
+ ---
273
+
274
+ ## Security & privacy
275
+
276
+ We take data security seriously. See our [Privacy Policy](https://rhesis.ai/privacy-policy) for details.
277
+
278
+ **Telemetry:** Rhesis collects basic, anonymized usage statistics to improve the product. No sensitive data is collected or shared with third parties.
279
+
280
+ - **Self-hosted:** Opt out by setting `OTEL_RHESIS_TELEMETRY_ENABLED=false`
281
+ - **Cloud:** Telemetry enabled as part of Terms & Conditions
282
+
283
+ ---
284
+
285
+ <p align="center">
286
+ <strong>Made with <img src="https://github.com/user-attachments/assets/598c2d81-572c-46bd-b718-dee32cdc749c" height="16" alt="Rhesis logo"> in Potsdam, Germany 🇩🇪</strong>
287
+ </p>
288
+
289
+ <p align="center">
290
+ <a href="https://rhesis.ai">Learn more at rhesis.ai</a>
291
+ </p>