Spaces: rhesis/README

Nicolai-Rhesis-AI committed on Feb 16

Commit 4371926 (verified) · 1 Parent(s): daa3803

Update README.md

Aligned HF description with GH.

Files changed (1)

README.md +256 -36

README.md CHANGED

@@ -1,71 +1,291 @@
-# Rhesis: Open-Source Gen AI Testing 🦫
 <p align="center">
-  <img src="https://cdn.prod.website-files.com/68c3e3b148a4fd9bcf76eb6a/68c95daec03defb40e24fca4_Rhesis%20AI_Logo_RGB_Website%20logo-p-500.png" alt="Rhesis Logo" width="200"/>
 </p>
 <p align="center">
   <a href="https://github.com/rhesis-ai/rhesis/blob/main/LICENSE">
-    <img src="https://img.shields.io/badge/license-MIT%20%2B%20Commercial-blue" alt="License" style="display:inline-block;">
   </a>
   <a href="https://pypi.org/project/rhesis-sdk/">
-    <img src="https://img.shields.io/pypi/v/rhesis-sdk" alt="PyPI Version" style="display:inline-block;">
   </a>
   <a href="https://pypi.org/project/rhesis-sdk/">
-    <img src="https://img.shields.io/pypi/pyversions/rhesis-sdk" alt="Python Versions" style="display:inline-block;">
   </a>
   <a href="https://discord.rhesis.ai">
-    <img src="https://img.shields.io/discord/1340989671601209408?color=7289da&label=Discord&logo=discord&logoColor=white" alt="Discord" style="display:inline-block;">
   </a>
   <a href="https://www.linkedin.com/company/rhesis-ai">
-    <img src="https://img.shields.io/badge/LinkedIn-Rhesis_AI-blue?logo=linkedin" alt="LinkedIn" style="display:inline-block;">
   </a>
   <a href="https://docs.rhesis.ai">
-    <img src="https://img.shields.io/badge/docs-rhesis.ai-blue" alt="Documentation" style="display:inline-block;">
   </a>
 </p>
-> Your team defines expectations, Rhesis generates and executes thousands of test scenarios. So that you know what you ship.
-Open-source Gen AI testing platform. Collaborative test management that turns domain expertise into comprehensive automated testing. These datasets help teams assess Gen AI application robustness, reliability, and compliance across real-world scenarios.
-### Using our datasets
-Our datasets are designed to test various aspects of LLM application behavior, from reliability to safety and bias detection. To get started:
-1. Browse the available test sets here on Hugging Face.
-2. Select the dataset that aligns with your evaluation needs.
-3. Load and apply the test cases to assess your application’s behavior.
-For more advanced testing and seamless integration, the [Rhesis SDK](https://github.com/rhesis-ai/rhesis-sdk) provides tools to automate dataset handling, generate structured test cases, and streamline evaluation workflows.
-## Key features
-- **Curated Test Sets** – Pre-built datasets covering diverse evaluation criteria.
-- **Dynamic Test Generation** – Generate custom test sets tailored to specific use cases.
-- **Scalability** – Use datasets for one-off evaluations or integrate them into automated testing pipelines.
-For questions or custom datasets, reach out at **hello@rhesis.ai**.
-### Example use cases:
-- **AI Financial Advisor**:
-   Evaluate the reliability and accuracy of financial guidance provided by LLM applications, ensuring sound advice for users.
-- **AI Claim Processing**:
-   Test for and eliminate biases in LLM-supported claim decisions, ensuring fair and compliant processing of insurance claims.
-- **AI Sales Advisor**:
-   Validate the accuracy of product recommendations, enhancing customer satisfaction and driving more successful sales.
-- **AI Support Chatbot**:
-   Ensure that your chatbot consistently delivers helpful, accurate, and empathetic responses across various scenarios.
-### Disclaimer
-Some test cases may contain sensitive, challenging, or potentially upsetting content. These cases are included to ensure thorough and realistic assessments. Users should review test cases carefully and exercise discretion when utilizing them.
-### Connect with us
-For more details about our testing platform, datasets, and solutions, including the Rhesis AI SDK, visit [Rhesis AI](https://www.rhesis.ai/).
-Join our **[Discord community](https://discord.rhesis.ai)** to connect with other AI engineers, discuss best practices, and stay updated on new test sets.

+# Rhesis: Collaborative Testing for LLM & Agentic Applications
 <p align="center">
+  <img src="https://github.com/user-attachments/assets/ff43ca6a-ffde-4aff-9ff9-eec3897d0d02" alt="Rhesis AI Logo" height="80">
 </p>
 <p align="center">
   <a href="https://github.com/rhesis-ai/rhesis/blob/main/LICENSE">
+    <img src="https://img.shields.io/badge/license-MIT%20%2B%20Enterprise-blue" alt="License">
   </a>
   <a href="https://pypi.org/project/rhesis-sdk/">
+    <img src="https://img.shields.io/pypi/v/rhesis-sdk" alt="PyPI Version">
   </a>
   <a href="https://pypi.org/project/rhesis-sdk/">
+    <img src="https://img.shields.io/pypi/pyversions/rhesis-sdk" alt="Python Versions">
+  </a>
+  <a href="https://codecov.io/gh/rhesis-ai/rhesis">
+    <img src="https://codecov.io/gh/rhesis-ai/rhesis/graph/badge.svg?token=1XQV983JEJ" alt="codecov">
   </a>
   <a href="https://discord.rhesis.ai">
+    <img src="https://img.shields.io/discord/1340989671601209408?color=7289da&label=Discord&logo=discord&logoColor=white" alt="Discord">
   </a>
   <a href="https://www.linkedin.com/company/rhesis-ai">
+    <img src="https://img.shields.io/badge/LinkedIn-Rhesis_AI-blue?logo=linkedin" alt="LinkedIn">
+  </a>
+  <a href="https://huggingface.co/rhesis">
+    <img src="https://img.shields.io/badge/🤗-Rhesis-yellow" alt="Hugging Face">
   </a>
   <a href="https://docs.rhesis.ai">
+    <img src="https://img.shields.io/badge/docs-rhesis.ai-blue" alt="Documentation">
   </a>
 </p>
+<p align="center">
+  <a href="https://rhesis.ai"><strong>Website</strong></a> ·
+  <a href="https://docs.rhesis.ai"><strong>Docs</strong></a> ·
+  <a href="https://discord.rhesis.ai"><strong>Discord</strong></a> ·
+  <a href="https://github.com/rhesis-ai/rhesis/blob/main/CHANGELOG.md"><strong>Changelog</strong></a>
+</p>
+<h3 align="center">More than just evals.<br><strong>Collaborative agent testing for teams.</strong></h3>
+<p align="center">
+Generate tests from requirements, simulate conversation flows, detect adversarial behaviors, evaluate with 60+ metrics, and trace failures with OpenTelemetry. Engineers and domain experts, working together.
+</p>
+<p align="center">
+  <a href="https://rhesis.ai/?video=open" target="_blank">
+    <img src="https://github.com/rhesis-ai/rhesis/blob/main/.github/images/GH_Short_Demo.png"
+         loading="lazy"
+         width="1080"
+         alt="Rhesis Platform Overview - Click to watch demo">
+  </a>
+</p>
+---
+## Core features
+<p align="center">
+  <img src="https://github.com/rhesis-ai/rhesis/blob/main/.github/images/GH_Features.png"
+       loading="lazy"
+       width="1080"
+       alt="Rhesis Core Features">
+</p>
+### Test generation
+**AI-Powered Synthesis** - Describe requirements in plain language. Rhesis generates hundreds of test scenarios including edge cases and adversarial prompts.
+**Knowledge-Aware** - Connect context sources via file upload or MCP (Notion, GitHub, Jira, Confluence) for better test generation.
+### Single-turn & conversation simulation
+**Single-turn** for Q&A validation. **Conversation simulation** for dialogue flows.
+**Penelope Agent** simulates realistic conversations to test context retention, role adherence, and dialogue coherence across extended interactions.
+### Adversarial testing (red-teaming)
+**Polyphemus Agent** proactively finds vulnerabilities:
+- Jailbreak attempts and prompt injection
+- PII leakage and data extraction
+- Harmful content generation
+- Role violation and instruction bypassing
+**Garak Integration** - Built-in support for [garak](https://github.com/leondz/garak), the LLM vulnerability scanner, for comprehensive security testing.
+### 60+ pre-built metrics
+| Framework | Example Metrics |
+|-----------|-----------------|
+| **RAGAS** | Context relevance, faithfulness, answer accuracy |
+| **DeepEval** | Bias, toxicity, PII leakage, role violation, turn relevancy, knowledge retention |
+| **Garak** | Jailbreak detection, prompt injection, XSS, malware generation, data leakage |
+| **Custom** | NumericJudge, CategoricalJudge for domain-specific evaluation |
+All metrics include LLM-as-Judge reasoning explanations.
+### Traces & observability
+Monitor your LLM applications with OpenTelemetry-based tracing:
+```python
+from rhesis.sdk.decorators import observe
+@observe.llm(model="gpt-4")
+def generate_response(prompt: str) -> str:
+    # Replace this placeholder with your actual LLM call
+    response = f"Echo: {prompt}"
+    return response
+```
+Track LLM calls, latency, token usage, and link traces to test results for debugging.
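For readers new to decorator-based instrumentation, the general pattern can be sketched with the standard library alone. This is not the Rhesis implementation and does not use OpenTelemetry; `record_latency` and `CALLS` are hypothetical names chosen for illustration, and a real tracer would export spans rather than append to a list.

```python
import functools
import time

# Hypothetical in-memory sink; a real tracer would export spans instead.
CALLS = []


def record_latency(model: str):
    """Return a decorator that records the wall-clock latency of each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the wrapped call returned or raised.
                CALLS.append(
                    {
                        "name": func.__name__,
                        "model": model,
                        "latency_s": time.perf_counter() - start,
                    }
                )

        return wrapper

    return decorator


@record_latency(model="gpt-4")
def generate_response(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"Echo: {prompt}"
```

Each call to `generate_response` appends one record to `CALLS`, which is roughly the kind of per-call data (name, model, latency) that a tracing backend would attach to a span.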
+### Bring your own model
+Use any LLM provider for test generation and evaluation:
+**Cloud:** OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Groq, Together AI
+**Local/Self-hosted:** Ollama, vLLM, LiteLLM
+See [Model Configuration Docs](https://docs.rhesis.ai/sdk/models) for setup instructions.
+---
+## Curated Test Sets on Hugging Face
+We publish curated test datasets on [Hugging Face](https://huggingface.co/rhesis) to help teams assess their LLM applications. These test sets cover diverse evaluation scenarios across conversational AI, agentic systems, RAG applications, and more—helping you validate robustness, reliability, safety, and compliance.
+### What's available
+Test sets designed for:
+- **Conversational AI** - Multi-turn dialogue, context retention, role adherence
+- **Agentic Systems** - Tool selection, goal achievement, multi-agent coordination
+- **RAG Systems** - Context relevance, faithfulness, hallucination detection
+- **Adversarial Testing** - Jailbreak resistance, prompt injection, PII leakage
+- **Domain-Specific Applications** - Finance, healthcare, customer support, sales, and more
+### Using our test sets
+**Option 1: Rhesis Platform**
+1. Download a test set from [Hugging Face](https://huggingface.co/rhesis)
+2. In the Rhesis platform, navigate to **Test Sets** → **Import from file**
+3. Upload the downloaded CSV file
+**Option 2: Python SDK**
+```python
+from rhesis.sdk import TestSet
+# Load tests from a CSV file downloaded from Hugging Face
+test_set = TestSet.from_csv(
+    "tests.csv",
+    name="Imported Tests",
+    description="Tests imported from Hugging Face"
+)
+print(f"Loaded {len(test_set.tests)} tests")
+```
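Before importing a downloaded file, it can help to check what it contains. The sketch below uses only Python's standard `csv` module; `summarize_csv` is a hypothetical helper, and because the column layout of the published test sets is not specified here, it simply reports whatever header the file has.

```python
import csv


def summarize_csv(path: str) -> dict:
    """Return the column names and data-row count of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = reader.fieldnames or []
    return {"columns": list(columns), "row_count": len(rows)}
```

Calling `summarize_csv("tests.csv")` returns a dictionary with the header columns and the number of data rows, which is a quick sanity check that the download is complete and not empty.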
+> **Disclaimer:** Some test cases may contain sensitive or challenging content, included so that assessments are thorough and realistic. Review test cases carefully and exercise discretion when using them.
+---
+## Why Rhesis?
+**Platform for teams. SDK for developers.**
+Use the collaborative platform for team-based testing: product managers define requirements, domain experts review results, engineers integrate via CI/CD. Or integrate directly with the Python SDK for code-first workflows.
+### The testing lifecycle
+Six integrated phases from project setup to team collaboration:
+| Phase | What You Do |
+|--------------------------------|-------------|
+| **[1. Projects](https://docs.rhesis.ai/platform/projects)** | Configure your AI application, upload & connect context sources (files, docs), set up SDK connectors |
+| **[2. Requirements](https://docs.rhesis.ai/platform/behaviors)** | Define expected behaviors (what your app should and shouldn't do), cover all relevant aspects from product, marketing, customer support, legal and compliance teams |
+| **[3. Metrics](https://docs.rhesis.ai/platform/metrics)** | Select from 60+ pre-built metrics or create custom LLM-as-Judge evaluations to assess whether your requirements are met |
+| **[4. Tests](https://docs.rhesis.ai/platform/tests)** | Generate single-turn and conversation simulation test scenarios. Organize in test sets and understand your test coverage |
+| **[5. Execution](https://docs.rhesis.ai/platform/test-execution)** | Run tests via UI, SDK, or API; integrate into CI/CD pipelines; collect traces during execution |
+| **[6. Collaboration](https://docs.rhesis.ai/platform/test-runs)** | Review results with your team through comments, tasks, workflows, and side-by-side comparisons |
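One common way to wire test execution into a CI/CD pipeline is to fail the build when the pass rate drops below a threshold. The following is a minimal sketch of that idea in plain Python. The result shape (a list of dictionaries with a `passed` flag), the function names, and the default threshold are all assumptions for illustration, not the Rhesis SDK's actual output format.

```python
def pass_rate(results: list[dict]) -> float:
    """Fraction of results whose 'passed' flag is true; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)


def gate(results: list[dict], threshold: float = 0.9) -> int:
    """Return an exit code: 0 if the pass rate meets the threshold, else 1."""
    return 0 if pass_rate(results) >= threshold else 1
```

In a pipeline step, a script could end with `sys.exit(gate(results))` so that a non-zero exit code blocks the merge or deployment. Treating an empty run as a failure is a deliberate choice here, so a misconfigured job that executes no tests does not pass silently.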
+### Rhesis vs...
+| Instead of... | Rhesis gives you... |
+|---------------|---------------------|
+| **Manual testing** | AI-generated test cases based on your context, hundreds in minutes |
+| **Traditional test frameworks** | Non-deterministic output handling built-in |
+| **LLM observability tools** | Pre-production validation, not post-production monitoring |
+| **Red-teaming services** | Continuous, self-service adversarial testing, not one-time audits |
+---
+## Deployment options
+| Option | Best For | Setup Time |
+|--------|----------|------------|
+| **[Rhesis Cloud](https://app.rhesis.ai)** | Teams wanting managed deployment | Instant |
+| **Docker** | Local development and testing | 5 minutes |
+| **Kubernetes** | Production self-hosting | [See docs](https://docs.rhesis.ai/getting-started/self-hosting) |
+### Quick Start
+**Option 1: Cloud (fastest)** - [app.rhesis.ai](https://app.rhesis.ai) - Managed service, just connect your app
+**Option 2: Self-host with Docker**
+```bash
+git clone https://github.com/rhesis-ai/rhesis.git && cd rhesis && ./rh start
+```
+**Access:** Frontend at `localhost:3000`, API at `localhost:8080/docs`
+**Commands:** `./rh logs` · `./rh stop` · `./rh restart` · `./rh delete`
+> **Note:** This setup enables auto-login for local testing. For production, see [Self-hosting Documentation](https://docs.rhesis.ai/getting-started/self-hosting).
+**Option 3: Python SDK**
+```bash
+pip install rhesis-sdk
+```
+---
+## Integrations
+Connect Rhesis to your LLM stack:
+| Integration | Languages | Description |
+|-------------|-----------|-------------|
+| **Rhesis SDK** | Python, JS/TS | Native SDK with decorators for endpoints and observability. Full control over test execution and tracing. |
+| **OpenAI** | Python | Drop-in replacement for OpenAI SDK. Automatic instrumentation with zero code changes. |
+| **Anthropic** | Python | Native support for Claude models with automatic tracing. |
+| **LangChain** | Python | Add Rhesis callback handler to your LangChain app for automatic tracing and test execution. |
+| **LangGraph** | Python | Built-in integration for LangGraph agent workflows with full observability. |
+| **AutoGen** | Python | Automatic instrumentation for Microsoft AutoGen multi-agent conversations. |
+| **LiteLLM** | Python | Unified interface for 100+ LLMs (OpenAI, Azure, Anthropic, Cohere, Ollama, vLLM, HuggingFace, Replicate). |
+| **Google Gemini** | Python | Native integration for Google's Gemini models. |
+| **Ollama** | Python | Local LLM deployment with Ollama integration. |
+| **OpenRouter** | Python | Access to multiple LLM providers through OpenRouter. |
+| **Vertex AI** | Python | Google Cloud Vertex AI model support. |
+| **HuggingFace** | Python | Direct integration with HuggingFace models. |
+| **REST API** | Any | Direct API access for custom integrations. [OpenAPI spec available](https://api.rhesis.ai/docs). |
+See [Integration Docs](https://docs.rhesis.ai/development) for setup instructions.
+---
+## Open source
+[MIT licensed](LICENSE). No plans to relicense core features. Enterprise features live in `ee/` folders and remain separate.
+We built Rhesis because existing LLM testing tools didn't meet our needs. If you face the same challenges, contributions are welcome.
+---
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+**Ways to contribute:** Fix bugs or add features · Contribute test sets for common failure modes · Improve documentation · Help others in Discord or GitHub discussions
+---
+## Support
+- **[Documentation](https://docs.rhesis.ai)** - Guides and API reference
+- **[Discord](https://discord.rhesis.ai)** - Community support
+- **[GitHub Issues](https://github.com/rhesis-ai/rhesis/issues)** - Bug reports and feature requests
+---
+## Security & privacy
+We take data security seriously. See our [Privacy Policy](https://rhesis.ai/privacy-policy) for details.
+**Telemetry:** Rhesis collects basic, anonymized usage statistics to improve the product. No sensitive data is collected or shared with third parties.
+- **Self-hosted:** Opt out by setting `OTEL_RHESIS_TELEMETRY_ENABLED=false`
+- **Cloud:** Telemetry is enabled as part of the Terms & Conditions
+---
+<p align="center">
+  <strong>Made with <img src="https://github.com/user-attachments/assets/598c2d81-572c-46bd-b718-dee32cdc749c" height="16" alt="Rhesis logo"> in Potsdam, Germany 🇩🇪</strong>
+</p>
+<p align="center">
+  <a href="https://rhesis.ai">Learn more at rhesis.ai</a>
+</p>