# Evaluations API
This document outlines the API endpoints for managing evaluations in PySpur.
## List Available Evaluations
**Description**: Lists all available evaluations by scanning the tasks directory for YAML files. Returns metadata about each evaluation including name, description, type, and number of samples.
**URL**: `/evals/`
**Method**: GET
**Response Schema**:
```python
List[Dict[str, Any]]
```
Each dictionary in the list contains:
```python
{
    "name": str,         # Name of the evaluation
    "description": str,  # Description of the evaluation
    "type": str,         # Type of evaluation
    "num_samples": str,  # Number of samples in the evaluation
    "paper_link": str,   # Link to the paper describing the evaluation
    "file_name": str     # Name of the YAML file
}
```
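For reference, here is a minimal sketch of calling this endpoint with the `requests` library. The base URL `http://localhost:8000` is an assumption; substitute the address of your PySpur server:
```python
import requests

BASE_URL = "http://localhost:8000"  # assumed PySpur server address

# Fetch the list of available evaluations
resp = requests.get(f"{BASE_URL}/evals/")
resp.raise_for_status()

for ev in resp.json():
    print(f"{ev['name']} ({ev['num_samples']} samples): {ev['description']}")
```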
## Launch Evaluation
**Description**: Launches an evaluation job by triggering the evaluator with the specified evaluation configuration. The evaluation is run asynchronously in the background.
**URL**: `/evals/launch/`
**Method**: POST
**Request Payload**:
```python
class EvalRunRequest:
    eval_name: str          # Name of the evaluation to run
    workflow_id: str        # ID of the workflow to evaluate
    output_variable: str    # Output variable to evaluate
    num_samples: int = 100  # Number of random samples to evaluate
```
**Response Schema**:
```python
class EvalRunResponse:
    run_id: str                        # ID of the evaluation run
    eval_name: str                     # Name of the evaluation
    workflow_id: str                   # ID of the workflow being evaluated
    status: EvalRunStatusEnum          # Status of the evaluation run
    start_time: datetime               # When the evaluation started
    end_time: Optional[datetime]       # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
```
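A hedged sketch of launching a run, again assuming the server is at `http://localhost:8000`; the `eval_name`, `workflow_id`, and `output_variable` values below are placeholders:
```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

payload = {
    "eval_name": "gsm8k",         # placeholder evaluation name
    "workflow_id": "wf_123",      # placeholder workflow ID
    "output_variable": "answer",  # placeholder output variable to score
    "num_samples": 10,            # override the default of 100
}

resp = requests.post(f"{BASE_URL}/evals/launch/", json=payload)
resp.raise_for_status()

run = resp.json()
print(f"Launched run {run['run_id']} with status {run['status']}")
```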
## Get Evaluation Run Status
**Description**: Gets the status of a specific evaluation run, including results if the evaluation has completed.
**URL**: `/evals/runs/{eval_run_id}`
**Method**: GET
**Parameters**:
```python
eval_run_id: str # ID of the evaluation run
```
**Response Schema**:
```python
class EvalRunResponse:
    run_id: str                        # ID of the evaluation run
    eval_name: str                     # Name of the evaluation
    workflow_id: str                   # ID of the workflow being evaluated
    status: EvalRunStatusEnum          # Status of the evaluation run
    start_time: datetime               # When the evaluation started
    end_time: Optional[datetime]       # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
```
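Because the evaluation runs asynchronously, clients typically poll this endpoint until `status` indicates the run has finished. A minimal polling sketch; the base URL and the `"PENDING"`/`"RUNNING"` status values are assumptions, not confirmed members of `EvalRunStatusEnum`:
```python
import time
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

def wait_for_run(eval_run_id: str, interval: float = 5.0) -> dict:
    """Poll the run status endpoint until the run leaves a running state."""
    while True:
        resp = requests.get(f"{BASE_URL}/evals/runs/{eval_run_id}")
        resp.raise_for_status()
        run = resp.json()
        # "PENDING" and "RUNNING" are assumed in-progress status values
        if run["status"] not in ("PENDING", "RUNNING"):
            return run
        time.sleep(interval)

run = wait_for_run("run_abc")  # placeholder run ID
print(run["status"], run["results"])
```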
## List Evaluation Runs
**Description**: Lists all evaluation runs, ordered by start time descending.
**URL**: `/evals/runs/`
**Method**: GET
**Response Schema**:
```python
List[EvalRunResponse]
```
Where `EvalRunResponse` contains:
```python
class EvalRunResponse:
    run_id: str                        # ID of the evaluation run
    eval_name: str                     # Name of the evaluation
    workflow_id: str                   # ID of the workflow being evaluated
    status: EvalRunStatusEnum          # Status of the evaluation run
    start_time: datetime               # When the evaluation started
    end_time: Optional[datetime]       # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
```
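A short sketch of listing all runs, with the same assumed base URL as above:
```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

resp = requests.get(f"{BASE_URL}/evals/runs/")
resp.raise_for_status()

# Runs are returned ordered by start time, newest first
for run in resp.json():
    print(run["run_id"], run["eval_name"], run["status"], run["start_time"])
```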