# Evaluations API

This document outlines the API endpoints for managing evaluations in PySpur.

## List Available Evaluations

**Description**: Lists all available evaluations by scanning the tasks directory for YAML files. Returns metadata about each evaluation including name, description, type, and number of samples.

**URL**: `/evals/`

**Method**: GET

**Response Schema**:
```python
List[Dict[str, Any]]
```

Each dictionary in the list contains:
```python
{
    "name": str,  # Name of the evaluation
    "description": str,  # Description of the evaluation
    "type": str,  # Type of evaluation
    "num_samples": str,  # Number of samples in the evaluation
    "paper_link": str,  # Link to the paper describing the evaluation
    "file_name": str  # Name of the YAML file
}
```
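A client might consume this response as follows. This is a minimal sketch: the evaluation shown (`example_eval`) and its field values are hypothetical, and it assumes the endpoint returns the list as JSON.

```python
import json

# Hypothetical response body for GET /evals/ -- the field names follow the
# schema above, but the values are illustrative only.
body = json.dumps([
    {
        "name": "example_eval",
        "description": "An example evaluation task",
        "type": "qa",
        "num_samples": "100",
        "paper_link": "https://example.com/paper",
        "file_name": "example_eval.yaml",
    }
])

evals = json.loads(body)
names = [e["name"] for e in evals]  # e.g. to populate a picker in a UI
```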

## Launch Evaluation

**Description**: Launches an evaluation job by triggering the evaluator with the specified evaluation configuration. The evaluation is run asynchronously in the background.

**URL**: `/evals/launch/`

**Method**: POST

**Request Payload**:
```python
class EvalRunRequest:
    eval_name: str  # Name of the evaluation to run
    workflow_id: str  # ID of the workflow to evaluate
    output_variable: str  # Output variable to evaluate
    num_samples: int = 100  # Number of random samples to evaluate
```

**Response Schema**:
```python
class EvalRunResponse:
    run_id: str  # ID of the evaluation run
    eval_name: str  # Name of the evaluation
    workflow_id: str  # ID of the workflow being evaluated
    status: EvalRunStatusEnum  # Status of the evaluation run
    start_time: datetime  # When the evaluation started
    end_time: Optional[datetime]  # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
```
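Building the request body can be sketched with a plain dataclass mirroring the payload above. The eval name and workflow ID used here are hypothetical placeholders; substitute real values from `/evals/` and your own workflows.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class EvalRunRequest:
    """Mirrors the request payload documented above."""
    eval_name: str
    workflow_id: str
    output_variable: str
    num_samples: int = 100  # defaults to 100 random samples

# Hypothetical values -- substitute a real eval name and workflow ID.
req = EvalRunRequest(
    eval_name="example_eval",
    workflow_id="wf_123",
    output_variable="answer",
)
payload = json.dumps(asdict(req))  # JSON body for POST /evals/launch/
```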

## Get Evaluation Run Status

**Description**: Gets the status of a specific evaluation run, including results if the evaluation has completed.

**URL**: `/evals/runs/{eval_run_id}`

**Method**: GET

**Parameters**:
```python
eval_run_id: str  # ID of the evaluation run
```

**Response Schema**: the same `EvalRunResponse` model documented under [Launch Evaluation](#launch-evaluation).
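Because `end_time` and `results` are only populated once the run completes, a client can poll this endpoint and stop when `end_time` is set. A sketch of that check (the decoded run dicts are hypothetical, and the status strings shown are placeholders, not confirmed `EvalRunStatusEnum` values):

```python
from typing import Any, Dict

def run_finished(run: Dict[str, Any]) -> bool:
    """A run is finished once end_time is populated (see the schema above)."""
    return run.get("end_time") is not None

# Hypothetical decoded responses from GET /evals/runs/{eval_run_id}:
in_progress = {"run_id": "r1", "status": "RUNNING",
               "end_time": None, "results": None}
finished = {"run_id": "r1", "status": "COMPLETED",
            "end_time": "2024-01-01T00:05:00",
            "results": {"accuracy": 0.9}}
```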

## List Evaluation Runs

**Description**: Lists all evaluation runs, ordered by start time descending.

**URL**: `/evals/runs/`

**Method**: GET

**Response Schema**:
```python
List[EvalRunResponse]
```

Where `EvalRunResponse` is the same model documented under [Launch Evaluation](#launch-evaluation).
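Since runs come back ordered by start time descending, the most recent run is the first element. A sketch using hypothetical run data (only `run_id` and `start_time` shown for brevity), with a defensive re-sort for clients that prefer not to rely on server-side ordering:

```python
from datetime import datetime

# Hypothetical decoded response from GET /evals/runs/ -- already ordered by
# start_time descending, per the description above.
runs = [
    {"run_id": "r2", "start_time": "2024-01-02T00:00:00"},
    {"run_id": "r1", "start_time": "2024-01-01T00:00:00"},
]

latest = runs[0]  # most recent run comes first

# Re-sort locally if the ordering must not be assumed:
resorted = sorted(
    runs,
    key=lambda r: datetime.fromisoformat(r["start_time"]),
    reverse=True,
)
```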