"""
base_env.py
-----------
Abstract base class that every task environment must implement.
Follows the OpenEnv interface: reset / step / state.
"""

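# Postponed annotation evaluation keeps `int | None` valid on Python < 3.10.
from __future__ import annotations

from abc import ABC, abstractmethod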
from typing import Any, Dict

from env.schemas import Observation, Action, StepResult, ResetResult, StateResult


class BaseEnv(ABC):
    """
    OpenEnv-compliant base environment.

    Concrete task environments should subclass this and implement:
      - reset()   → ResetResult
      - step()    → StepResult
      - state()   → StateResult
    """

    @abstractmethod
    def reset(self, seed: int | None = None) -> ResetResult:
        """
        Reset the environment to a fresh episode.

        Parameters
        ----------
        seed : optional RNG seed for reproducibility

        Returns
        -------
        ResetResult with the initial Observation and episode info.
        """
        ...

    @abstractmethod
    def step(self, action: Action) -> StepResult:
        """
        Apply an action and advance the episode by one step.

        Parameters
        ----------
        action : Action  – typed agent action

        Returns
        -------
        StepResult containing:
          - observation : updated Observation
          - reward      : Reward for this step
          - done        : True when the episode is over
          - info        : auxiliary diagnostic information
        """
        ...

    @abstractmethod
    def state(self) -> StateResult:
        """
        Return the full internal state (for debugging / graders).
        Should NOT be used by the agent during evaluation.

        Returns
        -------
        StateResult – internal episode state snapshot.
        """
        ...

    # ------------------------------------------------------------------
    # Optional helpers subclasses may override
    # ------------------------------------------------------------------

    def render(self) -> str:
        """Human-readable rendering of the current state."""
        s = self.state()
        return (
            f"Task: {s.task_id} | Contract: {s.contract_name} | "
            f"Step: {s.step_count} | Reward: {s.cumulative_reward:.2f} | "
            f"Done: {s.done}"
        )

    def action_space_description(self) -> Dict[str, Any]:
        """Return a JSON-serialisable description of the action space (empty by default)."""
        return {}

    def observation_space_description(self) -> Dict[str, Any]:
        """Return a JSON-serialisable description of the observation space (empty by default)."""
        return {}
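

# ------------------------------------------------------------------
# Usage sketch (illustrative only, not part of the OpenEnv interface)
# ------------------------------------------------------------------
# The reset / step / state contract above implies a simple driver loop.
# `run_episode` is a hypothetical helper introduced here for illustration.
# It assumes that ResetResult exposes `.observation`, that StepResult exposes
# `.observation`, `.reward` and `.done` as listed in the step() docstring, and
# that `.reward` can be converted with float(). The real types in env.schemas
# may differ, so treat this as a sketch under those assumptions.

from typing import Any, Callable, Optional, Tuple  # repeated so the sketch stands alone


def run_episode(
    env: Any,
    policy: Callable[[Any], Any],
    max_steps: int = 100,
    seed: Optional[int] = None,
) -> Tuple[float, int]:
    """Run one episode and return (total_reward, steps_taken).

    `env` is any object with BaseEnv-style reset() and step() methods;
    `policy` maps the latest observation to the next action.
    """
    observation = env.reset(seed=seed).observation
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        result = env.step(policy(observation))
        total_reward += float(result.reward)  # assumes a numeric reward
        steps += 1
        if result.done:
            break
        observation = result.observation
    return total_reward, steps


# Example call, assuming `MyTaskEnv` is some concrete BaseEnv subclass and
# `my_policy` is a callable taking an observation (both hypothetical):
#
#     total, steps = run_episode(MyTaskEnv(), my_policy, max_steps=50, seed=0)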