env module

class env.DummyEnv(task: str | None = None, end_immediately: bool = True)[source]

Bases: Environment[DummyEnvState]

Simple Environment with basic functionality and no network usage.

State

alias of DummyEnvState

export_frame() Frame[source]

Export a snapshot of the environment as a Frame for visualization or debugging.

If you are not sure what to put in the Frame, just give it the entire state. See the Frame class itself for more information.

classmethod from_task(task: str) DummyEnv[source]

Create an environment from a task description.

A task is meant to be closer to a user prompt - like what you would expect in calling an LLM. This is how the environment should be used after training and in deployment. We don’t take config here, because the default environment config should be general for arbitrary tasks.

For example, with GSM8k/calculator: “What is 18 * (number of legs on a cat) / moons of mars?”

async reset() tuple[list[Message], list[Tool]][source]

Reset the environment and collect initial observation(s).

Possible observations could be instructions on how tools are related, or the goal of the environment.

Returns:

Two-tuple of initial observations and tools.

async step(action: ToolRequestMessage) tuple[list[Message], float, bool, bool][source]

Take a step in the environment.

Parameters:

action – Action to take.

Returns:

Four-tuple of new observations, instantaneous reward for this action, a flag symbolizing if the episode is done, and a flag symbolizing if the episode was truncated (e.g. via early stopping).
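
The reset/step contract can be sketched as a rollout loop. This is a minimal illustration, not the library's implementation: SketchEnv is a hypothetical stand-in that uses plain strings instead of Message, Tool, and ToolRequestMessage objects, and it assumes that end_immediately means the episode is done after the first step. The "echo" tool name and the max_steps cutoff are also made up for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchEnv:
    """Hypothetical stand-in mirroring the reset/step return shapes."""

    end_immediately: bool = True
    max_steps: int = 3  # hypothetical cutoff used to demonstrate truncation
    messages: list = field(default_factory=list)
    steps_taken: int = 0

    async def reset(self):
        self.messages = ["Use the echo tool."]
        self.steps_taken = 0
        # Two-tuple: initial observations and the available tools.
        return list(self.messages), ["echo"]

    async def step(self, action: str):
        self.steps_taken += 1
        observation = f"executed: {action}"
        self.messages.append(observation)
        done = self.end_immediately
        # Truncated: the episode was cut short rather than finished.
        truncated = not done and self.steps_taken >= self.max_steps
        # Four-tuple: observations, reward, done flag, truncated flag.
        return [observation], 0.0, done, truncated


async def rollout(env: SketchEnv, action: str):
    """Step until the episode is done or truncated; return (steps, reward)."""
    await env.reset()
    steps, total_reward = 0, 0.0
    while True:
        _, reward, done, truncated = await env.step(action)
        steps += 1
        total_reward += reward
        if done or truncated:
            return steps, total_reward


steps_fast, _ = asyncio.run(rollout(SketchEnv(end_immediately=True), "echo"))
steps_slow, _ = asyncio.run(rollout(SketchEnv(end_immediately=False), "echo"))
```

A caller should check both flags: done reports that the task ended, while truncated reports that the episode was stopped early.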

class env.DummyEnvState(*, messages: list[Message], reward: float = 0, done: bool = False)[source]

Bases: BaseModel

done: bool
messages: list[Message]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'done': FieldInfo(annotation=bool, required=False, default=False), 'messages': FieldInfo(annotation=list[Message], required=True), 'reward': FieldInfo(annotation=float, required=False, default=0)}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

reward: float

class env.DummyTaskDataset[source]

Bases: TaskDataset[DummyEnv]

A dummy task dataset that yields an unlimited number of DummyEnv instances.

get_new_env() DummyEnv[source]

Get an env from a non-indexable dataset.

class env.Environment[source]

Bases: ABC, Generic[TEnvState]

An environment is a stateful place where agents use tools and make observations.

Tools are housed in the environment because they can interact with the environment.

Environments (and their contained tools) are not trainable.

classmethod available() set[str][source]

List the available environment classes for from_name.

This is not exhaustive, because some may be importable and so you should just try to call from_name. This is more for logging/debugging purposes.

async close() None[source]

Shut down the environment.

If this is unimplemented, __del__ will manage cleanup.

async exec_tool_calls(message: ToolRequestMessage, ordered: bool = False, handle_tool_exc: bool = False, **function_kwargs) list[ToolResponseMessage][source]

Execute an ordered list of tool calls.

Parameters:
  • message – ToolRequestMessage containing the tool calls.

  • ordered – Opt-in flag for forcing sequential execution (according to order in the above message), otherwise tool calls are made concurrently.

  • handle_tool_exc – Opt-in flag to suppress Exceptions and return them as a ToolResponseMessage.

  • **function_kwargs – Keyword arguments to pass to all tool functions.

Returns:

Ordered list of ToolResponseMessages; the order matches the order of tool calls in the input message.
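
The ordered and handle_tool_exc flags can be illustrated with plain asyncio. The sketch below is an assumption about the general technique, not the library's code: the add and divide tools, the TOOLS mapping, and the (name, kwargs) tuples are hypothetical stand-ins for Tool objects and a ToolRequestMessage.

```python
import asyncio


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0.01)  # simulate a slow tool
    return a + b


async def divide(a: int, b: int) -> float:
    return a / b


TOOLS = {"add": add, "divide": divide}  # hypothetical tool table


async def run_call(name: str, kwargs: dict, handle_exc: bool) -> str:
    try:
        return str(await TOOLS[name](**kwargs))
    except Exception as exc:
        if not handle_exc:
            raise
        # Report the failure as a response instead of raising.
        return f"error: {type(exc).__name__}"


async def exec_calls(calls, ordered: bool = False, handle_exc: bool = False):
    if ordered:
        # Sequential: each call finishes before the next one starts.
        return [await run_call(n, kw, handle_exc) for n, kw in calls]
    # Concurrent: asyncio.gather returns results in input order.
    return list(
        await asyncio.gather(*(run_call(n, kw, handle_exc) for n, kw in calls))
    )


calls = [
    ("add", {"a": 1, "b": 2}),
    ("divide", {"a": 1, "b": 0}),
    ("add", {"a": 3, "b": 4}),
]
concurrent = asyncio.run(exec_calls(calls, handle_exc=True))
sequential = asyncio.run(exec_calls(calls, ordered=True, handle_exc=True))
```

Both modes return responses in request order; the difference is only whether the calls overlap in time. Without exception handling, the ZeroDivisionError from the second call propagates to the caller.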

export_frame() Frame[source]

Export a snapshot of the environment as a Frame for visualization or debugging.

If you are not sure what to put in the Frame, just give it the entire state. See the Frame class itself for more information.

filter_invalid_tool_calls(message: ToolRequestMessage) tuple[ToolRequestMessage, ToolRequestMessage][source]

Split a list of tool calls into valid and invalid subsets.

Parameters:

message – Tool request message containing tool calls.

Returns:

Two-tuple of a ToolRequestMessage containing the valid tool calls and a ToolRequestMessage containing the invalid tool calls.
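
A rough sketch of the split follows. The validity rule used here (a call is valid when its tool name is known to the environment) is an assumption for illustration; the documentation above does not state the criterion. Call is a hypothetical stand-in for a tool call, and plain lists stand in for the two ToolRequestMessage results.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical stand-in for a single tool call."""

    name: str
    arguments: dict


def split_calls(calls, known_tools):
    """Return (valid, invalid), preserving the original call order."""
    # Assumed rule: a call is valid if it names a known tool.
    valid = [c for c in calls if c.name in known_tools]
    invalid = [c for c in calls if c.name not in known_tools]
    return valid, invalid


calls = [
    Call("add", {"a": 1, "b": 2}),
    Call("launch_rocket", {}),
    Call("add", {"a": 3, "b": 4}),
]
valid, invalid = split_calls(calls, known_tools={"add", "divide"})
```

Returning the invalid subset instead of dropping it lets the caller report the rejected calls back to the agent.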

classmethod from_name(name: str, task: str | None = None, **env_kwargs) Self[source]

Create an environment from the name of the class. Call Environment.available() to see list.

classmethod from_task(task: str) Self[source]

Create an environment from a task description.

A task is meant to be closer to a user prompt - like what you would expect in calling an LLM. This is how the environment should be used after training and in deployment. We don’t take config here, because the default environment config should be general for arbitrary tasks.

For example, with GSM8k/calculator: “What is 18 * (number of legs on a cat) / moons of mars?”

abstract async reset() tuple[list[Message], list[Tool]][source]

Reset the environment and collect initial observation(s).

Possible observations could be instructions on how tools are related, or the goal of the environment.

Returns:

Two-tuple of initial observations and tools.

state: TEnvState

abstract async step(action: ToolRequestMessage) tuple[list[Message], float, bool, bool][source]

Take a step in the environment.

Parameters:

action – Action to take.

Returns:

Four-tuple of new observations, instantaneous reward for this action, a flag symbolizing if the episode is done, and a flag symbolizing if the episode was truncated (e.g. via early stopping).

tools: list[Tool]

class env.Frame(*, deepcopy: bool = True, state: Annotated[dict | list | int | float | str | bool | BaseModel | None, WrapSerializer(func=_custom_serializer, return_type=PydanticUndefined, when_used=always)] = None, info: Annotated[dict | list | int | float | str | bool | BaseModel | None, WrapSerializer(func=_custom_serializer, return_type=PydanticUndefined, when_used=always)] = None)[source]

Bases: BaseModel

A frame is a snapshot at a given timestep. The name comes from video frame.
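
The effect of the deepcopy field can be sketched with the standard library. FrameSketch is a hypothetical dataclass, not the Pydantic model documented here; it only shows why copying matters when the environment keeps mutating its state after a snapshot is taken.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class FrameSketch:
    """Hypothetical stand-in for a frame with an optional deep copy."""

    state: Any = None
    info: Any = None
    deepcopy: bool = True

    def __post_init__(self) -> None:
        if self.deepcopy:
            # Detach the snapshot from the live objects.
            self.state = copy.deepcopy(self.state)
            self.info = copy.deepcopy(self.info)


live_state = {"messages": ["hello"], "reward": 0.0}
snapshot = FrameSketch(state=live_state)
shared = FrameSketch(state=live_state, deepcopy=False)

# The environment keeps running and mutates its state.
live_state["messages"].append("world")
```

After the mutation, snapshot.state still holds the original message list, while shared.state reflects the change. Disabling the copy is therefore only safe when the data is immutable or the sharing is intended.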

deepcopy: bool
info: Annotated[Serializable | None, WrapSerializer(_custom_serializer)]
classmethod make_deepcopy(v: dict | list | int | float | str | bool | BaseModel, info: ValidationInfo) dict | list | int | float | str | bool | BaseModel[source]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'deepcopy': FieldInfo(annotation=bool, required=False, default=True, description="Whether to deepcopy the state and info fields. Disable if you're sure they're immutable or desire mutability."), 'info': FieldInfo(annotation=Union[dict, list, int, float, str, bool, BaseModel, NoneType], required=False, default=None, description="Optional metadata that doesn't vary with state.", metadata=[WrapSerializer(func=<staticmethod(<function Frame._custom_serializer>)>, return_type=PydanticUndefined, when_used='always')]), 'state': FieldInfo(annotation=Union[dict, list, int, float, str, bool, BaseModel, NoneType], required=False, default=None, description='Either entire (or a subset of) the current state. Leave as default of None if state is irrelevant.', metadata=[WrapSerializer(func=<staticmethod(<function Frame._custom_serializer>)>, return_type=PydanticUndefined, when_used='always')])}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

state: Annotated[Serializable | None, WrapSerializer(_custom_serializer)]

class env.TaskConfig(*, name: str, task_kwargs: dict[str, BaseModel | JsonValue] = None, train_kwargs: dict[str, BaseModel | JsonValue] = None, eval_kwargs: dict[str, BaseModel | JsonValue] = None, test_kwargs: dict[str, BaseModel | JsonValue] = None)[source]

Bases: BaseModel

Convenience for making a config file entry for a TaskDataset.
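
A config entry and its per-split arguments might look like the sketch below. The merge rule (shared task_kwargs combined with the split's own kwargs, with the split's values taking precedence) is an assumption, since the documentation does not spell out how make_dataset combines them. The max_steps and verbose keys are hypothetical. Rejecting unknown keys mirrors the 'extra': 'forbid' setting in model_config.

```python
from typing import Any

SPLITS = ("train", "eval", "test")
ALLOWED_KEYS = {"name", "task_kwargs", "train_kwargs", "eval_kwargs", "test_kwargs"}


def dataset_kwargs(entry: dict, split: str) -> dict:
    """Combine shared kwargs with one split's kwargs (assumed merge rule)."""
    unknown = set(entry) - ALLOWED_KEYS
    if unknown:
        # Mirrors 'extra': 'forbid': unexpected keys are an error.
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    shared: dict[str, Any] = entry.get("task_kwargs", {})
    per_split: dict[str, Any] = entry.get(f"{split}_kwargs", {})
    # Assumption: split-specific values win when a key appears in both.
    return {**shared, **per_split}


entry = {
    "name": "dummy",
    "task_kwargs": {"max_steps": 10},  # hypothetical keys
    "eval_kwargs": {"max_steps": 3, "verbose": True},
}
train_kwargs = dataset_kwargs(entry, "train")
eval_kwargs = dataset_kwargs(entry, "eval")
```

Keeping the shared arguments in task_kwargs avoids repeating them for each split.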

eval_kwargs: dict[str, BaseModel | JsonValue]
make_dataset(split: str) TaskDataset[source]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'eval_kwargs': FieldInfo(annotation=dict[str, Union[BaseModel, JsonValue]], required=False, default_factory=dict, description='Additional arguments for the evaluation split.'), 'name': FieldInfo(annotation=str, required=True), 'task_kwargs': FieldInfo(annotation=dict[str, Union[BaseModel, JsonValue]], required=False, default_factory=dict, description='Arguments to pass to TaskDataset.from_name()'), 'test_kwargs': FieldInfo(annotation=dict[str, Union[BaseModel, JsonValue]], required=False, default_factory=dict, description='Additional arguments for the test split.'), 'train_kwargs': FieldInfo(annotation=dict[str, Union[BaseModel, JsonValue]], required=False, default_factory=dict, description='Additional arguments for the training split.')}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

name: str
task_kwargs: dict[str, BaseModel | JsonValue]
test_kwargs: dict[str, BaseModel | JsonValue]
train_kwargs: dict[str, BaseModel | JsonValue]

class env.TaskDataset[source]

Bases: ABC, Generic[TEnvironment]

A base class for a dataset of tasks as environments.

Examples of task datasets: GSM8k, HotPotQA, etc. These are related environment instances with different problem specifications and reward conditions.

classmethod from_name(name: str, **env_kwargs) TaskDataset[source]
get_new_env() TEnvironment[source]

Get an env from a non-indexable dataset.

get_new_env_by_idx(idx: int) TEnvironment[source]

Get an env from a finite dataset.

iter_batches(batch_size: int, shuffle: bool = False) Iterator[list[TEnvironment]][source]

Construct batches from this dataset.

Parameters:
  • batch_size – Size of each batch. Note that if this dataset’s size is finite and isn’t evenly divisible by this value, the last yielded batch will be smaller than batch_size.

  • shuffle – Opt-in flag to shuffle without replacement.

Yields:

An iterator over batches of environments.
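
The batching behavior described above can be sketched for a finite dataset. This is an illustration under stated assumptions: integers stand in for environments, and the seed parameter is hypothetical, added only to make the shuffle reproducible.

```python
import random
from collections.abc import Iterator


def iter_batches_sketch(
    items: list, batch_size: int, shuffle: bool = False, seed: int = 0
) -> Iterator[list]:
    """Yield consecutive batches; the last one may be smaller."""
    order = list(range(len(items)))
    if shuffle:
        # Shuffling an index permutation samples without replacement.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


batches = list(iter_batches_sketch(list(range(10)), batch_size=4))
shuffled = list(
    iter_batches_sketch(list(range(10)), batch_size=4, shuffle=True)
)
```

With ten items and a batch size of four, the final batch holds the two remaining items. With shuffling enabled, every item still appears exactly once across the batches.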