<meta charset="utf-8" /><meta http-equiv="content-security-policy" content=""><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;simulate.RewardFunction&quot;,&quot;title&quot;:&quot;Reward Functions&quot;}" data-svelte="svelte-1phssyn">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/assets/pages/__layout.svelte-hf-doc-builder.css">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/start-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/chunks/vendor-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/chunks/paths-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/pages/__layout.svelte-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/pages/api/reward_functions.mdx-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/chunks/Docstring-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/simulate/v0.0.1/en/_app/chunks/IconCopyLink-hf-doc-builder.js">
<h1 class="relative group"><a id="simulate.RewardFunction" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#simulate.RewardFunction"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Reward Functions
</span></h1>
<p>🤗 Simulate provides a system for defining both simple and complex reward functions. Complex rewards are built by combining “leaf” reward
functions, such as sparse and dense reward functions, with predicate reward functions.</p>
<p>(LINK TO REWARD PREDICATE DIAGRAM)</p>
<p>Reward functions can be parameterized with a distance metric. Currently “euclidean”, “cosine” and “best_euclidean” are supported.
Combining predicates with leaf rewards makes it possible to build complex reward functions. A good example of this is the
<a href="https://github.com/huggingface/simulate/blob/main/examples/rl/sb3_move_boxes.py" rel="nofollow">Move Boxes</a> example.</p>
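<p>To make the metric names concrete, here is a minimal sketch of the textbook Euclidean and cosine distances in plain Python. It does not import Simulate, the helper names are hypothetical, and the library's own implementation may differ. “best_euclidean” is left out because this page does not define it.</p>

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two positions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Illustrative positions only; real entities would supply their own coordinates.
actor_position = [0.0, 0.0, 0.0]
target_position = [3.0, 4.0, 0.0]
distance = euclidean_distance(actor_position, target_position)
```

<p>The Euclidean metric depends on how far apart two points are, while the cosine metric depends only on the angle between two vectors, so the two can rank the same pair of objects very differently.</p>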
<p>The following “leaf” rewards are available in Simulate: </p>
<ul><li>“dense”: A reward that is non-zero at every time-step.</li>
<li>“sparse”: A reward that is triggered by the proximity of another object.</li>
<li>“timeout”: A timeout reward that is triggered after a certain number of time-steps.</li>
<li>“see”: Triggered when an object is in the field of view of an Actor.</li>
<li>“angle_to”: Triggered when the angle between two objects and a certain direction is less than a threshold.</li></ul>
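<p>The trigger conditions in the list above can be sketched in plain Python. This is an illustration of the described behaviour for “sparse” and “timeout” only, not Simulate's implementation: the function names and the <code>max_steps</code> parameter are hypothetical, while the <code>scalar</code> and <code>threshold</code> defaults are taken from the <code>RewardFunction</code> signature. “dense” is omitted because this page does not state its formula.</p>

```python
def sparse_reward(distance: float, threshold: float = 1.0, scalar: float = 1.0) -> float:
    """Return `scalar` when the entities are closer than `threshold`, else 0."""
    return scalar if threshold > distance else 0.0


def timeout_reward(step: int, max_steps: int, scalar: float = 1.0) -> float:
    """Return `scalar` once `step` has passed `max_steps`, else 0."""
    return scalar if step > max_steps else 0.0


# A sparse reward stays at zero until the actor is close enough.
far_away = sparse_reward(distance=2.0)
close_by = sparse_reward(distance=0.5)

# A timeout is often given a negative scalar so that it acts as a penalty.
penalty = timeout_reward(step=11, max_steps=10, scalar=-1.0)
```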
<p>The “leaf” reward functions can be combined in a tree structure with the following predicate functions: </p>
<ul><li>“not”: Triggers when a reward is not triggered.</li>
<li>“and”: Triggers when both children of this node are returning a positive reward.</li>
<li>“or”: Triggers when one or both of the children of this node are returning a positive reward.</li>
<li>“xor”: Triggers when exactly one of the children of this node is returning a positive reward.</li></ul>
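<p>The tree structure described above can be sketched as a small recursive evaluator. This is a hypothetical illustration of the predicate semantics, not Simulate code: a leaf is a name looked up in a dictionary of current rewards, and an inner node is a tuple of a predicate name followed by its children.</p>

```python
from typing import Dict, Union

# A node is either a leaf name or a tuple: (predicate, child, ...).
Node = Union[str, tuple]


def is_triggered(node: Node, leaf_rewards: Dict[str, float]) -> bool:
    """Evaluate a predicate tree against the current leaf rewards."""
    if isinstance(node, str):
        # A leaf counts as triggered when it returns a positive reward.
        return leaf_rewards[node] > 0
    predicate, *children = node
    results = [is_triggered(child, leaf_rewards) for child in children]
    if predicate == "not":
        return not results[0]
    if predicate == "and":
        return results[0] and results[1]
    if predicate == "or":
        return results[0] or results[1]
    if predicate == "xor":
        return results[0] != results[1]
    raise ValueError(f"unknown predicate: {predicate}")


# Triggered only while the actor is near the box and the timeout has not fired.
tree = ("and", "near_box", ("not", "timeout"))
in_time = is_triggered(tree, {"near_box": 1.0, "timeout": 0.0})
too_late = is_triggered(tree, {"near_box": 1.0, "timeout": 1.0})
```

<p>Because every node reduces to a single triggered-or-not result, predicates can be nested to any depth.</p>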
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<div><span class="group flex space-x-1.5 items-center text-gray-800 bg-gradient-to-r rounded-tr-lg -mt-4 -ml-4 pt-3 px-2.5" id="simulate.RewardFunction"><!-- HTML_TAG_START --><h3 class="!m-0"><span class="flex-1 break-all md:text-lg bg-gradient-to-r px-2.5 py-1.5 rounded-xl from-indigo-50/70 to-white dark:from-gray-900 dark:to-gray-950 dark:text-indigo-300 text-indigo-700"><svg class="mr-1.5 text-indigo-500 inline-block -mt-0.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width=".8em" height=".8em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-quaternary" d="M20.23 7.24L12 12L3.77 7.24a1.98 1.98 0 0 1 .7-.71L11 2.76c.62-.35 1.38-.35 2 0l6.53 3.77c.29.173.531.418.7.71z" opacity=".25" fill="currentColor"></path><path class="uim-tertiary" d="M12 12v9.5a2.09 2.09 0 0 1-.91-.21L4.5 17.48a2.003 2.003 0 0 1-1-1.73v-7.5a2.06 2.06 0 0 1 .27-1.01L12 12z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M20.5 8.25v7.5a2.003 2.003 0 0 1-1 1.73l-6.62 3.82c-.275.13-.576.198-.88.2V12l8.23-4.76c.175.308.268.656.27 1.01z" fill="currentColor"></path></svg><span class="font-light">class</span> <span class="font-medium">simulate.</span><span class="font-semibold">RewardFunction</span></span></h3><!-- HTML_TAG_END -->
<a id="simulate.RewardFunction" class="header-link invisible with-hover:group-hover:visible pr-2" href="#simulate.RewardFunction"><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></a>
<a class="!ml-auto !text-gray-400 !no-underline text-sm flex items-center" href="https://github.com/huggingface/simulate/blob/v0.0.1/src/simulate/assets/reward_functions.py#L16" target="_blank"><span>&lt;</span>
<span class="hidden md:block mx-0.5 hover:!underline">source</span>
<span>&gt;</span></a></span>
<p class="font-mono text-xs md:text-sm !leading-relaxed !my-6"><span>(</span>
<span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">type<span class="opacity-60">: typing.Optional[str] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">entity_a<span class="opacity-60">: typing.Optional[typing.Any] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">entity_b<span class="opacity-60">: typing.Optional[typing.Any] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">distance_metric<span class="opacity-60">: typing.Optional[str] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">direction<span class="opacity-60">: typing.Optional[typing.List[float]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">scalar<span class="opacity-60">: float = 1.0</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">threshold<span class="opacity-60">: float = 1.0</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">is_terminal<span class="opacity-60">: bool = False</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">is_collectable<span class="opacity-60">: bool = False</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">trigger_once<span class="opacity-60">: bool = True</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">reward_function_a<span class="opacity-60">: dataclasses.InitVar[typing.Optional[ForwardRef(&#39;RewardFunction&#39;)]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">reward_function_b<span class="opacity-60">: dataclasses.InitVar[typing.Optional[ForwardRef(&#39;RewardFunction&#39;)]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">name<span class="opacity-60">: dataclasses.InitVar[typing.Optional[str]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">position<span class="opacity-60">: dataclasses.InitVar[typing.Optional[typing.List[float]]] = &lt;property object at 0x7fca120c6450&gt;</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">rotation<span class="opacity-60">: dataclasses.InitVar[typing.Optional[typing.List[float]]] = &lt;property object at 0x7fca120c62c0&gt;</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">scaling<span class="opacity-60">: dataclasses.InitVar[typing.Union[float, typing.List[float], NoneType]] = &lt;property object at 0x7fca120c6310&gt;</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">transformation_matrix<span class="opacity-60">: dataclasses.InitVar[typing.Optional[typing.List[float]]] = &lt;property object at 0x7fca120c6360&gt;</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">parent<span class="opacity-60">: dataclasses.InitVar[typing.Optional[typing.Any]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">children<span class="opacity-60">: dataclasses.InitVar[typing.Optional[typing.List[typing.Any]]] = None</span></span>
</span><span class="comma cursor-default"><span class="rounded hover:bg-black hover:text-white dark:hover:bg-white dark:hover:text-black">created_from_file<span class="opacity-60">: dataclasses.InitVar[typing.Optional[str]] = None</span></span>
</span>
<span>)</span>
</p>
<div class="!mb-10 relative docstring-details ">
</div></div>
<p>An RL reward function</p></div>
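<p>To show how the parameters in the signature above fit together, the sketch below uses a hypothetical stand-in dataclass, <code>RewardSpec</code>, that copies a subset of the documented fields and their defaults. It does not import Simulate, the entity values are placeholder strings rather than real scene objects, and using <code>threshold</code> as a step count for the timeout is an assumption.</p>

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RewardSpec:
    """Hypothetical stand-in mirroring part of the documented signature."""

    type: Optional[str] = None
    entity_a: Optional[Any] = None
    entity_b: Optional[Any] = None
    distance_metric: Optional[str] = None
    direction: Optional[List[float]] = None
    scalar: float = 1.0
    threshold: float = 1.0
    is_terminal: bool = False
    is_collectable: bool = False
    trigger_once: bool = True
    reward_function_a: Optional["RewardSpec"] = None
    reward_function_b: Optional["RewardSpec"] = None


# Leaf: reward the actor for getting close to the target and end the episode.
near_target = RewardSpec(
    type="sparse",
    entity_a="actor",  # placeholder for a scene object
    entity_b="target",  # placeholder for a scene object
    distance_metric="euclidean",
    threshold=0.5,
    is_terminal=True,
)

# Leaf: penalize running out of time (threshold as a step count is assumed).
timed_out = RewardSpec(type="timeout", scalar=-1.0, threshold=100.0, is_terminal=True)

# Predicate node combining the two leaves through its two child slots.
either = RewardSpec(type="or", reward_function_a=near_target, reward_function_b=timed_out)
```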
<script type="module" data-hydrate="441ssd">
import { start } from "/docs/simulate/v0.0.1/en/_app/start-hf-doc-builder.js";
start({
target: document.querySelector('[data-hydrate="441ssd"]').parentNode,
paths: {"base":"/docs/simulate/v0.0.1/en","assets":"/docs/simulate/v0.0.1/en"},
session: {},
route: false,
spa: false,
trailing_slash: "never",
hydrate: {
status: 200,
error: null,
nodes: [
import("/docs/simulate/v0.0.1/en/_app/pages/__layout.svelte-hf-doc-builder.js"),
import("/docs/simulate/v0.0.1/en/_app/pages/api/reward_functions.mdx-hf-doc-builder.js")
],
params: {}
}
});
</script>
