--- language: - en license: apache-2.0 tags: - sentence-transformers - sentence-similarity - feature-extraction - dense - generated_from_trainer - dataset_size:180 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: shubharuidas/codebert-embed-base-dense-retriever widget: - source_sentence: Explain the __init__ logic sentences: - "async def test_handler_with_async_execution() -> None:\n \"\"\"Test handler\ \ works correctly with async tool execution.\"\"\"\n\n @tool\n def async_add(a:\ \ int, b: int) -> int:\n \"\"\"Async add two numbers.\"\"\"\n return\ \ a + b\n\n def modifying_handler(\n request: ToolCallRequest,\n \ \ execute: Callable[[ToolCallRequest], ToolMessage | Command],\n ) -> ToolMessage\ \ | Command:\n \"\"\"Handler that modifies arguments.\"\"\"\n #\ \ Add 10 to both arguments using override method\n modified_call = {\n\ \ **request.tool_call,\n \"args\": {\n **request.tool_call[\"\ args\"],\n \"a\": request.tool_call[\"args\"][\"a\"] + 10,\n \ \ \"b\": request.tool_call[\"args\"][\"b\"] + 10,\n },\n\ \ }\n modified_request = request.override(tool_call=modified_call)\n\ \ return execute(modified_request)\n\n tool_node = ToolNode([async_add],\ \ wrap_tool_call=modifying_handler)\n\n result = await tool_node.ainvoke(\n\ \ {\n \"messages\": [\n AIMessage(\n \ \ \"adding\",\n tool_calls=[\n \ \ {\n \"name\": \"async_add\",\n \ \ \"args\": {\"a\": 1, \"b\": 2},\n \ \ \"id\": \"call_13\",\n }\n ],\n \ \ )\n ]\n },\n config=_create_config_with_runtime(),\n\ \ )\n\n tool_message = result[\"messages\"][-1]\n assert isinstance(tool_message,\ \ ToolMessage)\n # Original: 1 + 2 = 3, with modifications: 11 + 12 = 23\n\ \ assert tool_message.content == \"23\"" - "def __init__(self) -> None:\n self.loads: set[str] = set()\n self.stores:\ \ set[str] = set()" - "class InternalServerError(APIStatusError):\n pass" - source_sentence: Explain the async _load_checkpoint_tuple logic sentences: - 'def task(__func_or_none__: Callable[P, Awaitable[T]]) -> 
_TaskFunction[P, T]: ...' - "class State(BaseModel):\n query: str\n inner: InnerObject\n \ \ answer: str | None = None\n docs: Annotated[list[str], sorted_add]" - "async def _load_checkpoint_tuple(self, value: DictRow) -> CheckpointTuple:\n\ \ \"\"\"\n Convert a database row into a CheckpointTuple object.\n\ \n Args:\n value: A row from the database containing checkpoint\ \ data.\n\n Returns:\n CheckpointTuple: A structured representation\ \ of the checkpoint,\n including its configuration, metadata, parent\ \ checkpoint (if any),\n and pending writes.\n \"\"\"\n \ \ return CheckpointTuple(\n {\n \"configurable\"\ : {\n \"thread_id\": value[\"thread_id\"],\n \ \ \"checkpoint_ns\": value[\"checkpoint_ns\"],\n \"checkpoint_id\"\ : value[\"checkpoint_id\"],\n }\n },\n {\n\ \ **value[\"checkpoint\"],\n \"channel_values\"\ : {\n **(value[\"checkpoint\"].get(\"channel_values\") or {}),\n\ \ **self._load_blobs(value[\"channel_values\"]),\n \ \ },\n },\n value[\"metadata\"],\n (\n\ \ {\n \"configurable\": {\n \ \ \"thread_id\": value[\"thread_id\"],\n \"checkpoint_ns\"\ : value[\"checkpoint_ns\"],\n \"checkpoint_id\": value[\"\ parent_checkpoint_id\"],\n }\n }\n \ \ if value[\"parent_checkpoint_id\"]\n else None\n \ \ ),\n await asyncio.to_thread(self._load_writes, value[\"pending_writes\"\ ]),\n )" - source_sentence: Explain the flattened_runs logic sentences: - "class ChannelWrite(RunnableCallable):\n \"\"\"Implements the logic for sending\ \ writes to CONFIG_KEY_SEND.\n Can be used as a runnable or as a static method\ \ to call imperatively.\"\"\"\n\n writes: list[ChannelWriteEntry | ChannelWriteTupleEntry\ \ | Send]\n \"\"\"Sequence of write entries or Send objects to write.\"\"\"\ \n\n def __init__(\n self,\n writes: Sequence[ChannelWriteEntry\ \ | ChannelWriteTupleEntry | Send],\n *,\n tags: Sequence[str] |\ \ None = None,\n ):\n super().__init__(\n func=self._write,\n\ \ afunc=self._awrite,\n name=None,\n tags=tags,\n\ \ trace=False,\n )\n self.writes = cast(\n \ \ 
list[ChannelWriteEntry | ChannelWriteTupleEntry | Send], writes\n )\n\ \n def get_name(self, suffix: str | None = None, *, name: str | None = None)\ \ -> str:\n if not name:\n name = f\"ChannelWrite<{','.join(w.channel\ \ if isinstance(w, ChannelWriteEntry) else '...' if isinstance(w, ChannelWriteTupleEntry)\ \ else w.node for w in self.writes)}>\"\n return super().get_name(suffix,\ \ name=name)\n\n def _write(self, input: Any, config: RunnableConfig) -> None:\n\ \ writes = [\n ChannelWriteEntry(write.channel, input, write.skip_none,\ \ write.mapper)\n if isinstance(write, ChannelWriteEntry) and write.value\ \ is PASSTHROUGH\n else ChannelWriteTupleEntry(write.mapper, input)\n\ \ if isinstance(write, ChannelWriteTupleEntry) and write.value is PASSTHROUGH\n\ \ else write\n for write in self.writes\n ]\n \ \ self.do_write(\n config,\n writes,\n )\n \ \ return input\n\n async def _awrite(self, input: Any, config: RunnableConfig)\ \ -> None:\n writes = [\n ChannelWriteEntry(write.channel, input,\ \ write.skip_none, write.mapper)\n if isinstance(write, ChannelWriteEntry)\ \ and write.value is PASSTHROUGH\n else ChannelWriteTupleEntry(write.mapper,\ \ input)\n if isinstance(write, ChannelWriteTupleEntry) and write.value\ \ is PASSTHROUGH\n else write\n for write in self.writes\n\ \ ]\n self.do_write(\n config,\n writes,\n\ \ )\n return input\n\n @staticmethod\n def do_write(\n \ \ config: RunnableConfig,\n writes: Sequence[ChannelWriteEntry | ChannelWriteTupleEntry\ \ | Send],\n allow_passthrough: bool = True,\n ) -> None:\n #\ \ validate\n for w in writes:\n if isinstance(w, ChannelWriteEntry):\n\ \ if w.channel == TASKS:\n raise InvalidUpdateError(\n\ \ \"Cannot write to the reserved channel TASKS\"\n \ \ )\n if w.value is PASSTHROUGH and not allow_passthrough:\n\ \ raise InvalidUpdateError(\"PASSTHROUGH value must be replaced\"\ )\n if isinstance(w, ChannelWriteTupleEntry):\n if w.value\ \ is PASSTHROUGH and not allow_passthrough:\n raise InvalidUpdateError(\"\ 
PASSTHROUGH value must be replaced\")\n # if we want to persist writes\ \ found before hitting a ParentCommand\n # can move this to a finally block\n\ \ write: TYPE_SEND = config[CONF][CONFIG_KEY_SEND]\n write(_assemble_writes(writes))\n\ \n @staticmethod\n def is_writer(runnable: Runnable) -> bool:\n \"\ \"\"Used by PregelNode to distinguish between writers and other runnables.\"\"\ \"\n return (\n isinstance(runnable, ChannelWrite)\n \ \ or getattr(runnable, \"_is_channel_writer\", MISSING) is not MISSING\n \ \ )\n\n @staticmethod\n def get_static_writes(\n runnable:\ \ Runnable,\n ) -> Sequence[tuple[str, Any, str | None]] | None:\n \"\ \"\"Used to get conditional writes a writer declares for static analysis.\"\"\"\ \n if isinstance(runnable, ChannelWrite):\n return [\n \ \ w\n for entry in runnable.writes\n if\ \ isinstance(entry, ChannelWriteTupleEntry) and entry.static\n \ \ for w in entry.static\n ] or None\n elif writes := getattr(runnable,\ \ \"_is_channel_writer\", MISSING):\n if writes is not MISSING:\n \ \ writes = cast(\n Sequence[tuple[ChannelWriteEntry\ \ | Send, str | None]],\n writes,\n )\n \ \ entries = [e for e, _ in writes]\n labels = [la for\ \ _, la in writes]\n return [(*t, la) for t, la in zip(_assemble_writes(entries),\ \ labels)]\n\n @staticmethod\n def register_writer(\n runnable: R,\n\ \ static: Sequence[tuple[ChannelWriteEntry | Send, str | None]] | None\ \ = None,\n ) -> R:\n \"\"\"Used to mark a runnable as a writer, so\ \ that it can be detected by is_writer.\n Instances of ChannelWrite are\ \ automatically marked as writers.\n Optionally, a list of declared writes\ \ can be passed for static analysis.\"\"\"\n # using object.__setattr__\ \ to work around objects that override __setattr__\n # eg. 
pydantic models\ \ and dataclasses\n object.__setattr__(runnable, \"_is_channel_writer\"\ , static)\n return runnable" - "def test_double_interrupt_subgraph(sync_checkpointer: BaseCheckpointSaver) ->\ \ None:\n class AgentState(TypedDict):\n input: str\n\n def node_1(state:\ \ AgentState):\n result = interrupt(\"interrupt node 1\")\n return\ \ {\"input\": result}\n\n def node_2(state: AgentState):\n result =\ \ interrupt(\"interrupt node 2\")\n return {\"input\": result}\n\n subgraph_builder\ \ = (\n StateGraph(AgentState)\n .add_node(\"node_1\", node_1)\n\ \ .add_node(\"node_2\", node_2)\n .add_edge(START, \"node_1\")\n\ \ .add_edge(\"node_1\", \"node_2\")\n .add_edge(\"node_2\", END)\n\ \ )\n\n # invoke the sub graph\n subgraph = subgraph_builder.compile(checkpointer=sync_checkpointer)\n\ \ thread = {\"configurable\": {\"thread_id\": str(uuid.uuid4())}}\n assert\ \ [c for c in subgraph.stream({\"input\": \"test\"}, thread)] == [\n {\n\ \ \"__interrupt__\": (\n Interrupt(\n \ \ value=\"interrupt node 1\",\n id=AnyStr(),\n \ \ ),\n )\n },\n ]\n # resume from the first interrupt\n\ \ assert [c for c in subgraph.stream(Command(resume=\"123\"), thread)] == [\n\ \ {\n \"node_1\": {\"input\": \"123\"},\n },\n \ \ {\n \"__interrupt__\": (\n Interrupt(\n \ \ value=\"interrupt node 2\",\n id=AnyStr(),\n \ \ ),\n )\n },\n ]\n # resume from the second\ \ interrupt\n assert [c for c in subgraph.stream(Command(resume=\"123\"), thread)]\ \ == [\n {\n \"node_2\": {\"input\": \"123\"},\n },\n\ \ ]\n\n subgraph = subgraph_builder.compile()\n\n def invoke_sub_agent(state:\ \ AgentState):\n return subgraph.invoke(state)\n\n thread = {\"configurable\"\ : {\"thread_id\": str(uuid.uuid4())}}\n parent_agent = (\n StateGraph(AgentState)\n\ \ .add_node(\"invoke_sub_agent\", invoke_sub_agent)\n .add_edge(START,\ \ \"invoke_sub_agent\")\n .add_edge(\"invoke_sub_agent\", END)\n \ \ .compile(checkpointer=sync_checkpointer)\n )\n\n assert [c for c in parent_agent.stream({\"\ input\": 
\"test\"}, thread)] == [\n {\n \"__interrupt__\": (\n\ \ Interrupt(\n value=\"interrupt node 1\",\n\ \ id=AnyStr(),\n ),\n )\n \ \ },\n ]\n\n # resume from the first interrupt\n assert [c for c in parent_agent.stream(Command(resume=True),\ \ thread)] == [\n {\n \"__interrupt__\": (\n \ \ Interrupt(\n value=\"interrupt node 2\",\n \ \ id=AnyStr(),\n ),\n )\n }\n ]\n\n \ \ # resume from 2nd interrupt\n assert [c for c in parent_agent.stream(Command(resume=True),\ \ thread)] == [\n {\n \"invoke_sub_agent\": {\"input\": True},\n\ \ },\n ]" - "def flattened_runs(self) -> list[Run]:\n q = [] + self.runs\n result\ \ = []\n while q:\n parent = q.pop()\n result.append(parent)\n\ \ if parent.child_runs:\n q.extend(parent.child_runs)\n\ \ return result" - source_sentence: Explain the SubGraphState logic sentences: - "class Cron(TypedDict):\n \"\"\"Represents a scheduled task.\"\"\"\n\n cron_id:\ \ str\n \"\"\"The ID of the cron.\"\"\"\n assistant_id: str\n \"\"\"\ The ID of the assistant.\"\"\"\n thread_id: str | None\n \"\"\"The ID of\ \ the thread.\"\"\"\n on_run_completed: OnCompletionBehavior | None\n \"\ \"\"What to do with the thread after the run completes. 
Only applicable for stateless\ \ crons.\"\"\"\n end_time: datetime | None\n \"\"\"The end date to stop\ \ running the cron.\"\"\"\n schedule: str\n \"\"\"The schedule to run, cron\ \ format.\"\"\"\n created_at: datetime\n \"\"\"The time the cron was created.\"\ \"\"\n updated_at: datetime\n \"\"\"The last time the cron was updated.\"\ \"\"\n payload: dict\n \"\"\"The run payload to use for creating new run.\"\ \"\"\n user_id: str | None\n \"\"\"The user ID of the cron.\"\"\"\n next_run_date:\ \ datetime | None\n \"\"\"The next run date of the cron.\"\"\"\n metadata:\ \ dict\n \"\"\"The metadata of the cron.\"\"\"" - "class SubGraphState(MessagesState):\n city: str" - "def task_path_str(tup: str | int | tuple) -> str:\n \"\"\"Generate a string\ \ representation of the task path.\"\"\"\n return (\n f\"~{', '.join(task_path_str(x)\ \ for x in tup)}\"\n if isinstance(tup, (tuple, list))\n else f\"\ {tup:010d}\"\n if isinstance(tup, int)\n else str(tup)\n )" - source_sentence: Best practices for test_list_namespaces_operations sentences: - "def test_doubly_nested_graph_state(\n sync_checkpointer: BaseCheckpointSaver,\n\ ) -> None:\n class State(TypedDict):\n my_key: str\n\n class ChildState(TypedDict):\n\ \ my_key: str\n\n class GrandChildState(TypedDict):\n my_key:\ \ str\n\n def grandchild_1(state: ChildState):\n return {\"my_key\"\ : state[\"my_key\"] + \" here\"}\n\n def grandchild_2(state: ChildState):\n\ \ return {\n \"my_key\": state[\"my_key\"] + \" and there\"\ ,\n }\n\n grandchild = StateGraph(GrandChildState)\n grandchild.add_node(\"\ grandchild_1\", grandchild_1)\n grandchild.add_node(\"grandchild_2\", grandchild_2)\n\ \ grandchild.add_edge(\"grandchild_1\", \"grandchild_2\")\n grandchild.set_entry_point(\"\ grandchild_1\")\n grandchild.set_finish_point(\"grandchild_2\")\n\n child\ \ = StateGraph(ChildState)\n child.add_node(\n \"child_1\",\n \ \ grandchild.compile(interrupt_before=[\"grandchild_2\"]),\n )\n child.set_entry_point(\"\ child_1\")\n 
child.set_finish_point(\"child_1\")\n\n def parent_1(state:\ \ State):\n return {\"my_key\": \"hi \" + state[\"my_key\"]}\n\n def\ \ parent_2(state: State):\n return {\"my_key\": state[\"my_key\"] + \"\ \ and back again\"}\n\n graph = StateGraph(State)\n graph.add_node(\"parent_1\"\ , parent_1)\n graph.add_node(\"child\", child.compile())\n graph.add_node(\"\ parent_2\", parent_2)\n graph.set_entry_point(\"parent_1\")\n graph.add_edge(\"\ parent_1\", \"child\")\n graph.add_edge(\"child\", \"parent_2\")\n graph.set_finish_point(\"\ parent_2\")\n\n app = graph.compile(checkpointer=sync_checkpointer)\n\n \ \ # test invoke w/ nested interrupt\n config = {\"configurable\": {\"thread_id\"\ : \"1\"}}\n assert [\n c\n for c in app.stream(\n \ \ {\"my_key\": \"my value\"}, config, subgraphs=True, durability=\"exit\"\n \ \ )\n ] == [\n ((), {\"parent_1\": {\"my_key\": \"hi my value\"\ }}),\n (\n (AnyStr(\"child:\"), AnyStr(\"child_1:\")),\n \ \ {\"grandchild_1\": {\"my_key\": \"hi my value here\"}},\n ),\n\ \ ((), {\"__interrupt__\": ()}),\n ]\n # get state without subgraphs\n\ \ outer_state = app.get_state(config)\n assert outer_state == StateSnapshot(\n\ \ values={\"my_key\": \"hi my value\"},\n tasks=(\n PregelTask(\n\ \ AnyStr(),\n \"child\",\n (PULL,\ \ \"child\"),\n state={\n \"configurable\":\ \ {\n \"thread_id\": \"1\",\n \"\ checkpoint_ns\": AnyStr(\"child\"),\n }\n },\n\ \ ),\n ),\n next=(\"child\",),\n config={\n \ \ \"configurable\": {\n \"thread_id\": \"1\",\n \ \ \"checkpoint_ns\": \"\",\n \"checkpoint_id\": AnyStr(),\n\ \ }\n },\n metadata={\n \"parents\": {},\n\ \ \"source\": \"loop\",\n \"step\": 1,\n },\n \ \ created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n\ \ )\n child_state = app.get_state(outer_state.tasks[0].state)\n assert\ \ child_state == StateSnapshot(\n values={\"my_key\": \"hi my value\"},\n\ \ tasks=(\n PregelTask(\n AnyStr(),\n \ \ \"child_1\",\n (PULL, \"child_1\"),\n \ \ state={\n \"configurable\": {\n \"\ thread_id\": \"1\",\n 
\"checkpoint_ns\": AnyStr(),\n \ \ }\n },\n ),\n ),\n \ \ next=(\"child_1\",),\n config={\n \"configurable\": {\n \ \ \"thread_id\": \"1\",\n \"checkpoint_ns\": AnyStr(\"\ child:\"),\n \"checkpoint_id\": AnyStr(),\n \"checkpoint_map\"\ : AnyDict(\n {\n \"\": AnyStr(),\n \ \ AnyStr(\"child:\"): AnyStr(),\n }\n\ \ ),\n }\n },\n metadata={\n \ \ \"parents\": {\"\": AnyStr()},\n \"source\": \"loop\",\n \ \ \"step\": 0,\n },\n created_at=AnyStr(),\n parent_config=None,\n\ \ interrupts=(),\n )\n grandchild_state = app.get_state(child_state.tasks[0].state)\n\ \ assert grandchild_state == StateSnapshot(\n values={\"my_key\": \"\ hi my value here\"},\n tasks=(\n PregelTask(\n \ \ AnyStr(),\n \"grandchild_2\",\n (PULL, \"grandchild_2\"\ ),\n ),\n ),\n next=(\"grandchild_2\",),\n config={\n\ \ \"configurable\": {\n \"thread_id\": \"1\",\n \ \ \"checkpoint_ns\": AnyStr(),\n \"checkpoint_id\":\ \ AnyStr(),\n \"checkpoint_map\": AnyDict(\n \ \ {\n \"\": AnyStr(),\n AnyStr(\"\ child:\"): AnyStr(),\n AnyStr(re.compile(r\"child:.+|child1:\"\ )): AnyStr(),\n }\n ),\n }\n \ \ },\n metadata={\n \"parents\": AnyDict(\n \ \ {\n \"\": AnyStr(),\n AnyStr(\"child:\"\ ): AnyStr(),\n }\n ),\n \"source\": \"loop\"\ ,\n \"step\": 1,\n },\n created_at=AnyStr(),\n \ \ parent_config=None,\n interrupts=(),\n )\n # get state with subgraphs\n\ \ assert app.get_state(config, subgraphs=True) == StateSnapshot(\n values={\"\ my_key\": \"hi my value\"},\n tasks=(\n PregelTask(\n \ \ AnyStr(),\n \"child\",\n (PULL, \"child\"\ ),\n state=StateSnapshot(\n values={\"my_key\"\ : \"hi my value\"},\n tasks=(\n PregelTask(\n\ \ AnyStr(),\n \"child_1\"\ ,\n (PULL, \"child_1\"),\n \ \ state=StateSnapshot(\n values={\"my_key\"\ : \"hi my value here\"},\n tasks=(\n \ \ PregelTask(\n \ \ AnyStr(),\n \"grandchild_2\",\n \ \ (PULL, \"grandchild_2\"),\n \ \ ),\n ),\n \ \ next=(\"grandchild_2\",),\n \ \ config={\n \"configurable\": {\n \ \ \"thread_id\": \"1\",\n \ \ \"checkpoint_ns\": AnyStr(),\n \ \ \"checkpoint_id\": 
AnyStr(),\n \ \ \"checkpoint_map\": AnyDict(\n \ \ {\n \"\": AnyStr(),\n \ \ AnyStr(\"child:\"): AnyStr(),\n\ \ AnyStr(\n \ \ re.compile(r\"child:.+|child1:\")\n \ \ ): AnyStr(),\n \ \ }\n ),\n \ \ }\n },\n \ \ metadata={\n \"parents\"\ : AnyDict(\n {\n \ \ \"\": AnyStr(),\n \ \ AnyStr(\"child:\"): AnyStr(),\n \ \ }\n ),\n \ \ \"source\": \"loop\",\n \"step\": 1,\n\ \ },\n created_at=AnyStr(),\n\ \ parent_config=None,\n \ \ interrupts=(),\n ),\n \ \ ),\n ),\n next=(\"child_1\",),\n \ \ config={\n \"configurable\": {\n \ \ \"thread_id\": \"1\",\n \ \ \"checkpoint_ns\": AnyStr(\"child:\"),\n \"checkpoint_id\"\ : AnyStr(),\n \"checkpoint_map\": AnyDict(\n \ \ {\"\": AnyStr(), AnyStr(\"child:\"): AnyStr()}\n \ \ ),\n }\n \ \ },\n metadata={\n \"parents\": {\"\ \": AnyStr()},\n \"source\": \"loop\",\n \ \ \"step\": 0,\n },\n created_at=AnyStr(),\n\ \ parent_config=None,\n interrupts=(),\n\ \ ),\n ),\n ),\n next=(\"child\",),\n\ \ config={\n \"configurable\": {\n \"thread_id\"\ : \"1\",\n \"checkpoint_ns\": \"\",\n \"checkpoint_id\"\ : AnyStr(),\n }\n },\n metadata={\n \"parents\"\ : {},\n \"source\": \"loop\",\n \"step\": 1,\n },\n\ \ created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n\ \ )\n # # resume\n assert [c for c in app.stream(None, config, subgraphs=True,\ \ durability=\"exit\")] == [\n (\n (AnyStr(\"child:\"), AnyStr(\"\ child_1:\")),\n {\"grandchild_2\": {\"my_key\": \"hi my value here\ \ and there\"}},\n ),\n ((AnyStr(\"child:\"),), {\"child_1\": {\"\ my_key\": \"hi my value here and there\"}}),\n ((), {\"child\": {\"my_key\"\ : \"hi my value here and there\"}}),\n ((), {\"parent_2\": {\"my_key\"\ : \"hi my value here and there and back again\"}}),\n ]\n # get state with\ \ and without subgraphs\n assert (\n app.get_state(config)\n \ \ == app.get_state(config, subgraphs=True)\n == StateSnapshot(\n \ \ values={\"my_key\": \"hi my value here and there and back again\"},\n \ \ tasks=(),\n next=(),\n config={\n \ \ \"configurable\": {\n \"thread_id\": 
\"1\",\n \ \ \"checkpoint_ns\": \"\",\n \"checkpoint_id\"\ : AnyStr(),\n }\n },\n metadata={\n \ \ \"parents\": {},\n \"source\": \"loop\",\n \ \ \"step\": 3,\n },\n created_at=AnyStr(),\n \ \ parent_config=(\n {\n \"configurable\"\ : {\n \"thread_id\": \"1\",\n \"\ checkpoint_ns\": \"\",\n \"checkpoint_id\": AnyStr(),\n\ \ }\n }\n ),\n interrupts=(),\n\ \ )\n )\n\n # get outer graph history\n outer_history = list(app.get_state_history(config))\n\ \ assert outer_history == [\n StateSnapshot(\n values={\"\ my_key\": \"hi my value here and there and back again\"},\n tasks=(),\n\ \ next=(),\n config={\n \"configurable\"\ : {\n \"thread_id\": \"1\",\n \"checkpoint_ns\"\ : \"\",\n \"checkpoint_id\": AnyStr(),\n }\n\ \ },\n metadata={\n \"parents\": {},\n \ \ \"source\": \"loop\",\n \"step\": 3,\n \ \ },\n created_at=AnyStr(),\n parent_config={\n \ \ \"configurable\": {\n \"thread_id\": \"1\",\n \ \ \"checkpoint_ns\": \"\",\n \"checkpoint_id\"\ : AnyStr(),\n }\n },\n interrupts=(),\n \ \ ),\n StateSnapshot(\n values={\"my_key\": \"hi my value\"\ },\n tasks=(\n PregelTask(\n AnyStr(),\n\ \ \"child\",\n (PULL, \"child\"),\n \ \ state={\n \"configurable\": {\n \ \ \"thread_id\": \"1\",\n \"checkpoint_ns\"\ : AnyStr(\"child\"),\n }\n },\n \ \ result=None,\n ),\n ),\n \ \ next=(\"child\",),\n config={\n \"configurable\"\ : {\n \"thread_id\": \"1\",\n \"checkpoint_ns\"\ : \"\",\n \"checkpoint_id\": AnyStr(),\n }\n\ \ },\n metadata={\n \"parents\": {},\n \ \ \"source\": \"loop\",\n \"step\": 1,\n \ \ },\n created_at=AnyStr(),\n parent_config=None,\n \ \ interrupts=(),\n ),\n ]\n # get child graph history\n\ \ child_history = list(app.get_state_history(outer_history[1].tasks[0].state))\n\ \ assert child_history == [\n StateSnapshot(\n values={\"\ my_key\": \"hi my value\"},\n next=(\"child_1\",),\n config={\n\ \ \"configurable\": {\n \"thread_id\": \"1\"\ ,\n \"checkpoint_ns\": AnyStr(\"child:\"),\n \ \ \"checkpoint_id\": AnyStr(),\n \"checkpoint_map\": AnyDict(\n\ \ {\"\": 
AnyStr(), AnyStr(\"child:\"): AnyStr()}\n \ \ ),\n }\n },\n metadata={\n\ \ \"source\": \"loop\",\n \"step\": 0,\n \ \ \"parents\": {\"\": AnyStr()},\n },\n created_at=AnyStr(),\n\ \ parent_config=None,\n tasks=(\n PregelTask(\n\ \ id=AnyStr(),\n name=\"child_1\",\n \ \ path=(PULL, \"child_1\"),\n state={\n \ \ \"configurable\": {\n \"thread_id\"\ : \"1\",\n \"checkpoint_ns\": AnyStr(\"child:\"),\n\ \ }\n },\n result=None,\n\ \ ),\n ),\n interrupts=(),\n ),\n\ \ ]\n # get grandchild graph history\n grandchild_history = list(app.get_state_history(child_history[0].tasks[0].state))\n\ \ assert grandchild_history == [\n StateSnapshot(\n values={\"\ my_key\": \"hi my value here\"},\n next=(\"grandchild_2\",),\n \ \ config={\n \"configurable\": {\n \"\ thread_id\": \"1\",\n \"checkpoint_ns\": AnyStr(),\n \ \ \"checkpoint_id\": AnyStr(),\n \"checkpoint_map\"\ : AnyDict(\n {\n \"\": AnyStr(),\n\ \ AnyStr(\"child:\"): AnyStr(),\n \ \ AnyStr(re.compile(r\"child:.+|child1:\")): AnyStr(),\n \ \ }\n ),\n }\n },\n \ \ metadata={\n \"source\": \"loop\",\n \ \ \"step\": 1,\n \"parents\": AnyDict(\n {\n\ \ \"\": AnyStr(),\n AnyStr(\"child:\"\ ): AnyStr(),\n }\n ),\n },\n \ \ created_at=AnyStr(),\n parent_config=None,\n tasks=(\n\ \ PregelTask(\n id=AnyStr(),\n \ \ name=\"grandchild_2\",\n path=(PULL, \"grandchild_2\"\ ),\n result=None,\n ),\n ),\n \ \ interrupts=(),\n ),\n ]" - "def _msgpack_enc(data: Any) -> bytes:\n return ormsgpack.packb(data, default=_msgpack_default,\ \ option=_option)" - "def test_list_namespaces_operations(\n fake_embeddings: CharacterEmbeddings,\n\ ) -> None:\n \"\"\"Test list namespaces functionality with various filters.\"\ \"\"\n with create_vector_store(\n fake_embeddings, text_fields=[\"\ key0\", \"key1\", \"key3\"]\n ) as store:\n test_pref = str(uuid.uuid4())\n\ \ test_namespaces = [\n (test_pref, \"test\", \"documents\"\ , \"public\", test_pref),\n (test_pref, \"test\", \"documents\", \"\ private\", test_pref),\n (test_pref, \"test\", \"images\", 
\"public\"\ , test_pref),\n (test_pref, \"test\", \"images\", \"private\", test_pref),\n\ \ (test_pref, \"prod\", \"documents\", \"public\", test_pref),\n \ \ (test_pref, \"prod\", \"documents\", \"some\", \"nesting\", \"public\"\ , test_pref),\n (test_pref, \"prod\", \"documents\", \"private\", test_pref),\n\ \ ]\n\n # Add test data\n for namespace in test_namespaces:\n\ \ store.put(namespace, \"dummy\", {\"content\": \"dummy\"})\n\n \ \ # Test prefix filtering\n prefix_result = store.list_namespaces(prefix=(test_pref,\ \ \"test\"))\n assert len(prefix_result) == 4\n assert all(ns[1]\ \ == \"test\" for ns in prefix_result)\n\n # Test specific prefix\n \ \ specific_prefix_result = store.list_namespaces(\n prefix=(test_pref,\ \ \"test\", \"documents\")\n )\n assert len(specific_prefix_result)\ \ == 2\n assert all(ns[1:3] == (\"test\", \"documents\") for ns in specific_prefix_result)\n\ \n # Test suffix filtering\n suffix_result = store.list_namespaces(suffix=(\"\ public\", test_pref))\n assert len(suffix_result) == 4\n assert\ \ all(ns[-2] == \"public\" for ns in suffix_result)\n\n # Test combined\ \ prefix and suffix\n prefix_suffix_result = store.list_namespaces(\n \ \ prefix=(test_pref, \"test\"), suffix=(\"public\", test_pref)\n \ \ )\n assert len(prefix_suffix_result) == 2\n assert all(\n\ \ ns[1] == \"test\" and ns[-2] == \"public\" for ns in prefix_suffix_result\n\ \ )\n\n # Test wildcard in prefix\n wildcard_prefix_result\ \ = store.list_namespaces(\n prefix=(test_pref, \"*\", \"documents\"\ )\n )\n assert len(wildcard_prefix_result) == 5\n assert\ \ all(ns[2] == \"documents\" for ns in wildcard_prefix_result)\n\n # Test\ \ wildcard in suffix\n wildcard_suffix_result = store.list_namespaces(\n\ \ suffix=(\"*\", \"public\", test_pref)\n )\n assert\ \ len(wildcard_suffix_result) == 4\n assert all(ns[-2] == \"public\" for\ \ ns in wildcard_suffix_result)\n\n wildcard_single = store.list_namespaces(\n\ \ suffix=(\"some\", \"*\", \"public\", test_pref)\n )\n \ \ 
assert len(wildcard_single) == 1\n assert wildcard_single[0] == (\n\ \ test_pref,\n \"prod\",\n \"documents\",\n \ \ \"some\",\n \"nesting\",\n \"public\",\n \ \ test_pref,\n )\n\n # Test max depth\n max_depth_result\ \ = store.list_namespaces(max_depth=3)\n assert all(len(ns) <= 3 for ns\ \ in max_depth_result)\n\n max_depth_result = store.list_namespaces(\n\ \ max_depth=4, prefix=(test_pref, \"*\", \"documents\")\n )\n\ \ assert len(set(res for res in max_depth_result)) == len(max_depth_result)\ \ == 5\n\n # Test pagination\n limit_result = store.list_namespaces(prefix=(test_pref,),\ \ limit=3)\n assert len(limit_result) == 3\n\n offset_result = store.list_namespaces(prefix=(test_pref,),\ \ offset=3)\n assert len(offset_result) == len(test_namespaces) - 3\n\n\ \ empty_prefix_result = store.list_namespaces(prefix=(test_pref,))\n \ \ assert len(empty_prefix_result) == len(test_namespaces)\n assert\ \ set(empty_prefix_result) == set(test_namespaces)\n\n # Clean up\n \ \ for namespace in test_namespaces:\n store.delete(namespace, \"\ dummy\")" pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: codeBert dense retriever results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.9 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29999999999999993 name: Cosine Precision@3 - type: 
cosine_precision@5 value: 0.20000000000000004 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.10000000000000002 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9408764682653967 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9225 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9225 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 value: 0.9 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29999999999999993 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.20000000000000004 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.10000000000000002 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9408764682653967 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9225 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9225 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.9 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - 
type: cosine_precision@1 value: 0.9 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29999999999999993 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.20000000000000004 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.10000000000000002 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9408764682653967 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9225 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9225 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.85 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.95 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.95 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.85 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29999999999999993 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19000000000000003 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09500000000000001 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.85 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.95 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.95 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.894342640361727 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8766666666666666 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8799999999999999 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.85 name: Cosine Accuracy@1 - type: 
cosine_accuracy@3 value: 0.9 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.85 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29999999999999993 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.18000000000000005 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.10000000000000002 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.85 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9074399105059531 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8800595238095237 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8800595238095237 name: Cosine Map@100 --- # codeBert dense retriever This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [shubharuidas/codebert-embed-base-dense-retriever](https://huggingface.co/shubharuidas/codebert-embed-base-dense-retriever). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
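Because the model was trained with `MatryoshkaLoss` (and evaluated at 768, 512, 256, 128, and 64 dimensions in the metrics above), its embeddings can be truncated to a smaller dimension and re-normalized with little loss in retrieval quality. A minimal sketch of that truncation, using random vectors as stand-ins for real model outputs (in practice `full` would come from `model.encode(sentences)`; sentence-transformers can also do this for you via the `truncate_dim` argument when loading the model):

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each embedding, then re-normalize
    so that cosine similarity remains a valid comparison."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))  # stand-in for model.encode(sentences)

for dim in (768, 512, 256, 128, 64):  # the dimensions evaluated above
    emb = truncate_embeddings(full, dim)
    sims = emb @ emb.T  # cosine similarity, since rows are unit-norm
    print(dim, emb.shape)
```

Smaller dimensions trade a few points of accuracy (see the dim-128 and dim-64 metrics above) for proportionally cheaper storage and faster search.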
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [shubharuidas/codebert-embed-base-dense-retriever](https://huggingface.co/shubharuidas/codebert-embed-base-dense-retriever)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("anaghaj111/codebert-base-code-embed-mrl-langchain-langgraph") # Run inference sentences = [ 'Best practices for test_list_namespaces_operations', 'def test_list_namespaces_operations(\n fake_embeddings: CharacterEmbeddings,\n) -> None:\n """Test list namespaces functionality with various filters."""\n with create_vector_store(\n fake_embeddings, text_fields=["key0", "key1", "key3"]\n ) as store:\n test_pref = str(uuid.uuid4())\n test_namespaces = [\n (test_pref, "test", "documents", "public", test_pref),\n (test_pref, "test", "documents", "private", test_pref),\n (test_pref, "test", "images", "public", test_pref),\n (test_pref, "test", "images", "private", test_pref),\n (test_pref, "prod", "documents", "public", test_pref),\n (test_pref, "prod", "documents", "some", "nesting", "public", test_pref),\n (test_pref, "prod", "documents", "private", test_pref),\n ]\n\n # Add test data\n for namespace in test_namespaces:\n store.put(namespace, "dummy", {"content": "dummy"})\n\n # Test prefix filtering\n prefix_result = store.list_namespaces(prefix=(test_pref, "test"))\n assert len(prefix_result) == 4\n assert all(ns[1] == "test" for ns in prefix_result)\n\n # Test specific prefix\n specific_prefix_result = store.list_namespaces(\n prefix=(test_pref, "test", "documents")\n )\n assert len(specific_prefix_result) == 2\n assert all(ns[1:3] == ("test", "documents") for ns in specific_prefix_result)\n\n # Test suffix filtering\n suffix_result = store.list_namespaces(suffix=("public", test_pref))\n assert len(suffix_result) == 4\n assert all(ns[-2] == "public" for ns in suffix_result)\n\n # Test combined prefix and suffix\n prefix_suffix_result = store.list_namespaces(\n prefix=(test_pref, "test"), suffix=("public", test_pref)\n )\n assert len(prefix_suffix_result) == 2\n assert all(\n ns[1] == "test" and ns[-2] == "public" for ns in prefix_suffix_result\n )\n\n 
# Test wildcard in prefix\n wildcard_prefix_result = store.list_namespaces(\n prefix=(test_pref, "*", "documents")\n )\n assert len(wildcard_prefix_result) == 5\n assert all(ns[2] == "documents" for ns in wildcard_prefix_result)\n\n # Test wildcard in suffix\n wildcard_suffix_result = store.list_namespaces(\n suffix=("*", "public", test_pref)\n )\n assert len(wildcard_suffix_result) == 4\n assert all(ns[-2] == "public" for ns in wildcard_suffix_result)\n\n wildcard_single = store.list_namespaces(\n suffix=("some", "*", "public", test_pref)\n )\n assert len(wildcard_single) == 1\n assert wildcard_single[0] == (\n test_pref,\n "prod",\n "documents",\n "some",\n "nesting",\n "public",\n test_pref,\n )\n\n # Test max depth\n max_depth_result = store.list_namespaces(max_depth=3)\n assert all(len(ns) <= 3 for ns in max_depth_result)\n\n max_depth_result = store.list_namespaces(\n max_depth=4, prefix=(test_pref, "*", "documents")\n )\n assert len(set(res for res in max_depth_result)) == len(max_depth_result) == 5\n\n # Test pagination\n limit_result = store.list_namespaces(prefix=(test_pref,), limit=3)\n assert len(limit_result) == 3\n\n offset_result = store.list_namespaces(prefix=(test_pref,), offset=3)\n assert len(offset_result) == len(test_namespaces) - 3\n\n empty_prefix_result = store.list_namespaces(prefix=(test_pref,))\n assert len(empty_prefix_result) == len(test_namespaces)\n assert set(empty_prefix_result) == set(test_namespaces)\n\n # Clean up\n for namespace in test_namespaces:\n store.delete(namespace, "dummy")', 'def test_doubly_nested_graph_state(\n sync_checkpointer: BaseCheckpointSaver,\n) -> None:\n class State(TypedDict):\n my_key: str\n\n class ChildState(TypedDict):\n my_key: str\n\n class GrandChildState(TypedDict):\n my_key: str\n\n def grandchild_1(state: ChildState):\n return {"my_key": state["my_key"] + " here"}\n\n def grandchild_2(state: ChildState):\n return {\n "my_key": state["my_key"] + " and there",\n }\n\n grandchild = 
StateGraph(GrandChildState)\n grandchild.add_node("grandchild_1", grandchild_1)\n grandchild.add_node("grandchild_2", grandchild_2)\n grandchild.add_edge("grandchild_1", "grandchild_2")\n grandchild.set_entry_point("grandchild_1")\n grandchild.set_finish_point("grandchild_2")\n\n child = StateGraph(ChildState)\n child.add_node(\n "child_1",\n grandchild.compile(interrupt_before=["grandchild_2"]),\n )\n child.set_entry_point("child_1")\n child.set_finish_point("child_1")\n\n def parent_1(state: State):\n return {"my_key": "hi " + state["my_key"]}\n\n def parent_2(state: State):\n return {"my_key": state["my_key"] + " and back again"}\n\n graph = StateGraph(State)\n graph.add_node("parent_1", parent_1)\n graph.add_node("child", child.compile())\n graph.add_node("parent_2", parent_2)\n graph.set_entry_point("parent_1")\n graph.add_edge("parent_1", "child")\n graph.add_edge("child", "parent_2")\n graph.set_finish_point("parent_2")\n\n app = graph.compile(checkpointer=sync_checkpointer)\n\n # test invoke w/ nested interrupt\n config = {"configurable": {"thread_id": "1"}}\n assert [\n c\n for c in app.stream(\n {"my_key": "my value"}, config, subgraphs=True, durability="exit"\n )\n ] == [\n ((), {"parent_1": {"my_key": "hi my value"}}),\n (\n (AnyStr("child:"), AnyStr("child_1:")),\n {"grandchild_1": {"my_key": "hi my value here"}},\n ),\n ((), {"__interrupt__": ()}),\n ]\n # get state without subgraphs\n outer_state = app.get_state(config)\n assert outer_state == StateSnapshot(\n values={"my_key": "hi my value"},\n tasks=(\n PregelTask(\n AnyStr(),\n "child",\n (PULL, "child"),\n state={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child"),\n }\n },\n ),\n ),\n next=("child",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n metadata={\n "parents": {},\n "source": "loop",\n "step": 1,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n )\n child_state = 
app.get_state(outer_state.tasks[0].state)\n assert child_state == StateSnapshot(\n values={"my_key": "hi my value"},\n tasks=(\n PregelTask(\n AnyStr(),\n "child_1",\n (PULL, "child_1"),\n state={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr(),\n }\n },\n ),\n ),\n next=("child_1",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child:"),\n "checkpoint_id": AnyStr(),\n "checkpoint_map": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n }\n ),\n }\n },\n metadata={\n "parents": {"": AnyStr()},\n "source": "loop",\n "step": 0,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n )\n grandchild_state = app.get_state(child_state.tasks[0].state)\n assert grandchild_state == StateSnapshot(\n values={"my_key": "hi my value here"},\n tasks=(\n PregelTask(\n AnyStr(),\n "grandchild_2",\n (PULL, "grandchild_2"),\n ),\n ),\n next=("grandchild_2",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr(),\n "checkpoint_id": AnyStr(),\n "checkpoint_map": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n AnyStr(re.compile(r"child:.+|child1:")): AnyStr(),\n }\n ),\n }\n },\n metadata={\n "parents": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n }\n ),\n "source": "loop",\n "step": 1,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n )\n # get state with subgraphs\n assert app.get_state(config, subgraphs=True) == StateSnapshot(\n values={"my_key": "hi my value"},\n tasks=(\n PregelTask(\n AnyStr(),\n "child",\n (PULL, "child"),\n state=StateSnapshot(\n values={"my_key": "hi my value"},\n tasks=(\n PregelTask(\n AnyStr(),\n "child_1",\n (PULL, "child_1"),\n state=StateSnapshot(\n values={"my_key": "hi my value here"},\n tasks=(\n PregelTask(\n AnyStr(),\n "grandchild_2",\n (PULL, "grandchild_2"),\n ),\n ),\n next=("grandchild_2",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr(),\n "checkpoint_id": 
AnyStr(),\n "checkpoint_map": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n AnyStr(\n re.compile(r"child:.+|child1:")\n ): AnyStr(),\n }\n ),\n }\n },\n metadata={\n "parents": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n }\n ),\n "source": "loop",\n "step": 1,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n ),\n ),\n ),\n next=("child_1",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child:"),\n "checkpoint_id": AnyStr(),\n "checkpoint_map": AnyDict(\n {"": AnyStr(), AnyStr("child:"): AnyStr()}\n ),\n }\n },\n metadata={\n "parents": {"": AnyStr()},\n "source": "loop",\n "step": 0,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n ),\n ),\n ),\n next=("child",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n metadata={\n "parents": {},\n "source": "loop",\n "step": 1,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n )\n # # resume\n assert [c for c in app.stream(None, config, subgraphs=True, durability="exit")] == [\n (\n (AnyStr("child:"), AnyStr("child_1:")),\n {"grandchild_2": {"my_key": "hi my value here and there"}},\n ),\n ((AnyStr("child:"),), {"child_1": {"my_key": "hi my value here and there"}}),\n ((), {"child": {"my_key": "hi my value here and there"}}),\n ((), {"parent_2": {"my_key": "hi my value here and there and back again"}}),\n ]\n # get state with and without subgraphs\n assert (\n app.get_state(config)\n == app.get_state(config, subgraphs=True)\n == StateSnapshot(\n values={"my_key": "hi my value here and there and back again"},\n tasks=(),\n next=(),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n metadata={\n "parents": {},\n "source": "loop",\n "step": 3,\n },\n created_at=AnyStr(),\n parent_config=(\n {\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": 
AnyStr(),\n }\n }\n ),\n interrupts=(),\n )\n )\n\n # get outer graph history\n outer_history = list(app.get_state_history(config))\n assert outer_history == [\n StateSnapshot(\n values={"my_key": "hi my value here and there and back again"},\n tasks=(),\n next=(),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n metadata={\n "parents": {},\n "source": "loop",\n "step": 3,\n },\n created_at=AnyStr(),\n parent_config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n interrupts=(),\n ),\n StateSnapshot(\n values={"my_key": "hi my value"},\n tasks=(\n PregelTask(\n AnyStr(),\n "child",\n (PULL, "child"),\n state={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child"),\n }\n },\n result=None,\n ),\n ),\n next=("child",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": "",\n "checkpoint_id": AnyStr(),\n }\n },\n metadata={\n "parents": {},\n "source": "loop",\n "step": 1,\n },\n created_at=AnyStr(),\n parent_config=None,\n interrupts=(),\n ),\n ]\n # get child graph history\n child_history = list(app.get_state_history(outer_history[1].tasks[0].state))\n assert child_history == [\n StateSnapshot(\n values={"my_key": "hi my value"},\n next=("child_1",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child:"),\n "checkpoint_id": AnyStr(),\n "checkpoint_map": AnyDict(\n {"": AnyStr(), AnyStr("child:"): AnyStr()}\n ),\n }\n },\n metadata={\n "source": "loop",\n "step": 0,\n "parents": {"": AnyStr()},\n },\n created_at=AnyStr(),\n parent_config=None,\n tasks=(\n PregelTask(\n id=AnyStr(),\n name="child_1",\n path=(PULL, "child_1"),\n state={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr("child:"),\n }\n },\n result=None,\n ),\n ),\n interrupts=(),\n ),\n ]\n # get grandchild graph history\n grandchild_history = 
list(app.get_state_history(child_history[0].tasks[0].state))\n assert grandchild_history == [\n StateSnapshot(\n values={"my_key": "hi my value here"},\n next=("grandchild_2",),\n config={\n "configurable": {\n "thread_id": "1",\n "checkpoint_ns": AnyStr(),\n "checkpoint_id": AnyStr(),\n "checkpoint_map": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n AnyStr(re.compile(r"child:.+|child1:")): AnyStr(),\n }\n ),\n }\n },\n metadata={\n "source": "loop",\n "step": 1,\n "parents": AnyDict(\n {\n "": AnyStr(),\n AnyStr("child:"): AnyStr(),\n }\n ),\n },\n created_at=AnyStr(),\n parent_config=None,\n tasks=(\n PregelTask(\n id=AnyStr(),\n name="grandchild_2",\n path=(PULL, "grandchild_2"),\n result=None,\n ),\n ),\n interrupts=(),\n ),\n ]', ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities) # tensor([[1.0000, 0.7789, 0.3589], # [0.7789, 1.0000, 0.4748], # [0.3589, 0.4748, 1.0000]]) ``` ## Evaluation ### Metrics #### Information Retrieval * Dataset: `dim_768` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: ```json { "truncate_dim": 768 } ``` | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.9 | | cosine_accuracy@3 | 0.9 | | cosine_accuracy@5 | 1.0 | | cosine_accuracy@10 | 1.0 | | cosine_precision@1 | 0.9 | | cosine_precision@3 | 0.3 | | cosine_precision@5 | 0.2 | | cosine_precision@10 | 0.1 | | cosine_recall@1 | 0.9 | | cosine_recall@3 | 0.9 | | cosine_recall@5 | 1.0 | | cosine_recall@10 | 1.0 | | **cosine_ndcg@10** | **0.9409** | | cosine_mrr@10 | 0.9225 | | cosine_map@100 | 0.9225 | #### Information Retrieval * Dataset: `dim_512` * Evaluated with 
[InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9        |
| cosine_accuracy@3   | 0.9        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9        |
| cosine_precision@3  | 0.3        |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9        |
| cosine_recall@3     | 0.9        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9409** |
| cosine_mrr@10       | 0.9225     |
| cosine_map@100      | 0.9225     |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 256
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9        |
| cosine_accuracy@3   | 0.9        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9        |
| cosine_precision@3  | 0.3        |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9        |
| cosine_recall@3     | 0.9        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9409** |
| cosine_mrr@10       | 0.9225     |
| cosine_map@100      | 0.9225     |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.85       |
| cosine_accuracy@3   | 0.9        |
| cosine_accuracy@5   | 0.95       |
| cosine_accuracy@10  | 0.95       |
| cosine_precision@1  | 0.85       |
| cosine_precision@3  | 0.3        |
| cosine_precision@5  | 0.19       |
| cosine_precision@10 | 0.095      |
| cosine_recall@1     | 0.85       |
| cosine_recall@3     | 0.9        |
| cosine_recall@5     | 0.95       |
| cosine_recall@10    | 0.95       |
| **cosine_ndcg@10**  | **0.8943** |
| cosine_mrr@10       | 0.8767     |
| cosine_map@100      | 0.88       |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.85       |
| cosine_accuracy@3   | 0.9        |
| cosine_accuracy@5   | 0.9        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.85       |
| cosine_precision@3  | 0.3        |
| cosine_precision@5  | 0.18       |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.85       |
| cosine_recall@3     | 0.9        |
| cosine_recall@5     | 0.9        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9074** |
| cosine_mrr@10       | 0.8801     |
| cosine_map@100      | 0.8801     |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 180 training samples
* Columns: anchor and positive
* Approximate statistics based on the first 180 samples:

| | anchor | positive |
|:--------|:-------|:---------|
| type    | string | string   |
| details | | |

* Samples:

| anchor | positive |
|:-------|:---------|
| <code>How to implement State?</code> | <code>class State(TypedDict):<br>    messages: Annotated[list[str], operator.add]<br></code> |
| <code>Best practices for test_sql_injection_vulnerability</code> | <code>def test_sql_injection_vulnerability(store: SqliteStore) -> None:<br>    """Test that SQL injection via malicious filter keys is prevented."""<br>    # Add public and private documents<br>    store.put(("docs",), "public", {"access": "public", "data": "public info"})<br>    store.put(<br>        ("docs",), "private", {"access": "private", "data": "secret", "password": "123"}<br>    )<br><br>    # Normal query - returns 1 public document<br>    normal = store.search(("docs",), filter={"access": "public"})<br>    assert len(normal) == 1<br>    assert normal[0].value["access"] == "public"<br><br>    # SQL injection attempt via malicious key should raise ValueError<br>    malicious_key = "access') = 'public' OR '1'='1' OR json_extract(value, '$."<br><br>    with pytest.raises(ValueError, match="Invalid filter key"):<br>        store.search(("docs",), filter={malicious_key: "dummy"})<br></code> |
| <code>Example usage of put_writes</code> | <code>def put_writes(<br>    self,<br>    config: RunnableConfig,<br>    writes: Sequence[tuple[str, Any]],<br>    task_id: str,<br>    task_path: str = "",<br>) -> None:<br>    """Store intermediate writes linked to a checkpoint.<br><br>    This method saves intermediate writes associated with a checkpoint to the Postgres database.<br><br>    Args:<br>        config: Configuration of the related checkpoint.<br>        writes: List of writes to store.<br>        task_id: Identifier for the task creating the writes.<br>    """<br>    query = (<br>        self.UPSERT_CHECKPOINT_WRITES_SQL<br>        if all(w[0] in WRITES_IDX_MAP for w in writes)<br>        else self.INSERT_CHECKPOINT_WRITES_SQL<br>    )<br>    with self._cursor(pipeline=True) as cur:<br>        cur.executemany(<br>            query,<br>            self._dump_writes(<br>                config["configurable"]["thread_id"],<br>                config["configurable"]["checkpoint_ns"],<br>                config["c...</code> |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
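Because the model was trained with `matryoshka_dims` of 768, 512, 256, 128, and 64, an embedding can be shortened to one of those prefix lengths and re-normalized before computing cosine similarity. A minimal sketch of that truncation step, using a random unit vector as a stand-in for a real embedding so it runs offline (with Sentence Transformers itself, passing `truncate_dim` to `SentenceTransformer` achieves the same effect):

```python
import numpy as np

# Stand-in for a Matryoshka-trained embedding: a random 768-dim unit vector.
# A real embedding would come from model.encode(); numpy keeps this runnable offline.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)


def truncate(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions and re-normalize,
    so cosine similarity stays well defined."""
    head = vec[:k]
    return head / np.linalg.norm(head)


small = truncate(full, 128)
print(small.shape)                     # (128,)
print(round(float(small @ small), 6))  # 1.0 (unit norm)
```

Dot products between vectors truncated to the same dimension then approximate the full-dimension cosine similarity, trading a small amount of retrieval quality (see the `dim_128` and `dim_64` tables above) for 6x smaller index storage.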
### Training Logs

| Epoch   | Step  | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:-----:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 1.0     | 3     | 0.9409                 | 0.9202                 | 0.9431                 | 0.8412                 | 0.9059                |
| **2.0** | **6** | **0.9409**             | **0.9409**             | **0.9409**             | **0.8943**             | **0.9074**            |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.14.0
- Sentence Transformers: 5.2.2
- Transformers: 4.57.3
- PyTorch: 2.9.1
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```