frdel committed
Commit 082b100 · 1 Parent(s): 4c80980

Squashed commit of the following:


commit 9d4e1b68b2ab41eefc534ef1f48953496d7d1cc6
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 15 18:01:51 2024 +0100

ctx window popup fix, default settings fix

commit 9ef32085651bc02610e1317b29f8d0b4913ae49f
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 15 14:55:53 2024 +0100

models, settings, initializer refactor

Rate limiter fix
Models initialized JIT
Model call wrappers for agent
Message compression fix
Log progress update
Settings frontend numeric fields

commit f7b3e2540c0ab798658cc6fed783942423d74df8
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 19:42:17 2024 +0100

knowledge import/reload

commit 4e028a3ce428cec581273a2261b6bdd8c474853e
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 11:28:47 2024 +0100

Memory recall speedup

commit 8ec3b24696e0346a3aea22d0f304ca297004bdcc
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 00:17:02 2024 +0100

keyboard input tool

commit a76a302f3f4e9fce9183aa0585595d6f18ae215a
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 23:28:03 2024 +0100

solutions cleanup

commit 884007cdb0064ef9d550dc99e6c3066719242da1
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 21:51:14 2024 +0100

console print edits for docker

commit 927c234d69312d57a90b03896bf9e62bf2583bd6
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 20:40:28 2024 +0100

openai azure model func name fix

commit 53a46288f9a72928ec902b36625080ad07ebe2a1
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 15:17:58 2024 +0100

mistral fix, error text output

commit 6aa37744fc9ca77271e33c195bf67a78dd7937a7
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 14:58:10 2024 +0100

toast fix

commit f0be03ea77c3d4ef72ec8180df87e0bfffb46198
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 14:33:28 2024 +0100

toast errors

commit 84346828128d230a39a14d4ad32d14dec9bea8b7
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 11:30:34 2024 +0100

warnings cleanup

commit 2b94af895d517e32932f7f71ffea277a63dce940
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 10:54:06 2024 +0100

Preload fix

commit 7f270d4a14032bbe05a8b22c873af0febfa78668
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 09:44:13 2024 +0100

Server startup log msg

commit f9c9b5c93369269dd5eb71d222d8918f2bec6715
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 07:50:15 2024 +0100

Update run_ui.py

commit f3ca7e0742b12a93a8d5cc6cce066d88ad56a63b
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 06:21:14 2024 +0100

Update run_ui.py

commit 21975c5a7cc7b3ad8b9ab95f940b5e6f6a743231
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 20:45:51 2024 +0100

local models docker url

commit f0a8b07c4fd2b1a5daaaf52142f194a8fdb8fcef
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 16:40:49 2024 +0100

Server addr notice

commit 656612726a3bbf01e4cd6ede01a60ce05b855c97
Merge: 49594fe 7c2866c
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 16:11:23 2024 +0100

Merge pull request #260 from 3clyp50/development

fix: toast handling, mobile breakpoint

commit 7c2866ca614fff704b820e8f1ff6c9f50006320b
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Dec 4 19:37:50 2024 +0100

fix: toast handling, mobile breakpoint

`toast.css` and `index.js`
- fixed toasts disappearing right after showing
- simplified toast animation

`index.css`
- set 2ⁿᵈ mobile breakpoint at 640px

commit 49594fe6ec2d32a1855a2ccbd9479d4fda347651
Merge: f697754 70b1fa3
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Wed Dec 4 10:39:58 2024 +0100

Merge pull request #259 from 3clyp50/development

CSS refactor and toasts

commit 70b1fa385af8d86d1d5280a5b34e1a8a9abeb3cf
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Dec 4 02:17:50 2024 +0100

refactor: css, style: toasts, fix: z-index

- organized structure
- consolidated selectors and states
- shorthand everywhere

- modern toasts
- bigger action buttons for mobile

commit f6977546c11b63e2e47dce8367cad8a6c62248fc
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 22:42:36 2024 +0100

call subordinate fix

commit fbe47ac03e56cfb005a1cd2b044c6305e27ca436
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 21:19:03 2024 +0100

Minor fixes

commit 961dbc405af8a784ecadcfcbcd7652d1f8d9be28
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 21:10:45 2024 +0100

restart

commit 357909c16a66c0e7ec78a2a993a9a4e54dd67bf9
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 19:41:29 2024 +0100

whisper remote preload

commit e0b0b6f6367841c85dfa9c2156f46db755c88497
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 17:39:56 2024 +0100

nudge

commit 9fae02b2a55bb1760fae926c2b40cd07ee26a61c
Merge: 0ebc142 fedf2d4
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:57:18 2024 +0100

Merge pull request #256 from 3clyp50/development

feature: copy text button, nudge & fix: various styles

commit 0ebc142124fa3dcb370d95fd2a84bdba8f3145e8
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:56:33 2024 +0100

ssh connection retry

commit deae13d3834c7031a18cd30d5ee593f804b1b09a
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:38:57 2024 +0100

root pass fix

commit 9109fcbf60a8c9cc975c9e21306c619d81a2b43c
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:28:53 2024 +0100

root password change fix

commit 46689d6477d51966b9876b7d51b180e871569ebb
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:22:18 2024 +0100

RFC & SSH exchange for development

commit fedf2d4bdc6357f9e50e76b9202e06081c66db5e
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Tue Dec 3 04:03:14 2024 +0100

feature: copy text button, nudge & fix: various styles

- Copy button for all messages
- Nudge button front-end
- Fixed various non-styled light mode elements

to do -> css cleanup and whisper loading

commit 19f50d6d9509acdaea2a5ccd846b5de2722b4a07
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 1 20:50:17 2024 +0100

attachments, files, prompt extras, prompt caching, refactors, cleanups

commit c99b1a47d4f25d8184661a77418ebfafa5c00ee9
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Nov 29 08:55:27 2024 +0100

Alpine fix version, STT fixes

commit 81e653ba2d710ad31e43d658738cf6a843461792
Merge: 857f8b6 89b8483
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 23:08:09 2024 +0100

Merge pull request #255 from 3clyp50/development

feature: speech to text settings

commit 857f8b6d82ec6707f45c67fa7e2a360e535071b0
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 23:05:17 2024 +0100

download and remove folders in browser

commit 89b848312b6f553fe22a6c0039bc7ab93b716384
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Thu Nov 28 16:07:50 2024 +0100

feature: speech to text settings

- initial commit: voice settings

- Settings section for STT

commit b3a27bb442668e4a21e79be4ab96c73a09f2b864
Merge: 5e8d6b1 bb980ea
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 08:39:01 2024 +0100

Merge pull request #254 from 3clyp50/development

fix: file browser bugs + final ui polishing

commit bb980ea6b93a074b24cf86c54b0be69596b34cb1
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Thu Nov 28 01:13:56 2024 +0100

fix: file browser deletion bug + parent directory

Underscore matters!
- fixed both bugs for the browser

Extra:
- style for toasts

quickfix generic modals

commit f0126a6ef87c43aa34e6fbc7595d89f10f6c3b27
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 23:44:20 2024 +0100

style: polishing and consistency

commit 5e8d6b1c7d3ec965eac864b2bb72c85360bae8c2
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:16:13 2024 +0100

Minor fixes

commit 184f8dcf53ec49733d20967246374f08469d7e84
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:05:23 2024 +0100

Pause button fix

commit 969f142af12c01abd9009a8e35e0cbbd225bca8d
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:01:06 2024 +0100

RFC fix, history bugfixes

commit 733b8de5163b3fc36c68df099a1860af210e6a1d
Merge: f2057d3 6a83e79
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 20:57:15 2024 +0100

Merge branch 'pr/253' into development

commit 6a83e79d5a42fb44bfcac88c5ade3fda85ba2b28
Author: Alessandro <real.eclypso@gmail.com>
Date: Wed Nov 27 20:41:53 2024 +0100

fix: bigger modals

commit f2057d390178a760b7a857f918a8bb4dee586194
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 17:30:19 2024 +0100

Squashed commit of the following:

commit e626817332661f48ec97da1d4ab42479ca40b50f
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 12:51:22 2024 +0100

refactor: modals css

Modals now get the base styles from modals.css, with any specific overrides in the individual files (settings.css, file_manager.css, etc.).

commit 306db0ca395a9f5691e558c7a18a02c9cecabaa3
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 03:17:20 2024 +0100

style: new action buttons + ghost buttons

Updated styles for but

.vscode/settings.json CHANGED
@@ -1,3 +1,5 @@
 {
     "python.analysis.typeCheckingMode": "standard",
+    "windsurfPyright.analysis.diagnosticMode": "workspace",
+    "windsurfPyright.analysis.typeCheckingMode": "standard",
 }
agent.py CHANGED
@@ -2,8 +2,12 @@ import asyncio
 from collections import OrderedDict
 from dataclasses import dataclass, field
 import time, importlib, inspect, os, json
-from typing import Any, Optional, Dict, TypedDict
+import token
+from typing import Any, Awaitable, Optional, Dict, TypedDict
 import uuid
+import models
+
+from langchain_core.prompt_values import ChatPromptValue
 from python.helpers import extract_tools, rate_limiter, files, errors, history, tokens
 from python.helpers.print_style import PrintStyle
 from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
@@ -131,19 +135,25 @@ class AgentContext:
         agent.handle_critical_exception(e)


+@dataclass
+class ModelConfig:
+    provider: models.ModelProvider
+    name: str
+    ctx_length: int
+    limit_requests: int
+    limit_input: int
+    limit_output: int
+    kwargs: dict
+
+
 @dataclass
 class AgentConfig:
-    chat_model: BaseChatModel | BaseLLM
-    utility_model: BaseChatModel | BaseLLM
-    embeddings_model: Embeddings
+    chat_model: ModelConfig
+    utility_model: ModelConfig
+    embeddings_model: ModelConfig
     prompts_subdir: str = ""
     memory_subdir: str = ""
     knowledge_subdirs: list[str] = field(default_factory=lambda: ["default", "custom"])
-    rate_limit_seconds: int = 60
-    rate_limit_requests: int = 15
-    rate_limit_input_tokens: int = 0
-    rate_limit_output_tokens: int = 0
-    response_timeout_seconds: int = 60
     code_exec_docker_enabled: bool = False
     code_exec_docker_name: str = "A0-dev"
     code_exec_docker_image: str = "frdel/agent-zero-run:development"
@@ -222,13 +232,6 @@ class Agent:
         self.history = history.History(self)
         self.last_user_message: history.Message | None = None
         self.intervention: UserMessage | None = None
-        self.rate_limiter = rate_limiter.RateLimiter(
-            self.context.log,
-            max_calls=self.config.rate_limit_requests,
-            max_input_tokens=self.config.rate_limit_input_tokens,
-            max_output_tokens=self.config.rate_limit_output_tokens,
-            window_seconds=self.config.rate_limit_seconds,
-        )
         self.data = {}  # free data object all the tools can use

     async def monologue(self):
@@ -245,20 +248,11 @@ class Agent:
         while True:

             self.context.streaming_agent = self  # mark self as current streamer
-            agent_response = ""
             self.loop_data.iteration += 1

             try:
                 # prepare LLM chain (model, system, history)
-                chain, prompt = await self.prepare_chain(
-                    loop_data=self.loop_data
-                )
-
-                # rate limiter TODO - move to extension, make per-model
-                formatted_inputs = prompt.format()
-                self.set_data(self.DATA_NAME_CTX_WINDOW, formatted_inputs)
-                token_count = tokens.approximate_tokens(formatted_inputs)
-                self.rate_limiter.limit_call_and_input(token_count)
+                prompt = await self.prepare_prompt(loop_data=self.loop_data)

                 # output that the agent is starting
                 PrintStyle(
@@ -271,27 +265,18 @@ class Agent:
                     type="agent", heading=f"{self.agent_name}: Generating"
                 )

-                async for chunk in chain.astream({}):
-                    # wait for intervention and handle it, if paused
-                    await self.handle_intervention(agent_response)
-
-                    if isinstance(chunk, str):
-                        content = chunk
-                    elif hasattr(chunk, "content"):
-                        content = str(chunk.content)
-                    else:
-                        content = str(chunk)
-
-                    if content:
-                        # output the agent response stream
-                        printer.stream(content)
-                        # concatenate stream into the response
-                        agent_response += content
-                        self.log_from_stream(agent_response, log)
-
-                self.rate_limiter.set_output_tokens(
-                    int(len(agent_response) / 4)
-                )  # rough estimation
+                async def stream_callback(chunk: str, full: str):
+                    # output the agent response stream
+                    if chunk:
+                        printer.stream(chunk)
+                        self.log_from_stream(full, log)
+
+                # store as last context window content
+                self.set_data(Agent.DATA_NAME_CTX_WINDOW, prompt.format())
+
+                agent_response = await self.call_chat_model(
+                    prompt, callback=stream_callback
+                )

                 await self.handle_intervention(agent_response)
@@ -319,14 +304,14 @@ class Agent:
             # exceptions inside message loop:
             except InterventionException as e:
                 pass  # intervention message has been handled in handle_intervention(), proceed with conversation loop
-            except (
-                RepairableException
-            ) as e:  # Forward repairable errors to the LLM, maybe it can fix them
+            except RepairableException as e:
+                # Forward repairable errors to the LLM, maybe it can fix them
                 error_message = errors.format_error(e)
                 await self.hist_add_warning(error_message)
                 PrintStyle(font_color="red", padding=True).print(error_message)
                 self.context.log.log(type="error", content=error_message)
-            except Exception as e:  # Other exception kill the loop
+            except Exception as e:
+                # Other exception kill the loop
                 self.handle_critical_exception(e)

             finally:
@@ -345,7 +330,7 @@ class Agent:
         # call monologue_end extensions
         await self.call_extensions("monologue_end", loop_data=self.loop_data)  # type: ignore

-    async def prepare_chain(self, loop_data: LoopData):
+    async def prepare_prompt(self, loop_data: LoopData) -> ChatPromptTemplate:
         # set system prompt and message history
         loop_data.system = await self.get_system_prompt(self.loop_data)
         loop_data.history_output = self.history.output()
@@ -374,10 +359,7 @@ class Agent:
                 *history_langchain,
             ]
         )
-
-        # return callable chain
-        chain = prompt | self.config.chat_model
-        return chain, prompt
+        return prompt

     def handle_critical_exception(self, exception: Exception):
         if isinstance(exception, HandledException):
@@ -498,39 +480,106 @@ class Agent:
     ):  # TODO add param for message range, topic, history
         return self.history.output_text(human_label="user", ai_label="assistant")

-    async def call_utility_llm(
-        self, system: str, msg: str, callback: Callable[[str], None] | None = None
-    ):
-        prompt = ChatPromptTemplate.from_messages(
-            [SystemMessage(content=system), HumanMessage(content=msg)]
-        )
-
-        chain = prompt | self.config.utility_model
-        response = ""
-
-        formatted_inputs = prompt.format()
-        token_count = tokens.approximate_tokens(formatted_inputs)
-        self.rate_limiter.limit_call_and_input(token_count)
-
-        async for chunk in chain.astream({}):
-            await self.handle_intervention()  # wait for intervention and handle it, if paused
-
-            if isinstance(chunk, str):
-                content = chunk
-            elif hasattr(chunk, "content"):
-                content = str(chunk.content)
-            else:
-                content = str(chunk)
-
-            if callback:
-                callback(content)
-
-            response += content
-
-        self.rate_limiter.set_output_tokens(int(len(response) / 4))
-
-        return response
+    async def call_utility_model(
+        self,
+        system: str,
+        message: str,
+        callback: Callable[[str], Awaitable[None]] | None = None,
+        background: bool = False,
+    ):
+        prompt = ChatPromptTemplate.from_messages(
+            [SystemMessage(content=system), HumanMessage(content=message)]
+        )
+
+        response = ""
+
+        # model class
+        model = models.get_model(
+            models.ModelType.CHAT,
+            self.config.utility_model.provider,
+            self.config.utility_model.name,
+            **self.config.utility_model.kwargs,
+        )
+
+        # rate limiter
+        limiter = await self.rate_limiter(
+            self.config.utility_model, prompt.format(), background
+        )
+
+        async for chunk in (prompt | model).astream({}):
+            await self.handle_intervention()  # wait for intervention and handle it, if paused
+
+            content = models.parse_chunk(chunk)
+            limiter.add(output=tokens.approximate_tokens(content))
+            response += content
+
+            if callback:
+                await callback(content)
+
+        return response
+
+    async def call_chat_model(
+        self,
+        prompt: ChatPromptTemplate,
+        callback: Callable[[str, str], Awaitable[None]] | None = None,
+    ):
+        response = ""
+
+        # model class
+        model = models.get_model(
+            models.ModelType.CHAT,
+            self.config.chat_model.provider,
+            self.config.chat_model.name,
+            **self.config.chat_model.kwargs,
+        )
+
+        # rate limiter
+        limiter = await self.rate_limiter(self.config.chat_model, prompt.format())
+
+        async for chunk in (prompt | model).astream({}):
+            await self.handle_intervention()  # wait for intervention and handle it, if paused
+
+            content = models.parse_chunk(chunk)
+            limiter.add(output=tokens.approximate_tokens(content))
+            response += content
+
+            if callback:
+                await callback(content, response)
+
+        return response
+
+    async def rate_limiter(
+        self, model_config: ModelConfig, input: str, background: bool = False
+    ):
+        # rate limiter log
+        wait_log = None
+
+        async def wait_callback(msg: str, key: str, total: int, limit: int):
+            nonlocal wait_log
+            if not wait_log:
+                wait_log = self.context.log.log(
+                    type="util",
+                    update_progress="none",
+                    heading=msg,
+                    model=f"{model_config.provider.value}\\{model_config.name}",
+                )
+            wait_log.update(heading=msg, key=key, value=total, limit=limit)
+            if not background:
+                self.context.log.set_progress(msg, -1)
+
+        # rate limiter
+        limiter = models.get_rate_limiter(
+            model_config.provider,
+            model_config.name,
+            model_config.limit_requests,
+            model_config.limit_input,
+            model_config.limit_output,
+        )
+        limiter.add(input=tokens.approximate_tokens(input))
+        limiter.add(requests=1)
+        await limiter.wait(callback=wait_callback)
+        return limiter
+
     async def handle_intervention(self, progress: str = ""):
         while self.context.paused:
             await asyncio.sleep(0.1)  # wait if paused
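
Note: the old call_utility_llm helper is replaced by call_utility_model (and call_chat_model for the main conversation); both resolve the model just-in-time from its ModelConfig and go through the per-model rate limiter. A minimal sketch of how an extension might call the new wrapper, mirroring the extension diffs further down — the function name summarize_user_message is hypothetical, and an Extension-like object with self.agent is assumed:

    # sketch only: assumes `self.agent` is an Agent, as in the extensions below
    async def summarize_user_message(self, system: str, user_text: str) -> str:
        # callbacks are awaited now, so they must be coroutine functions
        async def log_callback(content: str):
            print(content, end="")  # stand-in for log_item.stream(...)

        # the utility model is picked from AgentConfig at call time
        return await self.agent.call_utility_model(
            system=system,
            message=user_text,
            callback=log_callback,
            background=True,  # background calls don't drive the progress bar
        )
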
initialize.py CHANGED
@@ -1,6 +1,6 @@
 import asyncio
 import models
-from agent import AgentConfig
+from agent import AgentConfig, ModelConfig
 from python.helpers import dotenv, files, rfc_exchange, runtime, settings, docker, log


@@ -8,36 +8,45 @@ def initialize():

     current_settings = settings.get_settings()

-    # main chat model used by agents (smarter, more accurate)
-    # chat_llm = models.get_openai_chat(model_name="gpt-4o-mini", temperature=0)
-    # chat_llm = models.get_ollama_chat(model_name="llama3.2:3b-instruct-fp16", temperature=0)
-    # chat_llm = models.get_lmstudio_chat(model_name="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF", temperature=0)
-    # chat_llm = models.get_openrouter_chat(model_name="openai/o1-mini-2024-09-12")
-    # chat_llm = models.get_azure_openai_chat(deployment_name="gpt-4o-mini", temperature=0)
-    # chat_llm = models.get_anthropic_chat(model_name="claude-3-5-sonnet-20240620", temperature=0)
-    # chat_llm = models.get_google_chat(model_name="gemini-1.5-flash", temperature=0)
-    # chat_llm = models.get_mistral_chat(model_name="mistral-small-latest", temperature=0)
-    # chat_llm = models.get_groq_chat(model_name="llama-3.2-90b-text-preview", temperature=0)
-    # chat_llm = models.get_sambanova_chat(model_name="Meta-Llama-3.1-70B-Instruct-8k", temperature=0)
-    chat_llm = settings.get_chat_model(
-        current_settings
-    )  # chat model from user settings
-
-    # utility model used for helper functions (cheaper, faster)
-    # utility_llm = chat_llm
-    utility_llm = settings.get_utility_model(
-        current_settings
-    )  # utility model from user settings
-
-    # embedding model used for memory
-    # embedding_llm = models.get_openai_embedding(model_name="text-embedding-3-small")
-    # embedding_llm = models.get_ollama_embedding(model_name="nomic-embed-text")
-    # embedding_llm = models.get_huggingface_embedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
-    # embedding_llm = models.get_lmstudio_embedding(model_name="nomic-ai/nomic-embed-text-v1.5-GGUF")
-    embedding_llm = settings.get_embedding_model(
-        current_settings
-    )  # embedding model from user settings
+    # chat model from user settings
+    chat_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["chat_model_provider"]],
+        name=current_settings["chat_model_name"],
+        ctx_length=current_settings["chat_model_ctx_length"],
+        limit_requests=current_settings["chat_model_rl_requests"],
+        limit_input=current_settings["chat_model_rl_input"],
+        limit_output=current_settings["chat_model_rl_output"],
+        kwargs={
+            "temperature": current_settings["chat_model_temperature"],
+            **current_settings["chat_model_kwargs"],
+        },
+    )
+
+    # utility model from user settings
+    utility_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["util_model_provider"]],
+        name=current_settings["util_model_name"],
+        ctx_length=current_settings["util_model_ctx_length"],
+        limit_requests=current_settings["util_model_rl_requests"],
+        limit_input=current_settings["util_model_rl_input"],
+        limit_output=current_settings["util_model_rl_output"],
+        kwargs={
+            "temperature": current_settings["util_model_temperature"],
+            **current_settings["util_model_kwargs"],
+        },
+    )
+    # embedding model from user settings
+    embedding_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["embed_model_provider"]],
+        name=current_settings["embed_model_name"],
+        ctx_length=0,
+        limit_requests=current_settings["embed_model_rl_requests"],
+        limit_input=0,
+        limit_output=0,
+        kwargs={
+            **current_settings["embed_model_kwargs"],
+        },
+    )

     # agent configuration
     config = AgentConfig(
         chat_model=chat_llm,
@@ -46,12 +55,7 @@ def initialize():
         prompts_subdir=current_settings["agent_prompts_subdir"],
         memory_subdir=current_settings["agent_memory_subdir"],
         knowledge_subdirs=["default", current_settings["agent_knowledge_subdir"]],
-        # rate_limit_seconds = 60,
-        rate_limit_requests=30,
-        # rate_limit_input_tokens = 0,
-        # rate_limit_output_tokens = 0,
-        # response_timeout_seconds = 60,
-        code_exec_docker_enabled = False,
+        code_exec_docker_enabled=False,
         # code_exec_docker_name = "A0-dev",
         # code_exec_docker_image = "frdel/agent-zero-run:development",
         # code_exec_docker_ports = { "22/tcp": 55022, "80/tcp": 55080 }
models.py CHANGED
@@ -1,5 +1,6 @@
 from enum import Enum
 import os
+from typing import Any
 from langchain_openai import (
     ChatOpenAI,
     OpenAI,
@@ -28,6 +29,7 @@ from langchain_mistralai import ChatMistralAI
 from pydantic.v1.types import SecretStr
 from python.helpers import dotenv, runtime
 from python.helpers.dotenv import load_dotenv
+from python.helpers.rate_limiter import RateLimiter

 # environment variables
 load_dotenv()
@@ -56,6 +58,9 @@ class ModelProvider(Enum):
     OTHER = "Other"


+rate_limiters: dict[str, RateLimiter] = {}
+
+
 # Utility function to get API keys from environment variables
 def get_api_key(service):
     return (
@@ -71,11 +76,36 @@ def get_model(type: ModelType, provider: ModelProvider, name: str, **kwargs):
     return model


+def get_rate_limiter(
+    provider: ModelProvider, name: str, requests: int, input: int, output: int
+) -> RateLimiter:
+    # get or create
+    key = f"{provider.name}\\{name}"
+    rate_limiters[key] = limiter = rate_limiters.get(key, RateLimiter(seconds=60))
+    # always update
+    limiter.limits["requests"] = requests or 0
+    limiter.limits["input"] = input or 0
+    limiter.limits["output"] = output or 0
+    return limiter
+
+
+def parse_chunk(chunk: Any):
+    if isinstance(chunk, str):
+        content = chunk
+    elif hasattr(chunk, "content"):
+        content = str(chunk.content)
+    else:
+        content = str(chunk)
+    return content


 # Ollama models
 def get_ollama_base_url():
-    return dotenv.get_dotenv_value("OLLAMA_BASE_URL") or f"http://{runtime.get_local_url()}:11434"
+    return (
+        dotenv.get_dotenv_value("OLLAMA_BASE_URL")
+        or f"http://{runtime.get_local_url()}:11434"
+    )
+

 def get_ollama_chat(
     model_name: str,
@@ -138,7 +168,11 @@ def get_huggingface_embedding(model_name: str, **kwargs):

 # LM Studio and other OpenAI compatible interfaces
 def get_lmstudio_base_url():
-    return dotenv.get_dotenv_value("LM_STUDIO_BASE_URL") or f"http://{runtime.get_local_url()}:1234/v1"
+    return (
+        dotenv.get_dotenv_value("LM_STUDIO_BASE_URL")
+        or f"http://{runtime.get_local_url()}:1234/v1"
+    )
+

 def get_lmstudio_chat(
     model_name: str,
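
Note: get_rate_limiter keeps one RateLimiter per provider\name key and refreshes its limits on every call, while parse_chunk normalizes LangChain stream chunks to plain text. A rough sketch of how the two combine, following the agent wrappers above — the helper name limited_stream and the limit values are illustrative, not repository defaults:

    import models
    from python.helpers import tokens

    async def limited_stream(prompt, provider, name, requests=30, input_limit=0, output_limit=0):
        # one shared limiter per provider\name key; limits are updated on each call
        limiter = models.get_rate_limiter(provider, name, requests, input_limit, output_limit)
        limiter.add(input=tokens.approximate_tokens(prompt.format()), requests=1)
        await limiter.wait()  # blocks while the 60-second window is over any limit

        model = models.get_model(models.ModelType.CHAT, provider, name)
        response = ""
        async for chunk in (prompt | model).astream({}):
            content = models.parse_chunk(chunk)
            limiter.add(output=tokens.approximate_tokens(content))
            response += content
        return response
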
python/extensions/message_loop_prompts/_50_recall_memories.py CHANGED
@@ -53,13 +53,13 @@ class RecallMemories(Extension):
         )

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(query=content)

         # call util llm to summarize conversation
-        query = await self.agent.call_utility_llm(
+        query = await self.agent.call_utility_model(
             system=system,
-            msg=loop_data.user_message.output_text() if loop_data.user_message else "",
+            message=loop_data.user_message.output_text() if loop_data.user_message else "",
             callback=log_callback,
         )
python/extensions/message_loop_prompts/_51_recall_solutions.py CHANGED
@@ -53,12 +53,12 @@ class RecallSolutions(Extension):
         )

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(query=content)

         # call util llm to summarize conversation
-        query = await self.agent.call_utility_llm(
-            system=system, msg=loop_data.user_message.output_text() if loop_data.user_message else "", callback=log_callback
+        query = await self.agent.call_utility_model(
+            system=system, message=loop_data.user_message.output_text() if loop_data.user_message else "", callback=log_callback
         )

         # get solutions database
python/extensions/message_loop_prompts/_91_recall_wait.py CHANGED
@@ -9,11 +9,11 @@ class RecallWait(Extension):

         task = self.agent.get_data(DATA_NAME_TASK_MEMORIES)
         if task and not task.done():
-            self.agent.context.log.set_progress("Recalling memories...")
+            # self.agent.context.log.set_progress("Recalling memories...")
             await task

         task = self.agent.get_data(DATA_NAME_TASK_SOLUTIONS)
         if task and not task.done():
-            self.agent.context.log.set_progress("Recalling solutions...")
+            # self.agent.context.log.set_progress("Recalling solutions...")
             await task
python/extensions/monologue_end/_50_memorize_fragments.py CHANGED
@@ -35,14 +35,15 @@ class MemorizeMemories(Extension):
         msgs_text = self.agent.concat_messages(self.agent.history)

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(content=content)

         # call util llm to find info in history
-        memories_json = await self.agent.call_utility_llm(
+        memories_json = await self.agent.call_utility_model(
             system=system,
-            msg=msgs_text,
+            message=msgs_text,
             callback=log_callback,
+            background=True,
         )

         memories = DirtyJson.parse_string(memories_json)
@@ -76,7 +77,7 @@ class MemorizeMemories(Extension):
             log_item.update(replaced=rem_txt)

         # insert new solution
-        db.insert_text(text=txt, metadata={"area": Memory.Area.FRAGMENTS.value})
+        await db.insert_text(text=txt, metadata={"area": Memory.Area.FRAGMENTS.value})

         log_item.update(
             result=f"{len(memories)} entries memorized.",
python/extensions/monologue_end/_51_memorize_solutions.py CHANGED
@@ -33,14 +33,15 @@ class MemorizeSolutions(Extension):
         msgs_text = self.agent.concat_messages(self.agent.history)

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(content=content)

         # call util llm to find solutions in history
-        solutions_json = await self.agent.call_utility_llm(
+        solutions_json = await self.agent.call_utility_model(
             system=system,
-            msg=msgs_text,
+            message=msgs_text,
             callback=log_callback,
+            background=True,
         )

         solutions = DirtyJson.parse_string(solutions_json)
@@ -75,7 +76,7 @@ class MemorizeSolutions(Extension):
             log_item.update(replaced=rem_txt)

         # insert new solution
-        db.insert_text(text=txt, metadata={"area": Memory.Area.SOLUTIONS.value})
+        await db.insert_text(text=txt, metadata={"area": Memory.Area.SOLUTIONS.value})

         solutions_txt = solutions_txt.strip()
         log_item.update(solutions=solutions_txt)
python/helpers/history.py CHANGED
@@ -130,12 +130,12 @@ class Topic(Record):
             * LARGE_MESSAGE_TO_TOPIC_RATIO
         )
         large_msgs = []
-        for m in self.messages:
+        for m in (m for m in self.messages if not m.summary):
             out = m.output()
             text = output_text(out)
             tok = tokens.approximate_tokens(text)
             leng = len(text)
-            if leng > msg_max_size:
+            if tok > msg_max_size:
                 large_msgs.append((m, tok, leng, out))
         large_msgs.sort(key=lambda x: x[1], reverse=True)
         for msg, tok, leng, out in large_msgs:
@@ -173,12 +173,11 @@ class Topic(Record):

     async def summarize_messages(self, messages: list[Message]):
         msg_txt = [m.output_text() for m in messages]
-        summary = await call_llm.call_llm(
+        summary = await self.history.agent.call_utility_model(
             system=self.history.agent.read_prompt("fw.topic_summary.sys.md"),
             message=self.history.agent.read_prompt(
                 "fw.topic_summary.msg.md", content=msg_txt
             ),
-            model=settings.get_utility_model(),
         )
         return summary

@@ -218,12 +217,11 @@ class Bulk(Record):
         return False

     async def summarize(self):
-        self.summary = await call_llm.call_llm(
+        self.summary = await self.history.agent.call_utility_model(
             system=self.history.agent.read_prompt("fw.topic_summary.sys.md"),
             message=self.history.agent.read_prompt(
                 "fw.topic_summary.msg.md", content=self.output_text()
             ),
-            model=settings.get_utility_model(),
         )
         return self.summary

@@ -309,42 +307,38 @@ class History(Record):
         return json.dumps(data)

     async def compress(self):
-        curr, hist, bulk = (
-            self.get_current_topic_tokens(),
-            self.get_topics_tokens(),
-            self.get_bulks_tokens(),
-        )
-        total = get_ctx_size_for_history()
         compressed = False
-
-        # calculate ratios of individual parts
-        ratios = [
-            (curr, CURRENT_TOPIC_RATIO, "current_topic"),
-            (hist, HISTORY_TOPIC_RATIO, "history_topic"),
-            (bulk, HISTORY_BULK_RATIO, "history_bulk"),
-        ]
-        # start from the most oversized part and compress it
-        ratios = sorted(ratios, key=lambda x: (x[0] / total) / x[1], reverse=True)
-        for ratio in ratios:
-            if ratio[0] > ratio[1] * total:
-                over_part = ratio[2]
-                if over_part == "current_topic":
-                    compressed = await self.current.compress()
-                elif over_part == "history_topic":
-                    compressed = await self.compress_topics()
-                else:
-                    compressed = await self.compress_bulks()
-                # if part was compressed, stop the loop and try the whole function again, maybe no more compression is necessary
-                if compressed:
-                    break
+        while True:
+            curr, hist, bulk = (
+                self.get_current_topic_tokens(),
+                self.get_topics_tokens(),
+                self.get_bulks_tokens(),
+            )
+            total = get_ctx_size_for_history()
+            ratios = [
+                (curr, CURRENT_TOPIC_RATIO, "current_topic"),
+                (hist, HISTORY_TOPIC_RATIO, "history_topic"),
+                (bulk, HISTORY_BULK_RATIO, "history_bulk"),
+            ]
+            ratios = sorted(ratios, key=lambda x: (x[0] / total) / x[1], reverse=True)
+            compressed_part = False
+            for ratio in ratios:
+                if ratio[0] > ratio[1] * total:
+                    over_part = ratio[2]
+                    if over_part == "current_topic":
+                        compressed_part = await self.current.compress()
+                    elif over_part == "history_topic":
+                        compressed_part = await self.compress_topics()
+                    else:
+                        compressed_part = await self.compress_bulks()
+                    if compressed_part:
+                        break
+
+            if compressed_part:
+                compressed = True
+                continue
             else:
-                break
-
-        # try the whole function again to see if there is still a need for compression
-        if compressed:
-            await self.compress()
-
-        return compressed
+                return compressed

     async def compress_topics(self) -> bool:
         # summarize topics one by one
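
Note: compress() is now iterative rather than recursive — each pass re-measures the three history parts and compresses the most oversized one first, ranked by how far each part exceeds its allotted share of the context budget. A small illustration of that ranking with made-up numbers (the real ratios and budget come from history.py, not from this sketch):

    # Illustrative values only; the real constants live in history.py.
    CURRENT_TOPIC_RATIO, HISTORY_TOPIC_RATIO, HISTORY_BULK_RATIO = 0.5, 0.3, 0.2
    total = 8000  # hypothetical token budget for history

    curr, hist, bulk = 3000, 3500, 1000  # hypothetical token counts per part
    ratios = [
        (curr, CURRENT_TOPIC_RATIO, "current_topic"),
        (hist, HISTORY_TOPIC_RATIO, "history_topic"),
        (bulk, HISTORY_BULK_RATIO, "history_bulk"),
    ]
    # sort by relative overflow: (actual share of total) / (allowed share)
    ratios.sort(key=lambda x: (x[0] / total) / x[1], reverse=True)
    print(ratios[0][2])  # -> "history_topic" (3500/8000 ≈ 0.44 vs allowed 0.3)
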
python/helpers/log.py CHANGED
@@ -19,6 +19,9 @@ Type = Literal[
     "warning",
 ]

+ProgressUpdate = Literal["persistent", "temporary", "none"]
+
+
 @dataclass
 class LogItem:
     log: "Log"
@@ -27,6 +30,7 @@ class LogItem:
     heading: str
     content: str
     temp: bool
+    update_progress: Optional[ProgressUpdate] = "persistent"
     kvps: Optional[OrderedDict] = None  # Use OrderedDict for kvps
     id: Optional[str] = None  # Add id field
     guid: str = ""
@@ -41,20 +45,27 @@ class LogItem:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         **kwargs,
     ):
         if self.guid == self.log.guid:
-            self.log.update_item(
+            self.log._update_item(
                 self.no,
                 type=type,
                 heading=heading,
                 content=content,
                 kvps=kvps,
                 temp=temp,
+                update_progress=update_progress,
                 **kwargs,
             )

-    def stream(self, heading: str | None = None, content: str | None = None, **kwargs):
+    def stream(
+        self,
+        heading: str | None = None,
+        content: str | None = None,
+        **kwargs,
+    ):
         if heading is not None:
             self.update(heading=self.heading + heading)
         if content is not None:
@@ -75,6 +86,7 @@ class LogItem:
             "kvps": self.kvps,
         }

+
 class Log:

     def __init__(self):
@@ -90,7 +102,9 @@ class Log:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         id: Optional[str] = None,  # Add id parameter
+        **kwargs,
     ) -> LogItem:
         # Use OrderedDict if kvps is provided
         if kvps is not None:
@@ -101,17 +115,19 @@ class Log:
             type=type,
             heading=heading or "",
             content=content or "",
-            kvps=kvps,
-            temp=temp or False,
+            kvps=OrderedDict({**(kvps or {}), **(kwargs or {})}),
+            update_progress=(
+                update_progress if update_progress is not None else "persistent"
+            ),
+            temp=temp if temp is not None else False,
             id=id,  # Pass id to LogItem
         )
         self.logs.append(item)
         self.updates += [item.no]
-        if heading and item.no >= self.progress_no:
-            self.set_progress(heading, item.no)
+        self._update_progress_from_item(item)
         return item

-    def update_item(
+    def _update_item(
         self,
         no: int,
         type: str | None = None,
@@ -119,15 +135,16 @@ class Log:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         **kwargs,
     ):
         item = self.logs[no]
         if type is not None:
             item.type = type
+        if update_progress is not None:
+            item.update_progress = update_progress
         if heading is not None:
             item.heading = heading
-            if no >= self.progress_no:
-                self.set_progress(heading, no)
         if content is not None:
             item.content = content
         if kvps is not None:
@@ -143,6 +160,7 @@ class Log:
                 item.kvps[k] = v

         self.updates += [item.no]
+        self._update_progress_from_item(item)

     def set_progress(self, progress: str, no: int = 0, active: bool = True):
         self.progress = progress
@@ -174,3 +192,12 @@ class Log:
         self.updates = []
         self.logs = []
         self.set_initial_progress()
+
+    def _update_progress_from_item(self, item: LogItem):
+        if item.heading and item.update_progress != "none":
+            if item.no >= self.progress_no:
+                self.set_progress(
+                    item.heading,
+                    (item.no if item.update_progress == "persistent" else -1),
+                )
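
Note: log items now carry an update_progress mode ("persistent", "temporary", or "none"), and the progress bar is driven from _update_progress_from_item instead of ad-hoc checks in log() and update_item(). A minimal usage sketch, patterned on the rate-limiter log call in agent.py above (the headings are placeholders):

    # sketch: `log` is a python.helpers.log.Log instance (e.g. context.log)
    item = log.log(
        type="util",
        heading="Waiting for rate limit...",
        update_progress="none",  # this item never drives the progress bar
    )
    item.update(heading="Rate limit cleared")  # later updates keep the item's progress mode
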
python/helpers/memory.py CHANGED
@@ -10,6 +10,8 @@ from langchain_community.docstore.in_memory import InMemoryDocstore
 from langchain_community.vectorstores.utils import (
     DistanceStrategy,
 )
+from langchain_core.embeddings import Embeddings
+
 import os, json

 import numpy as np
@@ -22,6 +24,7 @@ from python.helpers import knowledge_import
 from python.helpers.log import Log, LogItem
 from enum import Enum
 from agent import Agent
+import models


 class MyFaiss(FAISS):
@@ -54,7 +57,12 @@ class Memory:
             )
             db = Memory.initialize(
                 log_item,
-                agent.config.embeddings_model,
+                models.get_model(
+                    models.ModelType.EMBEDDING,
+                    agent.config.embeddings_model.provider,
+                    agent.config.embeddings_model.name,
+                    **agent.config.embeddings_model.kwargs,
+                ),
                 memory_subdir,
                 False,
             )
@@ -82,7 +90,7 @@ class Memory:
     @staticmethod
     def initialize(
         log_item: LogItem | None,
-        embeddings_model,
+        embeddings_model: Embeddings,
         memory_subdir: str,
         in_memory=False,
     ) -> MyFaiss:
@@ -187,7 +195,7 @@ class Memory:
                     index[file]["ids"]
                 )  # remove original version
             if index[file]["state"] == "changed":
-                index[file]["ids"] = self.insert_documents(
+                index[file]["ids"] = await self.insert_documents(
                     index[file]["documents"]
                 )  # insert new version
@@ -234,6 +242,11 @@ class Memory:
         self, query: str, limit: int, threshold: float, filter: str = ""
     ):
         comparator = Memory._get_comparator(filter) if filter else None
+
+        # rate limiter
+        await self.agent.rate_limiter(
+            model_config=self.agent.config.embeddings_model, input=query)
+
         return await self.db.asearch(
             query,
             search_type="similarity_score_threshold",
@@ -287,30 +300,28 @@ class Memory:
         self._save_db()  # persist
         return rem_docs

-    def insert_text(self, text, metadata: dict = {}):
-        id = str(uuid.uuid4())
-        if not metadata.get("area", ""):
-            metadata["area"] = Memory.Area.MAIN.value
-
-        self.db.add_documents(
-            documents=[
-                Document(
-                    text,
-                    metadata={"id": id, "timestamp": self.get_timestamp(), **metadata},
-                )
-            ],
-            ids=[id],
-        )
-        self._save_db()  # persist
-        return id
-
-    def insert_documents(self, docs: list[Document]):
+    async def insert_text(self, text, metadata: dict = {}):
+        doc = Document(text, metadata=metadata)
+        ids = await self.insert_documents([doc])
+        return ids[0]
+
+    async def insert_documents(self, docs: list[Document]):
         ids = [str(uuid.uuid4()) for _ in range(len(docs))]
         timestamp = self.get_timestamp()
+
         if ids:
             for doc, id in zip(docs, ids):
                 doc.metadata["id"] = id  # add ids to documents metadata
                 doc.metadata["timestamp"] = timestamp  # add timestamp
+                if not doc.metadata.get("area", ""):
+                    doc.metadata["area"] = Memory.Area.MAIN.value
+
+            # rate limiter
+            docs_txt = "".join(self.format_docs_plain(docs))
+            await self.agent.rate_limiter(
+                model_config=self.agent.config.embeddings_model, input=docs_txt)
+
             self.db.add_documents(documents=docs, ids=ids)
             self._save_db()  # persist
             return ids
@@ -365,8 +376,9 @@ class Memory:
     def get_memory_subdir_abs(agent: Agent) -> str:
         return files.get_abs_path("memory", agent.config.memory_subdir or "default")

+
     def get_custom_knowledge_subdir_abs(agent: Agent) -> str:
         for dir in agent.config.knowledge_subdirs:
-            if dir != "default":
+            if dir != "default":
                 return files.get_abs_path("knowledge", dir)
         raise Exception("No custom knowledge subdir set")
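
Note: insert_text and insert_documents are now coroutines — they pass the document text through the embeddings rate limiter before touching FAISS — so callers must await them, as the memorize extensions above now do. A short sketch, assuming `db` is an already-initialized Memory instance and the text/metadata values are placeholders:

    # sketch: `db` is a python.helpers.memory.Memory instance
    new_id = await db.insert_text(
        text="example memory fragment",
        metadata={"area": Memory.Area.MAIN.value},  # defaulted to MAIN if omitted
    )
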
python/helpers/persist_chat.py CHANGED
@@ -174,7 +174,7 @@ def _deserialize_log(data: dict[str, Any]) -> "Log":
         log.logs.append(
             LogItem(
                 log=log,  # restore the log reference
-                no=item_data["no"],
+                no=i,  # item_data["no"],
                 type=item_data["type"],
                 heading=item_data.get("heading", ""),
                 content=item_data.get("content", ""),
python/helpers/rate_limiter.py CHANGED
@@ -1,68 +1,56 @@
 
1
  import time
2
- from collections import deque
3
- from dataclasses import dataclass
4
- from typing import List, Tuple
5
- from .print_style import PrintStyle
6
- from .log import Log
7
 
8
- @dataclass
9
- class CallRecord:
10
- timestamp: float
11
- input_tokens: int
12
- output_tokens: int = 0 # Default to 0, will be set separately
13
 
14
  class RateLimiter:
15
- def __init__(self, logger: Log, max_calls: int, max_input_tokens: int, max_output_tokens: int, window_seconds: int = 60):
16
- self.logger = logger
17
- self.max_calls = max_calls
18
- self.max_input_tokens = max_input_tokens
19
- self.max_output_tokens = max_output_tokens
20
- self.window_seconds = window_seconds
21
- self.call_records: deque = deque()
22
 
23
- def _clean_old_records(self, current_time: float):
24
- while self.call_records and current_time - self.call_records[0].timestamp > self.window_seconds:
25
- self.call_records.popleft()
 
 
 
26
 
27
- def _get_counts(self) -> Tuple[int, int, int]:
28
- calls = len(self.call_records)
29
- input_tokens = sum(record.input_tokens for record in self.call_records)
30
- output_tokens = sum(record.output_tokens for record in self.call_records)
31
- return calls, input_tokens, output_tokens
 
32
 
33
- def _wait_if_needed(self, current_time: float, new_input_tokens: int):
 
 
 
 
 
 
 
 
 
34
  while True:
35
- self._clean_old_records(current_time)
36
- calls, input_tokens, output_tokens = self._get_counts()
37
-
38
- wait_reasons = []
39
- if self.max_calls > 0 and calls >= self.max_calls:
40
- wait_reasons.append("max calls")
41
- if self.max_input_tokens > 0 and input_tokens + new_input_tokens > self.max_input_tokens:
42
- wait_reasons.append("max input tokens")
43
- if self.max_output_tokens > 0 and output_tokens >= self.max_output_tokens:
44
- wait_reasons.append("max output tokens")
45
-
46
- if not wait_reasons:
47
- break
48
-
49
- oldest_record = self.call_records[0]
50
- wait_time = oldest_record.timestamp + self.window_seconds - current_time
51
- if wait_time > 0:
52
- PrintStyle(font_color="yellow", padding=True).print(f"Rate limit exceeded. Waiting for {wait_time:.2f} seconds due to: {', '.join(wait_reasons)}")
53
- self.logger.log("rate_limit","Rate limit exceeded",f"Rate limit exceeded. Waiting for {wait_time:.2f} seconds due to: {', '.join(wait_reasons)}")
54
- # TODO rate limit log type
55
- time.sleep(wait_time)
56
- current_time = time.time()
57
 
58
- def limit_call_and_input(self, input_token_count: int) -> CallRecord:
59
- current_time = time.time()
60
- self._wait_if_needed(current_time, input_token_count)
61
- new_record = CallRecord(current_time, input_token_count)
62
- self.call_records.append(new_record)
63
- return new_record
64
 
65
- def set_output_tokens(self, output_token_count: int):
66
- if self.call_records:
67
- self.call_records[-1].output_tokens += output_token_count
68
- return self
 
1
+ import asyncio
2
  import time
3
+ from typing import Callable, Awaitable
4

5
 
6
  class RateLimiter:
7
+ def __init__(self, seconds: int = 60, **limits: int):
8
+ self.timeframe = seconds
9
+ self.limits = {key: value if isinstance(value, (int, float)) else 0 for key, value in (limits or {}).items()}
10
+ self.values = {key: [] for key in self.limits.keys()}
11
+ self._lock = asyncio.Lock()
12
 
13
+ def add(self, **kwargs: int):
14
+ now = time.time()
15
+ for key, value in kwargs.items():
16
+ if not key in self.values:
17
+ self.values[key] = []
18
+ self.values[key].append((now, value))
19
 
20
+ async def cleanup(self):
21
+ async with self._lock:
22
+ now = time.time()
23
+ cutoff = now - self.timeframe
24
+ for key in self.values:
25
+ self.values[key] = [(t, v) for t, v in self.values[key] if t > cutoff]
26
 
27
+ async def get_total(self, key: str) -> int:
28
+ async with self._lock:
29
+ if not key in self.values:
30
+ return 0
31
+ return sum(value for _, value in self.values[key])
32
+
33
+ async def wait(
34
+ self,
35
+ callback: Callable[[str, str, int, int], Awaitable[None]] | None = None,
36
+ ):
37
  while True:
38
+ await self.cleanup()
39
+ should_wait = False
40
+
41
+ for key, limit in self.limits.items():
42
+ if limit <= 0: # Skip if no limit set
43
+ continue
44
 
45
+ total = await self.get_total(key)
46
+ if total > limit:
47
+ if callback:
48
+ msg = f"Rate limit exceeded for {key} ({total}/{limit}), waiting..."
49
+ await callback(msg, key, total, limit)
50
+ should_wait = True
51
+ break
52
+
53
+ if not should_wait:
54
+ break
55
 
56
+ await asyncio.sleep(1)
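For readers skimming the refactor above: the limiter is no longer tied to a fixed CallRecord; it tracks arbitrary named counters and sleeps in one-second steps while any of them exceeds its per-minute limit. A minimal usage sketch follows, assuming the interface exactly as shown in this diff; the module path, the counter names "requests"/"input"/"output" and the print callback are illustrative assumptions, not taken from the repository.

    import asyncio
    from python.helpers.rate_limiter import RateLimiter  # path assumed from the file name above

    async def demo():
        # 0 disables a counter, matching the "if limit <= 0: continue" check in wait()
        limiter = RateLimiter(seconds=60, requests=60, input=1_000_000, output=0)

        async def report(msg: str, key: str, total: int, limit: int):
            print(msg)  # a real caller might forward this to PrintStyle/Log instead

        limiter.add(requests=1, input=1234)  # record the pending call and its input tokens
        await limiter.wait(callback=report)  # sleeps 1s at a time while a limit is exceeded
        # ... perform the model call, then record the produced tokens:
        limiter.add(output=567)

    asyncio.run(demo())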
python/helpers/settings.py CHANGED
@@ -8,10 +8,6 @@ from typing import Any, Literal, TypedDict
8
  import models
9
  from python.helpers import runtime, whisper, defer
10
  from . import files, dotenv
11
- from models import get_model, ModelProvider, ModelType
12
- from langchain_core.language_models.chat_models import BaseChatModel
13
- from langchain_core.embeddings import Embeddings
14
-
15
 
16
  class Settings(TypedDict):
17
  chat_model_provider: str
@@ -20,15 +16,26 @@ class Settings(TypedDict):
20
  chat_model_kwargs: dict[str, str]
21
  chat_model_ctx_length: int
22
  chat_model_ctx_history: float
23
 
24
  util_model_provider: str
25
  util_model_name: str
26
  util_model_temperature: float
27
  util_model_kwargs: dict[str, str]
28

29
  embed_model_provider: str
30
  embed_model_name: str
31
  embed_model_kwargs: dict[str, str]
32
 
33
  agent_prompts_subdir: str
34
  agent_memory_subdir: str
@@ -66,7 +73,7 @@ class SettingsField(TypedDict, total=False):
66
  id: str
67
  title: str
68
  description: str
69
- type: Literal["input", "select", "range", "textarea", "password"]
70
  value: Any
71
  min: float
72
  max: float
@@ -91,6 +98,8 @@ _settings: Settings | None = None
91
 
92
 
93
  def convert_out(settings: Settings) -> SettingsOutput:
94
 
95
  # main model section
96
  chat_model_fields: list[SettingsField] = []
@@ -109,7 +118,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
109
  "id": "chat_model_name",
110
  "title": "Chat model name",
111
  "description": "Exact name of model from selected provider",
112
- "type": "input",
113
  "value": settings["chat_model_name"],
114
  }
115
  )
@@ -132,7 +141,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
132
  "id": "chat_model_ctx_length",
133
  "title": "Chat model context length",
134
  "description": "Maximum number of tokens in the context window for LLM. System prompt, chat history, RAG and response all count towards this limit.",
135
- "type": "input",
136
  "value": settings["chat_model_ctx_length"],
137
  }
138
  )
@@ -150,6 +159,36 @@ def convert_out(settings: Settings) -> SettingsOutput:
150
  }
151
  )
152

153
  chat_model_fields.append(
154
  {
155
  "id": "chat_model_kwargs",
@@ -183,7 +222,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
183
  "id": "util_model_name",
184
  "title": "Utility model name",
185
  "description": "Exact name of model from selected provider",
186
- "type": "input",
187
  "value": settings["util_model_name"],
188
  }
189
  )
@@ -200,6 +239,58 @@ def convert_out(settings: Settings) -> SettingsOutput:
200
  "value": settings["util_model_temperature"],
201
  }
202
  )
203
 
204
  util_model_fields.append(
205
  {
@@ -234,46 +325,28 @@ def convert_out(settings: Settings) -> SettingsOutput:
234
  "id": "embed_model_name",
235
  "title": "Embedding model name",
236
  "description": "Exact name of model from selected provider",
237
- "type": "input",
238
  "value": settings["embed_model_name"],
239
  }
240
  )
241
-
242
  embed_model_fields.append(
243
  {
244
- "id": "embed_model_kwargs",
245
- "title": "Embedding model additional parameters",
246
- "description": "Any other parameters supported by the model. Format is KEY=VALUE on individual lines, just like .env file.",
247
- "type": "textarea",
248
- "value": _dict_to_env(settings["embed_model_kwargs"]),
249
  }
250
  )
251
 
252
- embed_model_section: SettingsSection = {
253
- "title": "Embedding Model",
254
- "description": "Settings for the embedding model used by Agent Zero.",
255
- "fields": embed_model_fields,
256
- }
257
-
258
- # embedding model section
259
- embed_model_fields: list[SettingsField] = []
260
- embed_model_fields.append(
261
- {
262
- "id": "embed_model_provider",
263
- "title": "Embedding model provider",
264
- "description": "Select provider for embedding model used by the framework",
265
- "type": "select",
266
- "value": settings["embed_model_provider"],
267
- "options": [{"value": p.name, "label": p.value} for p in ModelProvider],
268
- }
269
- )
270
  embed_model_fields.append(
271
  {
272
- "id": "embed_model_name",
273
- "title": "Embedding model name",
274
- "description": "Exact name of model from selected provider",
275
- "type": "input",
276
- "value": settings["embed_model_name"],
277
  }
278
  )
279
 
@@ -301,7 +374,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
301
  "id": "auth_login",
302
  "title": "UI Login",
303
  "description": "Set user name for web UI",
304
- "type": "input",
305
  "value": dotenv.get_dotenv_value(dotenv.KEY_AUTH_LOGIN) or "",
306
  }
307
  )
@@ -423,7 +496,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
423
  # "id": "rfc_auto_docker",
424
  # "title": "RFC Auto Docker Management",
425
  # "description": "Automatically create dockerized instance of A0 for RFCs using this instance's code base and, settings and .env.",
426
- # "type": "input",
427
  # "value": settings["rfc_auto_docker"],
428
  # }
429
  # )
@@ -433,7 +506,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
433
  "id": "rfc_url",
434
  "title": "RFC Destination URL",
435
  "description": "URL of dockerized A0 instance for remote function calls. Do not specify port here.",
436
- "type": "input",
437
  "value": settings["rfc_url"],
438
  }
439
  )
@@ -458,7 +531,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
458
  "id": "rfc_port_http",
459
  "title": "RFC HTTP port",
460
  "description": "HTTP port for dockerized instance of A0.",
461
- "type": "input",
462
  "value": settings["rfc_port_http"],
463
  }
464
  )
@@ -468,7 +541,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
468
  "id": "rfc_port_ssh",
469
  "title": "RFC SSH port",
470
  "description": "SSH port for dockerized instance of A0.",
471
- "type": "input",
472
  "value": settings["rfc_port_ssh"],
473
  }
474
  )
@@ -505,7 +578,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
505
  "id": "stt_language",
506
  "title": "Language Code",
507
  "description": "Language code (e.g. en, fr, it)",
508
- "type": "input",
509
  "value": settings["stt_language"],
510
  }
511
  )
@@ -528,7 +601,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
528
  "id": "stt_silence_duration",
529
  "title": "Silence duration (ms)",
530
  "description": "Duration of silence before the server considers speaking to have ended.",
531
- "type": "input",
532
  "value": settings["stt_silence_duration"],
533
  }
534
  )
@@ -538,7 +611,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
538
  "id": "stt_waiting_timeout",
539
  "title": "Waiting timeout (ms)",
540
  "description": "Duration before the server closes the microphone.",
541
- "type": "input",
542
  "value": settings["stt_waiting_timeout"],
543
  }
544
  )
@@ -617,43 +690,43 @@ def normalize_settings(settings: Settings) -> Settings:
617
  try:
618
  copy[key] = type(value)(copy[key]) # type: ignore
619
  except (ValueError, TypeError):
620
- pass
621
  return copy
622
 
623
 
624
- def get_chat_model(settings: Settings | None = None) -> BaseChatModel:
625
- if not settings:
626
- settings = get_settings()
627
- return get_model(
628
- type=ModelType.CHAT,
629
- provider=ModelProvider[settings["chat_model_provider"]],
630
- name=settings["chat_model_name"],
631
- temperature=settings["chat_model_temperature"],
632
- **settings["chat_model_kwargs"],
633
- )
634
-
635
-
636
- def get_utility_model(settings: Settings | None = None) -> BaseChatModel:
637
- if not settings:
638
- settings = get_settings()
639
- return get_model(
640
- type=ModelType.CHAT,
641
- provider=ModelProvider[settings["util_model_provider"]],
642
- name=settings["util_model_name"],
643
- temperature=settings["util_model_temperature"],
644
- **settings["util_model_kwargs"],
645
- )
646
-
647
-
648
- def get_embedding_model(settings: Settings | None = None) -> Embeddings:
649
- if not settings:
650
- settings = get_settings()
651
- return get_model(
652
- type=ModelType.EMBEDDING,
653
- provider=ModelProvider[settings["embed_model_provider"]],
654
- name=settings["embed_model_name"],
655
- **settings["embed_model_kwargs"],
656
- )
657
 
658
 
659
  def _read_settings_file() -> Settings | None:
@@ -697,20 +770,32 @@ def _write_sensitive_settings(settings: Settings):
697
 
698
 
699
  def get_default_settings() -> Settings:
700
  return Settings(
701
  chat_model_provider=ModelProvider.OPENAI.name,
702
  chat_model_name="gpt-4o-mini",
703
- chat_model_temperature=0,
704
  chat_model_kwargs={},
705
- chat_model_ctx_length=8192,
706
  chat_model_ctx_history=0.7,
707
  util_model_provider=ModelProvider.OPENAI.name,
708
  util_model_name="gpt-4o-mini",
709
- util_model_temperature=0,
710
  util_model_kwargs={},
711
  embed_model_provider=ModelProvider.OPENAI.name,
712
  embed_model_name="text-embedding-3-small",
713
  embed_model_kwargs={},
714
  api_keys={},
715
  auth_login="",
716
  auth_password="",
 
8
  import models
9
  from python.helpers import runtime, whisper, defer
10
  from . import files, dotenv
11
 
12
  class Settings(TypedDict):
13
  chat_model_provider: str
 
16
  chat_model_kwargs: dict[str, str]
17
  chat_model_ctx_length: int
18
  chat_model_ctx_history: float
19
+ chat_model_rl_requests: int
20
+ chat_model_rl_input: int
21
+ chat_model_rl_output: int
22
 
23
  util_model_provider: str
24
  util_model_name: str
25
  util_model_temperature: float
26
  util_model_kwargs: dict[str, str]
27
+ util_model_ctx_length: int
28
+ util_model_ctx_input: float
29
+ util_model_rl_requests: int
30
+ util_model_rl_input: int
31
+ util_model_rl_output: int
32
 
33
+
34
  embed_model_provider: str
35
  embed_model_name: str
36
  embed_model_kwargs: dict[str, str]
37
+ embed_model_rl_requests: int
38
+ embed_model_rl_input: int
39
 
40
  agent_prompts_subdir: str
41
  agent_memory_subdir: str
 
73
  id: str
74
  title: str
75
  description: str
76
+ type: Literal["text", "number", "select", "range", "textarea", "password"]
77
  value: Any
78
  min: float
79
  max: float
 
98
 
99
 
100
  def convert_out(settings: Settings) -> SettingsOutput:
101
+ from models import ModelProvider
102
+
103
 
104
  # main model section
105
  chat_model_fields: list[SettingsField] = []
 
118
  "id": "chat_model_name",
119
  "title": "Chat model name",
120
  "description": "Exact name of model from selected provider",
121
+ "type": "text",
122
  "value": settings["chat_model_name"],
123
  }
124
  )
 
141
  "id": "chat_model_ctx_length",
142
  "title": "Chat model context length",
143
  "description": "Maximum number of tokens in the context window for LLM. System prompt, chat history, RAG and response all count towards this limit.",
144
+ "type": "number",
145
  "value": settings["chat_model_ctx_length"],
146
  }
147
  )
 
159
  }
160
  )
161
 
162
+ chat_model_fields.append(
163
+ {
164
+ "id": "chat_model_rl_requests",
165
+ "title": "Requests per minute limit",
166
+ "description": "Limits the number of requests per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
167
+ "type": "number",
168
+ "value": settings["chat_model_rl_requests"],
169
+ }
170
+ )
171
+
172
+ chat_model_fields.append(
173
+ {
174
+ "id": "chat_model_rl_input",
175
+ "title": "Input tokens per minute limit",
176
+ "description": "Limits the number of input tokens per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
177
+ "type": "number",
178
+ "value": settings["chat_model_rl_input"],
179
+ }
180
+ )
181
+
182
+ chat_model_fields.append(
183
+ {
184
+ "id": "chat_model_rl_output",
185
+ "title": "Output tokens per minute limit",
186
+ "description": "Limits the number of output tokens per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
187
+ "type": "number",
188
+ "value": settings["chat_model_rl_output"],
189
+ }
190
+ )
191
+
192
  chat_model_fields.append(
193
  {
194
  "id": "chat_model_kwargs",
 
222
  "id": "util_model_name",
223
  "title": "Utility model name",
224
  "description": "Exact name of model from selected provider",
225
+ "type": "text",
226
  "value": settings["util_model_name"],
227
  }
228
  )
 
239
  "value": settings["util_model_temperature"],
240
  }
241
  )
242
+
243
+ # util_model_fields.append(
244
+ # {
245
+ # "id": "util_model_ctx_length",
246
+ # "title": "Utility model context length",
247
+ # "description": "Maximum number of tokens in the context window for LLM. System prompt, message and response all count towards this limit.",
248
+ # "type": "number",
249
+ # "value": settings["util_model_ctx_length"],
250
+ # }
251
+ # )
252
+ # util_model_fields.append(
253
+ # {
254
+ # "id": "util_model_ctx_input",
255
+ # "title": "Context window space for input tokens",
256
+ # "description": "Portion of context window dedicated to input tokens. The remaining space can be filled with response.",
257
+ # "type": "range",
258
+ # "min": 0.01,
259
+ # "max": 1,
260
+ # "step": 0.01,
261
+ # "value": settings["util_model_ctx_input"],
262
+ # }
263
+ # )
264
+
265
+ util_model_fields.append(
266
+ {
267
+ "id": "util_model_rl_requests",
268
+ "title": "Requests per minute limit",
269
+ "description": "Limits the number of requests per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
270
+ "type": "number",
271
+ "value": settings["util_model_rl_requests"],
272
+ }
273
+ )
274
+
275
+ util_model_fields.append(
276
+ {
277
+ "id": "util_model_rl_input",
278
+ "title": "Input tokens per minute limit",
279
+ "description": "Limits the number of input tokens per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
280
+ "type": "number",
281
+ "value": settings["util_model_rl_input"],
282
+ }
283
+ )
284
+
285
+ util_model_fields.append(
286
+ {
287
+ "id": "util_model_rl_output",
288
+ "title": "Output tokens per minute limit",
289
+ "description": "Limits the number of output tokens per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
290
+ "type": "number",
291
+ "value": settings["util_model_rl_output"],
292
+ }
293
+ )
294
 
295
  util_model_fields.append(
296
  {
 
325
  "id": "embed_model_name",
326
  "title": "Embedding model name",
327
  "description": "Exact name of model from selected provider",
328
+ "type": "text",
329
  "value": settings["embed_model_name"],
330
  }
331
  )
332
+
333
  embed_model_fields.append(
334
  {
335
+ "id": "embed_model_rl_requests",
336
+ "title": "Requests per minute limit",
337
+ "description": "Limits the number of requests per minute to the embedding model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
338
+ "type": "number",
339
+ "value": settings["embed_model_rl_requests"],
340
  }
341
  )
342

343
  embed_model_fields.append(
344
  {
345
+ "id": "embed_model_rl_input",
346
+ "title": "Input tokens per minute limit",
347
+ "description": "Limits the number of input tokens per minute to the embedding model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
348
+ "type": "number",
349
+ "value": settings["embed_model_rl_input"],
350
  }
351
  )
352
 
 
374
  "id": "auth_login",
375
  "title": "UI Login",
376
  "description": "Set user name for web UI",
377
+ "type": "text",
378
  "value": dotenv.get_dotenv_value(dotenv.KEY_AUTH_LOGIN) or "",
379
  }
380
  )
 
496
  # "id": "rfc_auto_docker",
497
  # "title": "RFC Auto Docker Management",
498
  # "description": "Automatically create dockerized instance of A0 for RFCs using this instance's code base and, settings and .env.",
499
+ # "type": "text",
500
  # "value": settings["rfc_auto_docker"],
501
  # }
502
  # )
 
506
  "id": "rfc_url",
507
  "title": "RFC Destination URL",
508
  "description": "URL of dockerized A0 instance for remote function calls. Do not specify port here.",
509
+ "type": "text",
510
  "value": settings["rfc_url"],
511
  }
512
  )
 
531
  "id": "rfc_port_http",
532
  "title": "RFC HTTP port",
533
  "description": "HTTP port for dockerized instance of A0.",
534
+ "type": "text",
535
  "value": settings["rfc_port_http"],
536
  }
537
  )
 
541
  "id": "rfc_port_ssh",
542
  "title": "RFC SSH port",
543
  "description": "SSH port for dockerized instance of A0.",
544
+ "type": "text",
545
  "value": settings["rfc_port_ssh"],
546
  }
547
  )
 
578
  "id": "stt_language",
579
  "title": "Language Code",
580
  "description": "Language code (e.g. en, fr, it)",
581
+ "type": "text",
582
  "value": settings["stt_language"],
583
  }
584
  )
 
601
  "id": "stt_silence_duration",
602
  "title": "Silence duration (ms)",
603
  "description": "Duration of silence before the server considers speaking to have ended.",
604
+ "type": "text",
605
  "value": settings["stt_silence_duration"],
606
  }
607
  )
 
611
  "id": "stt_waiting_timeout",
612
  "title": "Waiting timeout (ms)",
613
  "description": "Duration before the server closes the microphone.",
614
+ "type": "text",
615
  "value": settings["stt_waiting_timeout"],
616
  }
617
  )
 
690
  try:
691
  copy[key] = type(value)(copy[key]) # type: ignore
692
  except (ValueError, TypeError):
693
+ copy[key] = value # make default instead
694
  return copy
695
 
696
 
697
+ # def get_chat_model(settings: Settings | None = None) -> BaseChatModel:
698
+ # if not settings:
699
+ # settings = get_settings()
700
+ # return get_model(
701
+ # type=ModelType.CHAT,
702
+ # provider=ModelProvider[settings["chat_model_provider"]],
703
+ # name=settings["chat_model_name"],
704
+ # temperature=settings["chat_model_temperature"],
705
+ # **settings["chat_model_kwargs"],
706
+ # )
707
+
708
+
709
+ # def get_utility_model(settings: Settings | None = None) -> BaseChatModel:
710
+ # if not settings:
711
+ # settings = get_settings()
712
+ # return get_model(
713
+ # type=ModelType.CHAT,
714
+ # provider=ModelProvider[settings["util_model_provider"]],
715
+ # name=settings["util_model_name"],
716
+ # temperature=settings["util_model_temperature"],
717
+ # **settings["util_model_kwargs"],
718
+ # )
719
+
720
+
721
+ # def get_embedding_model(settings: Settings | None = None) -> Embeddings:
722
+ # if not settings:
723
+ # settings = get_settings()
724
+ # return get_model(
725
+ # type=ModelType.EMBEDDING,
726
+ # provider=ModelProvider[settings["embed_model_provider"]],
727
+ # name=settings["embed_model_name"],
728
+ # **settings["embed_model_kwargs"],
729
+ # )
730
 
731
 
732
  def _read_settings_file() -> Settings | None:
 
770
 
771
 
772
  def get_default_settings() -> Settings:
773
+ from models import ModelProvider
774
+
775
  return Settings(
776
  chat_model_provider=ModelProvider.OPENAI.name,
777
  chat_model_name="gpt-4o-mini",
778
+ chat_model_temperature=0.0,
779
  chat_model_kwargs={},
780
+ chat_model_ctx_length=120000,
781
  chat_model_ctx_history=0.7,
782
+ chat_model_rl_requests=0,
783
+ chat_model_rl_input=0,
784
+ chat_model_rl_output=0,
785
  util_model_provider=ModelProvider.OPENAI.name,
786
  util_model_name="gpt-4o-mini",
787
+ util_model_temperature=0.0,
788
+ util_model_ctx_length=120000,
789
+ util_model_ctx_input=0.7,
790
  util_model_kwargs={},
791
+ util_model_rl_requests=60,
792
+ util_model_rl_input=0,
793
+ util_model_rl_output=0,
794
  embed_model_provider=ModelProvider.OPENAI.name,
795
  embed_model_name="text-embedding-3-small",
796
  embed_model_kwargs={},
797
+ embed_model_rl_requests=0,
798
+ embed_model_rl_input=0,
799
  api_keys={},
800
  auth_login="",
801
  auth_password="",
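The new *_rl_* defaults above only store numbers; this diff does not show where they are consumed (that wiring lives in the models/agent refactor mentioned in the commit message). The sketch below is a hypothetical illustration of how they could feed the refactored RateLimiter, using only names that appear elsewhere in this commit (get_settings and the chat_model_rl_* keys); the counter names passed to the limiter are assumptions.

    from python.helpers.rate_limiter import RateLimiter
    from python.helpers.settings import get_settings  # import path assumed from the file name above

    def chat_model_limiter() -> RateLimiter:
        s = get_settings()
        # 0 keeps a counter unlimited, mirroring the field descriptions above
        return RateLimiter(
            seconds=60,
            requests=s["chat_model_rl_requests"],
            input=s["chat_model_rl_input"],
            output=s["chat_model_rl_output"],
        )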
python/tools/behaviour_adjustment.py CHANGED
@@ -21,15 +21,15 @@ async def update_behaviour(agent: Agent, log_item: LogItem, adjustments: str):
21
  current_rules = read_rules(agent)
22
 
23
  # log query streamed by LLM
24
- def log_callback(content):
25
  log_item.stream(ruleset=content)
26
 
27
  msg = agent.read_prompt("behaviour.merge.msg.md", current_rules=current_rules, adjustments=adjustments)
28
 
29
  # call util llm to find solutions in history
30
- adjustments_merge = await agent.call_utility_llm(
31
  system=system,
32
- msg=msg,
33
  callback=log_callback,
34
  )
35
 
 
21
  current_rules = read_rules(agent)
22
 
23
  # log query streamed by LLM
24
+ async def log_callback(content):
25
  log_item.stream(ruleset=content)
26
 
27
  msg = agent.read_prompt("behaviour.merge.msg.md", current_rules=current_rules, adjustments=adjustments)
28
 
29
  # call util llm to find solutions in history
30
+ adjustments_merge = await agent.call_utility_model(
31
  system=system,
32
+ message=msg,
33
  callback=log_callback,
34
  )
35
 
python/tools/response.py CHANGED
@@ -3,7 +3,6 @@ from python.helpers.tool import Tool, Response
3
  class ResponseTool(Tool):
4
 
5
  async def execute(self,**kwargs):
6
- self.agent.set_data("timeout", self.agent.config.response_timeout_seconds)
7
  return Response(message=self.args["text"], break_loop=True)
8
 
9
  async def before_execution(self, **kwargs):
 
3
  class ResponseTool(Tool):
4
 
5
  async def execute(self,**kwargs):
6
  return Response(message=self.args["text"], break_loop=True)
7
 
8
  async def before_execution(self, **kwargs):
run_ui.py CHANGED
@@ -92,7 +92,7 @@ def run():
92
 
93
  server = None
94
 
95
- def register_api_handler(app, handler):
96
  name = handler.__module__.split(".")[-1]
97
  instance = handler(app, lock)
98
  @requires_auth
 
92
 
93
  server = None
94
 
95
+ def register_api_handler(app, handler: type[ApiHandler]):
96
  name = handler.__module__.split(".")[-1]
97
  instance = handler(app, lock)
98
  @requires_auth
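The run_ui.py change above is only a type annotation: register_api_handler receives the handler class itself and instantiates it with the app and lock. A stripped-down illustration of that pattern follows; ApiHandler here is a local stand-in, not the framework's real class.

    class ApiHandler:  # stand-in only, used to show the type[...] relationship
        def __init__(self, app, lock):
            self.app, self.lock = app, lock

    def register_api_handler(app, handler: type[ApiHandler]) -> ApiHandler:
        # the class (not an instance) is passed in and instantiated here, as in run_ui.py
        return handler(app, None)

    register_api_handler(app=None, handler=ApiHandler)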
webui/css/settings.css CHANGED
@@ -42,6 +42,7 @@
42
  /* Input Styles */
43
  input[type="text"],
44
  input[type="password"],
 
45
  textarea,
46
  select {
47
  width: 100%;
 
42
  /* Input Styles */
43
  input[type="text"],
44
  input[type="password"],
45
+ input[type="number"],
46
  textarea,
47
  select {
48
  width: 100%;
webui/index.html CHANGED
@@ -451,12 +451,21 @@
451
 
452
  <div class="field-control">
453
  <!-- Input field -->
454
- <template x-if="field.type === 'input'">
455
  <input type="text" :class="field.classes" :value="field.value"
456
  :readonly="field.readonly === true"
457
  @input="field.value = $event.target.value">
458
  </template>
459

460
461
  <template x-if="field.type === 'password'">
462
  <input type="password" :class="field.classes" :value="field.value"
 
451
 
452
  <div class="field-control">
453
  <!-- Input field -->
454
+ <template x-if="field.type === 'text'">
455
  <input type="text" :class="field.classes" :value="field.value"
456
  :readonly="field.readonly === true"
457
  @input="field.value = $event.target.value">
458
  </template>
459
 
460
+ <!-- Number field -->
461
+ <template x-if="field.type === 'number'">
462
+ <input type="number" :class="field.classes" :value="field.value"
463
+ :readonly="field.readonly === true"
464
+ @input="field.value = $event.target.value"
465
+ :min="field.min" :max="field.max" :step="field.step">
466
+ </template>
467
+
468
+
469
  <!-- Password field -->
470
  <template x-if="field.type === 'password'">
471
  <input type="password" :class="field.classes" :value="field.value"