frdel committed
Commit 082b100 · 1 Parent(s): 4c80980

Squashed commit of the following:


commit 9d4e1b68b2ab41eefc534ef1f48953496d7d1cc6
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 15 18:01:51 2024 +0100

ctx window popup fix, default settings fix

commit 9ef32085651bc02610e1317b29f8d0b4913ae49f
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 15 14:55:53 2024 +0100

models, settings, initializer refactor

Rate limiter fix
Models initialized JIT
Model call wrappers for agent
Message compression fix
Log progress update
Settings frontend numeric fields

commit f7b3e2540c0ab798658cc6fed783942423d74df8
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 19:42:17 2024 +0100

knowledge import/reload

commit 4e028a3ce428cec581273a2261b6bdd8c474853e
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 11:28:47 2024 +0100

Memory recall speedup

commit 8ec3b24696e0346a3aea22d0f304ca297004bdcc
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 8 00:17:02 2024 +0100

keyboard input tool

commit a76a302f3f4e9fce9183aa0585595d6f18ae215a
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 23:28:03 2024 +0100

solutions cleanup

commit 884007cdb0064ef9d550dc99e6c3066719242da1
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 21:51:14 2024 +0100

console print edits for docker

commit 927c234d69312d57a90b03896bf9e62bf2583bd6
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sat Dec 7 20:40:28 2024 +0100

openai azure model func name fix

commit 53a46288f9a72928ec902b36625080ad07ebe2a1
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 15:17:58 2024 +0100

mistral fix, error text output

commit 6aa37744fc9ca77271e33c195bf67a78dd7937a7
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 14:58:10 2024 +0100

toast fix

commit f0be03ea77c3d4ef72ec8180df87e0bfffb46198
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 14:33:28 2024 +0100

toast errors

commit 84346828128d230a39a14d4ad32d14dec9bea8b7
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 11:30:34 2024 +0100

warnings cleanup

commit 2b94af895d517e32932f7f71ffea277a63dce940
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 10:54:06 2024 +0100

Preload fix

commit 7f270d4a14032bbe05a8b22c873af0febfa78668
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 09:44:13 2024 +0100

Server startup log msg

commit f9c9b5c93369269dd5eb71d222d8918f2bec6715
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 07:50:15 2024 +0100

Update run_ui.py

commit f3ca7e0742b12a93a8d5cc6cce066d88ad56a63b
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Dec 6 06:21:14 2024 +0100

Update run_ui.py

commit 21975c5a7cc7b3ad8b9ab95f940b5e6f6a743231
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 20:45:51 2024 +0100

local models docker url

commit f0a8b07c4fd2b1a5daaaf52142f194a8fdb8fcef
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 16:40:49 2024 +0100

Server addr notice

commit 656612726a3bbf01e4cd6ede01a60ce05b855c97
Merge: 49594fe 7c2866c
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Dec 5 16:11:23 2024 +0100

Merge pull request #260 from 3clyp50/development

fix: toast handling, mobile breakpoint

commit 7c2866ca614fff704b820e8f1ff6c9f50006320b
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Dec 4 19:37:50 2024 +0100

fix: toast handling, mobile breakpoint

`toast.css` and `index.js`
- fixed toasts disappearing right after showing
- simplified toast animation

`index.css`
- set 2ⁿᵈ mobile breakpoint at 640px

commit 49594fe6ec2d32a1855a2ccbd9479d4fda347651
Merge: f697754 70b1fa3
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Wed Dec 4 10:39:58 2024 +0100

Merge pull request #259 from 3clyp50/development

CSS refactor and toasts

commit 70b1fa385af8d86d1d5280a5b34e1a8a9abeb3cf
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Dec 4 02:17:50 2024 +0100

refactor: css, style: toasts, fix: z-index

- organized structure
- consolidated selectors and states
- shorthand everywhere

- modern toasts
- bigger action buttons for mobile

commit f6977546c11b63e2e47dce8367cad8a6c62248fc
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 22:42:36 2024 +0100

call subordinate fix

commit fbe47ac03e56cfb005a1cd2b044c6305e27ca436
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 21:19:03 2024 +0100

Minor fixes

commit 961dbc405af8a784ecadcfcbcd7652d1f8d9be28
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 21:10:45 2024 +0100

restart

commit 357909c16a66c0e7ec78a2a993a9a4e54dd67bf9
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 19:41:29 2024 +0100

whisper remote preload

commit e0b0b6f6367841c85dfa9c2156f46db755c88497
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 17:39:56 2024 +0100

nudge

commit 9fae02b2a55bb1760fae926c2b40cd07ee26a61c
Merge: 0ebc142 fedf2d4
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:57:18 2024 +0100

Merge pull request #256 from 3clyp50/development

feature: copy text button, nudge & fix: various styles

commit 0ebc142124fa3dcb370d95fd2a84bdba8f3145e8
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:56:33 2024 +0100

ssh connection retry

commit deae13d3834c7031a18cd30d5ee593f804b1b09a
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:38:57 2024 +0100

root pass fix

commit 9109fcbf60a8c9cc975c9e21306c619d81a2b43c
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:28:53 2024 +0100

root password change fix

commit 46689d6477d51966b9876b7d51b180e871569ebb
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Tue Dec 3 14:22:18 2024 +0100

RFC & SSH exchange for development

commit fedf2d4bdc6357f9e50e76b9202e06081c66db5e
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Tue Dec 3 04:03:14 2024 +0100

feature: copy text button, nudge & fix: various styles

- Copy button for all messages
- Nudge button front-end
- Fixed various non-styled light mode elements

to do -> css cleanup and whisper loading

commit 19f50d6d9509acdaea2a5ccd846b5de2722b4a07
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Sun Dec 1 20:50:17 2024 +0100

attachments, files, prompt extras, prompt caching, refactors, cleanups

commit c99b1a47d4f25d8184661a77418ebfafa5c00ee9
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Fri Nov 29 08:55:27 2024 +0100

Alpine fix version, STT fixes

commit 81e653ba2d710ad31e43d658738cf6a843461792
Merge: 857f8b6 89b8483
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 23:08:09 2024 +0100

Merge pull request #255 from 3clyp50/development

feature: speech to text settings

commit 857f8b6d82ec6707f45c67fa7e2a360e535071b0
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 23:05:17 2024 +0100

download and remove folders in browser

commit 89b848312b6f553fe22a6c0039bc7ab93b716384
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Thu Nov 28 16:07:50 2024 +0100

feature: speech to text settings

- initial commit: voice settings

- Settings section for STT

commit b3a27bb442668e4a21e79be4ab96c73a09f2b864
Merge: 5e8d6b1 bb980ea
Author: Jan Tomášek <38891707+frdel@users.noreply.github.com>
Date: Thu Nov 28 08:39:01 2024 +0100

Merge pull request #254 from 3clyp50/development

fix: file browser bugs + final ui polishing

commit bb980ea6b93a074b24cf86c54b0be69596b34cb1
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Thu Nov 28 01:13:56 2024 +0100

fix: file browser deletion bug + parent directory

Underscore matters!
- fixed both bugs for the browser

Extra:
- style for toasts

quickfix generic modals

commit f0126a6ef87c43aa34e6fbc7595d89f10f6c3b27
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 23:44:20 2024 +0100

style: polishing and consistency

commit 5e8d6b1c7d3ec965eac864b2bb72c85360bae8c2
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:16:13 2024 +0100

Minor fixes

commit 184f8dcf53ec49733d20967246374f08469d7e84
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:05:23 2024 +0100

Pause button fix

commit 969f142af12c01abd9009a8e35e0cbbd225bca8d
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 22:01:06 2024 +0100

RFC fix, history bugfixes

commit 733b8de5163b3fc36c68df099a1860af210e6a1d
Merge: f2057d3 6a83e79
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 20:57:15 2024 +0100

Merge branch 'pr/253' into development

commit 6a83e79d5a42fb44bfcac88c5ade3fda85ba2b28
Author: Alessandro <real.eclypso@gmail.com>
Date: Wed Nov 27 20:41:53 2024 +0100

fix: bigger modals

commit f2057d390178a760b7a857f918a8bb4dee586194
Author: frdel <38891707+frdel@users.noreply.github.com>
Date: Wed Nov 27 17:30:19 2024 +0100

Squashed commit of the following:

commit e626817332661f48ec97da1d4ab42479ca40b50f
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 12:51:22 2024 +0100

refactor: modals css

Modals now get the base styles from modals.css, with any specific overrides in the individual files (settings.css, file_manager.css, etc.).

commit 306db0ca395a9f5691e558c7a18a02c9cecabaa3
Author: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Wed Nov 27 03:17:20 2024 +0100

style: new action buttons + ghost buttons

Updated styles for but

.vscode/settings.json CHANGED
@@ -1,3 +1,5 @@
 {
     "python.analysis.typeCheckingMode": "standard",
+    "windsurfPyright.analysis.diagnosticMode": "workspace",
+    "windsurfPyright.analysis.typeCheckingMode": "standard",
 }
agent.py CHANGED
@@ -2,8 +2,12 @@ import asyncio
 from collections import OrderedDict
 from dataclasses import dataclass, field
 import time, importlib, inspect, os, json
-from typing import Any, Optional, Dict, TypedDict
+import token
+from typing import Any, Awaitable, Optional, Dict, TypedDict
 import uuid
+import models
+
+from langchain_core.prompt_values import ChatPromptValue
 from python.helpers import extract_tools, rate_limiter, files, errors, history, tokens
 from python.helpers.print_style import PrintStyle
 from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
@@ -131,19 +135,25 @@ class AgentContext:
         agent.handle_critical_exception(e)


+@dataclass
+class ModelConfig:
+    provider: models.ModelProvider
+    name: str
+    ctx_length: int
+    limit_requests: int
+    limit_input: int
+    limit_output: int
+    kwargs: dict
+
+
 @dataclass
 class AgentConfig:
-    chat_model: BaseChatModel | BaseLLM
-    utility_model: BaseChatModel | BaseLLM
-    embeddings_model: Embeddings
+    chat_model: ModelConfig
+    utility_model: ModelConfig
+    embeddings_model: ModelConfig
     prompts_subdir: str = ""
     memory_subdir: str = ""
     knowledge_subdirs: list[str] = field(default_factory=lambda: ["default", "custom"])
-    rate_limit_seconds: int = 60
-    rate_limit_requests: int = 15
-    rate_limit_input_tokens: int = 0
-    rate_limit_output_tokens: int = 0
-    response_timeout_seconds: int = 60
     code_exec_docker_enabled: bool = False
     code_exec_docker_name: str = "A0-dev"
     code_exec_docker_image: str = "frdel/agent-zero-run:development"
@@ -222,13 +232,6 @@ class Agent:
         self.history = history.History(self)
         self.last_user_message: history.Message | None = None
         self.intervention: UserMessage | None = None
-        self.rate_limiter = rate_limiter.RateLimiter(
-            self.context.log,
-            max_calls=self.config.rate_limit_requests,
-            max_input_tokens=self.config.rate_limit_input_tokens,
-            max_output_tokens=self.config.rate_limit_output_tokens,
-            window_seconds=self.config.rate_limit_seconds,
-        )
         self.data = {}  # free data object all the tools can use

     async def monologue(self):
@@ -245,20 +248,11 @@ class Agent:
         while True:

             self.context.streaming_agent = self  # mark self as current streamer
-            agent_response = ""
             self.loop_data.iteration += 1

             try:
                 # prepare LLM chain (model, system, history)
-                chain, prompt = await self.prepare_chain(
-                    loop_data=self.loop_data
-                )
-
-                # rate limiter TODO - move to extension, make per-model
-                formatted_inputs = prompt.format()
-                self.set_data(self.DATA_NAME_CTX_WINDOW, formatted_inputs)
-                token_count = tokens.approximate_tokens(formatted_inputs)
-                self.rate_limiter.limit_call_and_input(token_count)
+                prompt = await self.prepare_prompt(loop_data=self.loop_data)

                 # output that the agent is starting
                 PrintStyle(
@@ -271,27 +265,18 @@ class Agent:
                     type="agent", heading=f"{self.agent_name}: Generating"
                 )

-                async for chunk in chain.astream({}):
-                    # wait for intervention and handle it, if paused
-                    await self.handle_intervention(agent_response)
-
-                    if isinstance(chunk, str):
-                        content = chunk
-                    elif hasattr(chunk, "content"):
-                        content = str(chunk.content)
-                    else:
-                        content = str(chunk)
-
-                    if content:
-                        # output the agent response stream
-                        printer.stream(content)
-                        # concatenate stream into the response
-                        agent_response += content
-                        self.log_from_stream(agent_response, log)
-
-                self.rate_limiter.set_output_tokens(
-                    int(len(agent_response) / 4)
-                )  # rough estimation
+                async def stream_callback(chunk: str, full: str):
+                    # output the agent response stream
+                    if chunk:
+                        printer.stream(chunk)
+                        self.log_from_stream(full, log)
+
+                # store as last context window content
+                self.set_data(Agent.DATA_NAME_CTX_WINDOW, prompt.format())
+
+                agent_response = await self.call_chat_model(
+                    prompt, callback=stream_callback
+                )

                 await self.handle_intervention(agent_response)
@@ -319,14 +304,14 @@ class Agent:
             # exceptions inside message loop:
             except InterventionException as e:
                 pass  # intervention message has been handled in handle_intervention(), proceed with conversation loop
-            except (
-                RepairableException
-            ) as e:  # Forward repairable errors to the LLM, maybe it can fix them
+            except RepairableException as e:
+                # Forward repairable errors to the LLM, maybe it can fix them
                 error_message = errors.format_error(e)
                 await self.hist_add_warning(error_message)
                 PrintStyle(font_color="red", padding=True).print(error_message)
                 self.context.log.log(type="error", content=error_message)
-            except Exception as e:  # Other exception kill the loop
+            except Exception as e:
+                # Other exception kill the loop
                 self.handle_critical_exception(e)

             finally:
@@ -345,7 +330,7 @@ class Agent:
         # call monologue_end extensions
         await self.call_extensions("monologue_end", loop_data=self.loop_data)  # type: ignore

-    async def prepare_chain(self, loop_data: LoopData):
+    async def prepare_prompt(self, loop_data: LoopData) -> ChatPromptTemplate:
         # set system prompt and message history
         loop_data.system = await self.get_system_prompt(self.loop_data)
         loop_data.history_output = self.history.output()
@@ -374,10 +359,7 @@ class Agent:
                 *history_langchain,
             ]
         )
-
-        # return callable chain
-        chain = prompt | self.config.chat_model
-        return chain, prompt
+        return prompt

     def handle_critical_exception(self, exception: Exception):
         if isinstance(exception, HandledException):
@@ -498,39 +480,106 @@ class Agent:
     ):  # TODO add param for message range, topic, history
         return self.history.output_text(human_label="user", ai_label="assistant")

-    async def call_utility_llm(
-        self, system: str, msg: str, callback: Callable[[str], None] | None = None
-    ):
-        prompt = ChatPromptTemplate.from_messages(
-            [SystemMessage(content=system), HumanMessage(content=msg)]
-        )
-
-        chain = prompt | self.config.utility_model
-        response = ""
-
-        formatted_inputs = prompt.format()
-        token_count = tokens.approximate_tokens(formatted_inputs)
-        self.rate_limiter.limit_call_and_input(token_count)
-
-        async for chunk in chain.astream({}):
-            await self.handle_intervention()  # wait for intervention and handle it, if paused
-
-            if isinstance(chunk, str):
-                content = chunk
-            elif hasattr(chunk, "content"):
-                content = str(chunk.content)
-            else:
-                content = str(chunk)
-
-            if callback:
-                callback(content)
-
-            response += content
-
-        self.rate_limiter.set_output_tokens(int(len(response) / 4))
-
-        return response
+    async def call_utility_model(
+        self,
+        system: str,
+        message: str,
+        callback: Callable[[str], Awaitable[None]] | None = None,
+        background: bool = False,
+    ):
+        prompt = ChatPromptTemplate.from_messages(
+            [SystemMessage(content=system), HumanMessage(content=message)]
+        )
+
+        response = ""
+
+        # model class
+        model = models.get_model(
+            models.ModelType.CHAT,
+            self.config.utility_model.provider,
+            self.config.utility_model.name,
+            **self.config.utility_model.kwargs,
+        )
+
+        # rate limiter
+        limiter = await self.rate_limiter(
+            self.config.utility_model, prompt.format(), background
+        )
+
+        async for chunk in (prompt | model).astream({}):
+            await self.handle_intervention()  # wait for intervention and handle it, if paused
+
+            content = models.parse_chunk(chunk)
+            limiter.add(output=tokens.approximate_tokens(content))
+            response += content
+
+            if callback:
+                await callback(content)
+
+        return response
+
+    async def call_chat_model(
+        self,
+        prompt: ChatPromptTemplate,
+        callback: Callable[[str, str], Awaitable[None]] | None = None,
+    ):
+        response = ""
+
+        # model class
+        model = models.get_model(
+            models.ModelType.CHAT,
+            self.config.chat_model.provider,
+            self.config.chat_model.name,
+            **self.config.chat_model.kwargs,
+        )
+
+        # rate limiter
+        limiter = await self.rate_limiter(self.config.chat_model, prompt.format())
+
+        async for chunk in (prompt | model).astream({}):
+            await self.handle_intervention()  # wait for intervention and handle it, if paused
+
+            content = models.parse_chunk(chunk)
+            limiter.add(output=tokens.approximate_tokens(content))
+            response += content
+
+            if callback:
+                await callback(content, response)
+
+        return response
+
+    async def rate_limiter(
+        self, model_config: ModelConfig, input: str, background: bool = False
+    ):
+        # rate limiter log
+        wait_log = None
+
+        async def wait_callback(msg: str, key: str, total: int, limit: int):
+            nonlocal wait_log
+            if not wait_log:
+                wait_log = self.context.log.log(
+                    type="util",
+                    update_progress="none",
+                    heading=msg,
+                    model=f"{model_config.provider.value}\\{model_config.name}",
+                )
+            wait_log.update(heading=msg, key=key, value=total, limit=limit)
+            if not background:
+                self.context.log.set_progress(msg, -1)
+
+        # rate limiter
+        limiter = models.get_rate_limiter(
+            model_config.provider,
+            model_config.name,
+            model_config.limit_requests,
+            model_config.limit_input,
+            model_config.limit_output,
+        )
+        limiter.add(input=tokens.approximate_tokens(input))
+        limiter.add(requests=1)
+        await limiter.wait(callback=wait_callback)
+        return limiter
+
     async def handle_intervention(self, progress: str = ""):
         while self.context.paused:
             await asyncio.sleep(0.1)  # wait if paused
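
Note: the old call_utility_llm helper is replaced by call_utility_model (and call_chat_model for the main conversation); both resolve the model just-in-time from its ModelConfig and go through the per-model rate limiter. A minimal sketch of how an extension might call the new wrapper, mirroring the extension diffs further down — the function name summarize_user_message is hypothetical, and an Extension-like object with self.agent is assumed:

    # sketch only: assumes `self.agent` is an Agent, as in the extensions below
    async def summarize_user_message(self, system: str, user_text: str) -> str:
        # callbacks are awaited now, so they must be coroutine functions
        async def log_callback(content: str):
            print(content, end="")  # stand-in for log_item.stream(...)

        # the utility model is picked from AgentConfig at call time
        return await self.agent.call_utility_model(
            system=system,
            message=user_text,
            callback=log_callback,
            background=True,  # background calls don't drive the progress bar
        )
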
initialize.py CHANGED
@@ -1,6 +1,6 @@
 import asyncio
 import models
-from agent import AgentConfig
+from agent import AgentConfig, ModelConfig
 from python.helpers import dotenv, files, rfc_exchange, runtime, settings, docker, log


@@ -8,36 +8,45 @@ def initialize():

     current_settings = settings.get_settings()

-    # main chat model used by agents (smarter, more accurate)
-    # chat_llm = models.get_openai_chat(model_name="gpt-4o-mini", temperature=0)
-    # chat_llm = models.get_ollama_chat(model_name="llama3.2:3b-instruct-fp16", temperature=0)
-    # chat_llm = models.get_lmstudio_chat(model_name="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF", temperature=0)
-    # chat_llm = models.get_openrouter_chat(model_name="openai/o1-mini-2024-09-12")
-    # chat_llm = models.get_azure_openai_chat(deployment_name="gpt-4o-mini", temperature=0)
-    # chat_llm = models.get_anthropic_chat(model_name="claude-3-5-sonnet-20240620", temperature=0)
-    # chat_llm = models.get_google_chat(model_name="gemini-1.5-flash", temperature=0)
-    # chat_llm = models.get_mistral_chat(model_name="mistral-small-latest", temperature=0)
-    # chat_llm = models.get_groq_chat(model_name="llama-3.2-90b-text-preview", temperature=0)
-    # chat_llm = models.get_sambanova_chat(model_name="Meta-Llama-3.1-70B-Instruct-8k", temperature=0)
-    chat_llm = settings.get_chat_model(
-        current_settings
-    )  # chat model from user settings
-
-    # utility model used for helper functions (cheaper, faster)
-    # utility_llm = chat_llm
-    utility_llm = settings.get_utility_model(
-        current_settings
-    )  # utility model from user settings
-
-    # embedding model used for memory
-    # embedding_llm = models.get_openai_embedding(model_name="text-embedding-3-small")
-    # embedding_llm = models.get_ollama_embedding(model_name="nomic-embed-text")
-    # embedding_llm = models.get_huggingface_embedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
-    # embedding_llm = models.get_lmstudio_embedding(model_name="nomic-ai/nomic-embed-text-v1.5-GGUF")
-    embedding_llm = settings.get_embedding_model(
-        current_settings
-    )  # embedding model from user settings
+    # chat model from user settings
+    chat_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["chat_model_provider"]],
+        name=current_settings["chat_model_name"],
+        ctx_length=current_settings["chat_model_ctx_length"],
+        limit_requests=current_settings["chat_model_rl_requests"],
+        limit_input=current_settings["chat_model_rl_input"],
+        limit_output=current_settings["chat_model_rl_output"],
+        kwargs={
+            "temperature": current_settings["chat_model_temperature"],
+            **current_settings["chat_model_kwargs"],
+        },
+    )
+
+    # utility model from user settings
+    utility_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["util_model_provider"]],
+        name=current_settings["util_model_name"],
+        ctx_length=current_settings["util_model_ctx_length"],
+        limit_requests=current_settings["util_model_rl_requests"],
+        limit_input=current_settings["util_model_rl_input"],
+        limit_output=current_settings["util_model_rl_output"],
+        kwargs={
+            "temperature": current_settings["util_model_temperature"],
+            **current_settings["util_model_kwargs"],
+        },
+    )
+    # embedding model from user settings
+    embedding_llm = ModelConfig(
+        provider=models.ModelProvider[current_settings["embed_model_provider"]],
+        name=current_settings["embed_model_name"],
+        ctx_length=0,
+        limit_requests=current_settings["embed_model_rl_requests"],
+        limit_input=0,
+        limit_output=0,
+        kwargs={
+            **current_settings["embed_model_kwargs"],
+        },
+    )

     # agent configuration
     config = AgentConfig(
         chat_model=chat_llm,
@@ -46,12 +55,7 @@ def initialize():
         prompts_subdir=current_settings["agent_prompts_subdir"],
         memory_subdir=current_settings["agent_memory_subdir"],
         knowledge_subdirs=["default", current_settings["agent_knowledge_subdir"]],
-        # rate_limit_seconds = 60,
-        rate_limit_requests=30,
-        # rate_limit_input_tokens = 0,
-        # rate_limit_output_tokens = 0,
-        # response_timeout_seconds = 60,
-        code_exec_docker_enabled = False,
+        code_exec_docker_enabled=False,
         # code_exec_docker_name = "A0-dev",
         # code_exec_docker_image = "frdel/agent-zero-run:development",
         # code_exec_docker_ports = { "22/tcp": 55022, "80/tcp": 55080 }
models.py CHANGED
@@ -1,5 +1,6 @@
 from enum import Enum
 import os
+from typing import Any
 from langchain_openai import (
     ChatOpenAI,
     OpenAI,
@@ -28,6 +29,7 @@ from langchain_mistralai import ChatMistralAI
 from pydantic.v1.types import SecretStr
 from python.helpers import dotenv, runtime
 from python.helpers.dotenv import load_dotenv
+from python.helpers.rate_limiter import RateLimiter

 # environment variables
 load_dotenv()
@@ -56,6 +58,9 @@ class ModelProvider(Enum):
     OTHER = "Other"


+rate_limiters: dict[str, RateLimiter] = {}
+
+
 # Utility function to get API keys from environment variables
 def get_api_key(service):
     return (
@@ -71,11 +76,36 @@ def get_model(type: ModelType, provider: ModelProvider, name: str, **kwargs):
     return model


+def get_rate_limiter(
+    provider: ModelProvider, name: str, requests: int, input: int, output: int
+) -> RateLimiter:
+    # get or create
+    key = f"{provider.name}\\{name}"
+    rate_limiters[key] = limiter = rate_limiters.get(key, RateLimiter(seconds=60))
+    # always update
+    limiter.limits["requests"] = requests or 0
+    limiter.limits["input"] = input or 0
+    limiter.limits["output"] = output or 0
+    return limiter
+
+
+def parse_chunk(chunk: Any):
+    if isinstance(chunk, str):
+        content = chunk
+    elif hasattr(chunk, "content"):
+        content = str(chunk.content)
+    else:
+        content = str(chunk)
+    return content


 # Ollama models
 def get_ollama_base_url():
-    return dotenv.get_dotenv_value("OLLAMA_BASE_URL") or f"http://{runtime.get_local_url()}:11434"
+    return (
+        dotenv.get_dotenv_value("OLLAMA_BASE_URL")
+        or f"http://{runtime.get_local_url()}:11434"
+    )
+

 def get_ollama_chat(
     model_name: str,
@@ -138,7 +168,11 @@ def get_huggingface_embedding(model_name: str, **kwargs):

 # LM Studio and other OpenAI compatible interfaces
 def get_lmstudio_base_url():
-    return dotenv.get_dotenv_value("LM_STUDIO_BASE_URL") or f"http://{runtime.get_local_url()}:1234/v1"
+    return (
+        dotenv.get_dotenv_value("LM_STUDIO_BASE_URL")
+        or f"http://{runtime.get_local_url()}:1234/v1"
+    )
+

 def get_lmstudio_chat(
     model_name: str,
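
Note: get_rate_limiter keeps one RateLimiter per provider\name key and refreshes its limits on every call, while parse_chunk normalizes LangChain stream chunks to plain text. A rough sketch of how the two combine, following the agent wrappers above — the helper name limited_stream and the limit values are illustrative, not repository defaults:

    import models
    from python.helpers import tokens

    async def limited_stream(prompt, provider, name, requests=30, input_limit=0, output_limit=0):
        # one shared limiter per provider\name key; limits are updated on each call
        limiter = models.get_rate_limiter(provider, name, requests, input_limit, output_limit)
        limiter.add(input=tokens.approximate_tokens(prompt.format()), requests=1)
        await limiter.wait()  # blocks while the 60-second window is over any limit

        model = models.get_model(models.ModelType.CHAT, provider, name)
        response = ""
        async for chunk in (prompt | model).astream({}):
            content = models.parse_chunk(chunk)
            limiter.add(output=tokens.approximate_tokens(content))
            response += content
        return response
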
python/extensions/message_loop_prompts/_50_recall_memories.py CHANGED
@@ -53,13 +53,13 @@ class RecallMemories(Extension):
         )

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(query=content)

         # call util llm to summarize conversation
-        query = await self.agent.call_utility_llm(
+        query = await self.agent.call_utility_model(
             system=system,
-            msg=loop_data.user_message.output_text() if loop_data.user_message else "",
+            message=loop_data.user_message.output_text() if loop_data.user_message else "",
             callback=log_callback,
         )
python/extensions/message_loop_prompts/_51_recall_solutions.py CHANGED
@@ -53,12 +53,12 @@ class RecallSolutions(Extension):
         )

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(query=content)

         # call util llm to summarize conversation
-        query = await self.agent.call_utility_llm(
-            system=system, msg=loop_data.user_message.output_text() if loop_data.user_message else "", callback=log_callback
+        query = await self.agent.call_utility_model(
+            system=system, message=loop_data.user_message.output_text() if loop_data.user_message else "", callback=log_callback
         )

         # get solutions database
python/extensions/message_loop_prompts/_91_recall_wait.py CHANGED
@@ -9,11 +9,11 @@ class RecallWait(Extension):

         task = self.agent.get_data(DATA_NAME_TASK_MEMORIES)
         if task and not task.done():
-            self.agent.context.log.set_progress("Recalling memories...")
+            # self.agent.context.log.set_progress("Recalling memories...")
             await task

         task = self.agent.get_data(DATA_NAME_TASK_SOLUTIONS)
         if task and not task.done():
-            self.agent.context.log.set_progress("Recalling solutions...")
+            # self.agent.context.log.set_progress("Recalling solutions...")
             await task
python/extensions/monologue_end/_50_memorize_fragments.py CHANGED
@@ -35,14 +35,15 @@ class MemorizeMemories(Extension):
         msgs_text = self.agent.concat_messages(self.agent.history)

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(content=content)

         # call util llm to find info in history
-        memories_json = await self.agent.call_utility_llm(
+        memories_json = await self.agent.call_utility_model(
             system=system,
-            msg=msgs_text,
+            message=msgs_text,
             callback=log_callback,
+            background=True,
         )

         memories = DirtyJson.parse_string(memories_json)
@@ -76,7 +77,7 @@ class MemorizeMemories(Extension):
             log_item.update(replaced=rem_txt)

         # insert new solution
-        db.insert_text(text=txt, metadata={"area": Memory.Area.FRAGMENTS.value})
+        await db.insert_text(text=txt, metadata={"area": Memory.Area.FRAGMENTS.value})

         log_item.update(
             result=f"{len(memories)} entries memorized.",
python/extensions/monologue_end/_51_memorize_solutions.py CHANGED
@@ -33,14 +33,15 @@ class MemorizeSolutions(Extension):
         msgs_text = self.agent.concat_messages(self.agent.history)

         # log query streamed by LLM
-        def log_callback(content):
+        async def log_callback(content):
             log_item.stream(content=content)

         # call util llm to find solutions in history
-        solutions_json = await self.agent.call_utility_llm(
+        solutions_json = await self.agent.call_utility_model(
             system=system,
-            msg=msgs_text,
+            message=msgs_text,
             callback=log_callback,
+            background=True,
         )

         solutions = DirtyJson.parse_string(solutions_json)
@@ -75,7 +76,7 @@ class MemorizeSolutions(Extension):
             log_item.update(replaced=rem_txt)

         # insert new solution
-        db.insert_text(text=txt, metadata={"area": Memory.Area.SOLUTIONS.value})
+        await db.insert_text(text=txt, metadata={"area": Memory.Area.SOLUTIONS.value})

         solutions_txt = solutions_txt.strip()
         log_item.update(solutions=solutions_txt)
python/helpers/history.py CHANGED
@@ -130,12 +130,12 @@ class Topic(Record):
             * LARGE_MESSAGE_TO_TOPIC_RATIO
         )
         large_msgs = []
-        for m in self.messages:
+        for m in (m for m in self.messages if not m.summary):
             out = m.output()
             text = output_text(out)
             tok = tokens.approximate_tokens(text)
             leng = len(text)
-            if leng > msg_max_size:
+            if tok > msg_max_size:
                 large_msgs.append((m, tok, leng, out))
         large_msgs.sort(key=lambda x: x[1], reverse=True)
         for msg, tok, leng, out in large_msgs:
@@ -173,12 +173,11 @@ class Topic(Record):

     async def summarize_messages(self, messages: list[Message]):
         msg_txt = [m.output_text() for m in messages]
-        summary = await call_llm.call_llm(
+        summary = await self.history.agent.call_utility_model(
             system=self.history.agent.read_prompt("fw.topic_summary.sys.md"),
             message=self.history.agent.read_prompt(
                 "fw.topic_summary.msg.md", content=msg_txt
             ),
-            model=settings.get_utility_model(),
         )
         return summary

@@ -218,12 +217,11 @@ class Bulk(Record):
         return False

     async def summarize(self):
-        self.summary = await call_llm.call_llm(
+        self.summary = await self.history.agent.call_utility_model(
             system=self.history.agent.read_prompt("fw.topic_summary.sys.md"),
             message=self.history.agent.read_prompt(
                 "fw.topic_summary.msg.md", content=self.output_text()
             ),
-            model=settings.get_utility_model(),
         )
         return self.summary

@@ -309,42 +307,38 @@ class History(Record):
         return json.dumps(data)

     async def compress(self):
-        curr, hist, bulk = (
-            self.get_current_topic_tokens(),
-            self.get_topics_tokens(),
-            self.get_bulks_tokens(),
-        )
-        total = get_ctx_size_for_history()
         compressed = False
-
-        # calculate ratios of individual parts
-        ratios = [
-            (curr, CURRENT_TOPIC_RATIO, "current_topic"),
-            (hist, HISTORY_TOPIC_RATIO, "history_topic"),
-            (bulk, HISTORY_BULK_RATIO, "history_bulk"),
-        ]
-        # start from the most oversized part and compress it
-        ratios = sorted(ratios, key=lambda x: (x[0] / total) / x[1], reverse=True)
-        for ratio in ratios:
-            if ratio[0] > ratio[1] * total:
-                over_part = ratio[2]
-                if over_part == "current_topic":
-                    compressed = await self.current.compress()
-                elif over_part == "history_topic":
-                    compressed = await self.compress_topics()
-                else:
-                    compressed = await self.compress_bulks()
-                # if part was compressed, stop the loop and try the whole function again, maybe no more compression is necessary
-                if compressed:
-                    break
+        while True:
+            curr, hist, bulk = (
+                self.get_current_topic_tokens(),
+                self.get_topics_tokens(),
+                self.get_bulks_tokens(),
+            )
+            total = get_ctx_size_for_history()
+            ratios = [
+                (curr, CURRENT_TOPIC_RATIO, "current_topic"),
+                (hist, HISTORY_TOPIC_RATIO, "history_topic"),
+                (bulk, HISTORY_BULK_RATIO, "history_bulk"),
+            ]
+            ratios = sorted(ratios, key=lambda x: (x[0] / total) / x[1], reverse=True)
+            compressed_part = False
+            for ratio in ratios:
+                if ratio[0] > ratio[1] * total:
+                    over_part = ratio[2]
+                    if over_part == "current_topic":
+                        compressed_part = await self.current.compress()
+                    elif over_part == "history_topic":
+                        compressed_part = await self.compress_topics()
+                    else:
+                        compressed_part = await self.compress_bulks()
+                    if compressed_part:
+                        break
+
+            if compressed_part:
+                compressed = True
+                continue
             else:
-                break
-
-        # try the whole function again to see if there is still a need for compression
-        if compressed:
-            await self.compress()
-
-        return compressed
+                return compressed

     async def compress_topics(self) -> bool:
         # summarize topics one by one
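
Note: compress() is now iterative rather than recursive — each pass re-measures the three history parts and compresses the most oversized one first, ranked by how far each part exceeds its allotted share of the context budget. A small illustration of that ranking with made-up numbers (the real ratios and budget come from history.py, not from this sketch):

    # Illustrative values only; the real constants live in history.py.
    CURRENT_TOPIC_RATIO, HISTORY_TOPIC_RATIO, HISTORY_BULK_RATIO = 0.5, 0.3, 0.2
    total = 8000  # hypothetical token budget for history

    curr, hist, bulk = 3000, 3500, 1000  # hypothetical token counts per part
    ratios = [
        (curr, CURRENT_TOPIC_RATIO, "current_topic"),
        (hist, HISTORY_TOPIC_RATIO, "history_topic"),
        (bulk, HISTORY_BULK_RATIO, "history_bulk"),
    ]
    # sort by relative overflow: (actual share of total) / (allowed share)
    ratios.sort(key=lambda x: (x[0] / total) / x[1], reverse=True)
    print(ratios[0][2])  # -> "history_topic" (3500/8000 ≈ 0.44 vs allowed 0.3)
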
python/helpers/log.py CHANGED
@@ -19,6 +19,9 @@ Type = Literal[
     "warning",
 ]

+ProgressUpdate = Literal["persistent", "temporary", "none"]
+
+
 @dataclass
 class LogItem:
     log: "Log"
@@ -27,6 +30,7 @@ class LogItem:
     heading: str
     content: str
     temp: bool
+    update_progress: Optional[ProgressUpdate] = "persistent"
     kvps: Optional[OrderedDict] = None  # Use OrderedDict for kvps
     id: Optional[str] = None  # Add id field
     guid: str = ""
@@ -41,20 +45,27 @@ class LogItem:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         **kwargs,
     ):
         if self.guid == self.log.guid:
-            self.log.update_item(
+            self.log._update_item(
                 self.no,
                 type=type,
                 heading=heading,
                 content=content,
                 kvps=kvps,
                 temp=temp,
+                update_progress=update_progress,
                 **kwargs,
             )

-    def stream(self, heading: str | None = None, content: str | None = None, **kwargs):
+    def stream(
+        self,
+        heading: str | None = None,
+        content: str | None = None,
+        **kwargs,
+    ):
         if heading is not None:
             self.update(heading=self.heading + heading)
         if content is not None:
@@ -75,6 +86,7 @@ class LogItem:
             "kvps": self.kvps,
         }

+
 class Log:

     def __init__(self):
@@ -90,7 +102,9 @@ class Log:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         id: Optional[str] = None,  # Add id parameter
+        **kwargs,
     ) -> LogItem:
         # Use OrderedDict if kvps is provided
         if kvps is not None:
@@ -101,17 +115,19 @@ class Log:
             type=type,
             heading=heading or "",
             content=content or "",
-            kvps=kvps,
-            temp=temp or False,
+            kvps=OrderedDict({**(kvps or {}), **(kwargs or {})}),
+            update_progress=(
+                update_progress if update_progress is not None else "persistent"
+            ),
+            temp=temp if temp is not None else False,
             id=id,  # Pass id to LogItem
         )
         self.logs.append(item)
         self.updates += [item.no]
-        if heading and item.no >= self.progress_no:
-            self.set_progress(heading, item.no)
+        self._update_progress_from_item(item)
         return item

-    def update_item(
+    def _update_item(
         self,
         no: int,
         type: str | None = None,
@@ -119,15 +135,16 @@ class Log:
         content: str | None = None,
         kvps: dict | None = None,
         temp: bool | None = None,
+        update_progress: ProgressUpdate | None = None,
         **kwargs,
     ):
         item = self.logs[no]
         if type is not None:
             item.type = type
+        if update_progress is not None:
+            item.update_progress = update_progress
         if heading is not None:
             item.heading = heading
-            if no >= self.progress_no:
-                self.set_progress(heading, no)
         if content is not None:
             item.content = content
         if kvps is not None:
@@ -143,6 +160,7 @@ class Log:
                 item.kvps[k] = v

         self.updates += [item.no]
+        self._update_progress_from_item(item)

     def set_progress(self, progress: str, no: int = 0, active: bool = True):
         self.progress = progress
@@ -174,3 +192,12 @@ class Log:
         self.updates = []
         self.logs = []
         self.set_initial_progress()
+
+    def _update_progress_from_item(self, item: LogItem):
+        if item.heading and item.update_progress != "none":
+            if item.no >= self.progress_no:
+                self.set_progress(
+                    item.heading,
+                    (item.no if item.update_progress == "persistent" else -1),
+                )
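
Note: log items now carry an update_progress mode ("persistent", "temporary", or "none"), and the progress bar is driven from _update_progress_from_item instead of ad-hoc checks in log() and update_item(). A minimal usage sketch, patterned on the rate-limiter log call in agent.py above (the headings are placeholders):

    # sketch: `log` is a python.helpers.log.Log instance (e.g. context.log)
    item = log.log(
        type="util",
        heading="Waiting for rate limit...",
        update_progress="none",  # this item never drives the progress bar
    )
    item.update(heading="Rate limit cleared")  # later updates keep the item's progress mode
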
python/helpers/memory.py CHANGED
@@ -10,6 +10,8 @@ from langchain_community.docstore.in_memory import InMemoryDocstore
 from langchain_community.vectorstores.utils import (
     DistanceStrategy,
 )
+from langchain_core.embeddings import Embeddings
+
 import os, json

 import numpy as np
@@ -22,6 +24,7 @@ from python.helpers import knowledge_import
 from python.helpers.log import Log, LogItem
 from enum import Enum
 from agent import Agent
+import models


 class MyFaiss(FAISS):
@@ -54,7 +57,12 @@ class Memory:
             )
             db = Memory.initialize(
                 log_item,
-                agent.config.embeddings_model,
+                models.get_model(
+                    models.ModelType.EMBEDDING,
+                    agent.config.embeddings_model.provider,
+                    agent.config.embeddings_model.name,
+                    **agent.config.embeddings_model.kwargs,
+                ),
                 memory_subdir,
                 False,
             )
@@ -82,7 +90,7 @@ class Memory:
     @staticmethod
     def initialize(
         log_item: LogItem | None,
-        embeddings_model,
+        embeddings_model: Embeddings,
         memory_subdir: str,
         in_memory=False,
     ) -> MyFaiss:
@@ -187,7 +195,7 @@ class Memory:
                     index[file]["ids"]
                 )  # remove original version
             if index[file]["state"] == "changed":
-                index[file]["ids"] = self.insert_documents(
+                index[file]["ids"] = await self.insert_documents(
                     index[file]["documents"]
                 )  # insert new version
@@ -234,6 +242,11 @@ class Memory:
         self, query: str, limit: int, threshold: float, filter: str = ""
     ):
         comparator = Memory._get_comparator(filter) if filter else None
+
+        # rate limiter
+        await self.agent.rate_limiter(
+            model_config=self.agent.config.embeddings_model, input=query)
+
         return await self.db.asearch(
             query,
             search_type="similarity_score_threshold",
@@ -287,30 +300,28 @@ class Memory:
         self._save_db()  # persist
         return rem_docs

-    def insert_text(self, text, metadata: dict = {}):
-        id = str(uuid.uuid4())
-        if not metadata.get("area", ""):
-            metadata["area"] = Memory.Area.MAIN.value
-
-        self.db.add_documents(
-            documents=[
-                Document(
-                    text,
-                    metadata={"id": id, "timestamp": self.get_timestamp(), **metadata},
-                )
-            ],
-            ids=[id],
-        )
-        self._save_db()  # persist
-        return id
-
-    def insert_documents(self, docs: list[Document]):
+    async def insert_text(self, text, metadata: dict = {}):
+        doc = Document(text, metadata=metadata)
+        ids = await self.insert_documents([doc])
+        return ids[0]
+
+    async def insert_documents(self, docs: list[Document]):
         ids = [str(uuid.uuid4()) for _ in range(len(docs))]
         timestamp = self.get_timestamp()
+
         if ids:
             for doc, id in zip(docs, ids):
                 doc.metadata["id"] = id  # add ids to documents metadata
                 doc.metadata["timestamp"] = timestamp  # add timestamp
+                if not doc.metadata.get("area", ""):
+                    doc.metadata["area"] = Memory.Area.MAIN.value
+
+            # rate limiter
+            docs_txt = "".join(self.format_docs_plain(docs))
+            await self.agent.rate_limiter(
+                model_config=self.agent.config.embeddings_model, input=docs_txt)
+
             self.db.add_documents(documents=docs, ids=ids)
             self._save_db()  # persist
             return ids
@@ -365,8 +376,9 @@ class Memory:
     def get_memory_subdir_abs(agent: Agent) -> str:
         return files.get_abs_path("memory", agent.config.memory_subdir or "default")

+
     def get_custom_knowledge_subdir_abs(agent: Agent) -> str:
         for dir in agent.config.knowledge_subdirs:
-            if dir != "default":
+            if dir != "default":
                 return files.get_abs_path("knowledge", dir)
         raise Exception("No custom knowledge subdir set")
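
Note: insert_text and insert_documents are now coroutines — they pass the document text through the embeddings rate limiter before touching FAISS — so callers must await them, as the memorize extensions above now do. A short sketch, assuming `db` is an already-initialized Memory instance and the text/metadata values are placeholders:

    # sketch: `db` is a python.helpers.memory.Memory instance
    new_id = await db.insert_text(
        text="example memory fragment",
        metadata={"area": Memory.Area.MAIN.value},  # defaulted to MAIN if omitted
    )
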
python/helpers/persist_chat.py CHANGED
@@ -174,7 +174,7 @@ def _deserialize_log(data: dict[str, Any]) -> "Log":
         log.logs.append(
             LogItem(
                 log=log,  # restore the log reference
-                no=item_data["no"],
+                no=i,  # item_data["no"],
                 type=item_data["type"],
                 heading=item_data.get("heading", ""),
                 content=item_data.get("content", ""),
python/helpers/rate_limiter.py CHANGED
@@ -1,68 +1,56 @@
 
1
  import time
2
- from collections import deque
3
- from dataclasses import dataclass
4
- from typing import List, Tuple
5
- from .print_style import PrintStyle
6
- from .log import Log
7
 
8
- @dataclass
9
- class CallRecord:
10
- timestamp: float
11
- input_tokens: int
12
- output_tokens: int = 0 # Default to 0, will be set separately
13
 
14
  class RateLimiter:
15
- def __init__(self, logger: Log, max_calls: int, max_input_tokens: int, max_output_tokens: int, window_seconds: int = 60):
16
- self.logger = logger
17
- self.max_calls = max_calls
18
- self.max_input_tokens = max_input_tokens
19
- self.max_output_tokens = max_output_tokens
20
- self.window_seconds = window_seconds
21
- self.call_records: deque = deque()
22
 
23
- def _clean_old_records(self, current_time: float):
24
- while self.call_records and current_time - self.call_records[0].timestamp > self.window_seconds:
25
- self.call_records.popleft()
 
 
 
26
 
27
- def _get_counts(self) -> Tuple[int, int, int]:
28
- calls = len(self.call_records)
29
- input_tokens = sum(record.input_tokens for record in self.call_records)
30
- output_tokens = sum(record.output_tokens for record in self.call_records)
31
- return calls, input_tokens, output_tokens
 
32
 
33
- def _wait_if_needed(self, current_time: float, new_input_tokens: int):
 
 
 
 
 
 
 
 
 
34
  while True:
35
- self._clean_old_records(current_time)
36
- calls, input_tokens, output_tokens = self._get_counts()
37
-
38
- wait_reasons = []
39
- if self.max_calls > 0 and calls >= self.max_calls:
40
- wait_reasons.append("max calls")
41
- if self.max_input_tokens > 0 and input_tokens + new_input_tokens > self.max_input_tokens:
42
- wait_reasons.append("max input tokens")
43
- if self.max_output_tokens > 0 and output_tokens >= self.max_output_tokens:
44
- wait_reasons.append("max output tokens")
45
-
46
- if not wait_reasons:
47
- break
48
-
49
- oldest_record = self.call_records[0]
50
- wait_time = oldest_record.timestamp + self.window_seconds - current_time
51
- if wait_time > 0:
52
- PrintStyle(font_color="yellow", padding=True).print(f"Rate limit exceeded. Waiting for {wait_time:.2f} seconds due to: {', '.join(wait_reasons)}")
53
- self.logger.log("rate_limit","Rate limit exceeded",f"Rate limit exceeded. Waiting for {wait_time:.2f} seconds due to: {', '.join(wait_reasons)}")
54
- # TODO rate limit log type
55
- time.sleep(wait_time)
56
- current_time = time.time()
57
 
58
- def limit_call_and_input(self, input_token_count: int) -> CallRecord:
59
- current_time = time.time()
60
- self._wait_if_needed(current_time, input_token_count)
61
- new_record = CallRecord(current_time, input_token_count)
62
- self.call_records.append(new_record)
63
- return new_record
64
 
65
- def set_output_tokens(self, output_token_count: int):
66
- if self.call_records:
67
- self.call_records[-1].output_tokens += output_token_count
68
- return self
 
1
+ import asyncio
2
  import time
3
+ from typing import Callable, Awaitable
4

5
 
6
  class RateLimiter:
7
+ def __init__(self, seconds: int = 60, **limits: int):
8
+ self.timeframe = seconds
9
+ self.limits = {key: value if isinstance(value, (int, float)) else 0 for key, value in (limits or {}).items()}
10
+ self.values = {key: [] for key in self.limits.keys()}
11
+ self._lock = asyncio.Lock()
12
 
13
+ def add(self, **kwargs: int):
14
+ now = time.time()
15
+ for key, value in kwargs.items():
16
+ if not key in self.values:
17
+ self.values[key] = []
18
+ self.values[key].append((now, value))
19
 
20
+ async def cleanup(self):
21
+ async with self._lock:
22
+ now = time.time()
23
+ cutoff = now - self.timeframe
24
+ for key in self.values:
25
+ self.values[key] = [(t, v) for t, v in self.values[key] if t > cutoff]
26
 
27
+ async def get_total(self, key: str) -> int:
28
+ async with self._lock:
29
+ if not key in self.values:
30
+ return 0
31
+ return sum(value for _, value in self.values[key])
32
+
33
+ async def wait(
34
+ self,
35
+ callback: Callable[[str, str, int, int], Awaitable[None]] | None = None,
36
+ ):
37
  while True:
38
+ await self.cleanup()
39
+ should_wait = False
40
+
41
+ for key, limit in self.limits.items():
42
+ if limit <= 0: # Skip if no limit set
43
+ continue
44
 
45
+ total = await self.get_total(key)
46
+ if total > limit:
47
+ if callback:
48
+ msg = f"Rate limit exceeded for {key} ({total}/{limit}), waiting..."
49
+ await callback(msg, key, total, limit)
50
+ should_wait = True
51
+ break
52
+
53
+ if not should_wait:
54
+ break
55
 
56
+ await asyncio.sleep(1)
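For readers skimming the refactor above: the limiter is no longer tied to a fixed CallRecord; it tracks arbitrary named counters and sleeps in one-second steps while any of them exceeds its per-minute limit. A minimal usage sketch follows, assuming the interface exactly as shown in this diff; the module path, the counter names "requests"/"input"/"output" and the print callback are illustrative assumptions, not taken from the repository.

    import asyncio
    from python.helpers.rate_limiter import RateLimiter  # path assumed from the file name above

    async def demo():
        # 0 disables a counter, matching the "if limit <= 0: continue" check in wait()
        limiter = RateLimiter(seconds=60, requests=60, input=1_000_000, output=0)

        async def report(msg: str, key: str, total: int, limit: int):
            print(msg)  # a real caller might forward this to PrintStyle/Log instead

        limiter.add(requests=1, input=1234)  # record the pending call and its input tokens
        await limiter.wait(callback=report)  # sleeps 1s at a time while a limit is exceeded
        # ... perform the model call, then record the produced tokens:
        limiter.add(output=567)

    asyncio.run(demo())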
python/helpers/settings.py CHANGED
@@ -8,10 +8,6 @@ from typing import Any, Literal, TypedDict
8
  import models
9
  from python.helpers import runtime, whisper, defer
10
  from . import files, dotenv
11
- from models import get_model, ModelProvider, ModelType
12
- from langchain_core.language_models.chat_models import BaseChatModel
13
- from langchain_core.embeddings import Embeddings
14
-
15
 
16
  class Settings(TypedDict):
17
  chat_model_provider: str
@@ -20,15 +16,26 @@ class Settings(TypedDict):
20
  chat_model_kwargs: dict[str, str]
21
  chat_model_ctx_length: int
22
  chat_model_ctx_history: float
23
 
24
  util_model_provider: str
25
  util_model_name: str
26
  util_model_temperature: float
27
  util_model_kwargs: dict[str, str]
28

29
  embed_model_provider: str
30
  embed_model_name: str
31
  embed_model_kwargs: dict[str, str]
32
 
33
  agent_prompts_subdir: str
34
  agent_memory_subdir: str
@@ -66,7 +73,7 @@ class SettingsField(TypedDict, total=False):
66
  id: str
67
  title: str
68
  description: str
69
- type: Literal["input", "select", "range", "textarea", "password"]
70
  value: Any
71
  min: float
72
  max: float
@@ -91,6 +98,8 @@ _settings: Settings | None = None
91
 
92
 
93
  def convert_out(settings: Settings) -> SettingsOutput:
94
 
95
  # main model section
96
  chat_model_fields: list[SettingsField] = []
@@ -109,7 +118,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
109
  "id": "chat_model_name",
110
  "title": "Chat model name",
111
  "description": "Exact name of model from selected provider",
112
- "type": "input",
113
  "value": settings["chat_model_name"],
114
  }
115
  )
@@ -132,7 +141,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
132
  "id": "chat_model_ctx_length",
133
  "title": "Chat model context length",
134
  "description": "Maximum number of tokens in the context window for LLM. System prompt, chat history, RAG and response all count towards this limit.",
135
- "type": "input",
136
  "value": settings["chat_model_ctx_length"],
137
  }
138
  )
@@ -150,6 +159,36 @@ def convert_out(settings: Settings) -> SettingsOutput:
150
  }
151
  )
152

153
  chat_model_fields.append(
154
  {
155
  "id": "chat_model_kwargs",
@@ -183,7 +222,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
183
  "id": "util_model_name",
184
  "title": "Utility model name",
185
  "description": "Exact name of model from selected provider",
186
- "type": "input",
187
  "value": settings["util_model_name"],
188
  }
189
  )
@@ -200,6 +239,58 @@ def convert_out(settings: Settings) -> SettingsOutput:
200
  "value": settings["util_model_temperature"],
201
  }
202
  )
203
 
204
  util_model_fields.append(
205
  {
@@ -234,46 +325,28 @@ def convert_out(settings: Settings) -> SettingsOutput:
234
  "id": "embed_model_name",
235
  "title": "Embedding model name",
236
  "description": "Exact name of model from selected provider",
237
- "type": "input",
238
  "value": settings["embed_model_name"],
239
  }
240
  )
241
-
242
  embed_model_fields.append(
243
  {
244
- "id": "embed_model_kwargs",
245
- "title": "Embedding model additional parameters",
246
- "description": "Any other parameters supported by the model. Format is KEY=VALUE on individual lines, just like .env file.",
247
- "type": "textarea",
248
- "value": _dict_to_env(settings["embed_model_kwargs"]),
249
  }
250
  )
251
 
252
- embed_model_section: SettingsSection = {
253
- "title": "Embedding Model",
254
- "description": "Settings for the embedding model used by Agent Zero.",
255
- "fields": embed_model_fields,
256
- }
257
-
258
- # embedding model section
259
- embed_model_fields: list[SettingsField] = []
260
- embed_model_fields.append(
261
- {
262
- "id": "embed_model_provider",
263
- "title": "Embedding model provider",
264
- "description": "Select provider for embedding model used by the framework",
265
- "type": "select",
266
- "value": settings["embed_model_provider"],
267
- "options": [{"value": p.name, "label": p.value} for p in ModelProvider],
268
- }
269
- )
270
  embed_model_fields.append(
271
  {
272
- "id": "embed_model_name",
273
- "title": "Embedding model name",
274
- "description": "Exact name of model from selected provider",
275
- "type": "input",
276
- "value": settings["embed_model_name"],
277
  }
278
  )
279
 
@@ -301,7 +374,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
301
  "id": "auth_login",
302
  "title": "UI Login",
303
  "description": "Set user name for web UI",
304
- "type": "input",
305
  "value": dotenv.get_dotenv_value(dotenv.KEY_AUTH_LOGIN) or "",
306
  }
307
  )
@@ -423,7 +496,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
423
  # "id": "rfc_auto_docker",
424
  # "title": "RFC Auto Docker Management",
425
  # "description": "Automatically create dockerized instance of A0 for RFCs using this instance's code base and, settings and .env.",
426
- # "type": "input",
427
  # "value": settings["rfc_auto_docker"],
428
  # }
429
  # )
@@ -433,7 +506,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
433
  "id": "rfc_url",
434
  "title": "RFC Destination URL",
435
  "description": "URL of dockerized A0 instance for remote function calls. Do not specify port here.",
436
- "type": "input",
437
  "value": settings["rfc_url"],
438
  }
439
  )
@@ -458,7 +531,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
458
  "id": "rfc_port_http",
459
  "title": "RFC HTTP port",
460
  "description": "HTTP port for dockerized instance of A0.",
461
- "type": "input",
462
  "value": settings["rfc_port_http"],
463
  }
464
  )
@@ -468,7 +541,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
468
  "id": "rfc_port_ssh",
469
  "title": "RFC SSH port",
470
  "description": "SSH port for dockerized instance of A0.",
471
- "type": "input",
472
  "value": settings["rfc_port_ssh"],
473
  }
474
  )
@@ -505,7 +578,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
505
  "id": "stt_language",
506
  "title": "Language Code",
507
  "description": "Language code (e.g. en, fr, it)",
508
- "type": "input",
509
  "value": settings["stt_language"],
510
  }
511
  )
@@ -528,7 +601,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
528
  "id": "stt_silence_duration",
529
  "title": "Silence duration (ms)",
530
  "description": "Duration of silence before the server considers speaking to have ended.",
531
- "type": "input",
532
  "value": settings["stt_silence_duration"],
533
  }
534
  )
@@ -538,7 +611,7 @@ def convert_out(settings: Settings) -> SettingsOutput:
538
  "id": "stt_waiting_timeout",
539
  "title": "Waiting timeout (ms)",
540
  "description": "Duration before the server closes the microphone.",
541
- "type": "input",
542
  "value": settings["stt_waiting_timeout"],
543
  }
544
  )
@@ -617,43 +690,43 @@ def normalize_settings(settings: Settings) -> Settings:
617
  try:
618
  copy[key] = type(value)(copy[key]) # type: ignore
619
  except (ValueError, TypeError):
620
- pass
621
  return copy
622
 
623
 
624
- def get_chat_model(settings: Settings | None = None) -> BaseChatModel:
625
- if not settings:
626
- settings = get_settings()
627
- return get_model(
628
- type=ModelType.CHAT,
629
- provider=ModelProvider[settings["chat_model_provider"]],
630
- name=settings["chat_model_name"],
631
- temperature=settings["chat_model_temperature"],
632
- **settings["chat_model_kwargs"],
633
- )
634
-
635
-
636
- def get_utility_model(settings: Settings | None = None) -> BaseChatModel:
637
- if not settings:
638
- settings = get_settings()
639
- return get_model(
640
- type=ModelType.CHAT,
641
- provider=ModelProvider[settings["util_model_provider"]],
642
- name=settings["util_model_name"],
643
- temperature=settings["util_model_temperature"],
644
- **settings["util_model_kwargs"],
645
- )
646
-
647
-
648
- def get_embedding_model(settings: Settings | None = None) -> Embeddings:
649
- if not settings:
650
- settings = get_settings()
651
- return get_model(
652
- type=ModelType.EMBEDDING,
653
- provider=ModelProvider[settings["embed_model_provider"]],
654
- name=settings["embed_model_name"],
655
- **settings["embed_model_kwargs"],
656
- )
657
 
658
 
659
  def _read_settings_file() -> Settings | None:
@@ -697,20 +770,32 @@ def _write_sensitive_settings(settings: Settings):
697
 
698
 
699
  def get_default_settings() -> Settings:
700
  return Settings(
701
  chat_model_provider=ModelProvider.OPENAI.name,
702
  chat_model_name="gpt-4o-mini",
703
- chat_model_temperature=0,
704
  chat_model_kwargs={},
705
- chat_model_ctx_length=8192,
706
  chat_model_ctx_history=0.7,
707
  util_model_provider=ModelProvider.OPENAI.name,
708
  util_model_name="gpt-4o-mini",
709
- util_model_temperature=0,
710
  util_model_kwargs={},
711
  embed_model_provider=ModelProvider.OPENAI.name,
712
  embed_model_name="text-embedding-3-small",
713
  embed_model_kwargs={},
714
  api_keys={},
715
  auth_login="",
716
  auth_password="",
 
8
  import models
9
  from python.helpers import runtime, whisper, defer
10
  from . import files, dotenv
11
 
12
  class Settings(TypedDict):
13
  chat_model_provider: str
 
16
  chat_model_kwargs: dict[str, str]
17
  chat_model_ctx_length: int
18
  chat_model_ctx_history: float
19
+ chat_model_rl_requests: int
20
+ chat_model_rl_input: int
21
+ chat_model_rl_output: int
22
 
23
  util_model_provider: str
24
  util_model_name: str
25
  util_model_temperature: float
26
  util_model_kwargs: dict[str, str]
27
+ util_model_ctx_length: int
28
+ util_model_ctx_input: float
29
+ util_model_rl_requests: int
30
+ util_model_rl_input: int
31
+ util_model_rl_output: int
32
 
33
+
34
  embed_model_provider: str
35
  embed_model_name: str
36
  embed_model_kwargs: dict[str, str]
37
+ embed_model_rl_requests: int
38
+ embed_model_rl_input: int
39
 
40
  agent_prompts_subdir: str
41
  agent_memory_subdir: str
 
73
  id: str
74
  title: str
75
  description: str
76
+ type: Literal["text", "number", "select", "range", "textarea", "password"]
77
  value: Any
78
  min: float
79
  max: float
 
98
 
99
 
100
  def convert_out(settings: Settings) -> SettingsOutput:
101
+ from models import ModelProvider
102
+
103
 
104
  # main model section
105
  chat_model_fields: list[SettingsField] = []
 
118
  "id": "chat_model_name",
119
  "title": "Chat model name",
120
  "description": "Exact name of model from selected provider",
121
+ "type": "text",
122
  "value": settings["chat_model_name"],
123
  }
124
  )
 
141
  "id": "chat_model_ctx_length",
142
  "title": "Chat model context length",
143
  "description": "Maximum number of tokens in the context window for LLM. System prompt, chat history, RAG and response all count towards this limit.",
144
+ "type": "number",
145
  "value": settings["chat_model_ctx_length"],
146
  }
147
  )
 
159
  }
160
  )
161
 
162
+ chat_model_fields.append(
163
+ {
164
+ "id": "chat_model_rl_requests",
165
+ "title": "Requests per minute limit",
166
+ "description": "Limits the number of requests per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
167
+ "type": "number",
168
+ "value": settings["chat_model_rl_requests"],
169
+ }
170
+ )
171
+
172
+ chat_model_fields.append(
173
+ {
174
+ "id": "chat_model_rl_input",
175
+ "title": "Input tokens per minute limit",
176
+ "description": "Limits the number of input tokens per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
177
+ "type": "number",
178
+ "value": settings["chat_model_rl_input"],
179
+ }
180
+ )
181
+
182
+ chat_model_fields.append(
183
+ {
184
+ "id": "chat_model_rl_output",
185
+ "title": "Output tokens per minute limit",
186
+ "description": "Limits the number of output tokens per minute to the chat model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
187
+ "type": "number",
188
+ "value": settings["chat_model_rl_output"],
189
+ }
190
+ )
191
+
192
  chat_model_fields.append(
193
  {
194
  "id": "chat_model_kwargs",
 
222
  "id": "util_model_name",
223
  "title": "Utility model name",
224
  "description": "Exact name of model from selected provider",
225
+ "type": "text",
226
  "value": settings["util_model_name"],
227
  }
228
  )
 
239
  "value": settings["util_model_temperature"],
240
  }
241
  )
242
+
243
+ # util_model_fields.append(
244
+ # {
245
+ # "id": "util_model_ctx_length",
246
+ # "title": "Utility model context length",
247
+ # "description": "Maximum number of tokens in the context window for LLM. System prompt, message and response all count towards this limit.",
248
+ # "type": "number",
249
+ # "value": settings["util_model_ctx_length"],
250
+ # }
251
+ # )
252
+ # util_model_fields.append(
253
+ # {
254
+ # "id": "util_model_ctx_input",
255
+ # "title": "Context window space for input tokens",
256
+ # "description": "Portion of context window dedicated to input tokens. The remaining space can be filled with response.",
257
+ # "type": "range",
258
+ # "min": 0.01,
259
+ # "max": 1,
260
+ # "step": 0.01,
261
+ # "value": settings["util_model_ctx_input"],
262
+ # }
263
+ # )
264
+
265
+ util_model_fields.append(
266
+ {
267
+ "id": "util_model_rl_requests",
268
+ "title": "Requests per minute limit",
269
+ "description": "Limits the number of requests per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
270
+ "type": "number",
271
+ "value": settings["util_model_rl_requests"],
272
+ }
273
+ )
274
+
275
+ util_model_fields.append(
276
+ {
277
+ "id": "util_model_rl_input",
278
+ "title": "Input tokens per minute limit",
279
+ "description": "Limits the number of input tokens per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
280
+ "type": "number",
281
+ "value": settings["util_model_rl_input"],
282
+ }
283
+ )
284
+
285
+ util_model_fields.append(
286
+ {
287
+ "id": "util_model_rl_output",
288
+ "title": "Output tokens per minute limit",
289
+ "description": "Limits the number of output tokens per minute to the utility model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
290
+ "type": "number",
291
+ "value": settings["util_model_rl_output"],
292
+ }
293
+ )
294
 
295
  util_model_fields.append(
296
  {
 
325
  "id": "embed_model_name",
326
  "title": "Embedding model name",
327
  "description": "Exact name of model from selected provider",
328
+ "type": "text",
329
  "value": settings["embed_model_name"],
330
  }
331
  )
332
+
333
  embed_model_fields.append(
334
  {
335
+ "id": "embed_model_rl_requests",
336
+ "title": "Requests per minute limit",
337
+ "description": "Limits the number of requests per minute to the embedding model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
338
+ "type": "number",
339
+ "value": settings["embed_model_rl_requests"],
340
  }
341
  )
342

343
  embed_model_fields.append(
344
  {
345
+ "id": "embed_model_rl_input",
346
+ "title": "Input tokens per minute limit",
347
+ "description": "Limits the number of input tokens per minute to the embedding model. Waits if the limit is exceeded. Set to 0 to disable rate limiting.",
348
+ "type": "number",
349
+ "value": settings["embed_model_rl_input"],
350
  }
351
  )
352
 
 
374
  "id": "auth_login",
375
  "title": "UI Login",
376
  "description": "Set user name for web UI",
377
+ "type": "text",
378
  "value": dotenv.get_dotenv_value(dotenv.KEY_AUTH_LOGIN) or "",
379
  }
380
  )
 
496
  # "id": "rfc_auto_docker",
497
  # "title": "RFC Auto Docker Management",
498
  # "description": "Automatically create dockerized instance of A0 for RFCs using this instance's code base and, settings and .env.",
499
+ # "type": "text",
500
  # "value": settings["rfc_auto_docker"],
501
  # }
502
  # )
 
506
  "id": "rfc_url",
507
  "title": "RFC Destination URL",
508
  "description": "URL of dockerized A0 instance for remote function calls. Do not specify port here.",
509
+ "type": "text",
510
  "value": settings["rfc_url"],
511
  }
512
  )
 
531
  "id": "rfc_port_http",
532
  "title": "RFC HTTP port",
533
  "description": "HTTP port for dockerized instance of A0.",
534
+ "type": "text",
535
  "value": settings["rfc_port_http"],
536
  }
537
  )
 
541
  "id": "rfc_port_ssh",
542
  "title": "RFC SSH port",
543
  "description": "SSH port for dockerized instance of A0.",
544
+ "type": "text",
545
  "value": settings["rfc_port_ssh"],
546
  }
547
  )
 
578
  "id": "stt_language",
579
  "title": "Language Code",
580
  "description": "Language code (e.g. en, fr, it)",
581
+ "type": "text",
582
  "value": settings["stt_language"],
583
  }
584
  )
 
601
  "id": "stt_silence_duration",
602
  "title": "Silence duration (ms)",
603
  "description": "Duration of silence before the server considers speaking to have ended.",
604
+ "type": "text",
605
  "value": settings["stt_silence_duration"],
606
  }
607
  )
 
611
  "id": "stt_waiting_timeout",
612
  "title": "Waiting timeout (ms)",
613
  "description": "Duration before the server closes the microphone.",
614
+ "type": "text",
615
  "value": settings["stt_waiting_timeout"],
616
  }
617
  )
 
690
  try:
691
  copy[key] = type(value)(copy[key]) # type: ignore
692
  except (ValueError, TypeError):
693
+ copy[key] = value # make default instead
694
  return copy
695
 
696
 
697
+ # def get_chat_model(settings: Settings | None = None) -> BaseChatModel:
698
+ # if not settings:
699
+ # settings = get_settings()
700
+ # return get_model(
701
+ # type=ModelType.CHAT,
702
+ # provider=ModelProvider[settings["chat_model_provider"]],
703
+ # name=settings["chat_model_name"],
704
+ # temperature=settings["chat_model_temperature"],
705
+ # **settings["chat_model_kwargs"],
706
+ # )
707
+
708
+
709
+ # def get_utility_model(settings: Settings | None = None) -> BaseChatModel:
710
+ # if not settings:
711
+ # settings = get_settings()
712
+ # return get_model(
713
+ # type=ModelType.CHAT,
714
+ # provider=ModelProvider[settings["util_model_provider"]],
715
+ # name=settings["util_model_name"],
716
+ # temperature=settings["util_model_temperature"],
717
+ # **settings["util_model_kwargs"],
718
+ # )
719
+
720
+
721
+ # def get_embedding_model(settings: Settings | None = None) -> Embeddings:
722
+ # if not settings:
723
+ # settings = get_settings()
724
+ # return get_model(
725
+ # type=ModelType.EMBEDDING,
726
+ # provider=ModelProvider[settings["embed_model_provider"]],
727
+ # name=settings["embed_model_name"],
728
+ # **settings["embed_model_kwargs"],
729
+ # )
730
 
731
 
732
  def _read_settings_file() -> Settings | None:
 
770
 
771
 
772
  def get_default_settings() -> Settings:
773
+ from models import ModelProvider
774
+
775
  return Settings(
776
  chat_model_provider=ModelProvider.OPENAI.name,
777
  chat_model_name="gpt-4o-mini",
778
+ chat_model_temperature=0.0,
779
  chat_model_kwargs={},
780
+ chat_model_ctx_length=120000,
781
  chat_model_ctx_history=0.7,
782
+ chat_model_rl_requests=0,
783
+ chat_model_rl_input=0,
784
+ chat_model_rl_output=0,
785
  util_model_provider=ModelProvider.OPENAI.name,
786
  util_model_name="gpt-4o-mini",
787
+ util_model_temperature=0.0,
788
+ util_model_ctx_length=120000,
789
+ util_model_ctx_input=0.7,
790
  util_model_kwargs={},
791
+ util_model_rl_requests=60,
792
+ util_model_rl_input=0,
793
+ util_model_rl_output=0,
794
  embed_model_provider=ModelProvider.OPENAI.name,
795
  embed_model_name="text-embedding-3-small",
796
  embed_model_kwargs={},
797
+ embed_model_rl_requests=0,
798
+ embed_model_rl_input=0,
799
  api_keys={},
800
  auth_login="",
801
  auth_password="",
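The new *_rl_* defaults above only store numbers; this diff does not show where they are consumed (that wiring lives in the models/agent refactor mentioned in the commit message). The sketch below is a hypothetical illustration of how they could feed the refactored RateLimiter, using only names that appear elsewhere in this commit (get_settings and the chat_model_rl_* keys); the counter names passed to the limiter are assumptions.

    from python.helpers.rate_limiter import RateLimiter
    from python.helpers.settings import get_settings  # import path assumed from the file name above

    def chat_model_limiter() -> RateLimiter:
        s = get_settings()
        # 0 keeps a counter unlimited, mirroring the field descriptions above
        return RateLimiter(
            seconds=60,
            requests=s["chat_model_rl_requests"],
            input=s["chat_model_rl_input"],
            output=s["chat_model_rl_output"],
        )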
python/tools/behaviour_adjustment.py CHANGED
@@ -21,15 +21,15 @@ async def update_behaviour(agent: Agent, log_item: LogItem, adjustments: str):
21
  current_rules = read_rules(agent)
22
 
23
  # log query streamed by LLM
24
- def log_callback(content):
25
  log_item.stream(ruleset=content)
26
 
27
  msg = agent.read_prompt("behaviour.merge.msg.md", current_rules=current_rules, adjustments=adjustments)
28
 
29
  # call util llm to find solutions in history
30
- adjustments_merge = await agent.call_utility_llm(
31
  system=system,
32
- msg=msg,
33
  callback=log_callback,
34
  )
35
 
 
21
  current_rules = read_rules(agent)
22
 
23
  # log query streamed by LLM
24
+ async def log_callback(content):
25
  log_item.stream(ruleset=content)
26
 
27
  msg = agent.read_prompt("behaviour.merge.msg.md", current_rules=current_rules, adjustments=adjustments)
28
 
29
  # call util llm to find solutions in history
30
+ adjustments_merge = await agent.call_utility_model(
31
  system=system,
32
+ message=msg,
33
  callback=log_callback,
34
  )
35
 
python/tools/response.py CHANGED
@@ -3,7 +3,6 @@ from python.helpers.tool import Tool, Response
3
  class ResponseTool(Tool):
4
 
5
  async def execute(self,**kwargs):
6
- self.agent.set_data("timeout", self.agent.config.response_timeout_seconds)
7
  return Response(message=self.args["text"], break_loop=True)
8
 
9
  async def before_execution(self, **kwargs):
 
3
  class ResponseTool(Tool):
4
 
5
  async def execute(self,**kwargs):
6
  return Response(message=self.args["text"], break_loop=True)
7
 
8
  async def before_execution(self, **kwargs):
run_ui.py CHANGED
@@ -92,7 +92,7 @@ def run():
92
 
93
  server = None
94
 
95
- def register_api_handler(app, handler):
96
  name = handler.__module__.split(".")[-1]
97
  instance = handler(app, lock)
98
  @requires_auth
 
92
 
93
  server = None
94
 
95
+ def register_api_handler(app, handler: type[ApiHandler]):
96
  name = handler.__module__.split(".")[-1]
97
  instance = handler(app, lock)
98
  @requires_auth
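The run_ui.py change above is only a type annotation: register_api_handler receives the handler class itself and instantiates it with the app and lock. A stripped-down illustration of that pattern follows; ApiHandler here is a local stand-in, not the framework's real class.

    class ApiHandler:  # stand-in only, used to show the type[...] relationship
        def __init__(self, app, lock):
            self.app, self.lock = app, lock

    def register_api_handler(app, handler: type[ApiHandler]) -> ApiHandler:
        # the class (not an instance) is passed in and instantiated here, as in run_ui.py
        return handler(app, None)

    register_api_handler(app=None, handler=ApiHandler)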
webui/css/settings.css CHANGED
@@ -42,6 +42,7 @@
42
  /* Input Styles */
43
  input[type="text"],
44
  input[type="password"],
 
45
  textarea,
46
  select {
47
  width: 100%;
 
42
  /* Input Styles */
43
  input[type="text"],
44
  input[type="password"],
45
+ input[type="number"],
46
  textarea,
47
  select {
48
  width: 100%;
webui/index.html CHANGED
@@ -451,12 +451,21 @@
451
 
452
  <div class="field-control">
453
  <!-- Input field -->
454
- <template x-if="field.type === 'input'">
455
  <input type="text" :class="field.classes" :value="field.value"
456
  :readonly="field.readonly === true"
457
  @input="field.value = $event.target.value">
458
  </template>
459

460
461
  <template x-if="field.type === 'password'">
462
  <input type="password" :class="field.classes" :value="field.value"
 
451
 
452
  <div class="field-control">
453
  <!-- Input field -->
454
+ <template x-if="field.type === 'text'">
455
  <input type="text" :class="field.classes" :value="field.value"
456
  :readonly="field.readonly === true"
457
  @input="field.value = $event.target.value">
458
  </template>
459
 
460
+ <!-- Number field -->
461
+ <template x-if="field.type === 'number'">
462
+ <input type="number" :class="field.classes" :value="field.value"
463
+ :readonly="field.readonly === true"
464
+ @input="field.value = $event.target.value"
465
+ :min="field.min" :max="field.max" :step="field.step">
466
+ </template>
467
+
468
+
469
  <!-- Password field -->
470
  <template x-if="field.type === 'password'">
471
  <input type="password" :class="field.classes" :value="field.value"