## Overview
HASHIRU is an agent-based framework designed to dynamically allocate and manage large language models (LLMs) and external APIs through a CEO model. The CEO model acts as a central manager, capable of hiring, firing, and directing multiple specialized agents (employees) within a given budget. It can also create and utilize external APIs as needed, making it highly flexible and scalable.
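The budget-constrained hire/fire pattern described above can be sketched roughly as follows. This is a hypothetical illustration only; the class and method names are assumptions, not the actual HASHIRU API:

```python
# Hypothetical sketch of a budget-constrained CEO/agent loop (not HASHIRU's real API).
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    cost: float       # budget consumed while this agent is employed
    specialty: str

@dataclass
class CEO:
    budget: float
    employees: list[Agent] = field(default_factory=list)

    def hire(self, agent: Agent) -> bool:
        """Hire an agent only if its cost fits the remaining budget."""
        if agent.cost <= self.budget:
            self.budget -= agent.cost
            self.employees.append(agent)
            return True
        return False

    def fire(self, name: str) -> None:
        """Release an agent and reclaim its budget."""
        for agent in list(self.employees):
            if agent.name == name:
                self.budget += agent.cost
                self.employees.remove(agent)

ceo = CEO(budget=10.0)
ceo.hire(Agent("summarizer", cost=4.0, specialty="summarization"))
ceo.hire(Agent("coder", cost=8.0, specialty="code"))  # rejected: exceeds remaining budget
print([a.name for a in ceo.employees], ceo.budget)  # ['summarizer'] 6.0
```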
## Features

- **Cost-Benefit Matrix**: Selects the best LLM model (LLaMA, Mixtral, Gemini, DeepSeek, etc.) for any task using Ollama, based on latency, size, cost, quality, and speed.

## Usage

```bash
python tools/cost_benefit.py \
  --prompt "Best places to visit in Davis" \
  --latency 4 --size 2 --cost 5 --speed 3
```

Each weight is on a scale of **1** (least important) to **5** (most important):

- `--latency`: Prefer faster responses (lower time to answer)
- `--size`: Prefer smaller models (use less memory/resources)
- `--cost`: Prefer cheaper responses (fewer tokens, lower token price)
- `--speed`: Prefer models that generate tokens quickly (tokens/sec)
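The weighted selection those flags drive can be sketched as a simple weighted sum over per-model metrics. This is a minimal illustration under stated assumptions: the metric values are made up, and the scoring formula is a guess at the general technique, not the actual `tools/cost_benefit.py` implementation:

```python
# Sketch of weighted model selection. Metric values are hypothetical benefit
# scores in [0, 1], where higher is better (e.g. a high "latency" score means
# a fast response, a high "size" score means a small model).

models = {
    "llama":    {"latency": 0.6, "size": 0.5, "cost": 0.9, "speed": 0.7},
    "mixtral":  {"latency": 0.5, "size": 0.3, "cost": 0.8, "speed": 0.6},
    "gemini":   {"latency": 0.8, "size": 0.2, "cost": 0.4, "speed": 0.9},
    "deepseek": {"latency": 0.7, "size": 0.4, "cost": 0.7, "speed": 0.8},
}

def pick_model(weights: dict[str, int]) -> str:
    """Return the model with the highest weighted score."""
    def score(metrics: dict[str, float]) -> float:
        return sum(weights[k] * metrics[k] for k in weights)
    return max(models, key=lambda name: score(models[name]))

# Weights on the 1 (least important) to 5 (most important) scale,
# mirroring the CLI flags above.
print(pick_model({"latency": 4, "size": 2, "cost": 5, "speed": 3}))
```

With a high cost weight, the cheapest capable model tends to win; raising `latency` and `speed` instead shifts the choice toward the fastest model.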