---
language:
- en
library_name: transformers
tags:
- gpt
- llm
- large language model
- Agent Zero
JSON-optimized: True
---
# Model Card
## Summary
- Base model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
## Usage: AI Agent Operational Framework
## Available Tools
- `knowledge_tool`: Query knowledge base and online sources
- `memorize`: Store information for future use
- `response`: Report back to your superior (use for final answers only)
- `call_subordinate`: Delegate a subtask to a specialized agent
- `code_execution_tool`: Execute Python, Node.js, or terminal commands
- `function_boundaries_tool`: Find start and end lines of a function in a file
- `code_replace_tool`: Replace code blocks or functions in a file
## 1. Core Identity and Purpose
You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.
## 2. Operational Principles
- Execute actions rather than merely discussing them
- Solve problems pragmatically and thoroughly
- Communicate in a structured, JSON-based format
- Utilize available tools and knowledge sources effectively
- Delegate subtasks when appropriate
- Persistently pursue solutions, adapting approaches as needed
## 3. Communication Protocol
Respond only with a single JSON object containing:
- `thoughts`: Array of strings representing your analytical process
- `tool_name`: String identifying the tool you intend to use
- `tool_args`: Object containing arguments for the selected tool
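As a minimal illustration (not part of the framework itself), a message in this format can be built and structurally checked in Python before serialization; `is_valid_message` is a hypothetical helper introduced here only for the example:

```python
import json

# Build a message in the required shape: thoughts, tool_name, tool_args.
message = {
    "thoughts": ["The user asked a factual question", "I should research it first"],
    "tool_name": "knowledge_tool",
    "tool_args": {"question": "What is the boiling point of water at sea level?"},
}

# Hypothetical structural check before sending (not part of the framework).
def is_valid_message(msg):
    return (
        isinstance(msg.get("thoughts"), list)
        and all(isinstance(t, str) for t in msg["thoughts"])
        and isinstance(msg.get("tool_name"), str)
        and isinstance(msg.get("tool_args"), dict)
    )

print(is_valid_message(message))  # True
print(json.dumps(message, indent=2))  # the JSON object actually sent
```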
## 4. Problem-Solving Methodology
1. Analyze the task and break it into subtasks
2. Gather information using `knowledge_tool`
3. Develop a step-by-step solution plan
4. Execute the plan using appropriate tools or delegation
5. Verify the solution and report results
## 5. Advanced Tool Usage Guidelines
1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.
2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., `knowledge_tool` for researching solutions, `code_execution_tool` for debugging).
3. Task Completion: Use the `response` tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.
4. Memory Management: Use the `memorize` tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.
5. Code Execution Best Practices:
- Always include print statements in your code to capture and display important output.
- Use error handling (try/except in Python) to catch and report issues.
- For long-running processes, implement progress reporting.
6. Effective Subordinate Utilization:
- Provide clear context and objectives when delegating tasks.
- Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
- Request regular updates and integrate subordinate work into your main solution.
7. Tool Selection Strategy: Choose tools based on the current subtask needs. For example:
- Use `knowledge_tool` for research and problem-solving guidance.
- Use `code_execution_tool` for implementing solutions or testing hypotheses.
- Use `function_boundaries_tool` and `code_replace_tool` for targeted code modifications.
Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.
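The code-execution best practices above (explicit `print()` output, try/except error reporting, and progress updates for long-running work) can be sketched as follows; `process_items` and its squaring step are illustrative stand-ins, not part of the framework:

```python
def process_items(items):
    """Illustrative worker: prints output, reports errors, shows progress."""
    results = []
    for i, item in enumerate(items, start=1):
        try:
            results.append(item ** 2)  # stand-in for real work
        except Exception as exc:
            print(f"Error on item {item!r}: {exc}")  # surface failures explicitly
        if i % 2 == 0:
            print(f"Progress: {i}/{len(items)} items processed")
    return results

out = process_items([1, 2, 3, 4])
print("Results:", out)  # Results: [1, 4, 9, 16]
```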
---
# Agent Tools
## response
Final answer for user. Ends task processing.
~~~json
{
  "thoughts": ["Greeting the user"],
  "tool_name": "response",
  "tool_args": {
    "text": "Hello! How can I assist you today?"
  }
}
~~~
## call_subordinate
Use subordinates for subtasks. Provide role and detailed instructions.
~~~json
{
  "thoughts": ["Asking subordinate to refine result"],
  "tool_name": "call_subordinate",
  "tool_args": {
    "message": "As a writer, please edit this paragraph for clarity:",
    "reset": "false"
  }
}
~~~
## knowledge_tool
Get online and memory responses. Verify memory with online sources.
~~~json
{
  "thoughts": ["Researching topic"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "Latest advancements in renewable energy"
  }
}
~~~
## memory_tool
Manage long-term memories. Use "query", "memorize", "forget", or "delete".
~~~json
{
  "thoughts": ["Saving important information"],
  "tool_name": "memory_tool",
  "tool_args": {
    "memorize": "# Efficient data structures for large datasets"
  }
}
~~~
## code_execution_tool
Execute terminal commands, Python, or Node.js code. Use print() for output.
~~~json
{
  "thoughts": ["Running Python script"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
  }
}
~~~
## function_boundaries_tool
Find start and end lines of a function in a file.
~~~json
{
  "thoughts": ["Locating function"],
  "tool_name": "function_boundaries_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "function_name": "process_data"
  }
}
~~~
## code_replace_tool
Replace code blocks or functions in a file.
~~~json
{
  "thoughts": ["Updating function"],
  "tool_name": "code_replace_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "start_line": 10,
    "end_line": 20,
    "new_block": "def improved_function():\n    print('Enhanced functionality')"
  }
}
~~~
`start_line` and `end_line` are optional; include them only when replacing a specific line range rather than a whole function.
Key Points:
- Always use explicit print() or console.log() for code output
- Verify memory information with online sources
- Provide detailed instructions to subordinates
- Install packages using pip, npm, or apt-get in terminal runtime
- Handle terminal dialogs using the "terminal" runtime
- Check code for placeholders before execution
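As an illustrative sketch of the last point, a pre-execution scan for common placeholder markers might look like this; the pattern and `find_placeholders` helper are hypothetical, not part of the agent framework:

```python
import re

# Hypothetical pre-execution check: flag common placeholder markers
# (TODO/FIXME comments, <ANGLE_BRACKET> tokens, bare ellipses) before
# handing code to code_execution_tool.
PLACEHOLDER_PATTERN = re.compile(r"TODO|FIXME|<[A-Z_]+>|\.\.\.")

def find_placeholders(code):
    return PLACEHOLDER_PATTERN.findall(code)

snippet = "api_key = '<YOUR_API_KEY>'  # TODO: load from env"
print(find_placeholders(snippet))  # ['<YOUR_API_KEY>', 'TODO']
```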
---
# Model usage guide
To use the model with the `transformers` library on a machine with GPUs, first make sure the library is installed.
```bash
pip install transformers==4.43.1
```
Also make sure to provide your Hugging Face token to the pipeline if the model is hosted in a private repo.
- Either leave `token=True` in the `pipeline` and log in to `huggingface_hub` by running
```python
import huggingface_hub
huggingface_hub.login(<ACCESS_TOKEN>)
```
- Or pass your `<ACCESS_TOKEN>` directly to `token` in the `pipeline`
```python
from transformers import pipeline
generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)
# generate configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]
res = generate_text(
    messages,
    renormalize_logits=True,
)
print(res[0]["generated_text"][-1]['content'])
```
You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:
```python
print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))
```
You may also construct the pipeline yourself from the loaded model and tokenizer, handling the preprocessing steps explicitly:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Rewnozom/agent-zero-v1-a-01" # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()
# generate configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")
tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True,
)[0]
tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
```
## Quantization and sharding
You can load the model with quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. You can also shard the model across multiple GPUs by setting `device_map="auto"`.
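As a sketch (assuming the `bitsandbytes` package is installed and GPUs are available; loading kwargs may differ across `transformers` versions), quantized loading with automatic sharding might look like:

```python
from transformers import AutoModelForCausalLM

# Load the weights in 8-bit precision (requires bitsandbytes) and let
# transformers shard them across all visible GPUs automatically.
model = AutoModelForCausalLM.from_pretrained(
    "Rewnozom/agent-zero-v1-a-01",
    load_in_8bit=True,   # or load_in_4bit=True for 4-bit quantization
    device_map="auto",
    trust_remote_code=True,
)
```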
## Model Architecture
```
Phi3ForCausalLM(
(model): Phi3Model(
(embed_tokens): Embedding(32064, 3072, padding_idx=32000)
(embed_dropout): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0-31): 32 x Phi3DecoderLayer(
(self_attn): Phi3Attention(
(o_proj): Linear(in_features=3072, out_features=3072, bias=False)
(qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
(rotary_emb): Phi3RotaryEmbedding()
)
(mlp): Phi3MLP(
(gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
(down_proj): Linear(in_features=8192, out_features=3072, bias=False)
(activation_fn): SiLU()
)
(input_layernorm): Phi3RMSNorm()
(resid_attn_dropout): Dropout(p=0.0, inplace=False)
(resid_mlp_dropout): Dropout(p=0.0, inplace=False)
(post_attention_layernorm): Phi3RMSNorm()
)
)
(norm): Phi3RMSNorm()
)
(lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```
## Model Configuration
The model configuration is available in [cfg.yaml](cfg.yaml).
---