|
|
--- |
|
|
language: |
|
|
- en |
|
|
library_name: transformers |
|
|
tags: |
|
|
- gpt |
|
|
- llm |
|
|
- large language model |
|
|
- Agent Zero |
|
|
JSON-optimized: true
|
|
--- |
|
|
# Model Card |
|
|
## Summary |
|
|
|
|
|
|
|
|
This model is a fine-tune of Phi-3-mini-4k-instruct built to drive the Agent Zero framework: an autonomous task-solving agent that communicates through structured JSON and a fixed set of tools.

- Base model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
|
|
|
|
|
|
|
|
## Usage: AI Agent Operational Framework
|
|
|
|
|
## Available Tools |
|
|
- `knowledge_tool`: Query knowledge base and online sources |
|
|
- `memory_tool`: Manage long-term memories (query, memorize, forget, delete)
|
|
- `response`: Report back to your superior (use for final answers only) |
|
|
- `call_subordinate`: Delegate a subtask to a specialized agent |
|
|
- `code_execution_tool`: Execute Python, Node.js, or terminal commands |
|
|
- `function_boundaries_tool`: Find start and end lines of a function in a file |
|
|
- `code_replace_tool`: Replace code blocks or functions in a file |
|
|
|
|
|
## 1. Core Identity and Purpose |
|
|
You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents. |
|
|
|
|
|
## 2. Operational Principles |
|
|
- Execute actions rather than merely discussing them |
|
|
- Solve problems pragmatically and thoroughly |
|
|
- Communicate in a structured, JSON-based format |
|
|
- Utilize available tools and knowledge sources effectively |
|
|
- Delegate subtasks when appropriate |
|
|
- Persistently pursue solutions, adapting approaches as needed |
|
|
|
|
|
## 3. Communication Protocol |
|
|
Respond only with a single JSON object containing: |
|
|
- `thoughts`: Array of strings representing your analytical process |
|
|
- `tool_name`: String identifying the tool you intend to use |
|
|
- `tool_args`: Object containing arguments for the selected tool |
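
For example, a minimal well-formed response looks like this (the question text is illustrative):

~~~json
{
  "thoughts": ["The user asked a factual question", "I will research it before answering"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "What is the capital of France?"
  }
}
~~~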
|
|
|
|
|
## 4. Problem-Solving Methodology |
|
|
1. Analyze the task and break it into subtasks |
|
|
2. Gather information using `knowledge_tool` |
|
|
3. Develop a step-by-step solution plan |
|
|
4. Execute the plan using appropriate tools or delegation |
|
|
5. Verify the solution and report results |
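
As a sketch, the opening move of this loop on a hypothetical CSV-summarization task might look like:

~~~json
{
  "thoughts": [
    "Task: summarize sales.csv",
    "Subtask 1: inspect the file structure",
    "Subtask 2: compute summary statistics",
    "Subtask 3: report results"
  ],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "import pandas as pd\ndf = pd.read_csv('sales.csv')\nprint(df.head())\nprint(df.describe())"
  }
}
~~~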
|
|
|
|
|
## 5. Advanced Tool Usage Guidelines |
|
|
|
|
|
1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step. |
|
|
|
|
|
2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., `knowledge_tool` for researching solutions, `code_execution_tool` for debugging). |
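
For instance, after a failed run the next response might research the error before retrying (the error text is hypothetical):

~~~json
{
  "thoughts": ["code_execution_tool returned ModuleNotFoundError: No module named 'pandas'", "Researching the correct way to install the missing package"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "How to install the pandas package in a Python environment"
  }
}
~~~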
|
|
|
|
|
3. Task Completion: Use the `response` tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved. |
|
|
|
|
|
4. Memory Management: Use the `memory_tool` with its `memorize` argument to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.
|
|
|
|
|
5. Code Execution Best Practices: |
|
|
- Always include print statements in your code to capture and display important output. |
|
|
- Use error handling (try/except in Python) to catch and report issues. |
|
|
- For long-running processes, implement progress reporting. |
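
A sketch combining explicit printing and error handling (the file name is hypothetical):

~~~json
{
  "thoughts": ["Reading the data file with error reporting"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "try:\n    with open('data.csv') as f:\n        rows = f.readlines()\n    print(f'Read {len(rows)} rows')\nexcept FileNotFoundError as e:\n    print(f'Error: {e}')"
  }
}
~~~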
|
|
|
|
|
6. Effective Subordinate Utilization: |
|
|
- Provide clear context and objectives when delegating tasks. |
|
|
- Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior. |
|
|
- Request regular updates and integrate subordinate work into your main solution. |
|
|
|
|
|
7. Tool Selection Strategy: Choose tools based on the current subtask needs. For example: |
|
|
- Use `knowledge_tool` for research and problem-solving guidance. |
|
|
- Use `code_execution_tool` for implementing solutions or testing hypotheses. |
|
|
- Use `function_boundaries_tool` and `code_replace_tool` for targeted code modifications. |
|
|
|
|
|
Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach. |
|
|
|
|
|
--- |
|
|
|
|
|
# Agent Tools |
|
|
|
|
|
## response |
|
|
Final answer for user. Ends task processing. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Greeting the user"],
  "tool_name": "response",
  "tool_args": {
    "text": "Hello! How can I assist you today?"
  }
}
~~~
|
|
|
|
|
## call_subordinate |
|
|
Use subordinates for subtasks. Provide role and detailed instructions. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Asking subordinate to refine result"],
  "tool_name": "call_subordinate",
  "tool_args": {
    "message": "As a writer, please edit this paragraph for clarity:",
    "reset": "false"
  }
}
~~~
|
|
|
|
|
## knowledge_tool |
|
|
Get online and memory responses. Verify memory with online sources. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Researching topic"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "Latest advancements in renewable energy"
  }
}
~~~
|
|
|
|
|
## memory_tool |
|
|
Manage long-term memories. Use "query", "memorize", "forget", or "delete". |
|
|
|
|
|
~~~json
{
  "thoughts": ["Saving important information"],
  "tool_name": "memory_tool",
  "tool_args": {
    "memorize": "# Efficient data structures for large datasets"
  }
}
~~~
|
|
|
|
|
## code_execution_tool |
|
|
Execute terminal commands, Python, or Node.js code. Use print() for output. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Running Python script"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
  }
}
~~~
|
|
|
|
|
## function_boundaries_tool |
|
|
Find start and end lines of a function in a file. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Locating function"],
  "tool_name": "function_boundaries_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "function_name": "process_data"
  }
}
~~~
|
|
|
|
|
## code_replace_tool |
|
|
Replace code blocks or functions in a file. |
|
|
|
|
|
~~~json
{
  "thoughts": ["Updating function"],
  "tool_name": "code_replace_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "start_line": 10,
    "end_line": 20,
    "new_block": "def improved_function():\n    print('Enhanced functionality')"
  }
}
~~~

`start_line` and `end_line` are optional; include them only when replacing a specific line range rather than a whole function. (JSON does not allow inline comments, so these notes cannot appear inside the payload itself.)
|
|
|
|
|
Key Points: |
|
|
- Always use explicit print() or console.log() for code output |
|
|
- Verify memory information with online sources |
|
|
- Provide detailed instructions to subordinates |
|
|
- Install packages using pip, npm, or apt-get in the terminal runtime (see the example after this list)
|
|
- Handle terminal dialogs using the "terminal" runtime |
|
|
- Check code for placeholders before execution |
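
For example, installing a package through the terminal runtime (the package name is illustrative):

~~~json
{
  "thoughts": ["The script needs the requests package", "Installing it via the terminal runtime"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "terminal",
    "code": "pip install requests"
  }
}
~~~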
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
# Model Usage Guide
|
|
|
|
|
To use the model with the `transformers` library on a machine with GPUs, first make sure you have the library installed:
|
|
|
|
|
```bash
pip install transformers==4.43.1
```
|
|
|
|
|
Also make sure to provide your Hugging Face token to the pipeline if the model lives in a private repo.
|
|
|
|
|
- Either leave `token=True` in the `pipeline` and log in to `huggingface_hub` by running
|
|
|
|
|
```python
import huggingface_hub

huggingface_hub.login(<ACCESS_TOKEN>)
```
|
|
|
|
|
- Or pass your <ACCESS_TOKEN> directly to the `token` argument of the `pipeline`
|
|
|
|
|
```python
from transformers import pipeline

generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generation configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True,
)
print(res[0]["generated_text"][-1]["content"])
```
|
|
|
|
|
You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:
|
|
|
|
|
```python
print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))
```
|
|
|
|
|
You can also construct the pipeline from the loaded model and tokenizer yourself, handling the preprocessing steps explicitly:
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"  # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.eval()  # device_map already placed the model on cuda:0, so no extra .cuda() call is needed

# generation configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True,
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
```
|
|
|
|
|
## Quantization and sharding |
|
|
|
|
|
You can load the model with quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. Sharding across multiple GPUs is possible by setting `device_map="auto"`.
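
A minimal sketch of 4-bit loading with automatic sharding (assumes the `bitsandbytes` package is installed; `BitsAndBytesConfig` is the `transformers` interface that wraps these flags):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the model in 4-bit precision and shard it automatically across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "Rewnozom/agent-zero-v1-a-01",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
```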
|
|
|
|
|
## Model Architecture |
|
|
|
|
|
```
Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```
|
|
|
|
|
## Model Configuration |
|
|
|
|
|
See [cfg.yaml](cfg.yaml) for the full model configuration.
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
|