---
base_model:
- microsoft/Phi-4-mini-instruct
---
Phi-4-mini-instruct with llama-server (Tool-Enhanced Version)
This repository contains instructions for running a modified version of the Phi-4-mini-instruct model using llama-server. This version has been enhanced to support tool usage, allowing the model to interact with external tools and APIs through a ChatGPT-compatible interface.
Model Capabilities
This modified version of Phi-4-mini-instruct includes:
- Full support for tool usage and function calling
- Custom chat template optimized for tool interactions
- Ability to process and respond to tool outputs
- ChatGPT-compatible API interface
Prerequisites
- llama-cpp-python installed with server support
- The Phi-4-mini-instruct model in GGUF format
Installation
- Install llama-cpp-python with server support:
pip install llama-cpp-python[server]
- Ensure your model file is in the correct location:
models/Phi-4-mini-instruct-Q4_K_M-modified.gguf
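If you want to confirm the file is in place before launching the server, a small Python check (using the same path as above) can help:

```python
# Sanity-check that the GGUF model file exists where llama-server expects it.
from pathlib import Path

def model_present(path: str) -> bool:
    """Return True if the model file exists and is non-empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0

if __name__ == "__main__":
    path = "models/Phi-4-mini-instruct-Q4_K_M-modified.gguf"
    print(f"{path}: {'found' if model_present(path) else 'missing'}")
```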
Running the Server
Start the llama-server with the following command:
llama-server \
--model models/Phi-4-mini-instruct-Q4_K_M-modified.gguf \
--port 8082 \
--jinja
This will start the server with:
- The model loaded in memory
- Server running on port 8082
- Jinja template support for chat formatting (required for tool usage)
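Because the API is ChatGPT-compatible, any HTTP client can talk to it. Below is a minimal stdlib Python client sketch; the request-building helper mirrors the curl examples in the next section, and the actual network call is left commented out since it needs the server running on port 8082:

```python
# Minimal stdlib client for the ChatGPT-compatible chat-completions endpoint.
import json
import urllib.request

def build_chat_payload(user_message: str, model: str = "any-model") -> bytes:
    """Encode a single-turn chat request in the OpenAI chat-completions format."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")

def chat(user_message: str, base_url: str = "http://localhost:8082") -> str:
    """Send one user message and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_payload(user_message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Requires the server to be running:
# print(chat("tell me a funny joke"))
```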
Testing the API
You can test the server using curl commands. Here are some examples:
Example 1: Generate HTML Hello World
curl http://localhost:8082/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "any-model",
"messages": [
{"role":"user","content":"give me an html hello world document"}
]
}'
Example 2: Tell a Joke
curl http://localhost:8082/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "any-model",
"messages": [
{"role":"user","content":"tell me a funny joke"}
]
}'
Example 3: Using Tools
curl http://localhost:8082/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "any-model",
"messages": [
{
"role": "system",
"content": "You are a helpful AI assistant that can use tools.",
"tools": "[{\"name\": \"calculator\", \"description\": \"Useful for performing mathematical calculations\", \"parameters\": {\"type\": \"object\", \"properties\": {\"expression\": {\"type\": \"string\", \"description\": \"The mathematical expression to evaluate\"}}}}]"
},
{
"role": "user",
"content": "What is 235 * 89?"
}
]
}'
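When the model decides to use a tool, the client is responsible for executing it and sending the result back as a tool message. Here is a rough sketch of that loop, assuming the server returns OpenAI-style `tool_calls` in the assistant message (the exact shape depends on the custom chat template, so treat this as illustrative):

```python
# Sketch: execute a "calculator" tool call and build the follow-up tool messages.
import json

def run_calculator(expression: str) -> str:
    """Toy calculator tool; eval is acceptable here only because we control the input."""
    return str(eval(expression, {"__builtins__": {}}, {}))

def handle_tool_calls(message: dict) -> list[dict]:
    """Execute each requested tool and return the 'tool' role messages to send back."""
    results = []
    for call in message.get("tool_calls", []):
        args = json.loads(call["function"]["arguments"])
        if call["function"]["name"] == "calculator":
            results.append({
                "role": "tool",
                "tool_call_id": call.get("id", ""),
                "content": run_calculator(args["expression"]),
            })
    return results

# An assistant message shaped like a response to the calculator example above:
sample = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "function": {"name": "calculator",
                     "arguments": "{\"expression\": \"235 * 89\"}"},
    }],
}
print(handle_tool_calls(sample))  # one 'tool' message containing "20915"
```

The returned messages would then be appended to the conversation and sent back to `/v1/chat/completions` so the model can produce its final answer.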
API Endpoints
The server provides a ChatGPT-compatible API with the following main endpoints:
- /v1/chat/completions - For chat completions
- /v1/completions - For text completions
- /v1/models - To list available models
Notes
- The server uses the same API format as OpenAI's ChatGPT API, making it compatible with many existing tools and libraries
- The --jinja flag enables proper chat template formatting for the model, which is essential for tool usage
- The model name in the requests can be set to "any-model" as shown in the examples
- This version supports system messages with tool definitions
- Tool responses are properly handled through the chat template
Troubleshooting
If you encounter issues:
- Ensure the model file exists in the specified path
- Check that port 8082 is not in use by another application
- Verify that llama-cpp-python is installed with server support
- Check the server logs with the --verbose flag for detailed information
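To rule out a port conflict quickly, a small stdlib check like the following works (it assumes the server binds to localhost, as in the examples above):

```python
# Report whether something is already listening on the llama-server port.
import socket

def port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print(f"port 8082 in use: {port_in_use(8082)}")
```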
License
Please ensure you comply with the model's license terms when using it.