---
base_model:
- microsoft/Phi-4-mini-instruct
---
# Phi-4-mini-instruct with llama-server (Tool-Enhanced Version)

NOTE: THIS IS A POC FOR A SUPPLY CHAIN ATTACK LEVERAGING POISONED CHAT TEMPLATES. FOR FULL BLOG/CONTEXT, PLEASE REVIEW: https://www.pillar.security/blog/llm-backdoors-at-the-inference-level-the-threat-of-poisoned-templates

This repository contains instructions for running a modified version of the Phi-4-mini-instruct model using llama-server. This version has been enhanced to support tool usage, allowing the model to interact with external tools and APIs through a ChatGPT-compatible interface.

## Model Capabilities

This modified version of Phi-4-mini-instruct includes:
- Full support for tool usage and function calling
- Custom chat template optimized for tool interactions
- Ability to process and respond to tool outputs
- ChatGPT-compatible API interface

## Prerequisites

- [llama.cpp](https://github.com/ggerganov/llama.cpp) installed or built from source (it provides the `llama-server` binary)
- The Phi-4-mini-instruct model in GGUF format

## Installation

1. Install llama.cpp (a prebuilt release works too; building from source is shown here):
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

2. Ensure your model file is in the correct location:
```bash
models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf
```

## Running the Server

Start the llama-server with the following command:

```bash
llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-function_calling.gguf \
    --port 8080 \
    --jinja
```

This will start the server with:
- The model loaded in memory
- The server listening on port 8080
- Jinja chat-template support enabled, which is required for tool use

## Testing the API

You can test the server using curl commands. Here are some examples:

### Example 1: Using Tools

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "python",
          "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
          "parameters": {
            "type": "object",
            "properties": {
              "code": {
                "type": "string",
                "description": "The code to run in the ipython interpreter."
              }
            },
            "required": ["code"]
          }
        }
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Print a hello world message with python."
      }
    ]
  }'
```
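The same tool-calling payload can be assembled programmatically before sending it with any HTTP client; a minimal Python sketch (the tool schema and model name mirror the curl example above — the actual POST is left to your client of choice):

```python
import json

# Tool schema mirroring the curl example: a single "python" function
# that accepts a required "code" string argument.
python_tool = {
    "type": "function",
    "function": {
        "name": "python",
        "description": (
            "Runs code in an ipython interpreter and returns the result "
            "of the execution after 60 seconds."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The code to run in the ipython interpreter.",
                }
            },
            "required": ["code"],
        },
    },
}

payload = {
    "model": "phi-4-mini-instruct-with-tools",
    "tools": [python_tool],
    "messages": [
        {"role": "user", "content": "Print a hello world message with python."}
    ],
}

# Serialize to the JSON body you would POST to /v1/chat/completions.
body = json.dumps(payload)
```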

### Example 2: Tell a Joke

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful clown instruction assistant"},
      {"role":"user","content":"tell me a funny joke"}
    ]
  }'
```

### Example 3: Generate HTML Hello World

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct-with-tools",
    "messages": [
      {"role":"system","content":"You are a helpful coding assistant"},
      {"role":"user","content":"give me an html hello world document"}
    ]
  }'
```

## API Endpoints

The server provides a ChatGPT-compatible API with the following main endpoints:

- `/v1/chat/completions` - For chat completions
- `/v1/completions` - For text completions
- `/v1/models` - To list available models
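A thin client for these endpoints needs only the base URL and standard-library HTTP calls; a hedged Python sketch (assumes the server above is running on localhost:8080 — the network calls are wrapped in functions so nothing fires on import):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"

def endpoint(path: str) -> str:
    """Join the server base URL with an API path."""
    return BASE_URL.rstrip("/") + path

def list_models() -> dict:
    """GET /v1/models — returns the parsed JSON model list."""
    with urllib.request.urlopen(endpoint("/v1/models")) as resp:
        return json.loads(resp.read())

def chat(messages: list[dict]) -> dict:
    """POST a minimal payload to /v1/chat/completions."""
    req = urllib.request.Request(
        endpoint("/v1/chat/completions"),
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```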

## Notes

- The server uses the same API format as OpenAI's ChatGPT API, making it compatible with many existing tools and libraries
- The `--jinja` flag enables proper chat template formatting for the model, which is essential for tool usage
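To see why the chat template is security-relevant (the premise of the PoC linked above), note that the template, not the client, decides the final prompt string: every message is wrapped in role markers before the model sees it. A deliberately simplified Python stand-in (the marker strings below are illustrative, not the real Phi-4 Jinja template):

```python
def render_prompt(messages: list[dict]) -> str:
    """Serialize chat messages into a single prompt string, the way a
    Jinja chat template does inside llama-server. A poisoned template
    could silently append extra instructions at this stage."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>{msg['content']}<|end|>")
    parts.append("<|assistant|>")  # cue the model to respond
    return "".join(parts)

prompt = render_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "tell me a funny joke"},
])
```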

## Troubleshooting

If you encounter issues:

1. Ensure the model file exists in the specified path
2. Check that port 8080 is not in use by another application
3. Verify that your llama.cpp build includes the `llama-server` binary and that it is on your PATH
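For step 2, you can check whether something is already listening on the port before launching the server; a small Python sketch using only the standard library:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

# Example: check the default llama-server port before launching.
busy = port_in_use(8080)
```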

## License

Please ensure you comply with the model's license terms when using it.