ariel-pillar committed · verified
Commit 8ae50d9 · Parent(s): a065437

Create README.md

Files changed (1): README.md (+99 −0)
---
base_model:
- microsoft/Phi-4-mini-instruct
---
# Phi-4-mini-instruct with llama-server

This repository contains instructions for running the Phi-4-mini-instruct model using llama-server, which provides an OpenAI-compatible (ChatGPT-style) API interface.

## Prerequisites

- [llama.cpp](https://github.com/ggml-org/llama.cpp) installed (it provides the `llama-server` binary used below)
- The Phi-4-mini-instruct model in GGUF format

## Installation

1. Install llama.cpp, which ships the `llama-server` binary (for example via Homebrew, or build it from source):

```bash
brew install llama.cpp
```

Note: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)'s `pip install 'llama-cpp-python[server]'` provides a separate OpenAI-compatible server (`python -m llama_cpp.server`), but the commands below use llama.cpp's `llama-server`, which supports the `--jinja` flag.

2. Ensure your model file is in the correct location:

```bash
models/Phi-4-mini-instruct-Q4_K_M-modified.gguf
```

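Before starting the server, you can quickly confirm the model file from step 2 is actually in place. A minimal sketch in Python (the helper name is illustrative, not part of any tool here):

```python
from pathlib import Path

def model_file_ok(path: str) -> bool:
    """Check that the GGUF model file exists and is non-empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0

if __name__ == "__main__":
    print(model_file_ok("models/Phi-4-mini-instruct-Q4_K_M-modified.gguf"))
```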
## Running the Server

Start the llama-server with the following command:

```bash
llama-server \
    --model models/Phi-4-mini-instruct-Q4_K_M-modified.gguf \
    --port 8082 \
    --verbose \
    --jinja
```

This will start the server with:
- The model loaded in memory
- Server running on port 8082
- Verbose logging enabled
- Jinja template support for chat formatting
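Loading the model can take a while, so it is useful to wait until the server is ready before sending requests. llama-server exposes a `/health` endpoint for this purpose; the sketch below (standard library only, assuming the server runs on localhost:8082) polls it until the server answers:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(base_url: str = "http://localhost:8082",
                     timeout_s: float = 120.0) -> bool:
    """Poll llama-server's /health endpoint until it reports ready."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or still loading the model
        time.sleep(1.0)
    return False

if __name__ == "__main__":
    print("server ready:", wait_until_ready())
```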

## Testing the API

You can test the server using curl commands. Here are some examples:

### Example 1: Generate HTML Hello World

```bash
curl http://localhost:8082/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "any-model",
        "messages": [
            {"role":"user","content":"give me an html hello world document"}
        ]
    }'
```

### Example 2: Tell a Joke

```bash
curl http://localhost:8082/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "any-model",
        "messages": [
            {"role":"user","content":"tell me a funny joke"}
        ]
    }'
```
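The same requests can be made from Python using only the standard library. This is a minimal sketch (the function names are illustrative, and it assumes the server is already running on localhost:8082):

```python
import json
import urllib.request

API_URL = "http://localhost:8082/v1/chat/completions"

def build_payload(prompt: str, model: str = "any-model") -> dict:
    """Build a chat-completions request body for a single user message."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str) -> str:
    """Send a prompt to llama-server and return the assistant's reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("tell me a funny joke"))
```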

## API Endpoints

The server provides an OpenAI-compatible API with the following main endpoints:

- `/v1/chat/completions` - For chat completions
- `/v1/completions` - For text completions
- `/v1/models` - To list available models
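The two completion endpoints take differently shaped request bodies: `/v1/chat/completions` expects a list of role-tagged `messages`, while `/v1/completions` expects a plain `prompt` string. A small sketch of the two payload shapes (helper names are illustrative):

```python
def chat_request(prompt: str, model: str = "any-model") -> dict:
    """Body for POST /v1/chat/completions: a list of role-tagged messages."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def completion_request(prompt: str, model: str = "any-model") -> dict:
    """Body for POST /v1/completions: a raw prompt string."""
    return {"model": model, "prompt": prompt}
```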

## Notes

- The server uses the same API format as OpenAI's API, making it compatible with many existing tools and libraries
- The `--jinja` flag enables proper chat-template formatting for the model
- The `model` field in requests can be set to any placeholder (such as "any-model"), since the server serves the single model it was started with


## Troubleshooting

If you encounter issues:

1. Ensure the model file exists at the specified path
2. Check that port 8082 is not already in use by another application
3. Verify that the `llama-server` binary is installed and on your PATH
4. Run the server with the `--verbose` flag and check its logs for detailed information

## License

Please ensure you comply with the model's license terms when using it.