Text Generation · Transformers · GGUF · English · Chinese · llama
andrijdavid committed · Commit e54089f · 1 Parent(s): 8f5811e

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +6 -6
README.md CHANGED
@@ -68,7 +68,7 @@ The following clients/libraries will automatically download models for you, prov
 
 ### In `text-generation-webui`
 
-Under Download Model, you can enter the model repo: andrijdavid/MiniMA-2-3B-GGUF and below it, a specific filename to download, such as: MiniMA-2-3B-f16.gguf.
+Under Download Model, you can enter the model repo: andrijdavid/MiniMA-2-3B-GGUF and below it, a specific filename to download, such as: MiniMA-2-3B.gguf.
 
 Then click Download.
 
@@ -83,7 +83,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download andrijdavid/MiniMA-2-3B-GGUF MiniMA-2-3B-f16.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download andrijdavid/MiniMA-2-3B-GGUF MiniMA-2-3B.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 <details>
@@ -106,7 +106,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/MiniMA-2-3B-GGUF MiniMA-2-3B-f16.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/MiniMA-2-3B-GGUF MiniMA-2-3B.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -118,7 +118,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m MiniMA-2-3B-f16.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
+./main -ngl 35 -m MiniMA-2-3B.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -169,7 +169,7 @@ pip install llama-cpp-python
 from llama_cpp import Llama
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-model_path="./MiniMA-2-3B-f16.gguf", # Download the model file first
+model_path="./MiniMA-2-3B.gguf", # Download the model file first
 n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
 n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
 n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -182,7 +182,7 @@ output = llm(
 echo=True # Whether to echo the prompt
 )
 # Chat Completion API
-llm = Llama(model_path="./MiniMA-2-3B-f16.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
+llm = Llama(model_path="./MiniMA-2-3B.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
 messages = [
 {"role": "system", "content": "You are a story writing assistant."},
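The net effect of this commit is a filename rename (MiniMA-2-3B-f16.gguf → MiniMA-2-3B.gguf) in every download command and code sample. For readers who want to sanity-check which URL the `huggingface-cli download` commands in the diff actually fetch, here is a minimal sketch; the `gguf_resolve_url` helper is hypothetical (Hub single-file downloads resolve to `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}`), while the repo id and filename are taken from the diff above:

```python
def gguf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical helper: build the direct 'resolve' URL behind
    `huggingface-cli download <repo_id> <filename>` for one repo file."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Filename as renamed in this commit.
url = gguf_resolve_url("andrijdavid/MiniMA-2-3B-GGUF", "MiniMA-2-3B.gguf")
print(url)
```

Commands still referencing the old MiniMA-2-3B-f16.gguf name would 404 against this URL scheme, which is why the README's examples are updated in lockstep with the upload.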