Upload README.md

README.md CHANGED

@@ -1,11 +1,11 @@
---
base_model: migtissera/Tess-M-Creative-v1.0
inference: false
license: other
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
license_name: yi-34b
model_creator: Migel Tissera
model_name: Tess M Creative v1.0
model_type: yi
prompt_template: 'SYSTEM: Elaborate on the topic using a Tree of Thoughts and backtrack
  when necessary to construct a clear, cohesive Chain of Thought reasoning. Always
@@ -37,14 +37,14 @@ quantized_by: TheBloke
<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
<!-- header end -->

# Tess M Creative v1.0 - AWQ
- Model creator: [Migel Tissera](https://huggingface.co/migtissera)
- Original model: [Tess M Creative v1.0](https://huggingface.co/migtissera/Tess-M-Creative-v1.0)

<!-- description start -->
## Description

This repo contains AWQ model files for [Migel Tissera's Tess M Creative v1.0](https://huggingface.co/migtissera/Tess-M-Creative-v1.0).

These files were quantised using hardware kindly provided by [Massed Compute](https://massedcompute.com/).

@@ -65,10 +65,10 @@
<!-- repositories-available start -->
## Repositories available

* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Tess-M-Creative-v1.0-AWQ)
* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Tess-M-Creative-v1.0-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Tess-M-Creative-v1.0-GGUF)
* [Migel Tissera's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/migtissera/Tess-M-Creative-v1.0)
<!-- repositories-available end -->

<!-- prompt-template start -->

@@ -93,7 +93,7 @@ Models are released as sharded safetensors files.

| Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
| ------ | ---- | -- | ----------- | ------- | ---- |
| [main](https://huggingface.co/TheBloke/Tess-M-Creative-v1.0-AWQ/tree/main) | 4 | 128 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 19.23 GB |

<!-- README_AWQ.md-provided-files end -->
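As a rough sanity check on the Size column, packed 4-bit weights alone account for most of the 19.23 GB. This is a back-of-the-envelope sketch, not part of the original card; the ~34.4B parameter count for the Yi-34B base is an approximation, and the remainder is group scales, zero points and tensors kept at higher precision:

```python
# Back-of-the-envelope size of the packed 4-bit AWQ weights.
# 34.4e9 is an approximate parameter count for the Yi-34B base (assumption).
params = 34.4e9
bits_per_weight = 4
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> gigabytes
print(f"~{size_gb:.1f} GB of packed 4-bit weights")
```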

@@ -105,11 +105,11 @@
It is strongly recommended to use the text-generation-webui one-click installers unless you're sure you know how to make a manual install.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/Tess-M-Creative-v1.0-AWQ`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Tess-M-Creative-v1.0-AWQ`.
7. Select **Loader: AutoAWQ**.
8. Click **Load**; when loading finishes, the model is ready for use.
9. If you want any custom settings, set them, then click **Save settings for this model** followed by **Reload the Model** in the top right.

@@ -127,7 +127,7 @@
For example:

```shell
python3 -m vllm.entrypoints.api_server --model TheBloke/Tess-M-Creative-v1.0-AWQ --quantization awq --dtype auto
```

- When using vLLM from Python code, again set `quantization=awq`.

@@ -152,7 +152,7 @@ prompts = [prompt_template.format(prompt=prompt) for prompt in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="TheBloke/Tess-M-Creative-v1.0-AWQ", quantization="awq", dtype="auto")

outputs = llm.generate(prompts, sampling_params)

@@ -172,7 +172,7 @@
Example Docker parameters:

```shell
--model-id TheBloke/Tess-M-Creative-v1.0-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
```

Example Python code for interfacing with TGI (requires [huggingface-hub](https://github.com/huggingface/huggingface_hub) 0.17.0 or later):
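The three token flags in the Docker line above partition the context window: input tokens plus generated tokens must fit inside `--max-total-tokens`. A quick check of the budget these particular values imply:

```python
# Context budget implied by the TGI flags above.
max_total_tokens = 4096   # --max-total-tokens
max_input_length = 3696   # --max-input-length
max_new_tokens = max_total_tokens - max_input_length
print(f"room for up to {max_new_tokens} generated tokens per request")
```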

@@ -239,7 +239,7 @@ pip3 install .
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "TheBloke/Tess-M-Creative-v1.0-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
```

@@ -353,29 +353,23 @@ And thank you again to a16z for their generous grant.

<!-- footer end -->

# Original model card: Migel Tissera's Tess M Creative v1.0


# Tess

![Tess](https://huggingface.co/migtissera/Tess-M-v1.0/resolve/main/Tess.png)

Tess, short for Tessoro/Tessoso, is a general-purpose Large Language Model series. The Tess-M series is trained on the Yi-34B-200K base.

Tess-M-Creative is an AI best suited to creative tasks, such as writing, role play, design and exploring novel concepts. While it has been trained on STEM, its reasoning capabilities may lag behind the state of the art. Please download the Tess-M-STEM series for reasoning, logic and STEM-related tasks.

# Prompt Format:

```
SYSTEM: <ANY SYSTEM CONTEXT>
USER: What is the relationship between Earth's atmosphere, magnetic field and gravity?
ASSISTANT:
```
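Because the format above is plain text, it is easy to assemble programmatically. A minimal sketch (the helper name is ours, not part of any library):

```python
def build_tess_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the SYSTEM/USER/ASSISTANT format above."""
    return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

prompt = build_tess_prompt(
    "You are a helpful assistant.",
    "What is the relationship between Earth's atmosphere, magnetic field and gravity?",
)
print(prompt)
```

The model's reply is whatever the backend generates after the trailing `ASSISTANT:` marker.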