s3dev-ai committed · verified · Commit 6aa46f6 · Parent(s): 94bcb27

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -16,16 +16,16 @@ tags:
 This page provides various quantisations of the [base model](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5), in GGUF format.
 - nomic-ai/nomic-embed-text-v1.5
 
-## Model Description
+# Model Description
 
 For a full model description, please refer to the [base model's](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) card.
 
-### How are the GGUF files created?
+## How are the GGUF files created?
 After cloning the author's original base model repository, `llama.cpp` is used to convert the model to a GGML-compatible file, using `f32` as the output type to preserve the original fidelity. The model is converted *unaltered*, unless otherwise stated.
 
 Finally, for each quantisation level, `llama.cpp`'s `llama-quantize` executable is called with the F32 GGUF file as the source.
 
-### Quantisations
+## Quantisations
 
 To help visualise the difference between quantisation levels (i.e. the degree of retained fidelity), the image below shows the cosine similarity scores for each quantisation, baselined against the 32-bit base model. Lower fidelity yields a wider scatter in scores relative to the 32-bit model.
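
For readers who want to reproduce the two-step pipeline the README describes, here is a minimal sketch. The `convert_hf_to_gguf.py` script and the `llama-quantize` binary ship with recent `llama.cpp` checkouts, but the local paths and the set of quantisation levels shown are illustrative assumptions, not a record of the author's exact invocation.

```python
# A minimal sketch of the conversion-and-quantisation pipeline described in the
# README above. convert_hf_to_gguf.py and llama-quantize exist in recent
# llama.cpp checkouts; all paths and quantisation levels here are assumptions.
import subprocess

SRC_REPO = "nomic-embed-text-v1.5"           # local clone of the base model repo
F32_GGUF = "nomic-embed-text-v1.5.f32.gguf"  # full-fidelity intermediate file

# Step 1: convert the Hugging Face model to GGUF with f32 output,
# preserving the original fidelity.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", SRC_REPO,
     "--outtype", "f32", "--outfile", F32_GGUF],
    check=True,
)

# Step 2: produce each quantisation level from the f32 source file.
for qtype in ["Q8_0", "Q5_K_M", "Q4_K_M"]:
    subprocess.run(
        ["llama.cpp/build/bin/llama-quantize",
         F32_GGUF, f"nomic-embed-text-v1.5.{qtype}.gguf", qtype],
        check=True,
    )
```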
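
Similarly, a sketch of how the cosine-similarity comparison could be computed, assuming embeddings for the same input texts have already been produced by the f32 and quantised models; the arrays below are placeholders, not real measurements.

```python
# Illustrative only: row-wise cosine similarity between embeddings from a
# quantised model and the f32 baseline. emb_f32 and emb_quant stand in for
# (n_texts, n_dims) arrays produced from identical input texts.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity of corresponding rows in two equally-shaped arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

# Placeholder data standing in for real embeddings (768 dims, as in the base model).
rng = np.random.default_rng(0)
emb_f32 = rng.normal(size=(8, 768))
emb_quant = emb_f32 + rng.normal(scale=0.01, size=emb_f32.shape)

scores = cosine_similarity(emb_f32, emb_quant)
print(f"mean={scores.mean():.4f}  std={scores.std():.4f}")  # tighter spread = higher fidelity
```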