Text Generation · Transformers · GGUF · conversational

crodri committed commit ee24e7b (verified) · 1 parent: 1814925

Mirror from source repo
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ALIA-40b-instruct_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ALIA-40b-instruct_bos_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
ALIA-40b-instruct_bos_Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e3c22398cbf0aa43760dc56399323caf71ddf12498737282eb802b5ebf42b5aa
+size 42969797824
README.md ADDED
@@ -0,0 +1,133 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
language:
- ca
- en
- es
- eu
- gl
datasets:
- CohereLabs/aya_dataset
- projecte-aina/CoQCat
- databricks/databricks-dolly-15k
- projecte-aina/dolly3k_ca
- projecte-aina/MentorES
- projecte-aina/MentorCA
- HuggingFaceH4/no_robots
- projecte-aina/RAG_Multilingual
- Unbabel/TowerBlocks-v0.2
- OpenAssistant/oasst2
- open-r1/OpenR1-Math-220k
- HuggingFaceFW/fineweb-edu
base_model:
- BSC-LT/ALIA-40b-instruct
model_name: ALIA-40b-instruct
quantized_by: BSC-LT
---

<!-- header start -->
![](./images/logo_alia_2.png)

> [!WARNING]
> **WARNING:** ALIA-40b-Instruct is an instruction-tuned model with a preliminary alignment process. It has not yet undergone a full alignment procedure to ensure safety. The model may generate biased, factually incorrect, harmful, or inappropriate content. Users should **refer to the Limitations section** and apply additional filtering and alignment processes before deploying this model in production.

> [!NOTE]
> **Work in progress:** new versions will be released in the coming weeks/months.
>
> **Sampling parameters:** for optimal performance, we recommend temperatures close to zero (0 - 0.2). We also advise against using any kind of repetition penalty since, in our experience, [it degrades an instruction-tuned model's responses](https://www.reddit.com/r/LocalLLaMA/comments/1g383mq/repetition_penalties_are_terribly_implemented_a/).

<!-- header end -->

# ALIA-40b-instruct - GGUF
- Model creator: [BSC-LT](https://huggingface.co/BSC-LT)
- Original model: [ALIA-40b-instruct](https://huggingface.co/BSC-LT/ALIA-40b-instruct)

## Description

This repo contains GGUF-format model files for [BSC-LT/ALIA-40b-instruct](https://huggingface.co/BSC-LT/ALIA-40b-instruct).

### About GGUF

**GGUF** is the model file format introduced by the **llama.cpp** team on August 21st, 2023, replacing the older, now-deprecated GGML format. It brings significant improvements such as enhanced tokenization, proper handling of special tokens, embedded metadata (e.g., architecture, quantization type, tokenizer), and an extensible design for future compatibility.

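As a small illustration of this self-describing design (a sketch based on the published GGUF specification, not part of this repo's tooling), the fixed-size GGUF header can be parsed with a few lines of Python:

```python
import struct

def read_gguf_header(buf):
    # GGUF header layout (little-endian): 4-byte magic "GGUF",
    # uint32 version, uint64 tensor count, uint64 metadata KV count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header bytes for illustration (version 3, 2 tensors, 5 KV pairs).
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(header))
```

The metadata key/value section that follows the header is what lets a GGUF file carry its tokenizer and architecture configuration without any sidecar files.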

### Model Conversion

This model was converted from its original **Hugging Face** format to **GGUF** using the official tools provided in [**llama.cpp**](https://github.com/ggerganov/llama.cpp). The conversion process embeds all necessary tokenizer and configuration data directly into the `.gguf` file for full portability.

The base model was exported in **BF16 precision** and then quantized for faster inference and a smaller file size.

#### Commands Used

Below are the steps and commands used to convert and quantize the model to the **GGUF** format using [**llama.cpp**](https://github.com/ggerganov/llama.cpp).

```bash
# Go to the llama.cpp directory
cd llama.cpp

# (Optional) Create a Python virtual environment
python -m venv venv
source venv/bin/activate

# Install the dependencies required for conversion
pip install -r requirements.txt
```

```bash
# Convert the Hugging Face model to GGUF (BF16 precision)
python3 convert_hf_to_gguf.py /path/to/hf_model \
    --outfile /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
    --outtype bf16
```

> 🛠️ **Skip the next step if you already have a build.**

```bash
# Create and enter a build directory
mkdir build && cd build

# Configure with CUDA support (optional)
cmake .. -DGGML_CUDA=ON -DGGML_NATIVE=OFF \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    -DCMAKE_BUILD_TYPE=Release

# Build with parallel jobs (adjust -j as needed)
cmake --build . --config Release --verbose -j 12
```

```bash
# Quantize the GGUF model (run from the llama.cpp directory)
QU=Q8_0   # Change to Q4_K_M, Q5_K_S, etc. as needed

./build/bin/llama-quantize \
    /gpfs/path/to/output/ALIA-40b-instruct_bos_bf16.gguf \
    /gpfs/path/to/output/ALIA-40b-instruct_bos_${QU}.gguf \
    ${QU}
```

For detailed installation steps, build options, and quantization types, see the [**llama.cpp** GitHub repository](https://github.com/ggerganov/llama.cpp).

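Once quantized, the model can be run with `llama-cli` from the same build. The snippet below is a sketch (the model path is a placeholder) that assembles an invocation following the sampling advice above: temperature near zero and the repetition penalty disabled (`1.0` is a no-op in llama.cpp):

```shell
# Placeholder path to the quantized model file
MODEL=/gpfs/path/to/output/ALIA-40b-instruct_bos_Q8_0.gguf

# -cnv starts interactive conversation mode using the chat template
# embedded in the GGUF file; --temp 0.1 keeps sampling near-greedy and
# --repeat-penalty 1.0 disables the repetition penalty.
CMD="./build/bin/llama-cli -m $MODEL -cnv --temp 0.1 --repeat-penalty 1.0 -n 256"
echo "$CMD"
```

Run the assembled command directly (`$CMD`) or copy it into your job script; adjust `-n` (max tokens to generate) to taste.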
---

## Prompt template

```
{{- bos_token }}{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content'] %}{%- set loop_messages = messages[1:] %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{%- else %}{%- set loop_messages = messages %}{%- endif %}{% for message in loop_messages %}{%- if (message['role'] != 'user') and (message['role'] != 'assistant')%}{{ raise_exception('Only user and assistant roles are supported after the initial optional system message.') }}{% endif %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```

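The template renders ChatML-style turns. The plain-Python sketch below is a hypothetical helper that mimics its output for a simple conversation (it is not the tokenizer's own code, it omits the template's role-alternation check, and the actual `bos_token` value depends on the tokenizer):

```python
def render_chatml(messages, bos_token="<s>", add_generation_prompt=True):
    """Mimic the chat template above: an optional leading system turn,
    then user/assistant turns wrapped in <|im_start|>/<|im_end|>."""
    parts = [bos_token]
    if messages and messages[0]["role"] == "system":
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
        messages = messages[1:]
    for m in messages:
        if m["role"] not in ("user", "assistant"):
            raise ValueError("Only user and assistant roles are supported "
                             "after the initial optional system message.")
        parts.append("<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        # Leave the prompt open for the model to complete the assistant turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```

When using `llama-cli` in conversation mode or `transformers`' `apply_chat_template`, this formatting is applied for you; the sketch is only to make the resulting prompt string explicit.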
images/logo_alia_2.png ADDED
images/tmp ADDED
File without changes