itriedcoding committed on
Commit f62675d · verified · 1 Parent(s): 64728f0

Upload folder using huggingface_hub

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +63 -134
  3. gguf_convert.py +112 -0
  4. sage-f16.gguf +3 -0
.gitattributes CHANGED
@@ -1,3 +1,4 @@
  *.bin filter=lfs diff=lfs merge=lfs -text
  custom_llm_model.pth filter=lfs diff=lfs merge=lfs -text
  hf_model/tokenizer.pkl filter=lfs diff=lfs merge=lfs -text
+ sage-f16.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -2,7 +2,7 @@
 
  Sage is a custom-built transformer language model designed for text generation tasks. This model demonstrates the full lifecycle of building and publishing a custom AI model to Hugging Face.
 
- ## 📊 Model Overview
+ ## Model Overview
 
  - **Model Type**: Transformer-based language model
  - **Architecture**: Decoder-only transformer
@@ -12,11 +12,11 @@ Sage is a custom-built transformer language model designed for text generation t
  - **Number of Attention Heads**: 8
  - **Feedforward Size**: 1024
  - **Max Sequence Length**: 64
- - **Parameters**: ~3.2M
+ - **Parameters**: ~3,195,944
  - **Training Framework**: PyTorch
  - **License**: MIT
 
- ## 📚 Training Data
+ ## Training Data
 
  Sage was trained on a curated dataset of example sentences covering:
  - Conversational phrases and greetings
@@ -26,9 +26,9 @@ Sage was trained on a curated dataset of example sentences covering:
  - Natural language processing applications
  - Model development and deployment practices
 
- The dataset consists of 10 carefully crafted examples designed to teach the model patterns in technical and conversational English.
+ The dataset consists of 10 examples designed to teach the model patterns in technical and conversational English.
 
- ## 🔧 Technical Specifications
+ ## Technical Specifications
 
  ### Model Architecture
  ```
@@ -38,21 +38,16 @@ TransformerLM(
    (transformer_encoder): TransformerEncoder(
      (layers): ModuleList(
        (0-3): TransformerEncoderLayer(
-         (self_attn): MultiheadAttention(
-           (embed_dim): 256
-           (num_heads): 8
-         )
-         (linear1): Linear(in_features=256, out_features=1024, bias=True)
-         (linear2): Linear(in_features=1024, out_features=256, bias=True)
-         (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
-         (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
-         (dropout): Dropout(p=0.1, inplace=False)
-         (dropout1): Dropout(p=0.1, inplace=False)
-         (dropout2): Dropout(p=0.1, inplace=False)
+         (self_attn): MultiheadAttention(embed_dim=256, num_heads=8)
+         (linear1): Linear(256, 1024)
+         (linear2): Linear(1024, 256)
+         (norm1): LayerNorm(256)
+         (norm2): LayerNorm(256)
+         (dropout): Dropout(p=0.1)
        )
      )
    )
-   (output_layer): Linear(in_features=256, out_features=40, bias=True)
+   (output_layer): Linear(256, 40)
  )
  ```
 
@@ -63,22 +58,19 @@ Sage uses a character-level tokenizer with:
  - Encoding: UTF-8 character mapping
  - Maximum sequence length: 64 tokens
 
- ## 🚀 Usage
+ ## Usage
 
  ### With Transformers Library
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch
 
- # Load model and tokenizer
  model_name = "itriedcoding/Sage"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
- # Generate text
  def generate_text(prompt, max_length=50, temperature=0.8):
      inputs = tokenizer.encode(prompt, return_tensors="pt")
-
      with torch.no_grad():
          outputs = model.generate(
              inputs,
@@ -87,12 +79,9 @@ def generate_text(prompt, max_length=50, temperature=0.8):
              do_sample=True,
              pad_token_id=tokenizer.eos_token_id
          )
-
      return tokenizer.decode(outputs[0], skip_special_tokens=True)
 
- # Examples
  print(generate_text("Hello"))
- print(generate_text("The weather"))
  print(generate_text("Deep learning"))
  ```
 
@@ -100,22 +89,13 @@ print(generate_text("Deep learning"))
  ```python
  import torch
  from modeling_transformer_lm import TransformerLM
- import json
- import pickle
-
- # Load model components
- with open('config.json', 'r') as f:
-     config_dict = json.load(f)
 
- # For actual usage, you would load the tokenizer similarly
- # This example shows the structure
  model = TransformerLM.from_pretrained("itriedcoding/Sage")
  ```
 
- ## 🏗️ Model Card Metadata
+ ## Model Card Metadata
 
- ```yaml
- ---
+ ```
  library_name: transformers
  license: MIT
  base_model: custom-built
@@ -126,80 +106,27 @@ tags:
  - custom-model
  - educational
  pipeline_tag: text-generation
- widget:
- - example: Hello
-   parameters: {max_length: 30, temperature: 0.7}
- - example: The weather
-   parameters: {max_length: 30, temperature: 0.7}
- - example: Deep learning
-   parameters: {max_length: 30, temperature: 0.7}
- ---
  ```
 
- ## 🤗 Hugging Face Spaces Deployment
-
- You can run this model in various Hugging Face Spaces templates:
-
- ### Streamlit Space
- Create a `streamlit_app.py`:
- ```python
- import streamlit as st
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch
+ ## Hugging Face Spaces Deployment
 
- @st.cache_resource
- def load_model():
-     model_name = "itriedcoding/Sage"
-     tokenizer = AutoTokenizer.from_pretrained(model_name)
-     model = AutoModelForCausalLM.from_pretrained(model_name)
-     return tokenizer, model
-
- def main():
-     st.title("🤖 Sage Text Generator")
-     st.write("A custom character-level language model")
-
-     tokenizer, model = load_model()
-
-     prompt = st.text_input("Enter your prompt:", "Hello")
-     max_length = st.slider("Max length:", 10, 100, 30)
-     temperature = st.slider("Temperature:", 0.1, 2.0, 0.8)
-
-     if st.button("Generate"):
-         with st.spinner("Generating..."):
-             inputs = tokenizer.encode(prompt, return_tensors="pt")
-             with torch.no_grad():
-                 outputs = model.generate(
-                     inputs,
-                     max_length=max_length,
-                     temperature=temperature,
-                     do_sample=True,
-                     pad_token_id=tokenizer.eos_token_id
-                 )
-             result = tokenizer.decode(outputs[0], skip_special_tokens=True)
-             st.write("**Generated text:**")
-             st.write(result)
-
- if __name__ == "__main__":
-     main()
- ```
+ You can run Sage in the dedicated Hugging Face Space:
+ https://huggingface.co/spaces/itriedcoding/sage-space
 
  ### Gradio Space
- Create an `app.py`:
+ The Space at `itriedcoding/sage-space` provides a Gradio interface for text generation.
+ Create a new Space with `app.py`:
  ```python
  import gradio as gr
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch
 
- def load_model():
-     model_name = "itriedcoding/Sage"
-     tokenizer = AutoTokenizer.from_pretrained(model_name)
-     model = AutoModelForCausalLM.from_pretrained(model_name)
-     return tokenizer, model
+ model_name = "itriedcoding/Sage"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
 
  def generate_text(prompt, max_length, temperature):
-     tokenizer, model = load_model()
      inputs = tokenizer.encode(prompt, return_tensors="pt")
-
      with torch.no_grad():
          outputs = model.generate(
              inputs,
@@ -208,7 +135,6 @@ def generate_text(prompt, max_length, temperature):
              do_sample=True,
              pad_token_id=tokenizer.eos_token_id
          )
-
      return tokenizer.decode(outputs[0], skip_special_tokens=True)
 
  demo = gr.Interface(
@@ -219,36 +145,47 @@ demo = gr.Interface(
          gr.Slider(minimum=0.1, maximum=2.0, value=0.8, label="Temperature")
      ],
      outputs=gr.Textbox(label="Generated Text"),
-     title="🤖 Sage Text Generator",
-     description="Custom character-level language model for text generation"
+     title="Sage Text Generator",
+     description="Custom character-level language model"
  )
 
  if __name__ == "__main__":
      demo.launch()
  ```
 
- ## 📦 GGUF Quantization
+ ## GGUF Format
+
+ Sage is available in GGUF format as `sage-f16.gguf`.
 
- For efficient deployment, Sage is available in GGUF format:
+ ### Compatibility Warning
+ Sage uses a custom `transformer_lm` architecture that is NOT supported by standard llama.cpp or llama-cpp-python. The GGUF file is provided as a reference format and for custom inference implementations that can match Sage's architecture.
 
- ### Available Quantizations
- - `sage-q4_0.gguf` - 4-bit quantization (balanced quality/size)
- - `sage-q5_0.gguf` - 5-bit quantization (higher quality)
- - `sage-q8_0.gguf` - 8-bit quantization (near-full precision)
- - `sage-f16.gguf` - Float16 (full precision)
+ ### File Details
+ - **File**: `sage-f16.gguf` (12.7 MB)
+ - **Format**: GGUF (GGML Universal Format)
+ - **Precision**: Float16
+ - **Tensors**: 52 layers
+ - **Architecture**: `transformer_lm` (custom)
 
- ### Using GGUF with llama.cpp
- ```bash
- # Install llama.cpp
- git clone https://github.com/ggerganov/llama.cpp
- cd llama.cpp
- make
+ ### Using with Custom Inference
+ To use this GGUF file, you need a GGUF loader that supports Sage's custom architecture:
+ ```python
+ import gguf
+ import torch
+ import numpy as np
+
+ # Load GGUF file
+ reader = gguf.GGUFReader("sage-f16.gguf")
+ tensors = {t.name: torch.from_numpy(t.data) for t in reader.tensors}
 
- # Run the model
- ./main -m sage-q4_0.gguf -p "Hello" -n 30
+ # Map tensor names back to Sage architecture
+ # See gguf_convert.py for the tensor name mapping
  ```
 
- ## 📈 Performance & Limitations
+ ### GGUF Conversion
+ The conversion script `gguf_convert.py` is included in this repository. It uses the `gguf` Python library to convert the PyTorch checkpoint to GGUF format.
+
+ ## Performance & Limitations
 
  ### Intended Use
  - Educational demonstrations of transformer architectures
@@ -257,20 +194,13 @@ make
  - Learning about model deployment on Hugging Face
 
  ### Limitations
- - Small vocabulary (character-level only limits coherence)
- - Limited training data (10 examples)
+ - Character-level tokenization limits coherence
+ - Small training dataset (10 examples)
  - Small model size (3.2M parameters)
  - Not suitable for production NLP applications
  - Best for short text generation (<50 tokens)
 
- ### Bias & Ethics
- As a small educational model trained on curated technical text:
- - Minimal harmful bias expected
- - Should not be used for decision-making applications
- - Outputs should be reviewed for appropriateness
- - Model reflects patterns in its limited training data
-
- ## 📝 Citation
+ ## Citation
 
  ```bibtex
  @misc{sage_model_2026,
@@ -279,21 +209,20 @@ As a small educational model trained on curated technical text:
    year = {2026},
    publisher = {Hugging Face},
    journal = {Hugging Face Model Hub},
-   doi = {10.57967/hf/0000},
    url = {https://huggingface.co/itriedcoding/Sage}
  }
  ```
 
- ## 🔄 Training Reproducibility
+ ## Training Reproducibility
 
  To reproduce this model:
- 1. Clone this repository
- 2. Install requirements: `pip install torch torchvision torchaudio pandas`
- 3. Run training: `python train_model.py`
- 4. The model will be saved as `custom_llm_model.pth`
+ 1. Clone the repository
+ 2. Install requirements: `pip install torch pandas`
+ 3. Run training: The model was trained using the script in `train_model.py`
+ 4. The trained checkpoint is saved as a PyTorch .pth file
 
- ## 📞 Contact
+ ## Contact
 
- For questions or collaboration opportunities:
  - Hugging Face: https://huggingface.co/itriedcoding
- - Model Issues: Use the "Issues" tab on this model page
+ - Model Space: https://huggingface.co/spaces/itriedcoding/sage-space
+ - Issues: Use the "Issues" tab on this model page
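The README's new parameter figure (~3,195,944) and its "Tensors: 52" entry can be cross-checked with plain arithmetic. The sketch below is not part of the commit. It assumes the shapes implied by the README (embedding size 256, feed-forward size 1024, 4 layers, 8 heads, an output layer with 40 outputs) plus a learned positional table of 64 rows, and it borrows the tensor names from the mapping in `gguf_convert.py` in this same commit. If the real checkpoint differs from these assumptions, the totals would differ too.

```python
# Assumed hyperparameters, read off the README's overview and architecture dump.
D_MODEL, D_FF, N_LAYERS, VOCAB, MAX_LEN = 256, 1024, 4, 40, 64

def sage_tensor_shapes():
    """Return {tensor_name: shape} for the assumed Sage state dict."""
    shapes = {
        "embedding.weight": (VOCAB, D_MODEL),
        "pos_embedding.weight": (MAX_LEN, D_MODEL),  # assumption: learned positions
        "output_layer.weight": (VOCAB, D_MODEL),
        "output_layer.bias": (VOCAB,),
    }
    for i in range(N_LAYERS):
        p = f"transformer_encoder.layers.{i}"
        shapes.update({
            f"{p}.self_attn.in_proj_weight": (3 * D_MODEL, D_MODEL),  # fused q/k/v
            f"{p}.self_attn.in_proj_bias": (3 * D_MODEL,),
            f"{p}.self_attn.out_proj.weight": (D_MODEL, D_MODEL),
            f"{p}.self_attn.out_proj.bias": (D_MODEL,),
            f"{p}.linear1.weight": (D_FF, D_MODEL),
            f"{p}.linear1.bias": (D_FF,),
            f"{p}.linear2.weight": (D_MODEL, D_FF),
            f"{p}.linear2.bias": (D_MODEL,),
            f"{p}.norm1.weight": (D_MODEL,),
            f"{p}.norm1.bias": (D_MODEL,),
            f"{p}.norm2.weight": (D_MODEL,),
            f"{p}.norm2.bias": (D_MODEL,),
        })
    return shapes

def numel(shape):
    """Number of scalar elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

shapes = sage_tensor_shapes()
total_params = sum(numel(s) for s in shapes.values())
print(len(shapes), total_params)  # 52 3195944
```

Under these assumptions the count comes out to 52 tensors and 3,195,944 parameters, which agrees with both README figures. That suggests the "52" in the File Details list counts tensors rather than layers.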
gguf_convert.py ADDED
@@ -0,0 +1,112 @@
+ import torch
+ import gguf
+ import numpy as np
+ import os
+ import sys
+ import pickle
+
+ # Character tokenizer class for loading the checkpoint
+ class CharacterTokenizer:
+     def __init__(self):
+         self.char_to_idx = {}
+         self.idx_to_char = {}
+         self.vocab_size = 0
+         self.pad_token_id = 0
+         self.unk_token_id = 1
+     def fit(self, texts):
+         chars = set()
+         for text in texts:
+             chars.update(list(str(text)))
+         self.char_to_idx['<PAD>'] = 0
+         self.char_to_idx['<UNK>'] = 1
+         for i, char in enumerate(sorted(chars)):
+             self.char_to_idx[char] = i + 2
+         self.idx_to_char = {v: k for k, v in self.char_to_idx.items()}
+         self.vocab_size = len(self.char_to_idx)
+     def encode(self, text, max_length=None, padding=False, truncation=False, return_tensors=None):
+         if isinstance(text, str):
+             text = [text]
+         encoded = []
+         for t in text:
+             tokens = [self.char_to_idx.get(c, self.unk_token_id) for c in str(t)]
+             if truncation and max_length:
+                 tokens = tokens[:max_length]
+             if padding and max_length:
+                 tokens = tokens + [self.pad_token_id] * (max_length - len(tokens))
+             encoded.append(tokens)
+         if return_tensors == 'pt':
+             return torch.tensor(encoded, dtype=torch.long)
+         return encoded
+     def decode(self, token_ids):
+         if isinstance(token_ids, torch.Tensor):
+             token_ids = token_ids.tolist()
+         chars = [self.idx_to_char.get(idx, '<UNK>') for idx in token_ids]
+         return ''.join(chars)
+
+ def convert_sage_to_gguf(model_path, output_path):
+     checkpoint = torch.load(model_path, map_location='cpu', weights_only=False)
+     state_dict = checkpoint['model_state_dict']
+
+     gguf_writer = gguf.GGUFWriter(output_path, "transformer_lm")
+
+     # Add metadata
+     gguf_writer.add_context_length(64)
+     gguf_writer.add_embedding_length(256)
+     gguf_writer.add_block_count(4)
+     gguf_writer.add_feed_forward_length(1024)
+     gguf_writer.add_head_count(8)
+     gguf_writer.add_head_count_kv(8)
+     gguf_writer.add_vocab_size(checkpoint['model_config']['vocab_size'])
+     gguf_writer.add_layer_norm_rms_eps(1e-5)
+     gguf_writer.add_name("Sage")
+     gguf_writer.add_license("MIT")
+
+     # Map Sage's tensor names to GGUF format
+     tensor_map = {}
+
+     # Embedding layers
+     tensor_map['embedding.weight'] = 'token_embd.weight'
+     tensor_map['pos_embedding.weight'] = 'position_embd.weight'
+     tensor_map['output_layer.weight'] = 'output.weight'
+     tensor_map['output_layer.bias'] = 'output.bias'
+
+     # Per-layer mappings
+     for i in range(4):
+         p = f'transformer_encoder.layers.{i}'
+         tensor_map[f'{p}.self_attn.in_proj_weight'] = f'blk.{i}.attn_q.weight'
+         tensor_map[f'{p}.self_attn.in_proj_bias'] = f'blk.{i}.attn_q.bias'
+         tensor_map[f'{p}.self_attn.out_proj.weight'] = f'blk.{i}.attn_output.weight'
+         tensor_map[f'{p}.self_attn.out_proj.bias'] = f'blk.{i}.attn_output.bias'
+         tensor_map[f'{p}.linear1.weight'] = f'blk.{i}.ffn_gate.weight'
+         tensor_map[f'{p}.linear1.bias'] = f'blk.{i}.ffn_gate.bias'
+         tensor_map[f'{p}.linear2.weight'] = f'blk.{i}.ffn_down.weight'
+         tensor_map[f'{p}.linear2.bias'] = f'blk.{i}.ffn_down.bias'
+         tensor_map[f'{p}.norm1.weight'] = f'blk.{i}.attn_norm.weight'
+         tensor_map[f'{p}.norm1.bias'] = f'blk.{i}.attn_norm.bias'
+         tensor_map[f'{p}.norm2.weight'] = f'blk.{i}.ffn_norm.weight'
+         tensor_map[f'{p}.norm2.bias'] = f'blk.{i}.ffn_norm.bias'
+
+     # Write tensors
+     for orig_name in state_dict:
+         tensor = state_dict[orig_name]
+         mapped_name = tensor_map.get(orig_name, orig_name)
+         arr = tensor.numpy().astype(np.float32)
+         gguf_writer.add_tensor(mapped_name, arr)
+
+     gguf_writer.write_header_to_file()
+     gguf_writer.write_kv_data_to_file()
+     gguf_writer.write_tensors_to_file()
+     gguf_writer.close()
+
+     print(f"GGUF file created: {output_path}")
+     print(f"Total tensors written: {len(state_dict)}")
+     print(f"NOTE: This GGUF file uses a custom architecture 'transformer_lm'")
+     print(f" and will NOT load in standard llama.cpp/llama-cpp-python")
+     print(f" without adding custom architecture support.")
+
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+ pytorch_bin = os.path.join(script_dir, "pytorch_model.bin")
+ if os.path.exists(pytorch_bin):
+     convert_sage_to_gguf(pytorch_bin, "sage-f16.gguf")
+ else:
+     print(f"Model file {pytorch_bin} not found")
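One detail of the tensor mapping matters for anyone writing the custom loader the README mentions: the script stores PyTorch's fused `in_proj_weight` and `in_proj_bias` under the `blk.{i}.attn_q.*` names. When query, key and value share one embedding size, `nn.MultiheadAttention` stacks the three projections along the first axis in q, k, v order, so a loader would need to split that tensor itself. A minimal sketch of the split follows; `split_in_proj` is a hypothetical helper, not part of the commit, and it uses plain row slicing so it works on nested lists as well as on NumPy arrays or tensors.

```python
def split_in_proj(rows, d_model):
    """Split a fused projection with 3*d_model rows into (q, k, v) row blocks.

    Assumes the q, k, v stacking order along the first axis.
    """
    if len(rows) != 3 * d_model:
        raise ValueError(f"expected {3 * d_model} rows, got {len(rows)}")
    return rows[:d_model], rows[d_model:2 * d_model], rows[2 * d_model:]

# Toy example with d_model=2: six rows tagged by the projection they belong to.
fused = [["q0"], ["q1"], ["k0"], ["k1"], ["v0"], ["v1"]]
q, k, v = split_in_proj(fused, d_model=2)
print(q, k, v)
```

For Sage the call would be `split_in_proj(weight, 256)` on each layer's 768-row tensor, with the same slicing applied to the 768-element bias.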
sage-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b575f6e96cc39676e7d7b841cc990965477161f00d7831c5cf21d18d6a2a21e6
+ size 12787200
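The `size` field of the LFS pointer gives a rough check on the stored precision. The sketch below is not part of the commit. It takes the README's parameter count as given and compares the tensor payload at 2 bytes per value (float16) and 4 bytes per value (float32) against the pointer size. The exact size of the GGUF header, metadata and alignment padding is not known here, so the last value is only a plausibility check.

```python
TOTAL_PARAMS = 3_195_944      # parameter count stated in the README
LFS_SIZE_BYTES = 12_787_200   # "size" field of the sage-f16.gguf LFS pointer

def payload_bytes(n_params, bytes_per_param):
    """Raw tensor payload, ignoring GGUF header, metadata and padding."""
    return n_params * bytes_per_param

f16_bytes = payload_bytes(TOTAL_PARAMS, 2)
f32_bytes = payload_bytes(TOTAL_PARAMS, 4)
overhead_if_f32 = LFS_SIZE_BYTES - f32_bytes
print(f16_bytes, f32_bytes, overhead_if_f32)  # 6391888 12783776 3424
```

A float32 payload leaves about 3.4 KB for header and metadata, while a float16 payload would account for only about half of the file. That fits the `astype(np.float32)` call in `gguf_convert.py` and suggests the file holds float32 tensors despite the `f16` name and the README's "Precision: Float16" entry. Reading the tensor types with a GGUF reader would settle it.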