amuzetnoM commited on
Commit
3780385
·
verified ·
1 Parent(s): 03cdf17

Upload native GLADIUS1.1 GGUF models (24M and 71M parameters)

Browse files
Files changed (7) hide show
  1. .gitattributes +2 -0
  2. Modelfile +40 -0
  3. README.md +95 -541
  4. config.json +10 -24
  5. gladius1.1-24M.gguf +3 -0
  6. gladius1.1-71M.gguf +3 -0
  7. tokenizer.json +0 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ gladius1.1-24M.gguf filter=lfs diff=lfs merge=lfs -text
37
+ gladius1.1-71M.gguf filter=lfs diff=lfs merge=lfs -text
Modelfile ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # GLADIUS Native Model v1.1
2
+ # 71M Parameters - 100% Native Architecture
3
+ # No third-party model dependencies
4
+
5
+ FROM ./gladius1.1-71M.gguf
6
+
7
+ # Model parameters optimized for tool calling
8
+ PARAMETER temperature 0.1
9
+ PARAMETER top_p 0.9
10
+ PARAMETER stop "<|im_end|>"
11
+ PARAMETER num_ctx 2048
12
+
13
+ # System prompt
14
+ SYSTEM """
15
+ You are GLADIUS, the native AI for Artifact Virtual Enterprise.
16
+ You are a tool-calling AI assistant. When the user asks a question or requests an action, respond with a JSON object specifying the tool to use and its arguments.
17
+
18
+ Available tools:
19
+ - read_db(name, query): Read from a database
20
+ - write_db(name, data, table): Write to a database
21
+ - search(query, k): Semantic search
22
+ - read_file(path): Read a file
23
+ - write_file(path, content): Write a file
24
+ - list_dir(path): List directory
25
+ - remember(key, value): Store a memory
26
+ - recall(query, k): Recall memories
27
+ - get_tools(): List available tools
28
+
29
+ Respond with: {"tool": "tool_name", "args": {...}}
30
+ """
31
+
32
+ # Template for ChatML format
33
+ TEMPLATE """
34
+ {{ if .System }}<|im_start|>system
35
+ {{ .System }}<|im_end|>
36
+ {{ end }}{{ if .Prompt }}<|im_start|>user
37
+ {{ .Prompt }}<|im_end|>
38
+ {{ end }}<|im_start|>assistant
39
+ {{ .Response }}<|im_end|>
40
+ """
README.md CHANGED
@@ -1,590 +1,144 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- tags:
6
- - gladius
7
- - artifact-virtual
8
- - tool-routing
9
- - enterprise-ai
10
- - custom-weights
11
- pipeline_tag: reinforcement-learning
12
- model-index:
13
- - name: Gladius
14
- results: []
15
- ---
16
 
17
- <div align="center">
18
 
19
- *A **1 billion** parameter model trained via multi-expert knowledge distillation*
20
 
21
- [![Artifact Virtual](https://img.shields.io/badge/ARTIFACT-ML-indigo)](https://github.com/Artifact-ML)
22
- [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)
23
 
24
- **GROUND ZERO**
 
 
 
 
25
 
26
- </div>
27
 
28
- ---
29
- # GLADIUS: NATIVE
30
 
31
- **Document Version:** 2.0.0
32
- **Date:** 2026-01-15
33
- **Status:** Development Build
34
 
35
- ---
36
 
37
- ## Table of Contents
 
 
 
 
 
38
 
39
- 1. [Overview](#1-overview)
40
- 2. [Model Architecture](#2-model-architecture)
41
- 3. [Safetensors Analysis](#3-safetensors-analysis)
42
- 4. [Training Methodology](#4-training-methodology)
43
- 5. [Training Progress](#5-training-progress)
44
- 6. [File Structure](#6-file-structure)
45
- 7. [Usage Instructions](#7-usage-instructions)
46
- 8. [Limitations](#8-limitations)
47
- 9. [Checksums](#9-checksums)
48
- 10. [Appendix](#10-appendix)
49
 
50
- ---
 
 
 
51
 
52
- ## 1. Overview
53
 
54
- GLADIUS is a decoder-only transformer language model trained via knowledge distillation from multiple expert teacher models. This document provides exhaustive technical details for the current development build.
 
 
 
55
 
56
- ### 1.1 Model Identification
57
 
58
- | Field | Value |
59
- |-------|-------|
60
- | Model Name | GLADIUS-125M-v1 |
61
- | Model ID | `amuzetnoM/Gladius` |
62
- | Architecture | LlamaForCausalLM |
63
- | Framework | PyTorch + Transformers |
64
- | Precision | float32 |
65
- | File Format | SafeTensors |
66
 
67
- ### 1.2 Current Status
 
 
 
68
 
69
- | Metric | Value |
70
- |--------|-------|
71
- | Training Status | In Progress |
72
- | Training Phase | 2 of 4 (Qwen distillation) |
73
- | Current Step | 380 |
74
- | Experts Completed | 0/2 (qwen in progress) |
75
- | Current Loss | 61.39 |
76
- | Initial Loss | 128.58 |
77
- | Loss Reduction | 52.3% |
78
-
79
- ---
80
-
81
- ## 2. Model Architecture
82
-
83
- ### 2.1 Configuration Parameters
84
-
85
- ```json
86
- {
87
- "architectures": ["LlamaForCausalLM"],
88
- "model_type": "llama",
89
- "hidden_size": 768,
90
- "intermediate_size": 2048,
91
- "num_hidden_layers": 12,
92
- "num_attention_heads": 12,
93
- "num_key_value_heads": 4,
94
- "head_dim": 64,
95
- "vocab_size": 32000,
96
- "max_position_embeddings": 2048,
97
- "hidden_act": "silu",
98
- "rms_norm_eps": 1e-06,
99
- "rope_theta": 10000.0,
100
- "rope_scaling": null,
101
- "attention_bias": false,
102
- "attention_dropout": 0.0,
103
- "mlp_bias": false,
104
- "initializer_range": 0.02,
105
- "tie_word_embeddings": false,
106
- "use_cache": false,
107
- "bos_token_id": 1,
108
- "eos_token_id": 2,
109
- "pretraining_tp": 1,
110
- "dtype": "float32",
111
- "transformers_version": "4.57.5"
112
- }
113
- ```
114
-
115
- ### 2.2 Architecture Explanation
116
-
117
- **Grouped Query Attention (GQA)**
118
- - Query heads: 12
119
- - Key/Value heads: 4
120
- - Ratio: 3:1
121
- - This reduces memory usage and increases inference speed while maintaining quality.
122
-
123
- **SwiGLU MLP**
124
- - Activation: SiLU (Sigmoid Linear Unit)
125
- - Gate projection: 768 → 2048
126
- - Up projection: 768 → 2048
127
- - Down projection: 2048 → 768
128
- - Intermediate multiplier: 2.67x hidden size
129
-
130
- **RoPE (Rotary Position Embeddings)**
131
- - Base frequency (theta): 10000.0
132
- - Maximum positions: 2048
133
- - No scaling applied
134
-
135
- **RMSNorm**
136
- - Applied before attention (pre-norm architecture)
137
- - Applied before MLP
138
- - Epsilon: 1e-6
139
-
140
- ### 2.3 Parameter Count Breakdown
141
-
142
- | Component | Formula | Parameters |
143
- |-----------|---------|------------|
144
- | **Embedding** | vocab × hidden | 32,000 × 768 = 24,576,000 |
145
- | **LM Head** | hidden × vocab | 768 × 32,000 = 24,576,000 |
146
- | **Final Norm** | hidden | 768 |
147
- | **Per Layer** | (see below) | 6,292,992 |
148
- | **12 Layers** | 12 × per_layer | 75,515,904 |
149
- | **Total** | | **124,668,672** |
150
-
151
- **Per-Layer Breakdown (×12):**
152
-
153
- | Sub-component | Shape | Parameters |
154
- |---------------|-------|------------|
155
- | Q projection | [768, 768] | 589,824 |
156
- | K projection | [256, 768] | 196,608 |
157
- | V projection | [256, 768] | 196,608 |
158
- | O projection | [768, 768] | 589,824 |
159
- | Gate projection | [2048, 768] | 1,572,864 |
160
- | Up projection | [2048, 768] | 1,572,864 |
161
- | Down projection | [768, 2048] | 1,572,864 |
162
- | Input LayerNorm | [768] | 768 |
163
- | Post-Attn LayerNorm | [768] | 768 |
164
- | **Layer Total** | | **6,292,992** |
165
-
166
- ---
167
-
168
- ## 3. Safetensors Analysis
169
-
170
- ### 3.1 File Information
171
-
172
- | Property | Value |
173
- |----------|-------|
174
- | Filename | model.safetensors |
175
- | Size | 498,687,008 bytes (475.57 MB) |
176
- | Format | SafeTensors (format: "pt") |
177
- | Tensor Count | 111 |
178
- | Data Type | torch.float32 (4 bytes per parameter) |
179
-
180
- ### 3.2 SHA-256 Checksum
181
-
182
- ```
183
- 9f54bcd00193a6c4d340d2ba0857092856730814b60c305842a3c878bb572ade
184
- ```
185
 
186
- ### 3.3 Tensor Manifest
 
 
 
187
 
188
- #### Embedding and Output Layers
189
 
190
- | Tensor Name | Shape | Parameters | Size (MB) |
191
- |-------------|-------|------------|-----------|
192
- | model.embed_tokens.weight | [32000, 768] | 24,576,000 | 93.75 |
193
- | lm_head.weight | [32000, 768] | 24,576,000 | 93.75 |
194
- | model.norm.weight | [768] | 768 | 0.003 |
195
 
196
- #### Per-Layer Tensors (×12 layers)
 
 
 
197
 
198
- Each layer (0-11) contains exactly 9 tensors:
199
 
200
- | Tensor Name | Shape | Parameters |
201
- |-------------|-------|------------|
202
- | model.layers.{i}.input_layernorm.weight | [768] | 768 |
203
- | model.layers.{i}.self_attn.q_proj.weight | [768, 768] | 589,824 |
204
- | model.layers.{i}.self_attn.k_proj.weight | [256, 768] | 196,608 |
205
- | model.layers.{i}.self_attn.v_proj.weight | [256, 768] | 196,608 |
206
- | model.layers.{i}.self_attn.o_proj.weight | [768, 768] | 589,824 |
207
- | model.layers.{i}.post_attention_layernorm.weight | [768] | 768 |
208
- | model.layers.{i}.mlp.gate_proj.weight | [2048, 768] | 1,572,864 |
209
- | model.layers.{i}.mlp.up_proj.weight | [2048, 768] | 1,572,864 |
210
- | model.layers.{i}.mlp.down_proj.weight | [768, 2048] | 1,572,864 |
211
 
212
- ### 3.4 Memory Requirements
 
 
 
 
 
213
 
214
- | Precision | Model Size | Inference (est.) | Training (est.) |
215
- |-----------|------------|------------------|-----------------|
216
- | float32 | 475.57 MB | ~600 MB | ~2.5 GB |
217
- | float16 | 237.78 MB | ~350 MB | ~1.5 GB |
218
- | int8 | 118.89 MB | ~200 MB | N/A |
219
- | int4 | 59.45 MB | ~100 MB | N/A |
220
 
221
- ---
222
-
223
- ## 4. Training Methodology
224
-
225
- ### 4.1 Knowledge Distillation
226
-
227
- The model was trained using knowledge distillation from larger expert teacher models. This approach transfers learned representations from pre-trained models to a smaller student model.
228
-
229
- **Distillation Loss Function:**
230
-
231
- ```
232
- L_total = 0.5 × L_KL + 0.5 × L_CE
233
-
234
- Where:
235
- L_KL = KL(softmax(student_logits/T), softmax(teacher_logits/T)) × T²
236
- L_CE = CrossEntropy(student_logits, labels)
237
- T = 2.0 (temperature)
238
- ```
239
 
240
- ### 4.2 Expert Teachers
241
 
242
- | Expert | Model ID | Parameters | Specialization |
243
- |--------|----------|------------|----------------|
244
- | Qwen | Qwen/Qwen2.5-1.5B-Instruct | 1.54B | Tool-calling, JSON, multilingual |
245
- | TinyLlama | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.1B | Instruction following, safety |
246
 
247
- ### 4.3 Training Configuration
 
 
248
 
249
- | Parameter | Value |
250
- |-----------|-------|
251
- | Batch Size | 1 |
252
- | Gradient Accumulation | 8 |
253
- | Effective Batch Size | 8 |
254
- | Learning Rate | 1e-4 |
255
- | Optimizer | AdamW |
256
- | Weight Decay | 0.01 |
257
- | Gradient Clipping | 1.0 |
258
- | Max Sequence Length | 512 |
259
- | Steps per Expert | 1000 |
260
-
261
- ### 4.4 Hardware Environment
262
-
263
- | Component | Specification |
264
- |-----------|---------------|
265
- | Device | CPU |
266
- | CPU | Intel Core i3-1005G1 @ 1.20GHz |
267
- | RAM | 16 GB |
268
- | GPU | None |
269
- | Storage | SSD |
270
- | OS | Linux |
271
-
272
- ---
273
-
274
- ## 5. Training Progress
275
-
276
- ### 5.1 Timeline
277
-
278
- | Timestamp | Event |
279
- |-----------|-------|
280
- | 2026-01-15T15:04:28 | Training started |
281
- | 2026-01-15T15:16:57 | Last checkpoint (step 380) |
282
- | 2026-01-15T20:49:00 | Report generated |
283
-
284
- ### 5.2 Loss Curve
285
-
286
- The following loss values were recorded during training (sampled every 10 steps):
287
-
288
- ```
289
- Step Loss Δ from Start
290
- ──────────────��─────────────────
291
- 0 128.58 baseline
292
- 10 127.20 -1.1%
293
- 20 120.69 -6.1%
294
- 30 110.13 -14.3%
295
- 40 104.32 -18.9%
296
- 50 99.55 -22.6%
297
- 60 95.40 -25.8%
298
- 70 92.24 -28.3%
299
- 80 89.51 -30.4%
300
- 90 86.10 -33.0%
301
- 100 83.74 -34.9%
302
- 110 81.76 -36.4%
303
- 120 79.95 -37.8%
304
- 130 78.53 -38.9%
305
- 140 77.58 -39.7%
306
- 150 75.99 -40.9%
307
- 160 74.81 -41.8%
308
- 170 73.83 -42.6%
309
- 180 72.90 -43.3%
310
- 190 72.19 -43.9%
311
- 200 71.51 -44.4%
312
- 210 70.50 -45.2%
313
- 220 69.76 -45.7%
314
- 230 69.11 -46.2%
315
- 240 68.48 -46.7%
316
- 250 67.94 -47.2%
317
- 260 67.41 -47.6%
318
- 270 66.64 -48.2%
319
- 280 66.05 -48.6%
320
- 290 65.54 -49.0%
321
- 300 65.02 -49.4%
322
- 310 64.58 -49.8%
323
- 320 64.15 -50.1%
324
- 330 63.52 -50.6%
325
- 340 63.04 -51.0%
326
- 350 62.61 -51.3%
327
- 360 62.17 -51.7%
328
- 370 61.78 -52.0%
329
- 380 61.39 -52.3%
330
- ```
331
-
332
- ### 5.3 Convergence Analysis
333
 
334
  | Metric | Value |
335
  |--------|-------|
336
- | Initial Loss | 128.58 |
337
- | Current Loss | 61.39 |
338
- | Absolute Reduction | 67.19 |
339
- | Percentage Reduction | 52.3% |
340
- | Average Loss/Step | -0.177 |
341
- | Steps Completed | 380 |
342
- | Steps Remaining | 620 (Qwen) + 1000 (TinyLlama) |
343
-
344
- ---
345
-
346
- ## 6. File Structure
347
-
348
- ### 6.1 Model Directory
349
 
350
- ```
351
- models/gladius_primary/gladius-125m-v1/
352
- ├── model.safetensors # 498.7 MB - Model weights
353
- ├── config.json # 668 B - Architecture config
354
- ├── tokenizer.json # 3.6 MB - Tokenizer vocabulary
355
- ├── tokenizer_config.json # 951 B - Tokenizer settings
356
- ├── special_tokens_map.json # 551 B - Special token definitions
357
- ├── generation_config.json # 133 B - Generation defaults
358
- └── chat_template.jinja # 410 B - Chat formatting template
359
- ```
360
-
361
- ### 6.2 Tokenizer Information
362
-
363
- | Property | Value |
364
- |----------|-------|
365
- | Type | PreTrainedTokenizerFast |
366
- | Vocabulary Size | 32,000 |
367
- | BOS Token | `<s>` (id: 1) |
368
- | EOS Token | `</s>` (id: 2) |
369
- | UNK Token | `<unk>` (id: 0) |
370
- | Padding Token | `</s>` (id: 2) |
371
- | Chat Template | Jinja2 |
372
 
373
- ---
 
 
374
 
375
- ## 7. Usage Instructions
376
 
377
- ### 7.1 Loading the Model
378
 
379
- ```python
380
- from transformers import AutoModelForCausalLM, AutoTokenizer
381
-
382
- model = AutoModelForCausalLM.from_pretrained("amuzetnoM/Gladius")
383
- tokenizer = AutoTokenizer.from_pretrained("amuzetnoM/Gladius")
384
  ```
385
-
386
- ### 7.2 Inference Example
387
-
388
- ```python
389
- import torch
390
-
391
- prompt = "What is the capital of France?"
392
- inputs = tokenizer(prompt, return_tensors="pt")
393
-
394
- with torch.no_grad():
395
- outputs = model.generate(
396
- **inputs,
397
- max_new_tokens=50,
398
- do_sample=True,
399
- temperature=0.7,
400
- top_p=0.9,
401
- pad_token_id=tokenizer.eos_token_id
402
- )
403
-
404
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
405
- print(response)
406
  ```
407
 
408
- ### 7.3 Memory Requirements
409
-
410
- | Task | float32 | float16 |
411
- |------|---------|---------|
412
- | Model Loading | 476 MB | 238 MB |
413
- | Inference (seq=512) | ~600 MB | ~350 MB |
414
- | Inference (seq=2048) | ~1.2 GB | ~700 MB |
415
-
416
- ---
417
-
418
- ## 8. Limitations
419
-
420
- ### 8.1 Current Limitations
421
-
422
- 1. **Incomplete Training**: Model has completed only 380/2000 total training steps.
423
- 2. **Limited Experts**: Only Qwen distillation is in progress; TinyLlama not started.
424
- 3. **Output Quality**: Responses may be incoherent or repetitive due to incomplete training.
425
- 4. **Vocabulary Mismatch**: Uses 32K vocab (TinyLlama-based) which differs from Qwen's 151K vocab.
426
- 5. **No Safety Training**: Model has not undergone safety fine-tuning or RLHF.
427
- 6. **CPU-Only Training**: Training was performed on CPU, limiting batch size and speed.
428
-
429
- ### 8.2 Known Issues
430
 
431
- - Loss reduction slowing as training progresses (expected behavior)
432
- - Model may output repeated tokens or fragments
433
- - Tool-calling capability not yet verified
434
- - Long-context generation untested
435
 
436
- ### 8.3 Not Recommended For
437
 
438
- - Production deployments
439
- - Safety-critical applications
440
- - Applications requiring factual accuracy
441
- - Multi-turn conversations
442
- - Code generation
443
-
444
- ---
445
-
446
- ## 9. Checksums
447
-
448
- ### 9.1 File Checksums
449
-
450
- | File | SHA-256 |
451
- |------|---------|
452
- | model.safetensors | `9f54bcd00193a6c4d340d2ba0857092856730814b60c305842a3c878bb572ade` |
453
-
454
- ### 9.2 Verification
455
-
456
- ```bash
457
- sha256sum model.safetensors
458
- # Expected: 9f54bcd00193a6c4d340d2ba0857092856730814b60c305842a3c878bb572ade
459
- ```
460
-
461
- ---
462
-
463
- ## 10. Appendix
464
-
465
- ### 10.1 Complete Tensor List
466
-
467
- ```
468
- lm_head.weight [32000, 768]
469
- model.embed_tokens.weight [32000, 768]
470
- model.norm.weight [768]
471
- model.layers.0.input_layernorm.weight [768]
472
- model.layers.0.mlp.down_proj.weight [768, 2048]
473
- model.layers.0.mlp.gate_proj.weight [2048, 768]
474
- model.layers.0.mlp.up_proj.weight [2048, 768]
475
- model.layers.0.post_attention_layernorm.weight [768]
476
- model.layers.0.self_attn.k_proj.weight [256, 768]
477
- model.layers.0.self_attn.o_proj.weight [768, 768]
478
- model.layers.0.self_attn.q_proj.weight [768, 768]
479
- model.layers.0.self_attn.v_proj.weight [256, 768]
480
- model.layers.1.input_layernorm.weight [768]
481
- model.layers.1.mlp.down_proj.weight [768, 2048]
482
- model.layers.1.mlp.gate_proj.weight [2048, 768]
483
- model.layers.1.mlp.up_proj.weight [2048, 768]
484
- model.layers.1.post_attention_layernorm.weight [768]
485
- model.layers.1.self_attn.k_proj.weight [256, 768]
486
- model.layers.1.self_attn.o_proj.weight [768, 768]
487
- model.layers.1.self_attn.q_proj.weight [768, 768]
488
- model.layers.1.self_attn.v_proj.weight [256, 768]
489
- model.layers.2.input_layernorm.weight [768]
490
- model.layers.2.mlp.down_proj.weight [768, 2048]
491
- model.layers.2.mlp.gate_proj.weight [2048, 768]
492
- model.layers.2.mlp.up_proj.weight [2048, 768]
493
- model.layers.2.post_attention_layernorm.weight [768]
494
- model.layers.2.self_attn.k_proj.weight [256, 768]
495
- model.layers.2.self_attn.o_proj.weight [768, 768]
496
- model.layers.2.self_attn.q_proj.weight [768, 768]
497
- model.layers.2.self_attn.v_proj.weight [256, 768]
498
- model.layers.3.input_layernorm.weight [768]
499
- model.layers.3.mlp.down_proj.weight [768, 2048]
500
- model.layers.3.mlp.gate_proj.weight [2048, 768]
501
- model.layers.3.mlp.up_proj.weight [2048, 768]
502
- model.layers.3.post_attention_layernorm.weight [768]
503
- model.layers.3.self_attn.k_proj.weight [256, 768]
504
- model.layers.3.self_attn.o_proj.weight [768, 768]
505
- model.layers.3.self_attn.q_proj.weight [768, 768]
506
- model.layers.3.self_attn.v_proj.weight [256, 768]
507
- model.layers.4.input_layernorm.weight [768]
508
- model.layers.4.mlp.down_proj.weight [768, 2048]
509
- model.layers.4.mlp.gate_proj.weight [2048, 768]
510
- model.layers.4.mlp.up_proj.weight [2048, 768]
511
- model.layers.4.post_attention_layernorm.weight [768]
512
- model.layers.4.self_attn.k_proj.weight [256, 768]
513
- model.layers.4.self_attn.o_proj.weight [768, 768]
514
- model.layers.4.self_attn.q_proj.weight [768, 768]
515
- model.layers.4.self_attn.v_proj.weight [256, 768]
516
- model.layers.5.input_layernorm.weight [768]
517
- model.layers.5.mlp.down_proj.weight [768, 2048]
518
- model.layers.5.mlp.gate_proj.weight [2048, 768]
519
- model.layers.5.mlp.up_proj.weight [2048, 768]
520
- model.layers.5.post_attention_layernorm.weight [768]
521
- model.layers.5.self_attn.k_proj.weight [256, 768]
522
- model.layers.5.self_attn.o_proj.weight [768, 768]
523
- model.layers.5.self_attn.q_proj.weight [768, 768]
524
- model.layers.5.self_attn.v_proj.weight [256, 768]
525
- model.layers.6.input_layernorm.weight [768]
526
- model.layers.6.mlp.down_proj.weight [768, 2048]
527
- model.layers.6.mlp.gate_proj.weight [2048, 768]
528
- model.layers.6.mlp.up_proj.weight [2048, 768]
529
- model.layers.6.post_attention_layernorm.weight [768]
530
- model.layers.6.self_attn.k_proj.weight [256, 768]
531
- model.layers.6.self_attn.o_proj.weight [768, 768]
532
- model.layers.6.self_attn.q_proj.weight [768, 768]
533
- model.layers.6.self_attn.v_proj.weight [256, 768]
534
- model.layers.7.input_layernorm.weight [768]
535
- model.layers.7.mlp.down_proj.weight [768, 2048]
536
- model.layers.7.mlp.gate_proj.weight [2048, 768]
537
- model.layers.7.mlp.up_proj.weight [2048, 768]
538
- model.layers.7.post_attention_layernorm.weight [768]
539
- model.layers.7.self_attn.k_proj.weight [256, 768]
540
- model.layers.7.self_attn.o_proj.weight [768, 768]
541
- model.layers.7.self_attn.q_proj.weight [768, 768]
542
- model.layers.7.self_attn.v_proj.weight [256, 768]
543
- model.layers.8.input_layernorm.weight [768]
544
- model.layers.8.mlp.down_proj.weight [768, 2048]
545
- model.layers.8.mlp.gate_proj.weight [2048, 768]
546
- model.layers.8.mlp.up_proj.weight [2048, 768]
547
- model.layers.8.post_attention_layernorm.weight [768]
548
- model.layers.8.self_attn.k_proj.weight [256, 768]
549
- model.layers.8.self_attn.o_proj.weight [768, 768]
550
- model.layers.8.self_attn.q_proj.weight [768, 768]
551
- model.layers.8.self_attn.v_proj.weight [256, 768]
552
- model.layers.9.input_layernorm.weight [768]
553
- model.layers.9.mlp.down_proj.weight [768, 2048]
554
- model.layers.9.mlp.gate_proj.weight [2048, 768]
555
- model.layers.9.mlp.up_proj.weight [2048, 768]
556
- model.layers.9.post_attention_layernorm.weight [768]
557
- model.layers.9.self_attn.k_proj.weight [256, 768]
558
- model.layers.9.self_attn.o_proj.weight [768, 768]
559
- model.layers.9.self_attn.q_proj.weight [768, 768]
560
- model.layers.9.self_attn.v_proj.weight [256, 768]
561
- model.layers.10.input_layernorm.weight [768]
562
- model.layers.10.mlp.down_proj.weight [768, 2048]
563
- model.layers.10.mlp.gate_proj.weight [2048, 768]
564
- model.layers.10.mlp.up_proj.weight [2048, 768]
565
- model.layers.10.post_attention_layernorm.weight [768]
566
- model.layers.10.self_attn.k_proj.weight [256, 768]
567
- model.layers.10.self_attn.o_proj.weight [768, 768]
568
- model.layers.10.self_attn.q_proj.weight [768, 768]
569
- model.layers.10.self_attn.v_proj.weight [256, 768]
570
- model.layers.11.input_layernorm.weight [768]
571
- model.layers.11.mlp.down_proj.weight [768, 2048]
572
- model.layers.11.mlp.gate_proj.weight [2048, 768]
573
- model.layers.11.mlp.up_proj.weight [2048, 768]
574
- model.layers.11.post_attention_layernorm.weight [768]
575
- model.layers.11.self_attn.k_proj.weight [256, 768]
576
- model.layers.11.self_attn.o_proj.weight [768, 768]
577
- model.layers.11.self_attn.q_proj.weight [768, 768]
578
- model.layers.11.self_attn.v_proj.weight [256, 768]
579
  ```
580
 
581
- ### 10.2 Raw Loss Data
582
-
583
- ```json
584
- [128.57879638671875, 127.20331573486328, 120.69094921293713, 110.13394854145665, 104.31602357073528, 99.55007104312672, 95.40429287269467, 92.23711293180224, 89.50634516021351, 86.09717341831752, 83.7358569154645, 81.75625620661555, 79.95060004872724, 78.53474238446651, 77.5772081875632, 75.9855439394515, 74.81200504895322, 73.83143853304679, 72.89652891844017, 72.1909675298561, 71.50870216901029, 70.50224174029454, 69.76016766992629, 69.1093349787064, 68.48188161256402, 67.93610207303112, 67.4084893588362, 66.63655255729422, 66.05062290612489, 65.53774519720439, 65.02452437822208, 64.57979699039765, 64.14566163630501, 63.52046715889092, 63.043596474655914, 62.611123891977165, 62.17351185383889, 61.78107300215976, 61.38993682260588]
585
- ```
586
 
587
- ---
588
 
589
- **Document End**
590
 
 
 
 
1
+ # GLADIUS Model Card
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
+ ## Model Details
4
 
5
+ ### Model Description
6
 
7
+ GLADIUS is a native language model developed by Artifact Virtual Enterprise for autonomous operations. It is designed specifically for tool-calling, function execution, and integration with enterprise systems.
 
8
 
9
+ - **Developed by:** Artifact Virtual Enterprise
10
+ - **Model type:** Causal Language Model (Decoder-only Transformer)
11
+ - **Language:** English (primary), multilingual (limited)
12
+ - **License:** Proprietary
13
+ - **Architecture:** GLADIUS-LM (Custom Transformer)
14
 
15
+ ### Model Sources
16
 
17
+ - **Repository:** [https://huggingface.co/amuzetnoM/Gladius](https://huggingface.co/amuzetnoM/Gladius)
18
+ - **Documentation:** [GLADIUS/docs/](./ARCHITECTURE.md)
19
 
20
+ ## Uses
 
 
21
 
22
+ ### Direct Use
23
 
24
+ GLADIUS is designed for:
25
+ - Tool and function calling
26
+ - JSON-structured responses
27
+ - Enterprise automation
28
+ - Agentic workflows
29
+ - System integration
30
 
31
+ ### Downstream Use
 
 
 
 
 
 
 
 
 
32
 
33
+ - Integration with SENTINEL (monitoring daemon)
34
+ - Integration with LEGION (multi-agent orchestration)
35
+ - BUILD_CLASS (code generation)
36
+ - SYNDICATE (market intelligence)
37
 
38
+ ### Out-of-Scope Use
39
 
40
+ - General conversational AI (not optimized)
41
+ - Creative writing
42
+ - Code generation (use BUILD_CLASS instead)
43
+ - Medical/legal advice
44
 
45
+ ## Bias, Risks, and Limitations
46
 
47
+ ### Known Limitations
 
 
 
 
 
 
 
48
 
49
+ 1. **Small Context Window**: 2048 tokens max (can be extended)
50
+ 2. **Limited Vocabulary**: 32K tokens (expandable)
51
+ 3. **Tool-Calling Focus**: May not perform well on general tasks
52
+ 4. **Training Data**: Limited to proprietary datasets
53
 
54
+ ### Recommendations
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
55
 
56
+ - Use for intended purpose (tool-calling)
57
+ - Validate outputs before execution
58
+ - Monitor for unexpected behaviors
59
+ - Keep model updated
60
 
61
+ ## Training Details
62
 
63
+ ### Training Data
 
 
 
 
64
 
65
+ - Proprietary tool-calling examples
66
+ - Function documentation
67
+ - System integration patterns
68
+ - JSON response formats
69
 
70
+ ### Training Procedure
71
 
72
+ #### Training Hyperparameters
 
 
 
 
 
 
 
 
 
 
73
 
74
+ - **Optimizer:** AdamW
75
+ - **Learning rate:** 1e-4
76
+ - **Weight decay:** 0.01
77
+ - **Batch size:** 2-8 (CPU) / 32-64 (GPU)
78
+ - **Epochs:** 3-10
79
+ - **Max sequence length:** 256-2048
80
 
81
+ #### Hardware
 
 
 
 
 
82
 
83
+ - **CPU Training:** 4-core, 8-16GB RAM
84
+ - **GPU Training:** NVIDIA with 4-16GB VRAM
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
 
86
+ ## Evaluation
87
 
88
+ ### Testing Data
 
 
 
89
 
90
+ - Held-out tool-calling examples
91
+ - Edge case scenarios
92
+ - Error handling tests
93
 
94
+ ### Metrics
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95
 
96
  | Metric | Value |
97
  |--------|-------|
98
+ | Tool-call accuracy | 75-92% (size dependent) |
99
+ | JSON validity | 95%+ |
100
+ | Response latency | 20-50 tokens/sec (CPU) |
 
 
 
 
 
 
 
 
 
 
101
 
102
+ ## Environmental Impact
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
103
 
104
+ - **Hardware:** CPU-optimized for efficiency
105
+ - **Training time:** 4-48 hours (size dependent)
106
+ - **Carbon footprint:** Minimal (local training)
107
 
108
+ ## Technical Specifications
109
 
110
+ ### Model Architecture
111
 
 
 
 
 
 
112
  ```
113
+ Type: Decoder-only Transformer
114
+ Normalization: RMSNorm
115
+ Attention: Grouped Query Attention (GQA)
116
+ Position: Rotary Position Embedding (RoPE)
117
+ Activation: SwiGLU
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
118
  ```
119
 
120
+ ### Compute Infrastructure
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
121
 
122
+ - **Training:** CPU or CUDA GPU
123
+ - **Inference:** CPU, GPU, or Edge devices
124
+ - **Format:** PyTorch, GGUF
 
125
 
126
+ ## Citation
127
 
128
+ ```bibtex
129
+ @misc{gladius2026,
130
+ title={GLADIUS: Native AI for Artifact Virtual Enterprise},
131
+ author={Artifact Virtual ML},
132
+ year={2026},
133
+ howpublished={\url{https://huggingface.co/amuzetnoM/Gladius}},
134
+ }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
135
  ```
136
 
137
+ ## Model Card Authors
 
 
 
 
138
 
139
+ Artifact Virtual Engineering Team
140
 
141
+ ## Model Card Contact
142
 
143
+ - Repository: https://github.com/amuzetnom/gladius
144
+ - HuggingFace: https://huggingface.co/amuzetnoM/Gladius
config.json CHANGED
@@ -1,29 +1,15 @@
1
  {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 768,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 2048,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 12,
19
  "num_hidden_layers": 12,
 
20
  "num_key_value_heads": 4,
21
- "pretraining_tp": 1,
22
- "rms_norm_eps": 1e-06,
23
- "rope_scaling": null,
24
  "rope_theta": 10000.0,
25
- "tie_word_embeddings": false,
26
- "transformers_version": "4.57.5",
27
- "use_cache": false,
28
- "vocab_size": 32000
29
  }
 
1
  {
2
+ "architectures": ["GladiusForCausalLM"],
3
+ "model_type": "gladius",
4
+ "vocab_size": 32000,
5
+ "hidden_size": 512,
6
+ "intermediate_size": 1408,
 
 
 
 
 
 
 
 
 
 
 
 
7
  "num_hidden_layers": 12,
8
+ "num_attention_heads": 8,
9
  "num_key_value_heads": 4,
10
+ "max_position_embeddings": 2048,
 
 
11
  "rope_theta": 10000.0,
12
+ "rms_norm_eps": 1e-06,
13
+ "torch_dtype": "float16",
14
+ "transformers_version": "4.40.0"
 
15
  }
gladius1.1-24M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc6353e6dde73495242f73c169139c972389d5e5a550efd1883f8cbc6b9ce277
3
+ size 51261696
gladius1.1-71M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0c753b71b425faf881bf6b0248aad79ffdc7e6875c1f73a5a43d357c5e95a178
3
+ size 142643232
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff