techpro-saida committed · verified · 1 parent: 6f095d7

Commit b5f85a3: Update README.md

Files changed (1): README.md (+58 −0)
README.md CHANGED
@@ -31,6 +31,64 @@ It was trained on a curated dataset of software design patterns, debugging tips,
  - **Training Objective:** Causal language modeling

  ---
+ ## Model Configuration
+
+ | **Parameter** | **Value** |
+ | ----------------------------- | ------------------------------------- |
+ | **Model Type** | `mistral` |
+ | **Architecture** | `MistralForCausalLM` |
+ | **Vocab Size** | 32,768 |
+ | **Max Position Embeddings** | 32,768 |
+ | **Hidden Size** | 4,096 |
+ | **Intermediate Size** | 14,336 |
+ | **Number of Hidden Layers** | 32 |
+ | **Number of Attention Heads** | 32 |
+ | **Number of Key-Value Heads** | 8 |
+ | **Hidden Activation** | `silu` |
+ | **Initializer Range** | 0.02 |
+ | **RMS Norm Epsilon** | 1e-5 |
+ | **Dropout (Attention)** | 0.0 |
+ | **Use Cache** | True |
+ | **RoPE Theta** | 1,000,000.0 |
+ | **Quantization Method** | `bitsandbytes` |
+ | **Quantization Config** | 4-bit (nf4), `bfloat16` compute dtype |
+ | **Compute Dtype** | `float16` |
+ | **Load In 4bit** | ✅ Yes |
+ | **Load In 8bit** | ❌ No |
+ | **Tie Word Embeddings** | False |
+ | **Is Encoder-Decoder** | False |
+ | **BOS Token ID** | 1 |
+ | **EOS Token ID** | 2 |
+ | **Pad Token ID** | None |
+ | **Generation Settings** | |
+ | → Max Length | 20 |
+ | → Min Length | 0 |
+ | → Temperature | 1.0 |
+ | → Top-k | 50 |
+ | → Top-p | 1.0 |
+ | → Num Beams | 1 |
+ | → Repetition Penalty | 1.0 |
+ | → Early Stopping | False |
+ | **ID → Label Map** | {0: `LABEL_0`, 1: `LABEL_1`} |
+ | **Label → ID Map** | {'LABEL_0': 0, 'LABEL_1': 1} |
+ | **Training Framework** | Transformers v4.57.1 |
+ | **Quant Library** | bitsandbytes |
+ | **Local Path / Repo** | `./msci_software_engineering_slm_v1` |
+
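The attention rows above (32 attention heads, 8 key-value heads) imply a grouped-query attention (GQA) layout, standard for Mistral-family models. A minimal sketch of the shapes this table implies, using only arithmetic on the values listed (no model loading; the KV-cache estimate assumes a float16 cache, which is an assumption, not something stated in the table):

```python
# Derive per-head and KV-cache shapes from the configuration table above.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
num_hidden_layers = 32

head_dim = hidden_size // num_attention_heads           # dimension of each attention head
kv_groups = num_attention_heads // num_key_value_heads  # query heads sharing one KV head

# KV-cache bytes per token, assuming a float16 cache (2 bytes per value):
# 2 tensors (K and V) x layers x kv_heads x head_dim x 2 bytes
kv_cache_bytes_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * 2

print(head_dim)                  # 128
print(kv_groups)                 # 4
print(kv_cache_bytes_per_token)  # 131072 (= 128 KiB per cached token)
```

With 8 KV heads instead of 32, the cache is 4x smaller than full multi-head attention would require, which is the point of GQA at long context lengths.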
+ ## Quantization
+
+ | **Parameter** | **Value** |
+ | --------------------------- | -------------- |
+ | `_load_in_4bit` | True |
+ | `_load_in_8bit` | False |
+ | `bnb_4bit_compute_dtype` | `bfloat16` |
+ | `bnb_4bit_quant_storage` | `uint8` |
+ | `bnb_4bit_quant_type` | `nf4` |
+ | `bnb_4bit_use_double_quant` | False |
+ | `load_in_4bit` | True |
+ | `load_in_8bit` | False |
+ | `quant_method` | `bitsandbytes` |
+
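Since `quant_method` is `bitsandbytes`, the table above reads as a serialized `transformers` `BitsAndBytesConfig`. A hedged loading sketch under that assumption — it presumes the local path from the configuration table exists, that `transformers` and `bitsandbytes` are installed, and that a CUDA device is available; it is not a loading recipe documented by this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Mirror the quantization table above: 4-bit NF4 weights, bfloat16 compute,
# no double quantization (uint8 storage is the bitsandbytes default).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# Path taken from the configuration table; adjust if the checkpoint lives elsewhere.
model_path = "./msci_software_engineering_slm_v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
```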
 
  ## Training Data