leitaofilho committed on
Commit bcd4a45 · verified · 1 Parent(s): 94501fb

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +121 -0
  2. config.json +20 -0
  3. model.safetensors +3 -0
  4. training_state.json +5 -0
README.md ADDED
@@ -0,0 +1,121 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - pt
+ tags:
+ - monotropic-model
+ - small-language-model
+ - structural-engineering
+ - timoshenko-beam-theory
+ - curriculum-learning
+ - validated-synthetic-data
+ - physics-informed-ai
+ - mlx
+ - apple-silicon
+ pipeline_tag: text-generation
+ library_name: mlx
+ ---
+
+ # Mini-Enedina: A Domain-Specialized Small Language Model for Structural Shaft Analysis
+
+ **Mini-Enedina** is a monotropic language model -- deliberately small and intensively specialized -- with 37.5 million parameters, designed exclusively for structural shaft analysis according to Timoshenko beam theory.
+
+ ## Model Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | **Parameters** | 37.57M |
+ | **Layers** | 7 |
+ | **Attention Heads** | 8 |
+ | **Model Dimension** | 512 |
+ | **Feed-Forward Dimension** | 2048 |
+ | **Vocabulary Size** | 8,012 (8,000 BPE + 12 Harmony tokens) |
+ | **Max Sequence Length** | 14,336 tokens |
+ | **Positional Encoding** | RoPE |
+ | **Normalization** | RMSNorm (pre-norm) |
+ | **Activation** | SiLU (SwiGLU) |
+ | **Framework** | MLX (Apple Silicon) |
+ | **Precision** | BFloat16 |
+ | **Model Size** | 143 MB |
+
+ ## Training
+
+ - **Dataset:** 60,000 physically validated samples (621M tokens) of Timoshenko shaft analysis problems
+ - **Training Strategy:** Multidimensional curriculum learning with 4 phases (Foundation, Intermediate, Advanced, Full)
+ - **Three Analysis Levels:**
+   - **Bachelor:** Deflection analysis (V, M, w, theta)
+   - **Master:** + Von Mises stress analysis
+   - **Doctor:** + Fatigue evaluation (Marin factors, Goodman criterion)
+ - **Hardware:** Apple M4 Pro, 48 GB unified memory
+ - **Training Time:** ~23 hours (14,920 steps)
+ - **Optimizer:** AdamW (lr=3e-4, cosine schedule with warmup)
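The cosine-with-warmup schedule named above can be sketched as follows. The warmup length and learning-rate floor are assumptions for illustration; the card only states lr=3e-4, a cosine schedule, and 14,920 total steps:

```python
import math

def lr_schedule(step, base_lr=3e-4, warmup_steps=500, total_steps=14920, min_lr=0.0):
    """Linear warmup followed by cosine decay to min_lr.
    warmup_steps and min_lr are assumed values, not from the model card."""
    if step < warmup_steps:
        # ramp linearly from ~0 up to base_lr over the warmup phase
        return base_lr * (step + 1) / warmup_steps
    # fraction of the post-warmup schedule completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```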
52
+
+ ## Evaluation Results (6,000 held-out test samples)
+
+ | Metric | Overall | Bachelor | Master | Doctor |
+ |--------|---------|----------|--------|--------|
+ | **Loss** | 0.0787 | 0.0733 | 0.0804 | 0.0825 |
+ | **Perplexity** | 1.08 | 1.08 | 1.08 | 1.09 |
+ | **Correct Stop Token** | 94% | 97% | 100% | 85% |
+ | **Valid Harmony Structure** | 100% | 100% | 100% | 100% |
62
+
+ ## Output Format: Harmony-Enedina
+
+ The model generates structured responses using the Harmony-Enedina format with two channels:
+
+ 1. **Analysis Channel:** Chain-of-thought reasoning, problem classification, and qualitative analysis
+ 2. **Final Channel:** Complete Python solver code with numerical grounding, quantitative results, and validation summary
+
+ Domain-specific tokens (`<|shaft|>`, `<|python|>`, `<|numerical|>`, `<|latex|>`) demarcate semantic boundaries within the output.
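A downstream consumer would split the two channels before executing the solver code. The exact channel delimiters of Harmony-Enedina are not documented here, so this sketch assumes Harmony-style `<|channel|>`/`<|message|>` markers; the real token names may differ:

```python
import re

# Hypothetical layout, assuming Harmony-style delimiters:
#   <|channel|>analysis<|message|>...<|end|><|channel|>final<|message|>...<|return|>
CHANNEL_RE = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|channel\|>|<\|end\|>|<\|return\|>|$)",
    re.S,
)

def split_channels(text):
    """Return a dict mapping channel name -> channel body."""
    return {name: body.strip() for name, body in CHANNEL_RE.findall(text)}

sample = (
    "<|channel|>analysis<|message|>Classify: <|shaft|> simply supported, two loads<|end|>"
    "<|channel|>final<|message|><|python|>import numpy as np  # solver code here<|return|>"
)
channels = split_channels(sample)
```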
70
+
+ ## Inference Configuration
+
+ The model was trained **without** sliding window attention, repetition penalty, or n-gram blocking. These techniques must remain **disabled** during inference:
+
+ ```python
+ # CORRECT configuration (BASELINE)
+ use_sliding_window = False
+ repetition_penalty = 1.0
+ no_repeat_ngram_size = 0
+ temperature = 0.0  # greedy decoding
+ ```
+
+ Enabling these techniques degrades performance from 94% to 8% correct stop tokens.
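Operationally, the baseline configuration reduces to a plain argmax loop that stops on the stop token, with no logit post-processing of any kind. A minimal sketch (the stop-token id and the toy model are hypothetical, for illustration only):

```python
STOP_ID = 2  # hypothetical stop-token id, for illustration only

def greedy_decode(logits_fn, prompt_ids, max_new_tokens=64):
    """Baseline decoding matching the settings above: pure argmax
    (temperature 0.0), no repetition penalty, no n-gram blocking."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == STOP_ID:  # generation ends cleanly on the stop token
            break
    return ids

def toy_logits(ids):
    """Toy stand-in for the model: favors token 7 until the context
    reaches four tokens, then favors the stop token."""
    logits = [0.0] * 10
    logits[7 if len(ids) < 4 else STOP_ID] = 1.0
    return logits
```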
84
+
+ ## Intended Use
+
+ Mini-Enedina is designed for:
+
+ - Structural shaft analysis according to Timoshenko beam theory
+ - Engineering education and design iteration
+ - Generating complete, executable Python solver code
+ - Deployment on consumer hardware (edge, air-gapped environments)
+
+ **Important:** Model outputs should always be verified against independent calculations for safety-critical applications.
+
+ ## Limitations
+
+ - Handles exclusively shaft analysis according to Timoshenko theory
+ - Training language is Brazilian Portuguese
+ - Numerical accuracy is limited by tokenization granularity
+ - May struggle with support conditions or load combinations not represented in training
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @article{leitaofilho2026minienedina,
+   title={Mini-Enedina: A Domain-Specialized Small Language Model for Structural Shaft Analysis Using Timoshenko Beam Theory},
+   author={Leit{\~a}o Filho, Antonio de Sousa and Barros Filho, Allan Kardec Duailibe and Lima, Fabr{\'i}cio Saul and Santos, Selby Mykael Lima dos and Sousa, Rejani Bandeira Vieira},
+   year={2026}
+ }
+ ```
+
+ ## Acknowledgments
+
+ This work was supported by Aia Context Ltda. and by FINEP -- Funding Authority for Studies and Projects, a Brazilian government agency for science, technology, and innovation linked to the Ministry of Science, Technology and Innovation (MCTI), under Contract No. 03.25.0080.00.
+
+ ## License
+
+ CC-BY-4.0
config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "model_type": "mini-enedina",
+   "architectures": ["MiniEnedina"],
+   "dim": 512,
+   "n_layers": 7,
+   "n_heads": 8,
+   "head_dim": 64,
+   "intermediate_size": 2048,
+   "vocab_size": 8012,
+   "max_seq_len": 14336,
+   "norm_eps": 1e-5,
+   "rope_theta": 10000.0,
+   "normalization": "rmsnorm",
+   "activation": "silu_swiglu",
+   "positional_encoding": "rope",
+   "weight_tying": true,
+   "total_parameters": 37570000,
+   "framework": "mlx",
+   "torch_dtype": "bfloat16"
+ }
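The config's dimension fields are internally consistent, which can be checked with a couple of quick assertions (values copied from the config above):

```python
cfg = {
    "dim": 512, "n_layers": 7, "n_heads": 8, "head_dim": 64,
    "intermediate_size": 2048, "vocab_size": 8012, "max_seq_len": 14336,
}

# Per-head width times head count must equal the model dimension: 8 * 64 = 512
assert cfg["n_heads"] * cfg["head_dim"] == cfg["dim"]
# Feed-forward width is the conventional 4x model dimension: 4 * 512 = 2048
assert cfg["intermediate_size"] == 4 * cfg["dim"]
```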
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef3b9e0cd0821fb69a2a9cb5efe5a9b2b7ab717587258d07941f55f867c5468b
+ size 150295004
training_state.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "step": 14000,
+   "best_val_loss": 0.07652725413288604,
+   "phase_idx": 3
+ }