emylton committed on
Commit 45a89fc · verified · 1 Parent(s): d1f7d93

Upload 8 files

Files changed (3)
  1. README.md +76 -13
  2. config.json +26 -2
  3. klinexa_el1_model.pt +2 -2
README.md CHANGED
@@ -1,17 +1,80 @@
1
- # KLINEXA-EL1 v10: Full Fix
2
 
3
- ## Critical Fixes in v10
4
- - Architecture: EXACT match with original (tok_emb, dropout, grad checkpoint, weight tying)
5
- - Token format: [BOS][USER]question[ASST]answer[EOS] (matches training format)
6
- - Loss: causal shift + ignore_index=-100
7
- - Dataset: 500K samples converted to correct format
8
 
9
- ## Inference
10
- Format: `[BOS][USER]question[ASST]` → generate
11
- DO NOT use `<user>...</user><assistant>` format; the model was NOT trained with that.
12
 
13
- ## Architecture
14
- 16 layers, 1024 hidden, 16 heads, SwiGLU, RoPE, RMSNorm
15
- Vocab 32K, max seq 1024
16
 
17
- WARNING: Native model, use KlinexaConfig + KlinexaEL1 classes
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - id
5
+ tags:
6
+ - health
7
+ - maluku-tenggara
8
+ - native-llm
9
+ - from-scratch
10
+ - adaptive-learning
11
+ pipeline_tag: text-generation
12
+ ---
13
 
14
+ # KLINEXA-EL1: Full Engine v5.1
15
 
16
+ **Kei Local Intelligence for Nexus Expert Analysis, Edition Level 1**
17
 
18
+ A native LLM built from scratch by **Emylton Leunufna** in Kota Langgur,
19
+ Kabupaten Maluku Tenggara, for the health, statistics, climate, and geography domains.
20
 
21
+ ## Model
22
+
23
+ | Parameter | Value |
24
+ |---|---|
25
+ | Total Parameters | 238.3M |
26
+ | Layers | 16 |
27
+ | Heads | 16 |
28
+ | Dimension | 1024 |
29
+ | FFN | 2816 |
30
+ | Context Length | 1024 tokens |
31
+ | Tokenizer | BPE 32K vocab (trained from scratch) |
32
+ | Components | RoPE, SwiGLU, RMSNorm, Gradient Checkpointing |
33
+
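The 238.3M figure in the table can be cross-checked from the other rows. The sketch below is an estimate under stated assumptions, not the author's counting code: it assumes bias-free linear layers, four attention projections per layer, a three-matrix SwiGLU feed-forward block, two RMSNorm gains per layer plus one final norm, no learned positional parameters (RoPE), and an output head tied to the token embedding. The function name `count_params` is hypothetical.

```python
def count_params(vocab=32_000, dim=1024, layers=16, ffn=2816, tied=True):
    """Estimate the parameter count of a bias-free decoder-only transformer."""
    embed = vocab * dim      # token embedding (reused as the output head when tied)
    attn = 4 * dim * dim     # Q, K, V and output projections
    mlp = 3 * dim * ffn      # SwiGLU: gate, up and down projections
    norms = 2 * dim          # two RMSNorm gain vectors per layer
    total = embed + layers * (attn + mlp + norms) + dim  # + final RMSNorm
    if not tied:
        total += vocab * dim  # separate output projection
    return total


total = count_params()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the estimate rounds to the 238.3M reported in the table, which suggests the embedding and output head share weights.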
34
+ ## 23 Unique Features (Full Engine v5.1)
35
+
36
+ 1. **Persistent Memory**: Stores new facts across sessions
37
+ 2. **Knowledge Graph**: Structured entity relations
38
+ 3. **URL Reader**: Reads and remembers content from links
39
+ 4. **File Reader**: Reads uploaded PDF, CSV, and Excel files
40
+ 5. **Online Learning**: Updates weights from user corrections in real time
41
+ 6. **Multi-turn Memory**: Remembers the entire conversation within a session
42
+ 7. **Confidence Scoring**: The model knows the limits of its knowledge
43
+ 8. **Auto-Summary**: Summarizes long conversations
44
+ 9. **Domain-Aware Auto-Learn**: Automatically learns from Malra health and government interactions
45
+ 10. **API Health Data Fetcher**: Real-time data from BPS, Kemenkes, and the Malra portal
46
+ 11. **Geo-Temporal Context**: Aware of season, weather, health risks, and the local calendar
47
+ 12. **Uncertainty Quantification**: Explicit hesitation system with 5 confidence levels
48
+ 13. **Fragmented Processing**: Automatically splits complex multi-topic questions
49
+ 14. **Medical Protocol Compliance**: Hard-coded medical guardrails
50
+ 15. **Recursive Self-Correction**: Micro-update weights + decay + domain-priority
51
+ 16. **Dual-Core Check-Before-Speak**: Generator + Verifier audit loop
52
+ 17. **Native Geospatial Memory**: GPS coordinates of 24 health facilities (faskes), haversine distance, sea/land routes
53
+ 18. **Self-Auditing Medical Law**: DOEN, health-facility authority, Kemenkes regulations and the Health Law (UU Kesehatan)
54
+ 19. **Ethno-Medical Mapping**: 9 local medicinal plants + active compounds + scientific evidence
55
+ 20. **Cultural Tone-Switcher**: Doctor / Local (Anak Daerah) / Casual modes to suit the Kei cultural context
56
+ 21. **Drug-Herb Interaction**: 6 dangerous drug-herb interactions detected automatically
57
+ 22. **Placebo & Empathy Layer**: Respects spiritual and customary beliefs while weaving in medical advice
58
+ 23. **Myth vs Fact Classifier**: 8 Malra health myths corrected gently but firmly
59
+
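Feature 17 mentions haversine distance between health facilities. As a minimal sketch of that calculation, assuming a spherical Earth with a mean radius of 6371 km: the helper names `haversine_km` and `nearest`, the facility names, and the coordinates below are hypothetical placeholders, not values from the engine or from any real facility list.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, facilities):
    """Return the facility dict closest to origin, a (lat, lon) tuple."""
    return min(facilities, key=lambda f: haversine_km(*origin, f["lat"], f["lon"]))


# Placeholder facilities with made-up coordinates, for illustration only.
FACILITIES = [
    {"name": "Facility A", "lat": -5.60, "lon": 132.70},
    {"name": "Facility B", "lat": -5.90, "lon": 132.95},
]

print(nearest((-5.65, 132.73), FACILITIES)["name"])
```

Straight-line distance ignores the sea and land routes the feature also mentions, so a real router would treat this only as a lower bound.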
60
+ ## Training
61
+
62
+ - Pre-train corpus: Indonesian Wikipedia + BPS statistics + climate/geography + public health
63
+ - SFT data: 4,795 samples from 7 official Maluku Tenggara documents
64
+ - Identity & key facts: hard-coded (engine layer, not in model weights)
65
+ - Pre-train steps: 2880
66
+ - SFT steps: 1710
67
+ - GPU: NVIDIA A100 80GB
68
+
69
+ ## Inference Format
70
+
71
+ The model uses the SFT format: `[BOS][USER]question[ASST]` → generate.
72
+ DO NOT use the `[SYS]` token; the model was never trained with that format.
73
+
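The inference format described above can be sketched as a small prompt builder. This is an illustration only: it treats the special tokens as literal strings, whereas the real tokenizer presumably maps them to reserved IDs, and the `[EOS]` terminator is taken from the earlier v10 README. The helpers `build_prompt` and `extract_answer` are hypothetical names, not part of the KLINEXA engine.

```python
BOS, USER, ASST, EOS = "[BOS]", "[USER]", "[ASST]", "[EOS]"
FORBIDDEN = "[SYS]"  # the model was never trained with a system token


def build_prompt(question: str) -> str:
    """Wrap a user question in the SFT format expected by the model."""
    if FORBIDDEN in question:
        raise ValueError("the [SYS] token is not part of the training format")
    return f"{BOS}{USER}{question.strip()}{ASST}"


def extract_answer(generated: str) -> str:
    """Keep the text after [ASST] and cut it at the first [EOS], if present."""
    answer = generated.split(ASST, 1)[-1]
    return answer.split(EOS, 1)[0].strip()


prompt = build_prompt("Apa itu stunting?")
print(prompt)
```

Generation then continues from the end of the prompt, and `extract_answer` trims the decoded text back to the assistant's reply.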
74
+ ## Engine Architecture (v5.1)
75
+
76
+ - Model generation = clean, no context injection
77
+ - Identity/greeting = hard-coded responses (13 categories)
78
+ - Post-processing: Drug-herb interaction, Myth check, Uncertainty
79
+ - Disabled (noise): Verifier, Legal audit, non-critical Medical guardrails, Auto-learn
80
+ - Slash commands: 25 interactive commands
config.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "model_name": "KLINEXA-EL1",
3
  "architecture": "GPT-style Decoder-Only Transformer",
4
- "version": "v7.0 (SFT v7 - v6 polish + stunting/disease/tual_kec patches)",
5
  "config": {
6
  "vocab_size": 32000,
7
  "max_seq_len": 1024,
@@ -18,7 +18,31 @@
18
  "<assistant>": 6,
19
  "<system>": 7
20
  },
21
- "inference_format": "[BOS][USER]question[ASST] -> generate",
22
  "creator": "Emylton Leunufna",
23
  "location": "Kota Langgur, Kabupaten Maluku Tenggara"
24
  }
 
1
  {
2
  "model_name": "KLINEXA-EL1",
3
  "architecture": "GPT-style Decoder-Only Transformer",
4
+ "version": "Full Engine v5.1",
5
  "config": {
6
  "vocab_size": 32000,
7
  "max_seq_len": 1024,
 
18
  "<assistant>": 6,
19
  "<system>": 7
20
  },
21
+ "engine_features": [
22
+ "persistent_memory",
23
+ "knowledge_graph",
24
+ "url_reader",
25
+ "file_reader",
26
+ "online_learning",
27
+ "multi_turn_memory",
28
+ "confidence_scoring",
29
+ "auto_summary",
30
+ "domain_aware_auto_learn",
31
+ "api_health_data_fetcher",
32
+ "geo_temporal_context",
33
+ "uncertainty_quantification",
34
+ "fragmented_processing",
35
+ "medical_protocol_compliance",
36
+ "recursive_self_correction",
37
+ "dual_core_verifier",
38
+ "native_geospatial_memory",
39
+ "medical_law_audit",
40
+ "ethno_medical_mapping",
41
+ "cultural_tone_switcher",
42
+ "drug_herb_interaction",
43
+ "placebo_empathy_layer",
44
+ "myth_fact_classifier"
45
+ ],
46
  "creator": "Emylton Leunufna",
47
  "location": "Kota Langgur, Kabupaten Maluku Tenggara"
48
  }
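The README advertises 23 features and the new `engine_features` array is meant to mirror them. A quick consistency check might look like the sketch below. It assumes `engine_features` is a top-level key of `config.json` (the flattened diff does not show the nesting), and `check_feature_list` is a hypothetical helper name.

```python
import json

# The engine_features array copied from the added lines of the diff.
CONFIG_FRAGMENT = json.loads("""
{"engine_features": [
  "persistent_memory", "knowledge_graph", "url_reader", "file_reader",
  "online_learning", "multi_turn_memory", "confidence_scoring", "auto_summary",
  "domain_aware_auto_learn", "api_health_data_fetcher", "geo_temporal_context",
  "uncertainty_quantification", "fragmented_processing",
  "medical_protocol_compliance", "recursive_self_correction",
  "dual_core_verifier", "native_geospatial_memory", "medical_law_audit",
  "ethno_medical_mapping", "cultural_tone_switcher", "drug_herb_interaction",
  "placebo_empathy_layer", "myth_fact_classifier"
]}
""")


def check_feature_list(config, expected=23):
    """Report the feature count and any duplicated feature names."""
    features = config.get("engine_features", [])
    duplicates = sorted({f for f in features if features.count(f) > 1})
    return {
        "count": len(features),
        "count_ok": len(features) == expected,
        "duplicates": duplicates,
    }


print(check_feature_list(CONFIG_FRAGMENT))
```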
klinexa_el1_model.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f2939191b2c5f9aa31c379ae4e4c988b86fcb9d800a4db9391afe0e9283e3ba1
3
- size 953349771
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d680a8908a542622e32aa428cd83c5eb0005c98f334ecb20f98954e3529d0f0
3
+ size 953349707
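Because `klinexa_el1_model.pt` is stored as a Git LFS pointer, a downloaded checkpoint can be checked against the `oid` and `size` fields shown above. The sketch below uses only the Python standard library. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path to the checkpoint is left to the caller.

```python
import hashlib
from pathlib import Path

# New pointer contents from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5d680a8908a542622e32aa428cd83c5eb0005c98f334ecb20f98954e3529d0f0
size 953349707
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at path has the size and hash the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing roughly 1 GB
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("klinexa_el1_model.pt", pointer)` on a local copy returns `True` only when the download is complete and unmodified.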