Upload 8 files
- README.md +76 -13
- config.json +26 -2
- klinexa_el1_model.pt +2 -2
README.md
CHANGED
|
@@ -1,17 +1,80 @@
|
|
| 1 |
-
|
| 2 |
|
| 3 |
-
#
|
| 4 |
-
- Architecture: EXACT match with original (tok_emb, dropout, grad checkpoint, weight tying)
|
| 5 |
-
- Token format: [BOS][USER]question[ASST]answer[EOS] (matches training format)
|
| 6 |
-
- Loss: causal shift + ignore_index=-100
|
| 7 |
-
- Dataset: 500K samples converted to correct format
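The "causal shift + ignore_index=-100" item can be illustrated with a minimal pure-Python sketch. This is not the project's training code: `causal_lm_loss` is a hypothetical helper, and it assumes that masked positions (for example the question tokens) carry the label -100 and that the loss is averaged over the remaining positions.

```python
import math

IGNORE_INDEX = -100  # label value that excludes a position from the loss


def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy with a causal shift.

    logits: one list of vocabulary scores per input position.
    labels: one token id per input position, or IGNORE_INDEX.
    """
    total, count = 0.0, 0
    # Causal shift: the scores at position t are compared with the token at t + 1.
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # masked position, contributes nothing
        peak = max(scores)
        # Numerically stable log-sum-exp for the softmax normaliser.
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]
        count += 1
    # Assumption: return 0.0 when every position is masked.
    return total / count if count else 0.0


# Toy example: vocabulary of 4, uniform scores, first label masked.
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
toy_labels = [IGNORE_INDEX, 2, 1]
print(causal_lm_loss(toy_logits, toy_labels))
```

With uniform scores every unmasked position costs log(vocabulary size), so the toy example is a quick sanity check on the shift and the masking.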
|
| 8 |
|
| 9 |
-
|
| 10 |
-
Format: `[BOS][USER]question[ASST]` → generate
|
| 11 |
-
DO NOT use the `<user>...</user><assistant>` format: the model was NOT trained with that.
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
Vocab 32K, max seq 1024
|
| 16 |
|
| 17 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- id
|
| 5 |
+
tags:
|
| 6 |
+
- health
|
| 7 |
+
- maluku-tenggara
|
| 8 |
+
- native-llm
|
| 9 |
+
- from-scratch
|
| 10 |
+
- adaptive-learning
|
| 11 |
+
pipeline_tag: text-generation
|
| 12 |
+
---
|
| 13 |
|
| 14 |
+
# KLINEXA-EL1: Full Engine v5.1
|
| 15 |
|
| 16 |
+
**Kei Local Intelligence for Nexus Expert Analysis, Edition Level 1**
|
| 17 |
|
| 18 |
+
A native LLM built from SCRATCH by **Emylton Leunufna** in Kota Langgur,
|
| 19 |
+
Kabupaten Maluku Tenggara, for the health, statistics, climate, and geography domains.
|
| 20 |
|
| 21 |
+
## Model
|
| 22 |
+
|
| 23 |
+
| Parameter | Value |
|
| 24 |
+
|---|---|
|
| 25 |
+
| Total Parameters | 238.3M |
|
| 26 |
+
| Layers | 16 |
|
| 27 |
+
| Heads | 16 |
|
| 28 |
+
| Dimension | 1024 |
|
| 29 |
+
| FFN | 2816 |
|
| 30 |
+
| Context Length | 1024 tokens |
|
| 31 |
+
| Tokenizer | BPE 32K vocab (trained from scratch) |
|
| 32 |
+
| Components | RoPE, SwiGLU, RMSNorm, Gradient Checkpointing |
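The 238.3M total can be cross-checked from the other rows of the table. The sketch below is an estimate under stated assumptions, not the author's code: no bias terms, three SwiGLU matrices per block, two RMSNorm layers per block plus a final one, parameter-free RoPE, and an output head tied to the token embedding. `count_params` is a hypothetical helper.

```python
def count_params(vocab=32_000, dim=1024, layers=16, ffn=2816, tied=True):
    """Estimate the parameter count of a decoder-only transformer."""
    embed = vocab * dim        # token embedding table
    attn = 4 * dim * dim       # Q, K, V and output projections (assumed bias-free)
    mlp = 3 * dim * ffn        # SwiGLU: gate, up and down projections
    norms = 2 * dim            # two RMSNorm weight vectors per block
    block = attn + mlp + norms
    final_norm = dim           # RMSNorm before the output head
    head = 0 if tied else vocab * dim  # tied head reuses the embedding
    return embed + layers * block + final_norm + head


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to 238,322,688, which rounds to the 238.3M in the table. Passing `tied=False` adds another 32,768,000 parameters for a separate output head.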
|
| 33 |
+
|
| 34 |
+
## 23 Unique Features (Full Engine v5.1)
|
| 35 |
+
|
| 36 |
+
1. **Persistent Memory**: stores new facts across sessions
|
| 37 |
+
2. **Knowledge Graph**: structured entity relations
|
| 38 |
+
3. **URL Reader**: reads and remembers content from links
|
| 39 |
+
4. **File Reader**: reads uploaded PDF, CSV, and Excel files
|
| 40 |
+
5. **Online Learning**: updates weights from user corrections in real time
|
| 41 |
+
6. **Multi-turn Memory**: remembers the entire conversation within a session
|
| 42 |
+
7. **Confidence Scoring**: the model knows the limits of its knowledge
|
| 43 |
+
8. **Auto-Summary**: summarizes long conversations
|
| 44 |
+
9. **Domain-Aware Auto-Learn**: learns automatically from Malra health and government interactions
|
| 45 |
+
10. **API Health Data Fetcher**: real-time data from BPS, Kemenkes, and the Malra portal
|
| 46 |
+
11. **Geo-Temporal Context**: aware of season, weather, health risks, and the local calendar
|
| 47 |
+
12. **Uncertainty Quantification**: explicit hedging system with 5 confidence levels
|
| 48 |
+
13. **Fragmented Processing**: automatically splits complex multi-topic questions
|
| 49 |
+
14. **Medical Protocol Compliance**: hard-coded medical guardrails
|
| 50 |
+
15. **Recursive Self-Correction**: micro-updates to weights + decay + domain priority
|
| 51 |
+
16. **Dual-Core Check-Before-Speak**: Generator + Verifier audit loop
|
| 52 |
+
17. **Native Geospatial Memory**: GPS coordinates of 24 health facilities (faskes), haversine distance, sea/land routes
|
| 53 |
+
18. **Self-Auditing Medical Law**: DOEN, health facility (faskes) authority, Kemenkes/Health Law (UU Kesehatan) regulations
|
| 54 |
+
19. **Ethno-Medical Mapping**: 9 local medicinal plants + active compounds + scientific evidence
|
| 55 |
+
20. **Cultural Tone-Switcher**: Doctor/Local (Anak Daerah)/Casual modes matched to the Kei cultural context
|
| 56 |
+
21. **Drug-Herb Interaction**: 6 dangerous drug-herb interactions detected automatically
|
| 57 |
+
22. **Placebo & Empathy Layer**: respects spiritual/customary (adat) beliefs + weaves in medical advice
|
| 58 |
+
23. **Myth vs Fact Classifier**: 8 Malra health myths corrected gently but firmly
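Feature 17 mentions haversine distance between facilities. A minimal sketch of that formula follows. The function name, the mean Earth radius of 6371 km, and the two coordinate pairs are illustrative assumptions, not the engine's code or real facility locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for illustration only, not actual faskes positions.
point_a = (-5.65, 132.75)
point_b = (-5.60, 132.70)
print(f"{haversine_km(*point_a, *point_b):.2f} km")
```

The haversine result is a straight great-circle distance, so a sea or land route between two facilities would generally be longer than this figure.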
|
| 59 |
+
|
| 60 |
+
## Training
|
| 61 |
+
|
| 62 |
+
- Pre-train corpus: Wikipedia ID + BPS statistics + climate/geo + public health
|
| 63 |
+
- SFT data: 4,795 samples from 7 official Maluku Tenggara documents
|
| 64 |
+
- Identity & key facts: hard-coded (engine layer, not in model weights)
|
| 65 |
+
- Pre-train steps: 2880
|
| 66 |
+
- SFT steps: 1710
|
| 67 |
+
- GPU: NVIDIA A100 80GB
|
| 68 |
+
|
| 69 |
+
## Inference Format
|
| 70 |
+
|
| 71 |
+
The model uses the SFT format: `[BOS][USER]question[ASST]` → generate.
|
| 72 |
+
DO NOT use the `[SYS]` token: the model was never trained with that format.
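A minimal sketch of assembling a prompt in this format is shown below. `build_prompt` and `extract_answer` are hypothetical helpers, and the sketch assumes the bracketed markers are literal strings that the tokenizer maps to special tokens. The README does not show the tokenizer API, so that step is left out.

```python
BOS, USER, ASST, EOS = "[BOS]", "[USER]", "[ASST]", "[EOS]"


def build_prompt(question: str) -> str:
    """Wrap a single question in the SFT format used for training."""
    if "[SYS]" in question:
        # The model was never trained with a system token, so reject it early.
        raise ValueError("[SYS] is not part of the training format")
    return f"{BOS}{USER}{question.strip()}{ASST}"


def extract_answer(generated: str) -> str:
    """Keep the text after the last [ASST] marker, up to the first [EOS]."""
    answer = generated.rsplit(ASST, 1)[-1]
    return answer.split(EOS, 1)[0].strip()


prompt = build_prompt("Apa itu malaria?")
print(prompt)
print(extract_answer(prompt + " Malaria adalah penyakit ... [EOS]"))
```

Cutting the output at the first `[EOS]` reflects the training format `[BOS][USER]question[ASST]answer[EOS]`, where the answer ends at that marker.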
|
| 73 |
+
|
| 74 |
+
## Engine Architecture (v5.1)
|
| 75 |
+
|
| 76 |
+
- Model generation = clean, with no context injection
|
| 77 |
+
- Identity/greeting = hard-coded responses (13 categories)
|
| 78 |
+
- Post-processing: Drug-herb interaction, Myth check, Uncertainty
|
| 79 |
+
- Disabled (noise): Verifier, Legal audit, Medical guardrails non-critical, Auto-learn
|
| 80 |
+
- Slash commands: 25 interactive commands
|
config.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
{
|
| 2 |
"model_name": "KLINEXA-EL1",
|
| 3 |
"architecture": "GPT-style Decoder-Only Transformer",
|
| 4 |
-
"version": "
|
| 5 |
"config": {
|
| 6 |
"vocab_size": 32000,
|
| 7 |
"max_seq_len": 1024,
|
|
@@ -18,7 +18,31 @@
|
|
| 18 |
"<assistant>": 6,
|
| 19 |
"<system>": 7
|
| 20 |
},
|
| 21 |
-
"
|
| 22 |
"creator": "Emylton Leunufna",
|
| 23 |
"location": "Kota Langgur, Kabupaten Maluku Tenggara"
|
| 24 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"model_name": "KLINEXA-EL1",
|
| 3 |
"architecture": "GPT-style Decoder-Only Transformer",
|
| 4 |
+
"version": "Full Engine v5.1",
|
| 5 |
"config": {
|
| 6 |
"vocab_size": 32000,
|
| 7 |
"max_seq_len": 1024,
|
|
|
|
| 18 |
"<assistant>": 6,
|
| 19 |
"<system>": 7
|
| 20 |
},
|
| 21 |
+
"engine_features": [
|
| 22 |
+
"persistent_memory",
|
| 23 |
+
"knowledge_graph",
|
| 24 |
+
"url_reader",
|
| 25 |
+
"file_reader",
|
| 26 |
+
"online_learning",
|
| 27 |
+
"multi_turn_memory",
|
| 28 |
+
"confidence_scoring",
|
| 29 |
+
"auto_summary",
|
| 30 |
+
"domain_aware_auto_learn",
|
| 31 |
+
"api_health_data_fetcher",
|
| 32 |
+
"geo_temporal_context",
|
| 33 |
+
"uncertainty_quantification",
|
| 34 |
+
"fragmented_processing",
|
| 35 |
+
"medical_protocol_compliance",
|
| 36 |
+
"recursive_self_correction",
|
| 37 |
+
"dual_core_verifier",
|
| 38 |
+
"native_geospatial_memory",
|
| 39 |
+
"medical_law_audit",
|
| 40 |
+
"ethno_medical_mapping",
|
| 41 |
+
"cultural_tone_switcher",
|
| 42 |
+
"drug_herb_interaction",
|
| 43 |
+
"placebo_empathy_layer",
|
| 44 |
+
"myth_fact_classifier"
|
| 45 |
+
],
|
| 46 |
"creator": "Emylton Leunufna",
|
| 47 |
"location": "Kota Langgur, Kabupaten Maluku Tenggara"
|
| 48 |
}
|
klinexa_el1_model.pt
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5d680a8908a542622e32aa428cd83c5eb0005c98f334ecb20f98954e3529d0f0
|
| 3 |
+
size 953349707
|
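The new `klinexa_el1_model.pt` entry is a Git LFS pointer with `version`, `oid`, and `size` fields. The sketch below parses such a pointer and checks a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the code assumes the oid uses the `sha256:` prefix shown in the diff.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5d680a8908a542622e32aa428cd83c5eb0005c98f334ecb20f98954e3529d0f0
size 953349707
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256 digest."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a file close to 1 GB never sits in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

After downloading the weights, `matches_pointer("klinexa_el1_model.pt", pointer)` would confirm that the local file is the one this commit refers to.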