---
language:
- en
library_name: julia
pipeline_tag: text-generation
tags:
- philosophy
- classical-texts
- julia
- lux
- bpe
- rope
- rmsnorm
- swiglu
- small-language-model
- openai-compatible
datasets:
- LisaMegaWatts/philosophy-corpus
---

# JuliaSLM

A ~5M-parameter decoder-only transformer trained on classical philosophy and liberal arts texts. Built entirely in Julia with Lux.jl, featuring a modern architecture (RoPE, RMSNorm, SwiGLU, weight tying).

## Architecture

| Component | Detail |
|---|---|
| Parameters | ~4.7M |
| Embedding dim | 256 |
| Layers | 6 |
| Attention heads | 4 |
| Head dim | 64 |
| FFN multiplier | 4x (SwiGLU) |
| Context length | 256 tokens |
| Positional encoding | Rotary (RoPE) |
| Normalization | RMSNorm (pre-norm) |
| Feed-forward | SwiGLU |
| Weight tying | Yes (embedding = output projection) |
| Tokenizer | BPE, 2000 subword tokens |

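As a reading aid for the table, here is a minimal sketch of rotary position embedding (RoPE). It is illustrative only, not the model's actual Lux.jl code: each consecutive pair of head features is rotated by a position-dependent angle.

```julia
# Minimal RoPE sketch (illustrative — not the model's actual Lux.jl code).
# Rows of `x` are head features (head dim = 64 in JuliaSLM), columns are
# vectors to encode; each feature pair (i, i+1) is rotated by the angle
# pos * base^(-(i-1)/d), the standard RoPE frequency schedule.
function apply_rope(x::AbstractMatrix, pos::Integer; base=10_000.0)
    d = size(x, 1)
    iseven(d) || error("head dim must be even for RoPE")
    y = similar(x)
    for i in 1:2:d
        θ = pos * base^(-(i - 1) / d)
        c, s = cos(θ), sin(θ)
        y[i, :]   .= c .* x[i, :]   .- s .* x[i+1, :]
        y[i+1, :] .= s .* x[i, :]   .+ c .* x[i+1, :]
    end
    return y
end
```

Because every step is a plain 2-D rotation, RoPE preserves vector norms while encoding position into the phase of the query/key features.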
## Training

- **Framework:** Lux.jl (pure Julia, explicit parameter/state management)
- **Optimizer:** AdamW (lr = 6e-4, cosine decay to 6e-5, 500 warmup steps)
- **Precision:** mixed FP16/FP32
- **Batch size:** 32
- **Steps:** 12,305 (Chinchilla-optimal: ~100M tokens at 20 tokens/param)
- **Gradient clipping:** max norm 1.0
- **Hardware:** RTX 3060 (12 GB)

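The optimizer settings above imply a learning-rate schedule like the following sketch. The function name and the exact decay curve are assumptions; the card states only the warmup length, peak, and floor.

```julia
# Sketch of the stated schedule: linear warmup for 500 steps, then cosine
# decay from the 6e-4 peak to the 6e-5 floor over the remaining steps.
function lr_at(step::Integer; lr_max=6e-4, lr_min=6e-5, warmup=500, total=12_305)
    step <= warmup && return lr_max * step / warmup        # linear warmup
    t = (step - warmup) / (total - warmup)                 # decay progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t))
end
```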
## Training Data

Classical philosophy and liberal arts corpus (~2.4GB of text), including:
- **Trivium** (grammar, logic, rhetoric): Aristotle, Plato, Cicero, Seneca, Marcus Aurelius, Epictetus
- **Quadrivium** (arithmetic, geometry, music, astronomy): Euclid, Ptolemy, Boethius
- **Bridging texts**: interdisciplinary classical works
- Supplementary WikiText data

Processed via a custom text pipeline with sentence-boundary chunking, Unicode normalization, and deduplication.

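The pipeline itself is not published here; greedy sentence packing is one plausible reading of the "sentence-boundary chunking" step, sketched below with hypothetical names.

```julia
# Illustrative greedy sentence packing (an assumption about the pipeline,
# not its published code): split on terminal punctuation, then pack
# sentences into chunks of at most `maxlen` characters without breaking
# a sentence across chunks.
function chunk_sentences(text::AbstractString; maxlen::Int=1024)
    sentences = split(text, r"(?<=[.!?])\s+")
    chunks, buf = String[], ""
    for s in sentences
        if !isempty(buf) && length(buf) + 1 + length(s) > maxlen
            push!(chunks, buf)               # flush the full chunk
            buf = ""
        end
        buf = isempty(buf) ? String(s) : buf * " " * s
    end
    isempty(buf) || push!(chunks, buf)       # flush the remainder
    return chunks
end
```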
## Files

- `final.jld2` — Model checkpoint (parameters in JLD2 format)
- `config.toml` — Model architecture configuration
- `vocab.json` — BPE vocabulary (2000 tokens, dict format)
- `merges.txt` — BPE merge rules

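Rules of the kind stored in `merges.txt` are typically applied with a greedy BPE loop. This sketch (names hypothetical; the real tokenizer lives in the source repo) merges the lowest-rank adjacent pair until no rule applies.

```julia
# Greedy BPE encoding sketch over an ordered merge list (illustrative,
# not the repo's tokenizer): repeatedly merge the adjacent token pair
# with the lowest merge rank until no listed pair remains.
function bpe_encode(word::AbstractString, merges::Vector{Tuple{String,String}})
    tokens = string.(collect(word))                      # start from characters
    rank = Dict(m => i for (i, m) in pairs(merges))      # earlier rule = lower rank
    while true
        best, at = nothing, 0
        for i in 1:length(tokens)-1
            pair = (tokens[i], tokens[i+1])
            if haskey(rank, pair) && (best === nothing || rank[pair] < rank[best])
                best, at = pair, i
            end
        end
        best === nothing && break                        # no applicable merge left
        tokens = vcat(tokens[1:at-1], [tokens[at] * tokens[at+1]], tokens[at+2:end])
    end
    return tokens
end
```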
## Inference

Served via an OpenAI-compatible API at the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM):

```bash
curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "the nature of"}], "stream": true, "temperature": 0.8, "top_k": 40}'
```

Supports streaming (SSE), temperature, top-k, and top-p sampling. Inference runs CPU-only, with no Lux dependency at runtime (pure NNlib).

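The `temperature` and `top_k` request fields correspond to a sampling step like this sketch (illustrative, not the server's exact code): keep the k largest logits, rescale by temperature, and draw from the resulting softmax.

```julia
# Illustrative top-k sampling step (not the server's code). Returns the
# index of the sampled token given raw logits.
function sample_topk(logits::Vector{Float64}; k::Int=40, temperature::Float64=0.8)
    idx = partialsortperm(logits, 1:min(k, length(logits)); rev=true)  # top-k indices
    z = logits[idx] ./ temperature
    p = exp.(z .- maximum(z))                # numerically stable softmax
    p ./= sum(p)
    r, acc = rand(), 0.0
    for (j, pj) in zip(idx, p)               # inverse-CDF draw over the top-k
        acc += pj
        acc >= r && return j
    end
    return idx[end]
end
```

Lower `temperature` sharpens the distribution toward the argmax; `k = 1` makes the draw deterministic.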
## Lineage

Successor to [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (~5K params, character-level, scalar autograd). JuliaSLM upgrades to BPE tokenization, modern transformer components, and Chinchilla-optimal training at 1000x the scale.

## Links

- [Inference Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)
- [Training data](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)
- [Source code](https://github.com/DavinciDreams/JuliaGPT)