LisaMegaWatts committed
Commit 98da2a6 · verified · 1 Parent(s): 98ca52c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +33 -45
README.md CHANGED
@@ -2,6 +2,7 @@
 language:
 - en
 library_name: julia
+license: mit
 pipeline_tag: text-generation
 tags:
 - philosophy
@@ -14,76 +15,63 @@ tags:
 - swiglu
 - small-language-model
 - openai-compatible
+- chinchilla
 datasets:
 - LisaMegaWatts/philosophy-corpus
 ---
 
-# JuliaSLM
-
-A ~5M parameter decoder-only transformer trained on classical philosophy and liberal arts texts. Built entirely in Julia with Lux.jl, featuring a modern architecture (RoPE, RMSNorm, SwiGLU, weight tying).
-
-## Architecture
+# JuliaSLM — Inference Artifacts
+
+Serving-ready artifacts for the 5M parameter JuliaSLM transformer, packaged for the [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM). This repo contains the checkpoint, tokenizer, and config needed by the OpenAI-compatible inference server.
+
+For full model documentation, training details, loss curves, and usage instructions, see the canonical model repo: **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)**.
+
+## Model Summary
 
 | Component | Detail |
 |---|---|
-| Parameters | ~4.7M |
+| Parameters | 5,037,312 |
+| Architecture | Decoder-only Transformer (RoPE, RMSNorm, SwiGLU) |
 | Embedding dim | 256 |
 | Layers | 6 |
-| Attention heads | 4 |
-| Head dim | 64 |
-| FFN multiplier | 4x (SwiGLU) |
+| Attention heads | 4 (head dim 64) |
 | Context length | 256 tokens |
-| Positional encoding | Rotary (RoPE) |
-| Normalization | RMSNorm (pre-norm) |
-| Feed-forward | SwiGLU |
-| Weight tying | Yes (embedding = output projection) |
 | Tokenizer | BPE, 2000 subword tokens |
+| Weight tying | Yes |
+| Training | Chinchilla-optimal (~100M tokens), AdamW, F16 mixed precision |
+| Final val loss | 3.54 (PPL 34.5) |
-
-## Training
-
-- **Framework:** Lux.jl (pure Julia, explicit parameter/state management)
-- **Optimizer:** AdamW (lr=6e-4, cosine decay to 6e-5, 500 warmup steps)
-- **Precision:** Mixed F16/F32
-- **Batch size:** 32
-- **Steps:** 12,305 (Chinchilla-optimal: ~100M tokens at 20 tokens/param)
-- **Gradient clipping:** max norm 1.0
-- **Hardware:** RTX 3060 12GB
-
-## Training Data
-
-Classical philosophy and liberal arts corpus (~2.4GB text) including:
-- **Trivium** (grammar, logic, rhetoric): Aristotle, Plato, Cicero, Seneca, Marcus Aurelius, Epictetus
-- **Quadrivium** (arithmetic, geometry, music, astronomy): Euclid, Ptolemy, Boethius
-- **Bridging texts**: interdisciplinary classical works
-- Supplementary WikiText data
-
-Processed via a custom text pipeline with sentence-boundary chunking, Unicode normalization, and deduplication.
 
 ## Files
 
-- `final.jld2` — Model checkpoint (parameters in JLD2 format)
-- `config.toml` — Model architecture configuration
-- `vocab.json` — BPE vocabulary (2000 tokens, dict format)
-- `merges.txt` — BPE merge rules
+| File | Description |
+|---|---|
+| `final.jld2` | Model parameters (JLD2 format, 58MB) |
+| `config.toml` | Architecture config (from 5m-chinchilla) |
+| `vocab.json` | BPE vocabulary (2000 tokens, dict format) |
+| `merges.txt` | BPE merge rules |
 
 ## Inference
 
-Served via an OpenAI-compatible API at [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM):
+The [JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM) serves this model via an OpenAI-compatible API:
 
 ```bash
+# Streaming
 curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{"messages": [{"role": "user", "content": "the nature of"}], "stream": true, "temperature": 0.8, "top_k": 40}'
+
+# Non-streaming
+curl -X POST https://lisamegawatts-juliaslm.hf.space/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"messages": [{"role": "user", "content": "the nature of"}], "max_tokens": 200}'
 ```
 
-Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference with no Lux dependency at runtime (pure NNlib).
+Supports streaming (SSE), temperature, top-k, and top-p sampling. CPU-only inference using pure NNlib (no Lux dependency at runtime).
 
-## Lineage
-
-Successor to [JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT) (~5K params, character-level, scalar autograd). JuliaSLM upgrades to BPE tokenization, modern transformer components, and Chinchilla-optimal training at 1000x scale.
-
-## Links
-
-- [Inference Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)
-- [Training data](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)
-- [Source code](https://github.com/DavinciDreams/JuliaGPT)
+## Related
+
+- **[LisaMegaWatts/julia-slm](https://huggingface.co/LisaMegaWatts/julia-slm)** — Canonical model repo with full training details, loss curves, architecture diagrams, and code examples
+- **[JuliaSLM Space](https://huggingface.co/spaces/LisaMegaWatts/JuliaSLM)** — Live inference endpoint
+- **[LisaMegaWatts/philosophy-corpus](https://huggingface.co/datasets/LisaMegaWatts/philosophy-corpus)** — Training dataset
+- **[LisaMegaWatts/JuliaGPT](https://huggingface.co/LisaMegaWatts/JuliaGPT)** — Predecessor (~5K params, character-level)
+- **[Source code](https://github.com/DavinciDreams/JuliaGPT)** — GitHub repository
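
The `vocab.json` / `merges.txt` pair in the diff above is a standard BPE serialization: an ordered list of merge rules applied greedily, lowest rank first. A minimal sketch of that encoding loop, using a tiny hypothetical merge table for illustration (the model's real `merges.txt` has the rules behind its 2000-token vocabulary):

```python
def apply_bpe(word, merges):
    """Greedy BPE: repeatedly merge the adjacent symbol pair with the
    lowest merge rank until no listed pair remains in the word."""
    symbols = list(word)
    rank = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Rank every adjacent pair; unlisted pairs get infinite rank.
        pairs = [(rank.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, best_i = min(pairs)
        if best_rank == float("inf"):
            break  # no mergeable pair left
        symbols[best_i:best_i + 2] = [symbols[best_i] + symbols[best_i + 1]]
    return symbols

# Toy merge table (illustrative only, not the repo's real rules).
merges = [("t", "h"), ("th", "e"), ("n", "a"), ("na", "t"),
          ("u", "r"), ("nat", "ur"), ("natur", "e")]
print(apply_bpe("nature", merges))  # ['nature']
print(apply_bpe("the", merges))     # ['the']
```

Words covered by the merge table collapse to single subword tokens; anything else falls back to smaller pieces, which is what lets a 2000-token vocabulary cover arbitrary text.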
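
Because the Space speaks the OpenAI chat-completions protocol, any HTTP client can drive it, not just curl. A small illustrative helper (the function name and defaults are my own, not part of this repo) that assembles the same JSON body as the curl examples in the diff:

```python
import json

def chat_body(prompt, stream=False, **sampling):
    """Build the JSON body for POST /v1/chat/completions.

    Extra keyword arguments (temperature, top_k, top_p, max_tokens)
    pass through unchanged, mirroring the curl examples above.
    """
    body = {"messages": [{"role": "user", "content": prompt}],
            "stream": stream}
    body.update(sampling)
    return json.dumps(body)

# Same payload as the streaming curl example.
print(chat_body("the nature of", stream=True, temperature=0.8, top_k=40))
```

With `stream=True` the server replies with Server-Sent Events, so the client must read the response incrementally rather than parsing one JSON document.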