MK0727 committed on
Commit b169cd4 · verified · 1 Parent(s): f1635e5

Create README.md

  1. README.md +80 -0
README.md ADDED
---
language:
- ja
library_name: transformers
tags:
- myllm
- causal-lm
- custom-code
- safetensors
pipeline_tag: text-generation
---

# lambda-160m

lambda-160m is an experimental Japanese causal language model created with a custom `myllm` decoder-only Transformer implementation.

All training code is publicly available at [KeisukeMiyamoto1324/myllm](https://github.com/KeisukeMiyamoto1324/myllm).

## Model Details

| Item | Value |
|---|---:|
| Parameters | 164.5M |
| Architecture | Decoder-only Transformer |
| Model type | `myllm` |
| Context length | 1024 tokens |
| Tokenizer | Byte-level BPE |
| Vocabulary size | 65,536 |
| Layers | 16 |
| Hidden size | 768 |
| Attention heads | 12 |
| FFN size | 3,072 |

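
The parameter count in the table can be roughly reproduced from the other rows. The sketch below is only an estimate under assumptions this README does not state: a GPT-2-style layout with learned absolute position embeddings, tied input and output embeddings, biases on every linear layer, and two LayerNorms per block plus a final one. The actual `myllm` implementation may differ.

```python
# Rough parameter count from the Model Details table.
# ASSUMED layout (not confirmed by this README): GPT-2-style block,
# learned position embeddings, tied embeddings, biases on all linears.

vocab_size = 65_536
context_length = 1_024
hidden_size = 768
ffn_size = 3_072
num_layers = 16

# Embedding tables (the output projection is assumed to reuse token_embedding).
token_embedding = vocab_size * hidden_size
position_embedding = context_length * hidden_size

# Q, K, V and output projections, each hidden x hidden plus a bias vector.
attention = 4 * (hidden_size * hidden_size + hidden_size)
# Up projection and down projection of the feed-forward network, with biases.
ffn = (hidden_size * ffn_size + ffn_size) + (ffn_size * hidden_size + hidden_size)
# Two LayerNorms per block, each with a scale and a shift vector.
layer_norms = 2 * (2 * hidden_size)
per_layer = attention + ffn + layer_norms

final_layer_norm = 2 * hidden_size
total = token_embedding + position_embedding + num_layers * per_layer + final_layer_norm

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes out at about 164.5M, which is consistent with the table but does not by itself confirm the layout; the training repository is the authoritative source.
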
## Training Data

The model was pretrained on a Japanese text mixture.

| Dataset | Share | Notes |
|---|---:|---|
| `hotchpotch/fineweb-2-edu-japanese` | 30% | Japanese web text, Wikipedia domains excluded |
| `MK0727/CleanedWiki-jp` | 70% | Japanese Wikipedia-style text, ramped from 50% training progress |

## Training Setup

This model was trained on a single RTX PRO 6000.

| Item | Value |
|---|---:|
| Optimizer | AdamW |
| Learning rate | 2e-4 |
| LR schedule | Warmup cosine |
| Warmup steps | 2,000 |
| Minimum LR ratio | 0.1 |
| Batch size | 96 |
| Max steps | 40,960 |

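
To make the schedule rows concrete, here is a minimal sketch of a warmup-cosine learning rate using the values from the table. It assumes a linear warmup from zero and a cosine decay that ends at `MIN_LR_RATIO * PEAK_LR` on the final step; the names `learning_rate`, `PEAK_LR`, and so on are illustrative and not taken from the training code. The token estimate at the end further assumes that the batch size counts full 1024-token sequences with no gradient accumulation.

```python
import math

# Values from the Training Setup table.
PEAK_LR = 2e-4
WARMUP_STEPS = 2_000
MAX_STEPS = 40_960
MIN_LR_RATIO = 0.1


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR_RATIO * PEAK_LR.

    ASSUMPTION: the exact warmup shape and decay endpoint used in training
    are not stated in this README; this is one common formulation.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


for step in (0, 1_000, 2_000, 21_480, 40_960):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")

# ASSUMPTION: 96 sequences per step, each 1024 tokens long.
tokens_seen = 96 * 1_024 * MAX_STEPS
print(f"approx. tokens seen: {tokens_seen:,}")
```

With these assumptions the rate peaks at 2e-4 after warmup, ends at 2e-5, and the run covers roughly 4.0B tokens. If the real batch size is measured differently, the token figure changes accordingly.
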
## Usage

This repository ships custom model code for Transformers, so `trust_remote_code=True` must be passed when loading the tokenizer and the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "MK0727/lambda-160m"

# trust_remote_code=True allows Transformers to run the custom `myllm` code from the repo.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompt in Japanese: "The capital of Japan is,"
inputs = tokenizer("日本の首都は、", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

This model is neither instruction-tuned nor safety-aligned. It may generate incorrect, biased, unsafe, or low-quality text.

The model was trained on a limited Japanese corpus mixture and has not been evaluated on standard benchmarks.