MK0727/lambda-160m

lambda-160m



MK0727 committed 24 days ago · Commit b169cd4 (verified) · 1 parent: f1635e5

Create README.md

Files changed (1): README.md (added, +80 -0)

---
language:
- ja
library_name: transformers
tags:
- myllm
- causal-lm
- custom-code
- safetensors
pipeline_tag: text-generation
---

# lambda-160m

lambda-160m is an experimental Japanese causal language model built with `myllm`, a custom decoder-only Transformer implementation. All training code is publicly available at [KeisukeMiyamoto1324/myllm](https://github.com/KeisukeMiyamoto1324/myllm).

## Model Details

| Item | Value |
|---|---:|
| Parameters | 164.5M |
| Architecture | Decoder-only Transformer |
| Model type | `myllm` |
| Context length | 1,024 tokens |
| Tokenizer | Byte-level BPE |
| Vocabulary size | 65,536 |
| Layers | 16 |
| Hidden size | 768 |
| Attention heads | 12 |
| FFN size | 3,072 |
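
As a rough cross-check of the 164.5M figure, the sketch below counts parameters from the table's hyperparameters. The card does not describe the block internals, so the layout here is an assumption: a GPT-2-style block with biased linear layers, two LayerNorms per layer plus a final one, learned absolute position embeddings over the 1,024-token context, and an output head tied to the token embedding. `Config` and `count_params` are hypothetical helpers, not part of `myllm`.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Values taken from the Model Details table
    vocab_size: int = 65_536
    context_length: int = 1_024
    n_layers: int = 16
    d_model: int = 768
    n_heads: int = 12
    d_ff: int = 3_072


def count_params(cfg: Config) -> int:
    """Estimate parameters for an ASSUMED GPT-2-style layout (not confirmed by the card)."""
    d, f = cfg.d_model, cfg.d_ff
    assert d % cfg.n_heads == 0  # 768 / 12 gives 64-dim heads
    layer_norm = 2 * d  # scale + bias
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    ffn = (d * f + f) + (f * d + d)  # up projection + down projection
    per_layer = 2 * layer_norm + attention + ffn
    # Token embedding + learned position embedding (assumed)
    embeddings = cfg.vocab_size * d + cfg.context_length * d
    # Output head assumed tied to the token embedding, so it adds nothing;
    # the trailing layer_norm term is the final LayerNorm
    return embeddings + cfg.n_layers * per_layer + layer_norm


total = count_params(Config())
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate is 164,525,568 parameters (about 164.5M), in line with the table; an untied output head would add another 65,536 × 768 ≈ 50.3M. The agreement is suggestive only and does not confirm the actual layout.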

## Training Data

The model was pretrained on a mixture of two Japanese text datasets.

| Dataset | Mixture share | Notes |
|---|---:|---|
| `hotchpotch/fineweb-2-edu-japanese` | 30% | Japanese web text; Wikipedia domains excluded |
| `MK0727/CleanedWiki-jp` | 70% | Japanese Wikipedia-style text; share ramped starting at 50% of training progress |
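
A minimal sketch of drawing training documents according to the shares above, assuming per-document weighted sampling between the two sources. The card does not describe how the CleanedWiki-jp ramp is shaped, so this sketch models only the fixed 30/70 split; `MIXTURE` and `sample_sources` are hypothetical names, not `myllm` code.

```python
import random
from collections import Counter

# Fixed shares from the Training Data table; the ramp schedule is NOT modeled
MIXTURE = {
    "hotchpotch/fineweb-2-edu-japanese": 0.30,
    "MK0727/CleanedWiki-jp": 0.70,
}


def sample_sources(n: int, seed: int = 0) -> list[str]:
    """Pick the source dataset for each of n documents, weighted by mixture share."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


counts = Counter(sample_sources(100_000))
for name, count in counts.items():
    print(f"{name}: {count / 100_000:.3f}")
```

If the shares are measured in tokens rather than documents, the weights would need adjusting for each source's average document length.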

## Training Setup

The model was trained on a single NVIDIA RTX PRO 6000 GPU.

| Item | Value |
|---|---:|
| Optimizer | AdamW |
| Peak learning rate | 2e-4 |
| LR schedule | Warmup + cosine decay |
| Warmup steps | 2,000 |
| Minimum LR ratio | 0.1 |
| Batch size | 96 |
| Max steps | 40,960 |
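
The schedule rows can be read as the sketch below, assuming linear warmup to the 2e-4 peak over 2,000 steps followed by cosine decay to 0.1 × peak at step 40,960; the exact formula in `myllm` may differ (for example, in how warmup is shaped). The token-budget line further assumes that each of the 96 batch items is a full 1,024-token sequence, which the card does not state. `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 2e-4
WARMUP_STEPS = 2_000
MAX_STEPS = 40_960
MIN_LR_RATIO = 0.1


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to MIN_LR_RATIO * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0
    min_lr = MIN_LR_RATIO * PEAK_LR
    return min_lr + (PEAK_LR - min_lr) * cosine


# Token budget, assuming every batch item is a full-length 1,024-token sequence
tokens_seen = 96 * 1_024 * MAX_STEPS

for step in (0, 1_000, WARMUP_STEPS, 21_480, MAX_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"tokens seen: {tokens_seen:,}")
```

Under these assumptions the learning rate is 1.1e-4 halfway through the decay (step 21,480) and ends at 2e-5, and the run covers 4,026,531,840 tokens (about 4.0B). In PyTorch, the same curve could be applied with `torch.optim.lr_scheduler.LambdaLR` on an optimizer created with `lr=PEAK_LR`, passing `lambda step: lr_at(step) / PEAK_LR` as the multiplier.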

## Usage

This repository ships custom modeling code for Transformers, so loading it requires `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "MK0727/lambda-160m"

# trust_remote_code=True lets Transformers execute the custom `myllm` code stored in the repository
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompt "日本の首都は、" means "The capital of Japan is,"
inputs = tokenizer("日本の首都は、", return_tensors="pt")
# Generate up to 64 new tokens after the prompt
outputs = model.generate(**inputs, max_new_tokens=64)
# outputs[0] holds the prompt followed by the continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

This model is not instruction-tuned or safety-aligned. It may generate incorrect, biased, unsafe, or low-quality text.
The model was trained on a limited Japanese corpus mixture and has not been evaluated on standard benchmarks.