---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---

[GeoV](https://github.com/geov-ai/geov)-9B-r2 is a 9-billion-parameter causal language model.

It is still being trained and has the same architecture as the [GeoV-9b](https://huggingface.co/GeoV/GeoV-9b) model, but its training data is sampled without replacement (the GeoV-9b model's training data was sampled with replacement).

The GeoV model was designed by Georges Harik and uses
[Rotary Positional Embeddings with Relative distances (RoPER)](https://research.labml.ai/RoPER.html)
by [Georges Harik](https://twitter.com/gharik) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](https://research.labml.ai/RoPER.html), in addition to using relative positions in the attention score calculation via RoPE embeddings, adds relative positional information explicitly to the value embeddings. Specifically, it incorporates the relative positions of the tokens that are attended to. RoPER has given better performance on some algorithmic tasks and appears comparable to RoPE in language modeling.
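
The value-side rotation described above can be sketched roughly as follows. This is an illustrative reimplementation of the idea (rotate each value by its source position, take the attention-weighted sum, then rotate back by the query position, so each value contributes with its relative distance), not the GeoV code; the function names are ours.

```python
import numpy as np

def rotate(x, pos, theta=10000.0):
    """Apply a RoPE-style rotation by position `pos` to feature pairs of x."""
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)  # one frequency per feature pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def roper_output(attn, values, positions):
    """attn: (n, n) attention weights; values: (n, d); positions: (n,)."""
    n = len(values)
    # Rotate each value by its own (source) position j.
    rotated = np.stack([rotate(values[j], positions[j]) for j in range(n)])
    summed = attn @ rotated  # attention-weighted sum per query i
    # Undo the query position i; rotations compose, so only (j - i) remains.
    return np.stack([rotate(summed[i], -positions[i]) for i in range(n)])
```

Because only relative distances survive, shifting every position by a constant leaves the output unchanged.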

## Model details

- Developed by: [Georges Harik](http://twitter.com/gharik)
- Model type: Transformer-based Language Model
- Language: English

<figure style="width:30em">

| Hyperparameter         | Value |
| ---------------------- | ----- |
| n<sub>parameters</sub> | 9B    |
| n<sub>layers</sub>     | 32    |
| d<sub>model</sub>      | 5120  |
| n<sub>heads</sub>      | 40    |
| d<sub>head</sub>       | 128   |
| n<sub>vocab</sub>      | 65500 |
| Sequence Length        | 2048  |

</figure>
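
As a rough sanity check, these hyperparameters are consistent in order of magnitude with the 9B headline count under the common dense-transformer rule of thumb; this is only a sketch, and GeoV's exact layer shapes may differ.

```python
# Back-of-the-envelope parameter count from the table above, assuming the
# common "12 * n_layers * d_model^2" dense-transformer estimate plus the
# token embedding matrix. GeoV's actual layer shapes may differ.
n_layers, d_model, n_vocab = 32, 5120, 65500

block_params = 12 * n_layers * d_model ** 2  # attention + MLP weights
embed_params = n_vocab * d_model             # token embedding matrix
total = block_params + embed_params

print(f"~{total / 1e9:.1f}B parameters")     # same order as the quoted 9B
```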

The currently released weights were trained on ~39 billion tokens. We plan to continue training up to 300 billion tokens. This training run is monolingual and uses the C4 (en) and English Wikipedia datasets.

## Test results

These are the results from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) at the 39B-token checkpoint.

| Task           | Version | Metric   |   Value |   | Stderr |
|----------------|--------:|----------|--------:|---|-------:|
| anli_r1        |       0 | acc      |  0.3390 | ± | 0.0150 |
| anli_r2        |       0 | acc      |  0.3350 | ± | 0.0149 |
| anli_r3        |       0 | acc      |  0.3400 | ± | 0.0137 |
| hellaswag      |       0 | acc      |  0.4332 | ± | 0.0049 |
|                |         | acc_norm |  0.5628 | ± | 0.0050 |
| lambada_openai |       0 | ppl      | 13.2084 | ± | 0.4599 |
|                |         | acc      |  0.4890 | ± | 0.0070 |
| mathqa         |       0 | acc      |  0.2235 | ± | 0.0076 |
|                |         | acc_norm |  0.2275 | ± | 0.0077 |
| piqa           |       0 | acc      |  0.7361 | ± | 0.0103 |
|                |         | acc_norm |  0.7399 | ± | 0.0102 |
| winogrande     |       0 | acc      |  0.5596 | ± | 0.0140 |
| wsc            |       0 | acc      |  0.3942 | ± | 0.0482 |

## Installation

```shell
pip install geov
```

## Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geov-ai/geov/blob/master/notebooks/generate.ipynb)

```python
from geov import GeoVForCausalLM, GeoVTokenizer

model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b-r2")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b-r2")

prompt = "In mathematics, topology is the study of"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
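
The `temperature=0.9` argument above rescales the logits before sampling: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal standalone illustration (plain NumPy, independent of GeoV):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Sampling distribution after dividing logits by the temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # low T: more peaked
base = softmax_with_temperature(logits, 1.0)   # plain softmax
flat = softmax_with_temperature(logits, 2.0)   # high T: closer to uniform
```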