---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---

[GeoV](https://github.com/geov-ai/geov)-9B-r2 is a 9-billion-parameter causal language model.

It is still being trained and has the same architecture as the [GeoV-9b](https://huggingface.co/GeoV/GeoV-9b) model, but its training data is sampled without replacement (the GeoV-9b model's training data was sampled with replacement).

The GeoV model was designed by Georges Harik and uses
[Rotary Positional Embeddings with Relative distances (RoPER)](https://research.labml.ai/RoPER.html)
by [Georges Harik](https://twitter.com/gharik) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](https://research.labml.ai/RoPER.html),
in addition to using relative positions in the attention score calculation via RoPE embeddings,
adds relative positional information explicitly to the value embeddings.
Specifically, it incorporates the relative positions of the tokens that are attended to.
RoPER has performed better on some algorithmic tasks and appears comparable to RoPE in language modeling.
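
The idea can be sketched with plain 2-d rotations (a toy illustration only, not GeoV's actual implementation). Because rotation is linear, rotating each value by its absolute position, taking the attention-weighted sum, and rotating the result back by the query position is equivalent to rotating each value by its relative offset from the query:

```python
import numpy as np

def rot(x, pos):
    # rotate a 2-d vector by an angle proportional to position `pos`
    c, s = np.cos(pos), np.sin(pos)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

values = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
attn = np.array([0.2, 0.3, 0.5])  # attention weights for one query
i = 1                             # query position

# rotate each value by its absolute position j, take the attention-weighted
# sum, then rotate the result back by the query position i ...
out = rot(sum(attn[j] * rot(values[j], j) for j in range(3)), -i)

# ... which equals a sum over values rotated by their relative offsets j - i
ref = sum(attn[j] * rot(values[j], j - i) for j in range(3))
assert np.allclose(out, ref)
```

In the real model the same trick is applied pairwise across the head dimension, as in RoPE, so each value contribution carries its relative distance to the query.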
25
+
26
+ ## Model details
27
+
28
+ - Developed by: [Georges Harik](http://twitter.com/gharik)
29
+ - Model type: Transformer-based Language Model
30
+ - Language: English
31
+
32
+ <figure style="width:30em">
33
+
34
+ | Hyperparameter | Value |
35
+ | ---------------------- | ----------- |
36
+ | n<sub>parameters</sub> | 9B |
37
+ | n<sub>layers</sub> | 32 |
38
+ | d<sub>model</sub> | 5120 |
39
+ | n<sub>heads</sub> | 40 |
40
+ | d<sub>head</sub> | 128 |
41
+ | n<sub>vocab</sub> | 65500 |
42
+ | Sequence Length | 2048 |
43
+ </figure>
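
As a quick sanity check on the table (my own back-of-the-envelope arithmetic, not from the model card): the heads exactly partition the model dimension, and the standard rough transformer parameter estimate lands in the same ballpark as the quoted 9B. The exact GeoV count depends on architecture details not listed here.

```python
n_layers, d_model, n_heads, d_head, n_vocab = 32, 5120, 40, 128, 65500

# the per-head dimension times the head count reproduces the model dimension
assert n_heads * d_head == d_model

# rough estimate: ~12 * d_model^2 weights per layer plus the embedding matrix
approx_params = 12 * d_model**2 * n_layers + n_vocab * d_model
print(f"{approx_params / 1e9:.1f}B")  # prints 10.4B, same order as the quoted 9B
```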

The currently released weights were trained on ~39 billion tokens.
We plan to continue training up to 300 billion tokens.
This training run is monolingual and uses the C4 (English) and English Wikipedia datasets.

## Test results

These are the results from [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) at the 39B-token checkpoint.

| Task           | Version | Metric   |   Value |   | Stderr |
|----------------|--------:|----------|--------:|---|-------:|
| anli_r1        |       0 | acc      |  0.3390 | ± | 0.0150 |
| anli_r2        |       0 | acc      |  0.3350 | ± | 0.0149 |
| anli_r3        |       0 | acc      |  0.3400 | ± | 0.0137 |
| hellaswag      |       0 | acc      |  0.4332 | ± | 0.0049 |
|                |         | acc_norm |  0.5628 | ± | 0.0050 |
| lambada_openai |       0 | ppl      | 13.2084 | ± | 0.4599 |
|                |         | acc      |  0.4890 | ± | 0.0070 |
| mathqa         |       0 | acc      |  0.2235 | ± | 0.0076 |
|                |         | acc_norm |  0.2275 | ± | 0.0077 |
| piqa           |       0 | acc      |  0.7361 | ± | 0.0103 |
|                |         | acc_norm |  0.7399 | ± | 0.0102 |
| winogrande     |       0 | acc      |  0.5596 | ± | 0.0140 |
| wsc            |       0 | acc      |  0.3942 | ± | 0.0482 |

## Installation

```shell
pip install geov
```

## Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geov-ai/geov/blob/master/notebooks/generate.ipynb)

```python
from geov import GeoVForCausalLM, GeoVTokenizer

# Load the model weights and tokenizer from the Hugging Face Hub
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b-r2")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b-r2")

prompt = "In mathematics, topology is the study of"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of up to 100 tokens with temperature sampling
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
```