This Model is currently anonymized during the paper review process.
The AP-MAE transformer model design and configuration is available at: https://github.com/LaughingLogits/attention-astronaut

This version of AP-MAE is trained on attention heads generated by StarCoder2-3B during inference. The inference task used for generating attention outputs is FiM token prediction for a randomly chosen masked span of 3-10 tokens of Java code, with exactly 256 tokens of surrounding context.
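The FiM setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual data pipeline: the helper name is hypothetical, the tokenization is simplified to whitespace splitting, and the sentinel tokens assume StarCoder2's prefix-suffix-middle (PSM) fill-in-the-middle format.

```python
# Illustrative sketch of the FiM task used to generate attention outputs:
# mask a random 3-10 token span of Java code and build a PSM-format prompt.
# Sentinel tokens follow the StarCoder2 FiM convention (assumption).
import random

def make_fim_example(tokens, mask_len_range=(3, 10), seed=0):
    """Mask a random span and return (FiM prompt, masked target span)."""
    rng = random.Random(seed)
    span_len = rng.randint(*mask_len_range)
    start = rng.randint(0, len(tokens) - span_len)
    prefix = " ".join(tokens[:start])
    middle = " ".join(tokens[start:start + span_len])
    suffix = " ".join(tokens[start + span_len:])
    # The model sees prefix and suffix, and is asked to predict `middle`.
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
    return prompt, middle

java_tokens = "public static int add ( int a , int b ) { return a + b ; }".split()
prompt, target = make_fim_example(java_tokens)
```

In the actual setup the surrounding context is exactly 256 tokens of Java; attention maps produced while the model predicts the masked span are what AP-MAE is trained on.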
# Usage:

```python
from ap_mae import APMAE

model = APMAE.from_pretrained(
    "LaughingLogits/AP-MAE-SC2-3B"
)
```
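For intuition about the model's inputs: each attention head in StarCoder2-3B produces a row-stochastic L x L attention map over the context, and maps like these are what AP-MAE consumes. A minimal NumPy sketch of a single head's map (shapes illustrative; this is standard scaled dot-product attention, not StarCoder2's exact implementation):

```python
import numpy as np

def attention_map(Q, K):
    """Row-stochastic L x L map for one head: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
L, d = 256, 64  # 256 context tokens; head dimension is illustrative
A = attention_map(rng.standard_normal((L, d)), rng.standard_normal((L, d)))
# A[i, j] is how much token i attends to token j; each row sums to 1.
```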