This Model is currently anonymized during the paper review process.
The AP-MAE transformer model design and configuration is available at: https://github.com/LaughingLogits/attention-astronaut

This version of AP-MAE is trained on attention heads generated by StarCoder2-3B during inference. The inference task used for generating attention outputs is FiM token prediction for a randomly chosen masked span of 3-10 tokens of Java code, with exactly 256 tokens of surrounding context.
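The FiM setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual data pipeline: the helper name is hypothetical, the tokenization is simplified to whitespace splitting, and the sentinel tokens assume StarCoder2's prefix-suffix-middle (PSM) fill-in-the-middle format.

```python
# Illustrative sketch of the FiM task used to generate attention outputs:
# mask a random 3-10 token span of Java code and build a PSM-format prompt.
# Sentinel tokens follow the StarCoder2 FiM convention (assumption).
import random

def make_fim_example(tokens, mask_len_range=(3, 10), seed=0):
    """Mask a random span and return (FiM prompt, masked target span)."""
    rng = random.Random(seed)
    span_len = rng.randint(*mask_len_range)
    start = rng.randint(0, len(tokens) - span_len)
    prefix = " ".join(tokens[:start])
    middle = " ".join(tokens[start:start + span_len])
    suffix = " ".join(tokens[start + span_len:])
    # The model sees prefix and suffix, and is asked to predict `middle`.
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
    return prompt, middle

java_tokens = "public static int add ( int a , int b ) { return a + b ; }".split()
prompt, target = make_fim_example(java_tokens)
```

In the actual setup the surrounding context is exactly 256 tokens of Java; attention maps produced while the model predicts the masked span are what AP-MAE is trained on.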
# Usage:

```python
from ap_mae import APMAE

model = APMAE.from_pretrained(
    "LaughingLogits/AP-MAE-SC2-3B"
)
```
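For intuition about the model's inputs: each attention head in StarCoder2-3B produces a row-stochastic L x L attention map over the context, and maps like these are what AP-MAE consumes. A minimal NumPy sketch of a single head's map (shapes illustrative; this is standard scaled dot-product attention, not StarCoder2's exact implementation):

```python
import numpy as np

def attention_map(Q, K):
    """Row-stochastic L x L map for one head: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
L, d = 256, 64  # 256 context tokens; head dimension is illustrative
A = attention_map(rng.standard_normal((L, d)), rng.standard_normal((L, d)))
# A[i, j] is how much token i attends to token j; each row sums to 1.
```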