AP-MAE Models
Collection
This collection comprises AP-MAE models trained on attention heads from the 3B, 7B, and 15B versions of StarCoder2. Currently anonymized for paper review.
How to use LaughingLogits/AP-MAE-SC2-7B with Transformers:
# Load model directly
from transformers import APMAE
model = APMAE.from_pretrained("LaughingLogits/AP-MAE-SC2-7B", dtype="auto")
This model is currently anonymized during the paper review process.
The AP-MAE transformer model design and configuration are available in the reproduction package attached to the submission.
This version of AP-MAE is trained on attention heads generated by StarCoder2-7B during inference. The inference task used to generate the attention outputs is fill-in-the-middle (FiM) token prediction: a random section of Java code of length 3-10 is masked, and the model predicts it from exactly 256 tokens of surrounding context.
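The sampling step described above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names `make_fim_sample` and `to_fim_prompt` are hypothetical, the masked length is assumed to be counted in tokens, the 256 context tokens are assumed to be split evenly before and after the masked span (the description does not state the split), and the sentinel strings are assumed to follow the StarCoder-family FiM convention.

```python
import random

# Sentinel strings assumed to follow the StarCoder-family FiM convention;
# verify them against the tokenizer you actually use.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def make_fim_sample(tokens, rng, context_tokens=256, min_len=3, max_len=10):
    """Split a token sequence into (prefix, middle, suffix) for FiM prediction.

    Assumption: the context budget is split evenly before and after the
    masked span. The model card does not specify the split.
    """
    span_len = rng.randint(min_len, max_len)  # inclusive on both ends
    before = context_tokens // 2
    after = context_tokens - before
    if len(tokens) < before + span_len + after:
        raise ValueError("sequence too short for the requested context")
    # Choose a start position that leaves full context on both sides.
    start = rng.randint(before, len(tokens) - after - span_len)
    prefix = tokens[start - before:start]
    middle = tokens[start:start + span_len]
    suffix = tokens[start + span_len:start + span_len + after]
    return prefix, middle, suffix


def to_fim_prompt(prefix, suffix):
    """Arrange the context in prefix-suffix-middle order for the model to complete."""
    return [FIM_PREFIX, *prefix, FIM_SUFFIX, *suffix, FIM_MIDDLE]
```

Passing a seeded `random.Random` instance as `rng` keeps the sampled spans reproducible. The `middle` tokens are the prediction target, and the attention patterns produced while the model predicts them are the kind of input AP-MAE is trained on.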
from ap_mae import APMAE
model = APMAE.from_pretrained(
    "LaughingLogits/AP-MAE-SC2-7B"
)