# Memory-Augmented Llama Model (Llama-3-8B-Instruct)
This repository contains the base weights for Llama-3-8B-Instruct packaged with custom code for the `InferenceMemoryWrapper`, which allows the model to be loaded with memory capabilities via `trust_remote_code=True`.
## Model Details
- Base Model: Llama-3-8B-Instruct
- Wrapper: `InferenceMemoryWrapper`
- Memory Size: 4096
- Memory Dims: 4096
- Memory Storage (approx): 64.0 MB (FP16, buffer + state) if buffer/state exist
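The storage estimate follows from the sizes above, assuming (this shape is an inference from the listed dimensions, not stated in the wrapper code) that both the memory buffer and the surprise state are 4096 × 4096 FP16 tensors:

```python
# Approximate on-device storage for the memory state, assuming both the
# memory buffer and the surprise state are 4096 x 4096 FP16 tensors
# (2 bytes per element).
memory_size = 4096   # number of memory slots
memory_dims = 4096   # dimensions per slot
bytes_per_fp16 = 2

buffer_bytes = memory_size * memory_dims * bytes_per_fp16  # 32 MiB per tensor
total_bytes = 2 * buffer_bytes                             # buffer + state
print(f"{total_bytes / 2**20:.1f} MB")  # -> 64.0 MB
```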
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your-username/your-repo-name"  # Replace with your repo ID

# Load the model and tokenizer, allowing custom code execution.
# Requires sufficient VRAM for the Llama 8B model + memory buffer.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # Recommended for memory
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate using the custom method.
# Note: the memory buffer is initially randomly initialized unless loaded separately.
# It will be updated during generation if update_rule is 'ema' or 'surprise'.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    use_memory=True,
    update_rule='ema',  # or 'surprise' or 'none'
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# To save user-specific memory state (after generation/updates):
# user_memory_state = model.memory_buffer.data.clone()
# user_surprise_state = model.surprise_state.clone()
# torch.save({'memory_buffer': user_memory_state, 'surprise_state': user_surprise_state}, 'user_memory.pt')

# To load user-specific memory state:
# loaded_state = torch.load('user_memory.pt')
# model.memory_buffer.data.copy_(loaded_state['memory_buffer'])
# model.surprise_state.copy_(loaded_state['surprise_state'])
```
**Important:** The `memory_buffer` and `surprise_state` in this packaged model are initialized randomly by the `InferenceMemoryWrapper` code. They do not contain any pre-trained memory state unless you load one separately after initializing the model (see example above). Loading and saving the memory state per user must be managed externally.
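One way to manage per-user state externally is to key checkpoint files by user ID. A minimal sketch follows; the helper names (`user_memory_path`, `save_user_memory`, `load_user_memory`) are hypothetical, and the model attributes accessed are the `memory_buffer` and `surprise_state` shown in the usage example above:

```python
from pathlib import Path

def user_memory_path(base_dir: str, user_id: str) -> Path:
    """Return the per-user checkpoint path, creating the directory if needed."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    return base / f"{user_id}_memory.pt"

def save_user_memory(model, base_dir: str, user_id: str) -> None:
    """Snapshot this user's memory state to disk (hypothetical helper)."""
    import torch
    torch.save(
        {
            "memory_buffer": model.memory_buffer.data.clone(),
            "surprise_state": model.surprise_state.clone(),
        },
        user_memory_path(base_dir, user_id),
    )

def load_user_memory(model, base_dir: str, user_id: str) -> None:
    """Restore this user's memory state into the live model (hypothetical helper)."""
    import torch
    state = torch.load(user_memory_path(base_dir, user_id))
    model.memory_buffer.data.copy_(state["memory_buffer"])
    model.surprise_state.copy_(state["surprise_state"])
```

Call `load_user_memory(...)` before generating for a returning user and `save_user_memory(...)` afterwards, so each user's memory evolves independently of the shared base weights.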