Memory-Augmented Llama Model (Llama-3-8B-Instruct)

This repository packages the base weights of Llama-3-8B-Instruct with custom modeling code for the InferenceMemoryWrapper, so the model can be loaded with its memory capabilities enabled via trust_remote_code=True.

Model Details

  • Base Model: Llama-3-8B-Instruct
  • Wrapper: InferenceMemoryWrapper
  • Memory Size: 4096 slots
  • Memory Dims: 4096
  • Memory Storage: ~64 MB in FP16 (memory buffer plus surprise state, when both are allocated)
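The storage figure follows directly from the configuration above: a 4096 × 4096 FP16 buffer is 32 MiB, and doubling it for a surprise state of the same shape gives the quoted ~64 MB. (The state's shape is an assumption here; check the InferenceMemoryWrapper code for the authoritative layout.)

```python
# Back-of-envelope storage for the memory tensors (FP16 = 2 bytes per element).
slots, dims, bytes_per_elem = 4096, 4096, 2

buffer_bytes = slots * dims * bytes_per_elem  # memory_buffer
state_bytes = slots * dims * bytes_per_elem   # surprise_state, assumed same shape
total_mib = (buffer_bytes + state_bytes) / (1024 ** 2)
print(f"{total_mib:.1f} MiB")  # 64.0 MiB
```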

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your-username/your-repo-name" # Replace with your repo ID

# Load the model and tokenizer, allowing custom code execution
# Requires sufficient VRAM for the Llama 8B model + memory buffer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16, # Recommended for memory
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate using the custom method
# Note: The memory buffer is initially randomly initialized unless loaded separately.
# It will be updated during generation if update_rule is 'ema' or 'surprise'.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    use_memory=True,
    update_rule='ema' # or 'surprise' or 'none'
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# To save user-specific memory state (after generation/updates):
# user_memory_state = model.memory_buffer.data.clone()
# user_surprise_state = model.surprise_state.clone()
# torch.save({'memory_buffer': user_memory_state, 'surprise_state': user_surprise_state}, 'user_memory.pt')

# To load user-specific memory state:
# loaded_state = torch.load('user_memory.pt')
# model.memory_buffer.data.copy_(loaded_state['memory_buffer'])
# model.surprise_state.copy_(loaded_state['surprise_state'])
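For intuition on the update rules passed to generate above: an 'ema' rule presumably blends new hidden states into the memory with an exponential moving average. The sketch below is illustrative only; the smoothing factor and exact blending are assumptions, not the wrapper's actual implementation.

```python
def ema_update(memory, observation, alpha=0.1):
    """Blend a new observation into a memory vector via an exponential moving average."""
    return [(1 - alpha) * m + alpha * o for m, o in zip(memory, observation)]

# Repeated updates pull the memory toward the observed values.
mem = [0.0, 0.0, 0.0]
for _ in range(3):
    mem = ema_update(mem, [1.0, 1.0, 1.0], alpha=0.5)
print(mem)  # [0.875, 0.875, 0.875]
```

A 'surprise' rule would typically scale the update by how unexpected the observation is (e.g., its distance from the current memory), so familiar inputs leave the buffer mostly untouched.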

Important: The memory_buffer and surprise_state in this packaged model are randomly initialized by the InferenceMemoryWrapper code; they contain no pre-trained memory state. To reuse memory across sessions, load a saved state after initializing the model (see the example above). Saving and loading memory state per user must be managed externally.
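Since the wrapper does not prescribe a per-user storage scheme, one simple external approach is a small store keyed by user ID. This is a sketch: the class name, method names, and dict backing are illustrative; in practice you would replace the dict with torch.save/torch.load to per-user files, as in the commented example above.

```python
class UserMemoryStore:
    """Toy per-user store for memory state; swap the dict for on-disk torch.save/load."""

    def __init__(self):
        self._store = {}

    def save(self, user_id, memory_buffer, surprise_state):
        # In practice: torch.save({'memory_buffer': ..., 'surprise_state': ...}, f"{user_id}.pt")
        self._store[user_id] = {
            "memory_buffer": memory_buffer,
            "surprise_state": surprise_state,
        }

    def load(self, user_id):
        # Returns None if this user has no saved state yet.
        return self._store.get(user_id)


store = UserMemoryStore()
store.save("alice", memory_buffer=[0.1, 0.2], surprise_state=[0.0, 0.0])
print(store.load("alice")["memory_buffer"])  # [0.1, 0.2]
```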
