Memory-Augmented Llama Model (Llama-3-8B-Instruct)

This repository contains the base weights for Llama-3-8B-Instruct packaged with the custom InferenceMemoryWrapper code.

This allows loading the model with memory capabilities by passing trust_remote_code=True.

Model Details

  • Base Model: Llama-3-8B-Instruct
  • Wrapper: InferenceMemoryWrapper
  • Memory Size: 4096
  • Memory Dims: 4096
  • Memory Storage (approx): 64.0 MB in FP16 for the memory buffer plus the surprise state (see the quick check below)
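
A quick back-of-the-envelope check of that figure, assuming the memory buffer and the surprise state are each stored as a 4096 × 4096 FP16 tensor (an assumption for illustration; the exact shapes are defined by the InferenceMemoryWrapper code):

memory_size = 4096       # number of memory slots
memory_dims = 4096       # dimension of each slot
bytes_per_value = 2      # FP16
per_tensor_mb = memory_size * memory_dims * bytes_per_value / (1024 ** 2)
total_mb = 2 * per_tensor_mb  # buffer + surprise state
print(per_tensor_mb, total_mb)  # 32.0 MB each, 64.0 MB total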

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/your-repo-name" # Replace with your repo ID

# Load the model and tokenizer, allowing custom code execution
# Requires sufficient VRAM for the Llama 8B model + memory buffer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16, # FP16 reduces the VRAM footprint
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate using the custom method
# Note: the memory buffer starts randomly initialized unless a saved state is loaded first.
# It is updated during generation when update_rule is 'ema' or 'surprise'.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    use_memory=True,
    update_rule='ema' # or 'surprise' or 'none'
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

# To save user-specific memory state (after generation/updates):
# user_memory_state = model.memory_buffer.data.clone()
# user_surprise_state = model.surprise_state.clone()
# torch.save({'memory_buffer': user_memory_state, 'surprise_state': user_surprise_state}, 'user_memory.pt')

# To load user-specific memory state:
# loaded_state = torch.load('user_memory.pt')
# model.memory_buffer.data.copy_(loaded_state['memory_buffer'])
# model.surprise_state.copy_(loaded_state['surprise_state'])
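
For intuition, the sketch below shows one common form an EMA-style memory update can take. It is purely illustrative: the function name, tensor shapes, and pooling choice are assumptions, and the actual rule applied when update_rule='ema' is defined by the InferenceMemoryWrapper code shipped with this repo.

import torch

# Illustrative only: not the InferenceMemoryWrapper implementation.
def ema_update(memory_buffer: torch.Tensor,
               hidden_states: torch.Tensor,
               alpha: float = 0.1) -> torch.Tensor:
    # memory_buffer: (memory_size, memory_dims)
    # hidden_states: (num_tokens, memory_dims) from the current context
    summary = hidden_states.mean(dim=0)                   # pool the new context into one vector
    return (1 - alpha) * memory_buffer + alpha * summary  # blend old memory with the new summary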

Important: The memory_buffer and surprise_state in this packaged model are initialized randomly according to the InferenceMemoryWrapper code. They do not contain any pre-trained memory state unless you load it separately after initializing the model (see example above). You need to manage loading/saving the memory state per user externally.
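
One possible pattern for managing that state per user externally is sketched below. The helper names and file layout are assumptions, not part of this repo; the attribute accesses mirror the save/load example above.

import os
import torch

def save_user_memory(model, user_id: str, directory: str = "user_memories"):
    # Snapshot the current memory buffer and surprise state for one user.
    os.makedirs(directory, exist_ok=True)
    torch.save(
        {
            "memory_buffer": model.memory_buffer.data.clone().cpu(),
            "surprise_state": model.surprise_state.clone().cpu(),
        },
        os.path.join(directory, f"{user_id}.pt"),
    )

def load_user_memory(model, user_id: str, directory: str = "user_memories"):
    # Restore a previously saved per-user memory state into the live model.
    state = torch.load(os.path.join(directory, f"{user_id}.pt"), map_location=model.device)
    model.memory_buffer.data.copy_(state["memory_buffer"])
    model.surprise_state.copy_(state["surprise_state"])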
