# Replicate Setup Instructions

## Prerequisites

1. Install Cog: https://github.com/replicate/cog

   ```bash
   sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
   sudo chmod +x /usr/local/bin/cog
   ```

2. Create a Replicate account: https://replicate.com

## Local Testing

```bash
# Test the model locally (builds the image automatically if needed)
cog predict -i prompt="What makes Monad blockchain unique?"

# Build the Docker image explicitly
cog build
```

## Push to Replicate

```bash
# Log in to Replicate
cog login

# Push the model (replace YOUR_USERNAME with your Replicate username)
cog push r8.im/YOUR_USERNAME/monad-mistral-7b
```

## Model Structure

- `cog.yaml`: Defines the environment and dependencies
- `predict.py`: Contains the Predictor class for inference
- `monad-mistral-7b.gguf`: The model file (uploaded separately)

## Using the Model on Replicate

Once deployed, you can call the model via:

### Python

```python
import replicate

# Omitting a version suffix runs the model's latest version;
# to pin a version, append ":<version_id>" to the model name.
output = replicate.run(
    "YOUR_USERNAME/monad-mistral-7b",
    input={
        "prompt": "Explain Monad's parallel execution",
        "temperature": 0.7,
        "max_tokens": 200
    }
)
print(output)
```

### cURL

```bash
# MODEL_VERSION_ID is the version hash shown on the model page after
# `cog push`; the predictions API does not accept "latest" here.
curl -s -X POST \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "MODEL_VERSION_ID",
    "input": {
      "prompt": "What is Monad?"
    }
  }' \
  https://api.replicate.com/v1/predictions
```

## Notes

- The GGUF file needs to be included in the model package
- Replicate automatically handles GPU allocation
- The model uses llama-cpp-python for efficient GGUF inference
- The context window is set to 4096 tokens
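
For reference, the `cog.yaml` described above might look roughly like the following. This is a hedged sketch based only on details in this README (GPU allocation, llama-cpp-python, GGUF inference); the Python version and the package version pin are assumptions, not the repository's actual configuration.

```yaml
# Hypothetical cog.yaml sketch -- versions below are assumptions,
# not the repository's actual pins.
build:
  gpu: true                # README notes Replicate handles GPU allocation
  python_version: "3.11"   # assumed; use the version the model was tested with
  python_packages:
    - "llama-cpp-python"   # GGUF inference backend named in the Notes section
predict: "predict.py:Predictor"  # points Cog at the Predictor class
```

The `predict` key tells Cog where to find the Predictor class at build time, so `cog predict` and `cog build` both rely on this file being present alongside `predict.py`.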