# Replit-v2-CodeInstruct-3B-ggml
This is a ggml quantized version of Replit-v2-CodeInstruct-3B, quantized to 4-bit (q4_1). To run inference, you can use ggml directly or ctransformers; a minimal sketch follows the list below.
- Memory usage of model: ~2GB
- Repo to run the model using ctransformers: https://github.com/abacaj/replit-3B-inference
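
As a rough sketch of the ctransformers path, assuming the `ctransformers` package and its Replit model type: the prompt and generation parameters below are illustrative assumptions, not settings from this repo, and the repo linked above contains the author's full inference setup.

```python
# Minimal ctransformers inference sketch (parameters are illustrative).
from ctransformers import AutoModelForCausalLM

# ctransformers downloads the ggml .bin from the Hub and runs it on CPU.
# model_type="replit" selects the ggml graph for Replit-style models.
# If the repo holds multiple .bin files, pass model_file="..." to pick one.
llm = AutoModelForCausalLM.from_pretrained(
    "abacaj/Replit-v2-CodeInstruct-3B-ggml",
    model_type="replit",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
print(llm(prompt, max_new_tokens=128, temperature=0.2))
```

With the 4-bit q4_1 weights, this should stay within the ~2GB memory footprint noted above.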