```python
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
```

Original Model: GeneZC/MiniChat-2-3B
- GGUF fp16 version
- Quantized version: Q8_0

Note: this is an experiment and has not been tested.
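For a rough sense of the fp16 vs. Q8_0 trade-off, the file sizes can be estimated from the parameter count (the 3B figure below is the model's nominal size, assumed here): fp16 stores 2 bytes per weight, while Q8_0 packs 32-weight blocks of int8 values plus one fp16 scale, roughly 8.5 bits per weight.

```python
# Rough GGUF size estimate; 3e9 is the assumed nominal parameter count.
params = 3.0e9

fp16_gb = params * 2 / 1e9          # fp16: 2 bytes per weight
q8_0_gb = params * (8.5 / 8) / 1e9  # Q8_0: ~8.5 bits/weight (int8 block + fp16 scale)

print(f"fp16 ~= {fp16_gb:.2f} GB, Q8_0 ~= {q8_0_gb:.2f} GB")
```

Actual file sizes will differ somewhat because of embedding tables, metadata, and mixed-precision tensors.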
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="MoMonir/MiniChat-2-3B-GGUF",
    filename="",  # set to the GGUF file in the repo you want to load
)
```
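Calling `llm(...)` as shown above returns an OpenAI-style completion dict rather than a plain string. A minimal sketch of pulling the generated text out — the sample dict below is illustrative only, not real model output:

```python
# Illustrative completion dict in the shape llama-cpp-python returns;
# the "text" value here is made up, not actual model output.
output = {
    "choices": [
        {"text": "Once upon a time, there was a quiet village.", "finish_reason": "length"}
    ],
    "usage": {"prompt_tokens": 6, "completion_tokens": 10},
}

# The generated text lives under choices[0]["text"].
text = output["choices"][0]["text"]
print(text)
```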