gguf format?
Hi, could you please provide the model in GGUF format? Many others and I already have code set up for various things that consume models through llama.cpp (I love it). It would be faster for me to get started with your model if a GGUF version were available. Thanks.
So, is no one here for any "discussion"?
Some parts of the model are not yet supported by llama.cpp, but I guess that is resolvable; see these issues:
https://github.com/ggerganov/llama.cpp/issues/3061
https://github.com/smallcloudai/refact/issues/77
Please feel free to contribute!
A complete set of all quantisations with fully functional files is available at
https://huggingface.co/maddes8cht/smallcloudai-Refact-1_6B-fim-gguf
https://huggingface.co/maddes8cht/ contains an extensive collection of models converted to .gguf, limited to truly free open-source LLMs (OSI-compliant licences).
The compilation is nicely organised into collections, sorted by the source LLM from which each model was derived.
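For anyone who wants to try those files from Python, here is a rough sketch of picking one quantisation from the repository linked above and loading it. It assumes the `huggingface_hub` and `llama-cpp-python` packages are installed and that your llama.cpp build supports this model's architecture. The helper `pick_gguf`, the default tag `Q4_K_M`, and the prompt are illustrative assumptions of mine; the actual filenames in the repository may follow a different pattern, so check the file list first.

```python
from typing import Iterable, Optional

# Repository named in the post above.
REPO_ID = "maddes8cht/smallcloudai-Refact-1_6B-fim-gguf"


def pick_gguf(filenames: Iterable[str], quant: str = "Q4_K_M") -> Optional[str]:
    """Hypothetical helper: return the first .gguf file whose name contains
    the quantisation tag (case-insensitive), or None if nothing matches."""
    tag = quant.lower()
    for name in sorted(filenames):
        lower = name.lower()
        if lower.endswith(".gguf") and tag in lower:
            return name
    return None


def main() -> None:
    # Imported here so pick_gguf stays usable without these packages installed.
    from huggingface_hub import hf_hub_download, list_repo_files
    from llama_cpp import Llama

    # List the files in the repository and choose one quantisation.
    filename = pick_gguf(list_repo_files(REPO_ID))
    if filename is None:
        raise SystemExit("No matching .gguf file found; try another quantisation tag.")

    # Download the file (cached locally) and load it with llama.cpp bindings.
    model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    llm = Llama(model_path=model_path, n_ctx=2048)

    # Plain completion; the prompt is only an example.
    out = llm("def fibonacci(n):", max_tokens=64, temperature=0.2)
    print(out["choices"][0]["text"])
```

Calling `main()` downloads the chosen file and prints a short completion. Passing a different tag to `pick_gguf`, for example `"Q8_0"`, would select another quantisation if the repository provides one.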