Instructions for using Etherll/Mellum-4b-sft-rust with libraries and local apps.

Libraries

How to use Etherll/Mellum-4b-sft-rust with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Etherll/Mellum-4b-sft-rust")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Etherll/Mellum-4b-sft-rust")
model = AutoModelForCausalLM.from_pretrained("Etherll/Mellum-4b-sft-rust")


vLLM

How to use Etherll/Mellum-4b-sft-rust with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Etherll/Mellum-4b-sft-rust"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Etherll/Mellum-4b-sft-rust",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


SGLang

How to use Etherll/Mellum-4b-sft-rust with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Etherll/Mellum-4b-sft-rust" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Etherll/Mellum-4b-sft-rust",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Etherll/Mellum-4b-sft-rust" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Etherll/Mellum-4b-sft-rust",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
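Both the vLLM server (port 8000) and the SGLang server (port 30000) expose an OpenAI-compatible `/v1/completions` endpoint, so one client can query either. The "Once upon a time," prompt above is only a generic smoke test. The sketch below sends a FIM prompt in the format described under "FIM Format" instead, using only the Python standard library. `build_completion_request` and `complete` are hypothetical helper names, not part of vLLM or SGLang, and the sample Rust snippet and sampling settings are illustrative assumptions.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "Etherll/Mellum-4b-sft-rust",
    max_tokens: int = 128,
    temperature: float = 0.2,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Hypothetical FIM prompt: the suffix comes before the prefix.
fim_prompt = (
    "<filename>src/lib.rs\n"
    "<fim_suffix>\n}\n"
    "<fim_prefix>fn add(a: i32, b: i32) -> i32 {\n    "
    "<fim_middle>"
)
request = build_completion_request(fim_prompt)
# With a server running, send it with: print(complete(request))
```

To target the SGLang server instead, pass `base_url="http://localhost:30000"`.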

Unsloth Studio

How to use Etherll/Mellum-4b-sft-rust with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Etherll/Mellum-4b-sft-rust to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Etherll/Mellum-4b-sft-rust to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Etherll/Mellum-4b-sft-rust to start chatting

Load model with FastModel

# Install Unsloth:
pip install unsloth

# Then, in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Etherll/Mellum-4b-sft-rust",
    max_seq_length=2048,
)

Docker Model Runner
How to use Etherll/Mellum-4b-sft-rust with Docker Model Runner:
```
docker model run hf.co/Etherll/Mellum-4b-sft-rust
```
Quantized versions of this model can be used in llama.cpp, Ollama, LM Studio, or any other compatible app.

Etherll/Mellum-4b-sft-rust

Etherll/Mellum-4b-sft-rust is a large language model (LLM) fine-tuned specifically for Rust code Fill-in-the-Middle (FIM) tasks. It is built on the JetBrains/Mellum-4b-base model.

This model has been fine-tuned on the Etherll/CodeFIM-Rust-Mellum dataset, which comprises approximately 57,000 Rust-specific FIM examples, to enhance its proficiency in completing Rust code snippets accurately and contextually.

A GGUF version for CPU inference is also available: Etherll/Mellum-4b-sft-rust-GGUF.

Model Description

This model leverages the LLaMA-style architecture of Mellum-4b-base (4 billion parameters) and its extensive pre-training on over 4 trillion tokens. The fine-tuning process focused on adapting the model to the nuances of Rust syntax and common coding patterns for FIM tasks.

Key Features:

Specialized for Rust: Optimized for Fill-in-the-Middle tasks in Rust.
Based on Mellum-4b-base: Benefits from JetBrains' robust base model.
Efficient: At 4 billion parameters, small enough for both cloud and local deployment.
IDE Integration Ready: Designed for use in developer tooling, and works particularly well with Continue.dev for an enhanced coding assistant experience.
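To put "local deployment" in rough numbers: the weights alone take about parameters × bytes per parameter. This sketch assumes exactly 4 × 10^9 parameters (the card rounds to "4B") and ignores activations, the KV cache, and quantization-format overhead, so real memory use will be higher. `weight_memory_gib` is a hypothetical helper, and the 8-bit and 4-bit rows are illustrative quantization levels, not a list of published files.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: the rounded "4B" parameter count from the model card.
NUM_PARAMS = 4e9

# BF16 stores 2 bytes per parameter; the quantized rows are illustrative.
for label, nbytes in [("BF16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the BF16 weights come to roughly 7.5 GiB, which is why a quantized GGUF build is the usual choice for CPU or smaller GPUs.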

Fine-tuning Data

Dataset: Etherll/CodeFIM-Rust-Mellum
Size: ~57,000 rows
Focus: Rust code Fill-in-the-Middle

FIM Format

This model is trained on a specific Fill-in-the-Middle prompt format. Note that the suffix comes before the prefix. When providing FIM input, use the following structure, replacing each {{{...}}} placeholder with your own content:

<filename>{{{filename}}}
<fim_suffix>{{{suffix_code}}}<fim_prefix>{{{prefix_code}}}<fim_middle>
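A minimal sketch of filling this template in Python, assuming the special tokens are written literally into the prompt string and the placeholders are plain source text. `build_fim_prompt` and `generate_middle` are hypothetical helper names. `generate_middle` reuses the Transformers loading code from the Libraries section with greedy decoding and returns only the newly generated tokens; whether `skip_special_tokens=True` removes every end-of-middle marker depends on the tokenizer configuration, so treat that post-processing as an assumption.

```python
def build_fim_prompt(filename: str, prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: filename line, then suffix, then prefix, then <fim_middle>."""
    return (
        f"<filename>{filename}\n"
        f"<fim_suffix>{suffix}<fim_prefix>{prefix}<fim_middle>"
    )


def generate_middle(prompt: str, max_new_tokens: int = 64) -> str:
    """Predict the missing middle with Transformers (needs torch and transformers)."""
    # Imported lazily so build_fim_prompt works without these dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Etherll/Mellum-4b-sft-rust"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
    )
    # Drop the prompt tokens and keep only the generated middle.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical Rust snippet with the function body missing.
prefix = "fn add(a: i32, b: i32) -> i32 {\n    "
suffix = "\n}\n"
prompt = build_fim_prompt("src/lib.rs", prefix, suffix)
print(prompt)
# With the model weights available: print(generate_middle(prompt))
```

Keeping prompt construction separate from generation makes it easy to reuse the same string with the vLLM or SGLang completion endpoints.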

How to Use

With Continue.dev

For the best integrated development experience, it's highly recommended to use this model with Continue.dev.

Refer to the Continue.dev documentation for instructions on how to add custom LLMs.

GGUF Version

A GGUF version is available at Etherll/Mellum-4b-sft-rust-GGUF. This format is suitable for local inference on CPU (and on GPU with appropriate builds) using tools such as llama.cpp, Ollama, and LM Studio.

Support & Community

If you need any help, have questions, or just want to chat, feel free to message me on Discord: etherl


Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for Etherll/Mellum-4b-sft-rust

Base model: JetBrains/Mellum-4b-base
Quantizations: 2 models
