Libraries

How to use gbueno86/Cathallama-70B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="gbueno86/Cathallama-70B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gbueno86/Cathallama-70B")
model = AutoModelForCausalLM.from_pretrained("gbueno86/Cathallama-70B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
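
Before loading the model directly, it can help to estimate how much memory the weights alone will need. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes the 71B parameter count and BF16 tensor type reported on the model page, and the nominal 4.5 bits per weight of the GGUF Q4_0 format used for the tests later in this card. The helper name `weight_memory_gb` is hypothetical, and real usage is higher once the KV cache and activations are counted.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 71e9  # "71B params" as reported on the model page

# BF16 stores 16 bits per weight
bf16_gb = weight_memory_gb(N_PARAMS, 16)

# Q4_0 packs 32 four-bit weights plus one 16-bit scale per block:
# (32 * 4 + 16) / 32 = 4.5 bits per weight (nominal; real files differ slightly)
q4_0_gb = weight_memory_gb(N_PARAMS, 4.5)

print(f"BF16: ~{bf16_gb:.0f} GB, Q4_0: ~{q4_0_gb:.0f} GB")
# prints "BF16: ~142 GB, Q4_0: ~40 GB"
```

On these assumptions the unquantized checkpoint does not fit on a single common GPU, which is why the quantized files are the practical option for local use.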

vLLM

How to use gbueno86/Cathallama-70B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gbueno86/Cathallama-70B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gbueno86/Cathallama-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
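
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = "gbueno86/Cathallama-70B") -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-compatible response layout
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(ask("What is the capital of France?"))
```

The same helper should work against the SGLang server described in the next section by passing `base_url="http://localhost:30000"`.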

Use Docker

docker model run hf.co/gbueno86/Cathallama-70B

SGLang

How to use gbueno86/Cathallama-70B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "gbueno86/Cathallama-70B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gbueno86/Cathallama-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "gbueno86/Cathallama-70B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gbueno86/Cathallama-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use gbueno86/Cathallama-70B with Docker Model Runner:
```
docker model run hf.co/gbueno86/Cathallama-70B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Cathallama

Awesome model, my new daily driver.

Edit: I am seeing many generated tokens that point to unknown Unicode code points, which did not show up during testing, so I have stopped using this model and am working on a new version.

Notable Performance

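9 percentage point increase in overall MMLU-PRO success rate over Llama 3.1 70B Instruct at Q4_0 (51.0% vs. 42.0%)
Highest or tied-highest score in 11 of the 14 MMLU-PRO categories tested
Passed 11 of 14 manual test cases, more than any of the three merged models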

Creation workflow

Models merged

meta-llama/Meta-Llama-3.1-70B-Instruct
turboderp/Cat-Llama-3-70B-instruct
Nexusflow/Athene-70B

flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B -->| | E[Merge]
    D -->| | E[Merge]
    E[Merge] -->|Result| F[Cathallama]
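
This card does not state which merge method or tool was used, so the following is only a toy illustration of the two-stage shape of the flowchart. It assumes plain linear averaging of weights, which may not be what was actually done; the `linear_merge` helper and the tiny two-element "checkpoints" are hypothetical stand-ins.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two checkpoints tensor by tensor: alpha * a + (1 - alpha) * b."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have the same tensor names")
    return {
        name: [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Toy stand-ins for the three source checkpoints (one 2-element tensor each)
athene = {"w": [1.0, 2.0]}
cat = {"w": [3.0, 6.0]}
llama31 = {"w": [5.0, 4.0]}

# Stage 1: merge each fine-tune with Llama 3.1 (nodes B and D in the flowchart)
stage_b = linear_merge(athene, llama31)
stage_d = linear_merge(cat, llama31)

# Stage 2: merge the two intermediate results (node E, giving F)
cathallama_toy = linear_merge(stage_b, stage_d)
print(cathallama_toy)  # {'w': [3.5, 4.0]}
```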

Testing

Hyperparameters

Temperature: 0.0 for automated, 0.9 for manual
Penalize repeat sequence: 1.05
Consider N tokens for penalize: 256
Penalize repetition of newlines
Top-K sampling: 40
Top-P sampling: 0.95
Min-P sampling: 0.05
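
To show what the three truncation settings above do, here is a minimal sketch that filters a toy next-token distribution with Top-K, then Top-P, then Min-P. The order of the filters, the `filter_candidates` helper, and the toy probabilities are assumptions for illustration; llama.cpp's own sampler chain works on logits and differs in detail. A temperature of 0.0, as used for the automated tests, amounts to always picking the most likely token, so these filters only matter for the manual runs at 0.9.

```python
from typing import Dict


def filter_candidates(probs: Dict[str, float], top_k: int = 40,
                      top_p: float = 0.95, min_p: float = 0.05) -> Dict[str, float]:
    """Apply top-k, then top-p, then min-p, and renormalize the survivors."""
    # Sort candidates from most to least likely
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Top-K: keep only the k most likely tokens
    ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p
    total = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        if cumulative >= top_p:
            break

    # Min-P: drop tokens less likely than min_p times the most likely token
    threshold = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= threshold]

    # Renormalize so the remaining probabilities sum to 1
    norm = sum(p for _, p in kept)
    return {t: p / norm for t, p in kept}


toy = {"Paris": 0.90, "Lyon": 0.04, "Nice": 0.03, "Rome": 0.02, "Oslo": 0.01}
print(filter_candidates(toy))  # {'Paris': 1.0}
```

In this toy case Top-P keeps the first three tokens, and Min-P then removes "Lyon" and "Nice" because both fall below 5% of the top token's probability.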

llama.cpp Version

b3527-2-g2d5dd7bb
-fa -ngl -1 -ctk f16 --no-mmap

Tested Files

Cathallama-70B.Q4_0.gguf
Nexusflow_Athene-70B.Q4_0.gguf
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

Tests

Manual testing

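| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| Common Sense | Ball on cup | OK | KO | KO | OK |
| Common Sense | Big duck small horse | KO | OK | KO | OK |
| Common Sense | Killers | OK | OK | KO | OK |
| Common Sense | Strawberry r's | OK | KO | KO | KO |
| Common Sense | 9.11 or 9.9 bigger | KO | OK | OK | KO |
| Common Sense | Dragon or lens | KO | KO | KO | KO |
| Common Sense | Shirts | OK | OK | KO | KO |
| Common Sense | Sisters | OK | KO | KO | KO |
| Common Sense | Jane faster | OK | OK | OK | OK |
| Programming | JSON | OK | OK | OK | OK |
| Programming | Python snake game | OK | KO | KO | KO |
| Math | Door window combination | OK | OK | KO | KO |
| Smoke | Poem | OK | OK | OK | OK |
| Smoke | Story | OK | OK | KO | OK |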

Note: See sample_generations.txt in the main folder of the repo for the raw generations.

MMLU-PRO

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 51.0% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
| Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |

| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | 50.0% | 45.0% | 20.0% | 40.0% |
| Law | 40.0% | 30.0% | 30.0% | 35.0% |
| Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | 85.0% | 80.0% |
| Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
| History | 65.0% | 60.0% | 55.0% | 65.0% |
| Other | 55.0% | 50.0% | 45.0% | 50.0% |
| Health | 75.0% | 40.0% | 60.0% | 65.0% |
| Economics | 80.0% | 75.0% | 65.0% | 70.0% |
| Math | 45.0% | 35.0% | 15.0% | 40.0% |
| Physics | 50.0% | 45.0% | 45.0% | 45.0% |
| Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
| Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
| Engineering | 35.0% | 40.0% | 25.0% | 35.0% |

Note: MMLU-PRO overall was tested with 100 questions. Categories were tested with 20 questions from each category.
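
As a quick way to summarize the category table above, the sketch below counts, for each model, in how many categories it has the highest or tied-highest score. The numbers are copied from the table; the shortened model labels and the `count_best_or_tied` helper are hypothetical names used only for this illustration.

```python
from typing import Dict, List

# Shortened labels, in the same column order as the category table
MODELS = ["Cathallama", "Athene", "Cat-Llama-3", "Llama-3.1"]

# Per-category success rates (%) copied from the table, in MODELS order
SCORES: Dict[str, List[int]] = {
    "Business": [50, 45, 20, 40],
    "Law": [40, 30, 30, 35],
    "Psychology": [85, 80, 70, 75],
    "Biology": [80, 70, 85, 80],
    "Chemistry": [55, 40, 35, 35],
    "History": [65, 60, 55, 65],
    "Other": [55, 50, 45, 50],
    "Health": [75, 40, 60, 65],
    "Economics": [80, 75, 65, 70],
    "Math": [45, 35, 15, 40],
    "Physics": [50, 45, 45, 45],
    "Computer Science": [60, 55, 55, 60],
    "Philosophy": [55, 60, 45, 50],
    "Engineering": [35, 40, 25, 35],
}


def count_best_or_tied(scores: Dict[str, List[int]], models: List[str]) -> Dict[str, int]:
    """Count the categories in which each model reaches the top score."""
    counts = {m: 0 for m in models}
    for row in scores.values():
        top = max(row)
        for model, value in zip(models, row):
            if value == top:
                counts[model] += 1  # ties count for every tied model
    return counts


best_or_tied = count_best_or_tied(SCORES, MODELS)
print(best_or_tied)
# {'Cathallama': 11, 'Athene': 2, 'Cat-Llama-3': 1, 'Llama-3.1': 2}
```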

PubMedQA

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |

Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!

Safetensors

Model size: 71B params
Tensor type: BF16
