How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cstr/salamandra-7b-instruct-GGUF:F32
# Run inference directly in the terminal:
llama-cli -hf cstr/salamandra-7b-instruct-GGUF:F32
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cstr/salamandra-7b-instruct-GGUF:F32
# Run inference directly in the terminal:
llama-cli -hf cstr/salamandra-7b-instruct-GGUF:F32
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf cstr/salamandra-7b-instruct-GGUF:F32
# Run inference directly in the terminal:
./llama-cli -hf cstr/salamandra-7b-instruct-GGUF:F32
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf cstr/salamandra-7b-instruct-GGUF:F32
# Run inference directly in the terminal:
./build/bin/llama-cli -hf cstr/salamandra-7b-instruct-GGUF:F32
Use Docker
docker model run hf.co/cstr/salamandra-7b-instruct-GGUF:F32
Quick Links

GGUF quants

Experimental GGUF quantization of BSC-LT/salamandra-7b-instruct from llama.cpp (older version b2750).

Use with common ChatLM template.

Below the start of the original Model Card, check it for more details.

Salamandra Model Card

Salamandra comes in three different sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants. This model card corresponds to the 7B instructed version.

To visit the model cards of other Salamandra versions, please refer to the Model Index.

The entire Salamandra family is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are made publicly available in this GitHub repository.

DISCLAIMER: This model is a first proof-of-concept designed to demonstrate the instruction-following capabilities of recently released base models. It has been optimized to engage in conversation but has NOT been aligned through RLHF to filter or avoid sensitive topics. As a result, it may generate harmful or inappropriate content. The team is actively working to enhance its performance through further instruction and alignment with RL techniques.

Downloads last month
771
GGUF
Model size
8B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cstr/salamandra-7b-instruct-GGUF

Quantized
(16)
this model