How to use with llama.cpp

Install from WinGet (Windows)

winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LoSboccacc/ThinkExtension-4x7

# Run inference directly in the terminal:
llama-cli -hf LoSboccacc/ThinkExtension-4x7

Install from brew

brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LoSboccacc/ThinkExtension-4x7

# Run inference directly in the terminal:
llama-cli -hf LoSboccacc/ThinkExtension-4x7

Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LoSboccacc/ThinkExtension-4x7

# Run inference directly in the terminal:
./llama-cli -hf LoSboccacc/ThinkExtension-4x7

Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LoSboccacc/ThinkExtension-4x7

# Run inference directly in the terminal:
./build/bin/llama-cli -hf LoSboccacc/ThinkExtension-4x7

Use Docker
docker model run hf.co/LoSboccacc/ThinkExtension-4x7
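Once llama-server is running through any of the methods above, it exposes an OpenAI-compatible HTTP API. A minimal request, assuming the default port 8080, looks like this (the message content is just an example):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Explain mixture-of-experts models in two sentences."}
        ]
      }'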
Quick Links

MOE with Mergekit/Mixtral:
base_model: mistralai/Mistral-7B-Instruct-v0.2
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: SanjiWatsuki/Silicon-Maid-7B
    positive_prompts:
      - "roleplay"
      - "story telling"
      - "fantasy"
      - "dreaming"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "flow chart"
      - "diagrams"
      - "reasoning"
      - "explanation"
  - source_model: Nondzu/Mistral-7B-Instruct-v0.2-code-ft
    positive_prompts:
      - "programming"
      - "code debugging"
      - "data transformation"
      - "data structures"
    negative_prompts:
      - "chat"
  - source_model: meta-math/MetaMath-Mistral-7B
    positive_prompts:
      - "math"
      - "arithmetic"
      - "algebra"
ChatML prompt
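The model expects the ChatML chat template. A single-turn prompt looks like this, where the system and user strings are placeholders:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{your question here}<|im_end|>
<|im_start|>assistant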