How to use from llama.cpp

Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Fischerboot/2b-gguf-tiny-llama:F16

# Run inference directly in the terminal:
llama-cli -hf Fischerboot/2b-gguf-tiny-llama:F16
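Once llama-server is running, any OpenAI-compatible client can talk to it. A minimal sketch with curl, assuming the default listen address of http://localhost:8080 (change it with --port):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'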
Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Fischerboot/2b-gguf-tiny-llama:F16

# Run inference directly in the terminal:
./llama-cli -hf Fischerboot/2b-gguf-tiny-llama:F16
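For scripted, non-interactive runs, llama-cli also accepts a prompt directly. A small sketch, where -p supplies the prompt and -n caps the number of generated tokens:

./llama-cli -hf Fischerboot/2b-gguf-tiny-llama:F16 -p "Once upon a time" -n 128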
Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Fischerboot/2b-gguf-tiny-llama:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Fischerboot/2b-gguf-tiny-llama:F16
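Building from source also lets you enable GPU offload. A sketch assuming an NVIDIA toolchain (GGML_CUDA is the current llama.cpp CMake switch, and -ngl sets how many layers to offload; exact option names can shift between releases):

cmake -B build -DGGML_CUDA=ON
cmake --build build -j --target llama-server llama-cli
./build/bin/llama-server -hf Fischerboot/2b-gguf-tiny-llama:F16 -ngl 99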
Use Docker

docker model run hf.co/Fischerboot/2b-gguf-tiny-llama:F16

Quick Links
GGUF!
merge
This is a merge of pre-trained language models created using mergekit.
SOMEHOW IT'S AAAACTUALLY USEABLE
Merge Details
Merge Method
This model was merged using the passthrough merge method.
Models Merged
The following models were included in the merge:

- concedo/KobbleTinyV2-1.1B
Configuration
The following YAML configuration was used to produce this model:
dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 16] # adjusted from [0, 24] to [0, 16]
    model: concedo/KobbleTinyV2-1.1B
- sources:
  - layer_range: [5, 16] # adjusted from [8, 24] to [5, 16]
    model: concedo/KobbleTinyV2-1.1B
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [5, 16] # adjusted from [8, 24] to [5, 16]
    model: concedo/KobbleTinyV2-1.1B
    parameters:
      scale:
      - filter: o_proj
        value: 0.0
      - filter: down_proj
        value: 0.0
      - value: 1.0
- sources:
  - layer_range: [16, 22] # adjusted from [24, 32] to [16, 22]
    model: concedo/KobbleTinyV2-1.1B
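For scale: reading the slices top to bottom, the merged model stacks 16 + 11 + 11 + 6 = 44 layers cut from the 22-layer base (assuming KobbleTinyV2-1.1B keeps TinyLlama's 22-layer layout), which is roughly how a 1.1B model grows to the ~2B size in the repo name. In the duplicated [5, 16] slices, o_proj and down_proj are scaled to 0.0, a common passthrough-merge trick that mutes the copied layers' contribution to the residual stream at merge time.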
Model tree for Fischerboot/2b-gguf-tiny-llama

Base model: concedo/KobbleTinyV2-1.1B
Install from brew

brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Fischerboot/2b-gguf-tiny-llama:F16

# Run inference directly in the terminal:
llama-cli -hf Fischerboot/2b-gguf-tiny-llama:F16