## Install from WinGet (Windows)

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LH-Tech-AI/Apex-1.5-Instruct-350M

# Run inference directly in the terminal:
llama-cli -hf LH-Tech-AI/Apex-1.5-Instruct-350M
```

## Use a pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LH-Tech-AI/Apex-1.5-Instruct-350M

# Run inference directly in the terminal:
./llama-cli -hf LH-Tech-AI/Apex-1.5-Instruct-350M
```

## Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LH-Tech-AI/Apex-1.5-Instruct-350M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf LH-Tech-AI/Apex-1.5-Instruct-350M
```

## Use Docker
```shell
docker model run hf.co/LH-Tech-AI/Apex-1.5-Instruct-350M
```

## Apex 1.5

Improved reasoning and logic. Fixed wrong facts and hallucinations by increasing the FineWeb-Edu ratio during finetuning to 4:1.
**Update:** Thanks to the great community feedback on Apex 1.0, I've trained Apex 1.5 with a focus on world knowledge (FineWeb-Edu) and coding logic. Enjoy the massive jump in reasoning!
## How to train it
You can train it yourself using the base model LH-Tech-AI/Apex-1-Instruct-350M (you need to train that base model first!). Then use the prepare script and the finetuning script from the files list of this HF model.
## How to use it
You can download the apex_1.5.gguf or run `ollama run hf.co/LH-Tech-AI/Apex-1.5-Instruct-350M`. You can also use it in LM Studio, for example, just by searching for "Apex 1.5".
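Once the model is running behind `llama-server` (see the install sections above), you can talk to it from any HTTP client via its OpenAI-compatible API. Here is a minimal sketch in Python, assuming the default llama-server address `http://localhost:8080` and the `/v1/chat/completions` endpoint; the `ask` helper and the model name in the payload are illustrative, not part of this repo:

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "Apex-1.5-Instruct-350M") -> dict:
    """Build an OpenAI-style chat-completion payload.

    llama-server serves a single loaded model, so the "model" field is
    mostly informational here.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def ask(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """Send one chat turn to a locally running llama-server and return the reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Hello!")` returns the model's reply as a string; the same payload also works with the official OpenAI client libraries pointed at the local base URL.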
You can also download the final model in ONNX format, as INT8 and in full precision, so it runs without the need to install a huge Python environment with PyTorch, CUDA, etc.
Use inference.py for local inference on CUDA or CPU! First, run `pip install onnxruntime-gpu tiktoken numpy nvidia-cudnn-cu12 nvidia-cublas-cu12` (inside a Python venv for Linux users).
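For a rough idea of what such ONNX inference looks like, here is a hypothetical greedy-decoding sketch with onnxruntime. The model file name, the input/output tensor names, and the GPT-2 BPE tokenizer are all assumptions on my part; the repo's own inference.py is the authoritative version:

```python
def greedy_next_token(logits):
    """Pick the highest-scoring token id from a sequence of logits (greedy decoding)."""
    best_id, best_val = 0, logits[0]
    for i, v in enumerate(logits):
        if v > best_val:
            best_id, best_val = i, v
    return best_id


def generate(prompt: str, model_path: str = "apex_1.5.onnx", max_new_tokens: int = 64) -> str:
    """Sketch of autoregressive generation over an ONNX export (names are assumptions)."""
    import numpy as np
    import onnxruntime as ort   # onnxruntime-gpu also provides this module
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")          # assumption: GPT-2 BPE vocabulary
    session = ort.InferenceSession(model_path)   # picks CPU provider unless GPU is configured
    ids = enc.encode(prompt)
    for _ in range(max_new_tokens):
        feed = {"input_ids": np.array([ids], dtype=np.int64)}  # assumed input name
        logits = session.run(None, feed)[0]                    # assumed shape (1, seq, vocab)
        ids.append(greedy_next_token(list(logits[0, -1])))
    return enc.decode(ids)
```

The design point worth noting is that the whole loop needs only onnxruntime, a tokenizer, and numpy, which is exactly why the ONNX export avoids the heavy PyTorch/CUDA setup.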
Have fun! :D
## Recommendation of a newer model: Apex 1.5 Coder
Don't be confused by the name: "Coder" doesn't mean the model can code very well! It is finetuned on top of this model (Apex 1.5 Instruct) with CodeAlpaca. Link: https://huggingface.co/LH-Tech-AI/Apex-1.5-Coder-Instruct-350M/ - Have fun with it; it is the newest and best model from LH-Tech AI.
## Install from brew

```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LH-Tech-AI/Apex-1.5-Instruct-350M

# Run inference directly in the terminal:
llama-cli -hf LH-Tech-AI/Apex-1.5-Instruct-350M
```