Vikras: Experimental Family of Language Models (EN)
Table of Contents
- Project overview
- Current Release: HCT/YeAM
- HCT (architecture) / YeAM (implementation invariant)
- Previous Release: Vikra MixedPrc (MixP_4.9b_S)
- MixP_4.9b_S: details
- Roadmap
- Usage
- Closing
Project overview
Vikra is an experimental family of language models exploring how:
- representation geometry
- quantization
- hybrid merges
affect transformer numerical dynamics.
The Vikras project is not tied to a single base model or architecture. It is a family of models united by the idea of numerical invariance of the experiment.
- Vikra_%: the name of a specific model
- Vikras: the family of experiments
- S / M / L: degree of aggressiveness and bit allocation
- MixP / FullP / HCT: quantization and merge schemes and invariants
Current Release: HCT/YeAM
Releases
- Vikra-HCT-YeAM-PhiMma-1B
- Vikra-HCT-YeAM-LLaGemma-1B
- Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B
- Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B
HCT (architecture) / YeAM (implementation invariant)
HCT (Heterogeneous Compatibility Transfer) is an architectural invariant: a practical way to assemble compatible checkpoints and derived releases when moving across bases and model families.
YeAM (Yet Another Merge) is an implementation invariant of HCT and a standalone HF-to-HF merge scheme: it is not "just another SLERP/DARE/TIES" and not a cosmetic variant of averaging.
YeAM produces a standard HF output (safetensors + index) and supports:
- direct weight-to-weight merging
- targeted knowledge injection into a chosen model (knowledge distillation mode), aligned across multiple sources
- an additional Attention-layer merge as a second technique on top of YeAM
- merging smaller models into larger ones (scale-up merge) while keeping a compatible HF format
YeAM operates in a real 4D formulation: updates are encoded geometrically and aligned via ray intersections in parameter space. This produces controlled merges that preserve structure instead of collapsing into naive averaging.
Previous Release: Vikra MixedPrc (MixP_4.9b_S)
Short Description
12.25B Mistral-based language model
Hybrid mixed-precision merged GGUF quantization
Experimental anisotropic quantization regime
Full merge version (non-quantized): https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-FullP
GGUF quant: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-MixP_4.9b_S.gguf
MixP_4.9b_S: details
Architecture (for the MixP release)
| Parameter | Value |
|---|---|
| Architecture | Mistral-based |
| Params | ~12.25B |
| Layers | 40 |
| Hidden size | 5120 |
| FFN size | 14336 |
| Heads | 32 (8 KV heads, GQA) |
| Context | 1,024,000 |
| Vocab | 131,072 (Tekken BPE) |
| RoPE theta | 1,000,000 |
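The parameter count in the table can be roughly reproduced from the other rows. The sketch below is a back-of-the-envelope check, not the author's accounting: it assumes a head dimension of 128 (the table does not state it), untied `token_embd` and `output` matrices, and it ignores the small norm tensors.

```python
# Rough parameter count from the architecture table.
# ASSUMPTIONS (not in the table): HEAD_DIM = 128, untied embeddings,
# norm weights ignored because they are negligible.
HIDDEN = 5120
FFN = 14336
LAYERS = 40
N_HEADS = 32
N_KV_HEADS = 8
VOCAB = 131072
HEAD_DIM = 128  # assumed


def layer_params() -> int:
    """Weights in one transformer block (attention + gated FFN)."""
    q = HIDDEN * N_HEADS * HEAD_DIM
    k = HIDDEN * N_KV_HEADS * HEAD_DIM
    v = HIDDEN * N_KV_HEADS * HEAD_DIM
    o = N_HEADS * HEAD_DIM * HIDDEN
    ffn = 3 * HIDDEN * FFN  # gate, up, down
    return q + k + v + o + ffn


def total_params() -> int:
    """All blocks plus input and output embedding matrices."""
    embeddings = 2 * VOCAB * HIDDEN
    return LAYERS * layer_params() + embeddings


print(f"{total_params() / 1e9:.2f}B parameters")
```

Under these assumptions the total lands at about 12.25B, which agrees with the Params row.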
MixP_4.9b_S: Quantization Scheme
A hybrid mixed-precision scheme with per-tensor type allocation.
| Tensor group | Quant type | BPW |
|---|---|---|
| token_embd, output | BF16 | 16 |
| attn_norm, ffn_norm, output_norm | F32 | 32 |
| attn_q | Q4_K | 4.5 |
| attn_k | Q5_K | 5.5 |
| attn_v | Q3_K | 3.44 |
| attn_output | Q4_K | 4.5 |
| ffn_gate | Q3_K | 3.44 |
| ffn_up | Q5_K | 5.5 |
| ffn_down | Q5_K / Q6_K | 5.5–6.56 |
Totals:
- Quantized layers only: ~4.89 BPW
- Full model average: ~6.11 BPW
- File size: ~8.71 GB
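The totals can be sanity-checked with simple bookkeeping over the tensor groups. The following is a sketch under stated assumptions rather than the exact computation behind the numbers: head dimension 128, untied `token_embd`/`output`, F32 norm tensors ignored, and `ffn_down` treated as either all Q5_K or all Q6_K to obtain a lower and an upper bound.

```python
# Bits-per-weight bookkeeping for the MixP_4.9b_S scheme.
# ASSUMPTIONS: head_dim = 128, untied embeddings, norm tensors ignored.
HIDDEN, FFN, LAYERS, VOCAB = 5120, 14336, 40, 131072
Q_DIM, KV_DIM = 32 * 128, 8 * 128


def layer_tensors(ffn_down_bpw: float) -> dict:
    """Map tensor name -> (weight count, bits per weight) for one block."""
    return {
        "attn_q": (HIDDEN * Q_DIM, 4.5),
        "attn_k": (HIDDEN * KV_DIM, 5.5),
        "attn_v": (HIDDEN * KV_DIM, 3.4375),
        "attn_output": (Q_DIM * HIDDEN, 4.5),
        "ffn_gate": (HIDDEN * FFN, 3.4375),
        "ffn_up": (HIDDEN * FFN, 5.5),
        "ffn_down": (FFN * HIDDEN, ffn_down_bpw),
    }


def layer_bpw(ffn_down_bpw: float) -> float:
    """Weighted average bits per weight over the quantized tensors."""
    tensors = layer_tensors(ffn_down_bpw).values()
    bits = sum(n * bpw for n, bpw in tensors)
    weights = sum(n for n, _ in tensors)
    return bits / weights


def total_bits(quant_bpw: float) -> float:
    """Quantized blocks at quant_bpw plus BF16 embeddings."""
    quant_weights = LAYERS * sum(n for n, _ in layer_tensors(5.5).values())
    embed_weights = 2 * VOCAB * HIDDEN
    return quant_weights * quant_bpw + embed_weights * 16


def full_model_bpw(quant_bpw: float) -> float:
    quant_weights = LAYERS * sum(n for n, _ in layer_tensors(5.5).values())
    return total_bits(quant_bpw) / (quant_weights + 2 * VOCAB * HIDDEN)


def file_size_gib(quant_bpw: float) -> float:
    return total_bits(quant_bpw) / 8 / 2**30


print(f"quantized layers: {layer_bpw(5.5):.2f} to {layer_bpw(6.5625):.2f} BPW")
print(f"full model at 4.89 BPW: {full_model_bpw(4.89):.2f} BPW")
print(f"estimated size: {file_size_gib(4.89):.2f} GiB")
```

With these assumptions the quantized-only average falls between roughly 4.75 and 5.04 BPW, which brackets the reported ~4.89, and the full-model figures come out near 6.11 BPW and 8.71 GiB. That suggests the reported file size is expressed in binary gigabytes.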
Core idea of MixP
MixP is not "compress everything equally".
It is anisotropic quantization of information channels:
- Q/K remain in higher precision
- V and gate are intentionally quantized down to Q3_K
- norms and the output layer remain in higher precision
This redistribution changes the numerical dynamics of the model:
- increased structural sparsification
- shifts in hidden norm distribution
- changes in logit entropy
- regime sensitivity
This is not a new architecture. It is a modification of the numerical geometry of an existing one.
Observed effects
- preservation of top-1 predictions on simple tasks
- increased entropy without collapse of maximum probability
- expansion of hidden norms on complex tasks
- mode bifurcation: simple tasks stay invariant, complex tasks become sensitive
These effects are interpreted as a geometric shift of representations rather than a universal quality improvement.
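One way such effects could be quantified is by comparing next-token distributions from the baseline and the quantized model on the same prompt. The sketch below is illustrative only: the logit values are made up, and the helper names (`compare`, `entropy`) are hypothetical rather than part of any Vikras tooling.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def compare(baseline_logits, quant_logits):
    """Top-1 agreement, entropy shift and max-probability shift."""
    p = softmax(baseline_logits)
    q = softmax(quant_logits)
    return {
        "top1_match": argmax(p) == argmax(q),
        "entropy_delta": entropy(q) - entropy(p),
        "max_prob_delta": max(q) - max(p),
    }


# Made-up logits for a single position (not measured data).
baseline = [4.0, 1.0, 0.5, -2.0]
quantized = [4.0, 1.6, 1.2, -1.0]
result = compare(baseline, quantized)
print(result)
```

In this toy case the top-1 token is unchanged while the entropy rises, which is the pattern the first two bullets describe. Real measurements would average these quantities over many positions and prompts.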
math_subattention (working hypothesis)
Experiments show an effect informally referred to as:
"math_subattention"
The term describes:
- reduced contribution of small V components
- dominance of stronger residual directions
- increased inertia from previous token state
- reduced frequency of small logit switching
This is not an architectural claim. It is a working hypothesis about the dynamics that emerge under Q3_K symmetric quantization.
The term is used descriptively.
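The first bullet (small V components contributing less) can be illustrated with a toy quantizer. This is a deliberately simplified symmetric uniform 3-bit grid with a single scale per row. It is not the actual Q3_K format, which uses block-wise scales inside larger super-blocks, so treat it only as intuition for why coarse grids push small values to zero.

```python
def quantize_symmetric(values, bits=3):
    """Round values to a symmetric uniform grid and return the dequantized row.

    Toy model only: one scale for the whole row, levels -qmax..qmax.
    """
    qmax = 2 ** (bits - 1) - 1  # 3 levels on each side of zero for 3 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return list(values)
    return [round(v / scale) * scale for v in values]


# Hypothetical slice of a V projection row (values invented for illustration).
v_row = [0.90, -0.60, 0.05, -0.08, 0.12, 0.30]
dequantized = quantize_symmetric(v_row, bits=3)

for original, restored in zip(v_row, dequantized):
    print(f"{original:+.2f} -> {restored:+.2f}")
```

Here the grid step is 0.3, so the three smallest entries collapse to zero while the large entries survive almost unchanged. Whether the same mechanism explains the observed behavior in the full model remains the hypothesis stated above.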
Perplexity
Measured on wikitext-2-raw-test (full):
| Model | Precision | PPL |
|---|---|---|
| Vikra MixP_4.9b_S | 6.11 BPW | 5.50 ± 0.03 |
| Baseline BF16 | Full | 6.02 ± 0.03 |
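For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows only that standard definition; it does not reproduce the chunking, context length, or tokenization used for the measurements in the table, none of which are stated here.

```python
import math


def perplexity(token_logprobs):
    """exp(mean negative log-likelihood), with natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Sanity check: a model that assigns probability 1/4 to every token
# has perplexity 4 by definition.
uniform_logprobs = [math.log(0.25)] * 8
print(perplexity(uniform_logprobs))
```

A lower value means the model assigns higher probability to the reference text on average.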
Roadmap
Planned subfamilies:
- MixP: Mixed Precision
- FullP: Full Precision variants
- HCT: multi-merge experiments
- S / M / L: different bit allocation regimes
All models in the family are called Vikra. The repository is Vikras.
Usage
llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096
llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096
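Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP API. The client below is a minimal sketch using only the Python standard library. It assumes the server listens on the default `127.0.0.1:8080`; adjust `BASE_URL` if you pass a different host or port. The helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # ASSUMPTION: default llama-server address


def build_chat_request(prompt, max_tokens=128):
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires a running server, so the call is left commented out:
# print(chat("What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.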
Closing
Vikras is a research project.
It explores how transformer behavior changes when we:
- compress
- merge
- alter numerical geometry
If you are interested in hidden space dynamics, regime sensitivity, or anisotropic quantization, welcome.