Instructions for using srs6901/Vikras-MixP with libraries, inference providers, notebooks, and local apps.

Libraries

How to use srs6901/Vikras-MixP with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="srs6901/Vikras-MixP")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

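# Load model directly (the causal-LM class keeps the language-model head needed for generation)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("srs6901/Vikras-MixP")
model = AutoModelForCausalLM.from_pretrained("srs6901/Vikras-MixP", dtype="auto")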

llama-cpp-python

How to use srs6901/Vikras-MixP with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="srs6901/Vikras-MixP",
    filename="Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B_Q8_K.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

Local Apps

llama.cpp

How to use srs6901/Vikras-MixP with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf srs6901/Vikras-MixP:Q8_0
# Run inference directly in the terminal:
llama-cli -hf srs6901/Vikras-MixP:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf srs6901/Vikras-MixP:Q8_0
# Run inference directly in the terminal:
llama-cli -hf srs6901/Vikras-MixP:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf srs6901/Vikras-MixP:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf srs6901/Vikras-MixP:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf srs6901/Vikras-MixP:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf srs6901/Vikras-MixP:Q8_0

Use Docker

docker model run hf.co/srs6901/Vikras-MixP:Q8_0

vLLM

How to use srs6901/Vikras-MixP with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srs6901/Vikras-MixP"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srs6901/Vikras-MixP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/srs6901/Vikras-MixP:Q8_0

SGLang

How to use srs6901/Vikras-MixP with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "srs6901/Vikras-MixP" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srs6901/Vikras-MixP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "srs6901/Vikras-MixP" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srs6901/Vikras-MixP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use srs6901/Vikras-MixP with Ollama:
```
ollama run hf.co/srs6901/Vikras-MixP:Q8_0
```

Unsloth Studio

How to use srs6901/Vikras-MixP with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for srs6901/Vikras-MixP to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for srs6901/Vikras-MixP to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for srs6901/Vikras-MixP to start chatting

Pi

How to use srs6901/Vikras-MixP with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf srs6901/Vikras-MixP:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Vikras-MixP"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use srs6901/Vikras-MixP with Docker Model Runner:
```
docker model run hf.co/srs6901/Vikras-MixP:Q8_0
```

Lemonade

How to use srs6901/Vikras-MixP with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull srs6901/Vikras-MixP:Q8_0

Run and chat with the model

lemonade run user.Vikras-MixP-Q8_0

List all available models

lemonade list

Vikras — Experimental Family of Language Models

EN below

Contents

Project overview
Current release: HCT/YeAM
HCT (architecture) / YeAM (implementation invariant)
Previous release: Vikra MixedPrc (MixP_4.9b_S)
MixP_4.9b_S: details
Roadmap
Usage
Closing

Project overview

Vikra is an experimental family of language models that studies how:

representation geometry
quantization
hybrid merges

affect the numerical dynamics of transformers.

The Vikras project is not limited to a single base model or architecture: it is a family of models united by the idea of numerical invariance of the experiment.

Vikra_%: the name of a specific model
Vikras: the family of experiments
S / M / L: degree of aggressiveness and bit-width allocation
MixP / FullP / HCT: quantization and merge schemes and invariants

Current release: HCT/YeAM

Releases

Vikra-HCT-YeAM-PhiMma-1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-PhiMma-1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-PhiMma-1B-Q8_0.gguf
Vikra-HCT-YeAM-LLaGemma-1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-LLaGemma-1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-LLaGemma-1B-Q8_0.gguf
Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B_Q8_K.gguf
Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B-Q6_K.gguf

HCT (architecture) / YeAM (implementation invariant)

HCT is an architectural invariant: a practical way to assemble compatible models and derived releases when moving between bases and model families.

YeAM (Yet Another Merge) is an implementation invariant of HCT and a standalone HF→HF merge scheme: it is not “yet another SLERP/DARE/TILES” and not a cosmetic variation on averaging.

YeAM produces a standard HF result (safetensors + index) and supports:

direct weight-to-weight merging
targeted addition of knowledge to a chosen model (knowledge distillation / knowledge injection), aligned across several sources
an additional merge of Attention layers as a separate technique on top of YeAM
merging smaller models into larger ones (scale-up merge) while keeping a compatible HF format

Mathematically, YeAM works in a real 4D formulation: updates are encoded geometrically and aligned via ray intersections in parameter space. This gives a controlled merge that preserves structure and does not degenerate into naive averaging.

Previous release: Vikra MixedPrc (MixP_4.9b_S)

Short description

12.25B Mistral-based language model
Hybrid mixed-precision merged GGUF quantization
Experimental anisotropic quantization regime

Full merge (non-quantized): https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-FullP

GGUF quant: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-MixP_4.9b_S.gguf

MixP_4.9b_S: details

Architecture (for the MixP release)

Parameter	Value
Architecture	Mistral-based
Params	~12.25B
Layers	40
Hidden size	5120
FFN size	14336
Heads	32 (8 KV heads, GQA)
Context	1,024,000
Vocab	131,072 (Tekken BPE)
RoPE theta	1,000,000

MixP_4.9b_S: quantization scheme

A hybrid mixed-precision scheme with per-component type allocation.

Tensor group	Quant type	BPW
token_embd, output	BF16	16
attn_norm, ffn_norm, output_norm	F32	32
attn_q	Q4_K	4.5
attn_k	Q5_K	5.5
attn_v	Q3_K	3.44
attn_output	Q4_K	4.5
ffn_gate	Q3_K	3.44
ffn_up	Q5_K	5.5
ffn_down	Q5_K / Q6_K	5.5–6.56

Totals:

Quantized layers only: ~4.89 BPW
Full model average: ~6.11 BPW
File size: ~8.71 GB

Core idea of MixP

MixP is not “compress everything equally”.

It is anisotropic quantization of information channels:

Q/K are kept at higher precision
V and gate are deliberately quantized down to Q3_K
norms and the output layer stay at high precision

This allocation changes the numerical dynamics of the model:

structural sparsification increases
the distribution of hidden-representation norms shifts
logit entropy changes
regime sensitivity appears

This is not a new architecture. It is a change to the numerical geometry of an existing one.

Observed effects

top-1 predictions are preserved on simple tasks
entropy grows without destroying the maximum probability
hidden norms expand on complex tasks
regime bifurcation: simple tasks are ≈ invariant, complex ones are sensitive

These effects are described as a geometric shift of representations, not as a universal quality improvement.

math_subattention (working hypothesis)

Experiments show an effect provisionally labeled:

“math_subattention”

This refers to:

a smaller contribution from minor V components
amplification of the dominant directions of the residual stream
increased inertia of the previous token
fewer small logit switches

This is not a claim about a new architecture. It is a working hypothesis about the dynamics that arise under Q3_K symmetric quantization.

The term is used descriptively.

Perplexity

Measured on wikitext-2-raw-test (full):

Model	Precision	PPL
Vikra MixP_4.9b_S	6.11 BPW	5.50 ± 0.03
Baseline BF16	Full	6.02 ± 0.03

Roadmap

Planned subfamilies:

MixP: Mixed Precision
FullP: Full Precision versions
HCT: multi-merge experiments
S / M / L: bit-width allocation variants

All models in the family are named Vikra. The repository is Vikras.

Usage

llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096

llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096

Closing

Vikras is a research project.

It investigates how a transformer's behavior changes when you:

compress it
merge it
alter its numerical geometry

If you are interested in hidden space dynamics / regime sensitivity / anisotropic quantization, you are welcome here.

Vikras — Experimental Family of Language Models (EN)

Project overview
Current Release: HCT/YeAM
HCT (architecture) / YeAM (implementation invariant)
Previous Release: Vikra MixedPrc (MixP_4.9b_S)
MixP_4.9b_S: details
Roadmap
Usage
Closing

Project overview

Vikra is an experimental family of language models exploring how:

representation geometry
quantization
hybrid merges

affect transformer numerical dynamics.

The Vikras project is not tied to a single base model or architecture. It is a family of models united by the idea of numerical invariance of the experiment.

Vikra_% — a specific model
Vikras — the experimental family
S / M / L — aggressiveness and bit allocation variants
MixP / FullP / HCT — quantization / merge invariants

Current Release: HCT/YeAM

Releases

Vikra-HCT-YeAM-PhiMma-1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-PhiMma-1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-PhiMma-1B-Q8_0.gguf
Vikra-HCT-YeAM-LLaGemma-1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-LLaGemma-1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-LLaGemma-1B-Q8_0.gguf
Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B_Q8_K.gguf
Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B
- HF: https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B
- GGUF: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-HCT-YeAM-Vikhr-NemoGemma-12B_plus_1B-Q6_K.gguf

HCT (architecture) / YeAM (implementation invariant)

HCT (Heterogeneous Compatibility Transfer) is an architectural invariant: a practical way to assemble compatible checkpoints and derived releases while moving across bases and model families.

YeAM (Yet Another Merge) is an implementation invariant of HCT and a standalone HF→HF merge scheme: it is not “just another SLERP/DARE/TILES” and not a cosmetic variant of averaging.

YeAM produces a standard HF output (safetensors + index) and supports:

direct weight-to-weight merging
targeted knowledge injection into a chosen model (knowledge distillation mode), aligned across multiple sources
an additional Attention-layer merge as a second technique on top of YeAM
merging smaller models into larger ones (scale-up merge) while keeping a compatible HF format

YeAM operates in a real 4D formulation: updates are encoded geometrically and aligned via ray intersections in parameter space. This produces controlled merges that preserve structure instead of collapsing into naive averaging.
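For contrast, the naive averaging baseline that the text sets YeAM apart from can be sketched in a few lines. This is not YeAM: the source does not describe its algorithm in enough detail to reproduce it, and the function name `average_merge`, the `alpha` parameter, and the toy state dicts below are all hypothetical.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def average_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Naive baseline: linear interpolation (1 - alpha) * a + alpha * b per weight."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share tensor names")
    merged: StateDict = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
    return merged


# Toy checkpoints (hypothetical): the first two weights carry opposing updates.
model_a = {"layer.weight": [1.0, -1.0, 0.5]}
model_b = {"layer.weight": [-1.0, 1.0, 0.5]}
merged = average_merge(model_a, model_b)
print(merged)
```

In this toy case the opposing updates cancel to zero, which illustrates the kind of collapse into plain averaging that the description above says YeAM is designed to avoid.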

Previous Release: Vikra MixedPrc (MixP_4.9b_S)

Short Description

12.25B Mistral-based language model
Hybrid mixed-precision merged GGUF quantization
Experimental anisotropic quantization regime

Full merge version (non-quantized): https://huggingface.co/srs6901/Vikras-MixP/tree/main/Vikra-FullP

GGUF quant: https://huggingface.co/srs6901/Vikras-MixP/blob/main/Vikra-MixP_4.9b_S.gguf

MixP_4.9b_S: details

Architecture (for the MixP release)

Parameter	Value
Architecture	Mistral-based
Params	~12.25B
Layers	40
Hidden size	5120
FFN size	14336
Heads	32 (8 KV heads, GQA)
Context	1,024,000
Vocab	131,072 (Tekken BPE)
RoPE theta	1,000,000
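One practical consequence of this table is that the KV cache grows linearly with context length, which is a plausible reason the usage commands in this card pass `-c 4096` instead of the full 1,024,000 tokens. The sketch below estimates the cache size under two assumptions that the table does not state: a head dimension of 128 and 2-byte (F16) cache entries. `kv_cache_bytes` is a hypothetical helper, not part of any library.

```python
LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128       # assumption: not listed in the architecture table
BYTES_PER_VALUE = 2  # assumption: F16 cache entries

GIB = 1024 ** 3


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to hold keys and values for every layer at this context length."""
    per_token = 2 * LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * context_tokens


for ctx in (4096, 1_024_000):
    print(f"context {ctx:>9,}: {kv_cache_bytes(ctx) / GIB:.2f} GiB")
```

Under these assumptions the cache is well under 1 GiB at 4096 tokens and over 150 GiB at the full context, so the maximum context is mainly of theoretical interest on consumer hardware.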

MixP_4.9b_S — Quantization Scheme

A hybrid mixed-precision scheme with per-tensor type allocation.

Tensor group	Quant type	BPW
token_embd, output	BF16	16
attn_norm, ffn_norm, output_norm	F32	32
attn_q	Q4_K	4.5
attn_k	Q5_K	5.5
attn_v	Q3_K	3.44
attn_output	Q4_K	4.5
ffn_gate	Q3_K	3.44
ffn_up	Q5_K	5.5
ffn_down	Q5_K / Q6_K	5.5–6.56

Totals:

Quantized layers only: ~4.89 BPW
Full model average: ~6.11 BPW
File size: ~8.71 GB
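These totals can be cross-checked against the architecture and quantization tables. The sketch below is a rough reconstruction under stated assumptions: a head dimension of 128 (not listed in the table), norm tensors ignored because they are tiny, and "GB" read as GiB. The share of `ffn_down` tensors stored as Q6_K is not given, so the code only brackets the quantized-layer average between the all-Q5_K and all-Q6_K cases.

```python
HIDDEN, FFN, LAYERS, VOCAB = 5120, 14336, 40, 131072
N_HEADS, N_KV_HEADS = 32, 8
HEAD_DIM = 128  # assumption: not listed in the architecture table

# Weights per transformer layer, by tensor group.
per_layer = {
    "attn_q": HIDDEN * N_HEADS * HEAD_DIM,
    "attn_k": HIDDEN * N_KV_HEADS * HEAD_DIM,
    "attn_v": HIDDEN * N_KV_HEADS * HEAD_DIM,
    "attn_output": N_HEADS * HEAD_DIM * HIDDEN,
    "ffn_gate": HIDDEN * FFN,
    "ffn_up": HIDDEN * FFN,
    "ffn_down": FFN * HIDDEN,
}

# Bits per weight from the quantization table (ffn_down handled separately).
bpw = {"attn_q": 4.5, "attn_k": 5.5, "attn_v": 3.44,
       "attn_output": 4.5, "ffn_gate": 3.44, "ffn_up": 5.5}


def quantized_bpw(ffn_down_bpw: float) -> float:
    """Average bits per weight over the quantized tensor groups of one layer."""
    bits = sum(per_layer[name] * bpw[name] for name in bpw)
    bits += per_layer["ffn_down"] * ffn_down_bpw
    return bits / sum(per_layer.values())


low, high = quantized_bpw(5.5), quantized_bpw(6.56)

quant_params = LAYERS * sum(per_layer.values())
embed_params = 2 * VOCAB * HIDDEN  # token_embd + output, kept in BF16
total_params = quant_params + embed_params

# Plug in the reported 4.89 BPW for the quantized layers.
total_bits = quant_params * 4.89 + embed_params * 16
full_avg = total_bits / total_params
size_gib = total_bits / 8 / 1024 ** 3

print(f"params: {total_params / 1e9:.2f}B")
print(f"quantized-layer BPW range: {low:.2f} to {high:.2f}")
print(f"full-model average: {full_avg:.2f} BPW, size: {size_gib:.2f} GiB")
```

With these assumptions the parameter count, the full-model average, and the file size all land on the reported figures, and the reported 4.89 BPW falls inside the computed range.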

Core idea of MixP

MixP is not “compress everything equally”.

It is anisotropic quantization of information channels:

Q/K remain in higher precision
V and gate are intentionally quantized down to Q3_K
norms and the output layer remain in higher precision

This redistribution changes the numerical dynamics of the model:

increased structural sparsification
shifts in hidden norm distribution
changes in logit entropy
regime sensitivity

This is not a new architecture. It is a modification of the numerical geometry of an existing one.

Observed effects

preservation of top-1 predictions on simple tasks
increased entropy without collapse of maximum probability
expansion of hidden norms on complex tasks
mode bifurcation: simple tasks ≈ invariant, complex tasks sensitive

These effects are interpreted as a geometric shift of representations rather than a universal quality improvement.
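The first two effects are compatible with each other: entropy can rise while the top-1 token stays the same. A minimal sketch with made-up logits (the two vectors are illustrative and were not measured on any Vikra model):

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


baseline_logits = [5.0, 1.0, 0.5, 0.0]   # made-up values
shifted_logits = [4.0, 1.5, 1.0, 0.5]    # made-up values with a flatter tail

p_base = softmax(baseline_logits)
p_shift = softmax(shifted_logits)

same_top1 = p_base.index(max(p_base)) == p_shift.index(max(p_shift))
print("same top-1:", same_top1)
print(f"entropy: {entropy(p_base):.3f} -> {entropy(p_shift):.3f} nats")
print(f"max prob: {max(p_base):.3f} -> {max(p_shift):.3f}")
```

Here the second distribution spreads more mass over the tail, so its entropy is higher, yet the same token remains the clear winner.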

math_subattention (working hypothesis)

In experiments, we observe an effect informally referred to as:

“math_subattention”

This describes:

reduced contribution of small V components
dominance of stronger residual directions
increased inertia from previous token state
reduced frequency of small logit switching

This is not an architectural claim. It is a working hypothesis about the dynamics that emerge under Q3_K symmetric quantization.

The term is used descriptively.
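The first item in that list, small V components contributing less, can be illustrated with a toy symmetric 3-bit quantizer. This is a simplified sketch, not llama.cpp's actual Q3_K format (which uses block-wise scales); the function `quantize_symmetric` and the sample values are hypothetical.

```python
from typing import List


def quantize_symmetric(values: List[float], bits: int = 3) -> List[float]:
    """Round-to-nearest onto a zero-centred integer grid with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 3 for a 3-bit grid
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return [0.0 for _ in values]
    # Clamp to the signed integer range, then map back to floats.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


v_row = [0.90, -0.60, 0.10, -0.05, 0.02]  # made-up V components
dequantized = quantize_symmetric(v_row)
print(dequantized)
```

The two large components survive almost unchanged, while every component smaller than half a quantization step is rounded to zero. That is the sense in which coarse quantization of V can suppress minor components and leave the dominant directions in place.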

Perplexity

Measured on wikitext-2-raw-test (full):

Model	Precision	PPL
Vikra MixP_4.9b_S	6.11 BPW	5.50 ± 0.03
Baseline BF16	Full	6.02 ± 0.03
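For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows the formula on made-up log-probabilities and then computes the relative gap between the two values in the table. It does not reproduce the measurement itself.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Made-up log-probabilities for a four-token sequence.
example = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
ppl = perplexity(example)
print(f"toy perplexity: {ppl:.2f}")

# Relative gap between the two PPL values reported in the table.
relative_gap = (5.50 - 6.02) / 6.02
print(f"MixP vs. BF16 baseline: {relative_gap:+.1%}")
```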

Roadmap

Planned subfamilies:

MixP — Mixed Precision
FullP — Full Precision variants
HCT — multi-merge experiments
S / M / L — different bit allocation regimes

All models in the family are called Vikra. The repository is Vikras.

Usage

llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096

llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 99 -c 4096
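Once `llama-server` is running, it can be queried over its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server listens on `http://localhost:8080/v1`, the same base URL used in the Pi configuration earlier in this card. The `model` value is a placeholder label, and `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8080/v1") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": "Vikra-MixP_4.9b_S",  # placeholder label (assumption)
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text; needs a running llama-server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

Calling `chat("What is the capital of France?")` sends the request. It is left out of the snippet so the code can be run without a server.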

Closing

Vikras is a research project.

It explores how transformer behavior changes when we:

compress
merge
alter numerical geometry

If you are interested in hidden space dynamics / regime sensitivity / anisotropic quantization — welcome.

GGUF

Model size: 2B params
Architecture: qwen3
