---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- transformers
- reasoning
- reinforcement-learning
- rlvr
- math
- miner
- qwen3
- causal-lm
model-index:
- name: Miner-8B
  results: []
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
base_model:
- Qwen/Qwen3-8B-Base
---

# Miner-8B

This repository hosts the Hugging Face Transformers checkpoint for **MINER**: *Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models*.

- Paper: https://arxiv.org/pdf/2601.04731
- Code: https://github.com/pixas/Miner

## Model Description

Miner-8B is a reasoning model trained with **MINER**, a reinforcement learning method designed to improve data efficiency for large reasoning models. MINER targets the inefficiency of critic-free RL methods on positive homogeneous prompts, where all sampled rollouts are correct and standard relative-advantage training provides little or no learning signal. Instead, MINER leverages the policy's intrinsic uncertainty as a self-supervised reward signal, without requiring auxiliary reward models or additional inference-time overhead.

The MINER framework introduces two central ideas:
1. **Token-level focal credit assignment**, which amplifies learning on uncertain and critical tokens while suppressing overconfident ones.
2. **Adaptive advantage calibration**, which integrates intrinsic and verifiable rewards in a stable way.
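
For intuition only, the flavor of these two ideas can be sketched in a few lines of plain Python. The focal weighting function, the entropy-based intrinsic reward, and every constant below are illustrative assumptions for this sketch, not the paper's actual formulation; see the paper and code repository for the real algorithm.

```python
import math

def focal_token_weights(token_probs, gamma=2.0):
    # Focal-style weights in the spirit of idea 1: tokens the policy is
    # already confident about (p near 1) are down-weighted, uncertain
    # tokens keep weight near 1. The (1 - p)**gamma form is borrowed
    # from focal loss as an illustration, not taken from the paper.
    return [(1.0 - p) ** gamma for p in token_probs]

def calibrated_advantage(verifiable_reward, intrinsic_reward, alpha=0.1):
    # Idea 2, loosely: blend a verifiable reward (e.g. answer correctness)
    # with an intrinsic uncertainty-based reward. alpha is a made-up
    # mixing coefficient chosen for this sketch.
    return verifiable_reward + alpha * intrinsic_reward

# Per-token probabilities the policy assigned to its own sampled tokens,
# from most to least confident.
probs = [0.99, 0.60, 0.30]
weights = focal_token_weights(probs)  # confident token gets tiny weight

# One common uncertainty proxy: mean binary entropy of the token probabilities.
entropy = -sum(
    p * math.log(p) + (1.0 - p) * math.log(1.0 - p) for p in probs
) / len(probs)

# A correct rollout (verifiable reward 1.0) still yields a graded signal.
adv = calibrated_advantage(verifiable_reward=1.0, intrinsic_reward=entropy)
```

In this toy setup a fully correct rollout no longer produces a flat, uninformative advantage: uncertain tokens contribute more through the focal weights, and the intrinsic term keeps the blended advantage above the bare verifiable reward.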

According to the paper, MINER is evaluated on six reasoning benchmarks using Qwen3-8B-Base, and reports stronger sample efficiency and accuracy than several baseline methods, including GRPO variants.

## Intended Use

This model is intended for **research and experimental use** in:
- reasoning and problem solving
- reinforcement learning for language models
- mathematical and verifiable reasoning tasks
- post-training and evaluation of large reasoning models

Potential use cases include:
- academic research on RL for reasoning models
- evaluation on reasoning benchmarks
- ablation and reproduction studies based on the MINER framework
- further finetuning or post-training from this checkpoint

## How to Use

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "pixas/Miner-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

prompt = [{"role": "user", "content": "What is 2+3?"}]
text = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### vLLM

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "pixas/Miner-8B"

llm = LLM(model=model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
sampling_params = SamplingParams(
    temperature=0.6,
    max_tokens=8192,
)

prompt = [{"role": "user", "content": "What is 2+3?"}]
inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, tokenize=False)
outputs = llm.generate(
    inputs,
    sampling_params,
)

print(outputs[0].outputs[0].text)
```

## Limitations

This model is a research checkpoint and may have several limitations:

* It may produce incorrect, incomplete, or overconfident reasoning outputs.
* Performance may depend heavily on prompt format and decoding setup.
* Results reported in the paper may not transfer exactly to this released checkpoint unless the same base model, data mixture, and evaluation pipeline are used.
* The model is not intended as a substitute for expert judgment in high-stakes domains.

## Bias, Risks, and Safety

Like other large language models, this model may reflect biases present in its training data and may generate harmful, misleading, or factually incorrect outputs. Additional care is required before deployment in user-facing or safety-critical applications.

## Citation

If you use this model, please cite:

```bibtex
@article{jiang2026miner,
  title={Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models},
  author={Jiang, Shuyang and Wang, Yuhao and Zhang, Ya and Wang, Yanfeng and Wang, Yu},
  journal={arXiv preprint arXiv:2601.04731},
  year={2026}
}
```

## Acknowledgements

This model card is based on the official MINER paper and code repository:

* Paper: [https://arxiv.org/pdf/2601.04731](https://arxiv.org/pdf/2601.04731)
* Code: [https://github.com/pixas/Miner](https://github.com/pixas/Miner)