Pylox Crypto Agent 8B

A function-calling LoRA adapter for meta-llama/Llama-3.1-8B-Instruct, fine-tuned on multi-turn function-calling traces from NousResearch/hermes-function-calling-v1. Designed to be served with NVFP4 weight quantization and an off-the-shelf EAGLE-3 speculative-decoding head for low-latency tool use on a single NVIDIA Grace Blackwell (GB10).

This is the seed (SFT) model of the Pylox Forge crypto-agent line. It targets BFCL v3 core categories; results below the v3 ≥ 60 target are reported honestly.

Model details

| | |
| --- | --- |
| Adapter | LoRA (PEFT, rank 32, α 64, dropout 0.1, target modules: q, k, v, o, gate, up, down projections) |
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Recommended serving base | nvidia/Llama-3.1-8B-Instruct-NVFP4 |
| Speculative head | RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 |
| License | Llama 3.1 Community License (inherited from base) |
| Training hardware | Single NVIDIA Grace Blackwell (GB10) |

Training

  • Data: NousResearch/hermes-function-calling-v1, ~10k multi-turn function-calling traces.
  • Format: messages; tool catalog kept in the system / first user turn.
  • Technique: QLoRA SFT (4-bit NF4 base + bf16 compute), 3 epochs, cosine LR schedule (2e-4 peak), max_seq_length=2048, paged AdamW 8-bit, gradient checkpointing; a configuration sketch follows this list.
  • Final train loss: 0.235 · token accuracy: 0.959.
  • Steps: 849, examples: 8,546.
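The training script itself is not published, so the following is a minimal QLoRA sketch reconstructed from the hyperparameters above using TRL's SFTTrainer. The dataset config name and a few API details (which shift between TRL releases) are assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "meta-llama/Llama-3.1-8B-Instruct"

# QLoRA: frozen base quantized to 4-bit NF4, computation in bf16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
tok = AutoTokenizer.from_pretrained(base)

# Adapter hyperparameters from the model-details table above.
lora = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Config name is an assumption; the dataset ships several subsets.
data = load_dataset("NousResearch/hermes-function-calling-v1", "func_calling", split="train")

args = SFTConfig(
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_seq_length=2048,        # renamed to max_length in newer TRL releases
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
    output_dir="pylox-crypto-agent-8b",
)

# Exact SFTTrainer kwargs vary by TRL version (tokenizer= vs processing_class=).
trainer = SFTTrainer(model=model, args=args, train_dataset=data, peft_config=lora)
trainer.train()
```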

BFCL v3 evaluation

Run with bfcl-eval (v4 data, core v3 categories, single-turn) against this adapter served on nvidia/Llama-3.1-8B-Instruct-NVFP4 via vLLM.
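For reference, a hedged sketch of the invocation. Exact subcommands and flag spellings differ across bfcl-eval versions, and the category list and model name used here are assumptions, so check `bfcl --help` before running.

```bash
pip install bfcl-eval

# Generate responses for the single-turn v3 core categories (names assumed).
bfcl generate \
    --model pylox-crypto-agent-8b \
    --test-category simple,multiple,parallel,parallel_multiple,irrelevance,live

# Score the generated responses against the ground-truth checkers.
bfcl evaluate \
    --model pylox-crypto-agent-8b \
    --test-category simple,multiple,parallel,parallel_multiple,irrelevance,live
```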

| Category | Passed / Total | Accuracy |
| --- | --- | --- |
| simple_python | 270 / 400 | 67.5% |
| multiple | 140 / 200 | 70.0% |
| parallel | 123 / 200 | 61.5% |
| parallel_multiple | 74 / 200 | 37.0% |
| irrelevance | 38 / 240 | 15.8% |
| live_simple | 111 / 258 | 43.0% |
| live_multiple | 478 / 1,053 | 45.4% |
| live_parallel | 9 / 16 | 56.2% |
| live_parallel_multiple | 6 / 24 | 25.0% |
| live_irrelevance | 558 / 884 | 63.1% |
| live_relevance | 8 / 16 | 50.0% |
| simple_java (out-of-scope) | 15 / 100 | 15.0% |
| simple_javascript (out-of-scope) | 11 / 50 | 22.0% |

Headline

  • Prompt-weighted accuracy: 50.6% (1,841 / 3,641 prompts across all 13 categories above, including the out-of-scope Java/JavaScript rows).
  • Unweighted category mean: 44.0%.
  • Spec target was BFCL v3 ≥ 60. Result is below target.

Why the gap: a non-trivial slice of BFCL prompts is refused by the model under safety policies (S1/S6/S7/S8), and BFCL counts those refusals as decode errors even when refusing is the correct behavior. The honest read is that this seed adapter does well on simple_python / multiple / parallel (the Hermes distribution) but is over-conservative on the irrelevance and live categories. A DPO-aligned follow-up trained against BFCL refusal patterns is the planned next step.

Java and JavaScript categories are documented as out-of-scope in the spec (Hermes is Python-shaped) and reported here only for completeness.

Safety

Red-team probe: 75.56% block rate (68 / 90) with a 0% false-positive rate on a 90-prompt suite covering harmful instructions, jailbreak attempts, and policy-evading function calls.

Usage

PEFT (research / batch)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "pyloxsystems/pylox-crypto-agent-8b"

tok = AutoTokenizer.from_pretrained(base)
# Load the bf16 base weights, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```
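A minimal tool-call example on top of the loaded adapter, relying on the Llama 3.1 chat template's tools support in transformers. get_token_price is a hypothetical tool for illustration, not one from the training catalog.

```python
messages = [{"role": "user", "content": "What is the ETH/USDC price on Uniswap?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_token_price",  # hypothetical tool, for illustration only
        "description": "Fetch the current price of a token pair on a DEX.",
        "parameters": {
            "type": "object",
            "properties": {
                "pair": {"type": "string", "description": "Token pair, e.g. ETH/USDC"},
                "dex": {"type": "string", "description": "DEX name, e.g. uniswap"},
            },
            "required": ["pair"],
        },
    },
}]

# The Llama 3.1 template renders the tool catalog into the prompt.
inputs = tok.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```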

vLLM with EAGLE-3 speculative decoding

```bash
vllm serve nvidia/Llama-3.1-8B-Instruct-NVFP4 \
    --enable-lora \
    --max-lora-rank 32 \
    --lora-modules pylox-crypto-agent-8b=pyloxsystems/pylox-crypto-agent-8b \
    --speculative-config '{"method":"eagle3","model":"RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3"}'
```

The adapter is rank 32, above vLLM's default `--max-lora-rank` of 16, so the flag is required for the adapter to load.
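Once the server is up, the adapter is addressable by its LoRA module name through vLLM's OpenAI-compatible API. A minimal client sketch (localhost:8000 is the vLLM default; the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="pylox-crypto-agent-8b",  # the LoRA name registered via --lora-modules
    messages=[{"role": "user", "content": "What is the current gas price on Ethereum mainnet?"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```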

Scope and limitations

  • In-scope: Python tool/function calls, multi-turn agent loops with a JSON tool catalog, irrelevance abstention.
  • Out-of-scope: Java / JavaScript callable surfaces; multi-turn long-horizon tasks (BFCL multi_turn_*); long-context retrieval (memory_*); web-search categories; format-sensitivity categories.
  • Crypto / web3 framing: the Hermes corpus contains web3-friendly tool catalogs (DEXs, on-chain queries). The model is not connected to any chain or wallet — it produces structured tool calls only.
  • Not a finished agent: this is an SFT seed for a portfolio line. Behavior on real production catalogs should be measured before deployment.

Citation

If you use this model, please cite Pylox Forge:

```bibtex
@misc{pylox_crypto_agent_8b_2026,
  title  = {Pylox Crypto Agent 8B},
  author = {Pylox Systems},
  year   = {2026},
  url    = {https://huggingface.co/pyloxsystems/pylox-crypto-agent-8b}
}
```

Portfolio: pyloxforge.com
