Pylox Crypto Agent 8B

A function-calling LoRA adapter for meta-llama/Llama-3.1-8B-Instruct, fine-tuned on multi-turn function-calling traces from NousResearch/hermes-function-calling-v1. Designed to be served with NVFP4 weight quantization and an off-the-shelf EAGLE-3 speculative-decoding head for low-latency tool use on a single NVIDIA Grace Blackwell (GB10).

This is the seed (SFT) model of the Pylox Forge crypto-agent line. It targets BFCL v3 core categories; results below the v3 ≥ 60 target are reported honestly.

Model details

| | |
| --- | --- |
| Adapter | LoRA (PEFT, rank 32, α 64, dropout 0.1, target modules: q, k, v, o, gate, up, down projections) |
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Recommended serving base | nvidia/Llama-3.1-8B-Instruct-NVFP4 |
| Speculative head | RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 |
| License | Llama 3.1 Community License (inherited from base) |
| Training hardware | Single NVIDIA Grace Blackwell (GB10) |

Training

  • Data: NousResearch/hermes-function-calling-v1, ~10k multi-turn function-calling traces.
  • Format: messages; tool catalog kept in the system / first user turn.
  • Technique: QLoRA SFT (4-bit NF4 base + bf16 compute), 3 epochs, cosine LR schedule (2e-4 peak), max_seq_length=2048, paged AdamW 8-bit, gradient checkpointing; a configuration sketch follows this list.
  • Final train loss: 0.235 · token accuracy: 0.959.
  • Steps: 849, examples: 8,546.
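The training script itself is not published, so the following is a minimal QLoRA sketch reconstructed from the hyperparameters above using TRL's SFTTrainer. The dataset config name and a few API details (which shift between TRL releases) are assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "meta-llama/Llama-3.1-8B-Instruct"

# QLoRA: frozen base quantized to 4-bit NF4, computation in bf16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
tok = AutoTokenizer.from_pretrained(base)

# Adapter hyperparameters from the model-details table above.
lora = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Config name is an assumption; the dataset ships several subsets.
data = load_dataset("NousResearch/hermes-function-calling-v1", "func_calling", split="train")

args = SFTConfig(
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_seq_length=2048,        # renamed to max_length in newer TRL releases
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
    output_dir="pylox-crypto-agent-8b",
)

# Exact SFTTrainer kwargs vary by TRL version (tokenizer= vs processing_class=).
trainer = SFTTrainer(model=model, args=args, train_dataset=data, peft_config=lora)
trainer.train()
```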

BFCL v3 evaluation

Run with bfcl-eval (v4 data, core v3 categories, single-turn) against this adapter served on nvidia/Llama-3.1-8B-Instruct-NVFP4 via vLLM.
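For reference, a hedged sketch of the invocation. Exact subcommands and flag spellings differ across bfcl-eval versions, and the category list and model name used here are assumptions, so check `bfcl --help` before running.

```bash
pip install bfcl-eval

# Generate responses for the single-turn v3 core categories (names assumed).
bfcl generate \
    --model pylox-crypto-agent-8b \
    --test-category simple,multiple,parallel,parallel_multiple,irrelevance,live

# Score the generated responses against the ground-truth checkers.
bfcl evaluate \
    --model pylox-crypto-agent-8b \
    --test-category simple,multiple,parallel,parallel_multiple,irrelevance,live
```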

| Category | Passed / Total | Accuracy |
| --- | --- | --- |
| simple_python | 270 / 400 | 67.5% |
| multiple | 140 / 200 | 70.0% |
| parallel | 123 / 200 | 61.5% |
| parallel_multiple | 74 / 200 | 37.0% |
| irrelevance | 38 / 240 | 15.8% |
| live_simple | 111 / 258 | 43.0% |
| live_multiple | 478 / 1,053 | 45.4% |
| live_parallel | 9 / 16 | 56.2% |
| live_parallel_multiple | 6 / 24 | 25.0% |
| live_irrelevance | 558 / 884 | 63.1% |
| live_relevance | 8 / 16 | 50.0% |
| simple_java (out-of-scope) | 15 / 100 | 15.0% |
| simple_javascript (out-of-scope) | 11 / 50 | 22.0% |

Headline

  • Prompt-weighted accuracy: 50.6% (1,841 / 3,641 prompts across all 13 categories above, including the out-of-scope Java/JavaScript rows).
  • Unweighted category mean: 44.0%.
  • Spec target was BFCL v3 ≥ 60. Result is below target.

Why the gap: a non-trivial slice of BFCL prompts is refused by the model under safety policies (S1/S6/S7/S8), and BFCL counts those refusals as decode errors even when refusing is the correct behavior. The honest read is that this seed adapter does well on simple_python / multiple / parallel (the Hermes distribution) but is over-conservative on the irrelevance and live categories. A DPO-aligned follow-up trained against BFCL refusal patterns is the planned next step.

Java and JavaScript categories are documented as out-of-scope in the spec (Hermes is Python-shaped) and reported here only for completeness.

Safety

Red-team probe: 75.56% block rate (68 / 90) with a 0% false-positive rate on a 90-prompt suite covering harmful instructions, jailbreak attempts, and policy-evading function calls.

Usage

PEFT (research / batch)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "pyloxsystems/pylox-crypto-agent-8b"

tok = AutoTokenizer.from_pretrained(base)
# Load the bf16 base weights, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```
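A minimal tool-call example on top of the loaded adapter, relying on the Llama 3.1 chat template's tools support in transformers. get_token_price is a hypothetical tool for illustration, not one from the training catalog.

```python
messages = [{"role": "user", "content": "What is the ETH/USDC price on Uniswap?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_token_price",  # hypothetical tool, for illustration only
        "description": "Fetch the current price of a token pair on a DEX.",
        "parameters": {
            "type": "object",
            "properties": {
                "pair": {"type": "string", "description": "Token pair, e.g. ETH/USDC"},
                "dex": {"type": "string", "description": "DEX name, e.g. uniswap"},
            },
            "required": ["pair"],
        },
    },
}]

# The Llama 3.1 template renders the tool catalog into the prompt.
inputs = tok.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```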

vLLM with EAGLE-3 speculative decoding

```bash
vllm serve nvidia/Llama-3.1-8B-Instruct-NVFP4 \
    --enable-lora \
    --max-lora-rank 32 \
    --lora-modules pylox-crypto-agent-8b=pyloxsystems/pylox-crypto-agent-8b \
    --speculative-config '{"method":"eagle3","model":"RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3"}'
```

The adapter is rank 32, above vLLM's default `--max-lora-rank` of 16, so the flag is required for the adapter to load.
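Once the server is up, the adapter is addressable by its LoRA module name through vLLM's OpenAI-compatible API. A minimal client sketch (localhost:8000 is the vLLM default; the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="pylox-crypto-agent-8b",  # the LoRA name registered via --lora-modules
    messages=[{"role": "user", "content": "What is the current gas price on Ethereum mainnet?"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```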

Scope and limitations

  • In-scope: Python tool/function calls, multi-turn agent loops with a JSON tool catalog, irrelevance abstention.
  • Out-of-scope: Java / JavaScript callable surfaces; multi-turn long-horizon tasks (BFCL multi_turn_*); long-context retrieval (memory_*); web-search categories; format-sensitivity categories.
  • Crypto / web3 framing: the Hermes corpus contains web3-friendly tool catalogs (DEXs, on-chain queries). The model is not connected to any chain or wallet — it produces structured tool calls only.
  • Not a finished agent: this is an SFT seed for a portfolio line. Behavior on real production catalogs should be measured before deployment.

Citation

If you use this model, please cite Pylox Forge:

```bibtex
@misc{pylox_crypto_agent_8b_2026,
  title  = {Pylox Crypto Agent 8B},
  author = {Pylox Systems},
  year   = {2026},
  url    = {https://huggingface.co/pyloxsystems/pylox-crypto-agent-8b}
}
```

Portfolio: pyloxforge.com
