Instructions to use pyloxsystems/pylox-crypto-agent-8b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use pyloxsystems/pylox-crypto-agent-8b with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct") model = PeftModel.from_pretrained(base_model, "pyloxsystems/pylox-crypto-agent-8b") - Notebooks
- Google Colab
- Kaggle
Pylox Crypto Agent 8B
A function-calling LoRA adapter for meta-llama/Llama-3.1-8B-Instruct, fine-tuned on multi-turn function-calling traces from NousResearch/hermes-function-calling-v1. Designed to be served with NVFP4 weight-quantization and an off-the-shelf EAGLE-3 speculative-decoding head for low-latency tool-use on a single Grace Blackwell.
This is the seed (SFT) model of the Pylox Forge crypto-agent line. It targets BFCL v3 core categories; results below the v3 ≥ 60 target are reported honestly.
Model details
| Adapter | LoRA (PEFT, rank 32, α 64, dropout 0.1, target=q,k,v,o,gate,up,down) |
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Recommended serving base | nvidia/Llama-3.1-8B-Instruct-NVFP4 |
| Speculative head | RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 |
| License | Llama 3.1 Community License (inherited from base) |
| Hardware trained on | Single NVIDIA Grace Blackwell (GB10) |
Training
- Data:
NousResearch/hermes-function-calling-v1, ~10k multi-turn function-calling traces. - Format: messages; tool catalog kept in the system / first user turn.
- Technique: QLoRA SFT (4-bit NF4 base + bf16 compute), 3 epochs, cosine LR (2e-4 peak),
max_seq_length=2048, paged AdamW 8-bit, gradient checkpointing. - Final train loss: 0.235 · token accuracy: 0.959.
- Steps: 849, examples: 8,546.
BFCL v3 evaluation
Run with bfcl-eval (v4 data, core v3 categories, single-turn) against this adapter served on nvidia/Llama-3.1-8B-Instruct-NVFP4 via vLLM.
| Category | Passed / Total | Accuracy |
|---|---|---|
simple_python |
270 / 400 | 67.5% |
multiple |
140 / 200 | 70.0% |
parallel |
123 / 200 | 61.5% |
parallel_multiple |
74 / 200 | 37.0% |
irrelevance |
38 / 240 | 15.8% |
live_simple |
111 / 258 | 43.0% |
live_multiple |
478 / 1,053 | 45.4% |
live_parallel |
9 / 16 | 56.2% |
live_parallel_multiple |
6 / 24 | 25.0% |
live_irrelevance |
558 / 884 | 63.1% |
live_relevance |
8 / 16 | 50.0% |
simple_java (out-of-scope) |
15 / 100 | 15.0% |
simple_javascript (out-of-scope) |
11 / 50 | 22.0% |
Headline
- Prompt-weighted accuracy: 50.6% (1,841 / 3,641).
- Unweighted category mean: 44.0%.
- Spec target was BFCL v3 ≥ 60. Result is below target.
Why the gap: a non-trivial slice of BFCL prompts get refused by the model under safety policies (S1/S6/S7/S8) — those count as decode errors in BFCL, even when the refusal is correct behavior. The honest read is that this seed adapter wins on simple_python / multiple / parallel (Hermes's distribution) but is over-conservative on irrelevance and live categories. A DPO-aligned follow-up trained against BFCL refusal patterns is the planned next step.
Java and JavaScript categories are documented as out-of-scope in the spec (Hermes is Python-shaped) and reported here only for completeness.
Safety
Red-team probe: 75.56% block rate (0% false-positive rate) on a 90-prompt suite covering harmful instructions, jailbreak attempts, and policy-evading function calls.
Usage
PEFT (research / batch)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "pyloxsystems/pylox-crypto-agent-8b"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
vLLM with EAGLE-3 speculative decoding
vllm serve nvidia/Llama-3.1-8B-Instruct-NVFP4 \
--enable-lora \
--lora-modules pylox-crypto-agent-8b=pyloxsystems/pylox-crypto-agent-8b \
--speculative-config '{"method":"eagle3","model":"RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3"}'
Scope and limitations
- In-scope: Python tool/function calls, multi-turn agent loops with a JSON tool catalog, irrelevance abstention.
- Out-of-scope: Java / JavaScript callable surfaces; multi-turn long-horizon tasks (BFCL
multi_turn_*); long-context retrieval (memory_*); web-search categories; format-sensitivity categories. - Crypto / web3 framing: the Hermes corpus contains web3-friendly tool catalogs (DEXs, on-chain queries). The model is not connected to any chain or wallet — it produces structured tool calls only.
- Not a finished agent: this is an SFT seed for a portfolio line. Behavior on real production catalogs should be measured before deployment.
Citation
If you use this model, please cite Pylox Forge:
@misc{pylox_crypto_agent_8b_2026,
title = {Pylox Crypto Agent 8B},
author = {Pylox Systems},
year = {2026},
url = {https://huggingface.co/pyloxsystems/pylox-crypto-agent-8b}
}
Portfolio: pyloxforge.com
- Downloads last month
- 72
Model tree for pyloxsystems/pylox-crypto-agent-8b
Base model
meta-llama/Llama-3.1-8B