---
base_model: google/functiongemma-270m-it
library_name: transformers
pipeline_tag: text-generation
license: gemma
tags:
- intercomswap
- function-calling
- tool-calling
- lightning
- solana
- gemma
---

# functiongemma-270m-it-intercomswap-v3

An IntercomSwap fine-tune of FunctionGemma for deterministic tool calling in BTC Lightning <-> USDT Solana swap workflows.

## What Is IntercomSwap

IntercomSwap is a fork of upstream Intercom that keeps the Intercom stack intact and adds a non-custodial swap harness for BTC over Lightning <> USDT on Solana via a shared escrow program. It ships deterministic operator tooling, recovery flows, and unattended end-to-end tests.

GitHub: https://github.com/TracSystems/intercom-swap

Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)

## Model Purpose

- Convert natural-language operator prompts into validated tool calls.
- Enforce buy/sell direction mapping for swap intents.
- Support repeat/autopost workflows used by IntercomSwap prompt routing.
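To illustrate the buy/sell direction-mapping behavior, here is a minimal host-side sketch. The `map_direction` helper, the intent labels, and the direction strings are hypothetical, not taken from the IntercomSwap codebase; they only show the kind of deterministic mapping the fine-tune is trained to produce.

```python
# Hypothetical sketch of buy/sell direction mapping for swap intents.
# The function name, intent labels, and direction strings are illustrative
# only; use the actual identifiers your IntercomSwap deployment defines.

def map_direction(intent: str) -> str:
    """Map a normalized operator intent to a swap direction."""
    table = {
        "buy_btc_with_usdt": "usdt_sol->btc_ln",
        "sell_btc_for_usdt": "btc_ln->usdt_sol",
    }
    if intent not in table:
        raise ValueError(f"unknown intent: {intent}")
    return table[intent]

print(map_direction("buy_btc_with_usdt"))  # usdt_sol->btc_ln
```

Keeping the mapping in an explicit table (rather than free-form model output) makes the direction decision auditable and easy to validate server-side.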

## Repository Layout

- `./` - merged HF checkpoint (Transformers format)
- `./nvfp4` - NVFP4-quantized checkpoint for TensorRT-LLM serving
- `./gguf` - GGUF checkpoints:
  - `functiongemma-v3-f16.gguf`
  - `functiongemma-v3-q8_0.gguf`

## Startup By Flavor

### 1) Base HF checkpoint (Transformers)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model TracNetwork/functiongemma-270m-it-intercomswap-v3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --max-model-len 8192
```

Lower-memory configuration example:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model TracNetwork/functiongemma-270m-it-intercomswap-v3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --max-model-len 4096 \
  --max-num-seqs 8
```
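Once the server is up, clients talk to it through the standard OpenAI-compatible `/v1/chat/completions` endpoint. A sketch of a tool-calling request body follows; the tool name `swap_quote` and its parameter schema are hypothetical stand-ins, so substitute the tool schemas your IntercomSwap deployment actually registers.

```python
import json

# Sketch of an OpenAI-compatible chat request body for the vLLM server
# started above. The "swap_quote" tool and its parameters are hypothetical;
# the real tool schemas come from your IntercomSwap deployment.
payload = {
    "model": "TracNetwork/functiongemma-270m-it-intercomswap-v3",
    "messages": [
        {"role": "user", "content": "Quote 250 USDT to BTC over Lightning."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "swap_quote",
                "description": "Get a quote for a BTC<->USDT swap.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "direction": {"type": "string"},
                        "amount": {"type": "number"},
                    },
                    "required": ["direction", "amount"],
                },
            },
        }
    ],
    "temperature": 0,  # deterministic routing: no sampling
}

body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions
```

`temperature: 0` matches the model's intended use: tool routing should be deterministic, with correctness enforced by server-side validation rather than sampling diversity.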

### 2) NVFP4 checkpoint (`./nvfp4`)

TensorRT-LLM example with explicit headroom (avoid consuming all VRAM):

```bash
trtllm-serve serve ./nvfp4 \
  --backend pytorch \
  --host 0.0.0.0 \
  --port 8012 \
  --max_batch_size 8 \
  --max_num_tokens 16384 \
  --kv_cache_free_gpu_memory_fraction 0.05
```

Memory tuning guidance:

- Decrease `--max_num_tokens` first.
- Then reduce `--max_batch_size`.
- Keep `--kv_cache_free_gpu_memory_fraction` around `0.05` to preserve safety headroom.

### 3) GGUF checkpoint (`./gguf`)

Q8_0 (recommended default; good quality/memory balance):

```bash
llama-server \
  -m ./gguf/functiongemma-v3-q8_0.gguf \
  --host 0.0.0.0 \
  --port 8014 \
  --ctx-size 8192 \
  --batch-size 256 \
  --ubatch-size 64 \
  --gpu-layers 12
```

F16 (higher quality, higher memory):

```bash
llama-server \
  -m ./gguf/functiongemma-v3-f16.gguf \
  --host 0.0.0.0 \
  --port 8014 \
  --ctx-size 8192 \
  --batch-size 256 \
  --ubatch-size 64 \
  --gpu-layers 12
```

Memory tuning guidance:

- Lower `--gpu-layers` to reduce VRAM usage.
- Lower `--ctx-size` to reduce RAM+VRAM KV-cache usage.
- Use `q8_0` for general deployment, `f16` for quality-first offline tests.

## Training Snapshot

- Base family: FunctionGemma 270M instruction-tuned.
- Fine-tune objective: IntercomSwap tool-call routing and argument shaping.
- Corpus profile: operations + intent-routing + tool-calling examples.

## Evaluation Snapshot

From held-out evaluation for this release line:

- Train examples: `6263`
- Eval examples: `755`
- Train loss: `0.01348`
- Eval loss: `0.02012`

## Intended Use

- Local or private deployments where tool execution is validated server-side.
- Deterministic operator workflows for swap infra.

## Out-of-Scope Use

- Autonomous financial decision-making.
- Direct execution of unvalidated user text as shell/actions.
- Safety-critical usage without host-side policy/validation.

## Safety Notes

- Always validate tool name + argument schema server-side.
- Treat network-side payloads as untrusted input.
- Keep wallet secrets and API credentials outside model context.
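To make the first point concrete, here is a minimal server-side validation sketch. The allow-list contents and schema shapes are illustrative only, not the IntercomSwap schema: check the tool name against an allow-list and the argument names and types against a schema before executing anything the model emits.

```python
# Illustrative server-side tool-call validation. The tool names and schemas
# below are hypothetical; register the ones your deployment actually exposes.
ALLOWED_TOOLS = {
    "swap_quote": {"direction": str, "amount": (int, float)},
}

def validate_tool_call(name: str, args: dict) -> bool:
    """Reject unknown tools, unexpected argument names, and bad types."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        return False  # unknown tool: reject
    if set(args) != set(schema):
        return False  # missing or extra arguments: reject
    return all(isinstance(args[k], t) for k, t in schema.items())

print(validate_tool_call(
    "swap_quote", {"direction": "btc_ln->usdt_sol", "amount": 250}
))  # True
```

A real deployment would typically use a JSON Schema validator instead of hand-rolled type checks, but the principle is the same: the model proposes, the host validates and executes.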

## Provenance

- Derived from: `google/functiongemma-270m-it`
- Integration target: IntercomSwap prompt-mode tool routing