---
base_model: google/functiongemma-270m-it
library_name: transformers
pipeline_tag: text-generation
license: gemma
tags:
- intercomswap
- function-calling
- tool-calling
- lightning
- solana
- gemma
---

# functiongemma-270m-it-intercomswap-v3

IntercomSwap fine-tuned FunctionGemma model for deterministic tool-calling in BTC Lightning <-> USDT Solana swap workflows.

## What Is IntercomSwap

IntercomSwap is a fork of upstream Intercom that keeps the Intercom stack intact and adds a non-custodial swap harness for BTC over Lightning <-> USDT on Solana via a shared escrow program, with deterministic operator tooling, recovery, and unattended end-to-end tests.

GitHub: https://github.com/TracSystems/intercom-swap

Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)

## Model Purpose

- Convert natural-language operator prompts into validated tool calls.
- Enforce buy/sell direction mapping for swap intents.
- Support repeat/autopost workflows used by IntercomSwap prompt routing.

## Repository Layout

- `./`:
  - merged HF checkpoint (Transformers format)
- `./nvfp4`:
  - NVFP4-quantized checkpoint for TensorRT-LLM serving
- `./gguf`:
  - `functiongemma-v3-f16.gguf`
  - `functiongemma-v3-q8_0.gguf`

## Startup By Flavor

### 1) Base HF checkpoint (Transformers)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model TracNetwork/functiongemma-270m-it-intercomswap-v3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --max-model-len 8192
```

Lower-memory mode example:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model TracNetwork/functiongemma-270m-it-intercomswap-v3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --max-model-len 4096 \
  --max-num-seqs 8
```

### 2) NVFP4 checkpoint (`./nvfp4`)

TensorRT-LLM example with explicit headroom (avoids consuming all VRAM):

```bash
trtllm-serve serve ./nvfp4 \
  --backend pytorch \
  --host 0.0.0.0 \
  --port 8012 \
  --max_batch_size 8 \
  --max_num_tokens 16384 \
  --kv_cache_free_gpu_memory_fraction 0.05
```

Memory tuning guidance:

- Decrease `--max_num_tokens` first.
- Then reduce `--max_batch_size`.
- Keep `--kv_cache_free_gpu_memory_fraction` around `0.05` to preserve safety headroom.

### 3) GGUF checkpoint (`./gguf`)

Q8_0 (recommended default balance):

```bash
llama-server \
  -m ./gguf/functiongemma-v3-q8_0.gguf \
  --host 0.0.0.0 \
  --port 8014 \
  --ctx-size 8192 \
  --batch-size 256 \
  --ubatch-size 64 \
  --gpu-layers 12
```

F16 (higher quality, higher memory):

```bash
llama-server \
  -m ./gguf/functiongemma-v3-f16.gguf \
  --host 0.0.0.0 \
  --port 8014 \
  --ctx-size 8192 \
  --batch-size 256 \
  --ubatch-size 64 \
  --gpu-layers 12
```

Memory tuning guidance:

- Lower `--gpu-layers` to reduce VRAM usage.
- Lower `--ctx-size` to reduce the RAM+VRAM KV-cache footprint.
- Use `q8_0` for general deployment and `f16` for quality-first offline tests.

## Training Snapshot

- Base family: FunctionGemma 270M instruction-tuned.
- Fine-tune objective: IntercomSwap tool-call routing and argument shaping.
- Corpus profile: operations, intent-routing, and tool-calling examples.

## Evaluation Snapshot

From held-out evaluation for this release line:

- Train examples: `6263`
- Eval examples: `755`
- Train loss: `0.01348`
- Eval loss: `0.02012`

## Intended Use

- Local or private deployments where tool execution is validated server-side.
- Deterministic operator workflows for swap infrastructure.

## Out-of-Scope Use

- Autonomous financial decision-making.
- Direct execution of unvalidated user text as shell commands or actions.
- Safety-critical usage without host-side policy/validation.
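## Example: Tool-Call Smoke Test

As a request-level smoke test against any of the servers above, you can send an OpenAI-compatible chat completion that carries a tool schema. This is a minimal sketch, assuming the vLLM endpoint from flavor 1 on `localhost:8000`; the `create_swap_order` tool and its fields are illustrative placeholders, not the actual IntercomSwap tool registry. Depending on the serving stack and version, emitting structured `tool_calls` may also require enabling a tool parser (for vLLM, see `--enable-auto-tool-choice` and `--tool-call-parser`).

```bash
# Smoke test (sketch): assumes the vLLM server from flavor 1 is running on
# localhost:8000. The tool schema below is a hypothetical placeholder;
# substitute the schemas your host actually validates server-side.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TracNetwork/functiongemma-270m-it-intercomswap-v3",
    "messages": [
      {"role": "user", "content": "Sell 0.01 BTC over Lightning for USDT on Solana"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "create_swap_order",
        "description": "Create a BTC Lightning <-> USDT Solana swap order",
        "parameters": {
          "type": "object",
          "properties": {
            "direction": {"type": "string", "enum": ["buy", "sell"]},
            "btc_amount": {"type": "number"}
          },
          "required": ["direction", "btc_amount"]
        }
      }
    }]
  }'
```

Inspect `choices[0].message.tool_calls` in the response; a reply that answers in prose instead of a tool call suggests the prompt or tool schema needs adjustment.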
## Safety Notes

- Always validate tool name + argument schema server-side (see the sketch below).
- Treat network-side payloads as untrusted input.
- Keep wallet secrets and API credentials outside model context.

## Provenance

- Derived from: `google/functiongemma-270m-it`
- Integration target: IntercomSwap prompt-mode tool routing
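## Example: Server-Side Tool-Call Validation

The Safety Notes above require validating the tool name and argument schema before anything executes. Below is a minimal sketch of such a gate using `jq`; the allowlisted names (`swap_quote`, `swap_order`), the argument shape, and `response.json` are hypothetical placeholders, standing in for a chat-completion response like the one returned by the smoke test earlier in this card.

```bash
# Hypothetical validation gate (sketch). Extracts the first tool call from
# an OpenAI-style chat response and rejects it unless the tool name is
# allowlisted and the arguments pass basic shape checks.
# "swap_quote" / "swap_order" are placeholders, not the real tool registry.
resp=$(cat response.json)  # chat-completion JSON saved from the server

name=$(jq -r '.choices[0].message.tool_calls[0].function.name' <<<"$resp")
args=$(jq -r '.choices[0].message.tool_calls[0].function.arguments' <<<"$resp")

# 1) Tool-name allowlist: unknown tools are rejected outright.
case "$name" in
  swap_quote|swap_order) ;;
  *) echo "reject: unknown tool '$name'" >&2; exit 1 ;;
esac

# 2) Argument shape check: direction must be buy/sell, btc_amount numeric.
jq -e '(.direction == "buy" or .direction == "sell")
       and (.btc_amount | type == "number")' <<<"$args" >/dev/null \
  || { echo "reject: bad arguments for $name" >&2; exit 1; }

echo "accept: $name $args"
```

In production this check belongs in the host process, with full JSON Schema validation rather than ad-hoc `jq` filters; the point is that nothing the model returns executes without passing a deterministic gate.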