SolarHive E4B — LoRA Adapters (Apply Over Base)

LoRA fine-tuned Gemma 4 E4B (8B): adapter weights only. Apply over the open-weights base model at runtime via Unsloth's FastVisionModel.from_pretrained(...). This is the smallest download in the E4B family (~200 MB) and the canonical input to the merge → safetensors → GGUF deployment chain.

To load via plain transformers with no PEFT or Unsloth dependency, use solarhive-e4b-ollama instead. That repo ships the pre-merged BF16 safetensors (~16 GB), so AutoModelForCausalLM.from_pretrained(...) works directly. For Ollama / llama.cpp edge deployment on a 16 GB CPU laptop, use solarhive-e4b-gguf. It ships the 5.34 GB Q4_K_M GGUF (standard Q6_K-PLE recipe) plus the 992 MB mmproj-BF16.gguf companion (vision + audio), with Modelfiles ready for ollama create, and it scored 10/10 on the single-pass SolarHive project-held-out 10-prompt parity check.

This repo is the source of truth. The merged safetensors and the GGUFs are derived from it. If you fine-tune further or replicate the recipe, this is the artifact you would update.

This repository serves three roles:

Application via Unsloth — load through FastVisionModel.from_pretrained("Truthseeker87/solarhive-e4b-lora") to apply the adapters over the base model in-memory, no merge step needed. Smallest download path.
Re-merge for deployment — feed into solarhive_merge_e4b.ipynb to produce the standalone BF16 safetensors at solarhive-e4b-ollama. The notebook resolves these adapters automatically from a local cache or, as fallback, from this HF repo.
Reference for further fine-tuning — extend the LoRA on additional data using Unsloth FastVisionModel. The adapter_config.json records the canonical training hyperparameters (r=16, α=16, all-linear, BF16).

Built for the Gemma 4 Good Hackathon (Google DeepMind x Kaggle).


Base Model	google/gemma-4-e4b-it
Architecture	Dense + PLE — 8B total, 4.5B effective
Fine-Tuning	LoRA via Unsloth (BF16)
Training Data	1,727 examples (solarhive-community-solar-multimodal) — 1,713 text + 14 image-grounded; text-only fine-tune; VQA at inference uses the base Gemma 4 vision encoder (~150M params), unmodified by our LoRA per the Vertex AI SFT recipe
Converged Loss	0.9218 (last-20-step rolling average)
Benchmark	10/10 (5/5 domain Q&A + 5/5 tool calling); May 2026 final run, the only 10/10 among the 5 variants run (see Benchmark Results below)
Training Time	420 seconds (~7 minutes)
Adapter Size	~200 MB (`adapter_model.safetensors`)
Compute	Google Colab Pro (G4 VM, RTX PRO 6000 Blackwell 102 GB)
License	MIT (adapters) / Gemma Terms (base model)

Model Overview

SolarHive E4B is the edge companion to SolarHive 26B A4B. The 26B model powers cloud inference with full multimodal VQA; the E4B family delivers the same domain expertise to laptop and edge hardware.

This repo holds the LoRA adapter weights only. The deployable artifacts — merged BF16 safetensors and Q4_K_M GGUF — are derived from these adapters and live at the solarhive-e4b-ollama and solarhive-e4b-gguf companion repos. To use the SolarHive E4B family on a 16 GB CPU laptop, go to the GGUF repo. To use it via plain transformers with no PEFT or Unsloth dependency, go to the merged-safetensors repo. This adapters-only repo is for two audiences: (a) developers who load via Unsloth's FastVisionModel to apply the delta in-memory (smallest download, ~200 MB) and (b) anyone extending the fine-tune with additional training data.

Privacy-first: Running Gemma 4 locally (whether via these adapters in transformers, the merged safetensors, or the Q4_K_M GGUF) keeps community energy data in the neighborhood. Model inference itself needs no cloud service and no internet connection; only the optional live-data tools call external APIs. A village in rural India, a suburb in Michigan, and a coastal town recovering from a hurricane all get the same intelligence.

Training Details

Parameter	Value
Method	LoRA via Unsloth `FastVisionModel` (BF16)
LoRA rank	16
LoRA alpha	16
LoRA dropout	0
Bias	none
Target modules	All linear layers (`target_modules="all-linear"`)
Vision tower	Frozen (`finetune_vision_layers=False`)
Language tower	Trained (`finetune_language_layers=True`)
Attention modules	Trained (`finetune_attention_modules=True`)
MLP modules	Trained (`finetune_mlp_modules=True`)
Gradient checkpointing	`unsloth` (memory-optimized)
Loss masking	`train_on_responses_only` (loss masked to assistant outputs only)
Learning rate	2e-4
LR scheduler	cosine
Optimizer	AdamW 8-bit
Weight decay	0.001
Warmup steps	5
Epochs	3
Max sequence length	2048
Per-device batch	4
Gradient accumulation	4 (effective batch size 16)
Precision	BF16 (auto-detected via `torch.cuda.is_bf16_supported()`)
Seed	3407
Trainable parameters	41,222,144 / 8,037,378,592 (0.51%)
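
As a quick consistency check on the table above, the arithmetic below reproduces the effective batch size, the 324 total steps reported under Training Loss, and the trainable-parameter fraction. It is a sketch that assumes the 1,713 text rows are the only rows the trainer sees and that a trailing partial accumulation window still counts as one optimizer step; the helper name `optimizer_steps` is hypothetical.

```python
import math

def optimizer_steps(num_rows, per_device_batch, grad_accum, epochs):
    """Count optimizer steps, assuming a trailing partial window still steps."""
    micro_batches = math.ceil(num_rows / per_device_batch)   # batches per epoch
    steps_per_epoch = math.ceil(micro_batches / grad_accum)  # updates per epoch
    return steps_per_epoch * epochs

effective_batch = 4 * 4                        # per-device batch x accumulation
total_steps = optimizer_steps(1713, 4, 4, 3)   # 1,713 text rows, 3 epochs
trainable_pct = 100 * 41_222_144 / 8_037_378_592

print(effective_batch, total_steps, f"{trainable_pct:.2f}%")  # 16 324 0.51%
```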

Training Loss

Metric	Value
Converged loss (last 20 steps)	0.9218
Final step loss	0.0635
Minimum loss	0.0635
HF Trainer running average (all steps)	1.7153
Total steps	324
Training time	420 seconds (~7 min)

Canonical metric: Converged loss (last 20 steps) is the only smoothed convergence indicator. The HF Trainer running average across all steps (1.7153) is pulled up by the high early-step values during warmup; the converged value is the load-bearing number for fine-tune quality. Final step and Minimum are single-batch point statistics: mini-batch loss is noisy step-to-step, so one easy batch can drop a point estimate well below the rolling-average trend.
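
To make the distinction between the two averages concrete, here is a minimal sketch of both statistics. The loss curve is synthetic and purely illustrative (it is not the real training log), and the function names are hypothetical.

```python
def last_window_mean(losses, window=20):
    """Mean of the final `window` logged losses (the converged-loss metric)."""
    tail = losses[-window:]
    return sum(tail) / len(tail)

def running_mean(losses):
    """Mean over every logged step, including the high warmup values."""
    return sum(losses) / len(losses)

# Synthetic curve (NOT the real log): fast decay from a high start to a plateau.
losses = [0.9 + 3.0 * (0.9 ** step) for step in range(324)]

print(f"last-20 mean: {last_window_mean(losses):.4f}")
print(f"running mean: {running_mean(losses):.4f}")
```

On any curve that starts high and flattens, the running mean sits above the last-window mean, which is why the two numbers in the table differ so much.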

Training Data

Same canonical training corpus as the 26B A4B model — solarhive-community-solar-multimodal, 1,727 rows = 1,713 text + 14 image-grounded:

413 hand-crafted examples spanning 15+ US cities and 9 energy domains
~1,117 API-grounded examples from live Open-Meteo, PVWatts, OWM, and EIA data
183 tool-calling examples following the When2Call taxonomy (106 should-call, 53 should-not-call, 10 unable-to-answer, 6 follow-up clarification, 8 failure-recovery)
14 image-grounded Q&A turns from 7 manually-labeled Ann Arbor sky photographs

Image rows are skipped at the data-prep layer — the training pipeline pre-renders only text rows for TRL's default text collator. See Fine-Tuning Architecture below for the rationale.
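
A minimal sketch of that data-prep filter follows. The row schema is an assumption made for illustration (a `messages` list plus an optional `image` field); the real corpus columns may be named differently, and the sample rows are invented.

```python
def is_text_only(row):
    """True when a row carries no image reference (hypothetical `image` field)."""
    return not row.get("image")

def select_text_rows(rows):
    """Keep only the rows a text-only collator can consume."""
    return [row for row in rows if is_text_only(row)]

# Tiny invented sample, not real corpus rows.
rows = [
    {"messages": [{"role": "user", "content": "When is peak pricing?"}]},
    {"messages": [{"role": "user", "content": "Describe this sky."}], "image": "sky_01.jpg"},
    {"messages": [{"role": "user", "content": "Explain temperature derating."}], "image": None},
]

text_rows = select_text_rows(rows)
print(len(text_rows), "of", len(rows), "rows kept")  # 2 of 3 rows kept
```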

Hardware

GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition (102 GB VRAM total, 94.97 GB max usable per Unsloth, 85 GB free at training start)
Platform: Google Colab Pro (G4 VM)

Benchmark Results — May 2026 Final Run

End-to-end inference run on Colab Pro G4. The E4B LoRA + base variant is loaded via Unsloth FastVisionModel.from_pretrained(...) from a local cache (16.9 GB VRAM utilization, ~3-5 minutes wall time), with the agentic loop and live API tools wired up exactly as in production.

The 10-question parity benchmark is split into 5 domain Q&A probes (no tool expected) and 5 tool-calling probes (specific expected tool, or no tool when the question is general):

Domain Q&A — 5/5

Probe	Question	Expected behavior	Result
Q1	"What happens to solar production when humidity exceeds 80%?"	Direct answer, no tool	✅
Q2	"At what battery SOC should we stop exporting to the grid?"	Direct answer, no tool	✅
Q3	"Home #3 has been underperforming by 22% for three weeks. What's the diagnostic checklist?"	Direct answer, no tool	✅
Q4	"It's winter in Ann Arbor and panels have snow. Prioritize actions."	Direct answer, no tool	✅
Q5	"Grid frequency dropped to 59.8 Hz. What does that mean for our microgrid?"	Direct answer, no tool	✅

Tool Calling — 5/5

Probe	Question	Expected	Called	Status
TQ1	"What's the current battery state?"	`get_battery_state`	`get_battery_state`	✅
TQ2	"What's the current weather in Ann Arbor and how does it affect solar production?"	`get_weather`	`get_weather`	✅
TQ3	"What are the general maintenance tips for panels?"	None (no tool)	None	✅
TQ4	"What's the grid pricing right now and what's the renewable mix?"	`get_grid_status`	`get_grid_status`	✅
TQ5	"Compare today's irradiance forecast across Ann Arbor, Phoenix, and Seattle."	≥2 of `{get_solar_production, get_weather}` (`min_calls=2`)	`get_weather × 3` (one per city)	✅

The multi-call probe (TQ5) is the load-bearing differentiator. This LoRA + base variant chained three get_weather calls (one per city), satisfying the min_calls=2 threshold. The other 4 variants run in the same notebook returned 0 or 1 call on TQ5 and scored 9/10. This is the only variant in the run that scored a perfect 10/10; see the cross-variant table below.
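
The pass rule for the tool-calling probes can be sketched as below. This is an illustrative reading of the table, not the project's harness: the function name and signature are hypothetical, and it assumes `min_calls` counts calls (not distinct tools) drawn from the allowed set, which is consistent with three `get_weather` calls passing TQ5.

```python
def tool_probe_passes(called, expected=None, allowed=None, min_calls=1):
    """Score one tool-calling probe.

    called:   list of tool names the model invoked, in order
    expected: a single required tool name, or None for a no-tool probe
    allowed:  set of acceptable tools for a multi-call probe, or None
    """
    if allowed is not None:
        hits = [name for name in called if name in allowed]
        return len(hits) >= min_calls
    if expected is None:
        return len(called) == 0   # no-tool probe: any call is a failure
    return expected in called

tq5_tools = {"get_solar_production", "get_weather"}
print(tool_probe_passes(["get_weather"] * 3, allowed=tq5_tools, min_calls=2))  # True
print(tool_probe_passes(["get_weather"], allowed=tq5_tools, min_calls=2))      # False
print(tool_probe_passes([]))                                                   # True
```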

Multi-Variant Deployment Validation

Cross-variant comparison from the May 2026 final inference run, all loaded sequentially on the same Colab Pro G4 GPU, same 10 prompts, same agentic-loop harness:

Variant	Repo	Q&A	Tool	Total	TQ5 behavior
E4B LoRA + base (this repo)	`solarhive-e4b-lora`	5/5	5/5	10/10	3 chained `get_weather` calls
E4B merged BF16	`solarhive-e4b-ollama`	5/5	4/5	9/10	1 call only
A4B LoRA + base	`solarhive-26b-a4b-lora`	5/5	4/5	9/10	no tool call
A4B merged BF16	`solarhive-26b-a4b-merged`	5/5	4/5	9/10	no tool call
A4B NF4	`solarhive-26b-a4b-nf4`	5/5	4/5	9/10	no tool call

This is a single-trial result; whether the multi-call advantage is reproducible at temperature=1.0 or stochastic deserves multi-trial follow-up. Either way, this LoRA path is the lightest deployment configuration (~200 MB adapters over the kagglehub-cached base) and produces strong generalization on the held-out tool-routing benchmark.
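
One way the multi-trial follow-up could be summarized is a percentile bootstrap over repeated runs of the same probe. The sketch below uses only the standard library; the outcome list is a placeholder for hypothetical trials, not measured data, and the function name is an assumption.

```python
import random

def bootstrap_pass_rate_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a pass rate over repeated trials.

    outcomes: list of 0/1 results, one per independent trial of the same probe.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, lo, hi

# Placeholder outcomes for 20 hypothetical TQ5 trials (NOT measured results).
trials = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
rate, lo, hi = bootstrap_pass_rate_ci(trials)
print(f"pass rate {rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A wide interval on a small number of trials would indicate that the single-run 10/10 versus 9/10 difference cannot yet be separated from sampling noise.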

When2Call Refusal / Follow-up Probes

Three held-out probes from Ross, H., Mahabaleshwarkar, A. S., & Suhara, Y. (2025). When2Call: When (not) to Call Tools. arXiv:2504.18851 cover 3 of the 4 failure-mode categories the paper documents (the paper measures 9–67% tool-hallucination rates on (c) and (d) in untrained community models):

Category	Question	Expected behavior
(b) Well-specified, in-scope	"What's the current grid rate?"	Call `get_grid_status`
(c) Under-specified	"How much will a 10 kW array produce today?"	Follow-up question (does NOT auto-fill location default)
(d) Out-of-scope	"What's the current air quality index in Ann Arbor?"	Refusal + redirect (does NOT hallucinate a tool)

Measurement scope. In the May 2026 inference run, the When2Call suite was directly measured on (i) the A4B LoRA + base variant — score 3/3 — and (ii) the E4B merged BF16 variant — score 2/3 (passes (b) + (c), fails (d) by calling get_weather instead of disclaiming). The When2Call suite was not re-run separately on this E4B LoRA + base variant.

Inferred score for this variant: 2/3, matching E4B merged. Merging folds the LoRA delta into the base weights: save_pretrained_merged("merged_16bit") produces standalone BF16 safetensors that encode the same fine-tuned function, up to BF16 rounding of the merged weights. Since the LoRA + base variant and the merged variant share the same fine-tuned weights at inference, their When2Call decision boundary is expected to match. We label this score inferred to distinguish it from the directly measured 3/3 on A4B LoRA and 2/3 on E4B merged. The inference is supported indirectly by the matching parity-benchmark Q&A profile (5/5 on both E4B variants), but the two variants diverged on TQ5 in the same single-trial run at temperature=1.0, so a direct measurement remains the stronger evidence.

Why E4B trails A4B by 1/3 on When2Call. A4B LoRA scored 3/3; E4B merged scored 2/3 (failed (d)). This regression at smaller parameter scale was the pre-stated hypothesis going in, per the official Google Gemma 4 Core docs "Parameter sizes and quantization" section: "Models with higher parameters and bit counts (higher precision) are generally more capable, but are more expensive to run." The gap between E4B (8B total / 4.5B effective / ~150M vision encoder) and the A4B variant (25.2B total / 3.8B active MoE / ~550M vision encoder) is a deliberate ~3× difference in total parameters, the dimension that drives reasoning-heavy refusal/follow-up behavior.

Quantitative reinforcement from Unsloth's published Gemma 4 benchmarks: E4B trails the 26B A4B variant by 13.2 pp on MMLU Pro (69.4% vs 82.6%), 21.2 pp on MMMU Pro (52.6% vs 73.8%), and 45.8 pp on AIME 2026 (42.5% vs 88.3%). These reasoning-benchmark gaps predict the (d) regression we observe.

SolarHive's deployment routes well-specified queries to E4B at the edge and escalates under-specified or out-of-scope queries to A4B in the cloud — architecture-aware tier selection per the documented scaling.
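
The routing rule can be sketched as a small function. How a query gets classified into a category is left open here; the enum, the tier labels, and the function name are all hypothetical and only illustrate the decision described above.

```python
from enum import Enum

class QueryKind(Enum):
    WELL_SPECIFIED = "well_specified"     # When2Call category (b)
    UNDER_SPECIFIED = "under_specified"   # When2Call category (c)
    OUT_OF_SCOPE = "out_of_scope"         # When2Call category (d)

def select_tier(kind):
    """Route well-specified queries to the edge model, everything else to cloud."""
    if kind is QueryKind.WELL_SPECIFIED:
        return "edge-e4b"    # hypothetical tier label
    return "cloud-a4b"       # hypothetical tier label

for kind in QueryKind:
    print(kind.value, "->", select_tier(kind))
```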

Core Capabilities

1. Multimodal Visual Question Answering (3 Modes)

Available when loaded over the base Gemma 4 E4B (vision encoder ~150M params, frozen during fine-tune):

Mode	Input	Output
Sky Analysis	Sky photograph	Cloud coverage %, production forecast, storage recommendation
Panel Inspection	Panel photograph	Dirt/damage/shading detection, efficiency impact estimate
Neighborhood Assessment	Aerial/satellite image	Panel inventory, expansion priorities, shading analysis

2. Native Function Calling (5 Tools — all 3 keyed APIs wired)

Tool	API	Returns
`get_weather(location)`	OpenWeatherMap (`OWM_API_KEY`)	Temperature, clouds %, wind, humidity, sunrise/sunset
`get_solar_production(clouds_pct, temp_f)`	Open-Meteo GHI (keyless)	Production kW, efficiency %, GHI W/m², temp derating
`get_battery_state()`	Community BMS (sim)	State of charge, capacity, charging status
`get_grid_status()`	EIA Open Data (`EIA_API_KEY`)	Pricing period, rate/kWh, renewable %, CO2 intensity
`get_nrel_pvwatts_baseline()`	NREL PVWatts v8 (`NREL_API_KEY`)	Annual + current-month typical kWh + avg kW for the 72 kW array

Tool results feed back as a 2-message sequence matching the training distribution: {"role": "assistant", "tool_calls": [...]} then {"role": "tool", "name": "<fn>", "content": json.dumps(result)} per call. This format is shared across the data-generation pipeline, the fine-tune SFT preprocessing, and the inference agentic loop — inference matches the training distribution exactly.
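
A minimal sketch of building that 2-message sequence for one call is shown below. The layout of each `tool_calls` entry is an assumption (the text above only shows `[...]`), the helper name is hypothetical, and the result payload is a placeholder rather than real BMS output.

```python
import json

def tool_turn_messages(fn_name, arguments, result):
    """Build the assistant tool-call message and the matching tool-result message."""
    assistant_msg = {
        "role": "assistant",
        # Assumed entry layout; the source only shows "tool_calls": [...].
        "tool_calls": [
            {"type": "function", "function": {"name": fn_name, "arguments": arguments}}
        ],
    }
    tool_msg = {"role": "tool", "name": fn_name, "content": json.dumps(result)}
    return [assistant_msg, tool_msg]

messages = [{"role": "user", "content": "What's the current battery state?"}]
# Placeholder result payload, not real battery data.
messages += tool_turn_messages("get_battery_state", {}, {"soc_pct": 50})
print(json.dumps(messages, indent=2))
```

Note that the assistant message carries no `content` key, which is the case the two-step tokenization note under Technical Notes is written for.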

3. Selective Tool Reasoning

The model decides when to call tools — it does not blindly invoke all of them:

"What time does peak pricing start?"
→ Calls: get_grid_status() only

"Is today's production above typical for January?"
→ Calls: get_solar_production() + get_nrel_pvwatts_baseline()

"Should I run my pool heater now?"
→ Calls: get_weather() + get_solar_production() + get_battery_state() + get_grid_status()

"What are general maintenance tips for panels?"
→ Calls: none (answers from training knowledge)

4. Refusal and Follow-up Behavior

The training corpus includes 16 explicit refusal/follow-up examples (10 unable-to-answer + 6 follow-up clarification) following the When2Call taxonomy. See the When2Call section above for measured behavior.

How to Use

Apply via Unsloth (recommended — smallest download path)

from unsloth import FastVisionModel
import torch

model, processor = FastVisionModel.from_pretrained(
    "Truthseeker87/solarhive-e4b-lora",  # This repo (LoRA adapters, ~200 MB)
    dtype=torch.bfloat16,
    load_in_4bit=False,
)
FastVisionModel.for_inference(model)

Unsloth resolves the base model from adapter_config.json's base_model_name_or_path field and applies the LoRA in-memory. No separate merge step; no PEFT dependency complications around Gemma4ClippableLinear.
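
To see which base model and LoRA settings an adapter directory points at before loading it, the config file can be read directly. The sketch below writes a stand-in adapter_config.json to a temporary directory so it runs on its own; the field values mirror the tables in this card, and the exact base-model string recorded in the real file is an assumption.

```python
import json
import tempfile
from pathlib import Path

def read_adapter_summary(config_path):
    """Return the base-model pointer and core LoRA settings from adapter_config.json."""
    config = json.loads(Path(config_path).read_text())
    return {
        "base_model": config.get("base_model_name_or_path"),
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }

# Stand-in file for illustration; the real file may record a different base path.
sample = {
    "base_model_name_or_path": "google/gemma-4-e4b-it",
    "r": 16,
    "lora_alpha": 16,
    "target_modules": "all-linear",
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "adapter_config.json"
    path.write_text(json.dumps(sample))
    summary = read_adapter_summary(path)

print(summary)
```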

Re-merge to Standalone Safetensors

For deployments that want plain transformers.AutoModelForCausalLM.from_pretrained() without an Unsloth dependency, re-merge via the solarhive_merge_e4b.ipynb notebook:

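# Inside the merge notebook (excerpt); model and processor come from the earlier load cell
from huggingface_hub import HfApi

api = HfApi()  # assumed setup: the excerpt does not show how `api` is created

# Fold the LoRA delta into the base weights and write standalone BF16 safetensors
model.save_pretrained_merged("/content/e4b_merged", processor, save_method="merged_16bit")
# Upload the merged folder, deleting stale weight shards from the target repo
api.upload_folder(folder_path="/content/e4b_merged",
                  repo_id="Truthseeker87/solarhive-e4b-ollama",
                  repo_type="model",
                  delete_patterns=["*.safetensors", "*.safetensors.index.json", "model-*"])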

The pre-merged result lives at solarhive-e4b-ollama and is the upstream input for GGUF conversion.

Edge Deployment — use the GGUF repo

For Ollama or llama.cpp on a 16 GB CPU laptop, download the GGUF artifacts from solarhive-e4b-gguf:

hf download Truthseeker87/solarhive-e4b-gguf \
  solarhive-e4b-q4_k_m.gguf Modelfile \
  --local-dir ./solarhive-gguf
cd ./solarhive-gguf
ollama create solarhive -f Modelfile
ollama run solarhive "What's the best time to run my dishwasher today?"

The GGUF repo also includes the Standard Q4_K_M variant (Colab-produced, Q6_K PLE) and a 992 MB mmproj-BF16.gguf for full multimodal via llama-server --mmproj.

Further Fine-Tuning

The adapters can be extended on additional data using Unsloth's FastVisionModel:

from unsloth import FastVisionModel
import torch  # needed for torch.bfloat16 below

model, processor = FastVisionModel.from_pretrained(
    "Truthseeker87/solarhive-e4b-lora", dtype=torch.bfloat16, load_in_4bit=False,
)
FastVisionModel.for_training(model)  # switch the loaded adapters into training mode
# ... configure SFTTrainer with your additional dataset ...

The adapter_config.json records the original hyperparameters (r=16, α=16, all-linear) — extending the LoRA preserves the SolarHive domain expertise while adding new behaviors.

Community Model

Parameter	Value
Location	Ann Arbor, Michigan (42.2808°N, 83.7430°W)
Community size	12 homes
Total panel capacity	72 kW
Shared battery storage	100 kWh
Grid region	MISO (Midcontinent Independent System Operator)

Technical Notes

LoRA adapters only. This repo ships ~200 MB of adapter weights. The base Gemma 4 E4B is ~16 GB and must be downloaded separately (Unsloth resolves it automatically from adapter_config.json).
Vision tower frozen during fine-tune. finetune_vision_layers=False — VQA at inference uses the base model's pretrained vision encoder unmodified, matching the Vertex AI SFT recipe which freezes both vision and audio towers during text-focused fine-tuning.
Loss masked to assistant outputs. train_on_responses_only ensures the model only learns to generate assistant-side content, not user prompts or system messages.
Two-step tokenization at inference. Single-step tokenize=True crashes in transformers 5.5.x on messages without a content key (e.g., tool_calls messages). Always render text first (tokenize=False) then tokenize separately.
Sampling. temperature=1.0, top_p=0.95, top_k=64 (Kaggle-recommended Gemma 4 defaults).
Chat template. gemma-4 (per Unsloth Tip #1 for E2B/E4B). The gemma-4-thinking template is reserved for 26B/31B reasoning-class variants. The simpler template is more robust across downstream Ollama / llama.cpp runtimes that don't expose enable_thinking=False at the runtime layer.
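
The two-step tokenization note above can be sketched as a small helper. It assumes `processor` is the object returned by FastVisionModel.from_pretrained and that it follows the usual Hugging Face processor interface (`apply_chat_template` plus a callable that accepts `text=`); the helper name is hypothetical.

```python
def render_then_tokenize(processor, messages):
    """Render the chat template to text first, then tokenize in a second call."""
    text = processor.apply_chat_template(
        messages,
        tokenize=False,              # step 1: plain string, no tensors yet
        add_generation_prompt=True,
    )
    return processor(text=text, return_tensors="pt")  # step 2: tokenize the string
```

In use, the returned batch would typically be moved to the model's device and passed to `model.generate(...)` with the sampling settings listed above.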

Limitations

Prototype scope. Tested on a single community model (12 homes, Ann Arbor, MI). Real-world deployment requires validation across diverse geographies and community sizes.
Smaller model, weaker refusal/follow-up. When2Call (d) regression vs the A4B LoRA baseline (2/3 vs 3/3 — see Benchmark Results above). Route under-specified or out-of-scope queries to the A4B cloud variant for correct refusal + follow-up behavior.
Occasional capacity hallucination. The base model's prior occasionally surfaces "60 kW" instead of the correct 72 kW community capacity in direct (no-tool) responses. The tool-calling path (which queries actual capacity from get_nrel_pvwatts_baseline) avoids this.
External API dependence. Tool responses depend on Open-Meteo, OWM, EIA, and PVWatts availability. OWM free tier allows 1,000 calls/day; Open-Meteo allows 10K/day; EIA and PVWatts have project-specific limits.
Battery state is simulated. get_battery_state() is a deterministic in-memory simulator for demonstrations — real deployment requires integration with actual battery management systems.
Single-trial multi-variant validation. The 10/10 vs 9/10 multi-call advantage on TQ5 was measured in one inference run; a multi-trial bootstrap would strengthen the claim against temperature-1.0 stochasticity.
When2Call score for this variant is inferred, not directly measured. See the Benchmark Results section for the full reasoning.

Future Iteration — Multi-Token Prediction (MTP) Drafters

Not in the measured numbers above. Google announced Gemma 4 MTP drafters on May 5, 2026 (blog, overview, HF collection, Kaggle, @GoogleGemma) — after this artifact's final benchmark was captured. The benchmarks above reflect standard autoregressive decoding only. MTP integration is documented here as future iteration; no measured speedup is claimed in this release.

Theoretical foundation. Speculative decoding (Leviathan, Kalman & Matias, Fast Inference from Transformers via Speculative Decoding, ICML 2023, arXiv:2211.17192) accelerates generation without changing the output distribution, under sampling as well as greedy (argmax) decoding: a smaller drafter proposes γ candidate tokens, the target verifies all γ in a single parallel forward pass, accepted tokens are kept, and the first rejected token is resampled from a corrected distribution. The output distribution is preserved exactly regardless of drafter quality; only the acceptance rate α, and therefore wall-time speedup, varies.
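
The accept/reject rule can be illustrated on a toy three-token vocabulary. This is a sketch of the single-token verification step from the cited paper, not of any Gemma 4 runtime: the distributions are invented, the function names are hypothetical, and α here means acceptance rate (not the LoRA alpha from the training table).

```python
import random

def speculative_accept(p, q, draft_token, rng):
    """One verification step of speculative sampling.

    p: target distribution (probabilities over the vocabulary)
    q: drafter distribution over the same vocabulary
    draft_token: index the drafter sampled from q
    Returns the emitted token, which is distributed according to p.
    """
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token
    # Rejected: resample from the corrected distribution norm(max(0, p - q)).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual, k=1)[0]

def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens emitted per target pass: (1 - alpha^(gamma+1)) / (1 - alpha)."""
    if alpha == 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

rng = random.Random(0)
p = [0.6, 0.3, 0.1]   # toy target distribution
q = [0.2, 0.5, 0.3]   # toy drafter distribution (deliberately poor)
n = 20000
counts = [0, 0, 0]
for _ in range(n):
    draft = rng.choices(range(3), weights=q, k=1)[0]
    counts[speculative_accept(p, q, draft, rng)] += 1

freqs = [c / n for c in counts]
print([round(f, 2) for f in freqs])                   # should sit close to p
print(expected_tokens_per_pass(alpha=0.8, gamma=4))
```

Even with a drafter that disagrees strongly with the target, the empirical frequencies track p; a poor drafter only lowers the acceptance rate and therefore the expected tokens per pass.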

What Google released on May 5, 2026. Paired drafter checkpoints for all four IT-tuned Gemma 4 variants (gemma-4-E2B-it-assistant, gemma-4-E4B-it-assistant, gemma-4-26B-A4B-it-assistant, gemma-4-31B-it-assistant), discoverable via the google/gemma-4 Hugging Face collection and on Kaggle Models. The drafters share the input embedding table with their paired target and consume the target's last-layer activations (architecture per the MTP overview). For the E4B target family the paired drafter is google/gemma-4-E4B-it-assistant (78.8M params). Google reports up to 3× decode speedup with no quality degradation on the headline 26B-A4B configuration and 2.2× on Apple Silicon at batch sizes 4–8; per-variant E4B numbers were not enumerated in the announcement, and E4B is smaller and dense, so the speedup curve will differ. Tested runtimes named in the blog: LiteRT-LM, MLX, Hugging Face Transformers, vLLM, SGLang, Ollama.

Integration via Hugging Face Transformers + Unsloth (the load path this repo documents):

from unsloth import FastVisionModel
from transformers import AutoModelForCausalLM
import torch  # needed for torch.bfloat16 below

# "..." marks loader arguments elided in this sketch; `inputs` is the tokenized prompt.
target, processor = FastVisionModel.from_pretrained("Truthseeker87/solarhive-e4b-lora", ...)
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it-assistant", dtype=torch.bfloat16, ...)
target.generate(**inputs, assistant_model=assistant)  # MTP enabled

Integration caveat to verify on first run. Unsloth's FastVisionModel-wrapped target may need target.base_model.generate(...) rather than target.generate(...) to expose the Hugging Face Transformers assistant_model= kwarg cleanly. The exact surface is unvetted against the May 5 release; this is the integration risk to validate before propagating any measurement.

Open question specific to this LoRA-adapter target. Per the 2023 speculative-sampling guarantee, correctness is invariant to drafter quality — the target's verification step preserves the exact output distribution regardless of what the drafter proposes. What varies is acceptance rate α, since Google's released drafter was trained against the base gemma-4-E4B-it, not against this LoRA-adapter-on-top target. Measured α against a domain-specialized E4B target is the planned post-hackathon contribution at the edge tier; a cloud-tier measurement against the A4B merged target is captured by the gated future-iteration cell in solarhive_inference.py §14.

Companion Repositories

Model	Repository	Purpose
SolarHive 26B A4B LoRA	solarhive-26b-a4b-lora	Cloud inference with full multimodal + function calling (LoRA adapters)
SolarHive 26B A4B Merged	solarhive-26b-a4b-merged	Full BF16 cloud model (~48 GB) — production inference, no PEFT/Unsloth dep
SolarHive 26B A4B NF4	solarhive-26b-a4b-nf4	Pre-quantized 4-bit cloud model for HF Spaces / 24 GB+ GPUs
SolarHive E4B LoRA	This repo	E4B adapter weights (~200 MB) — apply over base via Unsloth
SolarHive E4B Safetensors	solarhive-e4b-ollama	Source safetensors for transformers research / GGUF conversion via llama.cpp
SolarHive E4B GGUF	solarhive-e4b-gguf	Edge deployment — Q4_K_M GGUF + mmproj for Ollama / llama.cpp on 16 GB CPU laptop. 10/10 project-held-out check.
SolarHive Dataset	solarhive-community-solar-multimodal	1,727 training examples (1,713 text + 14 image-grounded)
LiteRT-LM Python edge runtime	`solarhive_e4b_litert_v3.1.ipynb`	LiteRT Special Tech Track entry — runs upstream base `litert-community/gemma-4-E4B-it-litert-lm` `.litertlm` (3.66 GB) + SolarHive UX layer + on-device agentic loop with native Gemma 4 function calling. Q&A 8/8 on Colab Pro CPU + High-RAM. Fine-tuned LiteRT-LM bundle is a planned next iteration once upstream `gemma4` example module lands in `ai_edge_torch.generative.examples/`.
GitHub	the-gemma4-good-hackathon-solarhive	Full source code, training & quantization notebooks, data principles

Fine-Tuning Architecture — Text-Only on the Multimodal-Capable Corpus

The shipped fine-tune is text-only on the canonical solarhive-community-solar-multimodal corpus (1,727 rows = 1,713 text + 14 image-grounded). Image rows are skipped at the data-prep layer; the training pipeline pre-renders only text rows for TRL's default text collator. Multimodal fine-tuning is deferred post-hackathon — a real image corpus and a held-out VQA benchmark would be prerequisites; the dataset's image schema is preserved so a future multimodal fine-tune can re-enable image rows without changing the corpus.

VQA at inference time uses the base Gemma 4 E4B model's pretrained vision encoder (~150M params per the official model card). Our LoRA targets only the language-model linear layers (target_modules="all-linear" with finetune_vision_layers=False); the vision tower is not modified. This matches the Vertex AI Gemma 4 SFT recipe documented in the Hugging Face blog, which explicitly freezes both vision and audio towers during text-focused fine-tuning.

Companion 26B A4B LoRA is published at Truthseeker87/solarhive-26b-a4b-lora.

The dataset uses the project archive for its 14 image-grounded Q&A turns (7 Ann Arbor sky photos × 2 turns). Image-source planning pivoted twice: the SWIM corpora (NUS) were rejected for CC BY-NC licensing, and NREL SRRL was rejected because the legacy MIDC SkyCam image archive ended May 2017 (modern ASI-16 only exposes derived measurements). The shipped dataset uses the project archive only — fewer images, but every label is human-confirmed and every paired Q&A traces back to the same GHI / temperature-derating formula used elsewhere in the dataset.

The fine-tune notebook has been pre-aligned with the official Unsloth Gemma 4 documentation (train guide, bug fixes & tips): explicit loader arguments (max_seq_length, dtype, full_finetuning=False), explicit SFTConfig arguments (weight_decay, lr_scheduler_type, max_grad_norm), and chat_template="gemma-4" per Tip #1 (the simpler template is recommended for E2B/E4B; gemma-4-thinking is reserved for 26B/31B reasoning-class variants). The change makes the embedded chat template more robust across downstream Ollama / llama.cpp runtimes that don't expose enable_thinking=False at the runtime layer.

Citation

@misc{solarhive2026,
  title={SolarHive: AI-Powered Community Solar Energy Intelligence},
  author={Youshen Lim},
  year={2026},
  url={https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive},
  note={Gemma 4 Good Hackathon submission — Google DeepMind x Kaggle}
}

Papers for Truthseeker87/solarhive-e4b-lora

When2Call: When (not) to Call Tools. arXiv:2504.18851, published Apr 26, 2025.
Fast Inference from Transformers via Speculative Decoding. arXiv:2211.17192, published Nov 30, 2022.

