SolarHive

SolarHive E4B — BF16 Merged Safetensors

LoRA fine-tuned Gemma 4 E4B (8B), merged to 16-bit safetensors. Source artifact for direct transformers inference and llama.cpp convert_hf_to_gguf.py → Q4_K_M GGUF conversion (which powers Ollama + llama.cpp edge deployment via the solarhive-e4b-gguf companion repo).

For Ollama or llama.cpp edge deployment on a 16 GB CPU laptop, use the solarhive-e4b-gguf repo instead — it ships the Q4_K_M GGUF text variants (4.6 GB / 5.3 GB) plus the 992 MB mmproj-BF16.gguf companion (vision + audio), with Modelfiles ready for ollama create and a 10/10 score on the SolarHive 10-prompt parity benchmark.

The previously documented Ollama --experimental path runs out of memory during ollama create on machines with ≤16 GB RAM (the 16 GB BF16 safetensors blob does not fit in memory during ingestion). On hardware with ≥24 GB RAM the experimental import works, but the GGUF pipeline (built with llama.cpp convert_hf_to_gguf.py + llama-quantize) remains the recommended edge deployment path for everyone else.

This repository now serves three roles:

  1. Source for GGUF conversion via llama.cpp's convert_hf_to_gguf.py (text tower) and convert_hf_to_gguf.py --mmproj (vision + audio projector). See solarhive-e4b-gguf for the produced GGUF artifacts.
  2. Transformers-native multimodal use — load with AutoModelForCausalLM for full image + audio + text in Python (requires ≥24 GB RAM or A100-class GPU).
  3. Reference for further fine-tuning — extend the LoRA on additional data using Unsloth FastVisionModel.

Built for the Gemma 4 Good Hackathon (Google DeepMind x Kaggle).

| Property | Value |
| --- | --- |
| Base Model | google/gemma-4-e4b-it |
| Architecture | Dense + PLE: 8B total, 4.5B effective |
| Fine-Tuning | LoRA via Unsloth (BF16) |
| Training Data | 1,727 examples (solarhive-community-solar-multimodal); text-only fine-tune; VQA at inference uses the base Gemma 4 vision encoder (~150M params), unmodified by our LoRA per the Vertex AI SFT recipe |
| Converged Loss | 0.9218 |
| Benchmark | 9/10 (5/5 domain Q&A + 4/5 tool calling); May 2026 final run, multi-call regression on TQ5 (see Multi-Variant Deployment Validation below) |
| Training Time | 420 seconds (~7 minutes) |
| Compute | Google Colab Pro |
| License | MIT (adapters) / Gemma Terms (base model) |

Model Overview

SolarHive E4B is the edge companion to SolarHive 26B A4B. While the 26B model powers cloud inference with full multimodal VQA, the E4B model is optimized for local deployment via Ollama on consumer hardware.

Privacy-first: Running Gemma 4 locally means community energy data never leaves the neighborhood. No cloud dependency, no internet requirement, no data privacy concerns. A village in rural India, a suburb in Michigan, and a coastal town recovering from a hurricane all get the same intelligence.

This repository contains the fully merged model (base + LoRA baked together) — no separate base model download needed.


Training Details

| Parameter | Value |
| --- | --- |
| Method | LoRA via Unsloth FastVisionModel (BF16, RTX PRO 6000 96 GB) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | All linear layers |
| Learning rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Warmup steps | 5 |
| Epochs | 3 |
| Max sequence length | 2048 |
| Precision | BF16 |
| Seed | 3407 |
| Trainable parameters | 41.2M / 8.0B (0.51%) |
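As a quick sanity check on the table above, the sketch below records the hyperparameters in a plain dataclass and derives two quantities: the standard LoRA scaling factor alpha / rank and the trainable-parameter share. The class name LoraRunConfig and its field names are illustrative only; they are not taken from the training notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRunConfig:
    """Hyperparameters copied from the table above (names are illustrative)."""
    rank: int = 16
    alpha: int = 16
    dropout: float = 0.0
    learning_rate: float = 2e-4
    warmup_steps: int = 5
    epochs: int = 3
    max_seq_length: int = 2048
    seed: int = 3407
    trainable_params: float = 41.2e6
    total_params: float = 8.0e9

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.alpha / self.rank

    @property
    def trainable_fraction(self) -> float:
        return self.trainable_params / self.total_params


cfg = LoraRunConfig()
print(f"LoRA scaling: {cfg.scaling:.2f}")
print(f"Trainable share: {cfg.trainable_fraction:.3%}")
```

With rank and alpha both at 16 the update is applied at scale 1.0. The share computed from the rounded counts is about 0.515%, in line with the 0.51% reported in the table.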

Training Loss

| Metric | Value |
| --- | --- |
| Converged loss (last 20 steps) | 0.9218 |
| Final step loss | 0.0635 |
| Minimum loss | 0.0635 |
| Total steps | 324 |
| Training time | 420 seconds |

Canonical metric: Converged loss (last 20 steps) is the only smoothed convergence indicator. Final step and Minimum are single-batch point statistics; mini-batch loss is noisy from step to step, so one easy batch can drop a point estimate well below the rolling-average trend.
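The difference between the smoothed and point statistics can be illustrated with a small sketch. It assumes the converged value is the arithmetic mean of the last 20 logged step losses, which the card does not spell out; the loss values below are synthetic, not the run's actual log.

```python
from statistics import mean


def summarize_losses(step_losses, window=20):
    """Return smoothed and point loss statistics for a list of per-step losses."""
    if not step_losses:
        raise ValueError("step_losses must not be empty")
    tail = step_losses[-window:]
    return {
        "converged": mean(tail),      # mean over the last `window` steps
        "final": step_losses[-1],     # single-batch point value
        "minimum": min(step_losses),  # single-batch point value
    }


# Synthetic example: a plateau near 0.9 followed by one unusually easy batch.
losses = [1.0, 0.8] * 10 + [0.9] * 19 + [0.1]
stats = summarize_losses(losses)
print(stats)
```

In this toy log the final and minimum values are both 0.1 while the 20-step mean stays at 0.86, the same shape as the 0.0635 vs 0.9218 split reported above.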

Training Data

Same canonical training corpus as the 26B A4B model — solarhive-community-solar-multimodal, 1,727 rows:

  • 413 hand-crafted examples spanning 15+ US cities and 9 energy domains
  • ~1,117 API-grounded examples from live Open-Meteo, PVWatts, OWM, and EIA data
  • 183 tool-calling examples (positive, negative refusals, follow-up clarifications, failure-recovery)
  • 14 image-grounded Q&A turns from 7 manually-labeled Ann Arbor sky photographs

Hardware

  • GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition (96 GB GDDR7 total, 94.97 GB max usable per Unsloth)
  • Platform: Google Colab Pro (G4 VM)

Benchmark Results

Domain Q&A (5/5)

| Question | Result |
| --- | --- |
| Solar production when humidity exceeds 80%? | Correct |
| Battery SOC threshold for grid export? | Correct |
| Home #3 underperforming 22%: diagnostic checklist? | Correct |
| Winter snow on panels: prioritize actions? | Correct |
| Grid frequency 59.8 Hz: microgrid implications? | Correct |

Note on benchmark history: the 5/5 Q&A above is from the initial 8-question validation harness used during fine-tune development. The canonical headline number is the May 2026 final-run multi-variant validation (10-question parity benchmark) — see below.

Tool inventory + inference-time When2Call validation

solarhive_inference.py exposes 5 tools to the model — all three keyed APIs (OWM_API_KEY, EIA_API_KEY, NREL_API_KEY) actively wired:

| Tool | API | Returns |
| --- | --- | --- |
| get_weather(location) | OpenWeatherMap (OWM_API_KEY) | Temperature, clouds %, wind, humidity, sunrise/sunset |
| get_solar_production(clouds_pct, temp_f) | Open-Meteo GHI (keyless) | Production kW, efficiency %, GHI W/m², temp derating |
| get_battery_state() | Community BMS (sim) | State of charge, capacity, charging status |
| get_grid_status() | EIA Open Data (EIA_API_KEY) | Pricing period, rate/kWh, renewable %, CO2 intensity |
| get_nrel_pvwatts_baseline() | NREL PVWatts v8 (NREL_API_KEY) | Annual + current-month typical kWh + avg kW for the 72 kW array |

Tool results feed back as a 2-message sequence matching the training distribution: {"role": "assistant", "tool_calls": [...]} then {"role": "tool", "name": "<fn>", "content": json.dumps(result)} per call. Shared across the data-generation pipeline, the fine-tune SFT preprocessing layer, and the inference agentic loop — inference matches training distribution exactly.
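The message sequence described above can be sketched as a small helper. This is a minimal illustration, not the code in solarhive_inference.py: the helper name tool_round_messages is hypothetical, and the layout of each tool_calls entry is an assumption based on the common chat-template convention, since the card only shows "tool_calls": [...].

```python
import json


def tool_round_messages(calls, results):
    """Build the assistant tool_calls message plus one tool message per call.

    `calls` is a list of (function_name, arguments_dict) pairs and `results`
    is a list of JSON-serializable tool outputs in the same order.
    """
    if len(calls) != len(results):
        raise ValueError("each tool call needs exactly one result")
    assistant = {
        "role": "assistant",
        # ASSUMPTION: per-call entry layout; the card does not specify it.
        "tool_calls": [
            {"type": "function", "function": {"name": name, "arguments": args}}
            for name, args in calls
        ],
    }
    tool_msgs = [
        {"role": "tool", "name": name, "content": json.dumps(result)}
        for (name, _), result in zip(calls, results)
    ]
    return [assistant] + tool_msgs


messages = tool_round_messages(
    [("get_grid_status", {})],
    [{"pricing_period": "off-peak", "rate_per_kwh": 0.12}],  # made-up values
)
print(json.dumps(messages, indent=2))
```

Note that the assistant message carries no content key, which is the case the two-step tokenization note under Technical Notes is about.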

When2Call probes. Three held-out probes validate 3 of the 4 failure-mode categories from Ross, H., Mahabaleshwarkar, A. S., & Suhara, Y. (2025). When2Call: When (not) to Call Tools. arXiv:2504.18851 — the paper documents 9–67% tool-hallucination rates on (c)+(d) in untrained community models:

  • (b) "What's the current grid rate?" → expect get_grid_status call (well-specified, in-scope)
  • (c) "How much will a 10 kW array produce today?" → expect follow-up question (does NOT auto-fill location default)
  • (d) "What's the current air quality index in Ann Arbor?" → expect refusal + redirect (does NOT hallucinate a tool)

Models trained without explicit unable-to-answer and follow-up clarification examples typically fail (c) + (d). The SolarHive corpus includes 16 such examples (10 unable-to-answer + 6 follow-up clarification) following the When2Call taxonomy.

Multi-Variant Deployment Validation (Final Run, May 2026) — E4B regression on When2Call (c)

End-to-end inference run on Colab Pro G4. This E4B BF16 merged variant loaded from a local cache (16.9 GB VRAM utilization, ~10 min runtime).

Parity benchmark: 5/5 Q&A + 4/5 tool = 9/10 on the 10-question set — matches the A4B family on the 9 deterministic questions; the single FAIL is the lenient multi-call probe (TQ5 — "Compare today's irradiance forecast across Ann Arbor, Phoenix, and Seattle", min_calls=2) where this variant emitted only 1 get_weather call. Notably, the E4B LoRA + base variant (same fine-tune, applied via Unsloth instead of merged) DOES chain 3 calls on the same probe and scores 10/10 — pattern reproducible across runs.
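The lenient multi-call criterion on TQ5 amounts to a count threshold. The sketch below shows that check under stated assumptions: the function name is hypothetical, and the three calls attributed to the LoRA + base variant are assumed to be get_weather calls, which the card does not state.

```python
def passes_multi_call(tool_calls, min_calls=2):
    """Lenient multi-call check: pass when at least `min_calls` tool calls were emitted."""
    return len(tool_calls) >= min_calls


# Behavior reported for this merged variant on TQ5: a single get_weather call.
merged_run = ["get_weather"]
# LoRA + base variant chains 3 calls (tool names assumed for illustration).
lora_run = ["get_weather", "get_weather", "get_weather"]

print("merged:", passes_multi_call(merged_run))
print("lora + base:", passes_multi_call(lora_run))
```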

When2Call probes — measured 2/3 (final run May 2026):

| Probe | E4B merged behavior | Score |
| --- | --- | --- |
| (b) "current grid rate?" | ✅ Correctly calls get_grid_status | PASS |
| (c) "How much will a 10 kW array produce today?" | ❌ Auto-fills location and calls get_solar_production instead of asking back | FAIL |
| (d) "current AQI in Ann Arbor?" | ✅ Genuinely disclaims (no fabrication, no tool call) | PASS |
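The 2/3 tally can be reproduced with a small scoring sketch. It assumes each response has already been labeled with one of three behavior categories ("tool_call", "follow_up", "refusal"); the labeling step itself, and the ProbeResult type, are hypothetical and not part of the published harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    probe: str
    expected: str  # "tool_call", "follow_up", or "refusal"
    observed: str

    @property
    def passed(self) -> bool:
        return self.expected == self.observed


# Observed behaviors as reported for the E4B merged variant.
results = [
    ProbeResult("(b) current grid rate", "tool_call", "tool_call"),
    ProbeResult("(c) 10 kW array production today", "follow_up", "tool_call"),
    ProbeResult("(d) AQI in Ann Arbor", "refusal", "refusal"),
]

score = sum(r.passed for r in results)
print(f"When2Call: {score}/{len(results)}")  # prints "When2Call: 2/3"
```

A fuller check for (b) would also compare the called tool name against get_grid_status; that is omitted here to keep the sketch short.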

Cross-variant pattern: the E4B LoRA + base variant is inferred to score 2/3 by mathematical lossless equivalence with this merged variant (the merge step is lossless on weights, so the When2Call decision boundary is identical). The −1/3 W2C delta vs the A4B family (3/3 directly measured on A4B LoRA, inferred-lossless on A4B merged + NF4) is the empirical signature of size-vs-refusal scaling.

Honest finding — size-vs-refusal scaling is real, and was the pre-stated hypothesis. This E4B fine-tune regresses on (c) compared to the A4B LoRA baseline (which scores 3/3). The smaller model with less reasoning depth more readily auto-fills missing parameters when it should ask back — exactly the failure mode Ross et al. 2025 document at 9-67% rates in untrained community models. The fine-tune closes (b)+(d) at this size but doesn't fully close (c).

This was the expected outcome going in, per the official Google Gemma 4 Core docs "Parameter sizes and quantization": "Models with higher parameters and bit counts (higher precision) are generally more capable, but are more expensive to run." E4B's 8B total / 4.5B effective parameters / ~150M vision encoder vs A4B's 25.2B total / 3.8B active (MoE) / ~550M vision encoder reflect a deliberate ~3× capacity gap on the dimension that drives reasoning-heavy refusal/follow-up behavior. The validation confirms the documented scaling — not a defect, but architecture-aware deployment design.

Quantitative reinforcement from Unsloth's published Gemma 4 benchmarks: E4B scores 69.4% on MMLU Pro (vs 26B A4B's 82.6%, a 13.2 pp gap), 52.6% on MMMU Pro (vs 73.8%, a 21.2 pp gap), and 42.5% on AIME 2026 (vs 88.3%, a 45.8 pp gap). The AIME math-reasoning gap and the MMMU Pro multimodal-reasoning gap are consistent with the When2Call (c) regression observed here (2/3 vs 3/3 on the A4B baseline). E4B is the right choice for the volume of well-specified, in-scope queries that dominate everyday community-energy interactions; A4B handles the harder reasoning edge cases.
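The percentage-point gaps quoted above follow directly from the paired scores. A short check, using only the figures in the paragraph (the dictionary layout is illustrative):

```python
# (E4B score, 26B A4B score) in percent, as quoted above.
scores = {
    "MMLU Pro": (69.4, 82.6),
    "MMMU Pro": (52.6, 73.8),
    "AIME 2026": (42.5, 88.3),
}


def pp_gap(e4b, a4b):
    """Gap in percentage points, rounded to one decimal place."""
    return round(a4b - e4b, 1)


gaps = {name: pp_gap(*pair) for name, pair in scores.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} pp")
```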

Deployment recommendation: Use this E4B variant for the volume of well-specified, in-scope queries (production estimates, grid pricing, maintenance guidance) where (b)-category routing dominates. Route under-specified or out-of-scope queries to the A4B cloud variant for correct refusal + follow-up behavior. A future fine-tune could increase the E4B follow-up clarification example count (currently 6) and unable-to-answer count (currently 10) to close the gap.


How to Use

Loading with transformers

from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Truthseeker87/solarhive-e4b-ollama",  # This repo (merged safetensors)
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "Truthseeker87/solarhive-e4b-ollama",
    trust_remote_code=True,
)

Edge Deployment — use the GGUF repo

For Ollama or llama.cpp on a 16 GB CPU laptop, download the GGUF artifacts instead of trying to import these safetensors:

hf download Truthseeker87/solarhive-e4b-gguf \
  solarhive-e4b-q4_k_m.gguf Modelfile \
  --local-dir ./solarhive-gguf
cd ./solarhive-gguf
ollama create solarhive -f Modelfile
ollama run solarhive "What's the best time to run my dishwasher today?"

The solarhive-e4b-gguf repo also includes the Standard Q4_K_M variant (Colab-produced, Q6_K PLE) and a 992 MB mmproj-BF16.gguf for full multimodal via llama-server --mmproj.

Edge Deployment via Ollama --experimental (≥24 GB RAM only)

If you have ≥24 GB system RAM, you can experimentally import these safetensors directly via Ollama:

git clone https://huggingface.co/Truthseeker87/solarhive-e4b-ollama
cd solarhive-e4b-ollama
cat > Modelfile << 'EOF'
FROM .
SYSTEM "You are SolarHive, an AI energy advisor for a community of 12 homes with rooftop solar and shared battery storage in Ann Arbor, Michigan. Use the available tools to get real-time data before answering. Be specific, reference actual data, and keep responses concise (3-5 sentences)."
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER num_ctx 4096
EOF
ollama create solarhive --experimental -f Modelfile
ollama run solarhive "What's the best time to run my dishwasher today?"

OOM warning: on 16 GB RAM, ollama create --experimental crashes at around 44% of blob processing, as Ollama tries to materialize the full 16 GB BF16 model in memory during ingestion. Use the GGUF path above instead.
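The 16 GB figure follows from the parameter count: 8.0B weights at 2 bytes each. The sketch below does that arithmetic and a rough fit check. The 4 GB headroom for the OS and runtime is an assumed placeholder, and the check does not model Ollama's actual ingestion behavior.

```python
def weight_gb(num_params, bytes_per_param=2):
    """Size of the raw weights in decimal GB (BF16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9


def fits_in_ram(num_params, ram_gb, headroom_gb=4.0, bytes_per_param=2):
    """Rough check: weights plus an assumed headroom must fit in system RAM."""
    return weight_gb(num_params, bytes_per_param) + headroom_gb <= ram_gb


E4B_PARAMS = 8.0e9
print(f"BF16 weights: {weight_gb(E4B_PARAMS):.1f} GB")
print("fits in 16 GB:", fits_in_ram(E4B_PARAMS, 16))
print("fits in 24 GB:", fits_in_ram(E4B_PARAMS, 24))
```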

The official base (non-fine-tuned) E4B is also available as a pre-built GGUF on ollama.com/library/gemma4:e4b (9.6 GB, Q4_K_M). Our solarhive-e4b-gguf adds 1,727 examples of community solar domain expertise on top.

GGUF Conversion via llama.cpp (reproducibility recipe)

These safetensors are the source artifact for the GGUF deployment. Reproducible via llama.cpp tooling:

# Text tower → BF16 GGUF (~14 GB intermediate)
python convert_hf_to_gguf.py --outtype bf16 \
  --outfile solarhive-e4b-bf16.gguf \
  Truthseeker87/solarhive-e4b-ollama/

# Quantize text → Q4_K_M with PLE override for 16 GB hardware (~4.6 GB)
llama-quantize \
  --tensor-type per_layer_token_embd.weight=q4_0 \
  solarhive-e4b-bf16.gguf solarhive-e4b-q4_k_m.gguf Q4_K_M

# Multimodal projector (vision SigLIP + audio Conformer, ~992 MB)
python convert_hf_to_gguf.py --mmproj --outtype bf16 \
  --outfile mmproj-solarhive-e4b-BF16.gguf \
  Truthseeker87/solarhive-e4b-ollama/

The standard Q4_K_M variant (without --tensor-type override, ~5.3 GB) requires ≥32 GB RAM at quantization time — see the solarhive_quantize_e4b.ipynb notebook for the high-RAM recipe.


Core Capabilities

1. Multimodal Visual Question Answering (3 Modes)

Available because the base Gemma 4 E4B vision encoder (~150M params) is preserved unmodified in these merged weights:

| Mode | Input | Output |
| --- | --- | --- |
| Sky Analysis | Sky photograph | Cloud coverage %, production forecast, storage recommendation |
| Panel Inspection | Panel photograph | Dirt/damage/shading detection, efficiency impact estimate |
| Neighborhood Assessment | Aerial/satellite image | Panel inventory, expansion priorities, shading analysis |

2. Native Function Calling (5 Tools — all 3 keyed APIs wired)

| Tool | API | Returns |
| --- | --- | --- |
| get_weather(location) | OpenWeatherMap (OWM_API_KEY) | Temperature, clouds %, wind, humidity, sunrise/sunset |
| get_solar_production(clouds_pct, temp_f) | Open-Meteo GHI (keyless) | Production kW, efficiency %, GHI W/m², temp derating |
| get_battery_state() | Community BMS (sim) | State of charge, capacity, charging status |
| get_grid_status() | EIA Open Data (EIA_API_KEY) | Pricing period, rate/kWh, renewable %, CO2 intensity |
| get_nrel_pvwatts_baseline() | NREL PVWatts v8 (NREL_API_KEY) | Annual + current-month typical kWh + avg kW for the 72 kW array |
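For readers wiring these tools into their own loop, the five signatures can be written as JSON-schema style declarations. This is a sketch only: the tool_spec helper is hypothetical, and the argument types and required flags are guesses from the signatures in the table, not copied from solarhive_inference.py.

```python
def tool_spec(name, description, properties=None, required=None):
    """Build one JSON-schema style function declaration."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties or {},
                "required": required or [],
            },
        },
    }


# ASSUMPTION: argument types and "required" lists are inferred, not confirmed.
TOOLS = [
    tool_spec(
        "get_weather",
        "Temperature, clouds %, wind, humidity, sunrise/sunset (OpenWeatherMap).",
        {"location": {"type": "string"}},
        ["location"],
    ),
    tool_spec(
        "get_solar_production",
        "Production kW, efficiency %, GHI W/m2, temperature derating (Open-Meteo).",
        {"clouds_pct": {"type": "number"}, "temp_f": {"type": "number"}},
        ["clouds_pct", "temp_f"],
    ),
    tool_spec("get_battery_state", "State of charge, capacity, charging status (simulated BMS)."),
    tool_spec("get_grid_status", "Pricing period, rate/kWh, renewable %, CO2 intensity (EIA)."),
    tool_spec(
        "get_nrel_pvwatts_baseline",
        "Annual and current-month typical kWh plus average kW (NREL PVWatts v8).",
    ),
]

print([t["function"]["name"] for t in TOOLS])
```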

3. Selective Tool Reasoning

The model decides when to call tools — it does not blindly invoke all of them:

"What time does peak pricing start?"
→ Calls: get_grid_status() only

"Should I run my pool heater now?"
→ Calls: get_weather() + get_solar_production() + get_battery_state() + get_grid_status()

"What are general maintenance tips for panels?"
→ Calls: none (answers from training knowledge)

Community Model

| Parameter | Value |
| --- | --- |
| Location | Ann Arbor, Michigan (42.2808°N, 83.7430°W) |
| Community size | 12 homes |
| Total panel capacity | 72 kW |
| Shared battery storage | 100 kWh |
| Grid region | MISO (Midcontinent Independent System Operator) |
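The community parameters are convenient to carry as a single record. The sketch below is illustrative (the Community class is not from the project code) and derives two back-of-envelope figures: average capacity per home, and the time to fill the battery at full nameplate output, ignoring losses and household load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    """Community parameters from the table above (class name is illustrative)."""
    location: str = "Ann Arbor, Michigan"
    latitude: float = 42.2808
    longitude: float = -83.7430  # west longitude expressed as a negative value
    homes: int = 12
    panel_capacity_kw: float = 72.0
    battery_kwh: float = 100.0
    grid_region: str = "MISO"

    @property
    def avg_kw_per_home(self) -> float:
        return self.panel_capacity_kw / self.homes

    def hours_to_fill_battery(self, output_fraction: float = 1.0) -> float:
        # Idealized: no conversion losses and no concurrent household load.
        return self.battery_kwh / (self.panel_capacity_kw * output_fraction)


community = Community()
print(f"Average array size per home: {community.avg_kw_per_home:.1f} kW")
print(f"Hours to fill battery at nameplate: {community.hours_to_fill_battery():.2f}")
```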

Technical Notes

  • Merged BF16 safetensors. Base + LoRA fused via Unsloth save_pretrained_merged(..., save_method="merged_16bit"). Loads with plain transformers.AutoModelForCausalLM.from_pretrained(...); no PEFT or Unsloth dependency at inference time.
  • Vision tower frozen during fine-tune. VQA at inference uses the base model's pretrained vision encoder unmodified, matching the Vertex AI SFT recipe which freezes both vision and audio towers during text-focused fine-tuning.
  • Two-step tokenization at inference. Single-step tokenize=True crashes in transformers 5.5.x on messages without a content key (e.g., tool_calls messages). Always render text first (tokenize=False) then tokenize separately.
  • Sampling defaults. temperature=1.0, top_p=0.95, top_k=64 (Kaggle-recommended Gemma 4 defaults).
  • Chat template. gemma-4 (per Unsloth Tip #1 for E2B/E4B). The gemma-4-thinking template is reserved for 26B/31B reasoning-class variants. The simpler template is more robust across downstream Ollama / llama.cpp runtimes that don't expose enable_thinking=False at the runtime layer.
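The two-step tokenization note above can be sketched as a small wrapper. It assumes a processor object exposing the usual Hugging Face apply_chat_template(...) method and a __call__ that accepts text= and return_tensors=; the function name render_then_tokenize is hypothetical, and extra keyword arguments are simply forwarded to the template.

```python
def render_then_tokenize(processor, messages, **template_kwargs):
    """Render the chat template to a string first, then tokenize that string.

    Avoids the single-step tokenize=True path, which the note above reports
    as failing on messages that have no "content" key (e.g. tool_calls).
    """
    text = processor.apply_chat_template(
        messages,
        tokenize=False,              # step 1: render text only
        add_generation_prompt=True,
        **template_kwargs,
    )
    # Step 2: tokenize the rendered text in a separate call.
    return processor(text=text, return_tensors="pt")


# Usage (requires the model files; shown for orientation only):
# inputs = render_then_tokenize(processor, messages).to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=256)
```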

Limitations

  • Prototype scope. Tested on a single community model (12 homes, Ann Arbor, MI). Real-world deployment requires validation across diverse geographies and community sizes.
  • Smaller model, weaker refusal/follow-up. When2Call (c) regression vs the A4B baseline (2/3 vs 3/3 — see Multi-Variant Deployment Validation above). Route under-specified or out-of-scope queries to the A4B cloud variant for correct refusal + follow-up behavior.
  • Occasional capacity hallucination. The base model's prior occasionally surfaces "60 kW" instead of the correct 72 kW community capacity in direct (no-tool) responses. The tool-calling path (which queries actual capacity from get_nrel_pvwatts_baseline) avoids this.
  • External API dependence. Tool responses depend on Open-Meteo, OWM, EIA, and PVWatts availability with their respective rate limits.
  • Battery state is simulated. get_battery_state() is a deterministic in-memory simulator for demonstrations — real deployment requires integration with actual battery management systems.
  • Single-trial multi-variant validation. The May 2026 final-run benchmark numbers are from one inference pass; a multi-trial bootstrap would strengthen the multi-call regression claim against temperature-1.0 stochasticity.
  • Memory. ~16 GB BF16 safetensors require ≥24 GB system RAM at load time — does not fit on consumer 16 GB laptops in this format. For 16 GB laptops, use the solarhive-e4b-gguf Q4_K_M variant.

Future Iteration — Multi-Token Prediction (MTP) Drafters

Not in the measured numbers above. Google announced Gemma 4 MTP drafters on May 5, 2026 (blog, overview, HF collection, Kaggle, @GoogleGemma) — after this artifact's final benchmark was captured. The benchmarks above reflect standard autoregressive decoding only. MTP integration is documented here as future iteration; no measured speedup is claimed in this release.

Theoretical foundation. Speculative decoding (Leviathan, Kalman & Matias, ICML 2023, arXiv:2211.17192) accelerates generation without changing the output distribution, for sampling as well as greedy decoding: a smaller drafter proposes γ candidate tokens, the target verifies all γ in a single parallel forward pass, accepted tokens are kept, and the first rejected token is resampled from a corrected distribution. The output distribution is preserved exactly regardless of drafter quality; only acceptance rate α, and therefore walltime speedup, varies.
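The accept/reject rule behind that guarantee is short enough to show on a toy vocabulary. The sketch below is a plain-Python illustration of one verification step as described by Leviathan et al., not Google's or Transformers' implementation; the distributions p and q are made up. The second function gives the paper's expected number of tokens per target pass, which assumes a constant, independent acceptance rate α.

```python
import random


def speculative_step(p, q, draft_token, rng=random):
    """Verify one drafted token. p is the target distribution, q the drafter's.

    Returns (token, accepted). The drafted token is accepted with probability
    min(1, p/q); on rejection a token is drawn from normalize(max(0, p - q)).
    """
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)  # > 0 whenever a rejection is possible
    weights = [r / total for r in residual]
    return rng.choices(range(len(p)), weights=weights)[0], False


def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens generated per target forward pass for acceptance rate alpha."""
    if alpha >= 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


# Toy check: outputs follow p even though drafts come from a poor drafter q.
rng = random.Random(0)
p, q = [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]
counts = [0, 0, 0]
for _ in range(20000):
    draft = rng.choices(range(3), weights=q)[0]
    token, _ = speculative_step(p, q, draft, rng)
    counts[token] += 1
print([round(c / 20000, 2) for c in counts])
print(expected_tokens_per_pass(alpha=0.5, gamma=3))
```

The empirical frequencies land close to p, while a lower α only reduces how many drafted tokens survive each pass.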

What Google released on May 5, 2026. Paired drafter checkpoints for all four IT-tuned Gemma 4 variants (gemma-4-E2B-it-assistant, gemma-4-E4B-it-assistant, gemma-4-26B-A4B-it-assistant, gemma-4-31B-it-assistant), discoverable via the google/gemma-4 Hugging Face collection and on Kaggle Models. The drafters share the input embedding table with their paired target and consume the target's last-layer activations (architecture per the MTP overview). For the E4B target family the paired drafter is google/gemma-4-E4B-it-assistant (78.8 M params). Google reports up to 3× decode speedup with no quality degradation on the headline 26B-A4B configuration and 2.2× on Apple Silicon at batch sizes 4–8; per-variant E4B numbers were not enumerated in the announcement. Tested runtimes named in the blog: LiteRT-LM, MLX, Hugging Face Transformers, vLLM, SGLang, Ollama.

Integration via Hugging Face Transformers is a plain-AutoModelForCausalLM two-line load plus one extra kwarg:

target    = AutoModelForCausalLM.from_pretrained("Truthseeker87/solarhive-e4b-ollama",        dtype=torch.bfloat16, ...)
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it-assistant",          dtype=torch.bfloat16, ...)
target.generate(**inputs, assistant_model=assistant)  # MTP enabled

The merged-safetensors load path on this repo is the cleanest E4B integration surface — no PEFT/Unsloth wrapping; the Hugging Face Transformers assistant_model= kwarg works directly.

Open question specific to this LoRA-merged BF16 target. Per the 2023 speculative-sampling guarantee, correctness is invariant to drafter quality — the target's verification step preserves the exact output distribution regardless of what the drafter proposes. What varies is acceptance rate α, since Google's released drafter was trained against the base gemma-4-E4B-it, not against this LoRA-merged target. Measured α at the edge BF16 tier is the planned post-hackathon contribution; a cloud-tier measurement against the A4B merged target is captured by the gated future-iteration cell in solarhive_inference.py §14.


Companion Repositories

| Model | Repository | Purpose |
| --- | --- | --- |
| SolarHive 26B A4B LoRA | solarhive-26b-a4b-lora | Cloud inference with full multimodal + function calling (LoRA adapters) |
| SolarHive 26B A4B Merged | solarhive-26b-a4b-merged | Full BF16 cloud model (~48 GB); production inference, no PEFT/Unsloth dep |
| SolarHive 26B A4B NF4 | solarhive-26b-a4b-nf4 | Pre-quantized 4-bit cloud model for HF Spaces / 24 GB+ GPUs |
| SolarHive E4B LoRA | solarhive-e4b-lora | E4B adapter weights (~200 MB); apply over base via Unsloth |
| SolarHive E4B safetensors | This repo | Source safetensors for transformers research / GGUF conversion via llama.cpp |
| SolarHive E4B GGUF | solarhive-e4b-gguf | Edge deployment: Q4_K_M GGUF + mmproj for Ollama / llama.cpp on 16 GB CPU laptop. 10/10 benchmark. |
| SolarHive Dataset | solarhive-community-solar-multimodal | 1,727 training examples (1,713 text + 14 image-grounded) |
| LiteRT-LM Python edge runtime | solarhive_e4b_litert_v3.1.ipynb | LiteRT Special Tech Track entry: runs upstream base litert-community/gemma-4-E4B-it-litert-lm .litertlm (3.66 GB) + SolarHive UX layer + on-device agentic loop with native Gemma 4 function calling. Q&A 8/8 on Colab Pro CPU + High-RAM. Fine-tuned LiteRT-LM bundle is a planned next iteration once upstream gemma4 example module lands in ai_edge_torch.generative.examples/. |
| GitHub | the-gemma4-good-hackathon-solarhive | Full source code, training & quantization notebooks, data principles |

Fine-Tuning Architecture — Text-Only on the Multimodal-Capable Corpus

The shipped fine-tune is text-only on the canonical solarhive-community-solar-multimodal corpus (1,727 rows = 1,713 text + 14 image-grounded). Image rows are skipped at the data-prep layer; the training pipeline pre-renders only text rows for TRL's default text collator. Multimodal fine-tuning is deferred post-hackathon — a real image corpus and a held-out VQA benchmark would be prerequisites; the dataset's image schema is preserved so a future multimodal fine-tune can re-enable image rows without changing the corpus.

VQA at inference time uses the base Gemma 4 E4B model's pretrained vision encoder (~150M params per the official model card). Our LoRA targets only the language-model linear layers (target=all-linear); the vision tower is not modified. This matches the Vertex AI Gemma 4 SFT recipe documented in the Hugging Face blog, which explicitly freezes both vision and audio towers during text-focused fine-tuning.

Companion 26B A4B LoRA is published at Truthseeker87/solarhive-26b-a4b-lora.

The dataset uses the project archive for its 14 image-grounded Q&A turns (7 Ann Arbor sky photos × 2 turns). Image-source planning pivoted twice: the SWIM corpora (NUS) were rejected for CC BY-NC licensing, and NREL SRRL was rejected because the legacy MIDC SkyCam image archive ended May 2017 (modern ASI-16 only exposes derived measurements). The shipped dataset uses the project archive only — fewer images, but every label is human-confirmed and every paired Q&A traces back to the same GHI / temperature-derating formula used elsewhere in the dataset.

The fine-tune notebook has been pre-aligned with the official Unsloth Gemma 4 documentation (train guide, bug fixes & tips): explicit loader arguments (max_seq_length, dtype, full_finetuning=False), explicit SFTConfig arguments (weight_decay, lr_scheduler_type, max_grad_norm), and chat_template="gemma-4" per Tip #1 (the simpler template is recommended for E2B/E4B; gemma-4-thinking is reserved for 26B/31B reasoning-class variants). The change makes the embedded chat template more robust across downstream Ollama / llama.cpp runtimes that don't expose enable_thinking=False at the runtime layer.


Citation

@misc{solarhive2026,
  title={SolarHive: AI-Powered Community Solar Energy Intelligence},
  author={Youshen Lim},
  year={2026},
  url={https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive},
  note={Gemma 4 Good Hackathon submission — Google DeepMind x Kaggle}
}

Built with Gemma 4 in Ann Arbor, Michigan. May 2026.

Gemma is a trademark of Google LLC.
