electrical-embeddinggemma-ir_q8_0
Model Description
This model is the GGUF q8_0 (8-bit quantized) variant of the gemma-300m-electrical-electronics-ir family, fine-tuned from unsloth/embeddinggemma-300m for dense information retrieval (IR) in the electrical and electronics engineering domain. The build offers near-lossless quality at roughly half the size of the f16 GGUF (~329 MB vs ~612 MB), which suits deployments that need both high accuracy and a moderate memory footprint.
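To make the "8-bit, roughly half the size" claim concrete, here is a minimal sketch of block-wise 8-bit quantization in the style of llama.cpp's q8_0 format. It assumes blocks of 32 weights that share one scale, and it keeps the scale as a plain Python float rather than the fp16 value the real format stores. The function names are hypothetical and this is an illustration, not the llama.cpp implementation.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumption: weights per q8_0 block, as in llama.cpp


def quantize_block_q8_0(block: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block to a shared scale plus signed 8-bit integers."""
    amax = max(abs(x) for x in block)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(block)
    # Simplification: the scale stays a Python float instead of being cast to fp16.
    return scale, [round(x / scale) for x in block]


def dequantize_block_q8_0(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate weights from the scale and the int8 values."""
    return [scale * q for q in quants]


def bits_per_weight_q8_0() -> float:
    """32 int8 values (256 bits) plus one 16-bit scale, spread over 32 weights."""
    return (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE


# Toy block of 32 deterministic weights in the range [-0.32, 0.31].
weights = [((i * 37) % 64 - 32) / 100.0 for i in range(BLOCK_SIZE)]
scale, quants = quantize_block_q8_0(weights)
restored = dequantize_block_q8_0(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"bits per weight: {bits_per_weight_q8_0()}")
print(f"max abs round-trip error: {max_error:.6f} (scale = {scale:.6f})")
```

Under these assumptions each weight costs 8.5 bits instead of 16, a ratio of about 0.53, which is in line with the ~329 MB vs ~612 MB figures quoted above. The round-trip error per weight is bounded by half the block scale.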

Training Data
The model was trained on the disham993/ElectricalElectronicsIR dataset — 20,000 question-passage pairs covering electrical engineering, electronics, power systems, and communications.
- Split: 16,000 train / 2,000 validation / 2,000 test pairs
- Queries: 133–822 characters; passages: 586–5,590 characters
- Topics include phased array antennas, IEC 61850 protocols, Josephson junctions, OTDR measurements, MIMO channel estimation, FPGA partial reconfiguration, and more
Model Details
| Field | Value |
|---|---|
| Base Model | unsloth/embeddinggemma-300m (308M params) |
| Format | GGUF q8_0 (8-bit quantization) |
| Task | Feature Extraction (Dense IR / Semantic Search) |
| Language | English (en) |
| Dataset | disham993/ElectricalElectronicsIR |
| Approx. size | ~329 MB |
| Backend | llama.cpp / llama-cpp-python |
| License | MIT |
Training Procedure
Training Hyperparameters
| Parameter | Value |
|---|---|
| Method | LoRA via Unsloth's FastSentenceTransformer, exported to GGUF q8_0 |
| LoRA rank / alpha | r=32, α=64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Loss | MultipleNegativesRankingLoss (in-batch negatives) |
| Batch size | 128 per device × 2 gradient accumulation = 256 effective |
| Learning rate | 2e-5 (linear schedule, 3% warmup) |
| Max steps | 100 |
| Max sequence length | 1024 |
| Precision | bf16 (training) → q8_0 GGUF (export) |
| Batch sampler | NO_DUPLICATES |
| Hardware | NVIDIA RTX 5090 |
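The loss listed in the table treats every other passage in a batch as a negative for a given query. The following is a minimal pure-Python sketch of that idea, not the sentence-transformers implementation: it assumes cosine similarity as the score and a scale factor of 20, which is believed to match the library default but should be treated as an assumption. The names `cosine` and `mnr_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(queries: List[Sequence[float]],
             passages: List[Sequence[float]],
             scale: float = 20.0) -> float:
    """Mean cross-entropy where passages[i] is the positive for queries[i]
    and every other passage in the batch acts as a negative."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, passage) for passage in passages]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Toy batch of two query/passage pairs in 2-d.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_passages = [[1.0, 0.0], [0.0, 1.0]]   # positives match their queries
swapped_passages = [[0.0, 1.0], [1.0, 0.0]]   # positives point the wrong way

aligned_loss = mnr_loss(queries, aligned_passages)
swapped_loss = mnr_loss(queries, swapped_passages)
print(f"aligned loss: {aligned_loss:.6f}, swapped loss: {swapped_loss:.6f}")
```

With in-batch negatives, a larger batch gives each query more negatives to discriminate against. The NO_DUPLICATES sampler in the table presumably exists so that a batch never contains the same text twice, which would otherwise turn a true positive into a false negative.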
Evaluation Results
Evaluated on the held-out test split (2,000 queries) of disham993/ElectricalElectronicsIR using sentence_transformers.evaluation.InformationRetrievalEvaluator.
| Model | MAP@100 | NDCG@10 | MRR@10 | Recall@10 |
|---|---|---|---|---|
| unsloth/embeddinggemma-300m (baseline) | 0.5753 | 0.6221 | 0.5682 | 0.7925 |
| electrical-embeddinggemma-ir_lora | 0.9795 | 0.9847 | 0.9795 | 1.0000 |
| electrical-embeddinggemma-ir_finetune_16bit | 0.9797 | 0.9849 | 0.9797 | 1.0000 |
| electrical-embeddinggemma-ir_f16 | 0.9849 | 0.9887 | 0.9849 | 0.9995 |
| electrical-embeddinggemma-ir_q8_0 (this model) | 0.9844 | 0.9883 | 0.9844 | 0.9995 |
| electrical-embeddinggemma-ir_q4_k_m | 0.9841 | 0.9879 | 0.9840 | 0.9990 |
| electrical-embeddinggemma-ir_q5_k_m | 0.9824 | 0.9866 | 0.9823 | 0.9990 |
The MAP@100 delta versus f16 is only −0.0005, so the q8_0 build is near-lossless at roughly half the file size.
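For readers unfamiliar with the metrics in the table above, the sketch below shows how MRR@k, Recall@k, and NDCG@k can be computed for a single query under binary relevance. It is an illustration with hypothetical function names and toy document IDs. The actual InformationRetrievalEvaluator averages such values over all queries and may differ in details.

```python
import math
from typing import Sequence, Set


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear within the top k."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Discounted cumulative gain, normalised by the best achievable ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal


# Toy query: the single relevant passage "d1" is retrieved at rank 2.
ranked_ids = ["d3", "d1", "d7"]
relevant_ids = {"d1"}

mrr = mrr_at_k(ranked_ids, relevant_ids)
recall = recall_at_k(ranked_ids, relevant_ids)
ndcg = ndcg_at_k(ranked_ids, relevant_ids)
print(f"MRR@10={mrr:.4f}  Recall@10={recall:.4f}  NDCG@10={ndcg:.4f}")
```

If each query has exactly one relevant passage (an assumption about the dataset, not something stated here), average precision reduces to the reciprocal rank, which would explain why the MAP@100 and MRR@10 columns are nearly identical.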
Usage
LM Studio (OpenAI-compatible API)
Load this model in LM Studio and use it via the built-in OpenAI-compatible server:
```python
from openai import OpenAI

# LM Studio's local server does not check the API key; any placeholder string works.
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")

texts = [
    "What is impedance matching?",
    "Impedance matching maximises power transfer by equalising source and load impedance.",
    "An LLC resonant converter achieves zero-voltage switching using an LC tank circuit.",
]

# Use the model identifier shown in LM Studio for the loaded model.
response = client.embeddings.create(
    model="text-embedding-electrical-embeddinggemma-ir",
    input=texts,
)

for item in response.data:
    print(f"[{item.index}] dim={len(item.embedding)} first5={item.embedding[:5]}")
```
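The example above only prints the raw vectors. A natural next step is to rank the passages against the query by cosine similarity. The sketch below does this in plain Python with toy 3-dimensional vectors standing in for real embeddings. The helper names `cosine_similarity` and `rank_documents` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; real embeddings would come from the embeddings API response.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.9, 0.1, 0.0],  # points almost the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
]

ranking = rank_documents(query_vec, doc_vecs)
for doc_index, score in ranking:
    print(f"doc {doc_index}: score={score:.4f}")
```

With the LM Studio response, where the first input text is the query, the inputs would be `query_vec = response.data[0].embedding` and `doc_vecs = [d.embedding for d in response.data[1:]]`.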
llama-cpp-python
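```bash
# Install dependencies (torch is imported by the Python example in this section)
pip install huggingface_hub torch

# Build llama-cpp-python with CUDA for NVIDIA GPU acceleration.
# For CPU-only use, a plain `pip install llama-cpp-python` is enough.
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python
```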
```python
import torch
import torch.nn.functional as F
from huggingface_hub import hf_hub_download, HfApi
from llama_cpp import Llama


class DummyModelCardData:
    """Stub for evaluators that try to record metrics on the model card."""

    def set_evaluation_metrics(self, *args, **kwargs):
        pass


class GGUFEmbeddingWrapper:
    """Small SentenceTransformer-like wrapper around a GGUF embedding model."""

    def __init__(self, repo_id):
        self.repo_id = repo_id

        # Automatically detect the GGUF file in the repo
        api = HfApi()
        files = api.list_repo_files(repo_id)
        gguf_file = next((f for f in files if f.endswith(".gguf")), None)
        if not gguf_file:
            raise ValueError(f"No .gguf file found in {repo_id}")

        print(f"Downloading/Using {gguf_file} from {repo_id}...")
        model_path = hf_hub_download(repo_id=repo_id, filename=gguf_file)

        self.llm = Llama(
            model_path=model_path,
            embedding=True,   # CRITICAL: required for dense extraction
            n_gpu_layers=-1,  # Offload all layers to the GPU (optional)
            n_ctx=1024,       # Constrain the context window
            verbose=False,
        )
        self.dtype = torch.float16
        self.model_card_data = DummyModelCardData()  # Bypasses evaluator metadata crashes

    def encode(self, sentences, batch_size=None, **kwargs):
        convert_to_tensor = kwargs.pop("convert_to_tensor", True)
        if isinstance(sentences, str):
            sentences = [sentences]

        # Handle a list of dicts (title/text), as used in corpus evaluations
        if isinstance(sentences, list) and len(sentences) > 0 and isinstance(sentences[0], dict):
            sentences = [(doc.get("title", "") + " " + doc.get("text", "")).strip() for doc in sentences]

        # Embed one text at a time; batch_size is accepted for API compatibility only
        embeddings = []
        for text in sentences:
            res = self.llm.create_embedding(text)
            embeddings.append(res["data"][0]["embedding"])

        tensors = torch.tensor(embeddings, dtype=torch.float32)
        if convert_to_tensor:
            if torch.cuda.is_available():
                tensors = tensors.cuda()
            return tensors
        return tensors.cpu().numpy()

    # Dynamic alias interceptor to satisfy strict evaluator engines:
    # any encode_* method is routed to encode()
    def __getattr__(self, name):
        if name.startswith("encode_"):
            def wrapper(*args, **kwargs):
                kwargs["convert_to_tensor"] = True
                return self.encode(*args, **kwargs)
            return wrapper
        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")


# === SEMANTIC SEARCH EXAMPLE ===
if __name__ == "__main__":
    # Boot the wrapper dynamically against this Hub repo
    model = GGUFEmbeddingWrapper("disham993/electrical-embeddinggemma-ir_q8_0")

    query = "How do transformers step up voltage?"

    # A miniature corpus of 10 engineering documents
    documents = [
        "Ohm's law defines the relationship between voltage, current, and resistance.",
        "AC circuits use alternating current which changes direction periodically.",
        "A step-up transformer has more turns on its secondary coil than its primary, increasing voltage.",
        "Capacitors store electrical energy in an electric field.",
        "Inductors resist changes in electric current passing through them.",
        "Transformers operate on Faraday's law of induction to transfer energy between circuits.",
        "Diodes allow current to pass in only one direction.",
        "Voltage is the electric potential difference between two points.",
        "A step-down transformer decreases voltage for safe residential use.",
        "Power is the rate at which electrical energy is transferred by a circuit.",
    ]

    print("Embedding query and documents...")
    query_emb = model.encode(query)      # shape: (1, dim)
    doc_embs = model.encode(documents)   # shape: (10, dim)

    # Broadcasts the single query row against every document row
    similarities = F.cosine_similarity(query_emb, doc_embs)
    top_3_idx = torch.topk(similarities, k=3).indices.tolist()

    print(f"\n--- Top 3 Documents for Query: '{query}' ---")
    for rank, idx in enumerate(top_3_idx, 1):
        print(f"Rank {rank} (Score: {similarities[idx]:.4f}) | {documents[idx]}")
```
Limitations and Bias
While this model performs exceptionally well in the electrical and electronics engineering domain, it is not designed for use in other domains. Additionally, it may:
- Underperform on queries that mix electrical engineering with unrelated domains (e.g., biomedical, legal, financial)
- Show reduced performance on non-English text or highly colloquial phrasing
- Require llama-cpp-python with CUDA support for GPU-accelerated inference; CPU inference is supported but slower
This model is intended for research, educational, and production IR applications in the electrical engineering domain.
Training Infrastructure
For the complete fine-tuning and evaluation pipeline — from data loading to GGUF export — refer to the GitHub repository and the notebooks Finetuning_EmbeddingGemma_EEIR_RTX_5090.ipynb and Evaluate_All_Models.ipynb.
Last Update
2026-04-18
Citation
@misc{electrical-embeddinggemma-ir,
author = {disham993},
title = {Electrical \& Electronics Engineering Embedding Models},
year = {2026},
howpublished = {\url{https://huggingface.co/collections/disham993/electrical-and-electronics-engineering-embedding-models}},
}