Gemma-3-12B Lithology Description (Russian)
LoRA adapter for google/gemma-3-12b-it fine-tuned on geological core lithology descriptions in Russian.
Model Description
This model generates professional lithological descriptions of drill core samples for mining and drilling operations. It was trained on expert-written geological descriptions from the Zhosabay (Жосабай) deposit in Kazakhstan.
Training Details
| Parameter | Value |
|---|---|
| Base Model | google/gemma-3-12b-it (12.19B params) |
| Method | SFT with LoRA |
| LoRA r | 16 |
| LoRA alpha | 32 |
| Target Modules | q_proj, v_proj, k_proj, o_proj |
| Dataset Size | 1,664 samples (1,497 train / 167 eval) |
| Epochs | 4 |
| Learning Rate | 1e-5 |
| Batch Size | 8 (effective) |
| Hardware | 4x NVIDIA RTX 5090 |
| Training Time | ~37 minutes |
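As a rough cross-check of the table, the sketch below derives a few quantities implied by these settings: the eval split fraction, the approximate number of optimizer steps, and the LoRA scaling factor alpha / r. It assumes one optimizer step per effective batch with the final partial batch kept; the real step count depends on trainer settings that are not stated here.

```python
import math

# Values taken from the training table above
train_samples = 1497
eval_samples = 167
epochs = 4
effective_batch_size = 8
lora_r = 16
lora_alpha = 32

# Fraction of the dataset held out for evaluation
eval_fraction = eval_samples / (train_samples + eval_samples)

# Assumption: one optimizer step per effective batch, last partial batch kept
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_r

print(f"eval fraction: {eval_fraction:.1%}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
```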
Metrics
| Metric | Value |
|---|---|
| Eval Loss | 1.121 |
| Eval Accuracy | 75.1% |
| Train Loss | 0.34 |
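To make the loss values easier to interpret, they can be converted to perplexity. This is a minimal sketch that assumes the reported losses are mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_loss_nats)


# Loss values taken from the metrics table above
eval_ppl = perplexity(1.121)
train_ppl = perplexity(0.34)

print(f"eval perplexity:  {eval_ppl:.2f}")
print(f"train perplexity: {train_ppl:.2f}")
```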
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load base model and adapter
base_model = "google/gemma-3-12b-it"
adapter = "tauken/gemma-3-12b-lithology-ru"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
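# Generate description
# Prompt in English: "You are an experienced expert geologist. Describe the core
# lithology. Well: ZHS_028_25, interval: 94.5-101.9 m."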
prompt = "Ты опытный геолог-эксперт. Опиши литологию керна. Скважина: ZHS_028_25, интервал: 94.5-101.9м."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
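When describing many intervals, the prompt string can be assembled programmatically. The helper below is a hypothetical convenience function (`build_prompt` is not part of this repository); it reproduces the prompt wording used in the usage snippet and assumes depths are given in meters.

```python
def build_prompt(well_id: str, depth_from: float, depth_to: float) -> str:
    """Build the Russian prompt for one core interval (depths in meters).

    Hypothetical helper: mirrors the prompt wording from the usage snippet.
    """
    if depth_from >= depth_to:
        raise ValueError("depth_from must be smaller than depth_to")
    return (
        "Ты опытный геолог-эксперт. Опиши литологию керна. "
        f"Скважина: {well_id}, интервал: {depth_from}-{depth_to}м."
    )


prompt = build_prompt("ZHS_028_25", 94.5, 101.9)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in the snippet above.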
Example Output
Input: Скважина ZHS_028_25, интервал 94.5-101.9м
Output:
В интервале 94.5-101.9м: Андезитовые порфириты от темно-коричнево-зеленого до серого цвета
с зеленоватым оттенком. Структура порфировая, текстура массивная. Основная масса
скрытокристаллическая, порфировые включения выполнены плагиоклазом таблитчатых,
полуокруглых форм размерами до 5х12мм, составляющие около 40-50% от общей массы.
Породы слабо, умеренно трещиноватые в количестве до 5-7 трещин на п.м., преимущественно
ориентированные под углом 30-70° к оси керна. Вторичные изменения представлены слабым
окварцеванием и серицитизацией по массе.
Рудная минерализация выполнена пиритом в виде вкраплений по массе.
English translation of the output: In the interval 94.5-101.9 m: andesite porphyrites, dark brownish-green to gray with a greenish tint. The structure is porphyritic, the texture massive. The groundmass is cryptocrystalline; the phenocrysts are plagioclase of tabular and subrounded shape, up to 5x12 mm in size, making up about 40-50% of the total mass. The rocks are weakly to moderately fractured, with up to 5-7 fractures per running meter, predominantly oriented at 30-70° to the core axis. Secondary alteration is represented by weak silicification and sericitization throughout the mass. Ore mineralization consists of pyrite disseminated throughout the mass.
Output Structure
The model follows a standard geological description format:
- Lithotype & Color - Rock type identification (андезитовые порфириты, "andesite porphyrites")
- Texture & Structure - Porphyritic structure, phenocryst description
- Fracturing - Fracture density per running meter, angles to core axis
- Secondary Alterations - Veining, alteration minerals (окварцевание, "silicification"; серицитизация, "sericitization")
- Mineralization - Ore minerals if present (пирит, "pyrite")
- Contacts - Description of boundaries with other intervals
Limitations
⚠️ Important limitations:
- Text-only model - Does not process images (for VLM version, see upcoming release)
- May occasionally hallucinate depth intervals outside the requested range
- Quantitative estimates (sizes, percentages) may be approximate
- Best used as an assistant with mandatory expert verification
- Not production-ready for autonomous use
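Because the model may mention depth intervals outside the requested range, a simple post-check can flag suspicious outputs before expert review. The sketch below is a heuristic, not part of the released model: the function name, the regular expression, and the tolerance are assumptions, and short meter-scale ranges in the text (for example a vein thickness) would also be flagged.

```python
import re

# Matches ranges such as "94.5-101.9м" or "94,5 - 101,9 м", but not "5х12мм"
INTERVAL_RE = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*[-\u2013]\s*(\d+(?:[.,]\d+)?)\s*м\b"
)


def out_of_range_intervals(text, depth_from, depth_to, tol=0.05):
    """Return (start, end) depth pairs in `text` that fall outside the request."""
    flagged = []
    for start_str, end_str in INTERVAL_RE.findall(text):
        start = float(start_str.replace(",", "."))
        end = float(end_str.replace(",", "."))
        if start < depth_from - tol or end > depth_to + tol:
            flagged.append((start, end))
    return flagged


ok_text = "В интервале 94.5-101.9м: порфировые включения до 5х12мм, около 40-50%."
bad_text = "В интервале 94.5-101.9м: порфириты. В интервале 101.9-110.0м: метасоматиты."

print(out_of_range_intervals(ok_text, 94.5, 101.9))
print(out_of_range_intervals(bad_text, 94.5, 101.9))
```

An empty list means no out-of-range interval was detected; it does not confirm that the description itself is correct.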
Training Data
Geological descriptions from the Zhosabay copper-gold deposit (Kazakhstan):
- Well intervals with detailed lithological descriptions in Russian
- Written by professional geologists
- Rock types: andesite porphyrites, metasomatites, weathering crusts
- Features: fracturing, veining, alteration, mineralization
Intended Use
✅ Recommended uses:
- Drafting lithological descriptions for geologist review
- Accelerating core logging workflows
- Training in geological terminology
❌ Not recommended:
- Autonomous geological reporting without expert review
- Critical mining decisions based solely on model output
License
This model inherits the Gemma license from the base model.
Citation
@misc{tauken-lithology-2025,
author = {Tauken Team},
title = {Gemma-3-12B Lithology Description Model},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/tauken/gemma-3-12b-lithology-ru}
}
Framework Versions
- PEFT: 0.18.1
- Transformers: 4.51+
- PyTorch: 2.11.0
- TRL: 0.27.0