---
library_name: transformers
license: other
base_model: Qwen/Qwen3-4B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: train_2025-05-04-15-25-21
  results: []
---

# Gaia-Petro-LLM

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), further pre-trained on the wikipedia_zh and petro_books datasets.

## Model description

Gaia-Petro-LLM is a large language model specialized in the oil and gas industry, fine-tuned from Qwen/Qwen3-4B. It was further pre-trained on a curated ~20GB corpus of petroleum engineering texts, including technical documents, academic papers, and domain literature. The model is designed to support domain experts, researchers, and engineers in petroleum-related tasks with domain-specific language understanding and generation.

## Model details

- Base model: Qwen/Qwen3-4B
- Domain: oil & gas / petroleum engineering
- Corpus size: ~20GB (petroleum engineering)
- Languages: primarily Chinese; domain-specific English supported
- Repository: my2000cup/Gaia-Petro-LLM

## Intended uses & limitations

Intended uses:

- Technical Q&A in petroleum engineering
- Document summarization for oil & gas reports
- Knowledge extraction from unstructured domain texts
- Education & training in oil & gas technologies

Limitations:

- Not suitable for general-domain tasks outside oil & gas.
- May not reflect the latest industry developments (post-2023).
- Should not be used for critical, real-time decision-making without expert review.

## Training and evaluation data

The model was further pre-trained on an in-house text corpus (~20GB) collected from:

- Wikipedia (Chinese, petroleum-related entries)
- Open petroleum engineering books and literature
- Technical standards and manuals

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "my2000cup/Gaia-Petro-LLM"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare a petroleum engineering prompt
prompt = "What are the main challenges in enhanced oil recovery (EOR) methods?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # optional: enables the model's 'thinking' mode
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the model's response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024  # adjust as needed
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Optional: split off the 'thinking' content emitted before </think>
try:
    think_token_id = 151668  # </think> token ID; double-check for your tokenizer
    index = len(output_ids) - output_ids[::-1].index(think_token_id)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("Thinking content:", thinking_content)
print("Answer:", content)
```
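
The `</think>` token ID is hard-coded above. As a sanity check (a minimal sketch, assuming the tokenizer defines a literal `</think>` token, as Qwen3's does), you can look it up instead of hard-coding it:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my2000cup/Gaia-Petro-LLM")
# Resolve the </think> token ID from the vocabulary
think_token_id = tokenizer.convert_tokens_to_ids("</think>")
print(think_token_id)  # expected: 151668 for Qwen3-based tokenizers
```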

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- num_epochs: 3.0
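
For reference, these values map onto `transformers.TrainingArguments` roughly as sketched below. The run itself was driven by LLaMA-Factory, so this is an approximate reconstruction rather than the exact invocation, and the `output_dir` is hypothetical:

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters above (illustrative only)
training_args = TrainingArguments(
    output_dir="train_2025-05-04-15-25-21",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # effective train batch size: 1 x 8 = 8
    optim="adamw_torch",            # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=16,
    num_train_epochs=3.0,
)
```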

### Training results

### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1