Qwen3-0.6B Information Extractor
Model Details
Model Name: Qwen3-0.6B Information Extractor
Base Model: Qwen/Qwen3-0.6B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Framework: Transformers, PEFT, TRL
License: Apache 2.0
Model Description
This is a fine-tuned version of the Qwen3-0.6B model optimized for information extraction tasks. The model has been adapted using parameter-efficient LoRA fine-tuning to extract structured information from unstructured text while maintaining lightweight inference requirements.
The base Qwen3-0.6B is a 600M-parameter instruction-following language model with an excellent performance-to-size ratio, making it well suited for resource-constrained environments.
Model Architecture
- Base Model: Qwen3-0.6B
- Fine-tuning Technique: LoRA
- LoRA Rank (r): 16
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Quantization: 4-bit (NF4) for memory efficiency
- Task Type: Causal Language Modeling
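As a rough sanity check on why LoRA is parameter-efficient: an adapter of rank r on a d_out × d_in linear layer adds two small matrices A (r × d_in) and B (d_out × r), i.e. r · (d_in + d_out) trainable parameters. The sketch below uses an illustrative 1024 × 1024 projection size, not the model's actual dimensions:

```python
# LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer,
# so the trainable-parameter overhead is r * (d_in + d_out).

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter of rank r."""
    return r * (d_in + d_out)

# Example: a square 1024 x 1024 projection with the card's r=16
print(lora_params(1024, 1024, 16))  # 32768
```

Summing this over the seven target modules per layer (and over all layers) stays far below the 600M base parameters, which is what keeps fine-tuning cheap.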
Training Details
Training Data
- Examples: 159 training samples
- Format: Chat-based instruction-response pairs (JSON with messages array)
- Domain: Information extraction
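A training sample in this chat-based format looks like the sketch below, with one JSON object per line (JSONL). The field values are made up for demonstration; the actual dataset contents are not shown in this card:

```python
import json

# One illustrative training sample in the "messages" format
# (content strings are hypothetical examples, not real data).
sample = {
    "messages": [
        {"role": "system", "content": "You are a strict information extractor. Return JSON."},
        {"role": "user", "content": "Extract the name and email from: Jane Doe <jane@example.org>"},
        {"role": "assistant", "content": '{"name": "Jane Doe", "email": "jane@example.org"}'},
    ]
}

# Serialize as one JSONL line and verify it round-trips
line = json.dumps(sample)
print(json.loads(line)["messages"][2]["role"])  # assistant
```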
Training Configuration
- Epochs: 3
- Batch Size: 1 per device × 16 gradient accumulation steps = effective batch size of 16
- Learning Rate: 2e-4
- Optimizer: Paged AdamW 8-bit
- Scheduler: Cosine with 50 warmup steps
- Loss Function: Causal Language Modeling cross-entropy
- Device: NVIDIA GPU (Kaggle environment)
- Precision: float16 (no mixed precision for stability)
Checkpoints
- Saved at each epoch
- Two checkpoints retained (via save_total_limit)
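The hyperparameters above can be collected into a transformers TrainingArguments object. This is a sketch under the assumption of a recent transformers version; the output_dir is a placeholder, and the exact values used during training may differ slightly:

```python
from transformers import TrainingArguments

# Sketch of the training configuration described in this card.
args = TrainingArguments(
    output_dir="qwen3-0.6b-info-extractor",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size 16
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_steps=50,
    fp16=True,
    save_strategy="epoch",
    save_total_limit=2,
)
```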
Intended Use
This model is designed for:
✅ Information Extraction: Extract structured data from unstructured text
✅ Instruction Following: Following extraction instructions in natural language
✅ Lightweight Inference: Deploy in resource-constrained environments
✅ Fine-tuning Base: Use as a starting point for further domain-specific adaptation
How to Use
Option 1: Using LoRA Adapter (Recommended)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "your-hf-username/qwen3-0.6b-info-extractor",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-0.6B",
    trust_remote_code=True,
)

# Inference
messages = [
    {"role": "system", "content": "You are a strict information extractor. Extract all requested information from the text. Return JSON format."},
    {"role": "user", "content": "Extract the person's name and email from: John Smith works at john@company.com"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
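The decoded string contains the prompt as well as the assistant's reply. A minimal post-processing sketch, under the assumption that the model emits a single JSON object, is to grab the first {...} span and parse it:

```python
import json
import re

def extract_json(generated: str) -> dict:
    """Parse the first {...} span found in the model output as JSON."""
    match = re.search(r"\{.*\}", generated, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Hypothetical decoded output for demonstration
demo = 'assistant: {"name": "John Smith", "email": "john@company.com"}'
print(extract_json(demo)["email"])  # john@company.com
```

For production use, validating the parsed object against an expected schema (required keys, value types) is advisable, since a small model can emit malformed or incomplete JSON.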
Option 2: Using Merged Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model (no LoRA adapter required)
model = AutoModelForCausalLM.from_pretrained(
    "your-hf-username/qwen3-0.6b-info-extractor-merged",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "your-hf-username/qwen3-0.6b-info-extractor-merged",
    trust_remote_code=True,
)
```

Inference then works exactly as in Option 1.
Model Performance
Training Metrics
- Loss convergence: Training loss decreased steadily over the 3 epochs
- Training time: ~20-30 minutes on Kaggle GPU
- Memory usage: ~4GB with 4-bit quantization
Evaluation
Two samples from training data were tested during final validation. The model successfully extracted information following the instruction format.
Note: Full benchmark evaluation on a held-out test set is recommended for production use.
Limitations
- Training size: Fine-tuned on only 159 examples. Larger, more diverse datasets may improve generalization.
- Domain specificity: Optimized for the training data domain. Performance may vary on out-of-domain text.
- Model size: At 600M parameters, the model has reduced capability compared to larger models.
- Quantization: 4-bit quantization may slightly impact output quality compared to full precision.
- No extensive evaluation: Limited evaluation on held-out test set.
Ethical Considerations
This model inherits considerations from the base Qwen3-0.6B model:
- Bias: May contain biases from training data
- Misuse: Could be used for unauthorized data extraction
- Hallucination: May generate plausible-sounding but incorrect information
- Limitations: Should not be used for critical applications without human review
Environmental Impact
This lightweight 600M model has a minimal environmental footprint compared to larger models:
- 4-bit quantization reduces memory requirements
- LoRA fine-tuning is parameter-efficient
- Suitable for edge deployment and inference
Citation
```bibtex
@misc{qwen3-0.6b-info-extractor,
  title={Qwen3-0.6B Information Extractor},
  author={Your Name},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/your-username/qwen3-0.6b-info-extractor}}
}
```
Model Card Contact
For questions or issues, please open an issue on the model repository.
Last Updated: January 2026
Training Infrastructure: Kaggle (GPU T4/P100)
Developed with: Transformers, PEFT, TRL