|
|
--- |
|
|
library_name: peft |
|
|
license: apache-2.0 |
|
|
base_model: Qwen/Qwen2.5-1.5B-Instruct |
|
|
tags: |
|
|
- base_model:adapter:Qwen/Qwen2.5-1.5B-Instruct |
|
|
- llama-factory |
|
|
- lora |
|
|
- transformers |
|
|
- question-generation |
|
|
- education |
|
|
- secondary-school |
|
|
pipeline_tag: text-generation |
|
|
model-index: |
|
|
- name: question_generation_1.5B_model_v2 |
|
|
results: [] |
|
|
--- |
|
|
|
|
|
# Question Generation 1.5B Model v2 |
|
|
|
|
|
A fine-tuned language model specifically designed to generate high-quality English comprehension and assessment questions for secondary school students. This model is optimized to create questions aligned with standard educational curricula and learning objectives. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). It has been trained specifically on educational question generation tasks to produce contextually relevant, pedagogically sound questions suitable for secondary school assessment. |
|
|
|
|
|
### Key Features |
|
|
|
|
|
- **Lightweight and Efficient**: 1.5B parameters with LoRA adaptation for fast inference |
|
|
- **Education-Focused**: Trained on curated educational content |
|
|
- **Curriculum-Aligned**: Questions follow standard secondary school curricula and learning outcomes |
|
|
- **Question Variety**: Capable of generating multiple question types (multiple choice, short answer, essay prompts, etc.) |
|
|
- **Context-Aware**: Generates questions based on provided text passages or topics |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is intended for: |
|
|
- **Educational Content Creation**: Generating practice questions and assessments for secondary school students |
|
|
- **Curriculum Support**: Creating supplementary learning materials aligned with educational standards |
|
|
- **Assessment Design**: Assisting educators in developing comprehension questions and quiz content |
|
|
- **Language Learning**: Generating English language proficiency assessment questions |
|
|
|
|
|
### Limitations |
|
|
|
|
|
- Designed for English language question generation |
|
|
- Best performance on secondary school level content (ages 14-18) |
|
|
- May require post-processing or human review for use in high-stakes assessments |
|
|
- Performance may vary with non-English text inputs |
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was fine-tuned on a curated dataset of secondary school English curriculum materials and assessment question templates. Training data includes various question types aligned with standard educational frameworks. |
|
|
|
|
|
## Training Procedure |
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Learning Rate | 0.0005 | |
|
|
| Training Batch Size | 8 (gradient accumulation) | |
|
|
| Epochs | 10 | |
|
|
| Optimizer | AdamW (fused) | |
|
|
| LR Scheduler | Cosine with 0.1 warmup ratio | |
|
|
| Seed | 42 | |
|
|
| Training Precision | Native AMP (Mixed Precision) | |
|
|
|
|
|
### Training Performance |
|
|
|
|
|
The model achieved strong convergence with decreasing training loss across epochs: |
|
|
|
|
|
| Epoch | Step | Training Loss | |
|
|
|-------|------|---------------| |
|
|
| 1.1 | 100 | 0.6345 | |
|
|
| 2.3 | 200 | 0.4720 | |
|
|
| 3.4 | 300 | 0.3499 | |
|
|
| 4.5 | 400 | 0.2457 | |
|
|
| 5.7 | 500 | 0.1229 | |
|
|
| 6.8 | 600 | 0.0728 | |
|
|
| 8.0 | 700 | 0.0398 | |
|
|
| 9.1 | 800 | 0.0213 | |
|
|
|
|
|
The model demonstrates consistent improvement in question generation quality as training progresses, with training loss decreasing from 0.63 to 0.02. |
|
|
|
|
|
## Framework Versions |
|
|
|
|
|
- PEFT: 0.17.1 |
|
|
- Transformers: 4.57.1 |
|
|
- PyTorch: 2.9.0+cu126 |
|
|
- Datasets: 4.0.0 |
|
|
- Tokenizers: 0.22.1 |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from peft import AutoPeftModelForCausalLM |
|
|
from transformers import AutoTokenizer |
|
|
|
|
|
model_id = "tokhey/question_generation_1.5B_model_v2" |
|
|
model = AutoPeftModelForCausalLM.from_pretrained(model_id) |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
|
|
|
# Generate questions from a passage |
|
|
prompt = "Generate 3 comprehension questions about: [your text passage]" |
|
|
inputs = tokenizer(prompt, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_length=512) |
|
|
print(tokenizer.decode(outputs[0])) |
|
|
``` |
|
|
|
|
|
## Recommendations for Use |
|
|
|
|
|
- Test the model on sample content before deploying in production |
|
|
- Review generated questions for accuracy and appropriateness |
|
|
- Use as an assistive tool to reduce educator workload, not as a sole assessment creation method |
|
|
- Provide context and learning materials with generated questions for optimal student engagement |
|
|
|
|
|
## License |
|
|
|
|
|
Apache License 2.0 |
|
|
|
|
|
--- |
|
|
|
|
|
*This model card was automatically generated and updated. For questions or contributions, please reach out to the model developers.* |