---
license: apache-2.0
language:
- ar
- en
- af
- fr
- fa
- ff
- fi
- fj
- fo
- fy
- qu
- zh
- za
- zu
- he
- es
- el
- ro
- ta
- pa
- hi
- de
metrics:
- perplexity
- f1
- accuracy
base_model:
- google-bert/bert-base-uncased
---
# dytr - Dynamic Transformer Library
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![PyPI version](https://badge.fury.io/py/dytr.svg)](https://badge.fury.io/py/dytr)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/AAlsubari/dytr/blob/main/dytr_bert_finetune_demo.ipynb)
**Build dynamic transformers that learn multiple tasks, with support for loading and modifying pretrained models such as BERT.**
dytr is a flexible PyTorch library for multi-task learning with transformer architectures. Train multiple tasks sequentially or simultaneously while preserving performance on previous tasks through built-in continual learning techniques.
## Why dytr?
- 🎯 **Multi-Task Ready** - Train classification, generation, and sequence tasks in one model
- 🧠 **Never Forgets** - Built-in EWC and experience replay prevent catastrophic forgetting
- πŸ”§ **No Black Box** - Full control over architecture, understand every component
- ⚑ **Lightweight** - Pure PyTorch, minimal dependencies
- πŸ“¦ **Pretrained Support** - Load BERT, RoBERTa, and more as your encoder backbone and fine tune it on multiple tasks.
## Installation
```bash
pip install dytr
```
## Quick Start
```python
import pandas as pd

from dytr import (
    DynamicTransformer,
    ModelConfig,
    SingleDatasetProcessing,
    TaskConfig,
    Trainer,
    TrainingStrategy,
)

# 1. Configure your transformer
config = ModelConfig(
    embed_dim=256,
    num_layers=6,
    num_heads=8,
    max_seq_len=256,
)

# 2. Create the model
model = DynamicTransformer(config)

# 3. Load and process the data
train_data = pd.DataFrame({
    "text": ["Great movie!", "Terrible film.", "Amazing acting!", "Boring plot."],
    "label": [1, 0, 1, 0],
})
train_dataset = SingleDatasetProcessing(
    df=train_data,
    tokenizer=model.tokenizer,
    max_len=128,
    task_name="sentiment_analysis",
    strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    text_column="text",
    label_column="label",
)

# 4. Define the task
task = TaskConfig(
    task_name="sentiment_analysis",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2,  # number of distinct labels in train_data
)
# model.add_task(task) is not required: the task is added automatically during training.

# 5. Initialize the trainer and train
trainer = Trainer(model, config, exp_dir="./experiments")
train_datasets = {"sentiment_analysis": (train_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION)}
# For multi-task training, pass several tasks and datasets here.
# The empty dict means no validation datasets are used.
model = trainer.train([task], train_datasets, {})

# 6. Generate predictions
result = model.generate("This product is amazing!", task_name="sentiment_analysis")
print(f"Prediction: {result['prediction']}")

# 7. Save the entire multi-task model
model.save_model("multi_task_model.pt")

# 8. Load the model
loaded_model = DynamicTransformer.load_model("multi_task_model.pt")
```
## Core Capabilities
### Multiple Training Strategies
| Strategy | Purpose | Use Case |
|----------|---------|----------|
| **Causal LM** | Autoregressive text generation | Chatbots, content creation |
| **Seq2Seq** | Input to output transformation | Translation, summarization |
| **Sentence Classification** | Document-level categorization | Sentiment, topic detection |
| **Token Classification** | Token-level labeling | Named entity recognition, POS tagging |
### Continual Learning
Train tasks sequentially without losing previous knowledge:
```python
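from dytr import DynamicTransformer, ModelConfig, Trainer

config = ModelConfig(
    use_ewc=True,            # Protect important weights
    use_replay=True,         # Replay old samples
    use_task_adapters=True,  # Task-specific modules
    ewc_lambda=1000.0,
    replay_buffer_size=2000,
)
model = DynamicTransformer(config)
trainer = Trainer(model, config, exp_dir="./experiments")

# task_list, train_data, and val_data are placeholders: a list of TaskConfig
# objects and the dataset dicts, built as in the Quick Start.
# Train tasks one after another
for task in task_list:
    model.add_task(task)
    trainer.train([task], train_data, val_data)
    # Previous tasks remain accurate

# The trainer handles EWC and the replay buffer automatically, but you should
# add the samples to the pretrained model.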
```
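The README does not show how dytr implements EWC or replay internally, so the following is only a minimal sketch of the two ideas in plain Python. It assumes the standard quadratic EWC penalty, with `ewc_lambda` playing the role of the penalty weight, and it uses reservoir sampling as one possible way to keep a buffer bounded at `replay_buffer_size`. The names `ewc_penalty` and `ReplayBuffer` are hypothetical and are not part of the dytr API.

```python
import random


def ewc_penalty(params, anchor_params, fisher, ewc_lambda=1000.0):
    """Standard EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    total = 0.0
    for name, value in params.items():
        total += fisher[name] * (value - anchor_params[name]) ** 2
    return 0.5 * ewc_lambda * total


class ReplayBuffer:
    """Bounded buffer holding a uniform random sample of all items seen so far
    (reservoir sampling; an assumption, not necessarily what dytr does)."""

    def __init__(self, capacity=2000, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Keep the new sample with probability capacity / seen.
            slot = self._rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = sample

    def sample(self, k):
        """Draw up to k stored samples without replacement."""
        return self._rng.sample(self.items, min(k, len(self.items)))


# Toy scalars standing in for weight tensors.
anchor = {"w1": 1.0, "w2": -2.0}   # weights after the previous task
fisher = {"w1": 0.5, "w2": 0.01}   # importance estimate per weight
current = {"w1": 1.2, "w2": -1.0}  # weights while training the new task

penalty = ewc_penalty(current, anchor, fisher, ewc_lambda=1000.0)
print(f"EWC penalty: {penalty:.2f}")

# Store old-task samples so they can be mixed into later training batches.
buffer = ReplayBuffer(capacity=3)
for text in ["Great movie!", "Terrible film.", "Amazing acting!", "Boring plot."]:
    buffer.add(text)
print(buffer.sample(2))
```

In this toy case `w1` moves only a little but is penalized more than `w2`, because its importance estimate is fifty times larger. That asymmetry is what lets the shared weights keep changing where old tasks do not depend on them.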
### Pretrained Encoders
Load powerful encoders as your backbone and extend them with tasks:
```python
import pandas as pd

from dytr import (
    ModelConfig,
    PretrainedModelLoader,
    SingleDatasetProcessing,
    TaskConfig,
    Trainer,
    TrainingStrategy,
)

loader = PretrainedModelLoader()
config = ModelConfig(
    tokenizer_name="bert-base-uncased",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    special_tokens={},
    use_task_adapters=False,
    use_ewc=True,
    use_replay=True,
    use_rotary_embedding=False,
    training_from_scratch=False,
)

# Load pretrained BERT as your encoder
model = loader.load_pretrained("bert-base-uncased", config)

# Now add your own tasks - the model is fully dytr compatible
# Sentence classification task data
class_train = pd.DataFrame(
    {
        "text": [
            "Great product!",
            "Poor quality.",
            "Excellent service!",
            "Very disappointed.",
            "Highly recommended!",
        ],
        "label": [1, 0, 1, 0, 1],
    }
)
classification_task = TaskConfig(
    task_name="sentiment",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2,
    text_column="text",
    label_column="label",
    max_length=128,
)
class_dataset = SingleDatasetProcessing(
    df=class_train,
    tokenizer=model.tokenizer,
    max_len=classification_task.max_length,
    task_name=classification_task.task_name,
    strategy=classification_task.training_strategy,
    num_labels=classification_task.num_labels,
    text_column=classification_task.text_column,
    label_column=classification_task.label_column,
)

# Causal LM task data (text generation)
lm_train = pd.DataFrame(
    {
        "text": [
            "The sun rises in the east.",
            "Cats are adorable animals.",
            "Machine learning is fascinating.",
            "Python is a great programming language.",
            "Deep learning powers modern AI.",
        ]
    }
)
lm_task = TaskConfig(
    task_name="text_generation",
    training_strategy=TrainingStrategy.CAUSAL_LM,
    max_length=256,
)
lm_dataset = SingleDatasetProcessing(
    df=lm_train,
    tokenizer=model.tokenizer,
    max_len=lm_task.max_length,
    task_name=lm_task.task_name,
    strategy=lm_task.training_strategy,
    text_column="text",
)

train_datasets = {
    classification_task.task_name: (class_dataset, classification_task.training_strategy),
    lm_task.task_name: (lm_dataset, lm_task.training_strategy),
}
val_datasets = {
    # classification_task.task_name: (class_val_dataset, classification_task.training_strategy)
}

# Train the model on both tasks
print("\nTraining model...")
trainer = Trainer(model, config, exp_dir="./multi_task_experiments")
model = trainer.train([classification_task, lm_task], train_datasets, val_datasets)
# To train on a single task, pass a one-item list instead:
# model = trainer.train([lm_task], train_datasets, val_datasets)

# Test classification
test_texts = ["This is amazing!", "I hate this."]
for text in test_texts:
    result = model.generate(text, task_name="sentiment")
    sentiment = "POSITIVE" if result["prediction"] == 1 else "NEGATIVE"
    print(f"  {text} -> {sentiment}")

# Test generation
print("\nText generation test:")
prompt = "The future of technology"
generated = model.generate(prompt, task_name="text_generation", max_new_tokens=20)
print(f"  Prompt: {prompt}")
print(f"  Generated: {generated}")

# More tasks can be added in the same way, for example:
# model.add_task(sentiment_task)
# model.add_task(ner_task)
# model.add_task(translation_task)
# Train, generate, and use the result just like any dytr model.
```
### Task-Specific Learning Rates
Different components learn at different speeds:
```python
from dytr import ModelConfig

config = ModelConfig(
    learning_rate=3e-4,
    head_lr_mult=2.0,     # Task heads: fast adaptation
    decoder_lr_mult=0.5,  # Decoders: moderate
    shared_lr_mult=0.1,   # Shared encoder: preserve knowledge
)
```
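The README does not state exactly how these multipliers are applied. A natural reading, taken here as an assumption, is that each one scales the base `learning_rate` for its component. Under that reading, the arithmetic for the values above works out as follows:

```python
base_lr = 3e-4

# Multipliers copied from the config above, keyed by the component they apply to.
multipliers = {
    "task heads": 2.0,      # head_lr_mult
    "decoders": 0.5,        # decoder_lr_mult
    "shared encoder": 0.1,  # shared_lr_mult
}

# Assumption: effective rate = learning_rate * multiplier for that component.
effective_lr = {name: base_lr * mult for name, mult in multipliers.items()}

for name, lr in effective_lr.items():
    print(f"{name:>14}: {lr:.1e}")
```

With these settings the task heads would train at 6e-4, the decoders at 1.5e-4, and the shared encoder at 3e-5, so the shared encoder moves twenty times more slowly than the heads.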
## Architecture Overview
```
┌─────────────────────────────────────────┐
│           DynamicTransformer            │
├─────────────────────────────────────────┤
│   ┌─────────────────────────────────┐   │
│   │         Shared Encoder          │   │
│   │  (Pretrained or from scratch)   │   │
│   └─────────────────────────────────┘   │
│                    │                    │
│      ┌─────────────┼─────────────┐      │
│      ▼             ▼             ▼      │
│  ┌───────┐     ┌───────┐     ┌───────┐  │
│  │Task 1 │     │Task 2 │     │Task 3 │  │
│  │ Head  │     │ Head  │     │Decoder│  │
│  └───────┘     └───────┘     └───────┘  │
│      │             │             │      │
│      ▼             ▼             ▼      │
│Classification     NER       Generation  │
└─────────────────────────────────────────┘
```
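The routing in the diagram can be illustrated with a toy stand-in. This is a sketch in plain Python, not dytr's implementation: `ToyMultiTaskModel`, `toy_encoder`, and the two example heads are hypothetical, and only the method name `add_task` mirrors the README. It shows the pattern of one shared encoder whose output is passed to whichever task head is selected by name.

```python
from typing import Callable, Dict, List

Vector = List[float]


class ToyMultiTaskModel:
    """Shared encoder plus a registry of task-specific heads (illustration only)."""

    def __init__(self, encoder: Callable[[str], Vector]) -> None:
        self.encoder = encoder
        self.heads: Dict[str, Callable[[Vector], object]] = {}

    def add_task(self, task_name: str, head: Callable[[Vector], object]) -> None:
        if task_name in self.heads:
            raise ValueError(f"task {task_name!r} already exists")
        self.heads[task_name] = head

    def remove_task(self, task_name: str) -> None:
        # Removing a head leaves the encoder and the other heads untouched.
        del self.heads[task_name]

    def forward(self, text: str, task_name: str) -> object:
        features = self.encoder(text)            # shared computation
        return self.heads[task_name](features)   # task-specific computation


def toy_encoder(text: str) -> Vector:
    """Stand-in for a transformer encoder: [word count, total characters]."""
    words = text.split()
    return [float(len(words)), float(sum(len(w) for w in words))]


model = ToyMultiTaskModel(toy_encoder)
model.add_task("word_count", lambda features: int(features[0]))
model.add_task("is_long", lambda features: features[1] > 10)

sentence = "dynamic transformers share one encoder"
print(model.forward(sentence, "word_count"))
print(model.forward(sentence, "is_long"))
```

Because the heads live in a dictionary keyed by task name, adding or removing a task does not touch the encoder. The same property is what makes it possible to extend a model without retraining it from scratch.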
## Who Should Use dytr?
| Audience | Why It Matters |
|----------|----------------|
| **Researchers** | Customize every aspect of the transformer architecture; test continual learning algorithms with EWC and experience replay; experiment with multi-task architectures, task-specific learning rates, and adapters; analyze forgetting behavior across sequential tasks |
| **Developers** | Add new tasks without retraining from scratch; load pretrained models and extend them with your own tasks; build production-ready multi-task systems without complex dependencies |
| **Students** | Understand transformers from scratch with transparent, readable code; visualize the impact of hyperparameters on model size; learn multi-task learning concepts hands-on |
| **Organizations** | Deploy single models that handle multiple tasks efficiently; deploy lighter, faster inference systems; maintain knowledge across task updates with continual learning |
## Key Differentiators
- **Full Transparency** - No hidden complexity, understand every component
- **Continual Learning First** - Built from the ground up for sequential task learning
- **Truly Dynamic** - Add or remove tasks without retraining from scratch
- **Pure PyTorch** - No heavy dependencies, easy to customize
## Requirements
- Python 3.8+
- PyTorch 1.10+
- NumPy
- pandas
- scikit-learn
- tqdm
- requests
## Documentation
- **ModelConfig**: Architecture, training, and continual learning parameters
- **TaskConfig**: Dataset configuration, column mapping, task-specific settings
- **TrainingStrategy**: Causal LM, Seq2Seq, Sentence Classification, Token Classification
- **PretrainedModelLoader**: Load BERT, RoBERTa, DistilBERT, ALBERT as encoders
## License
Apache License 2.0
## Project Links
- [PyPI Page](https://pypi.org/project/dytr/)
- [GitHub Repository](https://github.com/AAlsubari/dytr)
## Author
**Dr. Akram Alsubari**
- akram.alsubari@outlook.com
- akram.alsubari87@gmail.com
## Contributing
Contributions are welcome! Open issues or share your use cases.
## Support and Contact
For questions, issues, or suggestions:
- 📧 **Email**: akram.alsubari@outlook.com
- 🔗 **LinkedIn**: [https://www.linkedin.com/in/akram-alsubari/](https://www.linkedin.com/in/akram-alsubari/)
- 📱 **Connect**: Feel free to reach out for collaborations, research discussions, or feedback
- 🎓 **Research Interests**: Natural Language Processing, Deep Learning, Transformers, Continual Learning, Multi-Task Learning, Large Language Models
---
**Build once. Learn multiple tasks. Never forget.**