| --- |
| license: apache-2.0 |
| language: |
| - ar |
| - en |
| - af |
| - fr |
| - fa |
| - ff |
| - fi |
| - fj |
| - fo |
| - fy |
| - qu |
| - zh |
| - za |
| - zu |
| - he |
| - es |
| - el |
| - ro |
| - ta |
| - pa |
| - hi |
| - de |
| metrics: |
| - perplexity |
| - f1 |
| - accuracy |
| base_model: |
| - google-bert/bert-base-uncased |
| --- |
| # dytr - Dynamic Transformer Library |
|
|
| [Python](https://www.python.org/downloads/) |
| [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) |
| [PyPI package](https://badge.fury.io/py/dytr) |
| [BERT fine-tuning demo notebook](https://github.com/AAlsubari/dytr/blob/main/dytr_bert_finetune_demo.ipynb) |
|
|
| **Build dynamic transformers that learn multiple tasks, with support for loading and modifying pretrained models such as BERT.** |
|
|
| dytr is a flexible PyTorch library for multi-task learning with transformer architectures. Train multiple tasks sequentially or simultaneously while preserving performance on previous tasks through built-in continual learning techniques. |
|
|
| ## Why dytr? |
|
|
| - 🎯 **Multi-Task Ready** - Train classification, generation, and sequence tasks in one model |
| - 🧠 **Never Forgets** - Built-in EWC and experience replay prevent catastrophic forgetting |
| - 🔧 **No Black Box** - Full control over the architecture; understand every component |
| - ⚡ **Lightweight** - Pure PyTorch, minimal dependencies |
| - 📦 **Pretrained Support** - Load BERT, RoBERTa, and more as your encoder backbone and fine-tune it on multiple tasks |
|
|
| ## Installation |
|
|
| ```bash |
| pip install dytr |
| ``` |
|
|
| ## Quick Start |
|
|
| ```python |
| import pandas as pd |
| |
| from dytr import ( |
|     DynamicTransformer, |
|     ModelConfig, |
|     SingleDatasetProcessing, |
|     TaskConfig, |
|     Trainer, |
|     TrainingStrategy, |
| ) |
| |
| # 1. Configure your transformer |
| config = ModelConfig( |
|     embed_dim=256, |
|     num_layers=6, |
|     num_heads=8, |
|     max_seq_len=256, |
| ) |
| |
| # 2. Create the model |
| model = DynamicTransformer(config) |
| |
| # 3. Load and process the training data |
| train_data = pd.DataFrame({ |
|     'text': ['Great movie!', 'Terrible film.', 'Amazing acting!', 'Boring plot.'], |
|     'label': [1, 0, 1, 0], |
| }) |
| train_dataset = SingleDatasetProcessing( |
|     df=train_data, |
|     tokenizer=model.tokenizer, |
|     max_len=128, |
|     task_name="sentiment_analysis", |
|     strategy=TrainingStrategy.SENTENCE_CLASSIFICATION, |
|     text_column="text", |
|     label_column="label", |
| ) |
| |
| # 4. Define the task |
| task = TaskConfig( |
|     task_name="sentiment_analysis", |
|     training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION, |
|     num_labels=2,  # number of distinct labels in train_data |
| ) |
| # model.add_task(task)  # optional: the task is added automatically during training |
| |
| # 5. Initialize the trainer and train |
| trainer = Trainer(model, config, exp_dir="./experiments") |
| train_datasets = {"sentiment_analysis": (train_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION)} |
| # For multi-task training, pass several tasks and add one dataset entry per task |
| model = trainer.train([task], train_datasets, {}) |
| |
| # 6. Generate predictions |
| result = model.generate("This product is amazing!", task_name="sentiment_analysis") |
| print(f"Prediction: {result['prediction']}") |
| |
| # 7. Save the entire multi-task model, then load it back |
| model.save_model("multi_task_model.pt") |
| loaded_model = DynamicTransformer.load_model("multi_task_model.pt") |
| ``` |
|
|
| ## Core Capabilities |
|
|
| ### Multiple Training Strategies |
|
|
| | Strategy | Purpose | Use Case | |
| |----------|---------|----------| |
| | **Causal LM** | Autoregressive text generation | Chatbots, content creation | |
| | **Seq2Seq** | Input to output transformation | Translation, summarization | |
| | **Sentence Classification** | Document-level categorization | Sentiment, topic detection | |
| | **Token Classification** | Token-level labeling | Named entity recognition, POS tagging | |
|
|
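One way to read the table is by the kind of target each strategy pairs with an input. The sketch below illustrates this with plain lists of made-up token ids. The function names are hypothetical and are not part of dytr's API; the library's own dataset classes may structure their examples differently.

```python
from typing import Dict, List


def causal_lm_pair(token_ids: List[int]) -> Dict[str, List[int]]:
    """Causal LM: predict each next token, so targets are inputs shifted by one."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a next-token pair")
    return {"inputs": token_ids[:-1], "targets": token_ids[1:]}


def seq2seq_pair(source_ids: List[int], target_ids: List[int]) -> Dict[str, List[int]]:
    """Seq2Seq: a separate target sequence, which may differ in length."""
    return {"inputs": source_ids, "targets": target_ids}


def sentence_classification_pair(token_ids: List[int], label: int) -> Dict[str, List[int]]:
    """Sentence classification: one label for the whole input."""
    return {"inputs": token_ids, "targets": [label]}


def token_classification_pair(token_ids: List[int], labels: List[int]) -> Dict[str, List[int]]:
    """Token classification: exactly one label per input token."""
    if len(token_ids) != len(labels):
        raise ValueError("token classification needs one label per token")
    return {"inputs": token_ids, "targets": labels}


tokens = [11, 12, 13, 14]  # made-up token ids
print(causal_lm_pair(tokens))
print(sentence_classification_pair(tokens, label=1))
print(token_classification_pair(tokens, labels=[0, 3, 3, 0]))
```

The practical difference is the target shape: one label per text, one label per token, or a full token sequence.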
| ### Continual Learning |
|
|
| Train tasks sequentially without losing previous knowledge: |
|
|
| ```python |
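| from dytr import DynamicTransformer, ModelConfig, Trainer |
| |
| config = ModelConfig( |
|     use_ewc=True,            # Protect important weights |
|     use_replay=True,         # Replay old samples |
|     use_task_adapters=True,  # Task-specific modules |
|     ewc_lambda=1000.0, |
|     replay_buffer_size=2000, |
| ) |
| |
| model = DynamicTransformer(config) |
| trainer = Trainer(model, config, exp_dir="./experiments") |
| |
| # task_list, train_data, and val_data are placeholders for your own |
| # TaskConfig objects and dataset dictionaries (see Quick Start). |
| # Train tasks one after another; previous tasks should remain accurate. |
| for task in task_list: |
|     model.add_task(task) |
|     trainer.train([task], train_data, val_data) |
| |
| # The trainer handles EWC and the replay buffer automatically, |
| # but you still need to add the samples to the pretrained model. |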
| ``` |
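To make the role of `ewc_lambda` concrete, here is a minimal, framework-free sketch of the standard EWC penalty: a quadratic term that pulls each parameter toward its value after the previous task, weighted by an importance estimate (the diagonal Fisher information). It treats every parameter as a single float for readability. The function name and the example values are assumptions for illustration; dytr's internal formulation may differ.

```python
from typing import Dict


def ewc_penalty(
    params: Dict[str, float],
    anchor_params: Dict[str, float],
    fisher: Dict[str, float],
    ewc_lambda: float,
) -> float:
    """Return (lambda / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2.

    params:        current parameter values
    anchor_params: values saved after training the previous task
    fisher:        per-parameter importance (diagonal Fisher estimate)
    """
    total = 0.0
    for name, value in params.items():
        total += fisher[name] * (value - anchor_params[name]) ** 2
    return 0.5 * ewc_lambda * total


# Hypothetical values: "w" mattered for the old task, "b" barely did.
anchor = {"w": 1.0, "b": 0.0}
fisher = {"w": 2.0, "b": 0.1}
current = {"w": 1.5, "b": 0.0}

penalty = ewc_penalty(current, anchor, fisher, ewc_lambda=1000.0)
print(penalty)  # 250.0
```

This penalty is added to the new task's loss, so a larger `ewc_lambda` makes it more expensive to move weights that were important for earlier tasks.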
|
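Experience replay keeps a bounded sample of earlier training examples and mixes them into later training. The sketch below shows one common way to fill a fixed-size buffer, reservoir sampling, so that every example seen so far has the same chance of being kept. The `ReplayBuffer` class here is a hypothetical illustration, not dytr's implementation; its `capacity` plays the role that `replay_buffer_size` plays in `ModelConfig`.

```python
import random
from typing import Any, List


class ReplayBuffer:
    """Fixed-size buffer filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.samples: List[Any] = []
        self.seen = 0  # total number of examples offered so far
        self._rng = random.Random(seed)

    def add(self, sample: Any) -> None:
        """Offer one example; it is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
            return
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.samples[slot] = sample

    def sample(self, k: int) -> List[Any]:
        """Draw up to k stored examples without replacement."""
        return self._rng.sample(self.samples, min(k, len(self.samples)))


buffer = ReplayBuffer(capacity=3)
for i in range(10):
    buffer.add({"text": f"old example {i}", "label": i % 2})

print(len(buffer.samples), buffer.seen)
print(buffer.sample(2))
```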
|
| ### Pretrained Encoders |
|
|
| Load powerful encoders as your backbone and extend them with tasks: |
|
|
| ```python |
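| import pandas as pd |
| |
| from dytr import ( |
|     ModelConfig, |
|     PretrainedModelLoader, |
|     SingleDatasetProcessing, |
|     TaskConfig, |
|     Trainer, |
|     TrainingStrategy, |
| ) |
| |
| loader = PretrainedModelLoader() |
| config = ModelConfig( |
|     tokenizer_name="bert-base-uncased", |
|     per_device_train_batch_size=32, |
|     per_device_eval_batch_size=8, |
|     num_train_epochs=3, |
|     special_tokens={}, |
|     use_task_adapters=False, |
|     use_ewc=True, |
|     use_replay=True, |
|     use_rotary_embedding=False, |
|     training_from_scratch=False, |
| ) |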
| |
| # Load pretrained BERT as your encoder |
| model = loader.load_pretrained('bert-base-uncased', config) |
| |
| # Now add your own tasks - the model is fully dytr compatible |
| |
| # Sentence classification task data |
| class_train = pd.DataFrame( |
|     { |
|         "text": [ |
|             "Great product!", |
|             "Poor quality.", |
|             "Excellent service!", |
|             "Very disappointed.", |
|             "Highly recommended!", |
|         ], |
|         "label": [1, 0, 1, 0, 1], |
|     } |
| ) |
| classification_task = TaskConfig( |
|     task_name="sentiment", |
|     training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION, |
|     num_labels=2, |
|     text_column="text", |
|     label_column="label", |
|     max_length=128, |
| ) |
| class_dataset = SingleDatasetProcessing( |
|     df=class_train, |
|     tokenizer=model.tokenizer, |
|     max_len=classification_task.max_length, |
|     task_name=classification_task.task_name, |
|     strategy=classification_task.training_strategy, |
|     num_labels=classification_task.num_labels, |
|     text_column=classification_task.text_column, |
|     label_column=classification_task.label_column, |
| ) |
| |
| # Causal LM task data (text generation) |
| lm_train = pd.DataFrame( |
|     { |
|         "text": [ |
|             "The sun rises in the east.", |
|             "Cats are adorable animals.", |
|             "Machine learning is fascinating.", |
|             "Python is a great programming language.", |
|             "Deep learning powers modern AI.", |
|         ] |
|     } |
| ) |
| lm_task = TaskConfig( |
|     task_name="text_generation", |
|     training_strategy=TrainingStrategy.CAUSAL_LM, |
|     max_length=256, |
| ) |
| lm_dataset = SingleDatasetProcessing( |
|     df=lm_train, |
|     tokenizer=model.tokenizer, |
|     max_len=lm_task.max_length, |
|     task_name=lm_task.task_name, |
|     strategy=lm_task.training_strategy, |
|     text_column="text", |
| ) |
| |
| train_datasets = { |
|     classification_task.task_name: (class_dataset, classification_task.training_strategy), |
|     lm_task.task_name: (lm_dataset, lm_task.training_strategy), |
| } |
| |
| val_datasets = { |
|     # classification_task.task_name: (class_val_dataset, classification_task.training_strategy), |
| } |
| |
| # Train on both tasks |
| print("\nTraining model...") |
| trainer = Trainer(model, config, exp_dir="./multi_task_experiments") |
| model = trainer.train([classification_task, lm_task], train_datasets, val_datasets) |
| |
| # Test classification |
| print("\nSentiment classification test:") |
| test_texts = ["This is amazing!", "I hate this."] |
| for text in test_texts: |
|     result = model.generate(text, task_name="sentiment") |
|     sentiment = "POSITIVE" if result["prediction"] == 1 else "NEGATIVE" |
|     print(f"  {text} -> {sentiment}") |
| |
| # Test generation |
| print("\nText generation test:") |
| prompt = "The future of technology" |
| generated = model.generate(prompt, task_name="text_generation", max_new_tokens=20) |
| print(f"  Prompt: {prompt}") |
| print(f"  Generated: {generated}") |
| |
| # More tasks can be registered later in the same way, for example: |
| # model.add_task(ner_task) |
| # model.add_task(translation_task) |
| # Then train, generate, and use the model just like any other dytr model. |
| ``` |
|
|
| ### Task-Specific Learning Rates |
|
|
| Different components learn at different speeds: |
|
|
| ```python |
| config = ModelConfig( |
|     learning_rate=3e-4, |
|     head_lr_mult=2.0,     # Task heads: fast adaptation |
|     decoder_lr_mult=0.5,  # Decoders: moderate |
|     shared_lr_mult=0.1,   # Shared encoder: preserve knowledge |
| ) |
| ``` |
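With the values above, the effective learning rates work out to roughly 6e-4 for task heads, 1.5e-4 for decoders, and 3e-5 for the shared encoder. The sketch below shows one plausible way such multipliers translate into per-component optimizer settings. The helper `build_param_groups` and the component names are assumptions for illustration, not dytr's internals; the `params` and `lr` keys follow the per-parameter-group layout that PyTorch optimizers accept.

```python
from typing import Dict, List


def build_param_groups(
    base_lr: float,
    multipliers: Dict[str, float],
    params_by_component: Dict[str, List[str]],
) -> Dict[str, dict]:
    """Map each component to a group with its own scaled learning rate.

    Components without an explicit multiplier fall back to the base rate.
    """
    groups = {}
    for component, params in params_by_component.items():
        scale = multipliers.get(component, 1.0)
        groups[component] = {"params": params, "lr": base_lr * scale}
    return groups


# Parameter names are placeholders; in PyTorch these would be tensors.
groups = build_param_groups(
    base_lr=3e-4,
    multipliers={"head": 2.0, "decoder": 0.5, "shared": 0.1},
    params_by_component={
        "head": ["head.weight"],
        "decoder": ["decoder.weight"],
        "shared": ["encoder.weight"],
    },
)

for component, group in groups.items():
    print(f"{component}: lr={group['lr']:.1e}")
```

Passing `list(groups.values())` with real parameter tensors to an optimizer such as `torch.optim.AdamW` would apply these rates per group.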
|
|
| ## Architecture Overview |
|
|
| ``` |
| ┌─────────────────────────────────────────┐ |
| │           DynamicTransformer            │ |
| ├─────────────────────────────────────────┤ |
| │  ┌─────────────────────────────────┐    │ |
| │  │         Shared Encoder          │    │ |
| │  │  (Pretrained or from scratch)   │    │ |
| │  └─────────────────────────────────┘    │ |
| │                   │                     │ |
| │     ┌─────────────┼─────────────┐       │ |
| │     ▼             ▼             ▼       │ |
| │ ┌───────┐     ┌───────┐     ┌───────┐   │ |
| │ │Task 1 │     │Task 2 │     │Task 3 │   │ |
| │ │ Head  │     │ Head  │     │Decoder│   │ |
| │ └───────┘     └───────┘     └───────┘   │ |
| │     │             │             │       │ |
| │     ▼             ▼             ▼       │ |
| │Classification    NER        Generation  │ |
| └─────────────────────────────────────────┘ |
| ``` |
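The diagram can be read as one shared feature extractor plus a lookup of task-specific outputs keyed by task name. The toy sketch below captures only that routing idea in plain Python; the class, the stand-in encoder, and the heads are all hypothetical and far simpler than dytr's actual modules.

```python
from typing import Any, Callable, Dict, List

Features = List[float]


class ToyMultiTaskModel:
    """One shared encoder, many task heads selected by name."""

    def __init__(self, encoder: Callable[[str], Features]) -> None:
        self.encoder = encoder
        self.heads: Dict[str, Callable[[Features], Any]] = {}

    def add_task(self, task_name: str, head: Callable[[Features], Any]) -> None:
        if task_name in self.heads:
            raise ValueError(f"task {task_name!r} already exists")
        self.heads[task_name] = head

    def remove_task(self, task_name: str) -> None:
        del self.heads[task_name]

    def forward(self, text: str, task_name: str) -> Any:
        if task_name not in self.heads:
            raise KeyError(f"unknown task {task_name!r}")
        features = self.encoder(text)  # shared by every task
        return self.heads[task_name](features)  # task-specific output


def toy_encoder(text: str) -> Features:
    # Stand-in for a transformer encoder: [length, number of '!'].
    return [float(len(text)), float(text.count("!"))]


model = ToyMultiTaskModel(toy_encoder)
model.add_task("sentiment", lambda f: 1 if f[1] > 0 else 0)
model.add_task("length", lambda f: int(f[0]))

print(model.forward("Great movie!", task_name="sentiment"))  # 1
print(model.forward("Great movie!", task_name="length"))     # 12
```

Adding or removing a task only touches the `heads` mapping, which is why the shared encoder does not need to be rebuilt when the task set changes.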
|
|
|
|
|
|
| ## Who Should Use dytr? |
|
|
| | Audience | Why It Matters | |
| |----------|----------------| |
| | **Researchers** | Customize every aspect of the transformer architecture; test continual learning algorithms with EWC and experience replay; experiment with multi-task architectures, task-specific learning rates, and adapters; analyze forgetting behavior across sequential tasks | |
| | **Developers** | Add new tasks without retraining from scratch; load pretrained models and extend them with your own tasks; build production-ready multi-task systems without complex dependencies | |
| | **Students** | Understand transformers from scratch with transparent, readable code; visualize the impact of hyperparameters on model size; learn multi-task learning concepts hands-on | |
| | **Organizations** | Deploy a single model that handles multiple tasks efficiently; run lighter, faster inference systems; maintain knowledge across task updates with continual learning | |
|
|
| ## Key Differentiators |
|
|
| - **Full Transparency** - No hidden complexity, understand every component |
| - **Continual Learning First** - Built from the ground up for sequential task learning |
| - **Truly Dynamic** - Add or remove tasks without retraining from scratch |
| - **Pure PyTorch** - No heavy dependencies, easy to customize |
|
|
| ## Requirements |
|
|
| - Python 3.8+ |
| - PyTorch 1.10+ |
| - NumPy |
| - pandas |
| - scikit-learn |
| - tqdm |
| - requests |
|
|
| ## Documentation |
|
|
| - **ModelConfig**: Architecture, training, and continual learning parameters |
| - **TaskConfig**: Dataset configuration, column mapping, task-specific settings |
| - **TrainingStrategy**: Causal LM, Seq2Seq, Sentence Classification, Token Classification |
| - **PretrainedModelLoader**: Load BERT, RoBERTa, DistilBERT, ALBERT as encoders |
|
|
| ## License |
|
|
| Apache License 2.0 |
|
|
| ## Project Links |
| - [PyPI Page](https://pypi.org/project/dytr/) |
| - [GitHub Repository](https://github.com/AAlsubari/dytr) |
|
|
| ## Author |
|
|
| **Dr. Akram Alsubari** |
| - akram.alsubari@outlook.com |
| - akram.alsubari87@gmail.com |
|
|
| ## Contributing |
|
|
| Contributions are welcome! Open issues or share your use cases. |
| ## Support and Contact |
| For questions, issues, or suggestions: |
| - 📧 **Email**: akram.alsubari@outlook.com |
| - **LinkedIn**: [https://www.linkedin.com/in/akram-alsubari/](https://www.linkedin.com/in/akram-alsubari/) |
| - 📱 **Connect**: Feel free to reach out for collaborations, research discussions, or feedback |
| - **Research Interests**: Natural Language Processing, Deep Learning, Transformers, Continual Learning, Multi-Task Learning, Large Language Models |
| --- |
|
|
| **Build once. Learn multiple tasks. Never forget.** |