---
license: apache-2.0
language:
- ar
- en
- af
- fr
- fa
- ff
- fi
- fj
- fo
- fy
- qu
- zh
- za
- zu
- he
- es
- el
- ro
- ta
- pa
- hi
- de
metrics:
- perplexity
- f1
- accuracy
base_model:
- google-bert/bert-base-uncased
---

# dytr - Dynamic Transformer Library

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![PyPI version](https://badge.fury.io/py/dytr.svg)](https://badge.fury.io/py/dytr)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/AAlsubari/dytr/blob/main/dytr_bert_finetune_demo.ipynb)

**Build dynamic transformers that learn multiple tasks, with support for loading and modifying pretrained models such as BERT.**

dytr is a flexible PyTorch library for multi-task learning with transformer architectures. Train multiple tasks sequentially or simultaneously while preserving performance on previous tasks through built-in continual learning techniques.

## Why dytr?

- 🎯 **Multi-Task Ready** - Train classification, generation, and sequence tasks in one model
- 🧠 **Never Forgets** - Built-in EWC and experience replay prevent catastrophic forgetting
- πŸ”§ **No Black Box** - Full control over the architecture; understand every component
- ⚑ **Lightweight** - Pure PyTorch, minimal dependencies
- πŸ“¦ **Pretrained Support** - Load BERT, RoBERTa, and more as your encoder backbone and fine-tune them on multiple tasks

## Installation

```bash
pip install dytr
```

## Quick Start

```python
from dytr import (
    DynamicTransformer,
    ModelConfig,
    TaskConfig,
    TrainingStrategy,
    Trainer,
    SingleDatasetProcessing,
)
import pandas as pd

# 1. Configure your transformer
config = ModelConfig(
    embed_dim=256,
    num_layers=6,
    num_heads=8,
    max_seq_len=256
)

# 2. Create the model
model = DynamicTransformer(config)

# Load and process the training data
train_data = pd.DataFrame({
    'text': ['Great movie!', 'Terrible film.', 'Amazing acting!', 'Boring plot.'],
    'label': [1, 0, 1, 0]
})

train_dataset = SingleDatasetProcessing(
    df=train_data,
    tokenizer=model.tokenizer,
    max_len=128,
    task_name="sentiment_analysis",
    strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    text_column="text",
    label_column="label"
)

# 3. Define a task
task = TaskConfig(
    task_name="sentiment_analysis",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2
)
# Calling model.add_task(task) is not required; tasks are added automatically during training

# Initialize the trainer and train
trainer = Trainer(model, config, exp_dir="./experiments")
train_datasets = {"sentiment_analysis": (train_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION)}
model = trainer.train([task], train_datasets, {})  # pass lists of tasks/datasets for multi-task training
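# Several tasks can be trained in one call by passing more TaskConfigs and
# datasets. Sketch only -- "topic_task" and "topic_dataset" are hypothetical
# and would be built the same way as the sentiment task/dataset above:
#
# model = trainer.train(
#     [task, topic_task],
#     {
#         "sentiment_analysis": (train_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION),
#         "topic": (topic_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION),
#     },
#     {},  # validation datasets (optional)
# )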
# 4. Generate predictions
result = model.generate("This product is amazing!", task_name="sentiment_analysis")
print(f"Prediction: {result['prediction']}")

# Save the entire multi-task model
model.save_model("multi_task_model.pt")

# Load the model
loaded_model = DynamicTransformer.load_model("multi_task_model.pt")
```

## Core Capabilities

### Multiple Training Strategies

| Strategy | Purpose | Use Case |
|----------|---------|----------|
| **Causal LM** | Autoregressive text generation | Chatbots, content creation |
| **Seq2Seq** | Input-to-output transformation | Translation, summarization |
| **Sentence Classification** | Document-level categorization | Sentiment, topic detection |
| **Token Classification** | Token-level labeling | Named entity recognition, POS tagging |

### Continual Learning

Train tasks sequentially without losing previous knowledge:

```python
config = ModelConfig(
    use_ewc=True,            # Protect important weights
    use_replay=True,         # Replay old samples
    use_task_adapters=True,  # Task-specific modules
    ewc_lambda=1000.0,
    replay_buffer_size=2000
)
model = DynamicTransformer(config)

# Train tasks one after another
for task in task_list:
    model.add_task(task)
    trainer.train([task], train_data, val_data)
    # Previous tasks remain accurate

# The trainer handles EWC and the replay buffer automatically,
# but when starting from a pretrained model you must add its replay samples yourself
```

### Pretrained Encoders

Load powerful encoders as your backbone and extend them with tasks:

```python
from dytr import PretrainedModelLoader

loader = PretrainedModelLoader()
config = ModelConfig(
    tokenizer_name='bert-base-uncased',
    per_device_train_batch_size=32,
    num_train_epochs=3,
    per_device_eval_batch_size=8,
    special_tokens={},
    use_task_adapters=False,
    use_ewc=True,
    use_replay=True,
    use_rotary_embedding=False,
    training_from_scratch=False
)

# Load pretrained BERT as your encoder
model = loader.load_pretrained('bert-base-uncased', config)

# Now add your own tasks - the model is fully dytr compatible

# Sentence classification task data
class_train = pd.DataFrame(
    {
        "text": [
            "Great product!",
            "Poor quality.",
            "Excellent service!",
            "Very disappointed.",
            "Highly recommended!",
        ],
        "label": [1, 0, 1, 0, 1],
    }
)

classification_task = TaskConfig(
    task_name="sentiment",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2,
    text_column="text",
    label_column="label",
    max_length=128,
)

class_dataset = SingleDatasetProcessing(
    df=class_train,
    tokenizer=model.tokenizer,
    max_len=classification_task.max_length,
    task_name=classification_task.task_name,
    strategy=classification_task.training_strategy,
    num_labels=classification_task.num_labels,
    text_column=classification_task.text_column,
    label_column=classification_task.label_column,
)

# Causal LM task data (text generation)
lm_train = pd.DataFrame(
    {
        "text": [
            "The sun rises in the east.",
            "Cats are adorable animals.",
            "Machine learning is fascinating.",
            "Python is a great programming language.",
            "Deep learning powers modern AI.",
        ]
    }
)

lm_task = TaskConfig(
    task_name="text_generation",
    training_strategy=TrainingStrategy.CAUSAL_LM,
    max_length=256,
)

lm_dataset = SingleDatasetProcessing(
    df=lm_train,
    tokenizer=model.tokenizer,
    max_len=lm_task.max_length,
    task_name=lm_task.task_name,
    strategy=lm_task.training_strategy,
    text_column="text",
)

train_datasets = {
    classification_task.task_name: (class_dataset, classification_task.training_strategy),
    lm_task.task_name: (lm_dataset, lm_task.training_strategy),
}
val_datasets = {
    # classification_task.task_name: (class_val_dataset, classification_task.training_strategy)
}

# Train the model
print("\nTraining model...")
trainer = Trainer(model, config, exp_dir="./multi_task_experiments")
model = trainer.train([classification_task, lm_task], train_datasets, val_datasets)

# Test classification
test_texts = ["This is amazing!", "I hate this."]
for text in test_texts:
    result = model.generate(text, task_name="sentiment")
    sentiment = "POSITIVE" if result["prediction"] == 1 else "NEGATIVE"
    print(f"  {text} -> {sentiment}")

# Test generation
print("\nText generation test:")
prompt = "The future of technology"
generated = model.generate(prompt, task_name="text_generation", max_new_tokens=20)
print(f"  Prompt: {prompt}")
print(f"  Generated: {generated}")

# Add further tasks the same way:
# model.add_task(sentiment_task)
# model.add_task(ner_task)
# model.add_task(translation_task)
# Train, generate, and use just like any dytr model
```

### Task-Specific Learning Rates

Different components learn at different speeds:

```python
config = ModelConfig(
    learning_rate=3e-4,
    head_lr_mult=2.0,     # Task heads: fast adaptation
    decoder_lr_mult=0.5,  # Decoders: moderate
    shared_lr_mult=0.1    # Shared encoder: preserve knowledge
)
```

## Architecture Overview

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            DynamicTransformer            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚          Shared Encoder           β”‚   β”‚
β”‚  β”‚   (Pretrained or from scratch)    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                    β”‚                     β”‚
β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚
β”‚      β–Ό             β–Ό             β–Ό       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚Task 1 β”‚     β”‚Task 2 β”‚     β”‚Task 3 β”‚   β”‚
β”‚  β”‚ Head  β”‚     β”‚ Head  β”‚     β”‚Decoderβ”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚      β”‚             β”‚             β”‚       β”‚
β”‚      β–Ό             β–Ό             β–Ό       β”‚
β”‚Classification     NER       Generation   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Who Should Use dytr?

| Audience | Why It Matters |
|----------|----------------|
| **Researchers** | Customize every aspect of the transformer architecture; test continual learning algorithms with EWC and experience replay; experiment with multi-task architectures, task-specific learning rates, and adapters; analyze forgetting behavior across sequential tasks |
| **Developers** | Add new tasks without retraining from scratch; load pretrained models and extend them with your own tasks; build production-ready multi-task systems without complex dependencies |
| **Students** | Understand transformers from scratch with transparent, readable code; visualize the impact of hyperparameters on model size; learn multi-task learning concepts hands-on |
| **Organizations** | Deploy single models that handle multiple tasks efficiently; run lighter, faster inference systems; maintain knowledge across task updates with continual learning |

## Key Differentiators

- **Full Transparency** - No hidden complexity; understand every component
- **Continual Learning First** - Built from the ground up for sequential task learning
- **Truly Dynamic** - Add or remove tasks without retraining from scratch
- **Pure PyTorch** - No heavy dependencies, easy to customize

## Requirements

- Python 3.8+
- PyTorch 1.10+
- NumPy
- pandas
- scikit-learn
- tqdm
- requests

## Documentation

- **ModelConfig**: Architecture, training, and continual learning parameters
- **TaskConfig**: Dataset configuration, column mapping, task-specific settings
- **TrainingStrategy**: Causal LM, Seq2Seq, Sentence Classification, Token Classification
- **PretrainedModelLoader**: Load BERT, RoBERTa, DistilBERT, ALBERT as encoders

## License

Apache License 2.0

## Project Links

- [PyPI Page](https://pypi.org/project/dytr/)
- [GitHub Repository](https://github.com/AAlsubari/dytr)

## Author

**Dr. Akram Alsubari**
- akram.alsubari@outlook.com
- akram.alsubari87@gmail.com

## Contributing

Contributions are welcome! Open issues or share your use cases.

## Support and Contact

For questions, issues, or suggestions:

- πŸ“§ **Email**: akram.alsubari@outlook.com
- πŸ”— **LinkedIn**: [https://www.linkedin.com/in/akram-alsubari/](https://www.linkedin.com/in/akram-alsubari/)
- πŸ“± **Connect**: Feel free to reach out for collaborations, research discussions, or feedback
- πŸŽ“ **Research Interests**: Natural Language Processing, Deep Learning, Transformers, Continual Learning, Multi-Task Learning, Large Language Models

---

**Build once. Learn multiple tasks. Never forget.**