torres-llm
This model's output is absolute dogshit, but I created it as an exercise for others to learn how to build a language model from PyTorch modules and how to integrate that model into the HuggingFace ecosystem. You can see the Google Colab over here.
Learnings from this exercise:
HF has great tooling, but if you step outside of their ecosystem, it is hard to fit a custom model into it. I ran into plenty of gotchas along the way, and the path to integration was not easy.
After doing all of this, I learned that the transformers library lets you use an existing model architecture without any pretrained weights. Unless you need a custom architecture, that is the better route. For the record, this is how you would do it:
from transformers import GPT2Config, GPT2LMHeadModel, DataCollatorForLanguageModeling

# Assumes `tokenizer` and `BLOCK_SIZE` are already defined earlier in the notebook.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=BLOCK_SIZE,
    n_embd=512,
    n_layer=6,
    n_head=8,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)  # randomly initialized, so you are training from scratch
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
# Then pass data_collator=collator to Trainer; you can skip the group_texts
# step and let the collator handle labels.
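To get a feel for how big a model that config produces before you train it, here is a rough parameter-count sketch in plain Python. It assumes the standard GPT-2 layout (fused QKV projection, a 4x MLP expansion, and an output head tied to the token embeddings). The `gpt2_param_count` helper is my own illustration, not part of transformers, and the vocabulary size and block size in the usage line are placeholder values, since the real ones come from your tokenizer and `BLOCK_SIZE`.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate parameters of a GPT-2 style LM with tied input/output embeddings.

    Hypothetical helper for illustration; n_head does not change the count.
    """
    d = n_embd
    embeddings = vocab_size * d + n_positions * d   # token table + position table
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # expand to 4d, project back to d
    layer_norms = 2 * (2 * d)                       # two LayerNorms per block (weight + bias)
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d                              # LayerNorm after the last block
    return embeddings + n_layer * per_block + final_norm


# Placeholder vocab_size and n_positions; swap in your tokenizer's values.
total = gpt2_param_count(vocab_size=50257, n_positions=256, n_embd=512, n_layer=6)
print(f"{total:,} parameters")
```

With a small `n_embd`, most of the parameters end up in the embedding table, so the tokenizer's vocabulary size matters more than the layer count here.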
torres-llm achieves the following results on the evaluation set:
- Loss: 5.5769
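If a raw loss number is hard to interpret, you can convert it to perplexity. This small sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling; the `perplexity` helper is just for illustration.

```python
import math


def perplexity(cross_entropy_loss):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_loss)


eval_loss = 5.5769  # evaluation loss reported above
print(f"perplexity: {perplexity(eval_loss):.1f}")
```

Roughly speaking, a perplexity of N means the model is about as uncertain as if it were choosing uniformly among N tokens at each step.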
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: ADAMW_TORCH_FUSED (fused AdamW) with betas=(0.9, 0.95), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 10
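Two of these settings are easier to understand with a bit of arithmetic: how the total batch size of 64 comes about, and what a cosine schedule with 500 warmup steps does to the learning rate. The sketch below is plain Python and only approximates the usual linear-warmup-then-cosine-decay shape; it is not the Trainer's own code. The `lr_at_step` helper and the `total_steps=1000` default are assumptions for illustration, because the card does not state the total number of optimizer steps.

```python
import math

# Effective batch size: per-device batch * accumulation steps * devices.
# A single device is assumed, which is consistent with the reported total of 64.
EFFECTIVE_BATCH_SIZE = 16 * 4 * 1


def lr_at_step(step, base_lr=3e-4, warmup_steps=500, total_steps=1000):
    """Learning rate under linear warmup followed by cosine decay to zero.

    Hypothetical helper; total_steps is an illustrative value, not from the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", EFFECTIVE_BATCH_SIZE)
for step in (0, 250, 500, 750, 1000):
    print(step, lr_at_step(step))
```

Note that if a run has only a few hundred optimizer steps in total, 500 warmup steps means the learning rate spends most of training still ramping up, which is worth checking when you pick these values.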
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 22.4902 | 6.8522 | 500 | 5.8179 |
Framework versions
- Transformers 5.0.0
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2