---
library_name: transformers
base_model:
- google/byt5-small
---

# Model Card for namednil/sip-d4

<!-- Provide a quick summary of what the model is/does. -->

This model is pre-trained to take a representation of a finite state transducer (FST) together with an input string and to predict the output of the FST for that string. The FSTs used for pre-training were synthetically generated.
The goal is to inject an inductive bias for FST-like tasks. Analysis of the model suggests that it has learned to simulate transitions between FST states internally in its hidden representations, without being explicitly trained to do so.

See [SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation](https://aclanthology.org/2024.acl-long.355/) for all the details.
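
To make the pre-training task concrete, the sketch below runs a small deterministic FST on a string in plain Python. It only illustrates what "the output of the FST for that string" means; the transition table, the helper `run_fst`, and the toy alphabet are illustrative assumptions, and the textual FST encoding that the model actually consumes is documented in the repository rather than shown here.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical representation: (state, input symbol) -> (next state, output string)
Transitions = Dict[Tuple[int, str], Tuple[int, str]]


def run_fst(
    transitions: Transitions,
    start: int,
    finals: Iterable[int],
    text: str,
) -> Optional[str]:
    """Apply a deterministic FST to `text`; return None if the input is rejected."""
    state = start
    emitted = []
    for symbol in text:
        if (state, symbol) not in transitions:
            return None  # no transition defined for this symbol in this state
        state, output = transitions[(state, symbol)]
        emitted.append(output)
    return "".join(emitted) if state in set(finals) else None


# Toy FST over {a, b}: copy the input, but rewrite a "b" that directly follows an "a" as "c".
TOY_FST: Transitions = {
    (0, "a"): (1, "a"),
    (0, "b"): (0, "b"),
    (1, "a"): (1, "a"),
    (1, "b"): (0, "c"),
}

print(run_fst(TOY_FST, start=0, finals={0, 1}, text="abba"))  # acba
```

During pre-training, the model sees an FST of this kind together with an input such as `abba` and has to produce the corresponding output string itself.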

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Matthias Lindemann
- **Funded by:** UKRI, Huawei, Dutch National Science Foundation
- **Model type:** Sequence-to-sequence model
- **Language(s) (NLP):** No natural language data was used for continual pre-training
- **License:** [More Information Needed]
- **Finetuned from model:** ByT5 (`google/byt5-small`)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/namednil/sip
- **Paper:** https://aclanthology.org/2024.acl-long.355/

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Without fine-tuning, the model can approximately simulate FST behavior (see also `namednil/sip-d4-pt` and the documentation in the git repo). The main use is in fine-tuning.

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

The model is intended to be fine-tuned on FST-like tasks, such as grapheme-to-phoneme conversion or simple text editing, particularly in few-shot setups.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
import transformers

# The tokenizer is the one of the base model.
tokenizer = transformers.AutoTokenizer.from_pretrained("google/byt5-small")
# trust_remote_code=True executes code from the model repository,
# so always review the remote code on Hugging Face before running it.
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", trust_remote_code=True)

# Construct an optimizer that uses the SIP fine-tuning procedure:
optimizer = model.get_optimizer(torch.optim.Adam, prefix_lr=1.0, lr=3e-4)
# ... fine-tune the model as usual

# The code above uses a random initialization of the tunable prefix of SIP.
# If you don't want that and want more control over the length of the tunable prefix, run:

config = transformers.AutoConfig.from_pretrained("namednil/sip-d4", trust_remote_code=True)
config.random_selection = False
config.prefix_length = 50
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("namednil/sip-d4", config=config, trust_remote_code=True)
```
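
The "fine-tune the model as usual" step could look roughly like the sketch below. It assumes that the model follows the standard `transformers` sequence-to-sequence interface, meaning its forward pass accepts `labels` and returns an object with a `.loss` attribute; this is an assumption and should be checked against the remote code. The helper `fine_tune`, the number of epochs, and the toy training pairs are hypothetical and not taken from the paper.

```python
import torch


def fine_tune(model, tokenizer, optimizer, pairs, epochs=3):
    """Run a few full-batch gradient steps on (input, output) string pairs.

    Assumes a standard transformers seq2seq interface: model(..., labels=...)
    returns an object with a `.loss` attribute.
    """
    sources = [src for src, _ in pairs]
    targets = [tgt for _, tgt in pairs]
    enc = tokenizer(sources, padding=True, return_tensors="pt")
    labels = tokenizer(targets, padding=True, return_tensors="pt")["input_ids"]
    # Mask padding positions so that they do not contribute to the loss
    # (-100 is the ignore index used by transformers seq2seq models).
    labels = labels.masked_fill(labels == tokenizer.pad_token_id, -100)

    model.train()
    losses = []
    for _ in range(epochs):
        out = model(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
            labels=labels,
        )
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        losses.append(out.loss.item())
    return losses


# Hypothetical usage with `model`, `tokenizer` and `optimizer` from the snippet above:
# pairs = [("abba", "acba"), ("bab", "bac")]  # made-up toy task
# fine_tune(model, tokenizer, optimizer, pairs)
```

For real few-shot data, mini-batching, a held-out set, and early stopping would normally be added; the loop is kept minimal to show where the SIP optimizer plugs in.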

## Model Examination

<!-- Relevant interpretability work for the model goes here -->

See [SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation](https://aclanthology.org/2024.acl-long.355/) for the analysis of how the model simulates FST transitions in its hidden representations.

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** Nvidia RTX 2080 Ti
- **Hours used:** 30
- **Compute Region:** Scotland
- **Carbon Emitted:** 0.2 kg CO2eq

## Citation

```bibtex
@inproceedings{lindemann-etal-2024-sip,
    title = "{SIP}: Injecting a Structural Inductive Bias into a {S}eq2{S}eq Model by Simulation",
    author = "Lindemann, Matthias  and
      Koller, Alexander  and
      Titov, Ivan",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.355/",
    doi = "10.18653/v1/2024.acl-long.355",
}
```