Libraries

How to use catherinearnett/B-GPT_pl_en_sequential with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="catherinearnett/B-GPT_pl_en_sequential")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("catherinearnett/B-GPT_pl_en_sequential")
model = AutoModelForCausalLM.from_pretrained("catherinearnett/B-GPT_pl_en_sequential", device_map="auto")

vLLM

How to use catherinearnett/B-GPT_pl_en_sequential with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "catherinearnett/B-GPT_pl_en_sequential"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "catherinearnett/B-GPT_pl_en_sequential",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'
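
The same completions request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style `choices[0].text` field. The helper names `build_completion_request` and `complete` are hypothetical, and the default `max_tokens=64` is an arbitrary choice kept well under the model's 512-token maximum sequence length.

```python
import json
import urllib.request

MODEL_ID = "catherinearnett/B-GPT_pl_en_sequential"


def build_completion_request(prompt, max_tokens=64, temperature=0.5,
                             base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the first completion text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-compatible response: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.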

Use Docker

docker model run hf.co/catherinearnett/B-GPT_pl_en_sequential

SGLang

How to use catherinearnett/B-GPT_pl_en_sequential with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "catherinearnett/B-GPT_pl_en_sequential" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "catherinearnett/B-GPT_pl_en_sequential",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "catherinearnett/B-GPT_pl_en_sequential" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "catherinearnett/B-GPT_pl_en_sequential",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'

B-GPT_pl_en_sequential

This is a bilingual GPT-2 style model trained sequentially: for the first half of training it saw only Polish data, and for the second half it saw only English data. By the end of training, 50% of the training data seen by the model is Polish and 50% is English. The tokenizer was trained on the same overall proportions of data as the language model has seen at the final step.

This model was released alongside the paper On the Acquisition of Shared Grammatical Representations in Bilingual Language Models, which contains more details about the models. Additionally, the OSF page provides all code and data related to the project.

Model details:

All models are trained with a [CLS] (same as [BOS]) token prepended, and a [SEP] (same as [EOS]) token separating sequences. For best results, make sure that [CLS] is prepended to your input sequence (see sample usage linked above)! Details for this model specifically:

Architecture: gpt2
Parameters: 124770816
Maximum sequence length: 512 tokens
Training tokens: 12B
Vocabulary size: 50000
Compute cost: ~9 NVIDIA A6000 GPU hours
CO2 Emission: 1.17 kg
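
As a sanity check on the parameter count above, the standard GPT-2 layout can be tallied by hand. This is only a sketch under assumptions the card does not state: 12 layers and a hidden size of 768 (the usual GPT-2 small configuration), tied input/output embeddings, and the 512 positions listed above. Under those assumptions, a 50000-row embedding matrix gives 123849216 parameters, while 51200 rows reproduces the listed 124770816 exactly, which would be consistent with an embedding matrix padded beyond the 50000-token vocabulary. That padding is an inference from the arithmetic, not a documented fact.

```python
def gpt2_param_count(vocab_rows, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style model with tied input/output embeddings."""
    # Token and position embedding tables
    embeddings = vocab_rows * n_embd + n_positions * n_embd
    # Attention: fused QKV projection plus output projection (weights + biases)
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: up-projection to 4x width and down-projection (weights + biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two layer norms per block, each with a scale and a bias vector
    layer_norms = 2 * (2 * n_embd)
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_layer + final_layer_norm


# Assumed dimensions (not stated in the card): 12 layers, hidden size 768.
print(gpt2_param_count(50000, 512, 768, 12))  # 123849216
print(gpt2_param_count(51200, 512, 768, 12))  # 124770816
```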

Training dataset: OSCAR 2021/09

Checkpoints are taken at training steps: 0, 10000, 20000, 30000, 40000, 50000, 64000, 64010, 64020, 64030, 64040, 64050, 64060, 64070, 64080, 64090, 64100, 64110, 64120, 64130, 64140, 64150, 64160, 64170, 64180, 64190, 64200, 64300, 64400, 64500, 64600, 64700, 64800, 64900, 65000, 66000, 67000, 68000, 69000, 70000, 80000, 90000, 100000, 110000, 120000, 128000.
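
The checkpoint schedule above follows a regular pattern, with the densest sampling right after step 64000, the midpoint of the 128000 steps (presumably where training switches from Polish to English, though that reading is an assumption). The sketch below rebuilds the list from ranges and picks the closest saved step for an arbitrary target. `CHECKPOINT_STEPS`, `nearest_checkpoint`, and `load_checkpoint` are hypothetical names introduced for illustration.

```python
MODEL_ID = "catherinearnett/B-GPT_pl_en_sequential"

# Rebuild the published checkpoint list from its regular segments.
CHECKPOINT_STEPS = (
    list(range(0, 50001, 10000))         # 0 .. 50000, every 10000 steps
    + list(range(64000, 64201, 10))      # 64000 .. 64200, every 10 steps
    + list(range(64300, 65001, 100))     # 64300 .. 65000, every 100 steps
    + list(range(66000, 70001, 1000))    # 66000 .. 70000, every 1000 steps
    + list(range(80000, 120001, 10000))  # 80000 .. 120000, every 10000 steps
    + [128000]                           # final checkpoint
)


def nearest_checkpoint(step):
    """Return the saved checkpoint step closest to the requested step."""
    return min(CHECKPOINT_STEPS, key=lambda saved: abs(saved - step))


def load_checkpoint(step):
    """Load the model at the nearest saved checkpoint (downloads weights)."""
    from transformers import AutoModelForCausalLM

    # The revision name is the checkpoint step as a string.
    revision = str(nearest_checkpoint(step))
    return AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
```

For example, `load_checkpoint(64000)` would load the model as it stood at the halfway point of training.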

Use This Model

Load the model:

Note: if you do not specify a revision, the final checkpoint of the model is loaded. The revision name is the checkpoint step (for example, revision="64000"). See above for the list of checkpoints.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("catherinearnett/B-GPT_pl_en_sequential")
model = AutoModelForCausalLM.from_pretrained("catherinearnett/B-GPT_pl_en_sequential", revision="128000")

Text Generation:

from transformers import pipeline

pipe = pipeline("text-generation", model="catherinearnett/B-GPT_pl_en_sequential")

print(pipe("I am a", max_length=20)[0]["generated_text"])
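
The model details recommend prepending [CLS] to the input sequence. One way to make that explicit is sketched below, assuming the special token is stored in the tokenizer vocabulary under the literal name `[CLS]`; it is worth inspecting the tokenizer output first, since the tokenizer may already add it. `ensure_cls_prefix` and `generate_with_cls` are hypothetical helpers, not part of the released code.

```python
MODEL_ID = "catherinearnett/B-GPT_pl_en_sequential"


def ensure_cls_prefix(input_ids, cls_id):
    """Return a copy of input_ids that starts with cls_id, adding it only if missing."""
    ids = list(input_ids)
    if not ids or ids[0] != cls_id:
        ids = [cls_id] + ids
    return ids


def generate_with_cls(prompt, max_new_tokens=20):
    """Tokenize the prompt, force [CLS] to the front, and generate (downloads weights)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumption: the token is named "[CLS]" in this tokenizer's vocabulary.
    cls_id = tokenizer.convert_tokens_to_ids("[CLS]")
    # Skip automatic special tokens so that no trailing [SEP] ends the prompt.
    raw_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    ids = ensure_cls_prefix(raw_ids, cls_id)

    output = model.generate(torch.tensor([ids]), max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_with_cls("I am a")` would then generate from a prompt that is guaranteed to begin with [CLS].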

Citation

If you use this model, please cite:

@article{arnett2025acquisition,
  title={On the Acquisition of Shared Grammatical Representations in Bilingual Language Models},
  author={Arnett, Catherine and Chang, Tyler A and Michaelov, James A and Bergen, Benjamin K},
  journal={arXiv preprint arXiv:2503.03962},
  year={2025}
}

Safetensors

Model size: 0.1B params
Tensor type: F32

Collection including catherinearnett/B-GPT_pl_en_sequential

B-GPT: Bilingual GPT-2 models with checkpoints (16 items, updated Apr 23)

Paper for catherinearnett/B-GPT_pl_en_sequential

On the Acquisition of Shared Grammatical Representations in Bilingual Language Models (arXiv:2503.03962, published Mar 5, 2025)