---
license: apache-2.0
language:
- bn
base_model:
- google/electra-small-discriminator
---

# VĀC-BERT

**VĀC-BERT** is a 17-million-parameter model trained on the Vācaspati literary dataset. Despite its compact size, VĀC-BERT achieves performance competitive with state-of-the-art masked-language and downstream models that are over seven times larger.

## Model Details

- **Architecture:** ELECTRA-small, reduced to 17 M parameters
- **Pretraining Corpus:** Vācaspati, a curated Bangla literary corpus
- **Parameter Count:** 17 M (≈ 1/7th the size of BERT-base)
- **Tokenizer:** WordPiece, vocabulary size 50 K

## Usage Example

```python
from transformers import BertTokenizer, AutoModelForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("Vacaspati/VAC-BERT")
model = AutoModelForSequenceClassification.from_pretrained("Vacaspati/VAC-BERT")
```

## Dataset

We are also releasing the Vācaspati dataset. To request access, please fill out this form: https://forms.gle/DiVm2fSVCyXXMbkU9

The Vācaspati dataset can also be accessed at: https://huggingface.co/datasets/Vacaspati/Vacaspati

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{bhattacharyya-etal-2023-vacaspati,
    title = "{VACASPATI}: A Diverse Corpus of {B}angla Literature",
    author = "Bhattacharyya, Pramit and Mondal, Joydeep and Maji, Subhadip and Bhattacharya, Arnab",
    editor = "Park, Jong C. and Arase, Yuki and Hu, Baotian and Lu, Wei and Wijaya, Derry and Purwarianti, Ayu and Krisnadhi, Adila Alfa",
    booktitle = "Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = nov,
    year = "2023",
    address = "Nusa Dua, Bali",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.ijcnlp-main.72/",
    doi = "10.18653/v1/2023.ijcnlp-main.72",
    pages = "1118--1130"
}
```