How to use l3cube-pune/english-topic-all-doc with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="l3cube-pune/english-topic-all-doc")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("l3cube-pune/english-topic-all-doc")
model = AutoModelForSequenceClassification.from_pretrained("l3cube-pune/english-topic-all-doc")
```
English-Doc-Topic-BERT
The English-Doc-Topic-BERT model is a BERT-Base-uncased model fine-tuned on English documents from the L3Cube-IndicNews Corpus ([dataset link](https://github.com/l3cube-pune/indic-nlp)).
The corpus consists of three sub-datasets, LDC (Long Document Classification), LPC (Long Paragraph Classification), and SHC (Short Headlines Classification), which differ in document length.
This model is trained on a combination of all three variants and works well across different document sizes.
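Since the model is meant to handle headlines, paragraphs, and full documents, the sketch below shows one way to classify a mixed batch of texts. It is a minimal illustration, not the authors' evaluation code. The names `MODEL_ID`, `MAX_TOKENS`, `load_classifier`, and `classify_texts` are hypothetical helpers introduced here. The sketch assumes the standard 512-token input limit of BERT-Base, so longer documents are truncated rather than split into chunks. The label names returned depend on the model's configuration and are not listed in this card.

```python
from typing import Callable, Dict, List, Optional

MODEL_ID = "l3cube-pune/english-topic-all-doc"
MAX_TOKENS = 512  # assumed BERT-Base input limit; longer inputs are truncated


def load_classifier() -> Callable:
    """Build the text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that classify_texts can also be used with another callable.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_texts(texts: List[str], classifier: Optional[Callable] = None) -> List[Dict]:
    """Return one {"text", "label", "score"} record per input text."""
    if classifier is None:
        classifier = load_classifier()
    # truncation and max_length are forwarded by the pipeline to the tokenizer.
    predictions = classifier(texts, truncation=True, max_length=MAX_TOKENS)
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]
```

Calling `classify_texts([headline, paragraph, full_article])` with three strings of your own returns one record per input, in the same order. Truncation keeps only the first 512 tokens, so for very long articles the prediction is based on the opening part of the text.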
More details on the dataset, models, and baseline results can be found in our [paper](https://arxiv.org/abs/2401.02254).
Citing:
@article{mirashi2024l3cube,
title={L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages},
author={Mirashi, Aishwarya and Sonavane, Srushti and Lingayat, Purva and Padhiyar, Tejas and Joshi, Raviraj},
journal={arXiv preprint arXiv:2401.02254},
year={2024}
}
Other document topic models for different Indic languages are listed below:
Hindi-Doc-Topic-BERT
Marathi-Doc-Topic-BERT
Bengali-Doc-Topic-BERT
Telugu-Doc-Topic-BERT
Tamil-Doc-Topic-BERT
Gujarati-Doc-Topic-BERT
Kannada-Doc-Topic-BERT
Odia-Doc-Topic-BERT
Malayalam-Doc-Topic-BERT
Punjabi-Doc-Topic-BERT
English-Doc-Topic-BERT