AI & ML interests

Building the future of AI through collaborative research, shared datasets, and open-source models. | Managed by @Cossale

Recent Activity

Cossale published a dataset 18 days ago
keplersystems/UrduPoetry-106k
Cossale updated a Space 7 months ago
keplersystems/README

Cossale posted an update 3 months ago
Releasing 8 multilingual datasets from the People's Archive of Rural India (PAARI).
Indian languages represent 1B+ speakers but remain underrepresented in quality training data. These datasets help address that gap.
Languages: Hindi, Urdu, Punjabi, Tamil, Telugu, Marathi, Gujarati, English
Scripts: Devanagari, Arabic, Gurmukhi, Tamil, Telugu, Gujarati
Total: 7,650 articles, 19.9M tokens, 51MB
Content covers rural life, agriculture, social issues, and cultural traditions. Professionally written journalism, not web scrapes.
Free to use.
Collection: https://huggingface.co/collections/keplersystems/paari-datasets
Technical details: https://kepler.systems/blog/introducing-paari-datasets
Cossale updated a Space about 1 year ago