Erkhembayar

erkhem-gantulga

AI & ML interests

NMT, TTS, LLM, ASR

Recent Activity

updated a dataset about 1 month ago

erkhem-gantulga/cv-mn-workshop

Viewer • Updated May 31 • 200 • 31 • 1

published a dataset about 1 month ago

erkhem-gantulga/cv-mn-workshop

Viewer • Updated May 31 • 200 • 31 • 1

liked a model 10 months ago

xai-org/grok-2

Updated Nov 5, 2025 • 42.8k • 1.11k

liked a model over 1 year ago

openai/whisper-large-v3-turbo

Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 6.94M • 3.12k

updated a model almost 2 years ago

erkhem-gantulga/whisper-medium-mn

Automatic Speech Recognition • 0.8B • Updated Sep 16, 2024 • 50

reacted to AlexBodner's post with 🚀 almost 2 years ago

Post

3854

💾🧠How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you convert:
PyTorch/TensorFlow summary -> needed VRAM
or
Parameter count -> needed VRAM

Use it at: http://howmuchvram.com

And everything is open source! Ask for new functionality or contribute at:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you, leave a star 🌟 and share it with someone who will find the tool useful!

3 replies

updated a model almost 2 years ago

erkhem-gantulga/w2v-bert-2.0-mongolian-colab-CV17.0

Automatic Speech Recognition • 0.6B • Updated Aug 29, 2024 • 7

liked a dataset almost 2 years ago

reach-vb/jenny_tts_dataset

Viewer • Updated Jan 9, 2024 • 21k • 353 • 37

liked a Space almost 2 years ago

Open ASR Leaderboard

🏆

1.39k

Compare speech recognition models on benchmark scores

reacted to ylacombe's post with 🔥 almost 2 years ago

Post

9524

Yesterday, we released Parler-TTS and Data-Speech, a fully open-source reproduction of the work from the paper: Natural language guidance of high-fidelity text-to-speech with synthetic annotations (2402.01912)

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-models-66164ad285ba03e8ffde214c

Parler-TTS Mini v0.1 is the first iteration of the Parler-TTS model, trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).

To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data to 50k hours of speech. The v1 release of the model will be trained on this data and will include inference optimisations such as flash attention and torch compile.

parler-tts/parler_tts_mini_v0.1

Data-Speech can be used for annotating speech characteristics in a large-scale setting.

parler-tts/open-source-speech-datasets-annotated-using-data-speech-661648ffa0d3d76bfa23d534

This work is both scalable and easily modifiable and will hopefully help the TTS research community explore new ways of conditioning speech synthesis.

All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.

1 reply

reacted to 1aurent's post with 🔥 almost 2 years ago

Post

2623

Hey everyone 🤗!
Check out this cool new space from Finegrain: finegrain/finegrain-object-eraser

Under the hood, it's a pipeline of models (currently exposed via an API) that allows you to easily erase any object from your image just by naming it or selecting it! Not only will the object disappear, but so will its effects on the scene, like shadows and reflections. Built on top of Refiners, our micro-framework for simple foundation model adaptation (feel free to star it on GitHub if you like it: https://github.com/finegrain-ai/refiners).