Hugging Face
Omar Kamali
PRO
omarkamali
57 followers · 24 following
https://omarkama.li
omarkamali
omarkamali
omar-kamali
AI & ML interests
NLP & LLMs for low resource languages.
Recent Activity
updated a dataset 4 days ago
omarkamali/wikipedia-monthly
posted an update 6 days ago
You're probably training on outdated Wikipedia data right now and don't know it.

In June last year, a friend from the Moroccan Wikipedia community slid into my DMs: "Are you using the current version? The official dataset is severely outdated. We added so many articles nowhere to be found on HuggingFace."

He was right. I was running a 2023 snapshot. In 2025.

The official Wikipedia dataset, the one hundreds of labs and researchers grab by default without a second thought, was frozen in time.

• For English, that's 700,000 missing articles.
• For Moroccan Arabic, 30% of the language's entire Wikipedia.
• For 31 other languages, there was literally no text corpus at all until recently.

I could've shrugged and moved on. Instead I spent the next months building a monthly automated pipeline for 340+ languages, on my personal laptop, nearly killing it several times in the process (100% disk, frozen screen, the works).

Nous Research trained Hermes 4 on it. INRIA cited it. It's now three years ahead of what most people are training on.

Here's the full story of how I built Wikipedia Monthly: https://omarkamali.com/blog/wikipedia-monthly-pipeline
updated a model 9 days ago
wikilangs/hu
Organizations
omarkamali's activity
liked a model about 1 month ago
Guyohm/ppo-LunarLander-v3
Reinforcement Learning • Updated Jan 26 • 1 • 1
liked a model 2 months ago
uisikdag/qwen3-14b-tr-wiki-monthly-qlora
Text Generation • Updated Dec 14, 2025 • 3 • 1
liked 2 models 4 months ago
rocky1410/haipai-nano
Text Generation • 54.6M • Updated Jan 15 • 2
MartinSeeler/txt2emoji-gemma-3-270m-it
Text Generation • 0.4B • Updated Oct 13, 2025 • 1 • 1
liked a model 5 months ago
AhmetSemih/merged_dataset-32k-bpe-tokenizer
Updated Nov 10, 2025 • 1
liked 2 datasets over 1 year ago
sawalni-ai/fw-darija-websites
Viewer • Updated Dec 8, 2024 • 4k • 10 • 2
sawalni-ai/fw-darija
Viewer • Updated Dec 8, 2024 • 37.4k • 19 • 9
liked a model over 1 year ago
sawalni-ai/smollm-fw-darija
Text Generation • 0.1B • Updated Dec 8, 2024 • 16 • 2
liked a dataset almost 2 years ago
imomayiz/morocco-img
Viewer • Updated Dec 27, 2023 • 49 • 25 • 2