Juan Pablo Mejia Gómez
Juanpa0128
0 followers · 15 following
Juanpa0128j
AI & ML interests
LLMs, Agents, Transformers, ML, and many more...
Recent Activity
reacted to codelion's post with 🔥 (1 day ago)
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
liked a model (1 day ago)
zai-org/GLM-4.7
liked a model (4 days ago)
meta-llama/Llama-3.2-3B-Instruct
Organizations
None yet
Juanpa0128's activity
liked a model (1 day ago)
zai-org/GLM-4.7 • Text Generation • 358B • Updated 6 days ago • 28.6k downloads • 1.22k likes
liked 4 models (4 days ago)
meta-llama/Llama-3.2-3B-Instruct • Text Generation • 3B • Updated Oct 24, 2024 • 2.05M downloads • 1.89k likes
openai/gpt-oss-20b • Text Generation • 22B • Updated Aug 26 • 6.8M downloads • 4.13k likes
mistralai/Mistral-Small-3.1-24B-Instruct-2503 • 24B • Updated 7 days ago • 81.2k downloads • 1.34k likes
HuggingFaceTB/SmolLM-135M • Text Generation • 0.1B • Updated Aug 1, 2024 • 313k downloads • 241 likes
liked a model (7 days ago)
vidore/colpali-v1.3 • Visual Document Retrieval • Updated Mar 14 • 37.8k downloads • 83 likes