AI & ML interests

On-Device AI, Small Language Models (SLMs), hyper-specialized code generation, model compression (quantization), and synthetic dataset curation.

Organization Card

🧬 Hello from ZemResearch!

Mixing Artificial Intelligence with Chemistry, one dataset at a time.


👋 Who We Are

Welcome to ZemResearch! We are an open-source research initiative passionate about bridging the gap between computer science and molecular biology. We believe that training specialized, lightweight Large Language Models (LLMs) shouldn't require massive corporate budgets; it just needs incredibly clean data and smart engineering.

🎯 What We Do

  • 🧹 Extreme Data Cleaning: We don't just scrape data; we sterilize it. We heavily rely on tools like RDKit to ensure our molecular datasets obey the fundamental laws of chemistry.
  • 🤖 Lightweight AI Models: We focus on fine-tuning accessible, efficient LLMs that can run smoothly without needing massive GPU clusters.
  • 🌍 Open Science: Everything we build is dedicated to the global open-source community. Let's democratize AI drug discovery together!

🚀 Our Flagship Projects

  • HippoCrates: A massive, heavily sterilized dataset containing 1.46 million molecular structures. It ships in Apache Parquet format, ready to use for text-generation and chemical bioactivity fine-tuning.
  • HippoXic: A premium, domain-specific instruction-tuning dataset containing 10,630 highly curated rows focused on chemical toxicology, FDA clinical safety, and real-world side effects. It bridges the gap between molecular structures and clinical bio-safety reasoning.
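
As a rough illustration of how an instruction-tuning row is typically turned into training text, here is a small sketch. The column names (`instruction`, `input`, `output`), the prompt template, and the helper `to_training_pair` are all assumptions made for this example; the real HippoXic schema may differ, and the sample row below was written for the sketch rather than taken from the dataset.

```python
from typing import Dict

# Hypothetical template and column names; check the dataset card for the real schema.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def to_training_pair(row: Dict[str, str]) -> Dict[str, str]:
    """Render one instruction-tuning row into a prompt/completion pair."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=row["instruction"].strip(),
        input=row["input"].strip(),
    )
    return {"prompt": prompt, "completion": row["output"].strip()}


# Illustrative row written for this sketch, NOT an actual HippoXic record.
example = {
    "instruction": "Summarize the main safety concern for this molecule.",
    "input": "CC(=O)Nc1ccc(O)cc1",  # SMILES for acetaminophen (paracetamol)
    "output": "Overdose can cause liver injury.",
}
pair = to_training_pair(example)
```

Keeping the prompt and completion separate makes it easy to mask the prompt tokens during fine-tuning, so the loss is computed only on the response.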

🀝 Let's Collaborate

Got a cool idea for molecular LLMs, or just want to chat about AI in healthcare? Feel free to explore our datasets, open a discussion in our repositories, or reach out. We are always open to new collaborations!


Stay curious. Keep building. 🚀
