Skin Disease Knowledge Base Embeddings v2
A complete knowledge base for a skin-health chatbot in Indonesian and English. The content was scraped from multiple trusted sources to give comprehensive coverage.
Statistics
- Total Documents: 1346
- Embedding Dimension: 384
- Diseases Covered: 16
- Sources: wikipedia, alodokter, halodoc, sehatq
- Model: sentence-transformers/all-MiniLM-L6-v2
- Last Updated: 2026-01-05 04:30:00.329293
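The figures above can be checked against a loaded copy of the KB. The sketch below is a minimal sanity check, assuming the pickle holds a dict with `documents` and `embeddings` keys as used in the Quick Start. The helper name `check_kb` and the toy data are hypothetical and not part of this repository.

```python
import numpy as np

def check_kb(kb, expected_docs=1346, expected_dim=384):
    """Return a list of problems found; an empty list means the KB matches."""
    problems = []
    emb = np.asarray(kb["embeddings"])
    n_docs = len(kb["documents"])
    if n_docs != expected_docs:
        problems.append(f"expected {expected_docs} documents, found {n_docs}")
    if emb.ndim != 2 or emb.shape[1] != expected_dim:
        problems.append(f"expected embeddings of shape (N, {expected_dim}), found {emb.shape}")
    elif emb.shape[0] != n_docs:
        problems.append(f"{emb.shape[0]} embeddings for {n_docs} documents")
    return problems

# Toy stand-in for the real KB (assumed layout: one embedding row per document).
toy_kb = {
    "documents": [{"text": "a"}, {"text": "b"}],
    "embeddings": np.zeros((2, 384)),
}
print(check_kb(toy_kb, expected_docs=2))
```

With the real file, calling `check_kb(kb)` with the default arguments compares the loaded data against the counts listed in this section.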
Diseases Included
dermatitis_seboroik, eksim, herpes, impetigo, jerawat, kulit_berminyak, kulit_kering, kulit_sensitif, kurap, kutil ... (+6 more)
Data Sources
- Wikipedia (EN & ID) - Authoritative medical information
- Halodoc - Indonesian health platform
- Alodokter - Trusted Indonesian medical site
- SehatQ - Health information portal
Quick Start
from sentence_transformers import SentenceTransformer
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download KB
kb_path = hf_hub_download(
    repo_id="Ardian122/skin-embeddings-v2",
    filename="skin_kb.pkl"
)

# Load KB
with open(kb_path, "rb") as f:
    kb = pickle.load(f)
print(f"Loaded {len(kb['documents'])} documents")

# Load embedder
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Search
def search_similar(query, top_k=5):
    q_emb = embedder.encode(query, normalize_embeddings=True)
    embeddings = np.array(kb['embeddings'])
    sims = np.dot(embeddings, q_emb)
    top_idx = np.argsort(sims)[-top_k:][::-1]
    return [kb['documents'][i] for i in top_idx]

# Example
results = search_similar("penyebab jerawat", top_k=3)
for result in results:
    print(result['text'][:200])
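The dot product in `search_similar` equals cosine similarity only if the stored embeddings are unit-length, which this card does not state. A minimal sketch, assuming they may not be, normalizes both sides before scoring. The function name `cosine_top_k` and the toy vectors are hypothetical; the block uses NumPy only, so it runs without downloading the model or the KB.

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, top_k=5):
    """Return (index, cosine similarity) pairs for the top_k closest rows."""
    doc_matrix = np.asarray(doc_matrix, dtype=np.float32)
    query_vec = np.asarray(query_vec, dtype=np.float32)
    # Scale every row and the query to unit length; the clip avoids dividing by zero.
    doc_norms = np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    doc_unit = doc_matrix / np.clip(doc_norms, 1e-12, None)
    q_unit = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    sims = doc_unit @ q_unit
    top_k = min(top_k, len(sims))
    order = np.argsort(sims)[::-1][:top_k]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]

# Toy vectors with different lengths, to show that magnitude no longer matters.
docs = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
hits = cosine_top_k([1.0, 0.0], docs, top_k=2)
print(hits)
```

With the real KB, the same function could be called as `cosine_top_k(embedder.encode(query), kb['embeddings'], top_k)`, and the returned indices used to look up `kb['documents']`.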
License
Apache 2.0
Disclaimer
This knowledge base is for educational purposes only. Always consult a healthcare professional for medical advice.