---
license: mit
tags:
- embedding
- text-embedding
- crypto
- nlp
library_name: transformers
---

# crypto-mini-embed

**crypto-mini-embed** is a sample mini embedding model built on a simple architecture, intended for NLP experiments such as:

- text similarity
- vector search
- clustering
- semantic tagging
- crypto-topic classification

This model is a **dummy model** meant to help users understand how a model repository on Hugging Face is structured.

---

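
Text similarity and vector search, two of the tasks listed above, both come down to comparing embedding vectors. The sketch below shows one common way to do that, cosine similarity, in plain Python. The 4-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are made up for illustration and are not part of this repository.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def most_similar(query, corpus):
    """Return (index, score) of the corpus vector closest to the query."""
    scores = [cosine_similarity(query, vec) for vec in corpus]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical 4-dimensional embeddings, kept small for readability.
query = [1.0, 0.0, 1.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [2.0, 0.0, 2.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0, 0.0],  # partially aligned with the query
]
best_index, best_score = most_similar(query, corpus)
print(best_index, round(best_score, 3))
```
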
## ⚙️ Model Architecture

- Model type: `MiniEmbeddingModel`
- Hidden size: 64
- Max length: 128 tokens
- Framework: PyTorch
- Format: Safetensors
- Tokenizer: Basic CharTokenizer (dummy)

---

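
To make the tokenizer entry above more concrete, here is a minimal sketch of what a character-level tokenizer with a 128-token limit might look like. It is not the repository's actual tokenizer: the class name `CharTokenizer`, the alphabet, and the reserved ids for padding and unknown characters are all assumptions made for illustration.

```python
MAX_LENGTH = 128  # taken from the model card
PAD_ID = 0        # assumption: id 0 is reserved for padding
UNK_ID = 1        # assumption: id 1 marks unknown characters

class CharTokenizer:
    """Hypothetical character-level tokenizer, not the repository's code."""

    def __init__(self, alphabet, max_length=MAX_LENGTH):
        self.max_length = max_length
        # Ids 0 and 1 are reserved, so real characters start at 2.
        self.char_to_id = {ch: i + 2 for i, ch in enumerate(alphabet)}

    def encode(self, text):
        # Truncate to max_length, then map each character to its id.
        ids = [self.char_to_id.get(ch, UNK_ID) for ch in text[: self.max_length]]
        mask = [1] * len(ids)
        padding = self.max_length - len(ids)
        return {
            "input_ids": ids + [PAD_ID] * padding,
            "attention_mask": mask + [0] * padding,
        }

tokenizer = CharTokenizer("abcdefghijklmnopqrstuvwxyz ")
encoded = tokenizer.encode("bitcoin is digital money")
print(len(encoded["input_ids"]), sum(encoded["attention_mask"]))
```

Padding every input to the same length keeps batches rectangular, and the attention mask records which positions hold real characters.
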
## 📦 Files in the Model

| File | Purpose |
|------|---------|
| `config.json` | Model configuration |
| `tokenizer.json` | Simple tokenizer |
| `model.safetensors` | Model parameters |
| `README.md` | Model documentation |

---

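
As a rough illustration of the `config.json` entry in the table above, the sketch below builds a configuration dictionary and serializes it with Python's standard `json` module. The values come from this model card, but the field names (`model_type`, `hidden_size`, `max_length`) are assumptions; the actual file in the repository may use different keys.

```python
import json

# Hypothetical config: values from the model card, field names assumed.
config = {
    "model_type": "MiniEmbeddingModel",
    "hidden_size": 64,
    "max_length": 128,
}

# Serialize to the text that would be written to config.json.
config_text = json.dumps(config, indent=2)
print(config_text)

# Parsing the text again gives back the same dictionary.
restored = json.loads(config_text)
```
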
## 🧪 Usage Example

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load the tokenizer and model from the Hugging Face Hub.
tok = AutoTokenizer.from_pretrained("0xcubin/crypto-mini-embed")
model = AutoModel.from_pretrained("0xcubin/crypto-mini-embed")

text = "Bitcoin is digital money"
inputs = tok(text, return_tensors="pt")

# Mean-pool the token states over the sequence dimension
# to get one embedding vector per input text.
with torch.no_grad():
    emb = model(**inputs).last_hidden_state.mean(dim=1)

print(emb.shape)  # e.g. torch.Size([1, 64])