Instructions to use deepvk/deberta-v1-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use deepvk/deberta-v1-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="deepvk/deberta-v1-base")

# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -49,7 +49,7 @@ A mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.r
 1. Calculate shingles with size of 5
 2. Calculate MinHash with 100 seeds → for every sample (text) have a hash of size 100
 3. Split every hash into 10 buckets → every bucket, which contains (100 / 10) = 10 numbers, get hashed into 1 hash → we have 10 hashes for every sample
-4. For each bucket find duplicates: find samples which have the same hash → calculate pair-wise jaccard
+4. For each bucket find duplicates: find samples which have the same hash → calculate pair-wise jaccard similarity → if the similarity is >0.7 than it's a duplicate
 5. Gather duplicates from all the buckets and filter
 
 ### Training Hyperparameters
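The five deduplication steps listed in the hunk above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the README does not say whether shingles are taken over characters or words (characters are assumed here), the seeded BLAKE2b hash is an arbitrary stand-in for whatever hash family was actually used, and the policy of keeping the first sample of each duplicate pair is hypothetical. Only the constants (shingle size 5, 100 seeds, 10 buckets, threshold 0.7) come from the README.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

SHINGLE_SIZE = 5   # README step 1: shingles of size 5
NUM_SEEDS = 100    # README step 2: MinHash with 100 seeds
NUM_BUCKETS = 10   # README step 3: 10 buckets of 100 / 10 = 10 numbers
THRESHOLD = 0.7    # README step 4: Jaccard similarity > 0.7 means duplicate


def shingles(text, size=SHINGLE_SIZE):
    """Return the set of character shingles (character level is an assumption)."""
    if len(text) <= size:
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def seeded_hash(shingle, seed):
    """64-bit hash of one shingle under one seed (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(shingle_set, num_seeds=NUM_SEEDS):
    """Signature of num_seeds numbers: the minimum hash per seed."""
    return tuple(
        min(seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_seeds)
    )


def bucket_hashes(signature, num_buckets=NUM_BUCKETS):
    """Split the signature into equal buckets and hash each bucket into one value."""
    rows = len(signature) // num_buckets
    return [hash(signature[i * rows:(i + 1) * rows]) for i in range(num_buckets)]


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def find_duplicates(texts):
    """Return index pairs (i, j), i < j, whose shingle sets have Jaccard > THRESHOLD."""
    shingle_sets = [shingles(t) for t in texts]
    buckets = defaultdict(list)
    for idx, sh in enumerate(shingle_sets):
        for bucket_id, h in enumerate(bucket_hashes(minhash(sh))):
            buckets[(bucket_id, h)].append(idx)

    duplicates = set()
    for members in buckets.values():
        # Only samples sharing a bucket hash are compared pair-wise.
        for i, j in combinations(members, 2):
            if (i, j) not in duplicates and jaccard(shingle_sets[i], shingle_sets[j]) > THRESHOLD:
                duplicates.add((i, j))
    return duplicates


def deduplicate(texts):
    """Drop the later sample of every duplicate pair (hypothetical filter policy)."""
    drop = {j for _, j in find_duplicates(texts)}
    return [t for idx, t in enumerate(texts) if idx not in drop]


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",
        "completely different sentence about deep learning models",
    ]
    print(deduplicate(corpus))
```

Comparing only samples that share a bucket hash avoids computing Jaccard similarity for every pair in the corpus. The exact similarity check in step 4 then removes false candidates produced by the bucketing.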