Update README.md
README.md
CHANGED
@@ -19,15 +19,17 @@ Founded in 2019, Anvilogic specializes in AI-driven threat detection and automat
 
 ### Models
 
-- **
-- **Cross-
-- **T5
+- **Embedders:** These models produce vector representations of domain names, which are used to mine similar domains. Two variants are available: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
+- **Cross-Encoders:** These models compare two domain names and determine whether one is a typosquat of the other. Two variants are available: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
+- **T5:** A model derived from T5 and trained on a new task, using the prefix "Is the first domain a typosquat of the second : " to which we append *TYPOSQUAT_DOMAIN* and *LEGITIMATE_DOMAIN*.
 
 ### Datasets
 
 - **Embedder training dataset:** Dataset formatted to train the embedding model with (Anchor, Positive) pairs.
-- **Cross-Encoder :** Dataset formatted to train Cross-encoder model with (Anchor,Positive,label) samples.
-- **T5
+- **Cross-Encoder training dataset:** Dataset formatted to train the cross-encoder model with (Anchor, Positive, label) samples.
+- **T5 training dataset:** Dataset formatted to train the T5 model with (prompt, response) pairs.
 
 ### Spaces
-
+- **Embedder Typosquat Detect:** Lets the user retrieve the most similar domains from a pool of the 4000 most common domains.
+- **CE Typosquat Detect:** Lets the user compare two domains using the cross-encoders. The model outputs a probability of typosquatting.
+- **T5 Typosquat Detect:** Lets the user compare two domains using T5. The model outputs a boolean.
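The README text added in this change describes two mechanics that a short sketch may make more concrete: how a T5 prompt is assembled from the stated prefix, and how an embedder can be used to retrieve the most similar domains from a pool. The following is a minimal illustration, not the repository's code. The function names, the single-space separator between the two domains, the use of cosine similarity, and the toy vectors are all assumptions; the README only specifies the prefix string and the order of the two domains.

```python
import math

# Prefix quoted verbatim from the README.
T5_PREFIX = "Is the first domain a typosquat of the second : "


def build_t5_prompt(candidate: str, legitimate: str, sep: str = " ") -> str:
    """Build a prompt for the T5 typosquat task.

    The README says the candidate and legitimate domains are appended to the
    prefix; the separator between them is NOT specified, so `sep` is an
    assumption.
    """
    return f"{T5_PREFIX}{candidate}{sep}{legitimate}"


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, pool, k=3):
    """Rank the domains in `pool` (name -> embedding) by similarity to the query.

    Cosine similarity is an assumed metric; the README does not state which
    one the Space uses.
    """
    scored = [(name, cosine(query_vec, vec)) for name, vec in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    print(build_t5_prompt("g00gle.com", "google.com"))
    # Toy 2-D vectors standing in for real embedder outputs.
    toy_pool = {"a.com": [1.0, 0.0], "b.com": [0.0, 1.0], "c.com": [1.0, 1.0]}
    print(top_k_similar([1.0, 0.1], toy_pool, k=2))
```

In practice the vectors would come from one of the embedder models, and the pool would hold the precomputed embeddings of the common domains, so only the query needs to be encoded at lookup time.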