Update README.md
README.md
CHANGED
@@ -103,16 +103,9 @@ print(scores.tolist())
 ## Training Data
 ### Dataset Details
 - **Source**: Reddit dataset with English-Armenian translations
-- **Size**:
+- **Size**: 0.66M pairs of rows
 - **Content Type**: Title and body text pairs
-- **
-  - Training Set:
-    - Translated Title Tokens: 23,921,393
-    - Translated Body Tokens: 194,200,654
-  - Test Set:
-    - Translated Title Tokens: 242,443
-    - Translated Body Tokens: 1,946,164
-- **Split Ratio**: 99% train, 1% test
+- **Split Ratio**: 98.5% train, 1.5% test
 
 ## Training Procedure
 ### Training Details
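
For a rough sense of what the updated figures imply, the sketch below estimates the number of rows in each split. It assumes that `0.66M` means about 660,000 pairs (a rounded figure) and that the 98.5% / 1.5% ratio is applied per row; `split_sizes` is a hypothetical helper written for illustration, not part of the model's code.

```python
def split_sizes(total_rows: int, test_fraction: float) -> tuple[int, int]:
    """Return (train_rows, test_rows) for a row-wise split.

    Hypothetical helper: the README states only the ratio, not how
    the split was actually performed.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Round the test share to the nearest whole row; the rest is train.
    test_rows = round(total_rows * test_fraction)
    train_rows = total_rows - test_rows
    return train_rows, test_rows


# Assumption: "0.66M pairs of rows" is taken as roughly 660,000 rows.
APPROX_TOTAL_ROWS = 660_000

train_rows, test_rows = split_sizes(APPROX_TOTAL_ROWS, test_fraction=0.015)
print(f"train: ~{train_rows:,} rows, test: ~{test_rows:,} rows")
```

Because the total is rounded to two significant figures, the results are estimates only; the exact counts depend on the real dataset size.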