Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ library_name: xgboost

 # Article Extraction Outcome Classifier

-A fast, lightweight classifier that categorizes web article extraction outcomes with

 ## Model Description

@@ -36,21 +36,19 @@ This model predicts whether HTML extraction succeeded, failed, or returned a non

 ## Performance

-

-
-
-
-
-
-
-
-
-
-
-```

-## Usage

 ```python
 import numpy as np

 # Article Extraction Outcome Classifier

+A fast, lightweight classifier that categorizes web article extraction outcomes with 90% accuracy.

 ## Model Description


 ## Performance

+~90% accuracy on a large, real-world test set, with strong performance on dominant classes

+| Class                     | Precision | Recall | F1-score | Support |
+| ------------------------- | --------- | ------ | -------- | ------- |
+| full_article_extracted    | 0.91      | 0.84   | 0.87     | 1,312   |
+| partial_article_extracted | 0.76      | 0.63   | 0.69     | 92      |
+| api_provider_error        | 0.95      | 0.93   | 0.94     | 627     |
+| other_failure             | 0.41      | 0.28   | 0.33     | 44      |
+| full_page_not_article     | 0.92      | 0.97   | 0.94     | 11,821  |
+| **Accuracy**              | —         | —      | **0.90** | 13,852  |
+| **Macro Avg**             | 0.79      | 0.73   | 0.72     | 13,852  |
+| **Weighted Avg**          | 0.90      | 0.90   | 0.90     | 13,852  |
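To make the difference between the Macro Avg and Weighted Avg rows concrete, here is a minimal sketch that recomputes both from the per-class rows of the Performance table. The `rows` dictionary and the helper names `macro_avg` and `weighted_avg` are illustrative assumptions, not part of the model's code. Because the per-class figures are rounded, treat this as a sanity check: it need not reproduce the reported summary rows exactly.

```python
# Per-class figures copied from the Performance table:
# (precision, recall, f1, support). The structure itself is hypothetical.
rows = {
    "full_article_extracted": (0.91, 0.84, 0.87, 1312),
    "partial_article_extracted": (0.76, 0.63, 0.69, 92),
    "api_provider_error": (0.95, 0.93, 0.94, 627),
    "other_failure": (0.41, 0.28, 0.33, 44),
    "full_page_not_article": (0.92, 0.97, 0.94, 11821),
}
METRICS = ("precision", "recall", "f1")


def macro_avg(rows, idx):
    """Unweighted mean: every class counts equally, however rare."""
    values = [r[idx] for r in rows.values()]
    return sum(values) / len(values)


def weighted_avg(rows, idx):
    """Support-weighted mean: large classes dominate the result."""
    total_support = sum(r[3] for r in rows.values())
    return sum(r[idx] * r[3] for r in rows.values()) / total_support


macro = {name: macro_avg(rows, i) for i, name in enumerate(METRICS)}
weighted = {name: weighted_avg(rows, i) for i, name in enumerate(METRICS)}

for name in METRICS:
    print(f"{name:<10} macro={macro[name]:.2f} weighted={weighted[name]:.2f}")
```

The macro average comes out lower than the weighted one because small classes such as `other_failure` (44 samples) count as much as `full_page_not_article` (11,821 samples) in the unweighted mean.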


 ```python
 import numpy as np