adgw
/

quality_classifier_pl

adgw committed on Jul 10, 2025

Commit 09a7ef8 (verified) · 1 Parent(s): 5382e15

Update README.md

Files changed (1)

README.md +1 -0

@@ -20,6 +20,7 @@ The script is designed for efficient, large-scale data processing. It leverages
 - **Scalable**: Capable of handling millions of documents by processing files sequentially and texts in parallel.
 - **Seamless Integration**: Appends classification results (`quality_ai` and `confidence`) directly to the original data, preserving all existing columns/keys.
 - **User-Friendly Progress**: Displays a `tqdm` progress bar to monitor the analysis in real-time.
+- **Language-Aware Filtering**: Automatically classifies all non-Polish texts as LOW quality, unless a multilingual mix (e.g., Polish-English) is detected, in which case the model’s prediction may vary accordingly.
 ## 3. How It Works
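
The language-aware filtering rule added in this commit could be sketched roughly as below. This is a minimal illustration and not the script's actual code: the function name `classify_with_language_gate`, the injected `detect_languages` and `predict_quality` callables, and the fixed confidence of `1.0` for rejected texts are all assumptions made for the example. Only the `quality_ai` and `confidence` output keys and the `LOW` label come from the README.

```python
from typing import Callable, Dict, Tuple


def classify_with_language_gate(
    text: str,
    detect_languages: Callable[[str], Dict[str, float]],
    predict_quality: Callable[[str], Tuple[str, float]],
) -> Dict[str, object]:
    """Apply a language gate before the quality model (hypothetical sketch).

    detect_languages: assumed helper returning language-code -> share of text.
    predict_quality: assumed wrapper around the classifier, returning
        (label, confidence).
    """
    shares = detect_languages(text)

    # No Polish detected at all: label as LOW without calling the model.
    # The confidence value of 1.0 is an assumption for this sketch.
    if shares.get("pl", 0.0) <= 0.0:
        return {"quality_ai": "LOW", "confidence": 1.0}

    # Pure Polish or a multilingual mix (e.g. Polish-English):
    # defer to the model's own prediction.
    label, confidence = predict_quality(text)
    return {"quality_ai": label, "confidence": confidence}
```

Passing the detector and the model in as callables keeps the sketch independent of any particular language-identification library; any detector that reports per-language shares could be plugged in.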