Joblib
Commit 09a7ef8 (verified), committed by adgw
1 Parent(s): 5382e15

Update README.md

  1. README.md +1 -0
README.md CHANGED
@@ -20,6 +20,7 @@ The script is designed for efficient, large-scale data processing. It leverages
  - **Scalable**: Capable of handling millions of documents by processing files sequentially and texts in parallel.
  - **Seamless Integration**: Appends classification results (`quality_ai` and `confidence`) directly to the original data, preserving all existing columns/keys.
  - **User-Friendly Progress**: Displays a `tqdm` progress bar to monitor the analysis in real-time.
+ - **Language-Aware Filtering**: Automatically classifies all non-Polish texts as LOW quality, unless a multilingual mix (e.g., Polish-English) is detected, in which case the model’s prediction may vary accordingly.

  ## 3. How It Works
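
The Language-Aware Filtering rule added in this commit can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the function name `apply_language_filter`, the use of ISO language codes, the `"HIGH"` label, and the choice to leave `confidence` unchanged on an override are all assumptions made for illustration. Only the `quality_ai` and `confidence` keys and the `LOW` label come from the README.

```python
def apply_language_filter(detected_langs, result):
    """Post-process one classification result using detected languages.

    detected_langs: set of language codes found in the text, e.g. {"pl", "en"}
                    (assumed to come from some language detector; hypothetical).
    result:         dict holding the model output under "quality_ai" and "confidence".
    """
    filtered = dict(result)  # copy, so any other keys in the record are preserved

    if "pl" in detected_langs:
        # Pure Polish, or a mix that includes Polish: keep the model's prediction.
        return filtered

    # No Polish detected: force the label to LOW.
    # Assumption: confidence is left as the model reported it.
    filtered["quality_ai"] = "LOW"
    return filtered


# Hypothetical inputs: an English-only text and a Polish-English mix.
english_only = apply_language_filter({"en"}, {"quality_ai": "HIGH", "confidence": 0.9})
mixed = apply_language_filter({"pl", "en"}, {"quality_ai": "HIGH", "confidence": 0.9})
```

In this sketch the English-only record is relabeled `LOW`, while the mixed record keeps whatever label the model produced.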