lfsm committed on
Commit 660b0ab · 1 Parent(s): 1e0c568

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -1,5 +1,5 @@
  ## CC_FILTER
- this is ja cc filter fo reference from ja wiki vs random ja mc4, and build with following procedure.
+ this is ja cc filter for reference from ja wiki vs random ja mc4, and build with following procedure.
  1. get ja wiki dump file, and extract the all url inside, get about 4M urls
  2. crawl 300K of 4M webpages from the urls
  3. get pure text and remove content len less than 1k,