| ABOUT_TEXT = """ | |
| ## Web Languages Project | |
| Welcome! This is a crowd-sourced effort to improve crawling | |
| of low-resource languages. This dataset is public. | |
| [Common Crawl](https://commoncrawl.github.io/cc-crawl-statistics/plots/languages) | |
| recognizes a lot of languages, but its statistics show that we don't | |
| have enough content in languages like Hindi (500 million speakers!), | |
| languages of smaller countries like Hungarian, and regional languages | |
| like Catalan. | |
| We are interested in languages from all over the world. If you choose | |
| to help, you'll be creating lists of websites related to | |
| languages that you read or speak. | |
| ### How can I contribute? | |
| If you look below you'll see a huge list of living languages. If you | |
| see one that looks interesting, click on it. You'll see a | |
| language-specific document, probably mostly blank, that you can fill | |
| out. | |
| There are two ways to add to this document. If you aren't very familiar | |
| with GitHub, you can copy the entire document into an email, fill it | |
| out, and send it to web-languages ZAT commoncrawl ZOT org. We'll do the rest. | |
| If you are familiar with GitHub and are logged in, click on the pen | |
| icon in the upper right corner to start editing the document. | |
| GitHub will ask you to fork the repo. Do that, edit the | |
| document, and finally create a pull request. | |
| To see a partially completed example, look at the | |
| [Welsh](living/welsh.md) entry. | |
| Sometimes asking a Large Language Model can be helpful: "What are some | |
| top websites written in the Welsh language?" | |
| ### What kind of websites are you looking for? | |
| If you look at the template, we have requested URLs in a few | |
| categories: News, Culture/History, Government, Political Parties, and | |
| Other. Remember that we're looking for websites in this particular | |
| language. If only part of the website is in the language, and that's | |
| visible in the URL, as in https://example.com/catalan/, then that's the | |
| URL you should add. | |
| For a language like Hindi, with 500 million speakers, there are a lot | |
| of websites to choose from. Please suggest websites that are important | |
| and influential, and please think about diversity. Are all geographic | |
| regions represented? | |
| For a country-wide language like Hungarian, there is probably still a | |
| wide variety of websites to choose from. If a website is entirely in | |
| English, however, that's not what we're looking for. | |
| For a regional language like Catalan, things are trickier. Catalan has | |
| multiple names -- it's called Valencian in some parts of Spain -- and | |
| use of the Catalan language is part of a vigorous debate in Spanish | |
| national and regional politics. You might not be able to find | |
| Catalan-language content for every political party, and government | |
| websites might offer Catalan content one day and remove it after | |
| the next election. In that case, please do the best you can. | |
| If your favorite language has its own Wikipedia -- [check the list here](https://en.wikipedia.org/wiki/List_of_Wikipedias) -- | |
| please include a link to it under "Other". | |
| """ |