nyuuzyou
287 followers · 33 following
https://ducks.party/donate
Recent Activity
Posted an update 2 days ago
🌐 NNTP Discussion Archives - 387M Messages from Public Newsgroups - https://huggingface.co/datasets/nyuuzyou/nntp-text-387m

Here's something different from the code datasets: 20+ years of public discussion archives from NNTP newsgroups. Clean Parquet format, but this time it's conversations instead of code.

Key Stats:
- 386,629,949 messages from 159,345 newsgroups
- 191 GB compressed Parquet storage
- Spans 2002-2026
- Multilingual: English, German, French, Italian, Dutch, Polish, Russian, and others
- Email addresses redacted for privacy

The data is messy in the way real discussions are messy. Spam wasn't filtered out - you get the advertisements, the arguments, the off-topic threads, all of it. If you want sanitized text, this isn't it. If you want to see how people actually talked online before Discord and Reddit took over, here you go.

Processing kept it simple: convert everything to UTF-8, remove exact duplicates, strip binary attachments, redact emails. Legacy character encodings were a nightmare - had to handle Windows-1252, ISO-8859 variants, KOI8-R, Shift-JIS, GBK, and others just to get readable text.

At least it was fun to do, and I think the result turned out pretty well. I hope someone else will also be able to have fun or gain something useful from this project.
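The processing steps named in the post (convert to UTF-8, remove exact duplicates, redact emails) could look roughly like the sketch below. This is not the author's actual pipeline: the function names, the fallback order of encodings, the email regex, and the `<EMAIL>` placeholder are all assumptions made for illustration, and attachment stripping is left out.

```python
import hashlib
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical pattern and placeholder; the post does not say how emails were matched.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
EMAIL_PLACEHOLDER = "<EMAIL>"


def decode_body(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode a message body to str, trying the declared charset first.

    Assumed fallback order: declared charset, UTF-8, Windows-1252,
    then Latin-1, which accepts any byte sequence.
    """
    candidates = [declared_charset] if declared_charset else []
    candidates += ["utf-8", "cp1252"]
    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown charset label: try the next candidate.
            continue
    return raw.decode("latin-1")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(EMAIL_PLACEHOLDER, text)


def process(messages: Iterable[Tuple[bytes, Optional[str]]]) -> List[str]:
    """Decode, drop exact duplicates, then redact emails, in that order."""
    seen = set()
    cleaned = []
    for raw, charset in messages:
        text = decode_body(raw, charset)
        # Exact-duplicate check on a hash of the decoded text.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append(redact_emails(text))
    return cleaned
```

Trusting the declared charset first matters because single-byte encodings such as KOI8-R decode almost any input without raising an error, so a blind fallback chain would silently produce unreadable text instead of failing.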
New activity 3 days ago: nyuuzyou/nntp-text-387m: [bot] Conversion to Parquet
Updated a dataset 3 days ago: nyuuzyou/nntp-text-387m
nyuuzyou's datasets (146), sorted by recently updated
nyuuzyou/nntp-text-387m • Viewer • Updated 3 days ago • 387M • 164 • 3
nyuuzyou/sprite-compositing • Viewer • Updated 4 days ago • 50.8k • 158
nyuuzyou/animations • Viewer • Updated 5 days ago • 50.8k • 89 • 2
nyuuzyou/begemot • Viewer • Updated 12 days ago • 2.73M • 138 • 3
nyuuzyou/edutexts • Viewer • Updated 13 days ago • 1.38M • 134 • 5
nyuuzyou/google-code-archive • Viewer • Updated 19 days ago • 65.8M • 766 • 35
nyuuzyou/moshub-code • Viewer • Updated 19 days ago • 15.7M • 101 • 2
nyuuzyou/gitverse-code • Viewer • Updated 19 days ago • 2.8M • 147 • 2
nyuuzyou/notabug-code • Viewer • Updated 19 days ago • 12.6M • 318 • 6
nyuuzyou/jihulab-code • Viewer • Updated 19 days ago • 1.85M • 166 • 3
nyuuzyou/gitflic-code • Viewer • Updated 19 days ago • 5.98M • 146 • 2
nyuuzyou/gitgud-code • Viewer • Updated 19 days ago • 16.3M • 252 • 2
nyuuzyou/gitcode-code • Viewer • Updated 19 days ago • 48.1M • 668 • 4
nyuuzyou/gitee-code • Viewer • Updated 20 days ago • 819M • 7.02k • 11
nyuuzyou/ru-QnA-333K • Viewer • Updated 20 days ago • 333k • 112 • 1
nyuuzyou/joyreactor • Updated Dec 26, 2025 • 5
nyuuzyou/pastvu • Updated Jul 26, 2025 • 11.4k • 3
nyuuzyou/uwupad • Preview • Updated Jul 11, 2025 • 76 • 1
nyuuzyou/juick • Preview • Updated Jul 6, 2025 • 256 • 1
nyuuzyou/soundbible • Viewer • Updated Jun 29, 2025 • 2.08k • 14 • 1
nyuuzyou/goodgame • Viewer • Updated Jun 23, 2025 • 39.3k • 5 • 4
nyuuzyou/glif • Viewer • Updated Jun 22, 2025 • 4.36M • 3
nyuuzyou/manus • Updated Jun 13, 2025 • 6
nyuuzyou/ambientcg • Viewer • Updated Jun 13, 2025 • 2.36k • 13 • 1
nyuuzyou/texturecan • Viewer • Updated Jun 12, 2025 • 650 • 68
nyuuzyou/pbrpx • Viewer • Updated Jun 12, 2025 • 710 • 12
nyuuzyou/textureninja • Viewer • Updated Jun 11, 2025 • 4.56k • 156
nyuuzyou/cc0-textures • Viewer • Updated Jun 10, 2025 • 21.1k • 18 • 1
nyuuzyou/ClevelandMuseumArt • Viewer • Updated Jun 7, 2025 • 67.9k • 3
nyuuzyou/Minecraft-Skins-20M • Viewer • Updated Jun 4, 2025 • 19.9M • 236 • 5