zzqsb committed on
Commit fb63150 · verified · 1 Parent(s): 993797a

Update README.md

Files changed (1)
  1. README.md +2 -13
README.md CHANGED
@@ -1,32 +1,21 @@
 # PDB 3Di Chains Dataset
 
 This repository contains chain-level sequences and 3Di tokens derived from RCSB PDB structures, with per-chain polymer class labels (`prot`, `DNA`, `RNA`, `other`). Files are chunked/merged from 1k-folders and cleaned for consistent CSV schema.
+Current data based on only 120K proteins.
 
 ## Files
 
 3di_chains_chaintag_all.csv
 474 MB
-xet
-Upload 4 files
-13 minutes ago
 
 3di_chains_chaintag_prot_only.csv
 464 MB
-xet
-Upload 4 files
-13 minutes ago
-
-3di_chains_chaintag_raw.csv
-482 MB
-xet
-Upload 4 files
-13 minutes ago
 
 3di_chains_chaintag_sample_1000.csv
 
+
 - `_all` contains **all chains** (including **D-amino**, **RNA**, **DNA**, and any others).
 - `_prot_only` contains **protein chains only** (L- and D-amino acids treated as protein).
-- `_raw` is the merged, minimally-processed export prior to minor repairs (e.g., malformed lines).
 - `_sample_1000` is a 1,000-row random sample for quick inspection.
 
 ## Schema