FWeindel committed · verified
Commit 75d481a · 1 Parent(s): 025163e

Upload README.md with huggingface_hub

Files changed (1): README.md (+10 −19)
README.md CHANGED
@@ -5,35 +5,26 @@ TReconLM is a decoder-only transformer model for trace reconstruction of noisy D
 ## Model Variants
 
 ### Pretrained Models (Fixed Length)
-
-| Model | Sequence Length | Description |
-|-------|-----------------|-------------|
-| `model_seq_len_60.pt` | 60nt | Pretrained on synthetic IDS data |
-| `model_seq_len_110.pt` | 110nt | Pretrained on synthetic IDS data |
-| `model_seq_len_180.pt` | 180nt | Pretrained on synthetic IDS data |
+- `model_seq_len_60.pt` (60nt)
+- `model_seq_len_110.pt` (110nt)
+- `model_seq_len_180.pt` (180nt)
 
 ### Pretrained Models (Variable Length)
-
-| Model | Sequence Length | Description |
-|-------|-----------------|-------------|
-| `model_var_len_50_120.pt` | 50-120nt | Pretrained on synthetic IDS data with variable sequence lengths |
+- `model_var_len_50_120.pt` (50-120nt)
 
 ### Fine-tuned Models
-
-| Model | Sequence Length | Description |
-|-------|-----------------|-------------|
-| `finetuned_noisy_dna_len60.pt` | 60nt | Fine-tuned on Noisy-DNA dataset |
-| `finetuned_microsoft_dna_len110.pt` | 110nt | Fine-tuned on Microsoft DNA dataset |
-| `finetuned_chandak_len117.pt` | 117nt | Fine-tuned on Chandak dataset |
-
-Each model supports reconstruction from cluster sizes between 2 and 10.
+- `finetuned_noisy_dna_len60.pt` (60nt, [Noisy-DNA dataset](https://doi.org/10.1038/s41467-020-14319-8))
+- `finetuned_microsoft_dna_len110.pt` (110nt, [Microsoft DNA dataset](https://doi.org/10.1109/ISIT45174.2021.9518012))
+- `finetuned_chandak_len117.pt` (117nt, [Chandak dataset](https://doi.org/10.1109/ICASSP40776.2020.9053441))
+
+All models support reconstruction from cluster sizes between 2 and 10.
 
 ## How to Use
 
 Tutorial notebooks are available in our [GitHub repository](https://github.com/MLI-lab/TReconLM) under `tutorial/`:
 
 - `quick_start.ipynb`: Run inference on synthetic datasets from HuggingFace
-- `custom_data.ipynb`: Run inference on your own data or real-world datasets (Microsoft DNA, Noisy-DNA)
+- `custom_data.ipynb`: Run inference on your own data or real-world datasets (Microsoft DNA, Noisy-DNA, Chandak)
 
 The test datasets used in the notebooks can be downloaded from [Hugging Face](https://huggingface.co/datasets/mli-lab/TReconLM_datasets).
@@ -47,4 +38,4 @@ For full experimental details, see [our paper](http://arxiv.org/abs/2507.12927).
 
 ## Limitations
 
-Models trained for fixed sequence lengths may perform worse on other lengths or if the test data distribution differs significantly from the training data. The variable-length model (`model_var_len_50_120.pt`) is trained with the same compute budget as the fixed-length models, so it sees less data per sequence length and performs slightly worse for a specific fixed length.
+Models trained for fixed sequence lengths may perform worse on other lengths or if the test data distribution differs significantly from the training data. The variable-length model (`model_var_len_50_120.pt`) is trained with the same compute budget as our fixed-length models, so it sees less data per sequence length and may perform slightly worse for a specific fixed length.
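Since the checkpoints listed in the updated README are hosted on the Hugging Face Hub, each `.pt` file can also be fetched directly by URL. A minimal sketch below builds that URL; note the repo id `mli-lab/TReconLM` is an assumption inferred from the linked dataset org (the diff never states the model repo id), while the `resolve/<revision>` path is the Hub's standard direct-download endpoint:

```python
# Hedged sketch: construct the direct-download URL for a checkpoint on the Hub.
# NOTE: the repo id "mli-lab/TReconLM" is an assumption; substitute the actual
# model repo id shown on the Hub model page.

def checkpoint_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Return the Hub's direct-download ("resolve") URL for a file in a repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Example: URL for the 60nt fixed-length checkpoint.
print(checkpoint_url("mli-lab/TReconLM", "model_seq_len_60.pt"))
```

In practice, `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` performs the same resolution and additionally caches the file locally, which is usually preferable to raw URLs.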