VyLala committed on
Commit 432213c (verified)
1 Parent(s): 9bfa4c2

Update mtdna_tool_explainer_updated.html

Files changed (1)
  1. mtdna_tool_explainer_updated.html +135 -135
mtdna_tool_explainer_updated.html CHANGED
@@ -1,135 +1,135 @@
- <!DOCTYPE html>
- <html lang="en">
- <head>
- <meta charset="UTF-8">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>mtDNA Tool – System Overview</title>
-
- <style>
- .custom-container {
- background-color: #ffffff !important;
- color: #222222 !important;
- font-family: Arial, sans-serif !important;
- line-height: 1.6 !important;
- padding: 2rem !important;
- max-width: 900px !important;
- margin: auto !important;
- }
-
- .custom-container h1,
- .custom-container h2,
- .custom-container h3,
- .custom-container strong,
- .custom-container b,
- .custom-container p,
- .custom-container li,
- .custom-container ol,
- .custom-container ul,
- .custom-container span {
- color: #222222 !important;
- font-weight: normal !important;
- }
-
- .custom-container h1,
- .custom-container h2 {
- font-weight: bold !important;
- }
-
- .custom-container img {
- max-width: 100%;
- border: 1px solid #ccc;
- padding: 5px;
- background: #fff;
- }
-
- .custom-container code {
- background: none !important;
- color: #222 !important;
- font-family: inherit !important;
- font-size: inherit !important;
- padding: 0 !important;
- border-radius: 0 !important;
- }
-
-
- .custom-container .highlight {
- background: #ffffcc;
- padding: 4px 8px;
- border-left: 4px solid #ffcc00;
- margin: 1rem 0;
- color: #333 !important;
- }
- </style>
- </head>
-
- <body>
- <div class="custom-container">
-
- <h1>mtDNA Location Classifier – Brief System Pipeline and Usage Guide</h1>
-
- <p>The <strong>mtDNA Tool</strong> is a lightweight pipeline designed to help researchers extract metadata such as geographic origin, sample type (ancient/modern), and optional niche labels (e.g., ethnicity, specific location) from mtDNA GenBank accession numbers. It supports batch input and produces structured Excel summaries.</p>
-
- <h2>System Overview Diagram</h2>
- <p>The figure below shows the core execution flow—from input accession to final output.</p>
- <img src="https://huggingface.co/spaces/VyLala/mtDNALocation/resolve/main/flowchart.png" alt="mtDNA Pipeline Flowchart">
-
-
- <h2>Key Steps</h2>
- <ol>
- <li><strong>Input</strong>: One or more GenBank accession numbers are submitted (e.g., via UI, CSV, or text).</li>
-
- <li><strong>Metadata Collection</strong>: Using <code>fetch_ncbi_metadata</code>, the pipeline retrieves metadata like country, isolate, collection date, and reference title. If available, supplementary material and full-text articles are parsed using DOI, PubMed, or Google Custom Search.</li>
-
- <li><strong>Text Extraction & Preprocessing</strong>:
- <ul>
- <li>All available documents are parsed and cleaned (tables, paragraphs, overlapping sections).</li>
- <li>Text is merged into two formats: a smaller <code>chunk</code> and a full <code>all_output</code>.</li>
- </ul>
- </li>
-
- <li><strong>LLM-based Inference (Gemini + RAG)</strong>:
- <ul>
- <li>Chunks are embedded with FAISS and stored for reuse.</li>
- <li>The Gemini model answers specific queries like predicted country, sample type, and any niche label requested by the user.</li>
- </ul>
- </li>
-
- <li><strong>Result Structuring</strong>:
- <ul>
- <li>Each output includes predicted fields + explanation text (methods used, quotes, sources).</li>
- <li>Summarized and saved using <code>save_to_excel</code>.</li>
- </ul>
- </li>
- </ol>
-
- <h2>Output Format</h2>
- <p>The final output is an Excel file with the following fields:</p>
- <ul>
- <li><code>Sample ID</code></li>
- <li><code>Predicted Country</code> and <code>Country Explanation</code></li>
- <li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
- <li><code>Sources</code> (links to articles)</li>
- <li><code>Time Cost</code></li>
- </ul>
-
- <h2>System Highlights</h2>
- <ul>
- <li>RAG + Gemini integration for improved explanation and transparency</li>
- <li>Excel export for structured research use</li>
- <li>Optional ethnic/location/language inference using isolate names</li>
- <li>Quality check (e.g., fallback on short explanations, low token count)</li>
- <li>Report Button – After results are displayed, users can submit errors or mismatches using the report text box below the output table</li>
- </ul>
-
- <h2>Citation</h2>
- <div class="highlight">
- Phung, V. (2025). mtDNA Location Classifier. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
- </div>
-
- <h2>Contact</h2>
- <p>If you are a researcher working with historical mtDNA data or edge-case accessions and need scalable inference or logging, reach out through the HuggingFace space or email provided in the repo README.</p>
-
- </div>
- </body>
- </html>
-
 
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>BioMetadataAudit – System Overview</title>
+
+ <style>
+ .custom-container {
+ background-color: #ffffff !important;
+ color: #222222 !important;
+ font-family: Arial, sans-serif !important;
+ line-height: 1.6 !important;
+ padding: 2rem !important;
+ max-width: 900px !important;
+ margin: auto !important;
+ }
+
+ .custom-container h1,
+ .custom-container h2,
+ .custom-container h3,
+ .custom-container strong,
+ .custom-container b,
+ .custom-container p,
+ .custom-container li,
+ .custom-container ol,
+ .custom-container ul,
+ .custom-container span {
+ color: #222222 !important;
+ font-weight: normal !important;
+ }
+
+ .custom-container h1,
+ .custom-container h2 {
+ font-weight: bold !important;
+ }
+
+ .custom-container img {
+ max-width: 100%;
+ border: 1px solid #ccc;
+ padding: 5px;
+ background: #fff;
+ }
+
+ .custom-container code {
+ background: none !important;
+ color: #222 !important;
+ font-family: inherit !important;
+ font-size: inherit !important;
+ padding: 0 !important;
+ border-radius: 0 !important;
+ }
+
+
+ .custom-container .highlight {
+ background: #ffffcc;
+ padding: 4px 8px;
+ border-left: 4px solid #ffcc00;
+ margin: 1rem 0;
+ color: #333 !important;
+ }
+ </style>
+ </head>
+
+ <body>
+ <div class="custom-container">
+
+ <h1>BioMetadataAudit – Brief System Pipeline and Usage Guide</h1>
+
+ <p>The <strong>BioMetadataAudit Tool</strong> is a lightweight pipeline that helps researchers extract metadata such as geographic origin, sample type (ancient/modern), and optional niche labels (e.g., ethnicity, specific location) for GenBank accession numbers, drawing on the associated NCBI records and linked publications. It supports batch input and produces structured Excel summaries.</p>
+
+ <h2>System Overview Diagram</h2>
+ <p>The figure below shows the core execution flow, from input accession to final output.</p>
+ <img src="https://huggingface.co/spaces/VyLala/mtDNALocation/resolve/main/flowchart.png" alt="BioMetadataAudit Pipeline Flowchart">
+
+
+ <h2>Key Steps</h2>
+ <ol>
+ <li><strong>Input</strong>: One or more GenBank accession numbers are submitted (e.g., via the UI, a CSV file, or plain text).</li>
+
+ <li><strong>Metadata Collection</strong>: Using <code>fetch_ncbi_metadata</code>, the pipeline retrieves NCBI metadata such as country, isolate, collection date, and reference title. When available, linked full-text articles and supplementary materials are located via DOI, PubMed, or Google Custom Search and then parsed.</li>
+
+ <li><strong>Text Extraction &amp; Preprocessing</strong>:
+ <ul>
+ <li>All available documents are parsed and cleaned (tables, paragraphs, and overlapping sections).</li>
+ <li>The text is merged into two versions: a smaller <code>chunk</code> and the full <code>all_output</code>.</li>
+ </ul>
+ </li>
+
+ <li><strong>LLM-based Inference (Gemini + RAG)</strong>:
+ <ul>
+ <li>Chunks are converted to embeddings and indexed with FAISS; the index is stored for reuse.</li>
+ <li>The Gemini model answers targeted queries, such as the predicted country, the sample type, and any niche label requested by the user.</li>
+ </ul>
+ </li>
+
+ <li><strong>Result Structuring</strong>:
+ <ul>
+ <li>Each result includes the predicted fields plus explanation text (methods used, quotes, and sources).</li>
+ <li>Results are summarized and saved using <code>save_to_excel</code>.</li>
+ </ul>
+ </li>
+ </ol>
+
+ <h2>Output Format</h2>
+ <p>The final output is an Excel file with the following fields:</p>
+ <ul>
+ <li><code>Sample ID</code></li>
+ <li><code>Predicted Country</code> and <code>Country Explanation</code></li>
+ <li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
+ <li><code>Sources</code> (links to articles)</li>
+ <li><code>Time Cost</code></li>
+ </ul>
+
+ <h2>System Highlights</h2>
+ <ul>
+ <li>RAG + Gemini integration for clearer, more transparent explanations</li>
+ <li>Excel export for structured research use</li>
+ <li>Optional ethnicity, location, and language inference from isolate names</li>
+ <li>Quality checks (e.g., a fallback when an explanation is too short or the token count is low)</li>
+ <li>Report button – after results are displayed, users can report errors or mismatches in the text box below the output table</li>
+ </ul>
+
+ <h2>Citation</h2>
+ <div class="highlight">
+ Phung, V. (2025). BioMetadataAudit. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
+ </div>
+
+ <h2>Contact</h2>
+ <p>If you are a researcher working with genomic data or edge-case accessions and need scalable inference or logging, reach out through the HuggingFace Space or the email address listed in the repository README.</p>
+
+ </div>
+ </body>
+ </html>
+
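The Input step in the explainer accepts accession numbers from the UI, a CSV file, or plain text. A minimal Python sketch of normalizing such input is shown below. It assumes a simplified pattern for common GenBank nucleotide accession shapes (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix); it does not cover every NCBI identifier (RefSeq IDs such as `NC_...` or WGS accessions are rejected). `parse_accessions` and `ACCESSION_RE` are hypothetical names, not the tool's code.

```python
import re

# Hypothetical, simplified pattern for common GenBank nucleotide accessions:
# 1 letter + 5 digits, or 2 letters + 6 or 8 digits, plus an optional ".N" version.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?")


def parse_accessions(raw: str) -> tuple[list[str], list[str]]:
    """Split pasted text into (valid, invalid) accessions.

    Commas, semicolons, spaces, and newlines all act as separators; tokens are
    upper-cased and duplicates are dropped while preserving input order.
    """
    valid: list[str] = []
    invalid: list[str] = []
    seen: set[str] = set()
    for token in re.split(r"[,;\s]+", raw.strip()):
        acc = token.upper()
        if not acc or acc in seen:
            continue
        seen.add(acc)
        if ACCESSION_RE.fullmatch(acc):
            valid.append(acc)
        else:
            invalid.append(acc)
    return valid, invalid


if __name__ == "__main__":
    ok, bad = parse_accessions("AB123456, MN012345.1\nkx654321 not-an-id AB123456")
    print(ok)   # ['AB123456', 'MN012345.1', 'KX654321']
    print(bad)  # ['NOT-AN-ID']
```

Returning rejected tokens instead of silently dropping them lets a UI tell the user which inputs were not processed.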
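The Metadata Collection step reads fields such as country, isolate, and collection date from the NCBI record. The explainer does not show how `fetch_ncbi_metadata` works, so the sketch below only assumes that a GenBank flat-file record is already in hand (for example, text returned by the NCBI E-utilities `efetch` endpoint with `db=nuccore`, `rettype=gb`, `retmode=text`) and pulls the quoted `/key="value"` qualifiers out of its `source` feature. `parse_source_qualifiers` is a hypothetical helper, the record in the usage example is made up, and the regex handles quoted, possibly wrapped values but not every edge case of the format.

```python
import re

# In a GenBank flat file the "source" feature key starts in column 6; its block ends
# at the next feature key (also in column 6), the ORIGIN line, or the end of the text.
SOURCE_RE = re.compile(r"^ {5}source .*?(?=^ {5}\S|^ORIGIN|\Z)", re.M | re.S)
# Quoted qualifiers such as /country="..."; values may wrap onto continuation lines.
QUALIFIER_RE = re.compile(r'/(\w+)="([^"]*)"')


def parse_source_qualifiers(record: str) -> dict[str, str]:
    """Return the source feature's quoted qualifiers with whitespace collapsed."""
    block = SOURCE_RE.search(record)
    if block is None:
        return {}
    return {key: " ".join(value.split()) for key, value in QUALIFIER_RE.findall(block.group(0))}


if __name__ == "__main__":
    record = "\n".join([
        "FEATURES             Location/Qualifiers",
        "     source          1..16569",
        '                     /organism="Homo sapiens"',
        '                     /isolate="Sample_01"',
        '                     /country="Viet Nam: northern',
        '                     region"',
        '                     /collection_date="2015"',
        "     D-loop          16024..16569",
        "ORIGIN",
    ])
    meta = parse_source_qualifiers(record)
    print(meta["country"], "|", meta["isolate"], "|", meta["collection_date"])
    # Viet Nam: northern region | Sample_01 | 2015
```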
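The Text Extraction and LLM-based Inference steps split documents into chunks, index them with FAISS, and let Gemini answer from the retrieved context. The sketch below mirrors that retrieval flow under stated assumptions: word-window chunking with overlap (window sizes are hypothetical), bag-of-words cosine similarity with exact brute-force search standing in for dense embeddings in a FAISS index, and a prompt string in place of an actual Gemini call. None of these function names come from the tool.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the last word
    return chunks


def to_vector(text: str) -> Counter:
    """Lower-cased word counts: a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Exact nearest-neighbour search by cosine similarity over all chunks."""
    q = to_vector(query)
    return sorted(chunks, key=lambda c: cosine(q, to_vector(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    joined = "\n---\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer with a value and a short explanation."


if __name__ == "__main__":
    doc = ("The samples were collected in northern Vietnam in 2015. "
           "DNA was extracted from modern blood samples. "
           "Sequencing used the Illumina platform.")
    chunks = split_into_chunks(doc, size=8, overlap=2)
    best = top_k_chunks("Which country were the samples collected in?", chunks, k=1)
    print(best)  # ['The samples were collected in northern Vietnam in']
    print(build_prompt("Which country were the samples collected in?", best))
```

Overlapping windows reduce the chance that a key phrase is split across two chunks and missed by retrieval; a FAISS index replaces the linear scan in `top_k_chunks` when the number of chunks grows.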
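The System Highlights mention a quality check that falls back when an explanation is short or the token count is low. One way such a check could work is sketched below, under assumptions that are all hypothetical: a word count as the token proxy, a five-word threshold, and a small list of placeholder values. A first predictor is asked, and a second one (for instance, a query over the full `all_output` text instead of the smaller `chunk`) runs only when the first answer looks weak.

```python
from dataclasses import dataclass
from typing import Callable

PLACEHOLDER_VALUES = {"", "unknown", "n/a", "not found"}  # hypothetical list


@dataclass
class Prediction:
    value: str
    explanation: str
    method: str  # which input produced the answer, e.g. "chunk" or "all_output"


def is_low_quality(pred: Prediction, min_tokens: int = 5) -> bool:
    """Flag placeholder values or explanations shorter than `min_tokens` words."""
    too_short = len(pred.explanation.split()) < min_tokens
    return pred.value.strip().lower() in PLACEHOLDER_VALUES or too_short


def predict_with_fallback(primary: Callable[[], Prediction],
                          fallback: Callable[[], Prediction],
                          min_tokens: int = 5) -> Prediction:
    """Return the primary answer unless it looks weak; then try the fallback."""
    first = primary()
    if not is_low_quality(first, min_tokens):
        return first
    second = fallback()
    # Keep the first answer if the fallback is no better, so a result is always returned.
    return first if is_low_quality(second, min_tokens) else second


if __name__ == "__main__":
    weak = lambda: Prediction("Vietnam", "Stated.", "chunk")
    strong = lambda: Prediction("Vietnam", "The methods section says samples came from Vietnam.", "all_output")
    print(predict_with_fallback(weak, strong).method)  # all_output
```

Passing the predictors as callables means the more expensive fallback query is only issued when it is actually needed.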
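The Output Format section lists the columns of the final Excel file. A minimal sketch of assembling rows in that column order follows; it assumes `Time Cost` is measured in seconds and that multiple source links share one cell. `make_row` and `rows_to_csv` are hypothetical helpers, and the standard-library `csv` module stands in for `save_to_excel`, whose implementation the explainer does not show.

```python
import csv
import io

# Column order taken from the explainer's Output Format list.
COLUMNS = [
    "Sample ID",
    "Predicted Country",
    "Country Explanation",
    "Predicted Sample Type",
    "Sample Type Explanation",
    "Sources",
    "Time Cost",
]


def make_row(sample_id: str, country: str, country_expl: str,
             sample_type: str, type_expl: str,
             sources: list[str], seconds: float) -> dict[str, str]:
    """Build one output row; source links are joined into a single cell."""
    return {
        "Sample ID": sample_id,
        "Predicted Country": country,
        "Country Explanation": country_expl,
        "Predicted Sample Type": sample_type,
        "Sample Type Explanation": type_expl,
        "Sources": "; ".join(sources),
        "Time Cost": f"{seconds:.1f}s",  # assumed unit: seconds
    }


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    row = make_row("AB123456", "Vietnam", "Stated in the methods section.",
                   "modern", "Blood samples from living donors.",
                   ["https://example.org/article"], 12.34)
    print(rows_to_csv([row]))
```

With pandas and openpyxl installed, `pandas.DataFrame(rows, columns=COLUMNS).to_excel(path, index=False)` would write the same rows to an .xlsx file.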