<!DOCTYPE html>
<html style="font-size: 16px;" lang="en"><head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta charset="utf-8">
<meta name="keywords" content="🧬 mtDNA Location Classifier">
<meta name="description" content="">
<title>About</title>
<link rel="stylesheet" href="{{ url_for('static', filename='nicepage.css') }}" media="screen">
<link rel="stylesheet" href="{{ url_for('static', filename='About.css') }}" media="screen">

<link id="u-page-google-font" rel="stylesheet" href="https://fonts.googleapis.com/css2?display=swap&family=Roboto:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;0,800;0,900;1,100;1,200;1,300;1,400;1,500;1,600;1,700;1,800;1,900&family=Open+Sans:ital,wght@0,300;0,400;0,500;0,600;0,700;0,800;1,300;1,400;1,500;1,600;1,700;1,800&family=Roboto+Slab:wght@100;200;300;400;500;600;700;800;900">
<meta name="theme-color" content="#478ac9">
<meta property="og:title" content="About">
<meta property="og:type" content="website">
</head>
<body data-path-to-root="./" data-include-products="false" class="u-body u-xl-mode" data-lang="en">
<header class="u-clearfix u-custom-color-1 u-header u-header" id="header">
<div class="u-clearfix u-sheet u-sheet-1">
<p class="u-align-center u-custom-font u-font-roboto-slab u-text u-text-1">{% if isvip %}Premium{% else %}Freemium{% endif %}</p>
<h1 class="u-custom-font u-font-roboto-slab u-text u-text-color-var u-title u-text-2" spellcheck="false"> 🧬 mtDNA Location Classifier</h1>
<nav class="u-menu u-menu-one-level u-menu-1" role="navigation">
<div class="u-custom-menu u-nav-container">
<ul role="menubar" class="u-nav u-unstyled">
<li role="none" class="u-nav-item"><a tabindex="-1" role="menuitem" class="u-button-style u-nav-link" href="{{ url_for('home') }}" aria-haspopup="true">Home</a></li>
<li role="none" class="u-nav-item"><a role="menuitem" class="u-button-style u-nav-link" href="{{ url_for('about') }}">About</a></li>
<li role="none" class="u-nav-item"><a role="menuitem" class="u-button-style u-nav-link" href="{{ url_for('pricing') }}">Pricing</a></li>
<li role="none" class="u-nav-item"><a role="menuitem" class="u-button-style u-nav-link" href="{{ url_for('contact') }}">Contact</a></li>
</ul>
</div>
</nav>
</div>
</header>
|
|
<section class="u-clearfix u-section-1" id="block-1">
<div class="u-clearfix u-sheet u-sheet-1">
<div class="custom-container">

<h3>Brief System Pipeline and Usage Guide</h3>

<p>The <strong>mtDNA Tool</strong> is a lightweight pipeline that helps researchers infer metadata for mtDNA sequences from their GenBank accession numbers. It infers geographic origin, sample type (ancient or modern), and optional niche labels such as ethnicity or a specific location. It accepts accessions in batches and produces structured Excel summaries.</p>
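<p>Batch input comes down to turning pasted text or a CSV column into a clean list of accession numbers. The JavaScript below is a hedged sketch, not the tool's own code. <code>parseAccessions</code> is a hypothetical name. The pattern assumes the common GenBank nucleotide formats (one letter plus five digits, two letters plus six digits, or two letters plus eight digits), each with an optional version suffix such as <code>.1</code>.</p>
<pre><code class="language-javascript">
```javascript
// Hypothetical helper, not part of the mtDNA Tool.
// Assumed accession formats: 1 letter + 5 digits, 2 letters + 6 digits,
// or 2 letters + 8 digits, optionally followed by a version such as ".1".
const ACCESSION_RE = /^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$/;

// Split pasted text or a CSV column into unique, upper-cased accessions.
// Tokens that do not match the assumed formats are returned as invalid.
function parseAccessions(raw) {
  const valid = [];
  const invalid = [];
  const seen = new Set();
  for (const token of raw.split(/[\s,;]+/)) {
    if (!token) continue; // skip empty pieces from leading/trailing separators
    const acc = token.toUpperCase();
    if (seen.has(acc)) continue; // drop duplicates, keeping first-seen order
    seen.add(acc);
    (ACCESSION_RE.test(acc) ? valid : invalid).push(acc);
  }
  return { valid, invalid };
}
```
</code></pre>
<p>Rejected tokens are returned in their own list rather than silently dropped. This lets a batch run report which inputs were never queried.</p>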
|
|
<h3>System Overview Diagram</h3>
<p>The figure below shows the core execution flow, from the input accession number to the final output.</p>
<div class="u-clearfix u-sheet u-valign-middle-xl u-valign-top-lg u-valign-top-md u-valign-top-sm u-valign-top-xs u-sheet-1">
<img
class="u-image u-image-default u-image-1"
src="https://huggingface.co/spaces/VyLala/mtDNALocation/resolve/main/flowchart.png"
alt="mtDNA Pipeline Flowchart">
</div>
<h3>Key Steps</h3>
<ol>
<li><strong>Input</strong>: One or more GenBank accession numbers are submitted, for example typed into the UI, uploaded as a CSV file, or pasted as text.</li>


<li><strong>Metadata Collection</strong>: <code>fetch_ncbi_metadata</code> retrieves metadata from the NCBI record, such as country, isolate, collection date, and reference title. The associated publication may be locatable through its DOI, PubMed, or Google Custom Search. In that case, its full text and supplementary material are also retrieved and parsed.</li>


<li><strong>Text Extraction &amp; Preprocessing</strong>:
<ul>
<li>All retrieved documents are parsed and cleaned, including tables, paragraphs, and overlapping sections.</li>
<li>The cleaned text is merged into two versions: a shorter <code>chunk</code> and the complete <code>all_output</code>.</li>
</ul>
</li>


<li><strong>LLM-based Inference (Gemini + RAG)</strong>:
<ul>
<li>The text is split into chunks, which are embedded and indexed with FAISS. The index is stored for reuse.</li>
<li>Gemini answers targeted queries using the retrieved chunks as context. These cover the predicted country, the sample type, and any niche label the user requested.</li>
</ul>
</li>


<li><strong>Result Structuring</strong>:
<ul>
<li>Each result contains the predicted fields plus an explanation (methods used, supporting quotes, and sources).</li>
<li>Results are summarized and saved to Excel with <code>save_to_excel</code>.</li>
</ul>
</li>
</ol>
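<p>Steps 3 and 4 above follow the usual retrieval-augmented generation pattern. The text is split into chunks, each chunk is embedded, and only the chunks closest to the query are passed to the model. The sketch below illustrates that pattern under stated assumptions; it is not the pipeline's implementation. The chunk size and overlap are arbitrary, and a term-frequency "embedding" stands in for a real embedding model. A brute-force cosine search stands in for the FAISS index. <code>chunkText</code>, <code>embed</code>, <code>cosine</code>, and <code>topK</code> are hypothetical names.</p>
<pre><code class="language-javascript">
```javascript
// Hypothetical RAG retrieval sketch; not the mtDNA Tool's code.

// Split text into overlapping word windows (size/overlap are illustrative).
function chunkText(text, size = 50, overlap = 10) {
  const step = size - overlap;
  if (step < 1) throw new RangeError("size must be larger than overlap");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once a window reaches the end so the tail is not repeated.
    if (start + size >= words.length) break;
  }
  return chunks;
}

// Toy embedding: lowercase term-frequency map, standing in for a real model.
function embed(text) {
  const vec = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    vec.set(word, (vec.get(word) || 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse term-frequency vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, count] of a) {
    normA += count * count;
    if (b.has(term)) dot += count * b.get(term);
  }
  for (const count of b.values()) normB += count * count;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Exact (brute-force) top-k search over all chunks, best match first.
function topK(chunks, query, k = 3) {
  const q = embed(query);
  return chunks
    .map((text, i) => ({ i, text, score: cosine(embed(text), q) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```
</code></pre>
<p>A FAISS index replaces the linear scan in <code>topK</code>. For example, <code>IndexFlatIP</code> performs the same exact inner-product search, which equals cosine similarity when the vectors are normalized.</p>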
|
|
<h3>Output Format</h3>
<p>The final output is an Excel file with the following fields:</p>
<ul>
<li><code>Sample ID</code></li>
<li><code>Predicted Country</code> and <code>Country Explanation</code></li>
<li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
<li><code>Sources</code> (links to articles)</li>
<li><code>Time Cost</code></li>
</ul>
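<p>Each row of this file can be modeled as one record keyed by the field names listed above. The sketch below writes such records as CSV, which Excel opens directly. It is an assumption-laden stand-in for <code>save_to_excel</code>, whose internals are not described here; writing a native .xlsx file would need a third-party library. <code>COLUMNS</code>, <code>csvCell</code>, and <code>toCsv</code> are hypothetical names. Joining several source links with "; " is an illustrative choice.</p>
<pre><code class="language-javascript">
```javascript
// Hypothetical export sketch; column order follows the field list above.
const COLUMNS = [
  "Sample ID",
  "Predicted Country", "Country Explanation",
  "Predicted Sample Type", "Sample Type Explanation",
  "Sources", "Time Cost",
];

// Format one value as a CSV cell.
function csvCell(value) {
  let s;
  if (value === undefined || value === null) s = "";
  else if (Array.isArray(value)) s = value.join("; "); // e.g. several source links
  else s = String(value);
  // Quote only when needed, doubling embedded quotes (RFC 4180 style).
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Header row plus one line per record; missing fields become empty cells.
function toCsv(rows) {
  const lines = [COLUMNS.map(csvCell).join(",")];
  for (const row of rows) {
    lines.push(COLUMNS.map((col) => csvCell(row[col])).join(","));
  }
  return lines.join("\r\n");
}
```
</code></pre>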
|
|
<h3>System Highlights</h3>
<ul>
<li>RAG with Gemini, so each prediction comes with an explanation tied to its sources</li>
<li>Excel export for structured research use</li>
<li>Optional inference of ethnicity, location, or language from isolate names</li>
<li>Quality checks, such as a fallback when an explanation is too short or has a low token count</li>
<li>Report button: after results are displayed, users can report errors or mismatches in the text box below the output table</li>
</ul>
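<p>The quality check can be pictured as a gate on explanation length. In the hypothetical sketch below, an explanation with fewer whitespace-separated tokens than a threshold triggers a caller-supplied fallback. One example fallback would be re-asking the model with the full <code>all_output</code> text instead of the shorter <code>chunk</code>. The threshold, the whitespace tokenizer, and that fallback strategy are assumptions, not documented behavior of the tool. <code>countTokens</code> and <code>withFallback</code> are hypothetical names.</p>
<pre><code class="language-javascript">
```javascript
// Hypothetical quality gate; the tool's real threshold is not documented here.
const MIN_EXPLANATION_TOKENS = 20;

// Whitespace tokens are a rough proxy for the model tokenizer's count.
function countTokens(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Keep the first answer if its explanation is long enough; otherwise run
// the fallback (e.g. a retry with more context) and flag that it was used.
function withFallback(primary, runFallback, minTokens = MIN_EXPLANATION_TOKENS) {
  if (countTokens(primary.explanation) >= minTokens) {
    return { ...primary, usedFallback: false };
  }
  return { ...runFallback(), usedFallback: true };
}
```
</code></pre>
<p>The fallback is passed as a function, so the more expensive second call happens only when the first explanation fails the check.</p>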
|
|
<h3>Citation</h3>
<div class="highlight">
Phung, V. (2025). mtDNA Location Classifier. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
</div>
|
|
<h3>Contact</h3>
<p>Researchers working with historical mtDNA data or edge-case accessions who need scalable inference or logging can get in touch. Use the Hugging Face Space or the email address listed in the repository README.</p>


</div>
</div>
</section>
<footer class="u-align-center u-clearfix u-color-var u-container-align-center u-footer u-valign-top-lg u-valign-top-md u-valign-top-sm u-valign-top-xs u-footer" id="footer"><div class="u-clearfix u-sheet u-sheet-1">
<p class="u-align-right u-custom-font u-font-roboto-slab u-small-text u-text u-text-variant u-text-white u-text-1">Contact<br>
</p>
</div>
</footer>
<section class="u-backlink u-clearfix u-grey-80"></section>
|
|
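<script>
// Carry the current query string over to same-site links.
document.addEventListener('click', function (e) {
  // Leave modified clicks (new tab/window) and already-handled events to the browser.
  if (e.defaultPrevented || e.button !== 0 || e.metaKey || e.ctrlKey || e.shiftKey || e.altKey) return;

  const a = e.target.closest('a[href]');
  if (!a || a.target === '_blank') return;

  const href = a.getAttribute('href');
  if (!href || href.startsWith('#')) return;

  // Resolve relative links; skip other origins and non-http schemes such as mailto:.
  const url = new URL(href, window.location.href);
  if (url.origin !== window.location.origin) return;

  // Copy parameters the link does not already set; the URL API keeps any #fragment at the end.
  new URLSearchParams(window.location.search).forEach(function (value, key) {
    if (!url.searchParams.has(key)) url.searchParams.set(key, value);
  });

  e.preventDefault();
  window.location.href = url.toString();
});
</script>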
</body></html>