--- title: ClinFly emoji: ✈️ sdk_version: 1.25.0 sdk: streamlit app_file: clinfly_app_st.py pinned: true --- # ClinFly ![](img/clinfly_logo.png) Contact : [kevin.yauy@chu-montpellier.fr](mailto:kevin.yauy@chu-montpellier.fr) ## Introduction ClinFly is an automated framework designed to facilitate precision medicine (PM) for rare diseases. It addresses the challenge of precision phenotyping and data sharing across different languages. ClinFly can anonymize, translate, and summarize clinical reports using Human Phenotype Ontology (HPO) terms, ensuring compliance with medical data privacy standards. The output includes a de-identified translated clinical report and a summary report in HPO format. By streamlining the translation and anonymization of clinical reports, ClinFly aims to enhance inter-hospital data sharing, expedite medical discoveries, and pave the way for an international patient file accessible to non-English speakers. ## Pipeline ![](img/pipeline.png) ## Installation To install ClinFly on your local machine, you need the `poetry` package manager. Navigate to the project folder and run: ``` poetry install ``` You'll need [`poppler`](https://pdf2image.readthedocs.io/en/latest/installation.html) and [`tesseract-ocr`](https://tesseract-ocr.github.io/tessdoc/Installation.html) installed in your os to use OCR on pdf file. If you need to generate a `requirements.txt` file, use the following command: ``` poetry export --without-hashes --format=requirements.txt > requirements.txt ``` ## Usage ### Graphical User Interface For single report usage with interactive analysis, ClinFly provides a web application accessible at https://huggingface.co/spaces/kyauy/ClinFly. To run the Streamlit application on your local computer, activate the poetry shell and run the `clinfly_app_st.py` file: ``` poetry shell streamlit run clinfly_app_st.py ``` ### Command Line Interface For processing multiple reports with offline options, use the command line interface provided by `clinfly_app_cli.py`. The input should be a TSV .tsv file structured as follows (see `data/test.tsv` for an example): ``` Report_id_1 Doe John Report text ... Report_id_X Doe John Report text ``` You can also put a symlink to a pdf file (see `data/test_pdf.tsv` for an example) ``` Report_id_X Doe John Example.pdf ``` Outputs will be placed in the `results` folder according to the file extension, using first three columns in filename. - The deidentify report will be generated and placed in the `results/Reports` folder. - Three HPO extraction outputs will be generated in `TSV`, `TXT` and `JSON` folders. To run the CLI application on your local computer : ``` poetry shell clinfly_app_cli.py --file --language --model_dir --result_dir ``` ## Acknoledgments and reference ClinFly is a tool developed by University Hospital of Montpellier and Hospices Civils de Lyon. Thanks to the developers, especially Enody Gernet for his contribution. If you use ClinFly, please cite: > Gauthier et al., Assessing feasibility and risk to translate, de-identify and summarize medical letters using deep learning. medrXiv (2023). [https://doi.org/10.1101/2023.07.27.23293234](https://doi.org/10.1101/2023.07.27.23293234)