Documented Excel Files
This file lists the Excel files that have been analyzed and documented.
./www/multi-omicsdata.xlsx
./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx
comp_log2FC_RegulatedData_TRMTEXterm.xlsx tabulates log₂ fold-change values for 17,483 genes (rows) across 198 transcription factors (columns) in the TRM→TexTerm regulated-data comparison. The first column ("Unnamed: 0") lists each gene's identifier (e.g. "0610005C13RIK"); each subsequent column is named for a TF (Ahr, Arid3a, Arnt, …, Zscan20) and contains the corresponding log₂ fold-change value.
For instance, a value of 19.615925 in row 0610009B22RIK under Arnt indicates that gene 0610009B22RIK exhibited a log₂ fold-change of 19.615925 in the Arnt-associated regulated data when comparing TRM to TexTerm.
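A lookup like the one above can be sketched in pandas. The miniature DataFrame here only mirrors the described layout (two genes, two TF columns); the commented `read_excel` call shows how the real workbook would be loaded, assuming its first column serves as the gene index.

```python
import pandas as pd

# Minimal sketch of the layout described above: gene identifiers as the
# row index, one column per TF holding log2 fold-change values.
# For the real workbook you would instead load:
#   df = pd.read_excel(
#       "./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx",
#       index_col=0,
#   )
df = pd.DataFrame(
    {"Ahr": [0.12, -1.30], "Arnt": [19.615925, 0.07]},  # Ahr values are placeholders
    index=["0610009B22RIK", "0610005C13RIK"],
)

# Look up the log2 fold-change of one gene under one TF.
value = df.loc["0610009B22RIK", "Arnt"]
```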
./www/old files/log2FC_RegulatedData_TRMTEXterm.xlsx
./www/tablePagerank/MP.xlsx
MP.xlsx tabulates performance scores for 57 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g., Mackay, Chung, Scott, etc.).
EvaluationDataset is the dataset on which performance was assessed.
Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
For example, a cell value of 0.72 in row GATA1 under column MP_Mackay_Chung means that the MP scoring method—trained on the Mackay dataset—achieved a performance score of 0.72 when evaluated on the Chung dataset.
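Since the method names contain no underscores (dots appear in names like TRM.IEL, but not underscores), a column name can be split into its three parts with a plain string split. The helper below is a hypothetical illustration, not part of the workbook:

```python
# Hypothetical helper: decompose a column name of the form
# Method_TrainingDataset_EvaluationDataset into its three parts.
def parse_column(name: str) -> dict:
    method, training, evaluation = name.split("_")
    return {"method": method, "training": training, "evaluation": evaluation}

parsed = parse_column("MP_Mackay_Chung")
# parsed == {"method": "MP", "training": "Mackay", "evaluation": "Chung"}
```

The same parsing applies to every per-state workbook in ./www/tablePagerank/, since they all share this column convention.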
./www/tablePagerank/Naive.xlsx
Naive.xlsx tabulates performance scores for 31 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson, etc.).
EvaluationDataset is the dataset on which performance was assessed.
Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
For example, in row Tcf7 under column Naive_Kaech_Chung, the value 1.626392 indicates that the Naive scoring method—trained on the Kaech dataset—achieved a performance score of 1.626392 when evaluated on the Chung dataset.
./www/tablePagerank/Table_TF PageRank Scores for Audrey.xlsx
Table_TF PageRank Scores for Audrey.xlsx tabulates PageRank‐derived scores for 308 transcription factors (“TF”) across the same 42 method–dataset combinations, with two additional annotation columns:
TF (first column): Transcription factor name.
Category: Broad TF class (e.g. “Universal TFs,” “Lineage-specific TFs,” etc.).
Cell-state specificity: Whether the TF is “Universal,” “Pluripotent,” “Myeloid,” etc.
Each of the remaining 42 columns follows the convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the dataset used to fit the PageRank model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which the PageRank scores were assessed.
Each cell holds the floating-point PageRank score for that TF under the specified method and dataset pairing.
For example, a value of 1.003938 in row Elf1 under column Naive_Kaech_Kaech indicates that the Naive PageRank model—trained and evaluated on the Kaech dataset—assigned Elf1 a score of 1.003938.
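Because this master table carries the two annotation columns alongside the score columns, a common operation is subsetting by TF class before comparing scores. The sketch below uses a two-row placeholder frame (only the Elf1 score is the value quoted above; the Sox2 row is invented for illustration):

```python
import pandas as pd

# Placeholder frame mirroring the described columns. The real file:
#   df = pd.read_excel("./www/tablePagerank/Table_TF PageRank Scores for Audrey.xlsx")
df = pd.DataFrame({
    "TF": ["Elf1", "Sox2"],
    "Category": ["Universal TFs", "Lineage-specific TFs"],
    "Cell-state specificity": ["Universal", "Pluripotent"],
    "Naive_Kaech_Kaech": [1.003938, 0.42],  # Sox2 score is a made-up placeholder
})

# Subset to one TF class before comparing scores across contexts.
universal = df[df["Category"] == "Universal TFs"]
```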
./www/tablePagerank/TCM.xlsx
TCM.xlsx tabulates performance scores for 28 transcription factors (“TF”) across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
For example, a value of 0.837792 in row Msgn1 under column TCM_Mackay_Chung indicates that the TCM scoring method—trained on the Mackay dataset—achieved a performance score of 0.837792 when evaluated on the Chung dataset.
./www/tablePagerank/TE.xlsx
TE.xlsx tabulates performance scores for 33 transcription factors (“TF”) across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Scott, Mackay, etc.).
Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
For example, if you see 0.65 in row Myod1 under column TE_Mackay_Chung, it means that the TE method—trained on the Mackay dataset—achieved a performance score of 0.65 when evaluated on the Chung dataset.
./www/tablePagerank/TEM.xlsx
TEM.xlsx tabulates performance scores for 25 transcription factors (“TF”) across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
For example, a value of 1.6696786566 in row Foxc2 under column TEM_Mackay_Chung means that the TEM scoring method—trained on the Mackay dataset—achieved a performance score of 1.6696786566 when evaluated on the Chung dataset.
./www/tablePagerank/TEXeff.xlsx
TEXeff.xlsx tabulates performance scores for 62 transcription factors (“TF”) across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed.
Each cell contains the resulting floating‐point metric for that TF under the specified method and dataset pairing.
For example, a value of 0.647 in row Vax2 under column TexTerm_Hudson_Beltra means that the TexTerm scoring method—trained on the Hudson dataset—achieved a performance score of 0.647 when evaluated on the Beltra dataset.
./www/tablePagerank/TEXprog.xlsx
TEXprog.xlsx tabulates performance scores for 63 transcription factors (“TF”) across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
For example, a value of 1.5403 in row Irf9 under column TexProg_Beltra_Chung means that the TexProg scoring method—trained on the Beltra dataset—achieved a performance score of 1.5403 when evaluated on the Chung dataset.
./www/tablePagerank/TEXterm.xlsx
TEXterm.xlsx tabulates performance scores for 51 transcription factors (“TF”) across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the dataset used to fit the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed.
Each cell holds the floating-point metric for that TF under the specified method and dataset pairing.
For example, a value of 0.912 in row Sox2 under column TexTerm_Scott_Mackay means that the TexTerm method—trained on the Scott dataset—achieved a performance score of 0.912 when evaluated on the Mackay dataset.
./www/tablePagerank/TRM.xlsx
TRM.xlsx tabulates performance scores for 43 transcription factors (“TF”) across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention Method_TrainingDataset_EvaluationDataset, where:
Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
TrainingDataset is the dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
For example, a value of 0.91 in row PU.1 under column TRM.IEL_Chung_Mackay means that the TRM.IEL scoring method—trained on the Chung dataset—achieved a performance score of 0.91 when evaluated on the Mackay dataset.
./www/tfcommunities/texcommunities.xlsx
texcommunities.xlsx is a multi-sheet workbook (12 sheets) that organizes transcription factors into network "communities" for two models—TEX and TRM:
TEX Communities: A summary sheet with two columns—C (community ID, e.g. C1–C5) and TF Members (a comma-separated list of all TFs in that community).
TEX_c1 through TEX_c5: One sheet per TEX community, each listing a single TF column of member factors.
TRM Communities: A parallel summary sheet for the TRM model, also with C and TF Members columns.
TRM_c1 through TRM_c5: Individual sheets listing TFs for each TRM community.
Each community groups TFs based on network topology under the respective model. For example, in the TEX Communities sheet, community C1 includes the following TF members: Usf1, Arnt, Mlx, Srebf1, Arntl, Tfe3, Heyl, Bhlhe40, …, indicating that these factors cluster together in the TEX network.
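The summary sheet's comma-separated "TF Members" column can be expanded into per-community lists. In the sketch below, only the C1 members are taken from the example above; the C2 row is a made-up placeholder, and the commented call shows how all sheets could be loaded at once:

```python
import pandas as pd

# For the real workbook, all 12 sheets load in one call:
#   sheets = pd.read_excel("./www/tfcommunities/texcommunities.xlsx", sheet_name=None)
#   summary = sheets["TEX Communities"]
summary = pd.DataFrame({
    "C": ["C1", "C2"],
    "TF Members": ["Usf1, Arnt, Mlx, Srebf1", "Batf, Runx3"],  # C2 is a placeholder row
})

# Split each comma-separated member string into a clean list of TF names.
members = {
    row["C"]: [tf.strip() for tf in row["TF Members"].split(",")]
    for _, row in summary.iterrows()
}
```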
./www/tfcommunities/trmcommunities.xlsx
trmcommunities.xlsx is a multi‐sheet workbook (6 sheets) that defines transcription factor communities for the TRM network model:
TRM Communities: A summary sheet with two columns—C (community ID, C1–C5) and TF Members (a comma‐separated list of all TFs in that community).
TRM_c1 through TRM_c5: Each sheet lists a single TF column naming the factors that belong to that community.
These communities reflect clusters of TFs based on network topology under the TRM model. For example, in the TRM Communities sheet, community C1 might include TFs such as PU.1, Runx3, and Irf4, indicating that these factors form a tightly connected module in the TRM network.
./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx
TF-TFcorTRMTEX.xlsx contains pairwise correlation matrices of transcription factor scores for both the TRM and TEX models. It has two sheets:
TRM: A square matrix where both rows and columns list the same set of TFs; each cell at the intersection of TF A (row) and TF B (column) gives the Pearson correlation coefficient between their TRM PageRank (or performance) scores across all dataset contexts.
TEX: The analogous matrix for the TEX model.
For example, on the TRM sheet, the value 0.82 at row PU.1 and column Runx3 indicates that PU.1 and Runx3 have a correlation of 0.82 in their TRM-derived scores.
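A lookup in such a square, symmetric matrix can be sketched as follows. The 2×2 frame here mirrors the example value quoted above (the diagonal is 1.0 by definition of self-correlation); the commented call shows how the real TRM sheet would be loaded:

```python
import pandas as pd

# For the real file:
#   corr = pd.read_excel("./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx",
#                        sheet_name="TRM", index_col=0)
tfs = ["PU.1", "Runx3"]
corr = pd.DataFrame([[1.0, 0.82], [0.82, 1.0]], index=tfs, columns=tfs)

# The matrix is symmetric, so the order of the two TFs does not matter.
r = corr.loc["PU.1", "Runx3"]
```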
./www/waveanalysis/searchtfwaves.xlsx