TigerGraph-Hack / evaluation_set.json
[
{
"work_id": "W2980282514",
"type": "factual",
"question": "What two factors have driven recent progress in NLP according to the HuggingFace Transformers paper, and what is the library's stated goal?",
"correct_answer": "Two factors have driven recent progress in NLP: advances in model architecture and model pretraining. The HuggingFace Transformers library's stated goal is to support industrial-strength implementations of popular model variants that are easy to read, extend, and deploy, and to provide a centralized model hub for the distribution and usage of a wide variety of pretrained models."
},
{
"work_id": "W2529996553",
"type": "factual",
"question": "What three coupled functions does the neural network construct in the automatic chemical design paper and what does each one do?",
"correct_answer": "The neural network constructs three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector (latent space), the decoder converts these continuous vectors back to discrete molecular representations, and the property predictor estimates chemical properties from the latent continuous vector representation of the molecule."
},
{
"work_id": "W2766856748",
"type": "factual",
"question": "In the 2018 CGCNN paper by Xie and Grossman, what specific limitation of prior machine learning methods for crystal property prediction required manually constructed feature vectors, and how does their crystal graph framework address it?",
"correct_answer": "Prior machine learning methods required manually constructed feature vectors or complex transformations of atom coordinates to input the crystal structure, which either constrained the model to certain crystal types or made chemical interpretation difficult. The crystal graph framework addresses this by directly learning material properties from the connection of atoms in the crystal, providing a universal and interpretable representation across different crystal types and chemistries."
},
{
"work_id": "W2887306621",
"type": "factual",
"question": "In the 2018 review by Wu and Sun on machine learning in materials science, what four interdisciplinary fields are combined, and what is the primary advantage of ML methods over traditional theoretical simulations?",
"correct_answer": "Machine learning in materials science is an interdisciplinary field combining computer science, statistics, computational mathematics, and engineering. Its primary advantage is providing faster calculation speeds and higher prediction accuracy compared to traditional theoretical simulations that rely on solving fundamental physical or chemical equations."
},
{
"work_id": "W3154258817",
"type": "factual",
"question": "What does ADMET stand for, why are these properties the primary reason for drug development failure, and what does ADMETlab 2.0 provide to address this?",
"correct_answer": "ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity. These properties matter because undesirable pharmacokinetics and toxicity of candidate compounds are the main reasons drug development fails. ADMETlab 2.0 provides an integrated online platform for accurate and comprehensive predictions of 17 physicochemical properties, 13 medicinal chemistry properties, 23 ADME properties, 27 toxicity endpoints, and 8 toxicophore rules."
},
{
"work_id": "W2742127985",
"type": "factual",
"question": "What makes the Deep Potential Molecular Dynamics (DPMD) method first-principles based, and what three natural symmetries does the neural network model explicitly preserve?",
"correct_answer": "The Deep Potential Molecular Dynamics (DPMD) method is first-principles based because there are no ad hoc components aside from the network model itself. The neural network model explicitly preserves the translational, rotational, and permutational symmetries of the atomic system."
},
{
"work_id": "W2968923792",
"type": "factual",
"question": "What collection of statistical methods is described as one of the most exciting new tools in the material science toolbox, and what two types of research has it proved capable of speeding up?",
"correct_answer": "Machine learning, described as a collection of statistical methods, is identified as one of the most exciting new tools in the material science toolbox. It has proved capable of considerably speeding up both fundamental and applied research in the field."
},
{
"work_id": "W3207969687",
"type": "factual",
"question": "According to the 2021 review on de novo drug design by Bai et al., what are the three main types of deep learning architectures used for molecule generation, and how do they benefit molecular dynamics simulations?",
"correct_answer": "The three main deep learning architectures for molecule generation are Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs). In molecular dynamics (MD), deep learning models provide high-accuracy potential energy surfaces and force fields at a computational cost similar to traditional force fields, improving simulation efficiency."
},
{
"work_id": "W2966357564",
"type": "factual",
"question": "What two classes of neural network models have yielded promising results for molecular property prediction, and what does each class use as its input representation?",
"correct_answer": "Two classes of neural network models have yielded promising results for molecular property prediction: (1) neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and (2) graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. The first class uses fingerprints or descriptors as input, while the second class uses the graph structure (atoms and bonds)."
},
{
"work_id": "W2900489253",
"type": "factual",
"question": "What is the key architectural feature of CheMixNet that allows it to predict multiple chemical properties simultaneously with high accuracy?",
"correct_answer": "The key architectural feature of CheMixNet is that it combines features learned from two different input representations: a vector input (molecular fingerprints) and a sequence input (SMILES strings), using a variation of multi-input-single-output (MISO) architectures to achieve high accuracy."
},
{
"work_ids": ["W2766856748", "W2742127985"],
"type": "multi_hop",
"question": "How does the CGCNN framework represent crystal structures, and what specific physical symmetries do both CGCNN and Deep Potential Molecular Dynamics (DPMD) explicitly preserve?",
"correct_answer": "CGCNN represents crystals as crystal graphs where nodes are atoms and edges are bonds, using atom/bond feature vectors. Both CGCNN and DPMD explicitly preserve translational, rotational, and permutational symmetries of the atomic system to ensure physical consistency and invariance."
},
{
"work_ids": ["W3154258817", "W3207969687"],
"type": "multi_hop",
"question": "The ADMETlab 2.0 paper and the de novo drug design review both describe ML applications to drug development. At what specific stage does ADMETlab 2.0 intervene, and how does this complement generative design described in the review?",
"correct_answer": "ADMETlab 2.0 intervenes at the stage of evaluating and optimizing absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties in lead compounds. This complements generative design (using RNNs, VAEs, GANs) by providing a crucial evaluation step where generated molecules are assessed for pharmacokinetic and toxicity properties, increasing the likelihood of identifying successful drug candidates."
},
{
"work_ids": ["W2887306621", "W2968923792"],
"type": "multi_hop",
"question": "How does the application of ML in fluid dynamics surrogate modeling relate to the broader acceleration of research described in the materials science review?",
"correct_answer": "Machine learning in fluid dynamics surrogate modeling accelerates research by supporting design exploration and optimization of material properties, such as mechanical properties of alloy materials and energy-related applications. By utilizing multifidelity models, physics-based constraints, and surrogate modeling techniques, ML reduces computational costs and improves simulation accuracy, bridging the gap between fundamental research and applied engineering."
},
{
"work_ids": ["W2900489253", "W2968923792"],
"type": "multi_hop",
"question": "The CheMixNet paper and the materials science review both discuss property prediction. How does CheMixNet's mixed architecture address the general need for better 'statistical tools' in the materials science toolbox?",
"correct_answer": "CheMixNet's mixed architecture addresses the need for advanced statistical tools by combining sequence (SMILES) and fingerprint representations using RNNs and 1-D CNNs. This hybrid approach generalizes across diverse datasets like the Harvard Clean Energy Project and MoleculeNet, outperforming single-representation models (like MLP, CNN, or RNN alone) and providing a robust framework for predicting multiple chemical properties simultaneously."
},
{
"work_ids": ["W3207969687", "W3154258817"],
"type": "multi_hop",
"question": "How does the challenge of screening de novo designed molecules relate to the integrated platform provided by ADMETlab 2.0?",
"correct_answer": "ADMETlab 2.0 provides an integrated platform to screen the vast chemical space generated by de novo design. It offers high-throughput evaluation of over 50 endpoints, including physicochemical properties, medicinal chemistry rules, ADME properties, and toxicity. Using a multi-task graph attention framework and batch computation, it facilitates the rapid prioritization of candidates, significantly reducing the downstream synthesis workload."
},
{
"work_ids": ["W2980282514", "W2887306621"],
"type": "multi_hop",
"question": "The HuggingFace Transformers paper describes a unified API for pretrained models. How could this unified approach benefit the development of surrogate models for fluid dynamics?",
"correct_answer": "A unified API similar to HuggingFace Transformers could revolutionize fluid dynamics by establishing a centralized model hub for the distribution and use of pretrained surrogate models. This would allow users to compare model variants using the same minimal API and facilitate the use of transfer learning, where models pretrained on proxy properties (using large datasets) are repurposed for target tasks with limited data. Integrating these attention-based models with traditional CFD methods (like finite element analysis) could enable hybrid modeling for complex industrial systems such as turbines, pumps, and pipelines, while providing industrial-strength, extensible implementations."
}
]