Alleinzellgaenger committed on
Commit
91a7ed6
·
1 Parent(s): aa7e9a7

Add patricia version

backend/app.py CHANGED
@@ -39,7 +39,7 @@ def load_document(filename):
  # Load prompts at startup
  SYSTEM_PROMPT_TEMPLATE = load_prompt("system_prompt.txt")
  TRANSITION_PROMPT_TEMPLATE = load_prompt("transition_prompt.txt")
- DOCUMENT = load_document("lennart.txt")
+ DOCUMENT = load_document("patricia.txt")
  app = FastAPI()

  # Enable CORS
@@ -77,7 +77,7 @@ async def chat_endpoint(request: ChatRequest):
  current_chunk = request.currentChunk or request.chunk or "No specific chunk provided"
  next_chunk = request.nextChunk or ""
  action = request.action
- user_goal = request.user_goal or "Understanding GRPO (equation 3) and why does this make sense in contrast to PPO?"
+ user_goal = request.user_goal or "Understanding why these methods were used for the main question and how successful was the Apicomplexa treatment with herbicides?"

  # Only include full document on first message or transitions to provide initial context
  include_document = len(request.messages) <= 1 or action in ['skip', 'understood']
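The `or` fallbacks in `chat_endpoint` treat any falsy value, including an empty string, as missing. A minimal sketch of that behaviour follows. `SimpleRequest`, `DEFAULT_GOAL` and `resolve_context` are hypothetical stand-ins introduced for illustration and are not part of `backend/app.py`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical placeholder for the hard-coded default goal string in app.py.
DEFAULT_GOAL = "default goal"


@dataclass
class SimpleRequest:
    """Simplified, hypothetical stand-in for the ChatRequest model."""
    messages: List[str] = field(default_factory=list)
    action: Optional[str] = None
    user_goal: Optional[str] = None


def resolve_context(request: SimpleRequest) -> Tuple[str, bool]:
    """Return (user_goal, include_document) following the rules in the diff."""
    # `or` falls back to the default for None and for the empty string alike.
    user_goal = request.user_goal or DEFAULT_GOAL
    # The full document is sent only on the first message or on a transition.
    include_document = (
        len(request.messages) <= 1 or request.action in ("skip", "understood")
    )
    return user_goal, include_document
```

With this logic a client that sends `user_goal=""` still gets the default goal, so changing the default string in the commit affects every request that does not set a non-empty goal.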
backend/documents/patricia.txt ADDED
@@ -0,0 +1,474 @@
1
+
2
+
3
+ {0}------------------------------------------------
4
+
5
+ # The effect of herbicides as novel antimalarial drugs on the transcriptome and proteome of Plasmodium falciparum
6
+
7
+ by
8
+
9
+ ## Janette Snyman
10
+
11
+ Submitted in partial fulfilment of the requirements for the degree Magister Scientiae
12
+
13
+ In the Faculty of Natural and Agricultural Sciences Department of Biochemistry University of Pretoria Pretoria 0002 South Africa
14
+
15
+ SUPERVISOR: **Prof. Lyn-Marie Birkholtz** Department of Biochemistry, University of Pretoria, South Africa CO-SUPERVISOR: Prof. Abraham I Louw Department of Biochemistry, University of Pretoria, South Africa
16
+
17
+ {1}------------------------------------------------
18
+
19
+ ## SUMMARY
20
+
21
+ The Apicomplexan parasite, *P. falciparum*, is one of the causative agents of morbidity and mortality in sub-Saharan Africa, especially among children under 5 years of age and pregnant women (1). The parasite harbours a non-photosynthetic plastid believed to have been acquired from blue-green algae (2, 3). The presence of this apicoplast in the parasite and its connection to plants opens many doors to the development of novel antimalarials that are not harmful to the human host.
22
+
23
+ In this study, a herbicide-derived compound (A51B1C1\_1) with structural similarities to 1,2-diacylglycerol (DAG) was tested against *P. falciparum*. It was anticipated that this herbicide would target similar pathways in the malaria parasite as was shown for *Arabidopsis*. One such pathway is the synthesis of glycerolipids. Monogalactosyldiacylglycerol (MGDG) and digalactosyldiacylglycerol (DGDG) are the two most studied galactolipids. MGDG is synthesised by MGDG synthase and DGDG is synthesised by DGDG synthase from DAG.
24
+
25
+ Morphological studies after inhibition of *P. falciparum* parasites with A51B1C1\_1 confirmed that the compound does have an effect on the parasites. The determined $IC_{50}$ value, the drug-like properties conforming to Lipinski's rule of five and the specificity of the compound towards the parasite make A51B1C1\_1 a possible antimalarial compound. Transcriptomic data of A51B1C1\_1-treated *P. falciparum* parasites revealed 1504 differentially affected transcripts, of which 579 transcripts were unique to this treatment. The differentially affected processes included apicoplast-associated metabolic pathways such as glycerolipid and glycerophospholipid metabolism. These results thus indicated that enzymes involved in glycerolipid synthesis, especially those responsible for the metabolism of DAG, are affected in *P. falciparum* parasites treated with A51B1C1\_1.
26
+
27
+ Proteome analysis indicated that similar processes as shown for the transcriptomic data were affected by the herbicide treatment. At the assay time-point, a total of 276 Plasmodial proteins were uniquely expressed in the A51B1C1\_1-treated sample whereas 204 Plasmodial proteins were uniquely expressed in the control sample. Interestingly, the direction of the change in the abundance of these affected proteins did
28
+
29
+ {2}------------------------------------------------
30
+
31
+ not necessarily correlate with the change of abundance observed in the transcriptomic data, as seen numerous times before in other reported Plasmodial perturbations.
32
+
33
+ Global functional genomics aided in confirming that compound A51B1C1\_1 affects glycerolipid and glycerophospholipid metabolism in *P. falciparum*, as seen in *Arabidopsis* after treatment with the parent compound Galvestine-1. Overall, this study demonstrated the importance of functional genomics in the investigation of potential antimalarial compounds and contributed to the progress of A51B1C1\_1 from an early hit to an early lead in the antimalarial drug discovery pipeline.
34
+
35
+ {3}------------------------------------------------
36
+
37
+ ## TABLE OF CONTENTS
38
+
39
+ | Section | Title | Page |
+ |---------|-------|------|
+ | | Acknowledgements | i |
+ | | Summary | ii |
+ | | Table of Contents | iv |
+ | | List of Figures | vii |
+ | | List of Tables | x |
+ | | List of Boxes | xii |
+ | | List of Equations | xii |
+ | | List of Abbreviations | xii |
+ | 1 | CHAPTER 1 Literature review | 1 |
+ | 1.1 | History of malaria | 1 |
+ | 1.2 | The health and economical risk of malaria | 2 |
+ | 1.3 | Etiologic agents of malaria | 3 |
+ | 1.4 | Malaria Control | 5 |
+ | 1.4.1 | Vector control | 5 |
+ | 1.4.2 | Vaccines | 6 |
+ | 1.4.3 | Drugs and drug resistance | 6 |
+ | 1.4.4 | New antimalarial drug targets | 7 |
+ | 1.5 | Apicomplexa and the apicoplast | 11 |
+ | 1.5.1 | Origin and structure of the apicoplast | 11 |
+ | 1.5.2 | Division mechanism of the apicoplast | 14 |
+ | 1.5.3 | Functions of the apicoplast | 16 |
+ | 1.5.3.1 | Isoprenoid biosynthesis | 16 |
+ | 1.5.3.2 | Haem biosynthesis | 17 |
+ | 1.5.3.3 | Fatty acid biosynthesis | 18 |
+ | 1.5.3.4 | Galactolipid synthesis | 18 |
+ | 1.5.4 | Apicoplast as drug target in <i>P. falciparum</i> | 22 |
+ | 1.6 | The compound A51B1C1_1 | 22 |
+ | 1.7 | Research objective and aims | 26 |
+ | 2 | CHAPTER 2 Morphological and transcriptomic analyses of the effect of a herbicide-derived compound on <i>P. falciparum</i> parasites | 28 |
+ | 2.1 | Introduction | 28 |
70
+
71
+ {4}------------------------------------------------
72
+
73
+ | Section | Title | Page |
+ |---------|-------|------|
+ | 2.1.2 | Microarray experimental design and data analysis | 32 |
+ | 2.2 | Materials and methods | 36 |
+ | 2.2.1 | <i>In vitro</i> cultivation of asexual <i>P. falciparum</i> parasites | 36 |
+ | 2.2.1.1 | Preparation of the erythrocytes | 36 |
+ | 2.2.1.2 | Thawing of <i>P. falciparum</i> parasites | 36 |
+ | 2.2.1.3 | Synchronisation of <i>P. falciparum</i> cultures | 37 |
+ | 2.2.2 | IC50 determinations of A51B1C1_1 | 37 |
+ | 2.2.3 | Morphological monitoring of drug treated parasites | 38 |
+ | 2.2.4 | Drug treatment for transcriptome analysis | 38 |
+ | 2.2.5 | RNA isolation | 39 |
+ | 2.2.6 | RNA concentration and integrity determination | 40 |
+ | 2.2.7 | cDNA synthesis and clean up | 40 |
+ | 2.2.8 | Cy dye labelling of the cDNA | 41 |
+ | 2.2.9 | Hybridisation, set up of slides, washing of slides and scanning | 42 |
+ | 2.2.10 | Slide design | 42 |
+ | 2.2.11 | Microarray Data Analysis | 43 |
+ | 2.2.12 | Inter-species annotation transfers using non-homology based clustering | 45 |
+ | 2.2.13 | Quantitative qRT-PCR to validate microarray data | 45 |
+ | 2.3 | Results | 48 |
+ | 2.3.1 | IC50 determinations | 48 |
+ | 2.3.2 | Morphology studies | 48 |
+ | 2.3.3 | RNA isolation | 52 |
+ | 2.3.4 | Microarray | 55 |
+ | 2.3.4.1 | Data analysis | 56 |
+ | 2.3.4.2 | Normalisation of data | 57 |
+ | 2.3.4.3 | LIMMA data analysis | 62 |
+ | 2.3.4.4 | Biological processes in which the differentially expressed transcripts are involved | 64 |
+ | 2.3.4.5 | Comparison of the A51B1C1_1 dataset with other <i>P. falciparum</i> perturbation data | 66 |
+ | 2.3.4.6 | Comparison of the A51B1C1_1 dataset with the expression data from Galvestine-2 and other datasets | 68 |
+ | 2.3.5 | Comparison of <i>P. falciparum</i> transcripts after treatment with A51B1C1_1 to transcripts of Arabidopsis treated with Galvestine-2 | 71 |
+ | 2.3.6 | Validation of microarray data with qRT-PCR | 74 |
+ | 2.4 | Discussion | 76 |
+ | 3 | CHAPTER 3 Investigation of the proteomic response of <i>P. falciparum</i> treated with a herbicide-derived compound | 87 |
+ | 3.1 | Introduction | 87 |
+ | 3.2 | Methods and Materials | 91 |
+ | 3.2.1 | Culturing <i>P. falciparum</i> for proteomics | 91 |
+ | 3.2.2 | Isolation of proteins | 91 |
+ | 3.2.3 | Protein concentration determination | 91 |
+ | 3.2.4 | Iso-electric focusing (IEF) | 92 |
+ | 3.2.5 | Two-dimensional polyacrylamide gel electrophoresis (2D-GE) | 93 |
+ | 3.2.6 | FlamingoPink staining of gels | 94 |
+ | 3.2.7 | Scanning of gels and data analysis | 94 |
+ | 3.2.8 | Identification of spots using Plasmo2D | 94 |
+ | 3.2.9 | Identification of proteins by mass spectrometry | 95 |
+ | 3.3 | Results | 96 |
+ | 3.3.1 | Protein concentration determination | 96 |
+ | 3.3.2 | 2-DE analysis of A51B1C1_1 treated <i>P. falciparum</i> parasites | 96 |
+ | 3.3.3 | Identification of the differentially expressed proteins in A51B1C1_1-treated <i>P. falciparum</i> parasites | 101 |
+ | 3.3.4 | Identification of the differentially expressed proteins in A51B1C1_1-treated <i>P. falciparum</i> parasites with mass spectrometry | 104 |
+ | 3.4 | Discussion | 106 |
+ | 4 | CHAPTER 4 Concluding discussion | 110 |
+ | 4.1 | Rationale of the study | 110 |
+ | 4.2 | Summary of the findings | 111 |
+ | 4.3 | Implications and limitations of the findings | 112 |
+ | 4.4 | Future directions | 113 |
+ | 5 | REFERENCES | 114 |
131
+
132
+ {5}------------------------------------------------
133
+
134
+ ## APPENDICES
135
+
136
+ | Appendix 1 | Differentially affected transcripts of <i>P. falciparum</i> after inhibition with A51B1C1_1 | 129 |
137
+ |------------|------------------------------------------------------------------------------------------------------------------------|-----|
138
+ | Appendix 2 | Unique transcripts specific to the A51B1C1_1 treatment after comparison with other available datasets | 144 |
139
+ | Appendix 3 | The non-homologous clustering of <i>Arabidopsis</i> and <i>P. falciparum</i> using COCO after treatment with A51B1C1_1 | 146 |
140
+ | Appendix 4 | The Plasmodial proteins found in only the A51B1C1_1 treated samples analysed with MS | 153 |
141
+ | Appendix 5 | The Plasmodial proteins found in only the control samples analysed with MS | 157 |
142
+
143
+ {6}------------------------------------------------
144
+
145
+ #### 1.5.3.3 Fatty acid biosynthesis
146
+
147
+ Fatty acids play an indispensable role in living cells, as they mediate cell growth, cell differentiation and membrane formation, and act as precursors for energy stores (71). They also play a role in the maintenance of cell homeostasis. The nuclear-encoded enzymes of fatty acid biosynthesis, acyl carrier protein (ACP), enoyl-ACP reductase (FabI), β-ketoacyl-ACP synthases III (KASIII) and I/II (FabB/F) and β-hydroxyacyl-ACP dehydratase, are targeted to the apicoplast in *P. falciparum* (3, 72, 73, 89). This provided evidence that type II fatty acid biosynthesis is in fact a function of the apicoplast. The substrate for type II fatty acid biosynthesis is malonyl-CoA, which is formed outside the apicoplast from acetyl-CoA by acetyl-CoA carboxylase. Fatty acid elongation consists of rounds in which a malonyl moiety is transferred by malonyl-CoA:acyl carrier protein transacylase (FabD) to ACP to form malonyl-ACP. This is followed by condensation, where malonyl-ACP is condensed with an acetyl group by the enzyme β-oxoacyl-ACP synthase III (FabH) to form β-oxoacyl-ACP. The acetyl group can be provided by either acetyl-CoA or acetyl-ACP. The next steps are reduction and dehydration reactions, so that each cycle adds 2 carbons to the growing chain. The last step is once again a reduction step by one of the following: β-ketoacyl-ACP synthase I and II [(KAS), FabB/F], β-oxoacyl-ACP reductase (FabG), β-hydroxyacyl-ACP dehydratase (FabZ) or enoyl-ACP reductase (FabI) (110). The type II fatty acid synthesis pathway in the apicoplast of the malaria parasite is essential to the parasite, but it is also very different from the cytosolic type I fatty acid biosynthesis of the human host. Thus, it may be a good target for drugs and inhibitors against the malaria parasite (111).
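Because each elongation cycle adds two carbons, the number of cycles needed to build a given even-chain fatty acid follows directly. The sketch below is a minimal illustration that assumes a two-carbon acetyl primer, as described in the paragraph above. The function name and its parameters are hypothetical.

```python
def elongation_cycles(chain_length: int, primer_carbons: int = 2) -> int:
    """Number of two-carbon elongation cycles needed to reach chain_length.

    Assumes the chain starts from a primer of `primer_carbons` carbons
    (an acetyl group by default) and grows by exactly 2 carbons per cycle.
    """
    extra = chain_length - primer_carbons
    if extra < 0 or extra % 2 != 0:
        raise ValueError("chain length must be the primer length plus a multiple of 2")
    return extra // 2


# The 16- and 18-carbon chains found at the sn positions of galactolipids.
for carbons in (16, 18):
    print(f"C{carbons}: {elongation_cycles(carbons)} cycles")
```

Under these assumptions a 16-carbon chain requires 7 cycles and an 18-carbon chain requires 8.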
148
+
149
+ #### 1.5.3.4 Galactolipid synthesis
150
+
151
+ Biological cell membranes are permeable barriers made up of lipid bilayers (112). Glycerolipids containing a phosphorus head group are the major lipid class in the membranes of animals, yeast and bacteria (113). Phosphatidylglycerol (PG) is the main phospholipid and represents $\pm 10\%$ of thylakoid lipids in chloroplast membranes (114). However, in photosynthetic organisms, such as algae, land plants and cyanobacteria, phosphorus-free galactolipids are the predominant lipid class (115, 116). The envelope of plastids, such as chloroplasts in plants, contains $\pm 80\%$ galactoglycerolipids, mainly monogalactosyldiacylglycerol (MGDG, $\pm 50\%$) and digalactosyldiacylglycerol
152
+
153
+ {7}------------------------------------------------
154
+
155
+ (DGDG, $\pm 30\%$) (Figure 1.8) (114, 116). MGDG and DGDG can differ in the acyl composition at the sn-1 and the sn-2 position of the glycerol backbone. Two main types of MGDG and DGDG are found in nature. The eukaryotic type contains an 18-carbon fatty acid at the sn-2 site and is found in most land plants. The prokaryotic type, based on cyanobacteria, contains a 16-carbon fatty acid at the sn-2 site. Some plants, such as *Arabidopsis* and spinach, contain both types of MGDG. In most plants, the eukaryotic type DGDG is found (114). Galactolipids can relocate to non-plastid membranes and are considered important in membrane lipid homeostasis (117). They are also essential in various processes including photosynthesis, plastid protein import and other developmental processes (118).
156
+
157
+ ![](_page_7_Figure_1.jpeg)
158
+
159
+ ![](_page_7_Figure_2.jpeg)
160
+
161
+ ### Figure 1.8 The structures of galactolipids. A – Monogalactosyldiacylglycerol (MGDG) and B – Digalactosyldiacylglycerol (DGDG) (119).
162
+
163
+ Galactolipids can differ in the fatty acid chain at the sn-1 and the sn-2 position of the glycerol backbone (indicated by red arrows). Two main types of MGDG and DGDG are found in nature. The eukaryotic type contains an 18-carbon fatty acid at the sn-2 site and is found in most land plants. The prokaryotic type, based on cyanobacteria, contains a 16-carbon fatty acid at the sn-2 site (114). A single galactosyl residue is transferred to the sn-3 position of DAG from UDP-galactose (indicated by green arrows) (120).
164
+
165
+ MGDG is synthesised in plants by the enzyme MGDG synthase (MGD, EC 2.4.1.46), which is a galactosyltransferase (120). MGD catalyses the glycosylation of 1,2-diacylglycerol (DAG), in which a single galactosyl residue from UDP-galactose is transferred to the sn-3 position of DAG (green arrow in Figure 1.8 A). MGD is encoded by three genes in *Arabidopsis*, *mgd1*, *mgd2* and *mgd3* (121, 122), resulting in three isoforms that can be grouped phylogenetically into two types. The A-type consists of MGD1, which is responsible for the synthesis of the majority of MGDG in the organism (123), whereas
166
+
167
+ {8}------------------------------------------------
168
+
169
+ MGD2 and MGD3 are grouped into the B-type. The main differences between the two types are their different functions and substrate specificity (120). As seen in Figure 1.9, the different MGD isoforms are each found only in a specific location within the cell, because of the lack of membrane-spanning domains. MGD1 is found in the inner envelope of the chloroplast, where it catalyses the conversion of DAG$_{(18, 16)}$ [18-carbon fatty acid at the sn-1 position and 16-carbon fatty acid at the sn-2 position] to MGDG$_{(18, 16)}$ (113, 122). MGD2 and MGD3 are associated with the outer membrane of the chloroplast and can convert DAG$_{(16, 18)}$ and DAG$_{(18, 18)}$ to the corresponding MGDG (113, 121).
170
+
171
+ ![](_page_8_Figure_1.jpeg)
172
+
173
+ ### Figure 1.9 Schematic representation of the MGDG and DGDG synthesis of Arabidopsis (113).
174
+
175
+ MGDG synthesis occurs in the envelope of the chloroplast (outer and inner). DGDG synthesis is only located in the outer envelope. Lipid synthesis by prokaryotic pathway: green arrows. Lipid synthesis by eukaryotic pathway: red arrows. DGDG synthesis where DGDG is transported to extraplastidial membranes: black arrows. The number of carbon atoms of fatty acids on the sn-1 and sn-2 positions of the glycerol backbone is indicated in brackets.
176
+
177
+ DGDG synthesis is catalysed by DGDG synthase (DGD). A second α-galactosyl group is transferred from UDP-galactose to the 6-hydroxyl position of MGDG. The majority of DGDG in *Arabidopsis* is synthesised by DGD1 (124). A paralogue of DGD1 is also found in
178
+
179
+ {9}------------------------------------------------
180
+
181
+ *Arabidopsis*, DGD2 (113), with the main difference between DGD1 and DGD2 being that DGD1 consists of both an N-terminal domain for insertion of the protein into the outer envelope of the chloroplast (lacking in DGD2) and a C-terminal domain (glycosyltransferase activity) (113, 124, 125).
182
+
183
+ Because *P. falciparum* contains a plant-like plastid, the apicoplast, it was suggested that glycerolipids unique to algal and plant plastids might also be present in the apicoplast membranes. As such, MGDG and DGDG may be present in Apicomplexa like *P. falciparum* and *T. gondii*. Investigations of galactolipid biosynthesis and content in membranes of these Apicomplexa have indicated that radioactively labelled UDP-galactose is incorporated into both MGDG and DGDG. The latter was also immunologically detected in parasite lysates (Figure 1.10). Distinct enzymatic processes or divergent amino acid sequences of the synthases may be involved, since no clear MGDG or DGDG synthase orthologues could be identified in the *P. falciparum* genome using bioinformatic searches alone (119).
184
+
185
+ ![](_page_9_Figure_2.jpeg)
186
+
187
+ #### Figure 1.10 Galactolipids synthesised in *P. falciparum* (119).
188
+
189
+ After the parasites were incubated with tritiated UDP-galactose, the lipid extract was separated using TLC. Sulfoquinovosyldiacylglycerol (SL), trigalactosyldiacylglycerol (triGDG) and tetragalactosyldiacylglycerol (tetraGDG) were detected in the lipid extract.
190
+
191
+ {10}------------------------------------------------
192
+
193
+ MGDG synthase has been shown to be essential to plant cell growth, with knock-outs of MGD1, a member of the multigene MGDG synthase family in *Arabidopsis*, leading to a complete lack of chlorophyll, chloroplast ultrastructure disruption and severe plant growth inhibition (118). These data provide support for inhibition of galactolipid biosynthesis as a valid growth inhibition strategy. As this process is found in *P. falciparum* but not in humans, it is an enticing strategy for the development of novel antimalarials.
194
+
195
+ #### 1.5.4 Apicoplast as drug target in *P. falciparum*
196
+
197
+ Numerous studies have shown that antibiotics such as azithromycin, clindamycin, doxycycline, ciprofloxacin, chloramphenicol, tetracycline and rifampicin target different metabolic activities within the apicoplast (Table 1.2), and that these compounds display a phenomenon called 'delayed death' of the malaria parasites (97). Delayed death occurs when the treated parasites do not show signs of growth inhibition in the first life cycle, but the next generation of merozoites following drug intervention cannot invade new erythrocytes and die before a third life cycle (48, 97, 111). It was also found that delayed death only occurs when the house-keeping functions of the parasite's apicoplast are targeted, such as replication, transcription and translation. This reduces the number of apicoplasts in the next generation, but the remaining apicoplasts enable the parasite population to survive a while longer. This is known as the 'self-sustenance' function of the apicoplast (126). When more essential pathways and functions of the apicoplast, such as fatty acid biosynthesis and haem biosynthesis, are targeted by a compound, the growth inhibitory effect is faster, resulting in rapid parasite death (126).
198
+
199
+ Current inhibitors targeting fatty acid biosynthesis include thiolactomycin and triclosan (and analogues), which target FabB (71). Luteolin-7-*O*-glucoside, a common flavonoid glucoside, targets FabI in the fatty acid biosynthesis pathway, whereas luteolin and catechin gallate inhibit FabZ, FabG and FabI (127).
200
+
201
+ ### 1.6 The compound A51B1C1\_1
202
+
203
+ In a study conducted by Botté and Maréchal in 2010 (118), a set of herbicide-derived compounds was tested for activity against *Arabidopsis* to identify novel chemical scaffolds as inhibitors of galactolipid biosynthesis in these plants. A high-throughput screening strategy was followed to screen DAG analogues with inhibitory activity
204
+
205
+ {11}------------------------------------------------
206
+
207
+ against recombinantly expressed MGD1 (MGDG synthase family member). The first set of 23360 compounds screened was compiled from the Cerep diversity-based library and only 20 compounds exhibited an apparent inhibition of more than 25%. These 20 compounds, together with 40 additional compounds from the Cerep diversity-based library (not included in the first study) that were selected based on chemical similarities, were tested for their inhibitory properties on MGD1. The inhibition of MGD1 was tested *in vitro* in *Arabidopsis* and two compounds showed MGD1 activity inhibition above 40%, Galvestine-1 and Galvestine-2 (Figure 1.11). These compounds show competitive inhibition with DAG for MGD1.
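The screening counts quoted above imply a very low primary hit rate. The short sketch below only reworks the numbers given in the text. The variable names are illustrative and no additional data are assumed.

```python
# Counts taken directly from the paragraph above.
primary_screened = 23360  # compounds in the first screen
primary_hits = 20         # compounds with apparent inhibition above 25%
follow_up_extra = 40      # chemically similar compounds added afterwards
confirmed = 2             # Galvestine-1 and Galvestine-2 (inhibition above 40%)

primary_hit_rate = primary_hits / primary_screened
follow_up_tested = primary_hits + follow_up_extra
follow_up_rate = confirmed / follow_up_tested

print(f"primary hit rate: {primary_hit_rate:.3%}")
print(f"follow-up set: {follow_up_tested} compounds, {follow_up_rate:.1%} above 40% inhibition")
```

Fewer than one in a thousand library compounds passed the first threshold, and 2 of the 60 compounds in the follow-up set passed the second.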
208
+
209
+ ![](_page_11_Figure_1.jpeg)
210
+
211
+ Figure 1.11 The structures of Galvestine-1 (A) and Galvestine-2 (B) (118).
212
+
213
+ A large set of small molecules was subsequently designed using the structure of Galvestine-1 as scaffold, by changing three active groups on the molecule as shown in Figure 1.12. The linker region (indicated in red) was also exchanged with various possible structures. The resultant Group A molecules included changes to the benzimidazole group on the parent molecule, Group B included molecules with changes to the piperidinyl part and Group C contained changes in the dibenzylamino-ethoxy group. The resultant compounds were therefore named according to the substitution given for each group.
214
+
215
+ {12}------------------------------------------------
216
+
217
+ ![](_page_12_Figure_0.jpeg)
218
+
219
+ ### Figure 1.12 The process by which the small molecules were designed, using Galvestine-1 as the guide molecule (118).
220
+
221
+ Group A - changes to the benzimidazole group, Group B - changes to the piperidinyl part and Group C - changes in the dibenzylamino-ethoxy group.
222
+
223
+ These molecules were tested for their ability to inhibit MGDG synthesis in the envelope vesicles of chloroplasts that were purified from spinach plants in order to determine their bioavailability in the environment of the membrane. Dose-dependent inhibition of *Arabidopsis* plant growth was observed for Galvestine-1 and -2 as well as some of their derivatives. Moreover, there was a decrease in the production of MGDG and the ratio of MGDG:DGDG was also affected. Galvestine-1 and -2 have *in vitro* growth inhibition IC<sub>50</sub> values of 10 µM and 12 µM, respectively, against *Arabidopsis*. Fifty of the derivatives were found to have IC<sub>50</sub> values between 200 and 800 µM.
224
+
225
+ {13}------------------------------------------------
226
+
227
+ This study therefore showed for the first time that disruption of cellular lipid homeostasis could be achieved by targeting MGDG synthases, and that this has a dramatic effect on the growth of plants. Due to the presence of MGDG and DGDG in the plant-derived apicoplast of *P. falciparum*, an interesting speculation is that Galvestine-1 and its derivatives might have growth-inhibitory capacity against the malaria parasite by targeting lipid biosynthesis processes in the apicoplast. This study therefore presents the determination of the antimalarial property of one of the lead MGDG synthase inhibitors from the Botté study, A51B1C1\_1 (Figure 1.13).
228
+
229
+ ![](_page_13_Picture_1.jpeg)
230
+
231
+ Figure 1.13 The structure of the compound A51B1C1\_1 (Personal communication, Eric Maréchal, Pretoria, 2009).
232
+
233
+ One major advantage of this strategy would be that these compounds are herbicide-derived and could therefore, if they are active against *P. falciparum*, prove to be highly selective to the parasite without targeting any metabolic process in humans. This compound furthermore provides a novel chemical scaffold unrelated to any current antimalarials, which could act through a mechanism that is novel compared to currently used antimalarials and might overcome the resistance mechanisms against them. Thus, if these compounds prove to be active against the malaria parasite, they may be developed into new antimalarial drugs.
234
+
235
+ {14}------------------------------------------------
236
+
237
+ ### 1.7 Research objective and aims
238
+
239
+ The primary objective of this study was to determine the antimalarial potential of compound A51B1C1\_1 as well as the physiological response of *P. falciparum* after treatment with this compound by employing a comprehensive functional genomics approach.
240
+
241
+ Chapter 2 focuses on determining the antimalarial activity of A51B1C1\_1 through morphological investigation of *P. falciparum* after treatment with this compound. This is followed by a complete transcriptome analysis employing DNA microarray to identify responsive transcripts in *P. falciparum* that were differentially regulated upon treatment with this compound.
242
+
243
+ Chapter 3 introduces the use of higher-level functional genomics analyses of the response of *P. falciparum* to A51B1C1\_1 treatment by investigating the proteome of the parasites after perturbation with this herbicide-derived compound.
244
+
245
+ Chapter 4 is a concluding chapter in which the results and conclusions reached from the above-mentioned studies are integrated and future perspectives are presented.
246
+
247
+ Results from this work were presented in the following instances:
248
+
249
+ ### Workshops:
250
+
251
+ 1. J Snyman, J Verlinden, AI Louw, E Maréchal, L Birkholtz. (2009) Functional genomics of a herbicide treated *P. falciparum.* Grenoble, France.
252
+ 2. J Snyman, J Verlinden, AI Louw, E Maréchal, L Birkholtz. (2011) Transcriptomic profiling of *Plasmodium falciparum* using the Agilent platform. Latest Advances in Microarray Applications and NGS Target Enrichment Technology. Pretoria, South Africa.
253
+
254
+ {15}------------------------------------------------
255
+
256
+ ### Conferences:
257
+
258
+ 1. J Snyman, J Verlinden, AI Louw, E Maréchal, L Birkholtz. (2010) Functional genomic investigation of *P. falciparum* treated with herbicide-derived compounds. SASBMB, Bloemfontein, South Africa.
259
+
260
+ ### Manuscript in preparation:
261
+
262
+ 1. J Snyman, J Verlinden, AI Louw, E Maréchal, L Birkholtz. Exploiting *Plasmodium falciparum*'s plant origins: The discovery of herbicide-derived compounds as new antimalarial drugs.
263
+
264
+ {16}------------------------------------------------
265
+
266
+ # CHAPTER 2 MORPHOLOGICAL AND TRANSCRIPTOMIC ANALYSES OF THE EFFECT OF A HERBICIDE-DERIVED COMPOUND ON P. FALCIPARUM PARASITES
267
+
268
+ ### 2.1 Introduction
269
+
270
+ Functional genomics is an essential tool in the development of new drugs against *P. falciparum*. It allows determination of the function of different genes, the mode of action of a novel drug, optimization of drug development, validation of drug targets and identification of new drug targets, amongst others (Box 2.1) (128-130). This contributes to our understanding of gene function and the effect of a specific drug on an organism (128, 131).
271
+
272
+ #### Box 2.1 Implementation of functional genomics in drug target discovery in *P. falciparum* (132).
273
+
274
+ | Genome-wide questions | Transcriptome-specific questions |
275
+ |------------------------------------------------------------|----------------------------------|
276
+ | Lifecycle development (stage-specific expression) | Transcriptional machinery |
277
+ | Reproduction genes (strategy-specific expression) | Regulation of transcription |
278
+ | Drug resistance mechanisms | Transcriptional inheritance |
279
+ | Mechanism of drug action | Proteome-specific questions |
280
+ | Responses to environmental stressors | Post-transcriptional regulation |
281
+ | Drug target specification | Post-translational repression |
282
+ | Host-specific adaptation and expression | Interactome-specific questions |
283
+ | Identification of vaccine targets | Protein function |
284
+ | Virulence determinants | Relationships |
285
+ | Severe disease progression in vivo | Regulatory mechanisms |
286
+ | Biological and mechanistic insights | |
287
+ | Metabolic pathways | |
288
+ | Identity determination of hypothetical proteins | |
289
+ | Cell cycle regulators | |
290
+ | Sex determinants | |
291
+ | Chemical validation of drug targets | |
292
+ | Mode-of-action of inhibitory compounds | |
293
+ | Improved drug target action | |
294
+ | Gene expression regulators (transcription and translation) | |
295
+ | Virulence factors | |
296
+ | Specialised organelle function and metabolism | |
297
+ | Damage compensation | |
298
+
299
+ {17}------------------------------------------------
300
+
301
+ ## 2.3 Results
302
+
303
+ #### 2.3.1 $IC_{50}$ determinations
304
+
305
+ Dose-response curves were established to determine the median inhibitory concentration ( $IC_{50}$ ) of the herbicide derivative A51B1C1\_1 using a fluorescent SYBR Green I assay (MSF assay) on the chloroquine-sensitive *P. falciparum* strain 3D7. The average $IC_{50}$ value of A51B1C1\_1, determined in four individual experiments, was 447 ± 16 nM (Figure 2.5).
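
To make the meaning of the reported $IC_{50}$ concrete, the sketch below evaluates a standard sigmoidal (Hill-type) concentration-response model. Only the 447 nM value comes from the text; the Hill slope of 1, the 100% top and 0% bottom plateaus, and the function name `hill_response` are illustrative assumptions, not the fitted parameters of Figure 2.5.

```python
def hill_response(conc_nm, ic50_nm, hill_slope=1.0, top=100.0, bottom=0.0):
    """Percent of control proliferation at a compound concentration (nM).

    Assumed model: a four-parameter logistic curve. The slope and plateaus
    are placeholders, not values fitted in the study.
    """
    if conc_nm <= 0:
        return top  # no compound: full proliferation
    return bottom + (top - bottom) / (1.0 + (conc_nm / ic50_nm) ** hill_slope)


IC50_NM = 447.0  # mean IC50 of A51B1C1_1 reported in the text

# By definition, the response at the IC50 is halfway between the plateaus.
for conc in (50.0, IC50_NM, 2 * IC50_NM, 5000.0):
    print(f"{conc:7.1f} nM -> {hill_response(conc, IC50_NM):5.1f}% of control")
```

Under these assumed parameters, the $2 \times IC_{50}$ dose used in the morphology studies would correspond to roughly one third of control proliferation; a steeper fitted slope would give a lower value.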
306
+
307
+ ![](_page_17_Figure_3.jpeg)
308
+
309
+ Figure 2.5 Sigmoidal concentration-response curve to calculate the median inhibitory concentration (IC<sub>50</sub>) of *P. falciparum* (3D7) treated with A51B1C1\_1. Data are representative of 4 independent experiments performed in triplicate, $\pm$ SEM.
310
+
311
+ The morphological impact of A51B1C1\_1 on *P. falciparum* parasites was determined next.
312
+
313
+ #### 2.3.2 Morphology studies
314
+
315
+ Two independent *P. falciparum* (3D7) parasite cultures were treated with A51B1C1\_1 or Galvestine-2 at $2 \times IC_{50}$, and the effects on the morphology of the parasites were observed for 72 h. (Data for Galvestine-2 treated *P. falciparum* were obtained from a previous study by Mr J.C. Verlinden (166) and were included in this study as an additional analogue of the Galvestine-1 parent compound.) The parasites were treated
316
+
317
+ {18}------------------------------------------------
318
+
319
+ at the invasion stage (merozoites/early rings) and samples were taken every 2-4 h. The morphology of the treated parasites was compared to that of control parasites (treated with DMSO) as shown in Figure 2.6.
320
+
321
+ Both treated cultures remained morphologically similar to the control culture up to 12 hpi. However, Galvestine-2 treated parasites showed morphological changes after 24 hpi and after 36 hpi the parasites additionally showed a decrease in the rate of development. At 60 hpi the parasites in the Galvestine-2 treated culture entered the schizont/merozoite stage, whereas the control culture already developed into rings in the subsequent life cycle at the same time point. Although Galvestine-2 had a lagging effect on the parasite's life cycle development, parasites treated with this compound did not show complete life cycle arrest at the concentrations tested, even after 60 hpi.
322
+
323
+ In contrast, *P. falciparum* parasites treated with A51B1C1\_1 continued to show similarities to the control culture through the ring and early trophozoite stages. However, at 48 hpi, the untreated control parasites entered the merozoite stage, whereas the A51B1C1\_1 treated parasites became pyknotic, remained so for the rest of the life cycle and were unable to progress to a new life cycle. These parasites could not invade new erythrocytes and form new rings, unlike the control culture.
324
+
325
+ {19}------------------------------------------------
326
+
327
+ ![](_page_19_Figure_0.jpeg)
328
+
329
+ #### Figure 2.6 Morphological study of *P. falciparum* 3D7 parasites over 72 h after treatment with A51B1C1\_1 and Galvestine-2.
330
+
331
+ After 24 hpi the Galvestine-2 treated culture showed a decrease in the rate of development, but still continued with the life cycle. The Galvestine-2 culture reached the merozoite stage after 60 hpi, whereas the control culture was already at the ring-stage of the next development cycle. The A51B1C1\_1 treated culture showed signs of stress after 48 hpi and after prolonged incubation, the stressed parasites became smaller (pyknotic) (red arrows).
332
+
333
+ {20}------------------------------------------------
334
+
335
+ The morphological monitoring of the two compounds was repeated in two independent experiments and graphical analyses were used to delineate the time point at which growth inhibition by the compounds occurred (Figure 2.7).
336
+
337
+ ![](_page_20_Figure_1.jpeg)
338
+
339
+ ![](_page_20_Figure_2.jpeg)
340
+
341
+ ![](_page_20_Figure_3.jpeg)
342
+
343
+ Figure 2.7 Graphical analyses of intra-erythrocytic development of *P. falciparum* (3D7) parasites after treatment with herbicide derivatives.
344
+
345
+ Untreated parasites (A1 and B1) were morphologically compared to parasites treated with 2xIC<sub>50</sub> of either Galvestine-2 (A2) or A51B1C1\_1 (B2). Parasite cultures were microscopically monitored every 6 h with ring stage parasites characterised as ring shaped single nucleated parasites (blue), trophozoites as dark stained parasites containing prominent haemozoin crystals but without multiple nuclei (Orange: Early Trophozoites, Green: Trophozoites), schizonts contained haemozoin as well as distinct multiple nuclei (Yellow) and merozoites are indicated in black.
346
+
347
+ The parasites treated with A51B1C1\_1 showed a decrease in growth rate after 36-42 hpi. The parasites did continue to grow at this slower rate and schizonts formed only after 54 hpi. However, not all the parasites entered this developmental stage, with the largest fraction (40%) remaining as trophozoites. After 48 hpi, the parasites showed clear
348
+
349
+ {21}------------------------------------------------
350
+
351
+ stress characteristics, whereas the control culture already had 60% of parasites in the schizont stage after 48 hpi and only 10% trophozoites remained.
352
+
353
+ For Galvestine-2, the change in parasite morphology was not as noticeable as for compound A51B1C1\_1, but the parasites did show a decrease in the rate of growth. Some of the parasites entered the schizont stage at 40 hpi, about 10 h later than the control culture, and the release of merozoites only occurred at 54 h, 12 h later than the control culture.
354
+
355
+ #### 2.3.3 RNA isolation
356
+
357
+ After the data from the morphology studies were analysed, two time points were selected: time point 1 at 28 hpi and time point 2 at 36 hpi. Samples of treated and untreated control *P. falciparum* cultures (10 ml) at 10% parasitaemia and 5% haematocrit were taken at these time points and RNA was isolated. Eight samples were chosen randomly to confirm the integrity and purity of the RNA using the Experion system (Bio-Rad). The virtual gel image is shown in Figure 2.8 (176). The dark bands at about 3500 and 2000 bp (indicated by black arrows) represent the 28S and 18S rRNA units, respectively, without excessive smears, which would have been indicative of RNA degradation or DNA contamination (177). The small band at 50 bp is the internal standard included in the sample buffer, which the Experion software uses to align the sample on the virtual gel. The faint band in all the lanes, a few bp smaller than 200 bp, is the 5S rRNA in each sample.
358
+
359
+ {22}------------------------------------------------
360
+
361
+ ![](_page_22_Figure_0.jpeg)
362
+
363
+ #### Figure 2.8 The virtual gel image indicating the purity of the RNA.
364
+
365
+ Lane 1: Control Biological Rep 1 Time point 1 (8.9;1.44), lane 2: Control Biological Rep 1 Time point 2 (9.2;1.5), lane 3: Control Biological Rep 2 Time point 1 (9.7;1.51), lane 4: Control Biological Rep 2 Time point 2 (9.7;1.52), lane 5: Treated Biological Rep 1 Time point 2 (9.6;1.48), lane 6: Treated Biological Rep 2 Time point 2 (9.6;1.58), M is the molecular marker. The numbers in brackets represent the RQI number and the 28S/18S ratio, respectively. The band at 50 bp is the standard included in the sample buffer supplied by the manufacturer. The bands in all the lanes just smaller than 200 bp indicate that trace amounts of 5S rRNA are present in each sample. The 28S and 18S rRNA units are the clear, dark bands at about 3500 and 2000 bp, respectively (indicated by black arrows).
366
+
367
+ The gel image indicates the presence of small amounts of contaminating DNA and RNA degradation products other than the large bands at $\sim$ 3500 bp and $\sim$ 2000 bp in lanes 1, 3 and 5. To determine whether the RNA from these samples was still of usable quality, electropherogram data from the Experion system were analysed (Figure 2.9). Peaks representing the 18S and 28S ribosomal RNA (rRNA) units are observed in RNA samples of high purity, as most of the 5S rRNA unit has been removed during the RNA isolation with the RNeasy kit (Manual Qiagen 2006). The small peaks/smears before the 18S peak, in the 20-40 s region (s = running time), are generally indications of RNA degradation. Small peaks between the 18S and 28S peaks usually indicate the presence of small amounts of 28S RNA breakdown products. DNA contamination would have been indicated as peaks or smears after the 28S peak if present, which was not the case in these isolations (177).
368
+
369
+ {23}------------------------------------------------
370
+
371
+ ![](_page_23_Figure_0.jpeg)
372
+
373
+ Figure 2.9 The electropherogram as an indication of the purity of the RNA.
374
+
375
+ Small peaks and smearing between 20-40 seconds (running time) are degraded RNA. Peaks between the 18S and 28S peaks are products of 28S rRNA degradation. Any peaks and smears after the 28S peak are usually DNA contamination.
376
+
377
+ The gel image and the electropherogram indicated that the RNA samples are acceptable for use in microarray studies and that degradation is minimal. No DNA contamination is visible as there are no peaks in the 55–70 s region. Additionally, an indication of the quality of the RNA can be obtained through RQI values, where a value of 10 refers to RNA of high purity and 1 is an indication of fully degraded RNA. The RQI numbers, 28S/18S ratios and concentrations of the tested RNA samples are summarised in Table 2.3.
378
+
379
+ | RNA Sample                            | Lane<br>no.* | 28S/18S<br>ratio<sup>a</sup> | RQI<br>number<sup>b</sup> | Concentration<br>(ng/μl) |
380
+ |---------------------------------------|--------------|-------------------|----------------|------------------------|
381
+ | Control Biological Rep 1 Time point 1 | 1 | 1.44 | 8.9 | 248.12 |
382
+ | Control Biological Rep 1 Time point 2 | 2 | 1.5 | 9.2 | 269.55 |
383
+ | Control Biological Rep 2 Time point 1 | 3 | 1.51 | 9.7 | 145.56 |
384
+ | Control Biological Rep 2 Time point 2 | 4 | 1.52 | 9.7 | 249.22 |
385
+ | Treated Biological Rep 1 Time point 2 | 5 | 1.48 | 9.6 | 235.6 |
386
+ | Treated Biological Rep 2 Time point 2 | 6 | 1.58 | 9.6 | 245.52 |
387
+
388
+ Table 2.3 The quality and purity of the RNA sample tested on the Experion.
389
+
390
+ a - Ratio of the ribosomal bands (28S:18S) which should have a ratio of 2.0 for high quality RNA.
391
+
392
+ b - RNA Quality Indicator.
393
+
394
+ \* - Corresponding lane numbers to the virtual gel image in Figure 2.8.
395
+
396
+ {24}------------------------------------------------
397
+
398
+ The 28S/18S ratio and the RQI number of each RNA sample in Table 2.3 confirm the results in Figure 2.8 and Figure 2.9. The RNA samples all have RQI values of >8, which is an indication of pure, high-quality RNA.
399
+
400
+ #### 2.3.4 Microarray
401
+
402
+ Transcriptome profiling of *P. falciparum* 3D7 parasites treated with A51B1C1\_1 was performed with the in-house designed 8x15K *Plasmodium* Agilent microarray platform (166). A reference design was used during the microarray experiment, which included a reference pool of all the samples that was hybridised with the various samples. Figure 2.10 provides an example of an Agilent array and the difference between the treated and the control arrays at the same time point (time point 1 at 28 hpi).
403
+
404
+ The most prominent difference between the arrays of the control and the treated parasites was the overall colour. The treated arrays had a yellowish colour, which is associated with the differences in expression of transcripts in the control and treated samples as the parasites progress through their life cycle (133). These differences are visible on the enlarged sections included in Figure 2.10. The majority of the spots on the treated array were either green or yellow, whereas the control array's spots were red, yellow and green. Agilent includes control spots on each corner of the array for quality control and the assessment of the hybridisation of the samples tested. These control spots include dark corners, which act as negative controls, and light corners, which act as positive controls (Figure 2.10).
405
+
406
+ {25}------------------------------------------------
407
+
408
+ ![](_page_25_Figure_0.jpeg)
409
+
410
+ ### Figure 2.10 Agilent arrays of *P. falciparum* parasites treated with A51B1C1\_1.
411
+
412
+ The overall colour of the treated array (left hand panel) consists of mostly yellow or green spots (enlarged section A). The control array (right hand panel) consists of red, green and yellow spots. This colour difference is because of the differences in transcript abundance between the treated and the control samples. Agilent includes controls on their slides to ensure quality control (enlarged sections B-E). The bright corners act as a positive control and should always be hybridised. The dark corners should always stay dark after hybridisation.
413
+
414
+ #### 2.3.4.1 Data analysis
415
+
416
+ Analyses of the arrays were done using GenePix 6.0 and included spot finding and manual checking of the indicated spots. The parameters in Figure 2.11 were entered into GenePix 6.0 and saturated and bad quality spots were removed from the subsequent set of spots.
417
+
418
+ {26}------------------------------------------------
419
+
420
+ ## 2.4 Discussion
421
+
422
+ The resistance that *P. falciparum* parasites have developed against current antimalarials implies an urgent need to identify novel drug targets and to develop new drugs. Relatively little attention has been focused on the apicoplast of *P. falciparum* parasites. However, there is an evolutionary relationship between the multi-membrane apicoplast of these parasites and the photosynthetic chloroplast of plants (2), therefore it is hypothesised that compounds that inhibit the growth of plants, may also inhibit the development of the parasite (118).
423
+
424
+ Galvestine-2 and A51B1C1\_1 are modified from the parent compound Galvestine-1, which is a herbicide derivative. In *Arabidopsis*, Galvestine-1 inhibits the activity of MGD1 in membrane vesicles and also in the released micelles (118). This indicated the potency of the compound, and potentially its derivatives, as galactolipid synthesis inhibitors. Kinetic analyses of the enzymatic (MGD1) inhibition on mixed micelles showed that Galvestine-1 competes with the binding of DAG, the substrate for MGDG and DGDG synthesis (184). DGDG-like epitopes have been detected in the plasma membrane and inner membrane complex of *P. falciparum* but not in the apicoplast as found in *Arabidopsis* (Personal communication, Eric Maréchal, Pretoria, 2009). *Arabidopsis* treatment with Galvestine-1 triggered an *in vivo* inhibition of MGDG synthase activity and resulted in the overall reduction of the galactolipids, especially MGDG. The effect was seen mainly on the chloroplast, with a decrease in the membrane expansion of the chloroplast resulting in a decrease in the overall size of the chloroplasts. No other subcellular structure was affected by the treatment. Transcriptomic effects after treatment of *Arabidopsis* with Galvestine-1 indicated a decrease in abundance of gene transcripts involved in galactolipid synthesis (118).
425
+
426
+ One method for the determination of the $IC_{50}$ for potential antimalarials *in vitro* is the measurement of $^{3}$ H-hypoxanthine incorporation (185). However, this assay is expensive and the use of radioactive materials poses safety hazards and disposal problems (186). Enzymatic reactions and antibodies are also used as alternative methods to detect the presence of lactate dehydrogenase (187), and histidine-rich protein II, respectively (188) which indicate the number of viable parasites after treatment. These assays are not suited for high-throughput antimalarial drug screening as they are time consuming (189). Non-radioactive DNA stains, such as SYBR Green I
427
+
428
+ {27}------------------------------------------------
429
+
430
+ (160) and PICO green<sup>®</sup> (190) are safe and cost-effective methods to determine the $IC_{50}$ values of anti-*Plasmodium* compounds, compared to other published methods. The IC<sub>50</sub> of compound A51B1C1\_1 determined in a preliminary study in Grenoble, France (collaborator E. Maréchal), using the <sup>3</sup>H-hypoxanthine incorporation method, was found to be 180 nM (Personal communication, Eric Maréchal, Pretoria, 2009). The IC<sub>50</sub> value determined in this study using the SYBR Green I method was $447 \pm 16$ nM. The different values may be due to the techniques used to determine the $IC_{50}$, as was previously observed (186, 191, 192). Values below 1 $\mu$M comply with the MMV requirements (www.mmv.org) and are an indication that a compound may have potential against *P. falciparum*.
431
+
432
+ Delayed death is a phenomenon where the perturbation only causes the death of the parasite in the next life cycle. Examples are the effects seen after treatment of *P. falciparum* parasites with drugs such as tetracycline (97, 111, 126) and macrolides (48, 193, 194), which are both translation inhibitors, and with the gyrase inhibitor ciprofloxacin (48, 111, 195). The reason for the delayed death was found to be the transfer of non-functional apicoplasts to the progeny and thus a lack of apicoplast protein production (196). In contrast, drugs affecting non-housekeeping processes in the apicoplast of *P. falciparum* have been shown not to display delayed death phenomena (126). These drugs result in visible stress symptoms and growth arrest of the parasites within the first life cycle after exposure. Analyses of *P. falciparum* parasites treated with the herbicide derivative A51B1C1\_1 $(2 \times IC_{50})$ revealed morphological signs of stress in the form of pyknotic parasites within the first 48 h. The formation of pyknotic bodies was also seen by Deponte and Becker in 2004 (197) in a study where *P. falciparum* parasites were either treated with antimalarials followed by exposure to oxidant stress or left to starve. These parasites showed signs of programmed cell death, with cell shrinkage (pyknotic bodies), membrane blebbing and nuclear fragmentation visible (197). Stressed forms of *P. falciparum* parasites under A51B1C1\_1 pressure could indicate that either 1) this compound targets non-housekeeping processes of the apicoplast with no delayed death phenotype; or 2) the compound targets another biological process in the parasite not associated with the apicoplast.
433
+
434
+ Morphological monitoring of *P. falciparum* parasites under A51B1C1\_1 pressure revealed that the parasites' life cycle development was slowed down to a point where the life cycle was halted within the first life cycle. Whereas untreated parasites were at
435
+
436
+ {28}------------------------------------------------
437
+
438
+ the mature schizont stage after 48 h, A51B1C1\_1 treated *P. falciparum* were present as stressed (possibly dead) trophozoites with only a few maturing to merozoites for invasion of new erythrocytes. In contrast to this, Galvestine-2 affected the parasites to a lesser degree. After the first life cycle (48 h), the Galvestine-2 treated parasites were in the late trophozoite stage, with a few schizonts, and showed no morphological signs of stress; the compound slightly delayed the life cycle of the parasite rather than causing pyknotic parasites (death) as observed for A51B1C1\_1 treatment.
439
+
440
+ DNA microarray analysis allows global analyses of the complete transcriptome of an organism, at any moment in its life cycle, in a single experiment (198). The challenge for the analysis is to make sense of the large amount of high-quality raw data. The Plasmodial Agilent platform was chosen because the overall spot quality of this platform is a significant improvement on the Plasmodial arrays previously used in our lab (133, 166, 199). The higher quality of the microarray data allows higher confidence in the analysis of the data (133, 200). The A+T richness of the genome of *P. falciparum* causes difficulties in the PCR reaction used to generate more starting material (201), and one advantage of the Agilent platform is that it needs less starting material than other methods.
441
+
442
+ Using the information obtained from the morphological studies, the microarray study was designed and two time points were selected for the extraction of RNA from synchronised parasites. The background noise from multiple life stages is reduced in synchronised cultures, enabling the detection of abundance differences in transcript levels above the normal levels in the IDC (158). Small changes on the transcriptomic level may have been missed if the cultures used in this study were not synchronised. Synchronised cultures have been used in previous studies for this reason, and these authors were able to detect perturbation-induced transcriptional changes, independent of the normal transcript production due to life cycle development (158, 180, 202, 203). There have been reports on *P. falciparum* studies using non-synchronised cultures (139, 141); however, in most of these studies the authors failed to notice transcriptional responses to perturbation. The time points selected for the sampling of RNA were 28 hpi and 36 hpi, which cover the life stages in which a morphological effect is seen after treatment with A51B1C1\_1.
443
+
444
+ {29}------------------------------------------------
445
+
446
+ For confidence and statistical analysis, the experiment included two biological and two technical replicates. A reference design for the microarray set-up was used on the Plasmodial Agilent platform, which allows for the simultaneous analysis of the samples. This design also enables easy comparison between the samples. The reference is a pool of all the samples (treated and control) in equal amounts to ensure a representative sample. The quality of the data resulting from the Agilent arrays was of a high standard (200). The statistical significance of transcripts showing a change in abundance levels was calculated with a t-test (167, 168), and only the transcripts with a $\log_2$ ratio ≥ 0.75 or ≤ -0.75 (fold change of about 1.7) at 95% confidence were regarded as differentially expressed.
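
The selection rule described above can be sketched as a small filter. The cutoff of 0.75 comes from the text; reading "95% confidence" as a p-value below 0.05 is an assumption, and the function names and the example values are hypothetical, chosen only to show how the rule behaves.

```python
LOG2_CUTOFF = 0.75  # from the text: log2 ratio >= 0.75 or <= -0.75
ALPHA = 0.05        # assumed reading of "95% confidence"


def fold_change(log2_ratio):
    """Magnitude of the fold change implied by a log2 ratio."""
    return 2.0 ** abs(log2_ratio)


def is_differential(log2_ratio, p_value, cutoff=LOG2_CUTOFF, alpha=ALPHA):
    """True if a transcript passes both the effect-size and significance filters."""
    return abs(log2_ratio) >= cutoff and p_value < alpha


# Hypothetical (log2 ratio, p-value) pairs, not data from the study.
examples = {"transcript_A": (0.90, 0.01),
            "transcript_B": (-1.20, 0.03),
            "transcript_C": (0.50, 0.001),   # significant but too small a change
            "transcript_D": (1.50, 0.20)}    # large change but not significant

print(f"log2 cutoff {LOG2_CUTOFF} = fold change {fold_change(LOG2_CUTOFF):.2f}")
for name, (ratio, p) in examples.items():
    print(name, is_differential(ratio, p))
```

A $\log_2$ ratio of 0.75 corresponds to $2^{0.75} \approx 1.68$, which is where the rounded fold change of 1.7 comes from.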
447
+
448
+ Normalisation, data correction and local background subtraction are essential processes in DNA microarray data analyses to allow for correction of variation between technical and biological replicates and accurate identification of differentially expressed transcripts between two sample sets. Various algorithms are available for the correction of variation between spots on the same array in an intensity-dependent manner. In this study two methods were tested to determine which gives the best normalisation. Global loess assumes that most of the genes are not differentially expressed, which cannot be assumed in this study; it resulted in box plots of different sizes and in outliers that influenced the results, where the box plots are an indication of the success of the normalisation (167). Robust spline was used in the experiments as it uses 5-parameter regression splines instead of curves as in loess and was designed to deal with outliers in such a way that they do not affect the mean (167, 178). This resulted in boxes of similar size and around the same M value (Figure 2.13). The same box size is an indication of equal contribution of the samples, and the same M value indicates successful normalisation of the intensity of the different arrays.
449
+
450
+ For between-array normalisation both Aquantile and Gquantile were tested. Aquantile was designed for experiments where the samples are labelled with Cy3 and the reference with Cy5 (149). This resulted in a density plot with multiple green lines, representing the reference. Gquantile normalisation was used in this study as this was designed for Cy3 labelling of the common reference. Gquantile was used for the correction of background effects between arrays, which enables the comparison across the different arrays and resulted in acceptable normalisation of the A51B1C1\_1 datasets. The Agilent platform and reference design microarray strategy have been
451
+
452
+
453
+
454
+ {30}------------------------------------------------
455
+
456
+ previously used, and in these instances, success was achieved with Gquantile normalisation (166, 199). Fulfilment of all the set criteria and thorough analyses resulted in valid, high quality data sets, which are reproducible and reliable (198).
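
The idea behind quantile-based between-array normalisation can be illustrated with a minimal sketch: the intensities of one channel (here taken to be the common reference channel) are forced to share the same empirical distribution on every array. This is a generic quantile normalisation written for illustration, not the implementation used in the study; the function name, the tie handling and the toy intensities are assumptions.

```python
def quantile_normalise(arrays):
    """Give every array the same empirical intensity distribution.

    arrays: list of equal-length lists, one list of spot intensities per array.
    Ties are broken by spot order, a simplification for this sketch.
    """
    n_spots = len(arrays[0])
    if any(len(a) != n_spots for a in arrays):
        raise ValueError("all arrays must contain the same number of spots")

    # Target distribution: mean of the k-th smallest intensity across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    target = [sum(s[k] for s in sorted_arrays) / len(arrays)
              for k in range(n_spots)]

    normalised = []
    for a in arrays:
        # Replace each spot by the target value that has the same rank.
        order = sorted(range(n_spots), key=lambda i: a[i])
        out = [0.0] * n_spots
        for rank, spot in enumerate(order):
            out[spot] = target[rank]
        normalised.append(out)
    return normalised


# Toy reference-channel intensities for two arrays (hypothetical values).
reference_channel = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalise(reference_channel))
```

After normalisation each array keeps its own spot ranking but carries identical sorted values, which is why the density plots of a successfully normalised channel overlap.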
457
+
458
+ Pearson correlations indicated that the time points chosen for sampling of RNA from *P. falciparum* parasites after treatment with A51B1C1\_1 for the microarray were early enough to enable direct comparison between the treated and the control parasites at a specific time point and identification of differentially expressed transcripts (Table 2.4). The same direct comparison was used by Dahl et al. (2006) after 55 h treatment with doxycycline because of the little variation between the treated and the control samples (97). Since transcripts in the *P. falciparum* IDC are only produced once needed (137), it is essential that comparisons between parasite populations only detect drug-specific responses and not normal life stage specific responses (133). In the case where direct comparisons of treated and control parasites are not possible, a $t_0$ strategy was developed that determines the point of transcriptional arrest and uses that point as the reference for comparisons between the treated and control samples (133).
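
As a rough illustration of the comparison described above, the sketch below computes a Pearson correlation coefficient between two expression profiles. The function name and the toy profiles are hypothetical; the actual correlations are those reported in Table 2.4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 ratios for the same transcripts in control and treated samples.
control = [0.1, -0.4, 1.2, 0.8, -1.0]
treated = [0.2, -0.5, 1.0, 0.9, -0.8]
print(f"r = {pearson_r(control, treated):.3f}")
```

A coefficient close to 1 between treated and control profiles at the same time point suggests that the two populations are still at a comparable life stage, which is the condition under which a direct comparison is reasonable.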
459
+
460
+ In this study, the steady state levels of the affected transcripts at a specific time point were determined. A total of 1504 transcripts (Appendix 1) were affected by treatment with compound A51B1C1\_1, of which 805 decreased in abundance and 699 increased in abundance. The larger number of transcripts with decreased abundance is consistent with other perturbation studies done on *P. falciparum* (133, 199). Additionally, the range of fold changes of the transcripts detected is between 1.9 (increased abundance) and -3.9 (decreased abundance), and is in agreement with previous transcriptomic reports (97). The transcripts with decreased abundance showed a much higher level of confidence and the fold of decrease was also higher than the fold of increase, similar to published data (133, 141, 199). However, the number of transcripts affected is much higher (in most cases, more than double the number of transcripts) than for most other reported studies (97, 133, 183, 199, 202).
461
+
462
+ A paradoxical effect is seen after treatment of *P. falciparum* parasites with A51B1C1\_1. The treatment caused both an increase and a decrease in abundance of transcripts representing the same biological process (cluster). This is not uncommon in treated *P. falciparum* and has been seen numerous times with methionine and polyamine metabolism inhibition (133, 199). Both the 699 transcripts with increased abundance
463
+
464
+ {31}------------------------------------------------
465
+
466
+ and the 805 transcripts with decreased abundance were clustered into the same 14 functional groups (transcripts with decreased abundance contained an extra cluster) (Figure 2.17) using GO annotations obtained from PlasmoDB and MADIBA. The largest clusters of affected transcripts with an increase in abundance were RNA metabolic processes (9%), translation (6%), primary metabolism (5%) and post-translational modifications (5%). Transcripts with a decrease in abundance are involved in parasite-host interactions (6%), post-translational modifications (4%), transport (3%) and proteolysis (3%). As seen in previous studies, the clusters consist of transcripts with increased abundance, as well as transcripts with decreased abundance (133, 166, 199).
467
+
468
+ The 1504 transcript data set includes ten transcripts (three with increased abundance and seven with decreased abundance, Table 2.6) involved in lipid biosynthesis or fatty acid biosynthesis. The presence of these transcripts (all have Log<sub>2</sub> FC of about 1) in the data set indicates the effect of the treatment on lipid biosynthesis (three in glycerophospholipid metabolism, Figure 2.20, and two in glycerolipid metabolism, Figure 2.21). In glycerophospholipid metabolism, three transcripts were affected, of which one increased in abundance (PF14\_0097, EC 2.7.7.41 – Cytidine-diphosphate-DAG synthase) and two decreased in abundance (PFI 1370, EC 4.1.1.65 – Phosphatidyl serine-decarboxylase and PF14\_0020, EC 2.7.1.32 – Choline kinase) (Figure 2.20). In glycerolipid metabolism two transcripts (PFC0995c, EC 2.3.1.20 – DAG O-acyltransferase and PFI 1485, EC 2.7.1.107 – DAG kinase) both decreased in abundance (Figure 2.21). DAG O-acyltransferase is the enzyme that converts DAG into triacylglycerol (TAG), one of the lipids proven by Maréchal *et al.* (2002) to be present in the *P. falciparum* parasite (119). The *P. falciparum* genome was found to have only one open reading frame for the enzyme DAG O-acyltransferase (204). The function of this enzyme in the IDC is still under debate (204), but reports show an increase in TAG levels in mature forms of the parasites (205), and TAG metabolism and trafficking in *P. falciparum* infected erythrocytes is stage-specific (206). In other organisms, TAG serves as a highly reduced store of oxidisable energy (206) and also plays a vital role in the homeostasis of the organism (207). The enzyme DAG kinase is responsible for the conversion of DAG to 1,2-diacyl-sn-glycerol-3-phosphate.
469
+
470
+ A comparison between the transcriptomic data obtained after treatment with A51B1C1\_1 and other Plasmodial perturbation studies was done to determine the level of specificity of the A51B1C1\_1 inhibited transcriptome dataset and to identify
471
+
472
+ {32}------------------------------------------------
473
+
474
+ transcripts due to the natural stress response by the parasite. These studies included treatment of *P. falciparum* with chloroquine, artemisinin, antifolates and effects of high temperatures, together with the 20 perturbations investigated by Hu *et al.* (179). From these comparisons 579 transcripts were identified as unique to the transcriptomic profile of the parasites treated with the herbicide-derived compound A51B1C1\_1 (Table 2.7). Two transcripts involved in fatty acid or lipid synthesis were among the 579 unique transcripts (PFC0050 – long chain fatty acid ligase and PFI 1485 – DAG kinase).
backend/prompts/transition_prompt.txt CHANGED
@@ -25,5 +25,8 @@ TRANSITION INSTRUCTIONS:
25
  - Maintain momentum - this is a continuation, not a restart
26
  - Stay focused on phenomenological understanding and observables
27
 
 
 
 
28
  Start by acknowledging their {action} choice, then smoothly transition to introducing the new section and begin your first question about {next_chunk}. Don't get ahead of yourself, and adapt the difficulty of your questions based on the user's previous answers.
29
  Also, don't ask questions about material that you haven't talked about.
 
25
  - Maintain momentum - this is a continuation, not a restart
26
  - Stay focused on phenomenological understanding and observables
27
 
28
+ 5) Maintain the same type of persona as described before:
29
+ - No LLM-fluff. Assume the user is a genius, and just needs help with framing.
30
+
31
  Start by acknowledging their {action} choice, then smoothly transition to introducing the new section and begin your first question about {next_chunk}. Don't get ahead of yourself, and adapt the difficulty of your questions based on the user's previous answers.
32
  Also, don't ask questions about material that you haven't talked about.
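The `{action}` and `{next_chunk}` placeholders in this template suggest it is filled in with Python string formatting on the backend. The call site is not part of this diff, so the following is only a minimal sketch that assumes `str.format`-style substitution. `render_transition_prompt` is a hypothetical helper and the template string is a shortened stand-in for `transition_prompt.txt`.

```python
# Shortened stand-in for the real template file (assumption for illustration).
TRANSITION_PROMPT_TEMPLATE = (
    "Start by acknowledging their {action} choice, then smoothly transition "
    "to introducing the new section and begin your first question about "
    "{next_chunk}."
)


def render_transition_prompt(template: str, action: str, next_chunk: str) -> str:
    """Fill the {action} and {next_chunk} placeholders via str.format."""
    return template.format(action=action, next_chunk=next_chunk)


# Chunk text may itself contain braces (e.g. LaTeX such as $IC_{50}$).
prompt = render_transition_prompt(
    TRANSITION_PROMPT_TEMPLATE,
    action="understood",
    next_chunk="#### 2.3.1 $IC_{50}$ determinations",
)
print(prompt)
```

Under this assumption, braces inside the substituted chunk text are safe because `str.format` does not re-parse argument values. Literal braces written in the template file itself would need to be doubled (`{{` and `}}`).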
frontend/src/components/DocumentProcessor.jsx CHANGED
The diff for this file is too large to render. See raw diff
 
frontend/src/hooks/useDocumentProcessor.js CHANGED
@@ -47,44 +47,44 @@ export const useDocumentProcessor = () => {
47
  // Use hardcoded chunks for the document
48
  const hardcodedChunks = [
49
  {
50
- "topic": "The Foundation: Proximal Policy Optimization (PPO)",
51
- "text": "### 4.1.1. From PPO to GRPO\n\nProximal Policy Optimization (PPO) (Schulman et al., 2017) is an actor-critic RL algorithm that is widely used in the RL fine-tuning stage of LLMs (Ouyang et al., 2022). In particular, it optimizes LLMs by maximizing the following surrogate objective:\n\n$$\\mathcal{J}_{PPO}(\\theta) = \\mathbb{E}\\left[q \\sim P(Q), o \\sim \\pi_{\\theta_{old}}(O|q)\\right] \\frac{1}{|o|} \\sum_{t=1}^{|o|} \\min\\left[\\frac{\\pi_{\\theta}(o_t|q, o_{\\leq t})}{\\pi_{\\theta_{old}}(o_t|q, o_{\\leq t})} A_t, \\text{clip}\\left(\\frac{\\pi_{\\theta}(o_t|q, o_{\\leq t})}{\\pi_{\\theta_{old}}(o_t|q, o_{\\leq t})}, 1 - \\varepsilon, 1 + \\varepsilon\\right) A_t\\right], \\tag{1}$$\n\nwhere $\\pi_{\\theta}$ and $\\pi_{\\theta_{old}}$ are the current and old policy models, and *q*, *o* are questions and outputs sampled from the question dataset and the old policy $\\pi_{\\theta_{old}}$ , respectively. $\\varepsilon$ is a clipping-related hyper-parameter introduced in PPO for stabilizing training. $A_t$ is the advantage, which is computed by applying Generalized Advantage Estimation (GAE) (Schulman et al., 2015), based on the rewards $\\{r_{\\geq t}\\}$ and a learned value function $V_{\\psi}$ . Thus, in PPO, a value function needs to be trained alongside the policy model and to mitigate over-optimization of the reward model, the standard approach is to add a per-token KL penalty from a reference model in the reward at each token (Ouyang et al., 2022), i.e.,\n\n$$r_t = r_{\\varphi}(q, o_{\\leq t}) - \\beta \\log \\frac{\\pi_{\\theta}(o_t|q, o_{\\leq t})}{\\pi_{ref}(o_t|q, o_{\\leq t})},\\tag{2}$$\n\nwhere $r_{\\varphi}$ is the reward model, $\\pi_{ref}$ is the reference model, which is usually the initial SFT model, and $\\beta$ is the coefficient of the KL penalty."
52
  },
53
  {
54
- "topic": "The Problem with PPO: Why a New Approach is Needed",
55
- "text": "As the value function employed in PPO is typically another model of comparable size as the policy model, it brings a substantial memory and computational burden. Additionally, during RL training, the value function is treated as a baseline in the calculation of the advantage for variance reduction. While in the LLM context, usually only the last token is assigned a reward score by the reward model, which may complicate the training of a value function that is accurate at each token."
56
  },
57
  {
58
- "topic": "The Solution: Introducing Group Relative Policy Optimization (GRPO)",
59
- "text": "To address this, as shown in Figure 4, we propose Group Relative Policy Optimization (GRPO), which obviates the need for additional value function approximation as in PPO, and instead uses the average reward of multiple sampled outputs, produced in response to the same question, as the baseline.\n\n![](_page_12_Figure_0.jpeg)\n\nFigure 4 | Demonstration of PPO and our GRPO. GRPO foregoes the value model, instead estimating the baseline from group scores, significantly reducing training resources."
60
  },
61
  {
62
- "topic": "The GRPO Objective Function (Equation 3)",
63
- "text": "More specifically, for each question $q$ , GRPO samples a group of outputs $\\{o_1, o_2, \\cdots, o_G\\}$ from the old policy $\\pi_{\\theta_{old}}$ and then optimizes the policy model by maximizing the following objective:\n\n$$\\mathcal{J}_{GRPO}(\\theta) = \\mathbb{E}[q \\sim P(Q), \\{o_{i}\\}_{i=1}^{G} \\sim \\pi_{\\theta_{old}}(O|q)]\\n$$\n\n$$\\n\\frac{1}{G} \\sum_{i=1}^{G} \\frac{1}{|o_{i}|} \\sum_{t=1}^{|o_{i}|} \\left\\{ \\min \\left[ \\frac{\\pi_{\\theta}(o_{i,t}|q, o_{i,< t})}{\\pi_{\\theta_{old}}(o_{i,t}|q, o_{i,< t})} \\hat{A}_{i,t}, \\operatorname{clip} \\left( \\frac{\\pi_{\\theta}(o_{i,t}|q, o_{i,< t})}{\\pi_{\\theta_{old}}(o_{i,t}|q, o_{i,< t})}, 1 - \\varepsilon, 1 + \\varepsilon \\right) \\hat{A}_{i,t} \\right] - \\beta \\mathbb{D}_{KL} \\left[ \\pi_{\\theta} || \\pi_{ref} \\right] \\right\\}, \\n$$\n(3)\n\nwhere $\\varepsilon$ and $\\beta$ are hyper-parameters, and $\\hat{A}_{i,t}$ is the advantage calculated based on relative rewards of the outputs inside each group only, which will be detailed in the following subsections."
64
  },
65
  {
66
- "topic": "Key Feature 1: Group Relative Advantage Calculation",
67
- "text": "The group relative way that GRPO leverages to calculate the advantages, aligns well with the comparative nature of rewards models, as reward models are typically trained on datasets of comparisons between outputs on the same question. Also note that, instead of adding KL penalty in the reward, GRPO regularizes by directly adding the KL divergence between the trained policy and the reference policy to the loss, avoiding complicating the calculation of $\\hat{A}_{i,t}$ ."
68
  },
69
  {
70
- "topic": "Key Feature 2: KL Divergence as a Direct Penalty (Equation 4)",
71
- "text": "And different from the KL penalty term used in (2), we estimate the KL divergence with the following unbiased estimator (Schulman, 2020):\n\n$$\\mathbb{D}_{KL}\\left[\\pi_{\\theta}||\\pi_{ref}\\right] = \\frac{\\pi_{ref}(o_{i,t}|q, o_{i,< t})}{\\pi_{\\theta}(o_{i,t}|q, o_{i,< t})} - \\log\\frac{\\pi_{ref}(o_{i,t}|q, o_{i,< t})}{\\pi_{\\theta}(o_{i,t}|q, o_{i,< t})} - 1,\\tag{4}$$\n\nwhich is guaranteed to be positive."
72
  },
73
  {
74
- "topic": "Application 1: Outcome Supervision RL with GRPO",
75
- "text": "#### 4.1.2. Outcome Supervision RL with GRPO\n\nFormally, for each question q, a group of outputs $\\{o_1, o_2, \\cdots, o_G\\}$ are sampled from the old policy model $\\pi_{\\theta_{old}}$ . A reward model is then used to score the outputs, yielding *G* rewards $\\mathbf{r} = \\{r_1, r_2, \\cdots, r_G\\}$ correspondingly. Subsequently, these rewards are normalized by subtracting the group average and dividing by the group standard deviation. Outcome supervision provides the normalized reward at the end of each output $o_i$ and sets the advantages $\\hat{A}_{i,t}$ of all tokens in the output as the normalized reward, i.e., $\\hat{A}_{i,t} = \\widetilde{r}_i = \\frac{r_i - \\text{mean}(\\mathbf{r})}{\\text{std}(\\mathbf{r})}$ , and then optimizes the policy by maximizing the objective defined in equation $(3)$ ."
76
  },
77
  {
78
- "topic": "Application 2: Process Supervision RL with GRPO",
79
- "text": "### 4.1.3. Process Supervision RL with GRPO\n\nOutcome supervision only provides a reward at the end of each output, which may not be sufficient and efficient to supervise the policy in complex mathematical tasks. Following Wang et al. (2023b), we also explore process supervision, which provides a reward at the end of each reasoning step. Formally, given the question q and G sampled outputs $\\{o_1, o_2, \\cdots, o_G\\}$ , a process reward model is used to score each step of the outputs, yielding corresponding rewards: $\\mathbf{R} = \\{\\{r_1^{index(1)}, \\cdots, r_1^{index(K_1)}\\}, \\cdots, \\{r_G^{index(1)}, \\cdots, r_G^{index(K_G)}\\}\\}, \\text{ where } index(j) \\text{ is the end token index}$ of the $j$ -th step, and $K_i$ is the total number of steps in the $i$ -th output. We also normalize these rewards with the average and the standard deviation, i.e., $\\widetilde{r}_{i}^{\\text{index}(j)} = \\frac{r_{i}^{\\text{index}(j)} - \\text{mean}(\\mathbf{R})}{\\text{std}(\\mathbf{R})}$ . Subsequently, the process supervision calculates the advantage of each token as the sum of the normalized rewards from the following steps, i.e., $\\hat{A}_{i,t} = \\sum_{index(j) \\ge t} \\tilde{r}_i^{index(j)}$ , and then optimizes the policy by maximizing the objective defined in equation $(3)$ ."
80
  },
81
  {
82
- "topic": "The Full Training Loop: Iterative RL with GRPO",
83
- "text": "### 4.1.4. Iterative RL with GRPO\n\nAs the reinforcement learning training process progresses, the old reward model may not be sufficient to supervise the current policy model. Therefore, we also explore the iterative RL with GRPO. As shown in Algorithm 1, in iterative GRPO, we generate new training sets for the reward model based on the sampling results from the policy model and continually train the old reward model using a replay mechanism that incorporates 10% of historical data. Then, we set the reference model as the policy model, and continually train the policy model with the new reward model.\n\n### **Algorithm 1** Iterative Group Relative Policy Optimization\n\n**Input** initial policy model $\\pi_{\\theta_{\\text{init}}}$ ; reward models $r_{\\varphi}$ ; task prompts $\\mathcal{D}$ ; hyperparameters $\\varepsilon$ , $\\beta$ , $\\mu$ \n\n- 1: policy model $\\pi_{\\theta} \\leftarrow \\pi_{\\theta_{\\text{init}}}$ 2: **for** iteration = $1, \\ldots, I$ **do** 3: reference model $\\pi_{ref} \\leftarrow \\pi_{\\theta}$\n- 4: for step = $1, \\ldots, M$ do\n- Sample a batch $\\mathcal{D}_b$ from $\\mathcal{D}$ 5:\n- Update the old policy model $\\pi_{\\theta_{old}} \\leftarrow \\pi_{\\theta}$ 6:\n- 7:\n- Sample *G* outputs $\\{o_i\\}_{i=1}^G \\sim \\pi_{\\theta_{old}}(\\cdot \\mid q)$ for each question $q \\in \\mathcal{D}_b$ <br>Compute rewards $\\{r_i\\}_{i=1}^G$ for each sampled output $o_i$ by running $r_{\\varphi}$ 8:\n- Compute $\\hat{A}_{i,t}$ for the *t*-th token of $o_i$ through group relative advantage estimation. 9:\n- **for** GRPO iteration = $1, \\ldots, \\mu$ **do** 10:\n- Update the policy model $\\pi_{\\theta}$ by maximizing the GRPO objective (Equation 21) 11:\n- 12: Update $r_{\\varphi}$ through continuous training using a replay mechanism.\n\nOutput $\\pi_{\\theta}$ "
84
  },
85
  {
86
- "topic": "Why GRPO Makes Sense: The Benefit of a Graded Reward Signal",
87
- "text": "The algorithm processes the reward signal to the gradient coefficient to update the model parameter. We divide the reward function as 'Rule' and 'Model' in our experiments. Rule refers to judging the quality of a response based on the correctness of the answer, and Model denotes that we train a reward model to score each response. The training data of the reward model is based on the rule judgment. Equations 10 and 21 highlight a key difference between GRPO and Online RFT: GRPO uniquely adjusts its gradient coefficient based on the reward value provided by the reward model. This allows for differential reinforcement and penalization of responses according to their varying magnitudes. In contrast, Online RFT lacks this feature; it does not penalize incorrect responses and uniformly reinforces all responses with correct answers at the same level of intensity.\n\nAs demonstrated in Figure 5, GRPO surpasses online RFT, thereby highlighting the efficiency of altering positive and negative gradient coefficients. In addition, GRPO+PS shows superior performance compared to GRPO+OS, indicating the benefits of using fine-grained, step-aware gradient coefficients. Furthermore, we explore the iterative RL, in our experiments, we conduct two rounds of iteration. As shown in Figure 6, we notice that the iterative RL significantly improves the performance, especially at the first iteration.\n\n![](_page_18_Figure_2.jpeg)\n\nFigure 5 | Performance of the DeepSeekMath-Instruct 1.3B model, which was further trained using various methods, on two benchmarks."
88
  }
89
  ];
90
 
 
47
  // Use hardcoded chunks for the document
48
  const hardcodedChunks = [
49
  {
50
+ "topic": "The Rationale: Why Target a Plant-Like Organelle in the Malaria Parasite?",
51
+ "text": "With *P. falciparum* containing a plant-like plastid apicoplast, the membranes surrounding the apicoplast suggested that glycerolipids that are unique to algae and plant plastids might also be present in the apicoplast membranes. As such, MGDG and DGDG may be present in Apicomplexa like *P. falciparum* and *T. gondii*. Investigations of galactolipid biosynthesis and content in membranes of these Apicomplexa have indicated that radioactively labelled UDP-galactose is incorporated into both MGDG and DGDG. The latter was also immunologically detected in parasite lysates (Figure 1.10). Distinct enzymatic processes or amino acid derivation of the synthases were involved since no clear identification of MGDG or DGDG synthase orthologues could be identified in the *P. falciparum* genome utilising only bioinformatic searches (119)."
52
  },
53
  {
54
+ "topic": "The Strategy: Using Herbicides as a Starting Point for Antimalarials",
55
+ "text": "MGDG synthase has been shown to be essential to plant cell growth, with knock-outs of MGD1 in *Arabidopsis* as a member of the multigene MGDG synthase family, leading to a complete lack of chlorophyll, chloroplast ultrastructure disruption and severe plant growth inhibition (118). This data provides support for galactolipid biosynthesis as a valid growth inhibition strategy. As this process is unique in *P. falciparum* and not found in humans, it is an enticing strategy for the development of novel antimalarials."
56
  },
57
  {
58
+ "topic": "The Specific Compound: Introducing A51B1C1_1",
59
+ "text": "Due to the presence of MGDG and DGDG in the plant-derived apicoplast of *P. falciparum*, an interesting speculation is that Galvestine-1 and its derivatives might have growth-inhibitory capacity against the malaria parasite by targeting lipid biosynthesis processes in the apicoplast. This study therefore presents the determination of the antimalarial property of one of the lead MGDG synthase inhibitors from the Botté study, A51B1C1_1 (Figure 1.13).\n\nOne major advantage of this strategy would be that these compounds are herbicide-derived and could therefore, if they are active against *P. falciparum*, prove to be highly selective to the parasite without targeting any metabolic process in humans. This compound furthermore provides a novel chemical scaffold unrelated to any current antimalarials, which would be a novel action in the parasite compared to currently used antimalarials, and be able to overcome the resistance mechanisms against current antimalarials. Thus, if these compounds prove to be active against the malaria parasite, they may be developed into new antimalarial drugs."
60
  },
61
  {
62
+ "topic": "The Research Aims: A Multi-Faceted Approach",
63
+ "text": "The primary objective of this study was to determine the antimalarial potential of compound A51B1C1_1 as well as the physiological response of *P. falciparum* after treatment with this compound by employing a comprehensive functional genomics approach.\n\nChapter 2 focuses on determining the antimalarial activity of A51B1C1_1 through morphological investigation of *P. falciparum* after treatment with this compound. This is followed by a complete transcriptome analysis employing DNA microarray to identify responsive transcripts in *P. falciparum* that were differentially regulated upon treatment with this compound.\n\nChapter 3 introduces the use of higher-level functional genomics analyses of the response of *P. falciparum* to A51B1C1_1 treatment by investigating the proteome of the parasites after perturbation with this herbicide-derived compound."
64
  },
65
  {
66
+ "topic": "Method 1: Measuring Potency (IC50 Determination)",
67
+ "text": "#### 2.3.1 $IC_{50}$ determinations\n\nDose-response curves were established to determine the median inhibitory concentration ( $IC_{50}$ ) of the herbicide derivative A51B1C1_1 using a fluorescent SYBR Green I assay (MSF assay) on the chloroquine-sensitive *P. falciparum* strain 3D7. The average $IC_{50}$ value of A51B1C1_1 determined in four individual experiments was found to be 447 ±16 nM (Figure 2.5)."
68
  },
69
  {
70
+ "topic": "Method 2: Observing the Physical Effects (Morphology Studies)",
71
+ "text": "#### 2.3.2 Morphology studies\n\nTwo independent *P. falciparum* (3D7) parasite cultures were treated at $2xIC_{50}$ A51B1C1_1 and Galvestine-2 (Data for Galvestine-2 treated *P. falciparum* were obtained from a previous study by Mr J.C. Verlinden (166) and was included in this study as an additional analogue of the Galvestine-1 parent compound) and the effects on the morphology of the parasites were observed for 72 h. [...] In contrast *P. falciparum* parasites treated with A51B1C1_1 continued to show similarities to the control culture through the ring stage and early trophozoite stages. However, at 48 hpi, the control untreated parasites entered the merozoite stage, but the A51B1C1_1 treated parasites became pyknotic and remained so for the rest of the life cycle and was unable to progress to a new life cycle. These parasites could not invade new erythrocytes and form new rings, unlike the control culture."
72
  },
73
  {
74
+ "topic": "Interpretation: The Compound Is Potent and Fast-Acting",
75
+ "text": "The IC<sub>50</sub> of compound A51B1C1_1 determined in a preliminary study in Grenoble, France (collaborator E. Maréchal), using the <sup>3</sup>H-hypoxanthine incorporation method, was found to be 180 nM (Personal communication, Eric Maréchal, Pretoria, 2009). The IC<sub>50</sub> value determined in this study using the SYBR Green I method was shown to be $447 \\pm 16$ nM. The different values may be due to the techniques used to determine the $IC_{50}$ as was previously observed (186, 191, 192). Values below 1 $\\mu$ M comply with the MMV requirements (www.mmv.org) and are an indication that a compound may have potential against *P. falciparum*."
76
  },
77
  {
78
+ "topic": "Interpretation: A Lack of 'Delayed Death' Suggests a Non-Housekeeping Target",
79
+ "text": "Delayed death is a phenomenon where the perturbation only causes the death of the parasite in the next life cycle. [...] In contrast, drugs affecting non-housekeeping processes in the apicoplast of *P. falciparum* have been shown to not display delayed death phenomena (126). These drugs result in visible stress symptoms and growth arrest of the parasites within the first life cycle after exposure. Analyses of *P. falciparum* parasites treated with the herbicide derivative A51B1C1_1 $(2xIC_{50})$ revealed morphological signs of stress in the form of pyknotic parasites within the first 48 h. [...] Stressed forms of *P. falciparum* parasites under A51B1C1_1 pressure could indicate that either 1) this compound targets non-housekeeping processes of the apicoplast with no delayed death phenotype; or 2) the compound targets another biological process in the parasite not associated with the apicoplast."
80
  },
81
  {
82
+ "topic": "Method 3: Pinpointing the Target with Transcriptomics",
83
+ "text": "Using the information obtained from the morphological studies, the microarray study was designed and two time points were selected for the extraction of RNA from synchronised parasites. The background noise from multiple life stages is reduced in synchronised cultures enabling the detection of abundance differences in transcript levels above the normal levels in the IDC (158). [...] The time points selected for the sampling of RNA was 28 hpi and 36 hpi, which covers the life stages in which a morphological effect is seen after treatment with A51B1C1_1."
84
  },
85
  {
86
+ "topic": "Result: Transcript Data Confirms Lipid Metabolism as the Target",
87
+ "text": "The 1504 transcript data set includes ten transcripts (three with increased abundance and seven with decreased abundance, Table 2.6) involved in lipid biosynthesis or fatty acid biosynthesis. The presence of these transcripts (all have Log<sub>2</sub> FC of about 1) in the data set indicates the effect of the treatment on lipid biosynthesis (three in glycerophospholipid metabolism, Figure 2.20, and two in glycerolipid metabolism, Figure 2.21). In glycerophospholipid metabolism, three transcripts were affected of which one increased in abundance (PF14_0097, EC 2.7.7.41 Cytidine-diphosphate-DAG synthase) and two with decreased abundance (PFI 1370, EC 4.1.1.65 -Phosphatidyl serine-decarboxylase and PF14 0020, EC 2.7.1.32 Choline kinase) (Figure 2.20). In glycerolipid metabolism two transcripts (PFC0995c, EC 2.3.1.20 -DAG O-acyltransferase and PFI 1485, EC 2.7.1.107 DAG kinase) both decreased in abundance (Figure 2.21)."
88
  }
89
  ];
90