File size: 27,331 Bytes
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
 
 
 
 
 
 
 
 
6b72c99
939aa1c
 
6b72c99
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
939aa1c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a386eccf-6880-46e2-91e0-c54836a41b1a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/lynn/Documents/Twinkl/grant-applications-app/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from transformers import pipeline\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9ac61aaf-9e36-4aef-98d4-e54a4ac6e643",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "960\n",
      "Index(['Id', 'Date/Time Requested', 'Giveaway Title', 'Customer Name',\n",
      "       'Email Address', 'School Name', 'Postal Address', 'Address Line 2',\n",
      "       'Address City', 'Postcode', 'Additional Info', 'Unnamed: 11'],\n",
      "      dtype='object')\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Id</th>\n",
       "      <th>Date/Time Requested</th>\n",
       "      <th>Giveaway Title</th>\n",
       "      <th>Customer Name</th>\n",
       "      <th>Email Address</th>\n",
       "      <th>School Name</th>\n",
       "      <th>Postal Address</th>\n",
       "      <th>Address Line 2</th>\n",
       "      <th>Address City</th>\n",
       "      <th>Postcode</th>\n",
       "      <th>Additional Info</th>\n",
       "      <th>Unnamed: 11</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>304399.0</td>\n",
       "      <td>01/03/2025 00:52</td>\n",
       "      <td>March Community Collection</td>\n",
       "      <td>Susan Bushnell</td>\n",
       "      <td>susan.bushnell@googlemail.com</td>\n",
       "      <td>Southfield Junior School</td>\n",
       "      <td>Shrivenham Road</td>\n",
       "      <td>Highworth</td>\n",
       "      <td>Swindon</td>\n",
       "      <td>SN6 7BZ</td>\n",
       "      <td>I would love to use it to spread the love of r...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>305004.0</td>\n",
       "      <td>02/03/2025 19:52</td>\n",
       "      <td>March Community Collection</td>\n",
       "      <td>Sarah Arabestani</td>\n",
       "      <td>sarah.a@sandringhamnursery.com</td>\n",
       "      <td>Sandringham Nursery</td>\n",
       "      <td>16 Sandringham Road</td>\n",
       "      <td>Penylan</td>\n",
       "      <td>Cardiff</td>\n",
       "      <td>CF23 5BJ</td>\n",
       "      <td>We would like to introduce early years yoga an...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>305493.0</td>\n",
       "      <td>05/03/2025 14:34</td>\n",
       "      <td>March Community Collection</td>\n",
       "      <td>Rebecca Asker</td>\n",
       "      <td>mrsrasker@gmail.com</td>\n",
       "      <td>Newhaven PRU Outreach</td>\n",
       "      <td>Newhaven Gardens</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Greenwich</td>\n",
       "      <td>SE96HR</td>\n",
       "      <td>£500 would enable us to set up a small sensor...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Id Date/Time Requested              Giveaway Title     Customer Name  \\\n",
       "0  304399.0    01/03/2025 00:52  March Community Collection    Susan Bushnell   \n",
       "1  305004.0    02/03/2025 19:52  March Community Collection  Sarah Arabestani   \n",
       "2  305493.0    05/03/2025 14:34  March Community Collection     Rebecca Asker   \n",
       "\n",
       "                    Email Address               School Name  \\\n",
       "0   susan.bushnell@googlemail.com  Southfield Junior School   \n",
       "1  sarah.a@sandringhamnursery.com       Sandringham Nursery   \n",
       "2             mrsrasker@gmail.com     Newhaven PRU Outreach   \n",
       "\n",
       "        Postal Address Address Line 2 Address City  Postcode  \\\n",
       "0      Shrivenham Road      Highworth      Swindon   SN6 7BZ   \n",
       "1  16 Sandringham Road        Penylan      Cardiff  CF23 5BJ   \n",
       "2     Newhaven Gardens            NaN    Greenwich    SE96HR   \n",
       "\n",
       "                                     Additional Info Unnamed: 11  \n",
       "0  I would love to use it to spread the love of r...              \n",
       "1  We would like to introduce early years yoga an...              \n",
       "2  £500 would enable us to set up a small sensor...              "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('data/feb-march-data.csv')\n",
    "\n",
    "print(len(df))\n",
    "print(df.columns)\n",
    "\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "cdc0e046-0490-45c0-94e2-902ec9a22248",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['id', 'date/time_requested', 'giveaway_title', 'customer_name',\n",
       "       'email_address', 'school_name', 'postal_address', 'address_line_2',\n",
       "       'address_city', 'postcode', 'additional_info', 'unnamed:_11'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns = df.columns.str.lower().str.replace(' ','_')\n",
    "\n",
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "001e0102-f2f6-4812-99ec-6c78d218cb15",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Device set to use mps:0\n"
     ]
    }
   ],
   "source": [
    "# Initialize the emotion classification pipeline\n",
    "emotion_classifier = pipeline('text-classification', model='SamLowe/roberta-base-go_emotions', top_k=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de0b70c2-e3a2-430f-988b-cc9a5e3b06c4",
   "metadata": {},
   "source": [
    "Let's th eclassifier for a single application to know what kind of output we can expect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e6548cb4-e763-4758-9d58-92ac880485b2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>additional_info</th>\n",
       "      <th>word_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>I would love to use it to spread the love of r...</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>We would like to introduce early years yoga an...</td>\n",
       "      <td>46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>£500 would enable us to set up a small sensor...</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                     additional_info  word_count\n",
       "0  I would love to use it to spread the love of r...          69\n",
       "1  We would like to introduce early years yoga an...          46\n",
       "2  £500 would enable us to set up a small sensor...          86"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# first let's find a long applciation\n",
    "\n",
    "df['word_count'] = df['additional_info'].apply(lambda x: len(str(x).split()))\n",
    "\n",
    "df[['additional_info', 'word_count']].head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d0e5c20e-ccc4-470a-aafb-7a834d633d17",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Horton Grange Primary School is located in the Great Horton area of Bradford, West Yorkshire, serving a well-established Pakistani community. The school caters to children aged 2 to 11, with a total enrollment of approximately 726 pupils. The student body is diverse, with 91% of students using English as an additional language. Additionally, 21% of students have Special Educational Needs, and 32 students are considered disadvantaged. In terms of academic performance, 66% of pupils meet the expected standards in reading, writing, and mathematics at Key Stage 2, with 11% achieving a higher standard. The school's most recent Ofsted inspection in September 2024 rated the quality of education as 'Good' and highlighted 'Outstanding' ratings for both behaviour and attitudes, and personal development. The school is part of the Exceed Academies Trust and is led by Headteacher Miss Rebecca Marshall. It offers a range of extracurricular activities, including educational outings and experiences that enrich learning. The school's ethos emphasizes creating a safe and happy environment where children can flourish and begin their lifelong learning journey.\n",
      "In terms of the money we can get, the school will be able to use this for the following;\n",
      "Books for the Library – Purchase diverse and engaging books to encourage reading.\n",
      "Stationery and Art Supplies – Stock up on essentials like notebooks, pencils, and craft materials.\n",
      "Maths and Science Kits – Hands-on resources to support STEM learning.\n",
      "Tablets or Learning Apps – A contribution toward devices or educational subscriptions.\n",
      "Headphones for ICT Use – Useful for online learning and accessibility.\n",
      "Sensory Equipment – Support students with additional needs by purchasing fidget toys, weighted blankets, or calming tools.\n",
      "Outdoor Play Resources – Items like skipping ropes, hula hoops, etc.\n"
     ]
    }
   ],
   "source": [
    "test_application = df.loc[df['word_count'].idxmax(), 'additional_info']\n",
    "\n",
    "print(test_application)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2f6f9d3d-71c6-4c0c-8aea-5cd17ba81fce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[{'label': 'neutral', 'score': 0.5856776237487793},\n",
       "  {'label': 'approval', 'score': 0.4809421896934509},\n",
       "  {'label': 'admiration', 'score': 0.08151720464229584},\n",
       "  {'label': 'realization', 'score': 0.027009719982743263},\n",
       "  {'label': 'optimism', 'score': 0.018872186541557312},\n",
       "  {'label': 'caring', 'score': 0.010474747978150845},\n",
       "  {'label': 'disapproval', 'score': 0.008761710487306118},\n",
       "  {'label': 'annoyance', 'score': 0.0053910380229353905},\n",
       "  {'label': 'disappointment', 'score': 0.004287778399884701},\n",
       "  {'label': 'relief', 'score': 0.004131054040044546},\n",
       "  {'label': 'pride', 'score': 0.004071239847689867},\n",
       "  {'label': 'joy', 'score': 0.00316843600012362},\n",
       "  {'label': 'gratitude', 'score': 0.002936755074188113},\n",
       "  {'label': 'desire', 'score': 0.0022684938739985228},\n",
       "  {'label': 'sadness', 'score': 0.0020141159184277058},\n",
       "  {'label': 'love', 'score': 0.0017706562066450715},\n",
       "  {'label': 'confusion', 'score': 0.0015989093808457255},\n",
       "  {'label': 'excitement', 'score': 0.0014885042328387499},\n",
       "  {'label': 'disgust', 'score': 0.0008889383752830327},\n",
       "  {'label': 'curiosity', 'score': 0.0008583810413256288},\n",
       "  {'label': 'anger', 'score': 0.0007978692301549017},\n",
       "  {'label': 'fear', 'score': 0.0007784575573168695},\n",
       "  {'label': 'grief', 'score': 0.0005341923679225147},\n",
       "  {'label': 'remorse', 'score': 0.0005059443647041917},\n",
       "  {'label': 'nervousness', 'score': 0.0004846185038331896},\n",
       "  {'label': 'embarrassment', 'score': 0.00036206969525665045},\n",
       "  {'label': 'surprise', 'score': 0.00034910510294139385},\n",
       "  {'label': 'amusement', 'score': 0.0003461150045040995}]]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "emotions = emotion_classifier(test_application)\n",
    "emotions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a6a954f4-ad3d-40f6-84cb-a8b83ed912f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "more_intense_example = \"\"\"\"I run outdoor learning at my school, we are in a very deprived area, in a central area of Milton Keynes in the middle of a housing estate, the school has very little. I currently pay for as much as possible. We have now got an allotment area, but really need some help with filling it!! We are trying hard to run outdoor learning sessions, as we have so many children without gardens who live in converted shipping containers and their faces when they tackle ceratin skills and plant and see things grow is beautiful. please please help us!!!\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9af1aceb-11f5-4755-af2f-de08bd18be6c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[{'label': 'admiration', 'score': 0.6447672843933105},\n",
       "  {'label': 'desire', 'score': 0.3310450613498688},\n",
       "  {'label': 'optimism', 'score': 0.08578585088253021},\n",
       "  {'label': 'approval', 'score': 0.08077544718980789},\n",
       "  {'label': 'neutral', 'score': 0.0645425021648407},\n",
       "  {'label': 'disappointment', 'score': 0.034120526164770126},\n",
       "  {'label': 'caring', 'score': 0.026644788682460785},\n",
       "  {'label': 'sadness', 'score': 0.014421308413147926},\n",
       "  {'label': 'annoyance', 'score': 0.01335776224732399},\n",
       "  {'label': 'disapproval', 'score': 0.012340354733169079},\n",
       "  {'label': 'excitement', 'score': 0.008422383107244968},\n",
       "  {'label': 'realization', 'score': 0.008331868797540665},\n",
       "  {'label': 'love', 'score': 0.007035400252789259},\n",
       "  {'label': 'curiosity', 'score': 0.006367261987179518},\n",
       "  {'label': 'pride', 'score': 0.005947182886302471},\n",
       "  {'label': 'joy', 'score': 0.004438790027052164},\n",
       "  {'label': 'gratitude', 'score': 0.0034672864712774754},\n",
       "  {'label': 'relief', 'score': 0.0027506654150784016},\n",
       "  {'label': 'disgust', 'score': 0.0023999481927603483},\n",
       "  {'label': 'confusion', 'score': 0.0022132378071546555},\n",
       "  {'label': 'grief', 'score': 0.002203976968303323},\n",
       "  {'label': 'anger', 'score': 0.001952152932062745},\n",
       "  {'label': 'nervousness', 'score': 0.0017981012351810932},\n",
       "  {'label': 'surprise', 'score': 0.0016994696343317628},\n",
       "  {'label': 'fear', 'score': 0.0016026603989303112},\n",
       "  {'label': 'remorse', 'score': 0.0015317321522161365},\n",
       "  {'label': 'embarrassment', 'score': 0.0006112103583291173},\n",
       "  {'label': 'amusement', 'score': 0.0006051507662050426}]]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "emotions = emotion_classifier(more_intense_example)\n",
    "\n",
    "emotions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a7049b70-c14a-4475-a852-7938de44850e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def classify_emotions(text):\n",
    "    \"\"\"\n",
    "    Classifies the emotions in a given text and returns a dictionary of emotions and their scores.\n",
    "    \"\"\"\n",
    "    if isinstance(text, str):\n",
    "        emotions = emotion_classifier(text)\n",
    "        # The output is a list of lists of dictionaries. We need to process it.\n",
    "        if emotions and isinstance(emotions[0], list):\n",
    "            return {item['label']: item['score'] for item in emotions[0]}\n",
    "        return {}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e348575d-df4c-47d3-a74a-3135d1b467d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing Time: 17.377334117889404 seconds\n"
     ]
    }
   ],
   "source": [
    "start_time = time.time()\n",
    "\n",
    "df['emotion'] = df['additional_info'].map(classify_emotions)\n",
    "\n",
    "end_time = time.time()\n",
    "\n",
    "print(f\"Processing Time: {end_time-start_time} seconds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "09b69317-9052-4232-839b-a3989a9456a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_top_emotion(emotion_dict):\n",
    "    \"\"\"\n",
    "    Returns the emotion with the highest score from a dictionary of emotions.\n",
    "    \"\"\"\n",
    "    if not emotion_dict:\n",
    "        return None, 0.0\n",
    "    top_emotion = max(emotion_dict, key=emotion_dict.get)\n",
    "    return top_emotion, emotion_dict[top_emotion]\n",
    "\n",
    "# Apply the function to the 'emotions' column\n",
    "df[['top_emotion', 'top_emotion_score']] = df['emotion'].apply(get_top_emotion).apply(pd.Series)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "7864a200-28a8-476a-8486-d6cdaa7090e6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "top_emotion\n",
       "neutral           319\n",
       "desire            212\n",
       "approval           88\n",
       "love               79\n",
       "gratitude          58\n",
       "admiration         51\n",
       "optimism           36\n",
       "disappointment     16\n",
       "excitement         15\n",
       "caring             12\n",
       "sadness            11\n",
       "joy                 9\n",
       "realization         2\n",
       "nervousness         1\n",
       "confusion           1\n",
       "disapproval         1\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['top_emotion'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "28d7fd5a-1be6-4311-a0bb-937852040690",
   "metadata": {},
   "outputs": [],
   "source": [
    "not_list = ['neutral', 'approval', 'love', 'admiration', 'optimism', 'excitement', 'joy']\n",
    "\n",
    "not_df = df[~df['top_emotion'].isin(not_list)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "d9bb5250-4da8-4a57-bf62-2baa0390b6ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "363"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(not_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "3be2721c-9ce9-4e2c-a99e-e60ebc152891",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "top_emotion\n",
       "desire            212\n",
       "gratitude          58\n",
       "disappointment     16\n",
       "caring             12\n",
       "sadness            11\n",
       "realization         2\n",
       "nervousness         1\n",
       "confusion           1\n",
       "disapproval         1\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "not_df['top_emotion'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "c88f6008-aac0-47f3-bd9d-d0f9807955c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "293 We're a small, rural school in deficit as pupil numbers country-wide are dwindling. We need any extra money we can to buy new resources for different subjects e.g. art supplies, Maths equipment, science resources.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "823 Due to the cohort that we have this year lots of our toys and resources have been broken.\n",
      "We would love to be able to replace them for the other children to enjoy. Due to our budget we are not in a good place and as such we have cars with no garage, and dolls houses with no furniture or dolls\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "319 My school is very small. We don't have much in our budget in fact we cannot ask for anything until the new financial year...\n",
      "The school is looking rather tired and the toys for the children are mostly broken...\n",
      "We don't have much for our SEN children as well... we wish we could have the resources for them to help regulate them but it is just not in the budget...\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "691 We are very lucky to have a forest school area in our school. But unfortunately, our Forest School lead passed away last year. It was a big shock to the school community. For this reason, we are trying to raise some funds to re-instate the wildlife pond that he was passionate about. It was one of the projects he was planning to do before he passed and as a nod to him, we want to complete this. To do this we need to empty to water out of the pond as it’s stagnant. Clean out the pond and re-fill it. There are a few trees that need to come down around the pond as well. We were hoping to install some walkways so help children assess the pond for pond dipping to help with education. This will also help local wildlife and have a positive effect on the fauna and flour of the forest school area. As you can imagine this is a costly project to do and any help would be amazing.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "739 Unfortunately most of our SEN resources need upgrading as we have a lot more children being diagnosed with additional needs in our school and we need specialist equipment to support them whilst in school.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "676 Our school is located in a deprived area of Birmingham. We would use the money to buy resources to support the children with their learning. At the moment there are lots of resources/additional experiences which we have found that would support the children however with school budget cuts we are unable to afford them.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "151 We are in a very deprived area and often have quite high turnover of children meaning that we have low amount of children across the school and classes that are not full. This has an effect on the funding that we get. We are very low in resources so things like: colouring in books, board games for indoor lunch provisions, glue sticks and whiteboard pens, nice paper and card. This would be lovely for our children.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "139 The school is in a deprived area and every little helps.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "32 The library is under going a much needed make over, unfortunately the cost is much greater than previously estimated. The £500 would help buy new table and bean bags to boost the comfort of the library significantly. The last time the library was improved was in the late 80s, it is time to reignite library usage within my school.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n",
      "12 Our school library is in desperate need of more cosy seating. We've had a big push to encourage reading throughout school over the last year and the library has become really popular. This does mean we don't have enough comfy seating or books to keep up with demand.\n",
      "\n",
      "------------------------------------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for idx, record in not_df[not_df['top_emotion'] == 'disappointment'].sample(10).iterrows():\n",
    "    print(idx, record['additional_info'])\n",
    "    print('\\n' + '-'*60 + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1b12bf5-687a-480f-94a9-1bdcc27e93f3",
   "metadata": {},
   "source": [
    "## Insights\n",
    "\n",
    "After exploring some applications in this subset of emotions, `sadness`, `caring`, and `disappointment` appear to be the ones that most closely align with with the expectations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e0bb77f-09a7-4fa2-a00b-cc20351ff5b6",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}