Spaces: ACRLab/FraleyLabAttachmentBot

AjithKSenthil committed on May 16, 2023

Commit 5aab893, 1 Parent(s): 840f6e0

added guide in comments for adding additional features

Files changed (1)

ChatAttachmentAnalysis.py +24 -0

@@ -13,6 +13,7 @@ df = pd.read_csv(datafile_path)
 # Parse each stored embedding string into a list of floats
 df['embedding'] = df['embedding'].apply(lambda x: [float(num) for num in x.strip('[]').split(',')])
 # Split the data into features (X) and labels (y)
 X = list(df.embedding.values)
 y = df[['avoide', 'avoida', 'avoidb', 'avoidc', 'avoidd', 'anxietye', 'anxietya', 'anxietyb', 'anxietyc', 'anxietyd']].values
@@ -49,3 +50,26 @@ print(f"Chat transcript embeddings performance: mse={mse:.2f}, mae={mae:.2f}")
 # Both MSE and MAE are loss functions that we want to minimize. Lower values for both indicate better model performance.
 # In general, the lower these values, the better the model's predictions are.

+# Guide for adding additional features to improve performance:
+# Additional Features Extraction
+# To add new features to the data, you will need to create new columns in the DataFrame
+# Each new feature will be a new column, which can be created by applying a function to the text data
+# For example, if you were adding a feature for the length of the chat, you would do something like this:
+# df['text_length'] = df['ChatTranscript'].apply(len)
+# If you are using an external library to compute a feature (like NLTK for tokenization or sentiment analysis), you would need to import that library and use its functions.
+# For example, to compute a sentiment score with TextBlob, you might do something like this:
+# from textblob import TextBlob
+# df['sentiment'] = df['ChatTranscript'].apply(lambda text: TextBlob(text).sentiment.polarity)
+# Make sure your feature function handles bad input.
+# For example, pandas reads a missing chat as NaN (a float), so calling len() on it raises a TypeError, and a sentiment function that expects a string may fail the same way.
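+# After you've added your new features, include them in the model by adding them to the feature matrix X before you split the data into training and testing sets.
+# The 'embedding' column holds one list per row, so stack it into a 2-D array and append the scalar feature columns (this needs `import numpy as np`).
+# For example, if 'text_length' and 'sentiment' are new features, you might do this:
+# X = np.hstack([np.vstack(df['embedding'].tolist()), df[['text_length', 'sentiment']].values])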
+# Always be sure to check your data after adding new features to make sure everything looks correct.
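
The guide in the comments above can be sketched as runnable code. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names (`safe_len`, `safe_word_count`, `build_feature_matrix`) and the toy DataFrame are hypothetical, and a simple word count stands in for the TextBlob sentiment score so the example needs only pandas and NumPy. It shows two points from the guide: guarding feature functions against missing chats, and joining scalar features to the per-row embedding lists as one 2-D array.

```python
import numpy as np
import pandas as pd


def safe_len(text):
    """Character count of a chat; 0 when the value is missing or not a string."""
    return len(text) if isinstance(text, str) else 0


def safe_word_count(text):
    """Whitespace-separated token count; 0 when the value is missing or not a string."""
    return len(text.split()) if isinstance(text, str) else 0


def build_feature_matrix(df, extra_columns):
    """Join the per-row embedding lists and the scalar feature columns into one 2-D float array."""
    embeddings = np.vstack(df["embedding"].tolist())   # shape: (n_rows, embedding_dim)
    extras = df[extra_columns].to_numpy(dtype=float)   # shape: (n_rows, n_extra)
    return np.hstack([embeddings, extras])


# Toy data standing in for the real CSV; all values are made up for illustration.
df = pd.DataFrame({
    "ChatTranscript": ["I feel fine today", None, ""],
    "embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
})

# Each new feature is a new column computed from the text column.
df["text_length"] = df["ChatTranscript"].apply(safe_len)
df["word_count"] = df["ChatTranscript"].apply(safe_word_count)

X = build_feature_matrix(df, ["text_length", "word_count"])
print(X.shape)
```

Selecting `df[['embedding', 'text_length']].values` directly would produce an object array whose first column still contains Python lists, which most estimators cannot consume; stacking the embeddings first avoids that.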