Update README files with latest improvements and features
- README.hf.md +17 -3
- README.md +14 -3
README.hf.md
CHANGED
@@ -48,9 +48,20 @@ Foundational fine‑tuned model developed by CogniX LTD.

 ### Dataset Sources:

-
-
-- **
+Our training data prioritizes real therapy conversations from licensed professionals over synthetic data for authenticity. Primary sources include:
+
+- **Amod/mental_health_counseling_conversations**: Real counseling platform Q&A.
+- **nbertagnolli/counsel-chat**: Licensed therapist responses.
+- **EmoCareAI/Psych8k**: Transcripts from real counseling sessions.
+- **vzeizer/MentalHealth_Analysis**: Mental health condition recognition and classification (e.g., depression, anxiety, suicidal ideation).
+
+These are supplemented with synthetic data (generated using **GPT‑4o/Claude Sonnet 4.5** with safety filtering) to enhance coverage of specific scenarios while maintaining therapeutic quality.
+
+#### Curation Rationale:
+"There is a lack of high quality open source mental health data available for study in NLP. Most datasets revolve around forums like Reddit, which can provide great insights, but don't capture the type of language often used by counselors. This dataset seeks to help bridge that gap and provide additional data of counselors interacting with patients in need."
+
+#### Release & Licensing:
+Full dataset provenance will be released with our open dataset under a **CC‑BY 4.0** license.


 ### Model Details:

@@ -108,6 +119,9 @@ Our evaluation framework operationalizes Google's Responsible AI Principles:

 *Full evaluation suite and rubrics available at [https://github.com/CogniX-LTD/Cogni-OpenModel].*

+We will continuously evaluate the model with DeepEval (G-Eval metrics) to monitor therapeutic quality and safety, and to verify that grounding in real counseling data remains effective as we scale.
+
+

 ### Generation Configuration:

README.md
CHANGED
@@ -31,9 +31,20 @@ Foundational fine‑tuned model developed by CogniX LTD.

 ### Dataset Sources:

-
-
-- **
+Our training data prioritizes real therapy conversations from licensed professionals over synthetic data for authenticity. Primary sources include:
+
+- **Amod/mental_health_counseling_conversations**: Real counseling platform Q&A.
+- **nbertagnolli/counsel-chat**: Licensed therapist responses.
+- **EmoCareAI/Psych8k**: Transcripts from real counseling sessions.
+- **vzeizer/MentalHealth_Analysis**: Mental health condition recognition and classification (e.g., depression, anxiety, suicidal ideation).
+
+These are supplemented with synthetic data (generated using **GPT‑4o/Claude Sonnet 4.5** with safety filtering) to enhance coverage of specific scenarios while maintaining therapeutic quality.
+
+#### Curation Rationale:
+"There is a lack of high quality open source mental health data available for study in NLP. Most datasets revolve around forums like Reddit, which can provide great insights, but don't capture the type of language often used by counselors. This dataset seeks to help bridge that gap and provide additional data of counselors interacting with patients in need."
+
+#### Release & Licensing:
+Full dataset provenance will be released with our open dataset under a **CC‑BY 4.0** license.


 ### Model Details: