Update app.py
app.py
@@ -72,20 +72,26 @@ Here is an outline of some of the most exciting recent developments in AI:
9. [xP3](https://paperswithcode.com/dataset/xp3)
10. [DiaBLa](https://paperswithcode.com/dataset/diabla)

# Deep RL ML Strategy

The AI strategies are:
- Language Model Preparation: supervised fine-tuning on human-augmented data
- Reward Model Training: sample prompts from a dataset, generate responses with multiple models, and rank them
- Fine-Tuning with Reinforcement Learning: the reward-model score combined with a penalty on the distance between the tuned and original models' output distributions
- Proximal Policy Optimization (PPO) Fine-Tuning
- Variations: Preference Model Pretraining
- Ranking Datasets from Sentiment Signals: thumbs up/down or rating distributions
- Online Versions: feedback is gathered continuously
- OpenAI InstructGPT: humans write the text used to train the LM
- DeepMind Sparrow and GopherCite: Advantage Actor-Critic (A2C)
- Reward Model Trained on Human Preference Feedback

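
The reward-model, RL fine-tuning and PPO steps listed above can be sketched on plain Python floats. This is a minimal illustration under stated assumptions, not OpenAI's or DeepMind's code: the pairwise `-log(sigmoid(...))` ranking loss, the use of a sample-based KL estimate as the "distance between output distributions", the coefficient `beta`, and the clip range `clip_eps` are common choices in RLHF write-ups rather than details given in this outline, and every function name is hypothetical.

```python
import math
from typing import List

# Minimal sketch of three RLHF loss terms on plain floats.
# All names and default coefficients are hypothetical illustrations.


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    # Pairwise ranking loss -log(sigmoid(r_chosen - r_rejected)).
    # It is small when the reward model scores the human-preferred
    # response above the rejected one.
    return softplus(r_rejected - r_chosen)


def kl_shaped_reward(rm_score: float, logp_policy: List[float],
                     logp_ref: List[float], beta: float = 0.1) -> float:
    # RL reward = reward-model score minus beta times a sample-based KL
    # estimate: the summed per-token log-prob gap between the tuned policy
    # and the frozen initial model on the generated tokens.
    kl_estimate = sum(p - q for p, q in zip(logp_policy, logp_ref))
    return rm_score - beta * kl_estimate


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    # PPO clipped surrogate (to be maximized): the probability ratio is
    # clipped to [1 - clip_eps, 1 + clip_eps] so one update cannot move
    # the policy too far from the policy that sampled the data.
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    print(reward_ranking_loss(2.0, 0.5))
    print(kl_shaped_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], beta=0.1))
    print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0))
```

In a real pipeline these terms would be computed on tensors and differentiated by a framework. Here they only evaluate the losses, which keeps the arithmetic of each step visible.
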
For more information on specific techniques and implementations, check out the following resources:
- OpenAI's paper on [GPT-3](https://arxiv.org/abs/2005.14165), which describes the pretrained language model that InstructGPT later fine-tunes
- The [SAC](https://arxiv.org/abs/1801.01290) (Soft Actor-Critic) paper by Haarnoja et al. at UC Berkeley, which describes an off-policy actor-critic algorithm related to, but distinct from, the advantage actor-critic used by DeepMind
- OpenAI's paper on [Reward Learning](https://arxiv.org/abs/1810.06580), which explains their approach to training reward models
- OpenAI's blog post on [GPT-3's fine-tuning process](https://openai.com/blog/fine-tuning-gpt-3/)

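
The advantage actor-critic approach listed for DeepMind's Sparrow and GopherCite can be sketched as a single-trajectory loss. This is a generic A2C sketch under stated assumptions, not DeepMind's implementation: the discount `gamma`, the coefficients `value_coef` and `entropy_coef`, and the function names are hypothetical, and the inputs are plain floats rather than framework tensors.

```python
from typing import List

# Minimal sketch of a synchronous advantage actor-critic (A2C) loss for one
# trajectory. Function names and default coefficients are hypothetical.


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    # Compute R_t = r_t + gamma * R_{t+1}, working backwards from the last step.
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_loss(logps: List[float], values: List[float], rewards: List[float],
             entropies: List[float], gamma: float = 0.99,
             value_coef: float = 0.5, entropy_coef: float = 0.01) -> float:
    # Advantage A_t = R_t - V(s_t): how much better the sampled action did
    # than the critic expected.
    returns = discounted_returns(rewards, gamma)
    advantages = [ret - v for ret, v in zip(returns, values)]
    # Actor term: minimizing it raises log-probs of actions with positive
    # advantage (the advantage is treated as a constant weight).
    policy_loss = -sum(lp * a for lp, a in zip(logps, advantages))
    # Critic term: squared error between V(s_t) and the observed return.
    value_loss = sum(a * a for a in advantages)
    # Entropy bonus: subtracting it discourages collapsing to one action.
    entropy_bonus = sum(entropies)
    n = len(rewards)
    return (policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus) / n


if __name__ == "__main__":
    print(a2c_loss(logps=[-0.7, -1.2], values=[0.4, 0.6],
                   rewards=[0.0, 1.0], entropies=[0.9, 0.8]))
```

Compared with the PPO step in the strategy list, this update has no ratio clipping. It relies on the learned critic `V(s_t)` to reduce the variance of the policy-gradient signal.
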
""")

demo.launch()