Update app.py
app.py
CHANGED
@@ -22,84 +22,64 @@ with gr.Blocks() as demo:
     with gr.Row():
         gr.Markdown("""

-#
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-3.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-1. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [Paper](https://arxiv.org/abs/2211.05100)
-2. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Paper](https://arxiv.org/abs/1909.08053)
-3. 8-bit Optimizers via Block-wise Quantization [Paper](https://arxiv.org/abs/2110.02861)
-4. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation [Paper](https://arxiv.org/abs/2108.12409)
-5. [Paper](https://huggingface.co/models?other=doi:10.57967/hf/0003)
-6. 217 Other Models optimizing use of bloom via specialization: [Paper](https://huggingface.co/models?other=bloom)
-
-# Datasets
-1. [Universal Dependencies](https://paperswithcode.com/dataset/universal-dependencies)
-2. [WMT 2014](https://paperswithcode.com/dataset/wmt-2014)
-3. [The Pile](https://paperswithcode.com/dataset/the-pile)
-4. [HumanEval](https://paperswithcode.com/dataset/humaneval)
-5. [FLORES-101](https://paperswithcode.com/dataset/flores-101)
-6. [CrowS-Pairs](https://paperswithcode.com/dataset/crows-pairs)
-7. [WikiLingua](https://paperswithcode.com/dataset/wikilingua)
-8. [MTEB](https://paperswithcode.com/dataset/mteb)
-9. [xP3](https://paperswithcode.com/dataset/xp3)
-10. [DiaBLa](https://paperswithcode.com/dataset/diabla)
-
-# Deep RL ML Strategy
+# Outline of Exciting AI Developments!
+
+Here is an outline of some of the most exciting recent developments in AI:
+
+## Language Models
+
+Bloom sets new record for most performant and efficient AI model in science!
+
+### Comparison of Large Language Models
+
+| Model Name | Model Size (in Parameters) |
+| ----------------- | -------------------------- |
+| BigScience-tr11-176B | 176 billion |
+| GPT-3 | 175 billion |
+| OpenAI's DALL-E 2.0 | 500 million |
+| NVIDIA's Megatron | 8.3 billion |
+| Transformer-XL | 250 million |
+| XLNet | 210 million |
+
+## ChatGPT Datasets
+
+- WebText
+- Common Crawl
+- BooksCorpus
+- English Wikipedia
+- Toronto Books Corpus
+- OpenWebText
+
+## Big Science Model
+
+- Papers:
+1. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [Paper](https://arxiv.org/abs/2211.05100)
+2. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Paper](https://arxiv.org/abs/1909.08053)
+3. 8-bit Optimizers via Block-wise Quantization [Paper](https://arxiv.org/abs/2110.02861)
+4. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation [Paper](https://arxiv.org/abs/2108.12409)
+5. [Other papers related to Big Science](https://huggingface.co/models?other=doi:10.57967/hf/0003)
+6. [217 other models optimized for use with Bloom](https://huggingface.co/models?other=bloom)
+
+- Datasets:
+1. [Universal Dependencies](https://paperswithcode.com/dataset/universal-dependencies)
+2. [WMT 2014](https://paperswithcode.com/dataset/wmt-2014)
+3. [The Pile](https://paperswithcode.com/dataset/the-pile)
+4. [HumanEval](https://paperswithcode.com/dataset/humaneval)
+5. [FLORES-101](https://paperswithcode.com/dataset/flores-101)
+6. [CrowS-Pairs](https://paperswithcode.com/dataset/crows-pairs)
+7. [WikiLingua](https://paperswithcode.com/dataset/wikilingua)
+8. [MTEB](https://paperswithcode.com/dataset/mteb)
+9. [xP3](https://paperswithcode.com/dataset/xp3)
+10. [DiaBLa](https://paperswithcode.com/dataset/diabla)
+
+## Deep RL ML Strategy
+
 1. Language Model Preparation, Human Augmented with Supervised Fine Tuning
 2. Reward Model Training with Prompts Dataset Multi-Model Generate Data to Rank
 3. Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score
 4. Proximal Policy Optimization Fine Tuning
 
-
+## Variations - Preference Model Pretraining
 1. Use Ranking Datasets Sentiment - Thumbs Up/Down, Distribution
 2. Online Version Getting Feedback
 3. OpenAI - InstructGPT - Humans generate LM Training Text
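The "Deep RL ML Strategy" steps in the new text are the standard RLHF recipe: supervised fine-tuning, reward-model training on ranked generations, then reinforcement fine-tuning with PPO. As a rough, self-contained illustration of steps 3 and 4 only (not code from this Space; the policy, reference model, states, and rewards below are toy stand-ins), a PPO-style update with a KL penalty toward the frozen reference model could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 100, 32
policy = nn.Linear(hidden, vocab)        # stand-in for the fine-tuned LM head
reference = nn.Linear(hidden, vocab)     # frozen copy of the supervised model (step 1)
reference.load_state_dict(policy.state_dict())
for p in reference.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

states = torch.randn(16, hidden)         # toy prompt representations
with torch.no_grad():
    old_logits = policy(states)
    actions = torch.distributions.Categorical(logits=old_logits).sample()
    old_logp = F.log_softmax(old_logits, dim=-1).gather(1, actions[:, None]).squeeze(1)
    ref_logp = F.log_softmax(reference(states), dim=-1).gather(1, actions[:, None]).squeeze(1)
    rewards = torch.randn(16)            # would come from the reward model (step 2)

# Step 3: shape the reward with a KL penalty so the policy stays near the reference model.
kl_coef, clip_eps = 0.1, 0.2
advantages = rewards - kl_coef * (old_logp - ref_logp)

# Step 4: PPO clipped surrogate objective.
logp = F.log_softmax(policy(states), dim=-1).gather(1, actions[:, None]).squeeze(1)
ratio = torch.exp(logp - old_logp)
loss = -torch.min(ratio * advantages,
                  torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice a library such as TRL wraps this loop around a full language model; the sketch only shows where the reward, the KL penalty, and the clipping enter.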
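The "Variations - Preference Model Pretraining" items describe where the preference signal comes from. Below is a minimal sketch, with hypothetical names and toy scores rather than this repository's code, of the two usual loss formulations: a pairwise ranking loss for ranked completions and binary cross-entropy for thumbs-up/down feedback.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """From ranked completions: the higher-ranked one should receive the higher score."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def thumbs_feedback_loss(r_scores: torch.Tensor, thumbs_up: torch.Tensor) -> torch.Tensor:
    """From thumbs-up/down feedback: treat the score as a logit for 'liked'."""
    return F.binary_cross_entropy_with_logits(r_scores, thumbs_up.float())

# Toy reward-model outputs standing in for scores on sampled completions.
r_chosen, r_rejected = torch.randn(8), torch.randn(8)
r_scores, thumbs_up = torch.randn(8), torch.randint(0, 2, (8,))
print(pairwise_ranking_loss(r_chosen, r_rejected).item(),
      thumbs_feedback_loss(r_scores, thumbs_up).item())
```

The pairwise form matches the InstructGPT-style setup of ranking several sampled completions per prompt; the binary form fits thumbs-up/down signals gathered from an online version of the model.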