---
license: bigscience-bloom-rail-1.0
pipeline_tag: text-generation
library_name: transformers
tags:
- dolly
- bloomz
- Spanish
- French
- German
datasets:
- argilla/databricks-dolly-15k-multilingual
inference: false
widget:
- text: >-
    Below is an instruction that describes a task, paired with an input that
    provides further context.

    Write a response that appropriately completes the request.

    ### Instruction:

    Tell me about alpacas
language:
- es
- fr
- de
---

<div style="text-align:center;width:250px;height:250px;">
    <img src="https://huggingface.co/mrm8488/dollcerberoom/resolve/main/dollcerberoom_logo.png" alt="dollcerberoom logo" />
</div>



# DOLLcerberOOM: 3 x Dolly 🐑 + BLOOMz 💮


## Adapter Description
This adapter was created with the [PEFT](https://github.com/huggingface/peft) library by fine-tuning the base model **BigScience/BLOOMz 7B1** with the **LoRA** method on **Dolly's dataset (translated to Spanish, French and German by Argilla)**.
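As a rough sketch (untested), the adapter could be attached to the base model with PEFT as shown below. The adapter repo id is an assumption inferred from the logo URL on this card; verify it on the Hub before use.

```python
BASE_MODEL_ID = "bigscience/bloomz-7b1-mt"
ADAPTER_ID = "mrm8488/dollcerberoom"  # assumed repo id, inferred from the logo URL above

def load_model():
    """Load the BLOOMz base model and apply the LoRA adapter weights on top."""
    # Imports are deferred so the sketch can be read without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```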

## Model Description
Instruction-tuned version of BLOOM, the BigScience Large Open-science Open-access Multilingual language model.

[BLOOMz 7B1 MT](https://huggingface.co/bigscience/bloomz-7b1-mt)

## Training data

This collection of datasets consists of machine-translated (and soon curated) versions of the `databricks-dolly-15k` [dataset](https://github.com/databrickslabs/dolly/tree/master/data) originally created by Databricks, Inc. in 2023.

The goal is to give practitioners a starting point for training open-source instruction-following models beyond English. However, as the translation quality will not be perfect, we highly recommend dedicating time to curate and fix translation issues. Below we explain how to load the datasets into [Argilla for data curation and fixing](https://github.com/argilla-io/argilla). Additionally, we'll be improving the datasets made available here, with the help of different communities.

**We highly recommend dataset curation beyond proof-of-concept experiments.**
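For quick experimentation, one language split of the translated dataset could be loaded with the `datasets` library. This is a sketch: the config name `"es"` below is an assumption, so check the available configurations on the dataset's Hub page first.

```python
def load_translation_split(config: str = "es"):
    """Load one machine-translated split of the Dolly dataset (config name assumed)."""
    # Import is deferred so the sketch can be read without the library installed.
    from datasets import load_dataset
    return load_dataset("argilla/databricks-dolly-15k-multilingual", config)
```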


### Supported Tasks and Leaderboards

TBA

### Training procedure

TBA

## How to use

TBA
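Until official usage instructions are added, prompts can be formatted like the widget example at the top of this card. The `### Response:` suffix is an assumption based on the standard Alpaca-style template and is not confirmed by this card:

```python
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context.\n\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the template shown in the inference widget."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("Tell me about alpacas"))
```

The resulting string would then be tokenized and passed to the model's `generate` method.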

## Citation