Update README.md
README.md
CHANGED
@@ -1,10 +1,92 @@
 ---
 title: README
-emoji:
-colorFrom:
-colorTo:
+emoji: 🐶
+colorFrom: purple
+colorTo: indigo
 sdk: static
 pinned: false
 ---

-
### InstructLab

**Project Name**: InstructLab

**Description**:
[InstructLab](https://instructlab.ai), based on the [Large-scale Alignment for ChatBots (LAB) technique](https://arxiv.org/abs/2403.01081),
is an open-source project led by Red Hat and IBM.
The project aims to improve Large Language Models (LLMs) through a
community-driven process that combines taxonomy-based data curation
with synthetic data generation. InstructLab provides tools for users
to interact with and improve LLMs by contributing skills and knowledge
to the project’s taxonomy repository.

**Key Features**:
- **ilab Command-Line Interface (CLI)**: Lets users chat with, train,
and fine-tune LLMs using custom taxonomy data. The CLI
supports several platforms, including macOS, Fedora Linux, and Windows.
- **Synthetic Data Generation**: Creates synthetic datasets that are
used to augment LLM training.
- **Taxonomy Repository**: A structured repository where users
submit and manage their skill and knowledge contributions.

**Core Components**:
1. **ilab CLI Tool**: Facilitates model interaction, training, and
data generation.
2. **Taxonomy Tree**: Organizes skills and knowledge contributions for
model tuning.
3. **Community Collaboration**: Encourages open-source contributions,
including new features, bug fixes, and documentation improvements.
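
The taxonomy tree can be pictured as a directory hierarchy in which each leaf folder holds one contribution. The sketch below is a minimal illustration, assuming the layout of the taxonomy repository, where a contribution is stored in a file named `qna.yaml`. The function name `find_contributions` is hypothetical and not part of the `ilab` tool.

```python
from pathlib import Path

# Assumption: each contribution is a file named "qna.yaml" inside a leaf
# directory of the taxonomy tree. Adjust the name if the layout differs.
CONTRIBUTION_FILE = "qna.yaml"


def find_contributions(taxonomy_root: Path) -> dict[str, list[str]]:
    """Group contribution files by their top-level taxonomy branch.

    Paths in the result are relative to taxonomy_root and use "/" separators.
    """
    grouped: dict[str, list[str]] = {}
    for path in sorted(taxonomy_root.rglob(CONTRIBUTION_FILE)):
        relative = path.relative_to(taxonomy_root)
        if len(relative.parts) < 2:
            continue  # ignore a qna.yaml placed directly in the root
        branch = relative.parts[0]
        grouped.setdefault(branch, []).append(relative.as_posix())
    return grouped
```

Calling `find_contributions(Path("taxonomy"))` on a local clone returns one list of contribution paths per top-level branch, which gives a quick overview of where skills and knowledge are filed.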

**Granite and Merlinite Models**:
- **Merlinite**: Merlinite is an instruction-tuned derivative of the
Mistral model that provides better overall accuracy than Mistral. It is
continuously improved using user-submitted data from the taxonomy
repository, incorporating both skills and knowledge.

- **Granite**: [Granite](https://huggingface.co/ibm-granite/granite-7b-base)
is a base model developed from scratch by IBM Research and trained on 2 trillion
tokens. The datasets used to train the model are listed in [its
Hugging Face model card](https://huggingface.co/ibm-granite/granite-7b-base).

**Installation and Usage**:
- Detailed instructions for setting up the `ilab` CLI tool on various
operating systems are available in the
[InstructLab repository](https://github.com/instructlab/instructlab). Key steps include installing
necessary dependencies, creating a virtual environment, and
initializing the `ilab` tool.
- The CLI supports commands for chatting with models, generating
synthetic data, downloading pre-trained models, and training models
with user-generated data.
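
The setup and usage steps above can be sketched as an ordered command plan. This is a hedged illustration, not the project's official installer. It assumes the `instructlab` package name on PyPI and the grouped subcommand names used by recent `ilab` releases (`ilab config init`, `ilab model download`, and so on); older releases used flat names such as `ilab init`, so verify against `ilab --help`. The helper names `build_plan` and `run_plan` and the `venv` directory name are hypothetical. By default nothing is executed and the plan is only printed.

```python
import shlex
import subprocess


def build_plan(venv_dir: str = "venv") -> list[list[str]]:
    """Return first-time setup and usage commands in order (POSIX paths assumed)."""
    pip = f"{venv_dir}/bin/pip"
    ilab = f"{venv_dir}/bin/ilab"
    return [
        ["python3", "-m", "venv", venv_dir],  # create a virtual environment
        [pip, "install", "instructlab"],      # install the CLI (assumed PyPI name)
        [ilab, "config", "init"],             # initialize the ilab tool
        [ilab, "model", "download"],          # download a pre-trained model
        [ilab, "model", "chat"],              # chat with a model
        [ilab, "data", "generate"],           # generate synthetic data
        [ilab, "model", "train"],             # train with the generated data
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_plan(build_plan())  # dry run: prints the commands without running them
```

Keeping the plan as data makes it easy to review or adapt the commands for a given platform before running anything.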

**Community and Contribution**:
- InstructLab welcomes contributions from the open-source community.
Users can submit pull requests to the taxonomy repository, participate
in discussions, and contribute to ongoing development.
- The project maintains [a comprehensive guide for contributors](https://github.com/instructlab/community),
outlining best practices and governance.

**Getting Started**:
1. **Install ilab CLI**: Follow the installation instructions specific
to your operating system.
2. **Initialize ilab**: Set up the local environment and clone the
taxonomy repository.
3. **Contribute**: Create and submit new skills and knowledge to improve LLMs.
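
For the contribution step, a new skill is typically described in a small YAML file. The sketch below is an illustration under assumptions: the field names (`created_by`, `task_description`, `seed_examples`, `question`, `answer`) follow the skill file format found in the taxonomy repository at the time of writing, and that repository's documentation remains the authoritative schema. The function `render_skill` is hypothetical, and the sketch deliberately omits other fields such as a schema version.

```python
import json


def render_skill(
    created_by: str,
    task_description: str,
    seed_examples: list[tuple[str, str]],
) -> str:
    """Render a minimal skill contribution as YAML text.

    json.dumps quotes the strings: a JSON string is also a valid YAML
    double-quoted scalar, so no third-party YAML library is needed.
    """
    if not seed_examples:
        # Sanity check of this sketch only; the real schema may require more.
        raise ValueError("at least one seed example is required")
    lines = [
        f"created_by: {json.dumps(created_by)}",
        f"task_description: {json.dumps(task_description)}",
        "seed_examples:",
    ]
    for question, answer in seed_examples:
        lines.append(f"  - question: {json.dumps(question)}")
        lines.append(f"    answer: {json.dumps(answer)}")
    return "\n".join(lines) + "\n"
```

The returned text could be saved as a `qna.yaml` file in the appropriate folder of a taxonomy fork and then submitted as a pull request.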

**Repository Links**:
- [InstructLab Main Repository](https://github.com/instructlab/instructlab)
- [Taxonomy Repository](https://github.com/instructlab/taxonomy)
- [Community Repository](https://github.com/instructlab/community)

**Contact and Support**:
- Join the InstructLab community on
[Slack](https://instruct-lab.slack.com) for support and
collaboration.
- Refer to the [documentation](https://github.com/instructlab/instructlab)
for detailed guides and troubleshooting tips.

**License**:
- InstructLab is released under the Apache-2.0 license.

For more details and to get involved, visit the [InstructLab GitHub
page](https://github.com/instructlab).