Commit ·
1780da1
1
Parent(s): db673fc
Create README.md
README.md
ADDED
@@ -0,0 +1,43 @@
# Redactable-LLM
A high-level overview of integrating multiple open-source large language models (LLMs) within the AutoGen framework:
### Development of Custom Agents
- **Agent Design**: Tasks include NLP/NER/PII identification, interpreting natural language commands, executing document redaction, and final verification.
- **Customization**: Custom agents are trained on tasks specific to each stage of the redaction process.
- **Human Interaction**: (Optional) Implement features for seamless human-agent interaction, so users can enter commands and queries in natural language.
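
The identify, redact, and verify tasks listed above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the regex patterns, the `Span` type, and the function names `identify_pii`, `redact`, and `verify` are all assumptions, and a real agent would use an NER model rather than regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; they stand in for an NER/PII model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int


def identify_pii(text: str) -> list[Span]:
    """Return every pattern match as a labeled span, ordered by position."""
    spans = [
        Span(label, match.start(), match.end())
        for label, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda span: span.start)


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with a bracketed label, skipping overlapping spans."""
    pieces = []
    cursor = 0
    for span in spans:
        if span.start < cursor:
            continue  # overlaps a span that was already redacted
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def verify(redacted: str) -> bool:
    """Final check: the redacted text should contain no detectable PII."""
    return not identify_pii(redacted)
```

Calling `redact(text, identify_pii(text))` and then `verify` on the result mirrors the identify, redact, verify order of the agent tasks. Interpreting natural language commands is left out of this sketch.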
### LLM & VLLM AutoGen Integration
- **Model Selection**: Automatic, task-dependent agent selection.
- **Enhanced Inference**: Use enhanced LLM inference features, including tuning, caching, error handling, and templating, to improve performance.
- **Quality Control**: Vision agents analyze redacted documents using Set-of-Mark (SoM) prompting. Rejected documents are reprocessed and reviewed again.

### System Optimization
- **Workflow Automation**: Automate the redaction workflow using a blend of LLMs, custom agents, and human inputs for efficient detection and redaction of sensitive information.
- **Performance Maximization**: Optimize the system for both efficiency and accuracy, using AutoGen's features for managing complex workflows.
### User Interface Development
- **Interface Design**: Develop a user-friendly interface that enables non-technical users to interact with the system via natural language prompts.
- **Feedback Integration**: Implement a feedback loop to continuously refine the system's accuracy and user-friendliness based on user inputs.
- **User Knowledgebase**: (Optional) User account, profile, and domain knowledge are accessible to the `Research` agent for personalized interaction and results.
### Training, Testing and Validation
- **Model Training**: Develop new datasets focused on document understanding for redaction.
- **Unit Testing**: Conduct extensive unit tests to ensure individual system components function correctly.
- **System Testing**: Perform comprehensive end-to-end testing to validate the entire redaction process, from user input to output.
- **User Trials**: Facilitate user trials to gather feedback and make necessary system adjustments.
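
As one possible starting point for the new datasets mentioned above, the sketch below generates synthetic sentences with labeled PII spans from templates. Everything in it is an assumption made for illustration: the templates, the filler values, and the `make_example` and `make_dataset` helpers. It does not use the Gretel API and is only a template-based stand-in for a real synthetic data generator.

```python
import random
from string import Formatter

# Hypothetical templates and filler values, invented for this sketch.
TEMPLATES = [
    "Please contact {NAME} at {EMAIL} regarding the claim.",
    "{NAME} can be reached at {PHONE}.",
]
FILLERS = {
    "NAME": ["Jane Doe", "John Smith"],
    "EMAIL": ["jane.doe@example.com", "j.smith@example.org"],
    "PHONE": ["555-0100", "555-0199"],
}


def make_example(rng: random.Random) -> dict:
    """Fill one template and record the character span of each inserted value."""
    template = rng.choice(TEMPLATES)
    text = ""
    spans = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for a trailing literal with no placeholder.
    for literal, field, _spec, _conv in Formatter().parse(template):
        text += literal
        if field is not None:
            value = rng.choice(FILLERS[field])
            spans.append(
                {"label": field, "start": len(text), "end": len(text) + len(value)}
            )
            text += value
    return {"text": text, "spans": spans}


def make_dataset(n: int, seed: int = 0) -> list[dict]:
    """Generate n labeled examples; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]
```

Because each example carries exact character offsets, the same records can serve as training data and as fixtures for unit tests of a redaction component.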
---
- #### Mistral AI (LLM)
[Paper](https://mistral.ai/news/mixtral-of-experts/) | [Model](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- #### QwenLM (VLLM)
[Paper](https://arxiv.org/abs/2308.12966) | [Code](https://github.com/QwenLM/Qwen-VL?tab=readme-ov-file) | [Paper: Set-of-Mark Prompting](https://arxiv.org/abs/2310.11441)
- #### AutoGen
[Paper](https://arxiv.org/abs/2308.08155) | [Code](https://github.com/microsoft/autogen/tree/main)
- #### Gretel AI (Synthetic Dataset Generation)
[Model Page](https://gretel.ai/solutions/public-sector) | [Code](https://github.com/gretelai) | [Paper: Textbooks Are All You Need II](https://arxiv.org/abs/2309.05463)