---
license: mit
datasets:
- RaThorat/doc_chunks
language:
- nl
base_model:
- GroNLP/bert-base-dutch-cased
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This model card was generated from [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
The goal is a scalable, privacy-friendly solution that uses public data from DUS-I (such as policy documents and news items) to inform employees quickly and accurately.

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/RaThorat/my-chatbot-project

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Question identification: common topics are subsidy information, policy developments, and manuals.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Save time by using AI to deliver information to employees quickly.


## Training Details

### Training Data

46 TXT, PDF, and ODT documents from the DUS-I website were split into chunks of 200 words each and stored in JSON format.
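
The chunking step could look roughly like the sketch below. It is a minimal illustration, not the project's actual script: it assumes the text has already been extracted from the source file, splits on whitespace, and uses hypothetical field names (`source`, `chunk_id`, `text`) for the JSON records.

```python
import json


def chunk_words(text, chunk_size=200):
    """Split text into consecutive chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def document_to_records(doc_name, text, chunk_size=200):
    """Build one JSON-serializable record per chunk (field names are hypothetical)."""
    return [
        {"source": doc_name, "chunk_id": idx, "text": chunk}
        for idx, chunk in enumerate(chunk_words(text, chunk_size))
    ]


# Synthetic example: 450 words yield chunks of 200, 200, and 50 words.
sample_text = " ".join(f"woord{i}" for i in range(450))
records = document_to_records("voorbeeld.txt", sample_text)

# Serialize all chunk records as a single JSON array.
chunks_json = json.dumps(records, ensure_ascii=False, indent=2)
```

Only the last chunk of a document can be shorter than 200 words. Whether the original pipeline overlaps chunks or respects sentence boundaries is not stated here, so this sketch does neither.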

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

Documents were grouped (`groeperen_segment_text_to_jsonl.py`) under labels such as PROJECT, HANDLEIDING (manual), OVEREENKOMST (agreement), PLAN, BELEID (policy), and SUBSIDIE (subsidy).
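
As a rough illustration of this grouping step, the sketch below collects labelled text segments per label and serializes them as JSONL. The label names come from this card. The record layout (`text`, `label`) and the helper names are assumptions and may differ from what `groeperen_segment_text_to_jsonl.py` actually produces.

```python
import json
from collections import defaultdict

# Label names taken from the model card; the record layout below is an assumption.
LABELS = ["PROJECT", "HANDLEIDING", "OVEREENKOMST", "PLAN", "BELEID", "SUBSIDIE"]


def group_by_label(labelled_segments):
    """Group (label, text) pairs by label, rejecting labels outside LABELS."""
    grouped = defaultdict(list)
    for label, text in labelled_segments:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        grouped[label].append(text)
    return dict(grouped)


def to_jsonl(grouped):
    """Serialize grouped segments as JSONL: one JSON object per line."""
    lines = []
    for label, texts in grouped.items():
        for text in texts:
            lines.append(json.dumps({"text": text, "label": label}, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical example segments, not taken from the real dataset.
segments = [
    ("SUBSIDIE", "Voorbeeldtekst over een subsidieregeling."),
    ("HANDLEIDING", "Voorbeeldtekst uit een handleiding."),
    ("SUBSIDIE", "Nog een voorbeeldtekst over subsidies."),
]
grouped = group_by_label(segments)
jsonl_text = to_jsonl(grouped)
```

Rejecting unknown labels early keeps typos in label names out of the training file.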


#### Training Hyperparameters

- **Training regime:** Trained with the GroNLP/bert-base-dutch-cased model (110 million parameters). <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->


### Results

[More Information Needed]

#### Summary

Script for the textcat model: https://github.com/RaThorat/my-chatbot-project/blob/main/scripts/train_textcat_model.py


## Technical Specifications [optional]

### Model Architecture and Objective

46 TXT, PDF, and ODT documents from the DUS-I website were split into chunks of 200 words each and stored in JSON format.
For the text categorization model, the same documents were converted to JSONL format.

### Compute Infrastructure

[More Information Needed]

#### Hardware

8 vCPUs and 64 GB of RAM were required.