---
license: cc-by-4.0
language:
- th
base_model:
- airesearch/wangchanberta-base-att-spm-uncased
pipeline_tag: token-classification
library_name: transformers
tags: [ner, thai, food, review, token-classification]
---

# Model Card for wttw/modchelin_thainer-base-model

This model performs Named Entity Recognition (NER) on Thai-language food reviews. It is designed to extract domain-specific aspects such as dish names, ingredients, restaurant service, and sentiment-related phrases from customer-written content.

## Model Details

### Model Description

This is the model card of a 🤗 Transformers model that has been pushed to the Hugging Face Hub.

- **Developed by:** Vitawat Kitipatthavorn
- **Finetuned from model:** `airesearch/wangchanberta-base-att-spm-uncased`
- **Model type:** Token Classification (NER)
- **Language(s) (NLP):** Thai
- **License:** cc-by-sa-4.0
- **Shared by:** wttw
- **Model ID:** `wttw/modchelin_thainer-base-model`

## Uses

### Direct Use

This model is designed for extracting domain-specific entities from Thai-language food reviews. It identifies and classifies named entities related to:

- Food/menu items
- Taste
- Service
- Ambiance
- Price and value
- Other aspects relevant to customer dining experiences

**Example:**

- **Input:** `"ต้มยำกุ้งอร่อยมาก แต่บริการช้า"`
- **Output:**
  - `ต้มยำกุ้ง: FOOD`
  - `บริการ: SERVICE`

The model is suitable for NLP pipelines aimed at analyzing restaurant reviews, powering sentiment dashboards, or supporting aspect-based sentiment analysis (ABSA).
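
The `text: LABEL` pairs in the example above can be derived from the raw pipeline output. The sketch below assumes the output shape of the 🤗 Transformers token-classification pipeline with `aggregation_strategy="simple"`: a list of dicts with `entity_group`, `score`, `word`, `start`, and `end`. The helper name `format_entities` is hypothetical, and the sample predictions are hand-written stand-ins with invented scores, not real model output.

```python
def format_entities(text, entities):
    """Render aggregated NER predictions as 'surface text: LABEL' strings."""
    lines = []
    for ent in entities:
        start, end = ent.get("start"), ent.get("end")
        # Prefer slicing the original text: `word` is rebuilt from subword
        # tokens and may not match the input exactly. Offsets can be None
        # (for example with a slow tokenizer), so fall back to `word`.
        if start is not None and end is not None:
            surface = text[start:end]
        else:
            surface = ent["word"]
        lines.append(f"{surface.strip()}: {ent['entity_group']}")
    return lines


text = "ต้มยำกุ้งอร่อยมาก แต่บริการช้า"
food, service = "ต้มยำกุ้ง", "บริการ"

# Hand-written stand-in for pipeline output (scores are invented).
sample = [
    {"entity_group": "FOOD", "score": 0.99, "word": food,
     "start": text.find(food), "end": text.find(food) + len(food)},
    {"entity_group": "SERVICE", "score": 0.95, "word": service,
     "start": text.find(service), "end": text.find(service) + len(service)},
]

for line in format_entities(text, sample):
    print(line)
```

Slicing by character offsets keeps the reported span identical to what the reviewer wrote, which matters when the spans are later matched back to the source review.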

### Downstream Use

The model can be integrated into:

- Thai ABSA pipelines
- Restaurant feedback summarization systems
- Chatbots or moderation tools for food delivery and review platforms

### Out-of-Scope Use

The model is not designed for:

- Non-food-related documents (e.g., legal, clinical, political)
- General-purpose Thai NER tasks
- Use cases requiring high confidence on ambiguous or out-of-domain text

## Bias, Risks, and Limitations

The model is trained specifically on food review content and may:

- Struggle with informal slang or regional dialects
- Over-predict `FOOD` entities in unrelated contexts
- Misclassify ambiguous phrases without surrounding context

### Recommendations

Users should:

- Avoid applying this model outside food-related domains
- Fine-tune further if working with reviews in specific dialects or contexts
- Evaluate on a sample of target data before production use
- Consider setting confidence thresholds before using predictions downstream
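
The confidence-threshold recommendation can be sketched as a small post-processing filter over the pipeline output. This assumes predictions in the `aggregation_strategy="simple"` format, where each dict carries an `entity_group` and a `score`. The function name `filter_by_confidence`, the default cutoff of 0.8, and the per-label override are illustrative assumptions; suitable values would need to be chosen by evaluating on target data.

```python
def filter_by_confidence(entities, threshold=0.8, per_label=None):
    """Drop predictions whose score falls below the applicable cutoff.

    `per_label` optionally maps a label to its own cutoff, e.g. a stricter
    one for a label that tends to be over-predicted.
    """
    per_label = per_label or {}
    kept = []
    for ent in entities:
        cutoff = per_label.get(ent["entity_group"], threshold)
        # float() also accepts NumPy scalar scores.
        if float(ent["score"]) >= cutoff:
            kept.append(ent)
    return kept


# Hand-written stand-in for pipeline output (scores are invented).
sample = [
    {"entity_group": "FOOD", "score": 0.97, "word": "ต้มยำกุ้ง"},
    {"entity_group": "FOOD", "score": 0.85, "word": "มาก"},
    {"entity_group": "SERVICE", "score": 0.62, "word": "บริการ"},
]

default_kept = filter_by_confidence(sample)
strict_food = filter_by_confidence(sample, threshold=0.6, per_label={"FOOD": 0.9})
print([ent["word"] for ent in default_kept])
print([ent["word"] for ent in strict_food])
```

A per-label cutoff is one way to counter the `FOOD` over-prediction noted under limitations without discarding lower-scoring predictions for other labels.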

## How to Get Started with the Model

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "wttw/modchelin_thainer-base-model"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword tokens into whole entity spans.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = "ต้มยำกุ้งอร่อยมาก แต่บริการช้า"
entities = ner_pipeline(example)

# Each entry is a dict with entity_group, score, word, start, and end.
print(entities)