---
base_model:
- Qwen/Qwen2.5-7B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
datasets:
- joackimagno/FILIPINO_RECIPES_2K_V2
metrics:
- bleu
- rouge
- meteor
model-index:
- name: MASID-v3
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: joackimagno/FILIPINO_RECIPES_2K_V2
      type: joackimagno/FILIPINO_RECIPES_2K_V2
      split: test
    metrics:
    - name: BLEU-4
      type: bleu
      value: 0.07
    - name: METEOR
      type: meteor
      value: 0.35
    - name: ROUGE-L (F1)
      type: rouge
      value: 0.32
      unit: f1
      config: rougeL
---

# MASID-v3

**MASID-v3** is a fine-tuned version of **Qwen2.5-7B** trained specifically for **Filipino recipe generation**, with a focus on main dish preparation. The model was trained on the **Filipino Recipes 2K V2 dataset**, a curated collection of ~2,000 authentic Filipino recipes. Unlike earlier variants that explored multi-stage fine-tuning, **MASID-v3 was trained directly from Qwen2.5-7B** on this dataset to specialize it in Filipino culinary knowledge.

The goal of MASID-v3 is to generate structured and culturally accurate Filipino main dish recipes, covering a wide range of traditional cooking methods and ingredient combinations.

---

## Model Details

- **Base Model**: [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
- **Dataset**: Filipino Recipes 2K V2 (~2,000 samples)
- **Training Objective**: Recipe text generation (Filipino cuisine, main dishes)
- **Method**: Direct fine-tuning from Qwen2.5-7B

---

## Intended Use

- Assisting in **recipe writing**
- Exploring **Filipino food culture**
- Generating **cooking instructions** in natural language

---

## Limitations

- The model was trained on a relatively **small dataset (~2k samples)**.
- It may produce **hallucinated ingredients** or **inaccurate cooking steps**.
- It is not suitable as a **nutritional or food safety reference**.
- It is best used for **research, education, and creative applications**.

---

## Evaluation

| Dataset                            | Split | BLEU-4 | METEOR | ROUGE-L (F1) |
|------------------------------------|:-----:|:------:|:------:|:------------:|
| joackimagno/FILIPINO_RECIPES_2K_V2 | test  |  0.07  |  0.35  |     0.32     |

---

This Qwen2 model was trained **2× faster** with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
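The exact training script and hyperparameters for MASID-v3 are not published in this card. As a rough orientation only, a direct Unsloth + TRL fine-tune from Qwen2.5-7B could look like the sketch below; the LoRA settings, batch size, learning rate, epoch count, and the `text` column name are all illustrative assumptions, and TRL's `SFTTrainer` signature varies between versions (this follows the older Unsloth-notebook style).

```python
# Illustrative sketch of a direct Unsloth + TRL fine-tune from Qwen2.5-7B.
# All hyperparameters and the "text" column are assumptions, not the
# actual MASID-v3 configuration.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B",
    max_seq_length=2048,
    load_in_4bit=True,  # assumed QLoRA-style 4-bit loading
)

# Attach LoRA adapters; rank, alpha, and target modules are assumed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Assumes each row exposes an Alpaca-formatted "text" field matching the
# template shown under Example Usage below.
dataset = load_dataset("joackimagno/FILIPINO_RECIPES_2K_V2", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```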
## Example Usage

```python
from typing import List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load model and tokenizer
model_name = "joackimagno/MASID-v3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# ==============================================================
# Alpaca-style prompt
# ==============================================================
SYSTEM_INSTRUCTION = (
    "You are a Filipino chef. Generate Filipino MAIN DISH recipes.\n"
    "Follow these output rules:\n"
    "1) Use standard stovetop or oven methods.\n"
    "2) Keep steps concise and logically ordered.\n"
    "3) Output FORMAT and ORDER must be exactly:\n"
    "   Recipe name, Prep time, Cook time, Total time, Servings,\n"
    "   Full Ingredients (numbered list), Instructions (numbered list)"
)

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
)

def make_model_input_from_ing(ing_names: List[str]) -> str:
    return (
        "Ingredients to use: " + ", ".join(ing_names) + ".\n"
        "Task: create a Filipino main dish recipe using these ingredients. "
        "Keep steps concise, clear, and coherent."
    )

# Example input
ing_names = ["Beef", "Potato", "Sili", "Carrot", "Sayote"]

alpaca_prompt = ALPACA_TEMPLATE.format(
    SYSTEM_INSTRUCTION,
    make_model_input_from_ing(ing_names),
    "",  # leave the response slot empty for the model to generate
)

# ==============================================================
# Run inference
# ==============================================================
inputs = tokenizer(alpaca_prompt, return_tensors="pt").to(model.device)

gen_config = GenerationConfig(
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

outputs = model.generate(**inputs, generation_config=gen_config)

# Decode only the newly generated tokens (skip the prompt)
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(generated.strip())
```
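The BLEU-4, METEOR, and ROUGE-L numbers in the Evaluation section above can be approximated with the Hugging Face `evaluate` library. The snippet below is a minimal sketch under that assumption; the prediction and reference strings are placeholders, and the exact evaluation pipeline behind the reported scores is not published here.

```python
# Rough sketch of scoring generated recipes with Hugging Face `evaluate`.
# The sample strings are placeholders; in practice, predictions would come
# from running the model over the test split as in Example Usage above.
import evaluate

bleu = evaluate.load("bleu")      # 4-gram BLEU by default (max_order=4)
meteor = evaluate.load("meteor")
rouge = evaluate.load("rouge")    # returns F1-style rougeL by default

predictions = ["Beef Caldereta. Prep time: 15 minutes. ..."]
references = ["Beef Caldereta. Prep time: 20 minutes. ..."]

print("BLEU-4: ", bleu.compute(predictions=predictions, references=references)["bleu"])
print("METEOR: ", meteor.compute(predictions=predictions, references=references)["meteor"])
print("ROUGE-L:", rouge.compute(predictions=predictions, references=references)["rougeL"])
```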