Report section: Title Page

Title: Term Project
Authors: Saksham Lakhera and Ahmed Zaher
Course: CSE 555 — Introduction to Pattern Recognition
Date: July 20, 2025

Abstract

NLP Engineering Perspective

This project addresses the challenge of improving recipe recommendation systems through advanced semantic search built on transformer-based language models. Traditional keyword-based search often fails to capture the nuanced relationships between ingredients, cooking techniques, and user preferences in culinary contexts. Our approach fine-tunes BERT (Bidirectional Encoder Representations from Transformers) on a custom recipe dataset to develop a semantic understanding of culinary content. We preprocessed and structured a subset of 15,000 recipes into standardized sequences organized by food category (proteins, vegetables, legumes, etc.) to create training data suited to the BERT architecture. The model was fine-tuned to learn contextual embeddings that capture semantic relationships between ingredients and tags. Finally, we generated embeddings for every recipe in the dataset and implemented a cosine-similarity retrieval system that returns the top-K most relevant recipes for a user's search query. Our evaluation demonstrates [PLACEHOLDER: key quantitative results - e.g., Recall@10 = X.XX, MRR = X.XX, improvement over baseline = +XX%]. This work provides practical experience in transformer fine-tuning for domain-specific applications and demonstrates the effectiveness of structured data preprocessing for improving semantic search in the culinary domain.

Computer-Vision Engineering Perspective

(Reserved – to be completed by CV author)

Introduction

NLP Engineering Perspective

This term project, carried out for CSE 555, serves primarily as an educational exercise aimed at giving graduate students end-to-end exposure to building a modern NLP system.
Our goal is to construct a semantic recipe-search engine that demonstrates how domain-specific fine-tuning of BERT can substantially improve retrieval quality over simple keyword matching. We created a preprocessing pipeline that restructures 15,000 recipes into standardized ingredient-sequence representations and then fine-tuned BERT on this corpus. Key contributions include (i) a cleaned, category-labelled recipe subset, (ii) training scripts that yield domain-adapted contextual embeddings, and (iii) a production-ready retrieval service that returns the top-K most relevant recipes for an arbitrary user query via cosine-similarity ranking. A comparative evaluation against classical lexical baselines will be presented in Section 9 [PLACEHOLDER: baseline summary]. The project thus provides a compact blueprint of the full NLP workflow, from data curation through deployment.

Computer-Vision Engineering Perspective

The Computer-Vision track followed a three-phase pipeline designed to simulate the data-engineering challenges of real-world projects. Phase 1 collected more than 6,000 food photographs under diverse lighting conditions and backgrounds, deliberately introducing noise to improve model robustness. Phase 2 handled image preprocessing, augmentation, and the subsequent training and evaluation of a convolutional neural network whose weights capture salient visual features of dishes. Phase 3 integrated the trained network into the shared web application so that users can upload an image and receive 5–10 recipe recommendations that match both visually and semantically. Detailed architecture choices and quantitative results will be provided in later sections [PLACEHOLDER: CV performance metrics].
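As an illustration of the standardized ingredient-sequence preprocessing described above, the following is a minimal sketch. The category list, the comma-joined output format, and the function names are assumptions for illustration, not the project's actual pipeline code.

```python
# Hypothetical sketch: order a recipe's ingredients by food category to form
# a standardized text sequence for BERT fine-tuning. The category labels and
# output format are assumed, not taken from the project's real pipeline.
CATEGORY_ORDER = ["protein", "vegetable", "legume", "grain", "other"]

def to_sequence(ingredients, category_of):
    """Order ingredients by food category and join them into one sequence.

    ingredients: list of ingredient names
    category_of: dict mapping ingredient name -> category label
    """
    rank = {c: i for i, c in enumerate(CATEGORY_ORDER)}
    # Unknown ingredients or categories sort to the end of the sequence.
    ordered = sorted(
        ingredients,
        key=lambda ing: rank.get(category_of.get(ing, "other"), len(rank)),
    )
    return ", ".join(ordered)
```

For example, `to_sequence(["carrot", "chicken", "lentils"], {"carrot": "vegetable", "chicken": "protein", "lentils": "legume"})` places the protein first, yielding `"chicken, carrot, lentils"`; every recipe is thereby serialized in the same category order before tokenization.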
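The cosine-similarity ranking behind the retrieval service can be sketched in plain Python. This is a toy version under stated assumptions: the real system operates on BERT embedding vectors and would use vectorized math, and the function names here are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_recipes(query_vec, recipe_vecs, k=5):
    """Return the ids of the k recipes most similar to the query embedding.

    recipe_vecs: dict mapping recipe id -> embedding vector
    """
    ranked = sorted(
        recipe_vecs,
        key=lambda rid: cosine_similarity(query_vec, recipe_vecs[rid]),
        reverse=True,
    )
    return ranked[:k]
```

In the deployed service, the query string is first embedded with the fine-tuned model and then ranked against the precomputed recipe embeddings in exactly this fashion.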
Background / Related Work
  • Survey of prior methods and the state of the art
  • Clear positioning of our approach relative to existing literature

Dataset and Pre-processing
  • Data source(s), collection or selection criteria
  • Cleaning, normalization, augmentation, class balancing, etc.

Methodology
  • Theoretical foundations and algorithms used
  • Model architecture, feature extraction, hyper-parameters
  • Assumptions and justifications

Experimental Setup
  • Hardware / software environment
  • Train / validation / test split, cross-validation strategy
  • Evaluation metrics (accuracy, F1-score, ROC-AUC, etc.)

Results
  • Quantitative tables and charts
  • Qualitative examples (e.g., confusion matrix, sample outputs)
  • Statistical significance tests where applicable

Discussion
  • Interpretation of results (why methods worked or failed)
  • Comparison with baselines or published benchmarks
  • Limitations of our study

Conclusion
  • Recap of contributions and findings
  • Practical implications

Future Work
  • Concrete next steps or open problems

Acknowledgments (if appropriate)
  • Funding sources, collaborators, data providers

References
  • Properly formatted bibliography (IEEE, APA, etc.)

Appendices (optional)
  • Supplementary proofs, additional graphs, extensive tables, code snippets