Data-Driven Machine Learning: Ensuring Quality in Model Development

Project Overview:

This project provides practical experience across the machine learning workflow, from data quality and preprocessing to model development and deployment. Students will choose one of the following tasks, each involving a different data type. The tasks are designed to highlight the importance of data quality in AI and to engage students in the entire model development process, including software engineering (SE) best practices and deployment on a modern platform such as Hugging Face.

  • Objective: Develop a model for image recognition.
  • Key Considerations:
    • Data Augmentation: Enhance the dataset's diversity and robustness through augmentation techniques.
    • Model Architecture: Select or design a convolutional neural network (CNN) for image classification.
    • Evaluation Metrics: Use appropriate metrics like accuracy, precision, and recall for image-related tasks.
    • SE Best Practices: Follow SE best practices for code quality, including modularization and version control.
    • Dataset: MNIST dataset (handwritten digits).
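
To make the evaluation metrics named above concrete, here is a minimal sketch of how accuracy, per-class precision, and per-class recall could be computed for the ten digit classes. It is an illustration only, not a required implementation: the function names `per_class_metrics` and `macro_average` are hypothetical, labels are assumed to be integers in `0..num_classes-1`, and a class with no predictions (or no true samples) is assigned a score of 0.0 by convention.

```python
def per_class_metrics(y_true, y_pred, num_classes=10):
    """Return (accuracy, precision_per_class, recall_per_class).

    Assumes labels are integers in the range 0..num_classes-1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    accuracy = sum(tp) / len(y_true)
    # Convention (an assumption): a zero denominator yields 0.0.
    precision = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        for c in range(num_classes)
    ]
    recall = [
        tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        for c in range(num_classes)
    ]
    return accuracy, precision, recall


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Tiny three-class example with made-up labels, for illustration only.
y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]
acc, prec, rec = per_class_metrics(y_true, y_pred, num_classes=3)
print("accuracy:", acc)
print("macro precision:", macro_average(prec))
print("macro recall:", macro_average(rec))
```

Reporting per-class values alongside overall accuracy helps show which digits the model confuses, which a single accuracy number can hide.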

General Instructions:

  • Model Development: Develop and evaluate the model, focusing on accuracy, efficiency, and interpretability while respecting SE best practices.
  • Deployment: Deploy the model on the Hugging Face platform and showcase its application.
  • Data Quality Report: Include a detailed analysis of data quality, challenges faced, and measures taken to ensure data integrity.
  • Report and Presentation: Include workflow pipeline, model development, data quality analysis, SE best practices, and key findings.
  • You can use the dataset of your choice.
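
As one possible starting point for the data quality analysis described above, the sketch below audits a set of images for a few common problems: wrong size, out-of-range pixel values, blank images, exact duplicates, invalid labels, and class imbalance. It is an assumption-laden illustration: the function name `audit_images` and the report keys are hypothetical, and each image is assumed to be a flat list of integer pixel values in 0..255 (784 values for 28x28 MNIST digits).

```python
from collections import Counter


def audit_images(images, labels, num_classes=10, expected_pixels=784):
    """Collect indices of samples that fail basic quality checks.

    Assumes each image is a flat list of integer pixels in 0..255.
    """
    report = {
        "bad_label": [],       # label outside 0..num_classes-1
        "wrong_size": [],      # unexpected number of pixels
        "out_of_range": [],    # pixel value outside 0..255
        "blank": [],           # every pixel is zero
        "duplicates": [],      # (first_index, repeated_index) pairs
        "class_counts": Counter(),
    }
    first_seen = {}  # image bytes -> index of first occurrence

    for i, (img, label) in enumerate(zip(images, labels)):
        if label in range(num_classes):
            report["class_counts"][label] += 1
        else:
            report["bad_label"].append(i)

        if len(img) != expected_pixels:
            report["wrong_size"].append(i)
            continue
        if any(not (0 <= px <= 255) for px in img):
            report["out_of_range"].append(i)
            continue
        if max(img) == 0:
            report["blank"].append(i)

        key = bytes(img)  # safe here: pixels were range-checked above
        if key in first_seen:
            report["duplicates"].append((first_seen[key], i))
        else:
            first_seen[key] = i

    return report


# Toy 2x2 "images" with made-up values, for illustration only.
images = [[0, 0, 0, 0], [10, 20, 30, 40], [10, 20, 30, 40],
          [1, 2, 3], [0, 300, 0, 0]]
labels = [0, 1, 1, 2, 11]
report = audit_images(images, labels, expected_pixels=4)
for name, value in report.items():
    print(name, value)
```

Checks like these only catch mechanical problems. Mislabeled or ambiguous digits usually need a separate pass, for example reviewing the samples the trained model gets wrong with high confidence.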

Final Deliverables:

  • A well-documented report of 20-30 pages explaining HOW you solved the problem, including all steps (data cleaning, outlier detection, etc.).
  • A fully functioning Jupyter notebook (.ipynb) to validate your solution.