
This repository contains the final project for the course 2110446 Data Science and Data Engineering.

Presentation: https://youtu.be/_z3PZ1i0JpA

How to run the project

Video: https://youtu.be/vnNhSXqAvZc

Prerequisites

  • git
  • docker

How to run

  1. Clone the repository:
git clone https://huggingface.co/when-my-cat-learn-datasci/datasci-final-project-2024
  2. Change into the project directory:
cd datasci-final-project-2024
  3. Make the start script executable and run it:
chmod +x start.sh
./start.sh

OR, replace step 3 with the following two steps:

  3. Build the Docker image:
docker compose build
  4. Start the containers with Docker Compose:
docker compose up

Project Structure

DataGathering

This folder is mainly for collecting data from other sources.

1. GoogleGeocoding

Gathers the geolocation of each affiliation name using the Google Geocoding API.

| Directory/File | Description |
| --- | --- |
| GoogleGeocoding.ipynb | Jupyter notebook for gathering the latitude and longitude of each affiliation country |
| geocode_aff_country.csv | Geolocation of each affiliation (aff_country, lat, lon) |
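A minimal sketch of the request and response handling, assuming the notebook calls the standard JSON endpoint of the Google Geocoding API. The helper names are hypothetical, and the HTTP call itself is left out so the sketch runs offline.

```python
from __future__ import annotations

import csv
import io
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Return the request URL for geocoding one affiliation/country string (hypothetical helper)."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"


def parse_location(response: dict) -> tuple[float, float] | None:
    """Extract (lat, lon) from a Geocoding API JSON response, or None when there is no match."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def write_rows(rows: list[tuple[str, float, float]]) -> str:
    """Serialize rows using the column order of geocode_aff_country.csv (aff_country, lat, lon)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["aff_country", "lat", "lon"])
    writer.writerows(rows)
    return buffer.getvalue()
```

With the `requests` package, the URL could be fetched with `requests.get(url).json()` and the result passed to `parse_location`. A `None` result marks an affiliation that needs manual cleanup.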

2. ScopusAPI

Queries abstract data from Scopus using the Scopus API.

| Directory/File | Description |
| --- | --- |
| ScopusAPI.ipynb | Jupyter notebook for fetching abstract data from the Scopus API |
| example_abs_data.json | Example data fetched from the Scopus abstract API |
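A minimal sketch, assuming the notebook uses Elsevier's Abstract Retrieval API looked up by EID, with the key sent in the `X-ELS-APIKey` header. The endpoint form, the helper names, and the selected JSON fields (`dc:title`, `prism:coverDate`, `affiliation-country`) are assumptions and should be checked against example_abs_data.json.

```python
from __future__ import annotations

from urllib.parse import quote

# Assumed endpoint: Elsevier Abstract Retrieval API, addressed by EID.
ABSTRACT_ENDPOINT = "https://api.elsevier.com/content/abstract/eid/"


def abstract_request(eid: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for fetching one abstract record as JSON (hypothetical helper)."""
    url = ABSTRACT_ENDPOINT + quote(eid)
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    return url, headers


def extract_fields(record: dict) -> dict:
    """Pull a few core fields out of an abstract-retrieval JSON response."""
    response = record.get("abstracts-retrieval-response", {})
    core = response.get("coredata", {})
    affiliations = response.get("affiliation") or []
    # A single affiliation may be returned as an object instead of a list.
    if isinstance(affiliations, dict):
        affiliations = [affiliations]
    return {
        "title": core.get("dc:title"),
        "cover_date": core.get("prism:coverDate"),
        "countries": [a.get("affiliation-country") for a in affiliations],
    }
```

The request itself could be sent with `urllib.request.urlopen(urllib.request.Request(url, headers=headers))`, and the decoded JSON passed to `extract_fields`.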

3. SubjectCode

Scrapes the Scopus subject area codes from the web.

| Directory/File | Description |
| --- | --- |
| SubjectCode.ipynb | Jupyter notebook for scraping subject codes from the web |
| scopus_subject_areas.csv | Each subject code and its corresponding name (code, name) |
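A minimal scraping sketch using only the standard library's `html.parser`. The layout of the scraped page is not described here, so the assumption is a table whose rows hold a numeric code in the first cell and the subject name in the second; `SubjectTableParser` and `parse_subject_codes` are hypothetical names.

```python
from html.parser import HTMLParser


class SubjectTableParser(HTMLParser):
    """Collect (code, name) pairs from table rows whose first cell is a numeric code."""

    def __init__(self):
        super().__init__()
        self.rows: list[tuple[str, str]] = []
        self._cells: list[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr":
            cells = [c.strip() for c in self._cells]
            # Header rows ("Code", "Name") fail the isdigit() check and are skipped.
            if len(cells) >= 2 and cells[0].isdigit():
                self.rows.append((cells[0], cells[1]))

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) still belongs to the current cell.
        if self._in_cell:
            self._cells[-1] += data


def parse_subject_codes(html: str) -> list[tuple[str, str]]:
    parser = SubjectTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The returned pairs map directly onto the (code, name) columns of scopus_subject_areas.csv and could be written out with `csv.writer`.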

Pipeline

This folder is for data processing and data cleaning. It contains the following files:

| Directory/File | Description |
| --- | --- |
| Pipeline.ipynb | Jupyter notebook for extracting data from the raw data |
| spark-3.5.1-bin-hadoop3.tgz | Spark 3.5.1 distribution (prebuilt for Hadoop 3), used inside Docker to connect to Spark |
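The notebook performs its processing with Spark. The sketch below illustrates one plausible extraction step in plain Python, so it runs without a Spark session: each paper is expanded into one row per affiliation country and joined with the geolocations from geocode_aff_country.csv. The input shape (one dict per paper with `title` and a `countries` list) and the function names are assumptions. In PySpark the same step would typically be `pyspark.sql.functions.explode` followed by `DataFrame.join`.

```python
from __future__ import annotations

import csv
import io


def load_geocodes(csv_text: str) -> dict[str, tuple[float, float]]:
    """Map aff_country -> (lat, lon) from CSV text with columns aff_country,lat,lon."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["aff_country"]: (float(row["lat"]), float(row["lon"])) for row in reader}


def explode_and_join(records, geocodes):
    """Yield one (title, aff_country, lat, lon) row per paper/country pair.

    Countries are de-duplicated per paper (keeping first-seen order), and
    countries without a geolocation are dropped, like an inner join.
    """
    for record in records:
        for country in dict.fromkeys(record["countries"]):
            if country in geocodes:
                lat, lon = geocodes[country]
                yield record["title"], country, lat, lon
```

Dropping unmatched countries mirrors an inner join. A left join would instead keep them with empty coordinates for later inspection.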

Raw and Raw_Extra

These folders contain the raw data and the extra raw data used in the project.

Visualization

| Directory/File | Description |
| --- | --- |
| dockerfile | Dockerfile based on a Python image |
| main.py | Python script run by Streamlit to visualize the data |
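A hedged sketch of the data preparation main.py might perform before drawing a map. It assumes the processed data arrives as (aff_country, lat, lon) rows, one per paper and country; `country_points` is a hypothetical helper, and Streamlit is deliberately not imported so the sketch runs on its own.

```python
from collections import Counter


def country_points(rows):
    """Aggregate (aff_country, lat, lon) rows into one map point per country with a paper count."""
    counts = Counter()
    coords = {}
    for country, lat, lon in rows:
        counts[country] += 1
        coords[country] = (lat, lon)
    # most_common() orders countries by paper count, highest first.
    return [
        {"aff_country": country, "lat": coords[country][0], "lon": coords[country][1], "papers": n}
        for country, n in counts.most_common()
    ]
```

Inside a Streamlit script, the result could be passed to `st.map(pandas.DataFrame(points))`, which plots the `lat` and `lon` columns. Run locally, such a script is launched with `streamlit run main.py`.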