{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (Optional but recommended) Set up Virtual Environment\n",
    "\n",
    "TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Install pipeline dependencies\n",
    "The pipeline relies on several dependencies. Run the following command to install the dependencies defined in `requirements.txt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Instantiate pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5e6901339e274840a9447cffdba845e6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/829 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b42b83a842aa4d338df4f42c4571080b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "rpeaks_2_hrv_pipeline.py:   0%|          | 0.00/1.76k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ccc1bab946ad45ee9bb7473e0ecac1e7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "rpeaks2hrv.py:   0%|          | 0.00/7.54k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "A new version of the following files was downloaded from https://huggingface.co/hubii-world/rpeaks-to-hrv-pipeline:\n",
      "- rpeaks2hrv.py\n",
      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
      "A new version of the following files was downloaded from https://huggingface.co/hubii-world/rpeaks-to-hrv-pipeline:\n",
      "- rpeaks_2_hrv_pipeline.py\n",
      "- rpeaks2hrv.py\n",
      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c572acc1c2654d9599b1545b1810ff5b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors:   0%|          | 0.00/649k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Device set to use cpu\n"
     ]
    }
   ],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "rpeak2hrv_pipeline = pipeline(model = \"hubii-world/rpeaks-to-hrv-pipeline\", trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Overview: Pipeline parameters\n",
    "The pipeline provides several parameters that adjust its preprocessing behavior. The following sections explain each parameter and provide usage examples.\n",
    "\n",
    "\n",
    "### Mandatory parameters\n",
    "The pipeline relies on 2 mandatory parameters that the user has to set for every pipeline execution:\n",
    "- `inputs` : str | pd.DataFrame - The input that should be processed by the pipeline. This can either be a path to a file containing the data to process or the data itself.\n",
    "- `sampling_rate` : int - The sampling rate of the continuous cardiac signal in which peaks occur. Default value is 1000.\n",
    "\n",
    "\n",
    "### Optional parameters\n",
    "Besides the mandatory parameters, the pipeline offers multiple optional parameters that may need to be set in order to compute correct HRV features:\n",
    "- `time_header` : str - The name of the column in the data that contains the timestamp at which the values in the same row were recorded. Default setting is 'SystemTime'.\n",
    "- `rri_header` : str - The name of the column in the data that contains the RR-intervals in milliseconds. Default setting is 'interbeat_interval'.\n",
    "- `windowing_method` : str - The method that should be applied to divide the raw data into windows. Default setting is None, so no windowing is applied.\n",
    "- `window_size` : str - The size of a window in terms of a time frame. Only relevant if windowing should be applied to the data. Default setting is '60s' (60 seconds).\n",
    "\n",
    "\n",
    "## 3.1 `inputs`\n",
    "\n",
    "The `inputs` parameter represents the data that the pipeline should process into HRV features. The following input formats are supported:\n",
    "- str\n",
    "- pd.DataFrame\n",
    "\n",
    "When providing the inputs as a string, it has to be a path to a file containing the data to process. Supported file formats are .csv and .txt.\n",
    "\n",
    "Alternatively, you can provide the data directly to the pipeline in the form of a DataFrame.\n",
    "\n",
    "### 3.1.1 Example: Provide input as file path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>HRV_MeanNN</th>\n",
       "      <th>HRV_SDNN</th>\n",
       "      <th>HRV_SDANN1</th>\n",
       "      <th>HRV_SDNNI1</th>\n",
       "      <th>HRV_SDANN2</th>\n",
       "      <th>HRV_SDNNI2</th>\n",
       "      <th>HRV_SDANN5</th>\n",
       "      <th>HRV_SDNNI5</th>\n",
       "      <th>HRV_RMSSD</th>\n",
       "      <th>HRV_SDSD</th>\n",
       "      <th>...</th>\n",
       "      <th>HRV_ULF</th>\n",
       "      <th>HRV_VLF</th>\n",
       "      <th>HRV_LF</th>\n",
       "      <th>HRV_HF</th>\n",
       "      <th>HRV_VHF</th>\n",
       "      <th>HRV_TP</th>\n",
       "      <th>HRV_LFHF</th>\n",
       "      <th>HRV_LFn</th>\n",
       "      <th>HRV_HFn</th>\n",
       "      <th>HRV_LnHF</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1006.894005</td>\n",
       "      <td>159.530641</td>\n",
       "      <td>72.830794</td>\n",
       "      <td>137.277955</td>\n",
       "      <td>56.912177</td>\n",
       "      <td>143.4821</td>\n",
       "      <td>45.675812</td>\n",
       "      <td>152.633402</td>\n",
       "      <td>107.280546</td>\n",
       "      <td>102.397785</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000497</td>\n",
       "      <td>0.010289</td>\n",
       "      <td>0.024415</td>\n",
       "      <td>0.070705</td>\n",
       "      <td>0.026376</td>\n",
       "      <td>0.132282</td>\n",
       "      <td>0.345306</td>\n",
       "      <td>0.184566</td>\n",
       "      <td>0.5345</td>\n",
       "      <td>-2.649243</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 35 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    HRV_MeanNN    HRV_SDNN  HRV_SDANN1  HRV_SDNNI1  HRV_SDANN2  HRV_SDNNI2  \\\n",
       "0  1006.894005  159.530641   72.830794  137.277955   56.912177    143.4821   \n",
       "\n",
       "   HRV_SDANN5  HRV_SDNNI5   HRV_RMSSD    HRV_SDSD  ...   HRV_ULF   HRV_VLF  \\\n",
       "0   45.675812  152.633402  107.280546  102.397785  ...  0.000497  0.010289   \n",
       "\n",
       "     HRV_LF    HRV_HF   HRV_VHF    HRV_TP  HRV_LFHF   HRV_LFn  HRV_HFn  \\\n",
       "0  0.024415  0.070705  0.026376  0.132282  0.345306  0.184566   0.5345   \n",
       "\n",
       "   HRV_LnHF  \n",
       "0 -2.649243  \n",
       "\n",
       "[1 rows x 35 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "file_path = \"./Data/hr_rr_Elias-Rec_1703_162053.csv\"\n",
    "result = rpeak2hrv_pipeline(inputs=file_path, feature_domains=['time', 'freq'], sampling_rate=1000)\n",
    "result.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 `sampling_rate`\n",
    "The `sampling_rate` (in Hz) is the rate at which the sensor sampled data from the patient. It has to be provided as an integer. The example above shows a configuration where the `sampling_rate` is set to 1000.\n",
    "\n",
    "The default rate is 1000 Hz, meaning that the sensor sampled 1000 values per second."
   ]
  },
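  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the input consists of per-sample R peak flags, the distance between two peaks is only known in samples, so the sampling rate is needed to convert it into milliseconds. The following is a minimal sketch of that conversion in plain Python. The helper `flags_to_rr_intervals_ms` is hypothetical and only illustrates the arithmetic; it is not the pipeline's actual implementation.\n",
    "\n",
    "```python\n",
    "def flags_to_rr_intervals_ms(flags, sampling_rate=1000):\n",
    "    # Hypothetical helper: sample indices at which an R peak was flagged\n",
    "    peak_indices = [i for i, flag in enumerate(flags) if flag == 1]\n",
    "    # Distance between consecutive peaks in samples, converted to milliseconds\n",
    "    return [\n",
    "        (b - a) * 1000 / sampling_rate\n",
    "        for a, b in zip(peak_indices, peak_indices[1:])\n",
    "    ]\n",
    "\n",
    "\n",
    "# Made-up flags for illustration: peaks at samples 1, 5 and 11\n",
    "flags = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]\n",
    "rr_at_1000 = flags_to_rr_intervals_ms(flags, sampling_rate=1000)\n",
    "rr_at_500 = flags_to_rr_intervals_ms(flags, sampling_rate=500)\n",
    "```\n",
    "\n",
    "With the same flags, halving the assumed sampling rate doubles every RR-interval, which is why a wrong `sampling_rate` distorts the resulting HRV features."
   ]
  },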
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.3 `time_header` & `rri_header`\n",
    "`time_header` and `rri_header` define the structure of the data the pipeline has to process. The pipeline supports two data formats:\n",
    "- R Peak Flags\n",
    "- RR-Intervals with timestamps\n",
    "\n",
    "### 3.3.1 R Peak Flags\n",
    "The first format is a DataFrame with one column named `'ECG_R_Peaks'`. The column values are binary flags indicating whether an R peak occurred or not.\n",
    "\n",
    "This is the standard data format used by neurokit to represent R peaks. If you use this data format, you do not need to specify `time_header` and `rri_header`.\n",
    "\n",
    "__Important__: Make sure that the column has the correct name and that you specify the correct sampling rate, as both are required to compute correct HRV features.\n",
    "\n",
    "#### Example: R Peak Flags\n",
    "\n",
    "TODO\n",
    "\n",
    "### 3.3.2 RR-Intervals with timestamps\n",
    "The second format is a DataFrame with two columns: the RR-intervals in milliseconds and the timestamps at which the sensor recorded them. Here, `time_header` specifies the name of the column containing the timestamps and `rri_header` specifies the name of the column containing the RR-intervals.\n",
    "The default column names are `'SystemTime'` and `'interbeat_intervals'`.\n",
    "\n",
    "#### Example: RR-Intervals with timestamps\n",
    "\n",
    "TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.4 `windowing_method`\n",
    "The `windowing_method` defines the method used to divide the raw data into windows. The supported settings are:\n",
    "- 'rolling' - Creates a window rolling over the data. For more information, see [pandas.DataFrame.rolling()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html).\n",
    "- 'first_interval' - Keeps the data values that are recorded within the first time frame defined by `window_size` and omits the rest.\n",
    "- 'last_interval' - Keeps the data values that are recorded within the last time frame defined by `window_size` and omits the rest.\n",
    "\n",
    "\n",
    "### 3.4.1 Example: Use 'first_interval' windowing\n",
    "The following code snippet shows an example of 'first_interval' windowing. In this example, only the values recorded within the first 60 seconds of the data collection are used to compute HRV features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "c:\\Users\\georg\\Desktop\\HIWI\\ECG2HRV\\.venv\\Lib\\site-packages\\neurokit2\\hrv\\hrv_nonlinear.py:268: NeuroKitWarning: Missing interbeat intervals have been detected. Note that missing intervals can distort some HRV features, in particular nonlinear indices.\n",
      "  warn(\n",
      "c:\\Users\\georg\\Desktop\\HIWI\\ECG2HRV\\.venv\\Lib\\site-packages\\neurokit2\\hrv\\hrv_nonlinear.py:529: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.\n",
      "  warn(\n",
      "C:\\Users\\georg\\.cache\\huggingface\\modules\\transformers_modules\\willergeorg\\rpeaks-to-hrv-pipeline\\354dfd60f24b9b5900052952ca632b87456e4701\\rpeaks2hrv.py:24: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.\n",
      "  hrv_values = pd.concat([hrv_values, hrv_time], ignore_index=True)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>window_start</th>\n",
       "      <th>window_end</th>\n",
       "      <th>HRV_MeanNN</th>\n",
       "      <th>HRV_SDNN</th>\n",
       "      <th>HRV_SDANN1</th>\n",
       "      <th>HRV_SDNNI1</th>\n",
       "      <th>HRV_SDANN2</th>\n",
       "      <th>HRV_SDNNI2</th>\n",
       "      <th>HRV_SDANN5</th>\n",
       "      <th>HRV_SDNNI5</th>\n",
       "      <th>...</th>\n",
       "      <th>HRV_SampEn</th>\n",
       "      <th>HRV_ShanEn</th>\n",
       "      <th>HRV_FuzzyEn</th>\n",
       "      <th>HRV_MSEn</th>\n",
       "      <th>HRV_CMSEn</th>\n",
       "      <th>HRV_RCMSEn</th>\n",
       "      <th>HRV_CD</th>\n",
       "      <th>HRV_HFD</th>\n",
       "      <th>HRV_KFD</th>\n",
       "      <th>HRV_LZC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2025-03-17 16:20:54.760848</td>\n",
       "      <td>2025-03-17 16:21:53.596623680</td>\n",
       "      <td>1060.358416</td>\n",
       "      <td>153.515505</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>1.265666</td>\n",
       "      <td>5.657451</td>\n",
       "      <td>1.330316</td>\n",
       "      <td>0.856145</td>\n",
       "      <td>0.9913</td>\n",
       "      <td>1.267671</td>\n",
       "      <td>1.82538</td>\n",
       "      <td>2.016523</td>\n",
       "      <td>2.937446</td>\n",
       "      <td>1.227977</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 84 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                window_start                    window_end   HRV_MeanNN  \\\n",
       "0 2025-03-17 16:20:54.760848 2025-03-17 16:21:53.596623680  1060.358416   \n",
       "\n",
       "     HRV_SDNN  HRV_SDANN1  HRV_SDNNI1  HRV_SDANN2  HRV_SDNNI2  HRV_SDANN5  \\\n",
       "0  153.515505         NaN         NaN         NaN         NaN         NaN   \n",
       "\n",
       "   HRV_SDNNI5  ...  HRV_SampEn  HRV_ShanEn  HRV_FuzzyEn  HRV_MSEn  HRV_CMSEn  \\\n",
       "0         NaN  ...    1.265666    5.657451     1.330316  0.856145     0.9913   \n",
       "\n",
       "   HRV_RCMSEn   HRV_CD   HRV_HFD   HRV_KFD   HRV_LZC  \n",
       "0    1.267671  1.82538  2.016523  2.937446  1.227977  \n",
       "\n",
       "[1 rows x 84 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "file_path = \"./Data/hr_rr_Elias-Rec_1703_162053.csv\"\n",
    "result = rpeak2hrv_pipeline(inputs=file_path, windowing_method=\"first_interval\", window_size=\"60s\", sampling_rate=1000)\n",
    "result.head()"
   ]
  },
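  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 'first_interval' and 'last_interval' settings can be pictured as simple timestamp filters. The following is a minimal sketch in plain Python. `select_interval` is a hypothetical helper, the data is made up, and treating the window boundary as exclusive is an assumption, so the pipeline's own behavior may differ.\n",
    "\n",
    "```python\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "\n",
    "def select_interval(rows, window, method):\n",
    "    # rows: list of (timestamp, rr_interval_ms) tuples sorted by timestamp\n",
    "    first_ts, last_ts = rows[0][0], rows[-1][0]\n",
    "    if method == 'first_interval':\n",
    "        # Keep rows recorded less than `window` after the first timestamp\n",
    "        return [row for row in rows if row[0] - first_ts < window]\n",
    "    if method == 'last_interval':\n",
    "        # Keep rows recorded less than `window` before the last timestamp\n",
    "        return [row for row in rows if last_ts - row[0] < window]\n",
    "    raise ValueError(f'Unsupported method: {method!r}')\n",
    "\n",
    "\n",
    "start = datetime(2025, 3, 17, 16, 20, 54)\n",
    "offsets_s = [0, 30, 59, 61, 90, 120]\n",
    "# Made-up RR-intervals (ms), for illustration only\n",
    "rows = [(start + timedelta(seconds=s), 1000) for s in offsets_s]\n",
    "\n",
    "first = select_interval(rows, timedelta(seconds=60), 'first_interval')\n",
    "last = select_interval(rows, timedelta(seconds=60), 'last_interval')\n",
    "```\n",
    "\n",
    "Here `first` keeps the rows at offsets 0, 30 and 59 seconds, while `last` keeps the rows at offsets 61, 90 and 120 seconds."
   ]
  },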
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.5 `window_size`\n",
    "The `window_size` defines the size of the windows into which the data is divided. The value follows the pattern '{any positive integer}{t}', where t is an element of {'d', 'h', 'm', 's'}.\n",
    "\n",
    "For example, the setting '20m' represents a window size of 20 minutes.\n",
    "\n",
    "The default setting is '60s', corresponding to a window size of one minute.\n",
    "\n",
    "Setting this parameter is only necessary if you want to apply windowing."
   ]
  },
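  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pattern described above can be checked before calling the pipeline. The following is a minimal sketch using only the standard library. `parse_window_size` and `UNIT_TO_KEYWORD` are hypothetical names introduced for illustration; how the pipeline itself parses the value is not shown here.\n",
    "\n",
    "```python\n",
    "from datetime import timedelta\n",
    "\n",
    "# Maps the unit letter of the pattern to a timedelta keyword argument\n",
    "UNIT_TO_KEYWORD = {'d': 'days', 'h': 'hours', 'm': 'minutes', 's': 'seconds'}\n",
    "\n",
    "\n",
    "def parse_window_size(window_size):\n",
    "    # Split into the integer part and the trailing unit letter\n",
    "    number, unit = window_size[:-1], window_size[-1:]\n",
    "    if not number.isdecimal() or int(number) == 0 or unit not in UNIT_TO_KEYWORD:\n",
    "        raise ValueError(f'Invalid window_size: {window_size!r}')\n",
    "    return timedelta(**{UNIT_TO_KEYWORD[unit]: int(number)})\n",
    "\n",
    "\n",
    "twenty_minutes = parse_window_size('20m')\n",
    "default_window = parse_window_size('60s')\n",
    "```\n",
    "\n",
    "Values such as '0s', '1.5h' or '20' do not match the pattern and raise a `ValueError` in this sketch."
   ]
  },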
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Window size\n",
    "\n",
    "TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Supported file formats\n",
    "\n",
    "TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}