{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "dd88f680", "metadata": {}, "source": [ "# Audio Data" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9e4390a0", "metadata": {}, "source": [ "## Introduction\n", "\n", "Audio data - as recorded by smartphones or other portable devices - can carry important information about individuals' environments. It may offer insights into activity, sleep, and social interaction. However, using these data can be tricky due to privacy concerns; conversations, for example, are highly identifiable. A possible solution is to compute more general characteristics (e.g. frequency) and extract features from those instead. To address this last part, `niimpy` includes the function `extract_features_audio` to clean, downsample, and extract features from audio snippets that have already been anonymized.\n", "\n", "Audio dataframes should have the following columns (column names can differ, but in that case they must be provided as parameters):\n", "- `user`: Subject ID\n", "- `device`: Device ID\n", "- `is_silent`: Boolean value, indicates when audio is too quiet to record\n", "- `frequency`: Audio frequency in Hz\n", "- `decibels`: Audio volume in decibels\n", "\n", "Niimpy extracts the following audio features:\n", "- `audio_count_silent`: number of times when the environment has been silent (i.e. `is_silent` is true)\n", "- `audio_count_speech`: number of times when there has been some sound in the environment within the human speech frequency range (65 - 255 Hz)\n", "- `audio_count_loud`: number of times when there has been some sound in the environment above 70 dB\n", "- `audio_min_freq`: minimum frequency of the recorded audio snippets\n", "- `audio_max_freq`: maximum frequency of the recorded audio snippets\n", "- `audio_mean_freq`: mean frequency of the recorded audio snippets\n", "- `audio_median_freq`: median frequency of the recorded audio snippets\n", "- `audio_std_freq`: standard deviation of the frequency of the recorded audio snippets\n", "- `audio_min_db`: minimum decibels of the recorded audio snippets\n", "- `audio_max_db`: maximum decibels of the recorded audio snippets\n", "- `audio_mean_db`: mean decibels of the recorded audio snippets\n", "- `audio_median_db`: median decibels of the recorded audio snippets\n", "- `audio_std_db`: standard deviation of the decibels of the recorded audio snippets\n", "\n", "In the following, we will analyze audio snippets provided by `niimpy` as an example to illustrate the use of niimpy's audio preprocessing functions." ] }, { "attachments": {}, "cell_type": "markdown", "id": "1937680b", "metadata": {}, "source": [ "## 2. Read data\n", "\n", "Let's start by reading the example data provided in `niimpy`. These data have already been shaped into a format that meets the requirements of the data schema. First, we import the needed modules: the `niimpy` package itself and the module we will use (`audio`), which we give a short alias for convenience. " ] }, { "cell_type": "code", "execution_count": 1, "id": "8e00f1bf", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/u/24/rantahj1/unix/miniconda3/envs/niimpy/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import niimpy\n", "from niimpy import config\n", "import niimpy.preprocessing.audio as au\n", "import pandas as pd\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "00507e0e", "metadata": {}, "source": [ "Now let's read the example data provided in `niimpy`. The example data is in `csv` format, so we need to use the `read_csv` function. When reading the data, we can specify the timezone where the data was collected. 
This will help us handle daylight saving time more easily. We can specify the timezone with the argument **tz**. The output is a dataframe. We can also check the number of rows and columns in the dataframe." ] }, { "cell_type": "code", "execution_count": 2, "id": "aa7d80df", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(33, 7)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = niimpy.read_csv(config.MULTIUSER_AWARE_AUDIO_PATH, tz='Europe/Helsinki')\n", "data.shape" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3fb22de0", "metadata": {}, "source": [ "The data was successfully read. We can see that there are 33 datapoints with 7 columns in the dataset. However, we do not know yet what the data really looks like, so let's have a quick look:" ] }, { "cell_type": "code", "execution_count": 3, "id": "e416e790", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userdevicetimeis_silentdouble_decibelsdouble_frequencydatetime
2020-01-09 02:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578528e+0908449352020-01-09 02:08:03.895999908+02:00
2020-01-09 02:38:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578530e+0908987342020-01-09 02:38:03.895999908+02:00
2020-01-09 03:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578532e+0909917102020-01-09 03:08:03.895999908+02:00
2020-01-09 03:38:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578534e+0907790542020-01-09 03:38:03.895999908+02:00
2020-01-09 04:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578536e+09080122652020-01-09 04:08:03.895999908+02:00
\n", "
" ], "text/plain": [ " user device time \\\n", "2020-01-09 02:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578528e+09 \n", "2020-01-09 02:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578530e+09 \n", "2020-01-09 03:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578532e+09 \n", "2020-01-09 03:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578534e+09 \n", "2020-01-09 04:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578536e+09 \n", "\n", " is_silent double_decibels \\\n", "2020-01-09 02:08:03.895999908+02:00 0 84 \n", "2020-01-09 02:38:03.895999908+02:00 0 89 \n", "2020-01-09 03:08:03.895999908+02:00 0 99 \n", "2020-01-09 03:38:03.895999908+02:00 0 77 \n", "2020-01-09 04:08:03.895999908+02:00 0 80 \n", "\n", " double_frequency \\\n", "2020-01-09 02:08:03.895999908+02:00 4935 \n", "2020-01-09 02:38:03.895999908+02:00 8734 \n", "2020-01-09 03:08:03.895999908+02:00 1710 \n", "2020-01-09 03:38:03.895999908+02:00 9054 \n", "2020-01-09 04:08:03.895999908+02:00 12265 \n", "\n", " datetime \n", "2020-01-09 02:08:03.895999908+02:00 2020-01-09 02:08:03.895999908+02:00 \n", "2020-01-09 02:38:03.895999908+02:00 2020-01-09 02:38:03.895999908+02:00 \n", "2020-01-09 03:08:03.895999908+02:00 2020-01-09 03:08:03.895999908+02:00 \n", "2020-01-09 03:38:03.895999908+02:00 2020-01-09 03:38:03.895999908+02:00 \n", "2020-01-09 04:08:03.895999908+02:00 2020-01-09 04:08:03.895999908+02:00 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 4, "id": "260eccd7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userdevicetimeis_silentdouble_decibelsdouble_frequencydatetime
2019-08-13 15:02:17.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565698e+0914429142019-08-13 15:02:17.657999992+03:00
2019-08-13 15:28:59.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565699e+0914971952019-08-13 15:28:59.657999992+03:00
2019-08-13 15:59:01.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565701e+09055912019-08-13 15:59:01.657999992+03:00
2019-08-13 16:29:03.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565703e+0907638532019-08-13 16:29:03.657999992+03:00
2019-08-13 16:59:05.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565705e+0908474192019-08-13 16:59:05.657999992+03:00
\n", "
" ], "text/plain": [ " user device time \\\n", "2019-08-13 15:02:17.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565698e+09 \n", "2019-08-13 15:28:59.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565699e+09 \n", "2019-08-13 15:59:01.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565701e+09 \n", "2019-08-13 16:29:03.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565703e+09 \n", "2019-08-13 16:59:05.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565705e+09 \n", "\n", " is_silent double_decibels \\\n", "2019-08-13 15:02:17.657999992+03:00 1 44 \n", "2019-08-13 15:28:59.657999992+03:00 1 49 \n", "2019-08-13 15:59:01.657999992+03:00 0 55 \n", "2019-08-13 16:29:03.657999992+03:00 0 76 \n", "2019-08-13 16:59:05.657999992+03:00 0 84 \n", "\n", " double_frequency \\\n", "2019-08-13 15:02:17.657999992+03:00 2914 \n", "2019-08-13 15:28:59.657999992+03:00 7195 \n", "2019-08-13 15:59:01.657999992+03:00 91 \n", "2019-08-13 16:29:03.657999992+03:00 3853 \n", "2019-08-13 16:59:05.657999992+03:00 7419 \n", "\n", " datetime \n", "2019-08-13 15:02:17.657999992+03:00 2019-08-13 15:02:17.657999992+03:00 \n", "2019-08-13 15:28:59.657999992+03:00 2019-08-13 15:28:59.657999992+03:00 \n", "2019-08-13 15:59:01.657999992+03:00 2019-08-13 15:59:01.657999992+03:00 \n", "2019-08-13 16:29:03.657999992+03:00 2019-08-13 16:29:03.657999992+03:00 \n", "2019-08-13 16:59:05.657999992+03:00 2019-08-13 16:59:05.657999992+03:00 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.tail()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0956889d", "metadata": {}, "source": [ "By exploring the head and tail of the dataframe we can form an idea of its entirety. From the data, we can see that:\n", "\n", "- rows are observations, indexed by timestamps, i.e. 
each row represents a snippet that has been recorded at a given time and date\n", "- columns are characteristics for each observation, for example, the user whose data we are analyzing\n", "- there are at least two different users in the dataframe\n", "- there are two main measurement columns: `double_decibels` and `double_frequency`.\n", "\n", "To confirm, we can check the first three rows for each user:" ] }, { "cell_type": "code", "execution_count": 5, "id": "aa599198", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userdevicetimeis_silentdouble_decibelsdouble_frequencydatetime
2020-01-09 02:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578528e+0908449352020-01-09 02:08:03.895999908+02:00
2020-01-09 02:38:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578530e+0908987342020-01-09 02:38:03.895999908+02:00
2020-01-09 03:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578532e+0909917102020-01-09 03:08:03.895999908+02:00
2019-08-13 07:28:27.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565671e+0905177352019-08-13 07:28:27.657999992+03:00
2019-08-13 07:58:29.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565672e+09090136092019-08-13 07:58:29.657999992+03:00
2019-08-13 08:28:31.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565674e+0908176902019-08-13 08:28:31.657999992+03:00
\n", "
" ], "text/plain": [ " user device time \\\n", "2020-01-09 02:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578528e+09 \n", "2020-01-09 02:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578530e+09 \n", "2020-01-09 03:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578532e+09 \n", "2019-08-13 07:28:27.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565671e+09 \n", "2019-08-13 07:58:29.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565672e+09 \n", "2019-08-13 08:28:31.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565674e+09 \n", "\n", " is_silent double_decibels \\\n", "2020-01-09 02:08:03.895999908+02:00 0 84 \n", "2020-01-09 02:38:03.895999908+02:00 0 89 \n", "2020-01-09 03:08:03.895999908+02:00 0 99 \n", "2019-08-13 07:28:27.657999992+03:00 0 51 \n", "2019-08-13 07:58:29.657999992+03:00 0 90 \n", "2019-08-13 08:28:31.657999992+03:00 0 81 \n", "\n", " double_frequency \\\n", "2020-01-09 02:08:03.895999908+02:00 4935 \n", "2020-01-09 02:38:03.895999908+02:00 8734 \n", "2020-01-09 03:08:03.895999908+02:00 1710 \n", "2019-08-13 07:28:27.657999992+03:00 7735 \n", "2019-08-13 07:58:29.657999992+03:00 13609 \n", "2019-08-13 08:28:31.657999992+03:00 7690 \n", "\n", " datetime \n", "2020-01-09 02:08:03.895999908+02:00 2020-01-09 02:08:03.895999908+02:00 \n", "2020-01-09 02:38:03.895999908+02:00 2020-01-09 02:38:03.895999908+02:00 \n", "2020-01-09 03:08:03.895999908+02:00 2020-01-09 03:08:03.895999908+02:00 \n", "2019-08-13 07:28:27.657999992+03:00 2019-08-13 07:28:27.657999992+03:00 \n", "2019-08-13 07:58:29.657999992+03:00 2019-08-13 07:58:29.657999992+03:00 \n", "2019-08-13 08:28:31.657999992+03:00 2019-08-13 08:28:31.657999992+03:00 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.drop_duplicates(['user','time']).groupby('user').head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "beac76e5", "metadata": {}, "source": [ "Sometimes the data may come in a disordered manner, so just to make sure, let's order 
the dataframe and compare the results. We will sort by the columns \"user\" and \"datetime\", since we want the information ordered first by participant and then chronologically. Luckily, in our dataframe, the index and datetime are the same." ] }, { "cell_type": "code", "execution_count": 6, "id": "560cd6ad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userdevicetimeis_silentdouble_decibelsdouble_frequencydatetime
2019-08-13 07:28:27.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565671e+0905177352019-08-13 07:28:27.657999992+03:00
2019-08-13 07:58:29.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565672e+09090136092019-08-13 07:58:29.657999992+03:00
2019-08-13 08:28:31.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565674e+0908176902019-08-13 08:28:31.657999992+03:00
2020-01-09 02:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578528e+0908449352020-01-09 02:08:03.895999908+02:00
2020-01-09 02:38:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578530e+0908987342020-01-09 02:38:03.895999908+02:00
2020-01-09 03:08:03.895999908+02:00jd9INuQ5BBlW3p83yASkOb_B1.578532e+0909917102020-01-09 03:08:03.895999908+02:00
\n", "
" ], "text/plain": [ " user device time \\\n", "2019-08-13 07:28:27.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565671e+09 \n", "2019-08-13 07:58:29.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565672e+09 \n", "2019-08-13 08:28:31.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565674e+09 \n", "2020-01-09 02:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578528e+09 \n", "2020-01-09 02:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578530e+09 \n", "2020-01-09 03:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578532e+09 \n", "\n", " is_silent double_decibels \\\n", "2019-08-13 07:28:27.657999992+03:00 0 51 \n", "2019-08-13 07:58:29.657999992+03:00 0 90 \n", "2019-08-13 08:28:31.657999992+03:00 0 81 \n", "2020-01-09 02:08:03.895999908+02:00 0 84 \n", "2020-01-09 02:38:03.895999908+02:00 0 89 \n", "2020-01-09 03:08:03.895999908+02:00 0 99 \n", "\n", " double_frequency \\\n", "2019-08-13 07:28:27.657999992+03:00 7735 \n", "2019-08-13 07:58:29.657999992+03:00 13609 \n", "2019-08-13 08:28:31.657999992+03:00 7690 \n", "2020-01-09 02:08:03.895999908+02:00 4935 \n", "2020-01-09 02:38:03.895999908+02:00 8734 \n", "2020-01-09 03:08:03.895999908+02:00 1710 \n", "\n", " datetime \n", "2019-08-13 07:28:27.657999992+03:00 2019-08-13 07:28:27.657999992+03:00 \n", "2019-08-13 07:58:29.657999992+03:00 2019-08-13 07:58:29.657999992+03:00 \n", "2019-08-13 08:28:31.657999992+03:00 2019-08-13 08:28:31.657999992+03:00 \n", "2020-01-09 02:08:03.895999908+02:00 2020-01-09 02:08:03.895999908+02:00 \n", "2020-01-09 02:38:03.895999908+02:00 2020-01-09 02:38:03.895999908+02:00 \n", "2020-01-09 03:08:03.895999908+02:00 2020-01-09 03:08:03.895999908+02:00 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values(by=['user', 'datetime'], inplace=True)\n", "data.drop_duplicates(['user','time']).groupby('user').head(3)" ] }, { "cell_type": "markdown", "id": "b4988507", "metadata": {}, "source": [ "The main column names in our dataframe do not 
match the Niimpy schema. We could provide these column names as parameters, but it is easier to rename them, which we will do after reviewing the data format requirements below. For now, let's take another look at the data:" ] }, { "cell_type": "code", "execution_count": 7, "id": "5f17abe7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userdevicetimeis_silentdouble_decibelsdouble_frequencydatetime
2019-08-13 07:28:27.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565671e+0905177352019-08-13 07:28:27.657999992+03:00
2019-08-13 07:58:29.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565672e+09090136092019-08-13 07:58:29.657999992+03:00
2019-08-13 08:28:31.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565674e+0908176902019-08-13 08:28:31.657999992+03:00
2019-08-13 08:58:33.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565676e+0905883472019-08-13 08:58:33.657999992+03:00
2019-08-13 09:28:35.657999992+03:00iGyXetHE3S8uCq9vueHh3zVs1.565678e+09136135922019-08-13 09:28:35.657999992+03:00
\n", "
" ], "text/plain": [ " user device time \\\n", "2019-08-13 07:28:27.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565671e+09 \n", "2019-08-13 07:58:29.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565672e+09 \n", "2019-08-13 08:28:31.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565674e+09 \n", "2019-08-13 08:58:33.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565676e+09 \n", "2019-08-13 09:28:35.657999992+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1.565678e+09 \n", "\n", " is_silent double_decibels \\\n", "2019-08-13 07:28:27.657999992+03:00 0 51 \n", "2019-08-13 07:58:29.657999992+03:00 0 90 \n", "2019-08-13 08:28:31.657999992+03:00 0 81 \n", "2019-08-13 08:58:33.657999992+03:00 0 58 \n", "2019-08-13 09:28:35.657999992+03:00 1 36 \n", "\n", " double_frequency \\\n", "2019-08-13 07:28:27.657999992+03:00 7735 \n", "2019-08-13 07:58:29.657999992+03:00 13609 \n", "2019-08-13 08:28:31.657999992+03:00 7690 \n", "2019-08-13 08:58:33.657999992+03:00 8347 \n", "2019-08-13 09:28:35.657999992+03:00 13592 \n", "\n", " datetime \n", "2019-08-13 07:28:27.657999992+03:00 2019-08-13 07:28:27.657999992+03:00 \n", "2019-08-13 07:58:29.657999992+03:00 2019-08-13 07:58:29.657999992+03:00 \n", "2019-08-13 08:28:31.657999992+03:00 2019-08-13 08:28:31.657999992+03:00 \n", "2019-08-13 08:58:33.657999992+03:00 2019-08-13 08:58:33.657999992+03:00 \n", "2019-08-13 09:28:35.657999992+03:00 2019-08-13 09:28:35.657999992+03:00 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d72f467c", "metadata": {}, "source": [ "Ok, it seems like our dataframe was in order. We can start extracting features. However, we need to understand the data format requirements first.\n", "\n", "## * TIP! Data format requirements (or what should our data look like)\n", "\n", "Data can take other shapes and formats. 
However, the `niimpy` data schema requires it to be in a certain shape. This means the dataframe needs to have at least the following characteristics:\n", "1. One row per audio snippet. Each row should store information about one snippet only\n", "2. Each row's index should be a timestamp\n", "3. The following columns are required: \n", " - index: date and time when the snippet was recorded (timestamp)\n", " - user: stores the user name whose data is analyzed. Each user should have a unique name or hash (i.e. one hash for each unique user)\n", " - is_silent: stores whether the decibel level is below a set threshold (usually 50 dB), i.e. whether the snippet is considered silent\n", " - decibels: stores the decibels of the recorded snippet\n", " - frequency: the frequency of the recorded snippet in Hz\n", " - NOTE: most of our audio examples come from data recorded with the Aware Framework; if you want to know more about the frequency and decibels, please read https://github.com/denzilferreira/com.aware.plugin.ambient_noise\n", "4. Additional columns are allowed.\n", "5. The names of the columns do not need to be exactly \"user\", \"is_silent\", \"decibels\" or \"frequency\", as we can pass our own names in an argument.\n", "\n" ] }, { "cell_type": "markdown", "id": "b8a7a20d", "metadata": {}, "source": [ "Column names in our data do not match the Niimpy schema. We could provide these column names as parameters to niimpy functions, but it is simpler to rename them here." ] }, { "cell_type": "code", "execution_count": 8, "id": "9436998e", "metadata": {}, "outputs": [], "source": [ "data = data.rename(columns={'double_decibels': 'decibels', 'double_frequency': 'frequency'})" ] }, { "cell_type": "markdown", "id": "f2e6e2d6", "metadata": {}, "source": [ "Below is an example of a dataframe that complies with these minimum requirements:" ] }, { "cell_type": "code", "execution_count": 9, "id": "8c66c6b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
useris_silentdecibelsfrequency
2019-08-13 07:28:27.657999992+03:00iGyXetHE3S8u0517735
2019-08-13 07:58:29.657999992+03:00iGyXetHE3S8u09013609
2019-08-13 08:28:31.657999992+03:00iGyXetHE3S8u0817690
\n", "
" ], "text/plain": [ " user is_silent decibels \\\n", "2019-08-13 07:28:27.657999992+03:00 iGyXetHE3S8u 0 51 \n", "2019-08-13 07:58:29.657999992+03:00 iGyXetHE3S8u 0 90 \n", "2019-08-13 08:28:31.657999992+03:00 iGyXetHE3S8u 0 81 \n", "\n", " frequency \n", "2019-08-13 07:28:27.657999992+03:00 7735 \n", "2019-08-13 07:58:29.657999992+03:00 13609 \n", "2019-08-13 08:28:31.657999992+03:00 7690 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "example_dataschema = data[['user','is_silent','decibels','frequency']]\n", "example_dataschema.head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7508a0bf", "metadata": {}, "source": [ "## 4. Extracting features\n", "There are two ways to extract features. We can use each function separately, or we can use `niimpy`'s ready-made wrapper. Both ways require us to specify arguments to customize the way the functions work. These arguments are specified in dictionaries. Let's first understand how to extract features using stand-alone functions.\n", "\n", "### 4.1 Extract features using stand-alone functions\n", "We can use `niimpy`'s functions to compute audio features. Each function requires two inputs:\n", "- (mandatory) a dataframe that complies with the minimum requirements (see the '* TIP! Data format requirements' section above)\n", "- (optional) arguments for the stand-alone function\n", "\n", "#### 4.1.1 The argument dictionary for stand-alone functions (or how we specify the way a function works)\n", "We can input two types of arguments to customize the way a stand-alone function works:\n", "- the name of the column to be preprocessed: Since the dataframe may have different columns, we need to specify which column has the data we would like to preprocess. To do so, we can simply pass the name of the column to the argument `audio_column_name`. 
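To make this concrete, here is a rough, hypothetical pure-`pandas` sketch of how such a function can use the column name it is given (an illustration with assumed names and thresholds, not `niimpy`'s actual implementation):

```python
import pandas as pd

# Hypothetical sketch (NOT niimpy's actual code): count snippets louder than
# 70 dB in each time bin, reading the volume from a caller-specified column.
def count_loud(df, audio_column_name="decibels", resample_args=None):
    resample_args = resample_args or {"rule": "30min"}
    loud = (df[audio_column_name] > 70).astype(int)   # 1 if loud, else 0
    return loud.resample(**resample_args).sum()       # loud snippets per bin

# Toy data: four snippets, 30 minutes apart
index = pd.date_range("2020-01-09 02:00", periods=4, freq="30min")
toy = pd.DataFrame({"double_decibels": [84, 89, 55, 77]}, index=index)

counts = count_loud(toy, audio_column_name="double_decibels",
                    resample_args={"rule": "1h"})
print(counts.tolist())  # [2, 1]
```

Conceptually, the stand-alone functions work the same way: they read the data from whichever column you name and aggregate it over time bins.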
\n", "\n", "- the way we resample: resampling options are specified in `niimpy` as a dictionary. `niimpy`'s resampling and aggregating relies on `pandas.DataFrame.resample`, so mastering the use of this pandas function will help us greatly in `niimpy`'s preprocessing. Please familiarize yourself with the pandas resample function before continuing. \n", " Briefly, to use the `pandas.DataFrame.resample` function, we need a rule. This rule states the intervals we would like to use to resample our data (e.g., 15 seconds, 30 minutes, 1 hour). Nevertheless, we can pass more details to the function to specify the exact sampling we would like. For example, we could use the *closed* argument if we would like to specify which side of each interval is closed, or we could use the *offset* argument if we would like to start our binning with an offset, etc. There are plenty of options for this function, so we strongly recommend keeping the `pandas.DataFrame.resample` documentation at hand. All arguments for `pandas.DataFrame.resample` are specified in a dictionary whose keys are the argument names of `pandas.DataFrame.resample` and whose values are the values chosen for each of these arguments. This dictionary is passed to `niimpy` functions as the argument `resample_args`.\n", "\n", "Let's see some examples of these parameters:" ] }, { "cell_type": "markdown", "id": "be26e793", "metadata": {}, "source": [ "```python\n", "au.audio_count_loud(data, audio_column_name = \"frequency\", resample_args = {\"rule\":\"1D\"})\n", "au.audio_count_loud(data, audio_column_name = \"random_name\", resample_args = {\"rule\":\"30min\"})\n", "au.audio_count_loud(data, audio_column_name = \"other_name\", resample_args = {\"rule\":\"45T\",\"origin\":\"end\"})\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "id": "393cd2dd", "metadata": {}, "source": [ "Here, we have three basic examples. 
\n", "\n", "- The first example will analyze the data stored in the column `frequency` in our dataframe. The data will be binned into one-day periods\n", "- The second example will analyze the data stored in the column `random_name` in our dataframe. The data will be aggregated into 30-minute bins\n", "- The third example will analyze the data stored in the column `other_name` in our dataframe. The data will be binned into 45-minute bins, but the binning will start from the last timestamp in the dataframe. \n", "\n", "**Default values:** if no arguments are passed, `niimpy` will aggregate the data into 30-minute bins and select the `audio_column_name` according to the most suitable column. For example, if we are computing the minimum frequency, `niimpy` will select *frequency* as the column name. " ] }, { "cell_type": "markdown", "id": "1d64934a", "metadata": {}, "source": [ "#### 4.1.2 Using the functions\n", "Now that we understand how the functions are customized, it is time to compute our first audio feature. Suppose that we are interested in the number of times our recordings were loud in every 50-minute interval. We will need `niimpy`'s `audio_count_loud` function." ] }, { "cell_type": "code", "execution_count": 10, "id": "98a0af37", "metadata": {}, "outputs": [], "source": [ "my_loud_times = au.audio_count_loud(\n", " data,\n", " audio_column_name = \"decibels\",\n", " resample_args = {\"rule\":\"50T\"}\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3f6a607d", "metadata": {}, "source": [ "Let's look at some values for one of the subjects." ] }, { "cell_type": "code", "execution_count": 11, "id": "ae8260cb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
useraudio_count_louddevice
2020-01-09 01:40:00+02:00jd9INuQ5BBlW13p83yASkOb_B
2020-01-09 02:30:00+02:00jd9INuQ5BBlW23p83yASkOb_B
2020-01-09 03:20:00+02:00jd9INuQ5BBlW23p83yASkOb_B
2020-01-09 04:10:00+02:00jd9INuQ5BBlW03p83yASkOb_B
2020-01-09 05:00:00+02:00jd9INuQ5BBlW13p83yASkOb_B
2020-01-09 05:50:00+02:00jd9INuQ5BBlW13p83yASkOb_B
2020-01-09 06:40:00+02:00jd9INuQ5BBlW1OWd1Uau8POix
2020-01-09 07:30:00+02:00jd9INuQ5BBlW0OWd1Uau8POix
2020-01-09 08:20:00+02:00jd9INuQ5BBlW1OWd1Uau8POix
2020-01-09 09:10:00+02:00jd9INuQ5BBlW1OWd1Uau8POix
2020-01-09 10:00:00+02:00jd9INuQ5BBlW2OWd1Uau8POix
\n", "
" ], "text/plain": [ " user audio_count_loud device\n", "2020-01-09 01:40:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 02:30:00+02:00 jd9INuQ5BBlW 2 3p83yASkOb_B\n", "2020-01-09 03:20:00+02:00 jd9INuQ5BBlW 2 3p83yASkOb_B\n", "2020-01-09 04:10:00+02:00 jd9INuQ5BBlW 0 3p83yASkOb_B\n", "2020-01-09 05:00:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 05:50:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 06:40:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 07:30:00+02:00 jd9INuQ5BBlW 0 OWd1Uau8POix\n", "2020-01-09 08:20:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 09:10:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 10:00:00+02:00 jd9INuQ5BBlW 2 OWd1Uau8POix" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_loud_times[my_loud_times[\"user\"]==\"jd9INuQ5BBlW\"]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6ffd53b7", "metadata": {}, "source": [ "Let's recall what the original data looks like for this subject:" ] }, { "cell_type": "code", "execution_count": 12, "id": "e085424f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user device time \\\n", "2020-01-09 02:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578528e+09 \n", "2020-01-09 02:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578530e+09 \n", "2020-01-09 03:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578532e+09 \n", "2020-01-09 03:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578534e+09 \n", "2020-01-09 04:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578536e+09 \n", "2020-01-09 04:38:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578537e+09 \n", "2020-01-09 05:08:03.895999908+02:00 jd9INuQ5BBlW 3p83yASkOb_B 1.578539e+09 \n", "\n", " is_silent decibels frequency \\\n", "2020-01-09 02:08:03.895999908+02:00 0 84 4935 \n", "2020-01-09 02:38:03.895999908+02:00 0 89 8734 \n", "2020-01-09 03:08:03.895999908+02:00 0 99 1710 \n", "2020-01-09 03:38:03.895999908+02:00 0 77 9054 \n", "2020-01-09 04:08:03.895999908+02:00 0 80 12265 \n", "2020-01-09 04:38:03.895999908+02:00 0 52 7281 \n", "2020-01-09 05:08:03.895999908+02:00 0 63 14408 \n", "\n", " datetime \n", "2020-01-09 02:08:03.895999908+02:00 2020-01-09 02:08:03.895999908+02:00 \n", "2020-01-09 02:38:03.895999908+02:00 2020-01-09 02:38:03.895999908+02:00 \n", "2020-01-09 03:08:03.895999908+02:00 2020-01-09 03:08:03.895999908+02:00 \n", "2020-01-09 03:38:03.895999908+02:00 2020-01-09 03:38:03.895999908+02:00 \n", "2020-01-09 04:08:03.895999908+02:00 2020-01-09 04:08:03.895999908+02:00 \n", "2020-01-09 04:38:03.895999908+02:00 2020-01-09 04:38:03.895999908+02:00 \n", "2020-01-09 05:08:03.895999908+02:00 2020-01-09 05:08:03.895999908+02:00 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[data[\"user\"]==\"jd9INuQ5BBlW\"].head(7)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "dbea7c11", "metadata": {}, "source": [ "We see that the bins are indeed 50-minutes bins, however, they are adjusted to fixed, predetermined intervals, i.e. 
the bins do not start at the time of the first datapoint. Instead, `pandas` starts the binning at 00:00:00 of every day and counts 50-minute intervals from there. \n", "\n", "If we want the binning to start from the first datapoint in our dataset, we need to pass each user's first timestamp as the `origin` resampling argument, looping over the users." ] }, { "cell_type": "code", "execution_count": 13, "id": "d7ff80f4", "metadata": {}, "outputs": [], "source": [ "users = list(data['user'].unique())\n", "results = []\n", "for user in users:\n", "    # Use each user's first timestamp as the resampling origin\n", "    start_time = data[data[\"user\"]==user].index.min()\n", "    results.append(au.audio_count_loud(\n", "        data[data[\"user\"]==user],\n", "        audio_column_name=\"decibels\",\n", "        resample_args={\"rule\":\"50T\", \"origin\":start_time}\n", "    ))\n", "my_loud_times = pd.concat(results)" ] }, { "cell_type": "code", "execution_count": 14, "id": "427ab240", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user audio_count_loud device\n", "2019-08-13 07:30:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 08:20:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 09:10:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 10:00:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 10:50:00+03:00 iGyXetHE3S8u 2 Cq9vueHh3zVs\n", "2019-08-13 11:40:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 12:30:00+03:00 iGyXetHE3S8u 0 Cq9vueHh3zVs\n", "2019-08-13 13:20:00+03:00 iGyXetHE3S8u 0 Cq9vueHh3zVs\n", "2019-08-13 14:10:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 15:00:00+03:00 iGyXetHE3S8u 0 Cq9vueHh3zVs\n", "2019-08-13 15:50:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2019-08-13 16:40:00+03:00 iGyXetHE3S8u 1 Cq9vueHh3zVs\n", "2020-01-09 01:40:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 02:30:00+02:00 jd9INuQ5BBlW 2 3p83yASkOb_B\n", "2020-01-09 03:20:00+02:00 jd9INuQ5BBlW 2 3p83yASkOb_B\n", "2020-01-09 04:10:00+02:00 jd9INuQ5BBlW 0 3p83yASkOb_B\n", "2020-01-09 05:00:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 05:50:00+02:00 jd9INuQ5BBlW 1 3p83yASkOb_B\n", "2020-01-09 06:40:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 07:30:00+02:00 jd9INuQ5BBlW 0 OWd1Uau8POix\n", "2020-01-09 08:20:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 09:10:00+02:00 jd9INuQ5BBlW 1 OWd1Uau8POix\n", "2020-01-09 10:00:00+02:00 jd9INuQ5BBlW 2 OWd1Uau8POix" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_loud_times" ] }, { "attachments": {}, "cell_type": "markdown", "id": "41b3cbd2", "metadata": {}, "source": [ "### 4.2 Extract features using the wrapper\n", "We can use `niimpy`'s ready-made wrapper to extract one or several features at the same time. The wrapper will require two inputs:\n", "- (mandatory) dataframe that must comply with the minimum requirements (see '* TIP! 
Data requirements section above)\n", "- (optional) an argument dictionary for the wrapper\n", "\n", "#### 4.2.1 The argument dictionary for the wrapper (or how we specify the way the wrapper works)\n", "The argument dictionary contains the arguments for each stand-alone function we would like to employ. Its keys are the feature functions we want to compute; its values are the argument dictionaries of those functions. \n", "Let's see some examples of wrapper dictionaries:" ] }, { "cell_type": "code", "execution_count": 15, "id": "87d9d44d", "metadata": {}, "outputs": [], "source": [ "wrapper_features1 = {au.audio_count_loud:{\"audio_column_name\":\"decibels\",\"resample_args\":{\"rule\":\"1D\"}},\n", " au.audio_max_freq:{\"audio_column_name\":\"frequency\",\"resample_args\":{\"rule\":\"1D\"}}}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7a67b446", "metadata": {}, "source": [ "- `wrapper_features1` will be used to analyze two features, `audio_count_loud` and `audio_max_freq`. For the feature audio_count_loud, we will use the data stored in the column `decibels` in our dataframe and the data will be binned in one-day periods. For the feature audio_max_freq, we will use the data stored in the column `frequency` in our dataframe and the data will be binned in one-day periods. " ] }, { "cell_type": "code", "execution_count": 16, "id": "d3332573", "metadata": {}, "outputs": [], "source": [ "wrapper_features2 = {au.audio_mean_db:{\"audio_column_name\":\"random_name\",\"resample_args\":{\"rule\":\"1D\"}},\n", " au.audio_count_speech:{\"audio_column_name\":\"decibels\", \"audio_freq_name\":\"frequency\", \"resample_args\":{\"rule\":\"5H\",\"offset\":\"5min\"}}}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "205c28ba", "metadata": {}, "source": [ "- `wrapper_features2` will be used to analyze two features, `audio_mean_db` and `audio_count_speech`. 
For the feature audio_mean_db, we will use the data stored in the column `random_name` in our dataframe and the data will be binned in one-day periods. For the feature audio_count_speech, we will use the data stored in the column `decibels` in our dataframe and the data will be binned in 5-hour periods with a 5-minute offset. Note that this feature also requires a frequency column, passed via the `audio_freq_name` argument; this is because speech is defined not only by the amplitude of the recording, but also by its frequency range. " ] }, { "cell_type": "code", "execution_count": 17, "id": "a2570c5b", "metadata": {}, "outputs": [], "source": [ "wrapper_features3 = {au.audio_mean_db:{\"audio_column_name\":\"one_name\",\"resample_args\":{\"rule\":\"1D\",\"offset\":\"5min\"}},\n", " au.audio_min_freq:{\"audio_column_name\":\"one_name\",\"resample_args\":{\"rule\":\"5H\"}},\n", " au.audio_count_silent:{\"audio_column_name\":\"another_name\",\"resample_args\":{\"rule\":\"30T\",\"origin\":\"end_day\"}}}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1377bc9d", "metadata": {}, "source": [ "- `wrapper_features3` will be used to analyze three features, `audio_mean_db`, `audio_min_freq`, and `audio_count_silent`. For the feature audio_mean_db, we will use the data stored in the column `one_name` and the data will be binned in one-day periods with a 5-minute offset. For the feature audio_min_freq, we will use the data stored in the column `one_name` in our dataframe and the data will be binned in 5-hour periods. Finally, for the feature audio_count_silent, we will use the data stored in the column `another_name` in our dataframe and the data will be binned in 30-minute periods, with the origin of the bins set to the ceiling midnight of the last day.\n", "\n", "**Default values:** if no arguments are passed, `niimpy`'s default values are either \"decibels\", \"frequency\", or \"is_silent\" for the audio_column_name, and 30-min aggregation bins. 
The column name depends on the function to be called. Moreover, in the absence of the argument dictionary, the wrapper will compute all the available functions. \n", "\n", "#### 4.2.2 Using the wrapper\n", "Now that we understand how the wrapper is customized, it is time to compute our first audio feature using the wrapper. Suppose that we are interested in extracting audio_count_loud in 50-minute bins. We will need `niimpy`'s `extract_features_audio` function, the data, and a dictionary to customize the computation. Let's create the dictionary first" ] }, { "cell_type": "code", "execution_count": 18, "id": "1a16011f", "metadata": {}, "outputs": [], "source": [ "wrapper_features1 = {au.audio_count_loud:{\"audio_column_name\":\"decibels\",\"resample_args\":{\"rule\":\"50T\"}}}" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d8ac128e", "metadata": {}, "source": [ "Now, let's use the wrapper" ] }, { "cell_type": "code", "execution_count": 19, "id": "24f453c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user device audio_count_loud\n", "2019-08-13 07:30:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1\n", "2019-08-13 08:20:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1\n", "2019-08-13 09:10:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1\n", "2019-08-13 10:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1\n", "2019-08-13 10:50:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 2" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_wrapper = au.extract_features_audio(data, features=wrapper_features1)\n", "results_wrapper.head(5)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3816fc21", "metadata": {}, "source": [ "Our first attempt was successful. Now, let's try something more. Let's assume we want to compute the audio_count_loud and audio_min_freq in 1-hour bins." ] }, { "cell_type": "code", "execution_count": 20, "id": "0906693e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user device audio_count_loud \\\n", "2019-08-13 07:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1 \n", "2019-08-13 08:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1 \n", "2019-08-13 09:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1 \n", "2019-08-13 10:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 2 \n", "2019-08-13 11:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 2 \n", "\n", " audio_min_freq \n", "2019-08-13 07:00:00+03:00 7735.0 \n", "2019-08-13 08:00:00+03:00 7690.0 \n", "2019-08-13 09:00:00+03:00 756.0 \n", "2019-08-13 10:00:00+03:00 3059.0 \n", "2019-08-13 11:00:00+03:00 12278.0 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wrapper_features2 = {au.audio_count_loud:{\"audio_column_name\":\"decibels\",\"resample_args\":{\"rule\":\"1H\"}},\n", " au.audio_min_freq:{\"audio_column_name\":\"frequency\", \"resample_args\":{\"rule\":\"1H\"}}}\n", "results_wrapper = au.extract_features_audio(data, features=wrapper_features2)\n", "results_wrapper.head(5)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2244a071", "metadata": {}, "source": [ "Great! Another successful attempt. We see from the results that more columns were added with the required calculations. This is how the wrapper works when all features are computed with the same bins. Now, let's see how the wrapper performs when each function has different binning requirements. Let's assume we need to compute the audio_count_loud every day, and the audio_min_freq every 5 hours with an offset of 5 minutes." ] }, { "cell_type": "code", "execution_count": 21, "id": "4e80bfd0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user device audio_count_loud \\\n", "2019-08-13 00:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 10.0 \n", "2020-01-09 00:00:00+02:00 jd9INuQ5BBlW 3p83yASkOb_B 7.0 \n", "2020-01-09 00:00:00+02:00 jd9INuQ5BBlW OWd1Uau8POix 5.0 \n", "2019-08-13 05:05:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs NaN \n", "2019-08-13 10:05:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs NaN \n", "\n", " audio_min_freq \n", "2019-08-13 00:00:00+03:00 NaN \n", "2020-01-09 00:00:00+02:00 NaN \n", "2020-01-09 00:00:00+02:00 NaN \n", "2019-08-13 05:05:00+03:00 756.0 \n", "2019-08-13 10:05:00+03:00 2914.0 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wrapper_features3 = {au.audio_count_loud:{\"audio_column_name\":\"decibels\",\"resample_args\":{\"rule\":\"1D\"}},\n", " au.audio_min_freq:{\"audio_column_name\":\"frequency\", \"resample_args\":{\"rule\":\"5H\", \"offset\":\"5min\"}}}\n", "results_wrapper = au.extract_features_audio(data, features=wrapper_features3)\n", "results_wrapper.head(5)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c6563910", "metadata": {}, "source": [ "The output is once again a dataframe. In this case, two aggregations are shown. The first one is the daily aggregation computed for the `audio_count_loud` feature. The second one is the 5-hour aggregation period with a 5-minute offset for `audio_min_freq`. We must note that because the `audio_min_freq` feature is not required to be aggregated daily, it shows NaN at the daily aggregation timestamps. Similarly, because `audio_count_loud` is not required to be aggregated in 5-hour windows, it shows NaN at the 5-hour timestamps. " ] }, { "attachments": {}, "cell_type": "markdown", "id": "8a960ee8", "metadata": {}, "source": [ "#### 4.2.3 Wrapper and its default option\n", "The default option will compute all features in 30-minute aggregation windows. To use the `extract_features_audio` function with its default options, simply call the function. 
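As a rough, self-contained sketch of what a 30-minute aggregation window does, here is plain `pandas` resampling on synthetic decibel readings (this is illustration only, not `niimpy`'s implementation; the 70 dB threshold follows the `audio_count_loud` definition above):

```python
import pandas as pd

# Hypothetical decibel readings every 20 minutes (synthetic data, illustration only)
idx = pd.date_range('2020-01-09 00:00', periods=6, freq='20min')
df = pd.DataFrame({'decibels': [40, 75, 80, 30, 72, 65]}, index=idx)

# Count readings above 70 dB in each 30-minute bin,
# mirroring the loudness threshold described for audio_count_loud
loud = (df['decibels'] > 70).resample('30min').sum()
print(loud.tolist())  # [1, 1, 1, 0]
```

Each bin is labeled by its left edge, which is why the wrapper's 30-minute defaults produce timestamps like 07:00 and 07:30 in the output below.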
" ] }, { "cell_type": "code", "execution_count": 22, "id": "daf215ac", "metadata": {}, "outputs": [], "source": [ "default = au.extract_features_audio(data, features=None)" ] }, { "cell_type": "code", "execution_count": 23, "id": "68a22b4e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user device audio_count_silent \\\n", "2019-08-13 07:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 0 \n", "2019-08-13 07:30:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 0 \n", "2019-08-13 08:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 0 \n", "2019-08-13 08:30:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 0 \n", "2019-08-13 09:00:00+03:00 iGyXetHE3S8u Cq9vueHh3zVs 1 \n", "\n", " audio_count_speech audio_count_loud \\\n", "2019-08-13 07:00:00+03:00 NaN NaN \n", "2019-08-13 07:30:00+03:00 NaN 1.0 \n", "2019-08-13 08:00:00+03:00 NaN 1.0 \n", "2019-08-13 08:30:00+03:00 NaN 0.0 \n", "2019-08-13 09:00:00+03:00 NaN 0.0 \n", "\n", " audio_min_freq audio_max_freq audio_mean_freq \\\n", "2019-08-13 07:00:00+03:00 7735.0 7735.0 7735.0 \n", "2019-08-13 07:30:00+03:00 13609.0 13609.0 13609.0 \n", "2019-08-13 08:00:00+03:00 7690.0 7690.0 7690.0 \n", "2019-08-13 08:30:00+03:00 8347.0 8347.0 8347.0 \n", "2019-08-13 09:00:00+03:00 13592.0 13592.0 13592.0 \n", "\n", " audio_median_freq audio_std_freq audio_min_db \\\n", "2019-08-13 07:00:00+03:00 7735.0 NaN 51.0 \n", "2019-08-13 07:30:00+03:00 13609.0 NaN 90.0 \n", "2019-08-13 08:00:00+03:00 7690.0 NaN 81.0 \n", "2019-08-13 08:30:00+03:00 8347.0 NaN 58.0 \n", "2019-08-13 09:00:00+03:00 13592.0 NaN 36.0 \n", "\n", " audio_max_db audio_mean_db audio_median_db \\\n", "2019-08-13 07:00:00+03:00 51.0 51.0 51.0 \n", "2019-08-13 07:30:00+03:00 90.0 90.0 90.0 \n", "2019-08-13 08:00:00+03:00 81.0 81.0 81.0 \n", "2019-08-13 08:30:00+03:00 58.0 58.0 58.0 \n", "2019-08-13 09:00:00+03:00 36.0 36.0 36.0 \n", "\n", " audio_std_db \n", "2019-08-13 07:00:00+03:00 NaN \n", "2019-08-13 07:30:00+03:00 NaN \n", "2019-08-13 08:00:00+03:00 NaN \n", "2019-08-13 08:30:00+03:00 NaN \n", "2019-08-13 09:00:00+03:00 NaN " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "default.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d0a40289", "metadata": {}, "source": [ "## 5. 
Implementing your own features" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e2dfbcbe", "metadata": {}, "source": [ "If none of the provided functions suits our needs, we can easily implement our own customized features. To do so, we need to define a function that accepts a dataframe and returns a dataframe. The returned object should have a datetime index and a `user` column (the computation groups the data by user).\n", "Let's assume we need a new function that sums all frequencies. Let's first define the function" ] }, { "cell_type": "code", "execution_count": 24, "id": "839a0dee", "metadata": {}, "outputs": [], "source": [ "def audio_sum_freq(df, audio_column_name=\"frequency\", resample_args={\"rule\":\"30T\"}):\n", "    if len(df)>0:\n", "        result = df.groupby('user')[audio_column_name].resample(**resample_args).sum()\n", "        result = result.to_frame(name='audio_sum_freq')\n", "        result = result.reset_index(\"user\")\n", "        result.index.rename(\"datetime\", inplace=True)\n", "        return result\n", "    return None" ] }, { "attachments": {}, "cell_type": "markdown", "id": "07787017", "metadata": {}, "source": [ "Then, we can call our new function in the stand-alone way or pass it to the `extract_features_audio` wrapper. Let's use the data again and assume we want the default behavior of the wrapper. " ] }, { "cell_type": "code", "execution_count": 25, "id": "150945da", "metadata": {}, "outputs": [], "source": [ "customized_features = au.extract_features_audio(data, features={audio_sum_freq: {}})" ] }, { "cell_type": "code", "execution_count": 26, "id": "4d4bd7e4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user audio_sum_freq\n", "datetime \n", "2019-08-13 07:00:00+03:00 iGyXetHE3S8u 7735\n", "2019-08-13 07:30:00+03:00 iGyXetHE3S8u 13609\n", "2019-08-13 08:00:00+03:00 iGyXetHE3S8u 7690\n", "2019-08-13 08:30:00+03:00 iGyXetHE3S8u 8347\n", "2019-08-13 09:00:00+03:00 iGyXetHE3S8u 13592" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "customized_features.head()" ] } ], "metadata": { "kernelspec": { "display_name": "niimpy", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 5 }