Python100dayscourse / part3 /73 - Day 73 - Advanced - Data Visualisation with Matplotlib Programming Languages /009 Smoothing out Time-Series Data.html
| <p>Looking at our chart, we see that time-series data can be quite noisy, with a lot of up and down spikes. This can sometimes make it difficult to see what's going on. </p><p>A useful technique to make a trend apparent is to smooth out the observations by taking an average. By averaging, say, 6 or 12 observations, we can construct something called the rolling mean. Essentially, we calculate the average over a window of time and move that window forward by one observation at a time. </p><p>Since this is such a common technique, Pandas actually has two handy methods already built in: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html" rel="noopener noreferrer" target="_blank">rolling()</a> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.mean.html" rel="noopener noreferrer" target="_blank">mean()</a>. We can chain these two methods together to create a DataFrame made up of the averaged observations. </p><pre class="prettyprint linenums"># The window is the number of observations that are averaged | |
| roll_df = reshaped_df.rolling(window=6).mean() | |
| plt.figure(figsize=(16,10)) | |
| plt.xticks(fontsize=14) | |
| plt.yticks(fontsize=14) | |
| plt.xlabel('Date', fontsize=14) | |
| plt.ylabel('Number of Posts', fontsize=14) | |
| plt.ylim(0, 35000) | |
| # plot the roll_df instead | |
| for column in roll_df.columns: | |
|     plt.plot(roll_df.index, roll_df[column], | |
|              linewidth=3, label=roll_df[column].name) | |
| plt.legend(fontsize=16)</pre><p>Now our chart looks something like this:</p><figure><img src="https://img-c.udemycdn.com/redactor/raw/2020-09-23_16-32-57-3c790c5a655c0c6960852aa8ee61e757.png"></figure><p>Play with the <code>window</code> argument (use <code>3</code> or <code>12</code>) and see how the chart changes!</p> |
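<p>To see what the rolling mean is doing under the hood, it can help to try it on a tiny example first. The sketch below uses a made-up Series called <code>posts</code> (the values and dates are invented for illustration and are not taken from the course data) with a window of 3.</p>

```python
import pandas as pd

# Hypothetical monthly post counts, made up purely for illustration
posts = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=pd.date_range("2020-01-01", periods=6, freq="MS"),
)

# Each value is the mean of the current observation and the two before it.
# The first two entries are NaN because a full window of 3 is not yet available.
rolling_3 = posts.rolling(window=3).mean()
print(rolling_3)
```

<p>The third value is (10 + 20 + 30) / 3 = 20, the fourth is (20 + 30 + 40) / 3 = 30, and so on. The leading <code>NaN</code> values are also why a smoothed line starts slightly later than the raw data, and a larger <code>window</code> leaves more of them.</p>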