<p>Looking at our chart, we can see that time-series data can be quite noisy, with lots of spikes up and down. This can make it difficult to see the underlying trend. </p><p>A useful technique for making a trend apparent is to smooth out the observations by averaging them. By averaging, say, 6 or 12 consecutive observations, we can construct something called the rolling mean. Essentially, we calculate the average over a window of time, then slide that window forward one observation at a time. </p><p>Since this is such a common technique, Pandas already has two handy methods built in: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html" rel="noopener noreferrer" target="_blank">rolling()</a> and <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.mean.html" rel="noopener noreferrer" target="_blank">mean()</a>. We can chain these two methods to create a DataFrame made up of the averaged observations. </p><pre class="prettyprint linenums"># The window is the number of observations that are averaged
roll_df = reshaped_df.rolling(window=6).mean()

plt.figure(figsize=(16,10))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Number of Posts', fontsize=14)
plt.ylim(0, 35000)

# Plot the smoothed roll_df instead of the raw data
for column in roll_df.columns:
    plt.plot(roll_df.index, roll_df[column],
             linewidth=3, label=roll_df[column].name)

plt.legend(fontsize=16)</pre><p>Now our chart looks something like this:</p><figure><img src="https://img-c.udemycdn.com/redactor/raw/2020-09-23_16-32-57-3c790c5a655c0c6960852aa8ee61e757.png"></figure><p>Try changing the <code>window</code> argument (for example, to <code>3</code> or <code>12</code>) and see how the chart changes!</p>
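<p>To see exactly what chaining <code>rolling()</code> and <code>mean()</code> computes, here is a minimal sketch on a tiny series. The <code>posts</code> values and dates are made up for illustration and are not the data from our chart. The sketch compares the Pandas result with a by-hand average of each window of 3 consecutive observations. Notice that the first two results are <code>NaN</code>: for an integer window, Pandas by default only returns a mean once a full window of observations is available, which is also why the first 5 rows of <code>roll_df</code> (with <code>window=6</code>) are empty.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical monthly post counts (made-up numbers, for illustration only)
posts = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=pd.date_range("2020-01-01", periods=6, freq="MS"),
)

# Pandas: average over a window of 3 observations, sliding forward one row at a time
rolled = posts.rolling(window=3).mean()

# The same calculation by hand. The first window - 1 rows have no full window,
# so they are NaN (min_periods defaults to the window size for integer windows).
manual = [np.nan] * 2 + [
    posts.iloc[i - 2 : i + 1].sum() / 3 for i in range(2, len(posts))
]

print(rolled.tolist())  # [nan, nan, 20.0, 30.0, 40.0, 50.0]
print(np.allclose(rolled, manual, equal_nan=True))  # True

# min_periods=1 averages whatever observations are available at the start
print(posts.rolling(window=3, min_periods=1).mean().tolist())
# [10.0, 15.0, 20.0, 30.0, 40.0, 50.0]
```

<p>If you would rather not lose the first few points of each line, passing <code>min_periods=1</code> is one option, but keep in mind that those early values are averaged over fewer observations and so are noisier than the rest.</p>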