Time series forecasting is an important task for effective and efficient planning in many fields like finance, weather and energy. Traditional approaches like SARIMA models often require manual data pre-processing steps (e.g. differencing to make the data stationary) and it’s also hard to explain why these models produce the prediction results to people without forecasting expertise. In addition, these models are not allowed to add additional domain knowledge to improve precision. For solving these problems, Facebook researchers recently released FBProphet, a time series forecasting tool supporting both Python and R. FBProphet provides a decomposition regression model that is extendable and configurable with interpretable parameters.
In this blog, I will use FBProphet to forecast item demand using the data from the Kaggle competition “Store Item Demand Forecasting Challenge”.
Data analysis
Let’s start by importing the Python packages that we need.
import pandas as pd | |
from fbprophet import Prophet | |
from matplotlib import pyplot as plt |
view rawts1.py hosted with ❤ by GitHub
import pandas as pd | |
from fbprophet import Prophet | |
from matplotlib import pyplot as plt |
view rawts1.py hosted with ❤ by GitHub
Then, we load the train data using Pandas. You may need to change the path to where you put the train.csv
file.
train = pd.read_csv(‘../input/train.csv’) | |
train[‘date’] = pd.to_datetime(train[‘date’]) |
view rawts2.py hosted with ❤ by GitHub
train = pd.read_csv(‘../input/train.csv’) | |
train[‘date’] = pd.to_datetime(train[‘date’]) |
view rawts2.py hosted with ❤ by GitHub
As there are many items and stores, I will restrict the analysis to item 1 and store 1.
df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]] | |
df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True) | |
df.describe() |
view rawts3.py hosted with ❤ by GitHub
df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]] | |
df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True) | |
df.describe() |
view rawts3.py hosted with ❤ by GitHub
y | |
---|---|
count | 1826.000000 |
mean | 19.971522 |
std | 6.741022 |
min | 4.000000 |
25% | 15.000000 |
50% | 19.000000 |
75% | 24.000000 |
max | 50.000000 |
Ok, we have the data now. Let’s plot them.
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’) | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts4.py hosted with ❤ by GitHub
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’) | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts4.py hosted with ❤ by GitHub
As you can see, the dataset contains five years (2013-2018). We will use the last 180 days for testing and the rest for training. Then, we will plot them to double-check.
n_tests = 180 | |
df_train = df[:-n_tests] | |
df_test = df[-n_tests:] | |
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’) | |
ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts5.py hosted with ❤ by GitHub
n_tests = 180 | |
df_train = df[:-n_tests] | |
df_test = df[-n_tests:] | |
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’) | |
ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts5.py hosted with ❤ by GitHub
Forecasting with FBProphet
Let’s create a Prophet model, add the weekly, yearly components and then fit it with the train dataset.
model = Prophet( | |
daily_seasonality=False, | |
weekly_seasonality=False, | |
yearly_seasonality=False, | |
changepoint_prior_scale=0.05, | |
) | |
model.add_seasonality( | |
name=’weekly’, | |
period=7, | |
fourier_order=4, | |
) | |
model.add_seasonality( | |
name=’yearly’, | |
period=365.25, | |
fourier_order=2, | |
) | |
model.fit(df_train); |
view rawts6.py hosted with ❤ by GitHub
model = Prophet( | |
daily_seasonality=False, | |
weekly_seasonality=False, | |
yearly_seasonality=False, | |
changepoint_prior_scale=0.05, | |
) | |
model.add_seasonality( | |
name=’weekly’, | |
period=7, | |
fourier_order=4, | |
) | |
model.add_seasonality( | |
name=’yearly’, | |
period=365.25, | |
fourier_order=2, | |
) | |
model.fit(df_train); |
view rawts6.py hosted with ❤ by GitHub
Now, I can ask the model to predict the future on the testing dataset. I will ask it to estimate also the training points, so the model will predict for the whole dataset as follows.
forecast = model.predict(df) | |
forecast[[‘ds’, ‘yhat’]].head() |
view rawts7.py hosted with ❤ by GitHub
forecast = model.predict(df) | |
forecast[[‘ds’, ‘yhat’]].head() |
view rawts7.py hosted with ❤ by GitHub
ds | yhat | |
---|---|---|
0 | 2013-01-01 | 9.232334 |
1 | 2013-01-02 | 9.853309 |
2 | 2013-01-03 | 10.262132 |
3 | 2013-01-04 | 11.855213 |
4 | 2013-01-05 | 13.551500 |
FBProphet comes with a handy plot function allowing us to plot the three fitted components. Let’s plot them.
model.plot_components(forecast); |
view rawts8.py hosted with ❤ by GitHub
model.plot_components(forecast); |
view rawts8.py hosted with ❤ by GitHub
It can be seen from the trend component that the sales always increase from 2013 to 2018. Meanwhile, the yearly component shows that the sales peak is around June. Finally, the sales rise gradually from Monday to Sunday.
Let’s plot the forecast, train, and test datasets to validate this.
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’) | |
ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’) | |
ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts9.py hosted with ❤ by GitHub
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’) | |
ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’) | |
ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts9.py hosted with ❤ by GitHub
It can be observed from the above plot that the down trend during the last 6 months is captured correctly and the forecast values have 7 patterns (or 7 different cosine-like lines). Those are corresponding to 7 weekdays. Let’s plot the Monday and Sunday datapoints of the actual (including train and test) and forecast dataset.
fig, ax = plt.subplots(figsize=(15, 5)) | |
forecast[‘weekday’] = forecast[‘ds’].dt.weekday | |
df[‘weekday’] = df[‘ds’].dt.weekday | |
colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’] | |
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’] | |
for wd in [0, 6]: | |
fc_wd = forecast[forecast[‘weekday’] == wd] | |
ax.plot( | |
fc_wd[‘ds’], fc_wd[‘yhat’], | |
c=colors[wd], marker=’o’, ms=5, linestyle=’None’, | |
label=f’Forecast-{weekdays[wd]}’) | |
df_wd = df[df[‘weekday’] == wd] | |
ax.plot( | |
df_wd[‘ds’], df_wd[‘y’], | |
c=colors[wd], marker=’^’, ms=5, linestyle=’None’, | |
label=f’Actual-{weekdays[wd]}’ | |
) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts10.py hosted with ❤ by GitHub
fig, ax = plt.subplots(figsize=(15, 5)) | |
forecast[‘weekday’] = forecast[‘ds’].dt.weekday | |
df[‘weekday’] = df[‘ds’].dt.weekday | |
colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’] | |
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’] | |
for wd in [0, 6]: | |
fc_wd = forecast[forecast[‘weekday’] == wd] | |
ax.plot( | |
fc_wd[‘ds’], fc_wd[‘yhat’], | |
c=colors[wd], marker=’o’, ms=5, linestyle=’None’, | |
label=f’Forecast-{weekdays[wd]}’) | |
df_wd = df[df[‘weekday’] == wd] | |
ax.plot( | |
df_wd[‘ds’], df_wd[‘y’], | |
c=colors[wd], marker=’^’, ms=5, linestyle=’None’, | |
label=f’Actual-{weekdays[wd]}’ | |
) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts10.py hosted with ❤ by GitHub
It can be seen that the Monday and Sunday datapoints are separated and FBProphet captures this property correctly. I can now plot all weekdays to gain some insights.
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot( | |
df[‘ds’], df[‘y’], | |
c=’black’, marker=’o’, ms=3, linestyle=’None’, | |
label=f’Actual’ | |
) | |
forecast[‘weekday’] = forecast[‘ds’].dt.weekday | |
colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’] | |
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’] | |
for wd in range(7): | |
fc_wd = forecast[forecast[‘weekday’] == wd] | |
ax.plot( | |
fc_wd[‘ds’], fc_wd[‘yhat’], | |
c=colors[wd], marker=’o’, ms=3, linestyle=’None’, | |
label=f’Forecast-{weekdays[wd]}’, alpha=0.8) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts11.py hosted with ❤ by GitHub
fig, ax = plt.subplots(figsize=(15, 5)) | |
ax.plot( | |
df[‘ds’], df[‘y’], | |
c=’black’, marker=’o’, ms=3, linestyle=’None’, | |
label=f’Actual’ | |
) | |
forecast[‘weekday’] = forecast[‘ds’].dt.weekday | |
colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’] | |
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’] | |
for wd in range(7): | |
fc_wd = forecast[forecast[‘weekday’] == wd] | |
ax.plot( | |
fc_wd[‘ds’], fc_wd[‘yhat’], | |
c=colors[wd], marker=’o’, ms=3, linestyle=’None’, | |
label=f’Forecast-{weekdays[wd]}’, alpha=0.8) | |
ax.legend() | |
ax.set_xlabel(‘Date’) | |
ax.set_ylabel(‘Sales’); |
view rawts11.py hosted with ❤ by GitHub
Finally, FBProphet SMAPE on this dataset is 18% and calculated as below. To improve the precision, we need to tune FBProphet hyperparameters and add additional regressors.
y_true = df_test[‘y’] | |
y_forecast = forecast[-n_tests:][‘yhat’] | |
smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200 | |
print(‘The SMAPE error is:’, smape) |
view rawts12.py hosted with ❤ by GitHub
y_true = df_test[‘y’] | |
y_forecast = forecast[-n_tests:][‘yhat’] | |
smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200 | |
print(‘The SMAPE error is:’, smape) |
view rawts12.py hosted with ❤ by GitHub
The SMAPE error is: 18.283400082248754
Conclusion
FBProphet is very flexible as it allows you to add multiple seasonal components and additional regressors. Its component results are interpretable, hence users can intuitively tune the parameters to improve the performance. In addition, unlike SARIMA models, FBProphet does not require regularly spaced data points, therefore users do not need to interpolate missing data, e.g. when removing outliers. Finally, FBProphet uses STAN to fit the model, and STAN still has not supported GPU yet, so FBProphet does not scale up when fed with many data points.