Time Series Forecasting with FBProphet

Time series forecasting is an important task for effective and efficient planning in many fields like finance, weather and energy. Traditional approaches like SARIMA models often require manual data pre-processing steps (e.g. differencing to make the data stationary) and it’s also hard to explain why these models produce the prediction results to people without forecasting expertise. In addition, these models are not allowed to add additional domain knowledge to improve precision. For solving these problems, Facebook researchers recently released FBProphet, a time series forecasting tool supporting both Python and R. FBProphet provides a decomposition regression model that is extendable and configurable with interpretable parameters.

In this blog, I will use FBProphet to forecast item demand using the data from the Kaggle competition “Store Item Demand Forecasting Challenge”

Data analysis

Let’s start by importing the Python packages that we need. 

import pandas as pd
from fbprophet import Prophet
from matplotlib import pyplot as plt

view rawts1.py hosted with ❤ by GitHub

import pandas as pd
from fbprophet import Prophet
from matplotlib import pyplot as plt

view rawts1.py hosted with ❤ by GitHub

Then, we load the train data using Pandas. You may need to change the path to where you put the train.csv file.

train = pd.read_csv(‘../input/train.csv’)
train[‘date’] = pd.to_datetime(train[‘date’])

view rawts2.py hosted with ❤ by GitHub

train = pd.read_csv(‘../input/train.csv’)
train[‘date’] = pd.to_datetime(train[‘date’])

view rawts2.py hosted with ❤ by GitHub

As there are many items and stores, I will restrict the analysis to item 1 and store 1.

df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]]
df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True)
df.describe()

view rawts3.py hosted with ❤ by GitHub

df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]]
df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True)
df.describe()

view rawts3.py hosted with ❤ by GitHub

y
count1826.000000
mean19.971522
std6.741022
min4.000000
25%15.000000
50%19.000000
75%24.000000
max50.000000

Ok, we have the data now. Let’s plot them.

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’)
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts4.py hosted with ❤ by GitHub

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’)
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts4.py hosted with ❤ by GitHub

As you can see, the dataset contains five years (2013-2018). We will use the last 180 days for testing and the rest for training. Then, we will plot them to double-check.

n_tests = 180
df_train = df[:-n_tests]
df_test = df[-n_tests:]
fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’)
ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts5.py hosted with ❤ by GitHub

n_tests = 180
df_train = df[:-n_tests]
df_test = df[-n_tests:]
fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’)
ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts5.py hosted with ❤ by GitHub

Forecasting with FBProphet

Let’s create a Prophet model, add the weekly, yearly components and then fit it with the train dataset.

model = Prophet(
daily_seasonality=False,
weekly_seasonality=False,
yearly_seasonality=False,
changepoint_prior_scale=0.05,
)
model.add_seasonality(
name=’weekly’,
period=7,
fourier_order=4,
)
model.add_seasonality(
name=’yearly’,
period=365.25,
fourier_order=2,
)
model.fit(df_train);

view rawts6.py hosted with ❤ by GitHub

model = Prophet(
daily_seasonality=False,
weekly_seasonality=False,
yearly_seasonality=False,
changepoint_prior_scale=0.05,
)
model.add_seasonality(
name=’weekly’,
period=7,
fourier_order=4,
)
model.add_seasonality(
name=’yearly’,
period=365.25,
fourier_order=2,
)
model.fit(df_train);

view rawts6.py hosted with ❤ by GitHub

Now, I can ask the model to predict the future on the testing dataset. I will ask it to estimate also the training points, so the model will predict for the whole dataset as follows.

forecast = model.predict(df)
forecast[[‘ds’, ‘yhat’]].head()

view rawts7.py hosted with ❤ by GitHub

forecast = model.predict(df)
forecast[[‘ds’, ‘yhat’]].head()

view rawts7.py hosted with ❤ by GitHub

dsyhat
02013-01-019.232334
12013-01-029.853309
22013-01-0310.262132
32013-01-0411.855213
42013-01-0513.551500

FBProphet comes with a handy plot function allowing us to plot the three fitted components. Let’s plot them.

model.plot_components(forecast);

view rawts8.py hosted with ❤ by GitHub

model.plot_components(forecast);

view rawts8.py hosted with ❤ by GitHub

It can be seen from the trend component that the sales always increase from 2013 to 2018. Meanwhile, the yearly component shows that the sales peak is around June. Finally, the sales rise gradually from Monday to Sunday.

Let’s plot the forecast, train, and test datasets to validate this.

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’)
ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’)
ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts9.py hosted with ❤ by GitHub

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’)
ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’)
ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts9.py hosted with ❤ by GitHub

It can be observed from the above plot that the down trend during the last 6 months is captured correctly and the forecast values have 7 patterns (or 7 different cosine-like lines). Those are corresponding to 7 weekdays. Let’s plot the Monday and Sunday datapoints of the actual (including train and test) and forecast dataset.

fig, ax = plt.subplots(figsize=(15, 5))
forecast[‘weekday’] = forecast[‘ds’].dt.weekday
df[‘weekday’] = df[‘ds’].dt.weekday
colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’]
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
for wd in [0, 6]:
fc_wd = forecast[forecast[‘weekday’] == wd]
ax.plot(
fc_wd[‘ds’], fc_wd[‘yhat’],
c=colors[wd], marker=’o’, ms=5, linestyle=’None’,
label=f’Forecast-{weekdays[wd]}’)
df_wd = df[df[‘weekday’] == wd]
ax.plot(
df_wd[‘ds’], df_wd[‘y’],
c=colors[wd], marker=’^’, ms=5, linestyle=’None’,
label=f’Actual-{weekdays[wd]}’
)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts10.py hosted with ❤ by GitHub

fig, ax = plt.subplots(figsize=(15, 5))
forecast[‘weekday’] = forecast[‘ds’].dt.weekday
df[‘weekday’] = df[‘ds’].dt.weekday
colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’]
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
for wd in [0, 6]:
fc_wd = forecast[forecast[‘weekday’] == wd]
ax.plot(
fc_wd[‘ds’], fc_wd[‘yhat’],
c=colors[wd], marker=’o’, ms=5, linestyle=’None’,
label=f’Forecast-{weekdays[wd]}’)
df_wd = df[df[‘weekday’] == wd]
ax.plot(
df_wd[‘ds’], df_wd[‘y’],
c=colors[wd], marker=’^’, ms=5, linestyle=’None’,
label=f’Actual-{weekdays[wd]}’
)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts10.py hosted with ❤ by GitHub

It can be seen that the Monday and Sunday datapoints are separated and FBProphet captures this property correctly. I can now plot all weekdays to gain some insights.

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(
df[‘ds’], df[‘y’],
c=’black’, marker=’o’, ms=3, linestyle=’None’,
label=f’Actual’
)
forecast[‘weekday’] = forecast[‘ds’].dt.weekday
colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’]
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
for wd in range(7):
fc_wd = forecast[forecast[‘weekday’] == wd]
ax.plot(
fc_wd[‘ds’], fc_wd[‘yhat’],
c=colors[wd], marker=’o’, ms=3, linestyle=’None’,
label=f’Forecast-{weekdays[wd]}’, alpha=0.8)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts11.py hosted with ❤ by GitHub

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(
df[‘ds’], df[‘y’],
c=’black’, marker=’o’, ms=3, linestyle=’None’,
label=f’Actual’
)
forecast[‘weekday’] = forecast[‘ds’].dt.weekday
colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’]
weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
for wd in range(7):
fc_wd = forecast[forecast[‘weekday’] == wd]
ax.plot(
fc_wd[‘ds’], fc_wd[‘yhat’],
c=colors[wd], marker=’o’, ms=3, linestyle=’None’,
label=f’Forecast-{weekdays[wd]}’, alpha=0.8)
ax.legend()
ax.set_xlabel(‘Date’)
ax.set_ylabel(‘Sales’);

view rawts11.py hosted with ❤ by GitHub

Finally, FBProphet SMAPE on this dataset is 18% and calculated as below. To improve the precision, we need to tune FBProphet hyperparameters and add additional regressors.

y_true = df_test[‘y’]
y_forecast = forecast[-n_tests:][‘yhat’]
smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200
print(‘The SMAPE error is:’, smape)

view rawts12.py hosted with ❤ by GitHub

y_true = df_test[‘y’]
y_forecast = forecast[-n_tests:][‘yhat’]
smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200
print(‘The SMAPE error is:’, smape)

view rawts12.py hosted with ❤ by GitHub

The SMAPE error is: 18.283400082248754

Conclusion

FBProphet is very flexible as it allows you to add multiple seasonal components and additional regressors. Its component results are interpretable, hence users can intuitively tune the parameters to improve the performance. In addition, unlike SARIMA models, FBProphet does not require regularly spaced data points, therefore users do not need to interpolate missing data, e.g. when removing outliers. Finally, FBProphet uses STAN to fit the model, and STAN still has not supported GPU yet, so FBProphet does not scale up when fed with many data points.

Want to know more about how DiUS can help you?

Offices

Melbourne

Level 3, 31 Queen St
Melbourne, Victoria, 3000
Phone: 03 9008 5400

Sydney
Level 2, 50 York St
Sydney, NSW, 2000
Phone: 02 8014 6640

DiUS wishes to acknowledge the Traditional Custodians of the lands on which we work and gather at both our Melbourne and Sydney offices. We pay respect to Elders past, present and emerging and celebrate the diversity of Aboriginal peoples and their ongoing cultures and connections to the lands and waters of Australia.

Relay Newsletter

Sign up to receive the latest news, insights and case studies from DiUS straight into your inbox.

Subscribe

* indicates required

© 2021 DiUS®. All rights reserved.

Privacy  |  Terms