Time series forecasting is an important task for effective and efficient planning in many fields, such as finance, weather, and energy. Traditional approaches like SARIMA models often require manual pre-processing steps (e.g. differencing to make the data stationary), and it is hard to explain their predictions to people without forecasting expertise. In addition, these models do not let you add domain knowledge to improve accuracy. To address these problems, Facebook researchers released FBProphet, a time series forecasting tool available for both Python and R. FBProphet provides a decomposable regression model that is extendable and configurable with interpretable parameters.
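
Under the hood, Prophet fits an additive model of the form y(t) = g(t) + s(t) + h(t) + ε(t), where g(t) is a piecewise trend, s(t) is periodic seasonality (weekly, yearly, and so on), h(t) captures holiday effects, and ε(t) is the error term. Because each component is fitted explicitly, the decomposition can be plotted and tuned separately, which is what makes the model interpretable.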

In this blog post, I will use FBProphet to forecast item demand using data from the Kaggle competition “Store Item Demand Forecasting Challenge”.

Data analysis

Let’s start by importing the Python packages that we need. 

 import pandas as pd
 from fbprophet import Prophet
 from matplotlib import pyplot as plt

Then, we load the train data using Pandas. You may need to change the path to where you put the train.csv file.

 train = pd.read_csv('../input/train.csv')
 train['date'] = pd.to_datetime(train['date'])

As there are many items and stores, I will restrict the analysis to item 1 and store 1.

 df = train[(train['item'] == 1) & (train['store'] == 1)][['date', 'sales']]
 df.rename(columns={'date': 'ds', 'sales': 'y'}, inplace=True)
 df.describe()
                 y
    count  1826.000000
    mean     19.971522
    std       6.741022
    min       4.000000
    25%      15.000000
    50%      19.000000
    75%      24.000000
    max      50.000000

OK, we have the data now. Let’s plot it.

 fig, ax = plt.subplots(figsize=(15, 5))
 ax.plot(df['ds'], df['y'], linestyle='None', marker='o')
 ax.set_xlabel('Date')
 ax.set_ylabel('Sales');

As you can see, the dataset contains five years of daily sales (2013-2017). We will use the last 180 days for testing and the rest for training. Then, we will plot both to double-check.

 n_tests = 180
 df_train = df[:-n_tests]
 df_test = df[-n_tests:]
 fig, ax = plt.subplots(figsize=(15, 5))
 ax.plot(df_train['ds'], df_train['y'], linestyle='None', marker='o', color='black', label='Train')
 ax.plot(df_test['ds'], df_test['y'], linestyle='None', marker='o', color='red', label='Test')
 ax.legend()
 ax.set_xlabel('Date')
 ax.set_ylabel('Sales');

Forecasting with FBProphet

Let’s create a Prophet model, add weekly and yearly seasonal components, and then fit it on the training dataset.

 # Disable the built-in seasonalities so we can add our own with custom Fourier orders.
 model = Prophet(
     daily_seasonality=False,
     weekly_seasonality=False,
     yearly_seasonality=False,
     changepoint_prior_scale=0.05,
 )

 model.add_seasonality(
     name='weekly',
     period=7,
     fourier_order=4,
 )

 model.add_seasonality(
     name='yearly',
     period=365.25,
     fourier_order=2,
 )

 model.fit(df_train);

Now I can ask the model to make predictions for the test period. I will also ask it to estimate the training points, so the model predicts over the whole dataset as follows.

 forecast = model.predict(df)
 forecast[['ds', 'yhat']].head()
              ds       yhat
    0 2013-01-01   9.232334
    1 2013-01-02   9.853309
    2 2013-01-03  10.262132
    3 2013-01-04  11.855213
    4 2013-01-05  13.551500

FBProphet comes with a handy plot_components function for visualizing the three fitted components: trend, weekly seasonality, and yearly seasonality. Let’s plot them.

 model.plot_components(forecast);

The trend component shows that sales increase steadily from 2013 to 2017. Meanwhile, the yearly component shows that sales peak around June. Finally, the weekly component shows that sales rise gradually from Monday to Sunday.

Let’s plot the forecast, train, and test datasets to validate this.

 fig, ax = plt.subplots(figsize=(15, 5))
 ax.plot(df_train['ds'], df_train['y'], c='black', marker='o', ms=3, linestyle='None', label='Train')
 ax.plot(df_test['ds'], df_test['y'], c='r', marker='o', ms=3, linestyle='None', label='Test')
 ax.plot(forecast['ds'], forecast['yhat'], c='b', marker='o', ms=3, linestyle='None', label='Forecast', alpha=0.5)
 ax.legend()
 ax.set_xlabel('Date')
 ax.set_ylabel('Sales');

It can be observed from the above plot that the downward trend during the last 6 months is captured correctly, and that the forecast values form 7 distinct cosine-like curves, one for each day of the week. Let’s plot the Monday and Sunday data points of the actual data (both train and test) and the forecast.

 fig, ax = plt.subplots(figsize=(15, 5))
 forecast['weekday'] = forecast['ds'].dt.weekday
 df['weekday'] = df['ds'].dt.weekday
 colors = ['r', 'g', 'yellow', 'pink', 'purple', 'cyan', 'blue']
 weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
 # Plot Monday (0) and Sunday (6) only, for both the forecast and the actual data.
 for wd in [0, 6]:
     fc_wd = forecast[forecast['weekday'] == wd]
     ax.plot(
         fc_wd['ds'], fc_wd['yhat'],
         c=colors[wd], marker='o', ms=5, linestyle='None',
         label=f'Forecast-{weekdays[wd]}')
     df_wd = df[df['weekday'] == wd]
     ax.plot(
         df_wd['ds'], df_wd['y'],
         c=colors[wd], marker='^', ms=5, linestyle='None',
         label=f'Actual-{weekdays[wd]}'
     )
 ax.legend()
 ax.set_xlabel('Date')
 ax.set_ylabel('Sales');

It can be seen that the Monday and Sunday data points are clearly separated, and FBProphet captures this property correctly. We can now plot all weekdays to gain more insight.

 fig, ax = plt.subplots(figsize=(15, 5))
 ax.plot(
     df['ds'], df['y'],
     c='black', marker='o', ms=3, linestyle='None',
     label='Actual'
 )

 forecast['weekday'] = forecast['ds'].dt.weekday
 colors = ['r', 'g', 'blue', 'pink', 'purple', 'cyan', 'orange']
 weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
 # Overlay the forecast for each weekday in a different colour on top of the actuals.
 for wd in range(7):
     fc_wd = forecast[forecast['weekday'] == wd]
     ax.plot(
         fc_wd['ds'], fc_wd['yhat'],
         c=colors[wd], marker='o', ms=3, linestyle='None',
         label=f'Forecast-{weekdays[wd]}', alpha=0.8)

 ax.legend()
 ax.set_xlabel('Date')
 ax.set_ylabel('Sales');

Finally, FBProphet’s SMAPE on this dataset is about 18%, calculated as below. To improve accuracy further, we could tune FBProphet’s hyperparameters and add extra regressors (see the sketch after the SMAPE output below).
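For reference, SMAPE here is the symmetric mean absolute percentage error, SMAPE = mean( |y − ŷ| / (|y| + |ŷ|) ) × 200, which ranges from 0% (perfect forecast) to 200%.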

 y_true = df_test['y']
 y_forecast = forecast[-n_tests:]['yhat']
 smape = ((y_true - y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200
 print('The SMAPE error is:', smape)

The SMAPE error is: 18.283400082248754
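
As a next step, one practical way to tune the hyperparameters is FBProphet’s built-in cross-validation utilities in fbprophet.diagnostics. The snippet below is a minimal sketch, not a tuned recipe: the candidate values for changepoint_prior_scale and the yearly Fourier order are illustrative placeholders. Adding extra regressors via model.add_regressor would require additional columns (e.g. promotions or prices) that this dataset does not provide, so it is not shown here.

 # A minimal sketch of hyperparameter tuning with FBProphet's built-in
 # cross-validation utilities. The candidate values below are illustrative
 # placeholders, not tuned settings for this dataset.
 from fbprophet.diagnostics import cross_validation, performance_metrics

 best_params, best_mape = None, float('inf')
 for cps in [0.01, 0.05, 0.1, 0.5]:  # trend flexibility candidates
     for fo in [2, 4, 8]:  # yearly Fourier order candidates
         m = Prophet(
             daily_seasonality=False,
             weekly_seasonality=False,
             yearly_seasonality=False,
             changepoint_prior_scale=cps,
         )
         m.add_seasonality(name='weekly', period=7, fourier_order=4)
         m.add_seasonality(name='yearly', period=365.25, fourier_order=fo)
         m.fit(df_train)
         # Rolling-origin cross-validation on the training data only:
         # start with 2 years of history, forecast 180 days ahead, step 90 days.
         df_cv = cross_validation(m, initial='730 days', period='90 days', horizon='180 days')
         mape = performance_metrics(df_cv)['mape'].mean()
         if mape < best_mape:
             best_mape, best_params = mape, (cps, fo)

 print('Best (changepoint_prior_scale, yearly fourier_order):', best_params)

Note that even this small grid refits the model a dozen times, so the search takes a while; once the best settings are found, the final model can be refitted on df_train and evaluated on df_test as before.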

Conclusion

FBProphet is very flexible, as it allows you to add multiple seasonal components and additional regressors. Its component outputs are interpretable, so users can tune the parameters intuitively to improve performance. In addition, unlike SARIMA models, FBProphet does not require regularly spaced data points, so users do not need to interpolate missing data, e.g. after removing outliers. On the other hand, FBProphet uses Stan to fit the model, and Stan does not yet support GPUs, so FBProphet does not scale well to very large datasets.