Time Series Forecasting with FBProphet

Time series forecasting is an important task for effective and efficient planning in many fields like finance, weather and energy. Traditional approaches like SARIMA models often require manual data pre-processing steps (e.g. differencing to make the data stationary) and it’s also hard to explain why these models produce the prediction results to people without forecasting expertise. In addition, these models are not allowed to add additional domain knowledge to improve precision. For solving these problems, Facebook researchers recently released FBProphet, a time series forecasting tool supporting both Python and R. FBProphet provides a decomposition regression model that is extendable and configurable with interpretable parameters.

In this blog, I will use FBProphet to forecast item demand using the data from the Kaggle competition “Store Item Demand Forecasting Challenge”.

Data analysis

Let’s start by importing the Python packages that we need.

	import pandas as pd
	from fbprophet import Prophet
	from matplotlib import pyplot as plt

view raw ts1.py hosted with ❤ by GitHub

	import pandas as pd
	from fbprophet import Prophet
	from matplotlib import pyplot as plt

view raw ts1.py hosted with ❤ by GitHub

Then, we load the train data using Pandas. You may need to change the path to where you put the train.csv file.

	train = pd.read_csv(‘../input/train.csv’)
	train[‘date’] = pd.to_datetime(train[‘date’])

view raw ts2.py hosted with ❤ by GitHub

	train = pd.read_csv(‘../input/train.csv’)
	train[‘date’] = pd.to_datetime(train[‘date’])

view raw ts2.py hosted with ❤ by GitHub

As there are many items and stores, I will restrict the analysis to item 1 and store 1.

	df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]]
	df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True)
	df.describe()

view raw ts3.py hosted with ❤ by GitHub

	df = train[(train[‘item’] == 1) & (train[‘store’] == 1)][[‘date’, ‘sales’]]
	df.rename(columns={‘date’: ‘ds’, ‘sales’: ‘y’}, inplace=True)
	df.describe()

view raw ts3.py hosted with ❤ by GitHub

	y
count	1826.000000
mean	19.971522
std	6.741022
min	4.000000
25%	15.000000
50%	19.000000
75%	24.000000
max	50.000000

Ok, we have the data now. Let’s plot them.

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’)
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts4.py hosted with ❤ by GitHub

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df[‘ds’], df[‘y’], linestyle=’None’, marker=’o’)
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts4.py hosted with ❤ by GitHub

As you can see, the dataset contains five years (2013-2018). We will use the last 180 days for testing and the rest for training. Then, we will plot them to double-check.

	n_tests = 180
	df_train = df[:-n_tests]
	df_test = df[-n_tests:]
	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’)
	ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts5.py hosted with ❤ by GitHub

	n_tests = 180
	df_train = df[:-n_tests]
	df_test = df[-n_tests:]
	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df_train[‘ds’], df_train[‘y’], linestyle=’None’, marker=’o’, color=’black’, label=’Train’)
	ax.plot(df_test[‘ds’], df_test[‘y’], linestyle=’None’, marker=’o’, color=’red’, label=’Test’)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts5.py hosted with ❤ by GitHub

Forecasting with FBProphet

Let’s create a Prophet model, add the weekly, yearly components and then fit it with the train dataset.

	model = Prophet(
	daily_seasonality=False,
	weekly_seasonality=False,
	yearly_seasonality=False,
	changepoint_prior_scale=0.05,
	)

	model.add_seasonality(
	name=’weekly’,
	period=7,
	fourier_order=4,
	)

	model.add_seasonality(
	name=’yearly’,
	period=365.25,
	fourier_order=2,
	)

	model.fit(df_train);

view raw ts6.py hosted with ❤ by GitHub

	model = Prophet(
	daily_seasonality=False,
	weekly_seasonality=False,
	yearly_seasonality=False,
	changepoint_prior_scale=0.05,
	)

	model.add_seasonality(
	name=’weekly’,
	period=7,
	fourier_order=4,
	)

	model.add_seasonality(
	name=’yearly’,
	period=365.25,
	fourier_order=2,
	)

	model.fit(df_train);

view raw ts6.py hosted with ❤ by GitHub

Now, I can ask the model to predict the future on the testing dataset. I will ask it to estimate also the training points, so the model will predict for the whole dataset as follows.

	forecast = model.predict(df)
	forecast[[‘ds’, ‘yhat’]].head()

view raw ts7.py hosted with ❤ by GitHub

	forecast = model.predict(df)
	forecast[[‘ds’, ‘yhat’]].head()

view raw ts7.py hosted with ❤ by GitHub

	ds	yhat
0	2013-01-01	9.232334
1	2013-01-02	9.853309
2	2013-01-03	10.262132
3	2013-01-04	11.855213
4	2013-01-05	13.551500

FBProphet comes with a handy plot function allowing us to plot the three fitted components. Let’s plot them.

model.plot_components(forecast);

view raw ts8.py hosted with ❤ by GitHub

model.plot_components(forecast);

view raw ts8.py hosted with ❤ by GitHub

It can be seen from the trend component that the sales always increase from 2013 to 2018. Meanwhile, the yearly component shows that the sales peak is around June. Finally, the sales rise gradually from Monday to Sunday.

Let’s plot the forecast, train, and test datasets to validate this.

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’)
	ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’)
	ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts9.py hosted with ❤ by GitHub

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(df_train[‘ds’], df_train[‘y’], c=’black’, marker=’o’, ms=3, linestyle=’None’, label=’Train’)
	ax.plot(df_test[‘ds’], df_test[‘y’], c=’r’, marker=’o’, ms=3, linestyle=’None’, label=’Test’)
	ax.plot(forecast[‘ds’], forecast[‘yhat’], c=’b’, marker=’o’, ms=3, linestyle=’None’, label=’Forecast’, alpha=0.5)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts9.py hosted with ❤ by GitHub

It can be observed from the above plot that the down trend during the last 6 months is captured correctly and the forecast values have 7 patterns (or 7 different cosine-like lines). Those are corresponding to 7 weekdays. Let’s plot the Monday and Sunday datapoints of the actual (including train and test) and forecast dataset.

	fig, ax = plt.subplots(figsize=(15, 5))
	forecast[‘weekday’] = forecast[‘ds’].dt.weekday
	df[‘weekday’] = df[‘ds’].dt.weekday
	colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’]
	weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
	for wd in [0, 6]:
	fc_wd = forecast[forecast[‘weekday’] == wd]
	ax.plot(
	fc_wd[‘ds’], fc_wd[‘yhat’],
	c=colors[wd], marker=’o’, ms=5, linestyle=’None’,
	label=f’Forecast-{weekdays[wd]}’)
	df_wd = df[df[‘weekday’] == wd]
	ax.plot(
	df_wd[‘ds’], df_wd[‘y’],
	c=colors[wd], marker=’^’, ms=5, linestyle=’None’,
	label=f’Actual-{weekdays[wd]}’
	)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts10.py hosted with ❤ by GitHub

	fig, ax = plt.subplots(figsize=(15, 5))
	forecast[‘weekday’] = forecast[‘ds’].dt.weekday
	df[‘weekday’] = df[‘ds’].dt.weekday
	colors = [‘r’, ‘g’, ‘yellow’, ‘pink’, ‘purple’, ‘cyan’, ‘blue’]
	weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
	for wd in [0, 6]:
	fc_wd = forecast[forecast[‘weekday’] == wd]
	ax.plot(
	fc_wd[‘ds’], fc_wd[‘yhat’],
	c=colors[wd], marker=’o’, ms=5, linestyle=’None’,
	label=f’Forecast-{weekdays[wd]}’)
	df_wd = df[df[‘weekday’] == wd]
	ax.plot(
	df_wd[‘ds’], df_wd[‘y’],
	c=colors[wd], marker=’^’, ms=5, linestyle=’None’,
	label=f’Actual-{weekdays[wd]}’
	)
	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts10.py hosted with ❤ by GitHub

It can be seen that the Monday and Sunday datapoints are separated and FBProphet captures this property correctly. I can now plot all weekdays to gain some insights.

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(
	df[‘ds’], df[‘y’],
	c=’black’, marker=’o’, ms=3, linestyle=’None’,
	label=f’Actual’
	)

	forecast[‘weekday’] = forecast[‘ds’].dt.weekday
	colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’]
	weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
	for wd in range(7):
	fc_wd = forecast[forecast[‘weekday’] == wd]
	ax.plot(
	fc_wd[‘ds’], fc_wd[‘yhat’],
	c=colors[wd], marker=’o’, ms=3, linestyle=’None’,
	label=f’Forecast-{weekdays[wd]}’, alpha=0.8)

	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts11.py hosted with ❤ by GitHub

	fig, ax = plt.subplots(figsize=(15, 5))
	ax.plot(
	df[‘ds’], df[‘y’],
	c=’black’, marker=’o’, ms=3, linestyle=’None’,
	label=f’Actual’
	)

	forecast[‘weekday’] = forecast[‘ds’].dt.weekday
	colors = [‘r’, ‘g’, ‘blue’, ‘pink’, ‘purple’, ‘cyan’, ‘orange’]
	weekdays = [‘Mon’, ‘Tue’, ‘Wed’, ‘Thu’, ‘Fri’, ‘Sat’, ‘Sun’]
	for wd in range(7):
	fc_wd = forecast[forecast[‘weekday’] == wd]
	ax.plot(
	fc_wd[‘ds’], fc_wd[‘yhat’],
	c=colors[wd], marker=’o’, ms=3, linestyle=’None’,
	label=f’Forecast-{weekdays[wd]}’, alpha=0.8)

	ax.legend()
	ax.set_xlabel(‘Date’)
	ax.set_ylabel(‘Sales’);

view raw ts11.py hosted with ❤ by GitHub

Finally, FBProphet SMAPE on this dataset is 18% and calculated as below. To improve the precision, we need to tune FBProphet hyperparameters and add additional regressors.

	y_true = df_test[‘y’]
	y_forecast = forecast[-n_tests:][‘yhat’]
	smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200
	print(‘The SMAPE error is:’, smape)

view raw ts12.py hosted with ❤ by GitHub

	y_true = df_test[‘y’]
	y_forecast = forecast[-n_tests:][‘yhat’]
	smape = ((y_true – y_forecast).abs() / (y_true.abs() + y_forecast.abs())).mean() * 200
	print(‘The SMAPE error is:’, smape)

view raw ts12.py hosted with ❤ by GitHub

The SMAPE error is: 18.283400082248754

Conclusion

FBProphet is very flexible as it allows you to add multiple seasonal components and additional regressors. Its component results are interpretable, hence users can intuitively tune the parameters to improve the performance. In addition, unlike SARIMA models, FBProphet does not require regularly spaced data points, therefore users do not need to interpolate missing data, e.g. when removing outliers. Finally, FBProphet uses STAN to fit the model, and STAN still has not supported GPU yet, so FBProphet does not scale up when fed with many data points.

Data and Analytics

Time Series Forecasting with FBProphet

Data analysis

Forecasting with FBProphet

Conclusion

Offices

Melbourne

Subscribe to updates from DiUS