By Dr Gwinyai Nyakuengama

(3 October 2018)

KEY WORDS

ARFIMA; Time Series; Daily female births in California; Stata; R package – Prophet

ACKNOWLEDGEMENT

We gratefully acknowledge:

StataCorp; Survey Design and Analysis Services (https://surveydesign.com.au/); Dr Becketti (2013); and the authors of the R package – Prophet, for their core macros in Stata and R;
datamarket.com for their data; and
some anonymous colleagues.

Collectively, these parties not only inspired but underpinned this blog.

OBJECTIVE

To predict the 30-day, daily total female births in California, for January 1960.

METHOD

In this study:

Daily total female births for California reported in 1959 were accessed from datamarket.com.
Stata was used to test for stationarity in this time series data.
Stata was used to fit an Auto-Regressive Fractionally Integrated Moving Average (ARFIMA) model and to predict the daily female births for the month of January 1960.
Also, the R package – Prophet, was used to fit a time-series model with additive seasonalities, meaning the effect of the seasonality is added to the trend to get the forecast.
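
To make "additive" concrete, here is a minimal Python sketch. It is not Prophet's implementation; the trend level and weekday effects are invented numbers used purely for illustration, and the multiplicative function is shown only for contrast.

```python
# Illustrative only: these numbers are invented, not estimated from the births data.

def additive_forecast(trend, seasonal_effect):
    """Additive model: the seasonal effect is added to the trend."""
    return trend + seasonal_effect


def multiplicative_forecast(trend, seasonal_factor):
    """Multiplicative alternative: the trend is scaled by (1 + factor)."""
    return trend * (1.0 + seasonal_factor)


trend_level = 42.0                              # hypothetical trend (births per day)
weekday_effect = {"Sunday": -3.0, "Tuesday": 2.0}  # hypothetical additive effects

for day, effect in weekday_effect.items():
    print(day, additive_forecast(trend_level, effect))
```

In an additive model the size of the weekday effect stays the same whatever the trend level; in a multiplicative model it grows or shrinks with the trend.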

RESULTS

Cali_daily_birth_Slide3

This Stata plot of the daily female births in California for 1959 shows that the data are highly volatile.

This suggested:

a non-stationary time series and, most importantly, long-memory volatility in the series; and
an ARFIMA model as a suitable approach for predicting the daily female births for the month of January 1960.

Cali_daily_birth_Slide4

These Stata auto-correlation and partial auto-correlation plots also suggested the presence of serial correlation in the female daily birth time series.
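
For readers who want to see what an auto-correlation plot is built from, the following Python sketch computes the sample auto-correlations by hand. The toy series is made up, and the ±1.96/√n band is only the usual white-noise rule of thumb; it is not claimed to match the bands Stata draws.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance_sum = sum(d * d for d in deviations)
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k steps apart.
        cov_sum = sum(deviations[t] * deviations[t - k] for t in range(k, n))
        acf.append(cov_sum / variance_sum)
    return acf


# Made-up toy series, NOT the California births data.
toy_series = [40, 44, 39, 45, 38, 46, 41, 47, 37, 43, 42, 48]
acf = autocorrelation(toy_series, max_lag=3)
band = 1.96 / len(toy_series) ** 0.5  # rough white-noise band (rule of thumb)

for lag, r in enumerate(acf):
    flag = "outside band" if lag > 0 and abs(r) > band else ""
    print(lag, round(r, 3), flag)
```

Auto-correlations that fall well outside the band at several lags are what "serial correlation" refers to.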

Cali_daily_birth_Slide5

Based on these Stata Dickey-Fuller test results, we failed to reject the null hypothesis of a random walk with a possible drift in the female daily births.
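
As a rough illustration of what the Dickey-Fuller test computes, the Python sketch below regresses the first difference of a series on a constant and the lagged level, and returns the t-statistic on the lagged level. It assumes the simplest form of the test (constant, no extra lags, no trend) and uses simulated series, not the births data. The statistic must be compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level for a sample of this size), not with the usual t tables.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    x = y[:-1]                                        # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first difference
    n = len(dy)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    rho = sxy / sxx
    alpha = dy_mean - rho * x_mean
    residuals = [di - alpha - rho * xi for xi, di in zip(x, dy)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    return rho / math.sqrt(sigma2 / sxx)


# Simulated series (seeded for repeatability), one year of daily values each.
rng = random.Random(0)
white_noise = [rng.gauss(0, 1) for _ in range(365)]
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

print("white noise:", dickey_fuller_stat(white_noise))
print("random walk:", dickey_fuller_stat(random_walk))
```

A strongly negative statistic (as expected for the white-noise series) rejects the unit-root null; a statistic close to zero (as expected for the random walk) does not.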

Cali_daily_birth_Slide6

In Stata, the commonly used criteria for choosing appropriate time series lags are Schwarz’s Bayesian information criterion (SBIC), Akaike’s information criterion (AIC), the Final Prediction Error (FPE) and the Hannan and Quinn information criterion (HQIC). Ivanov and Kilian (2001) report that AIC works well on monthly data.

The above results from Stata’s vector auto-regression selection-order criteria (varsoc) command indicate that the second lag (ar2) was picked by most decision criteria (i.e. FPE, AIC and HQIC). However, a single lag (ar1) was selected by the SBIC criterion.

Cali_daily_birth_Slide7

The Stata dfgls command, which computes the Dickey-Fuller GLS unit-root test:

calculated the optimal lag length as 6, 1 and 7 using a sequential t-test (Ng and Perron, 1995), the Schwarz criterion (SC) and the “modified AIC” (MAIC), respectively; and
controls for a linear time trend by default, unlike the Stata dfuller or pperron commands.

Based on these results:

we failed to reject the null hypothesis of a random walk with drift in the daily girl birth series; and
the daily female births were accurately estimated, judging by the relatively low root-mean-square error (rmse) of around seven daily girl births, considering the high volatility of this time series.

Cali_daily_birth_Slide8

The above Stata ARFIMA regression results suggested that:

the model fit was significant;
more importantly, d, the fractional differencing parameter, was significant and indicated a fractionally integrated process with 0 < d < ½; and
the L1.ar and L1.ma terms were both significant.
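
To show what a fractional d means in practice, the Python sketch below expands the fractional differencing operator (1 - L)^d into its lag weights. The value d = 0.3 is a hypothetical choice inside the 0 < d < ½ range, not the estimate from the Stata output.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of (1 - L)^d = sum_k w_k L^k, truncated at n_weights terms."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Recurrence from the binomial expansion: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# d = 1 reproduces ordinary first differencing: weights 1, -1, 0, 0, ...
print(frac_diff_weights(1.0, 4))

# Hypothetical d = 0.3: the weights shrink slowly instead of cutting off.
for lag, w in enumerate(frac_diff_weights(0.3, 8)):
    print(lag, round(w, 4))
```

With integer d the weights cut off after a few lags, whereas with fractional d they decay slowly, which is how an ARFIMA model lets distant observations keep a small influence (long memory).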

This Stata plot shows:

that the dynamic forecasting (xb prediction) obtained using the ARFIMA model faithfully tracked the observed daily female births throughout 1959; and
the 30-day, daily female birth prediction for January 1960, with the 90 per cent prediction intervals around the mean.

Cali_daily_birth_Slide10

Just focusing on the 30-day prediction from the Stata ARFIMA model:

the predicted daily female births in January 1960 were around 43, within the 90% CI bands (see next figure); and
on average, this prediction had a root-mean-square error (rmse) of 7 daily female births (see next figure).

Cali_daily_birth_Slide11

The Stata ARFIMA model’s 30-day predictions for January 1960 show:

around 43 daily female births; and
a root-mean-square error (rmse) of around 7 daily female births. Note that this figure agrees with the earlier estimate obtained using the Stata command dfgls.
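
For reference, the Python sketch below shows how an rmse is computed and how a point forecast of 43 with an error spread of 7 translates into an approximate 90% band. It assumes normally distributed forecast errors with a standard deviation equal to the rmse; Stata derives its intervals from the model's own forecast variance, so the bands in the figures need not match this back-of-the-envelope calculation.

```python
import math
from statistics import NormalDist


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def normal_interval(point_forecast, error_sd, level=0.90):
    """Symmetric interval assuming normally distributed errors (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for a 90% level
    return point_forecast - z * error_sd, point_forecast + z * error_sd


lower, upper = normal_interval(43.0, 7.0)
print(round(lower, 1), round(upper, 1))
```

Under these assumptions, the band runs from roughly 31 to 55 daily female births.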

We also predicted the births using the R package – Prophet, with the prediction intervals set to 90%, the same as in Stata.

This plot from the R package – Prophet shows:

periodicity in the daily female births; and
presence of outliers – dots outside the shaded blue area (90% CI).

The average root-mean-square error (rmse) from R was also around seven daily female births (or 7.2 exactly).

Cali_daily_birth_Slide15

Just focusing on the 30-day prediction in January 1960, these two plots from the R package – Prophet show:

strong weekly periodicity in the daily female births; with
peaks every Tuesday and Wednesday and troughs every Sunday.
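
A simple way to cross-check a weekly pattern like this, independently of Prophet, is to average the daily counts by day of the week. The Python sketch below does that; the function name and the two weeks of values are made up for illustration and are not the California data.

```python
from collections import defaultdict
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def mean_by_weekday(start, values):
    """Average a daily series by weekday, given the date of the first value."""
    grouped = defaultdict(list)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        grouped[DAY_NAMES[day.weekday()]].append(value)  # weekday(): Monday = 0
    return {name: sum(v) / len(v) for name, v in grouped.items()}


# Two weeks of invented counts starting on 1 January 1959 (a Thursday).
made_up_counts = [41, 40, 39, 35, 42, 46, 45,
                  43, 42, 38, 36, 41, 47, 46]
weekday_means = mean_by_weekday(date(1959, 1, 1), made_up_counts)

for name in DAY_NAMES:
    print(name, weekday_means[name])
```

Applied to the full 1959 series, clearly higher Tuesday and Wednesday means and a lower Sunday mean would corroborate the weekly pattern that Prophet reports.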

CONCLUSION

The Stata ARFIMA model was an excellent fit to the highly volatile daily female births in California for 1959.
On average, the Stata ARFIMA model predicted 43 daily female births (+/- seven births) for the month of January 1960.
Pleasingly, Stata and the R package – Prophet gave consistent and complementary results.
Additionally, the R package – Prophet picked up strong weekly periodicity in the data, with the most births occurring on Tuesdays and Wednesdays and the fewest on Sundays.

BIBLIOGRAPHY

Becketti S. (2013): Introduction to Time Series Using Stata 1st Edition, Stata Press https://www.amazon.com/Introduction-Time-Using-Stata-Becketti/dp/1597181323

Ivanov, V. and Kilian, L. 2001. ‘A Practitioner’s Guide to Lag-Order Selection for Vector Autoregressions’. CEPR Discussion Paper no. 2685. London, Centre for Economic Policy Research. http://www.cepr.org/pubs/dps/DP2685.asp

Prophet: https://facebook.github.io/prophet/docs/quick_start.html

Prophet R package: June 15, 2018 https://cran.r-project.org/web/packages/prophet/prophet.pdf

StataCorp 2013: Stata Time-Series Reference Manual https://www.stata.com/manuals/ts.pdf.

Time Series Prediction of Daily Total Female Births in California – January, 1960
