Econometric modelling can go awry
Econometric analysis is often used to provide justification for regulatory proposals and decisions. Yet figures are not infallible, and mistakes can happen for a number of reasons. This bulletin uses recent real-world examples to show the importance of using appropriate methods, and why it is critical to review the analysis carefully before drawing conclusions from the modelling results.
The importance of being robust
Econometric analysis is a powerful tool. It can be used to test claims and draw conclusions. It can be used to study the growth of economies, underpin conclusions regarding market power in antitrust cases, and provide estimates of the efficient costs of regulated monopolies. But, like most tools, it must be used correctly to avoid inadvertently causing harm.
Just as you would want a qualified medical practitioner to perform a routine check-up, econometric analysis should be carried out by a knowledgeable practitioner, and a second opinion may also be valuable. Undertaking quality assurance is important.
Below, we discuss issues that may affect the validity of an econometric analysis, including some examples uncovered by Frontier Economics during recent reviews of work.
Computational errors
Human error means mistakes can creep in when typing the commands for the statistical software used in econometric analysis. A missing hyphen led to the destruction of NASA’s Mariner 1 spacecraft; the results of econometric analysis can be similarly affected.
- A famous 2010 paper showed real economic growth slows when debt levels rise to more than 90% of gross domestic product. This conclusion was used by a number of policy makers as justification for austerity measures. Researchers later showed that this result was due to coding errors in a spreadsheet.[1]
- In 2015 the University of Melbourne’s Institute of Applied Economic and Social Research presented results showing that graduates of the “Group of Eight” universities earn significantly less than graduates of other universities. The results were later retracted after a “coding error” was identified.[2]
- In a recent working paper[3] by a regulatory authority examining the dividend drop-off method for valuing imputation credits, the authors tested whether the distribution of the random movements impacting stocks with high franking credits differs from the distribution of the random movements impacting stocks with low franking credits. However, there was a typo in the code. Instead of comparing apples with apples, they unknowingly compared a basket of apples and oranges with a basket of apples and bananas. The obvious difference between bananas and oranges led them to conclude that the contents of the baskets (i.e. the apples) were different.
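We do not reproduce the working paper’s actual code here. The following is a purely stylised sketch, using simulated data and a two-sample Kolmogorov-Smirnov test as a stand-in for the paper’s distributional comparison, of how a small grouping slip can manufacture a “difference” between two groups that are in fact identical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# "Apples": the quantity of interest, drawn from the SAME distribution for
# high-franking and low-franking stocks, so there is no true difference.
apples_high = rng.normal(0.0, 1.0, size=400)
apples_low = rng.normal(0.0, 1.0, size=400)

# Intended test: apples against apples. The distributions should not differ.
intended = stats.ks_2samp(apples_high, apples_low)

# Mistaken grouping: a slip in the code means each basket also picks up an
# unrelated series ("oranges" in one basket, "bananas" in the other).
oranges = rng.normal(1.5, 1.0, size=150)
bananas = rng.normal(-1.5, 1.0, size=150)
basket_a = np.concatenate([apples_high, oranges])
basket_b = np.concatenate([apples_low, bananas])
mistaken = stats.ks_2samp(basket_a, basket_b)

print(f"intended comparison p-value: {intended.pvalue:.3f}")   # typically large
print(f"mistaken comparison p-value: {mistaken.pvalue:.3f}")   # typically near zero
# Any 'difference' found is driven by the oranges and bananas, not the apples.
```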
Mistakes like the examples above can be challenging to identify, even with rigorous quality assurance procedures. A culture of transparency can help mitigate this risk: if suspicions arise about certain findings, stakeholders should be able to verify the analysis by examining the code used to perform it. Ultimately, all errors have ramifications. Some may be purely academic, but others directly affect a business’s bottom line, and in regulated industries erroneous results can make a difference of millions of dollars. A number of regulators in Australia allow stakeholders to review the code underpinning their findings. This improves accountability and allows mistakes to be identified and corrected, which in turn improves the quality of regulatory decisions.
Data quality and timeliness
The results of econometric analysis are only as reliable as the source data: garbage in, garbage out. The data may not be fit for the purpose it is being used for, or a better dataset may exist that would provide more reliable information. Stale data may also cease to be useful for estimating the numerical relationships of interest.
- Energy Networks Australia commissioned a review of allowable debt-raising costs using updated data, which showed an increase in the typical arrangement fees for bonds compared with the previous estimates used by the Australian Energy Regulator (AER). As a result, the AER revised the allowable debt-raising costs upwards, recognising that the updated dataset was more informative of prevailing conditions and therefore more appropriate.[4] The increase was significant, of the order of $1 million over a determination period, and it meant that businesses could charge higher rates to cover their expenses.
- In benchmarking electricity distributors, the AER wanted to use a particular econometric technique that is very data-intensive.[5] It requested that businesses report eight years of historical data; however, some of the businesses did not have very reliable data for all of those years.[6] This raised concerns when the AER then used the data to determine the efficiency of the businesses. On appeal, the Australian Competition Tribunal found that, because of the backcasting involved in compiling the dataset, the AER should have placed less confidence in its benchmarking results than it did.
This is not to say that imperfect data is uninformative; data sources often fall short of perfection. But it is important to strive to have the most up-to-date and accurate information whenever practicable.
Improper cleaning techniques
Let’s assume the code has been programmed correctly and the dataset is appropriate for the question at hand. What else matters? In a typical analysis, data is obtained from a variety of sources. It is often not in an immediately usable format and usually requires “cleaning”. There may be errors that could have a substantial impact on the results if not corrected. Data may have been entered incorrectly, perhaps missing a digit. Units of measurement may also be inconsistent between observations: petrol prices may be reported in dollars for some observations and cents for others, and distance travelled may be reported in kilometres or in thousands of kilometres. Frontier Economics has encountered and corrected such errors on a number of occasions.
Ensuring that like is compared with like is important. Rigorous cleaning methods reduce the chance that errors survive the cleaning process, but mistakes that do slip through may go unnoticed, appearing instead as outliers or influential observations that distort the results.
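As a minimal illustration, the sketch below uses hypothetical petrol-price data (not from any actual engagement) and a robust outlier screen based on the median absolute deviation to flag observations that look like they have been recorded in the wrong units.

```python
import pandas as pd

# Hypothetical raw data: petrol prices mostly in dollars per litre, with two
# observations accidentally recorded in cents per litre.
prices = pd.Series([1.52, 1.48, 1.61, 149.0, 1.55, 158.0, 1.50])

# Robust outlier screen based on the median absolute deviation (MAD), which
# is not distorted by the erroneous observations themselves.
median = prices.median()
mad = (prices - median).abs().median()
robust_z = 0.6745 * (prices - median) / mad
suspect = robust_z.abs() > 3.5

print(prices[suspect])                      # observations to investigate, not delete blindly
print(prices.mask(suspect, prices / 100))   # one possible fix: convert cents to dollars
```

Flagged observations should be investigated against the source documents rather than simply deleted or rescaled; the automatic conversion above is shown only to make the nature of the error concrete.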
Statistical misconceptions
Statistics, and by extension econometrics, is often at odds with intuition. People are prone to cognitive biases that can lead to incorrect conclusions, and patterns may be seen where none exist. It is for this reason that it is preferable to employ an approach that satisfies statistical rigour, so that genuine information can be separated from ‘noise’.
- It is often observed that the smallest schools are overrepresented among the best performing schools.[7] This observation resulted in a push towards smaller schools in some US states. The leap in logic ignores the relationship between school size and the variability of average scores: the fewer students in a school, the easier it is for its average score to fall substantially above or substantially below the national average. In fact, small schools are also commonly found among the worst performing schools; the simple simulation sketched after this list illustrates the effect.
- When electricity utilities investigate the likely impact of the introduction of a new tariff, they often consider the impact on a typical or average customer. However, customers that are most likely to ‘opt in’ to the new tariff are those that have a consumption profile that benefits from the new tariff, e.g. customers with low consumption during peak periods. As a result of this ‘self-selectivity’, the predictions of the impact of the new tariff on overall peak demand could be significantly overstated.
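Returning to the schools example, the following minimal simulation (with purely illustrative enrolment numbers and test scores) shows how small schools dominate both ends of a league table even when every student’s score is drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

n_schools = 2000
sizes = rng.integers(20, 2000, size=n_schools)   # hypothetical school enrolments

# Every student's score is drawn from the same distribution, so no school is
# genuinely better or worse than any other.
school_means = np.array([rng.normal(500, 100, size=n).mean() for n in sizes])

top = school_means >= np.quantile(school_means, 0.95)      # "best performing" schools
bottom = school_means <= np.quantile(school_means, 0.05)   # "worst performing" schools

print(f"average enrolment, all schools: {sizes.mean():.0f}")
print(f"average enrolment, top 5%:      {sizes[top].mean():.0f}")
print(f"average enrolment, bottom 5%:   {sizes[bottom].mean():.0f}")
# Small schools appear at both extremes simply because their averages are noisier.
```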
Returning to the tariff example: careful attention to the prediction methodology is needed to mitigate the impact of such self-selection biases; the sketch below illustrates the mechanism.
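The sketch is a stylised illustration only, using hypothetical consumption figures, an assumed 30% take-up rate and an assumed 20% peak reduction among customers who take up the tariff; it is not a model of any actual tariff trial.

```python
import numpy as np

rng = np.random.default_rng(2)

n_customers = 10_000
# Hypothetical peak-period consumption (kWh) varies widely across customers.
peak_kwh = rng.lognormal(mean=1.0, sigma=0.6, size=n_customers)

reduction = 0.20   # assumed peak reduction among customers who take up the tariff
take_up = 0.30     # assumed share of customers who opt in

# Naive forecast: apply the reduction to the average (typical) customer and
# scale up, implicitly assuming a representative cross-section opts in.
naive_saving = take_up * n_customers * peak_kwh.mean() * reduction

# Self-selection: the customers most likely to opt in are those with the least
# peak consumption, because they gain most from a peak-heavy price signal.
opt_in = peak_kwh <= np.quantile(peak_kwh, take_up)
selected_saving = peak_kwh[opt_in].sum() * reduction

print(f"naive forecast of peak saving:        {naive_saving:,.0f} kWh")
print(f"forecast allowing for self-selection: {selected_saving:,.0f} kWh")
# The naive forecast overstates the saving because opt-in customers are atypical.
```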
Conclusion
Econometric analysis is not a trivial exercise, and there are many potential issues that may undermine the conclusions reached. It is therefore important not to take work performed by others at face value: a surprising finding may be the result of a tiny coding mistake, errors in the data, or inappropriate statistical procedures and reasoning. But when done right, econometric analysis can be a very powerful tool for quantifying economic relationships and underpinning economic and regulatory decision-making.
[1] “(W)e were able to identify the selective exclusion of available data, coding errors and inappropriate methods for weighting summary statistics”, p.261 in Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257-279.
[2] “Following advice … that they were unable to replicate some of the findings … a review of the computer programs used to produce the table revealed some coding errors”, p.61 in Wilkins, R. (2015). The Household, Income and Labour Dynamics in Australia Survey: Selected Findings from Waves 1 to 12. Melbourne: Melbourne Institute of Applied Economic and Social Research, The University of Melbourne.
[3] Economic Regulation Authority (2017). Estimating the Utilisation of Franking Credits through the Dividend Drop-Off Method, available at http://www.erawa.com.au/cproot/17208/2/Secretariat Working Paper - Estimating the Utilisation of Franking Credits.PDF
[4] AER, Final Decision Ausnet Services Transmission Determination 2017-2022 - Attachment 3 - Rate of Return, April 2017, p. 403.
[5] Stochastic frontier analysis.
[6] The specific data requested may not have been recorded for the purposes of regulation in the early years of the sample.
[7] See Kane, T. J., & Staiger, D. O. (2002). The Promise and Pitfalls of Using Imprecise School Accountability Measures.