Econometrics Notes

In progress, Chapters 1 to 11 available.

Author: Anthony Tay

Published: October 31, 2023

Preface

These notes were written to accompany the econometrics courses that I teach at the School of Economics, Singapore Management University (SMU):

  • ECON207 Intermediate Econometrics (BSc Econ)
  • ECON682 Econometric Analysis (Econometrics core for MSc Econ / MSc Fin. Econ.)
  • ECON6001 Time Series Econometrics (MSc Economics - Quantitative Economics Track)

When complete, these notes will comprise roughly 20 to 30 chapters; the specific chapters you will use are listed in your course outline.

What is Econometrics?

Econometrics draws on statistics, economic theory, and mathematics to develop tools for estimating economic relationships. These tools serve many purposes: decision making, prediction and forecasting, inferring causal effects, evaluating the efficacy of policy interventions and initiatives, testing the validity of economic theories and their underlying assumptions, and answering a multitude of questions that are ultimately empirical in nature. Examples include:

  • Pricing decisions by firms require knowledge of the price sensitivity of demand for their products; this knowledge is provided by estimates of the products’ price elasticities of demand.
  • Monetary authorities / central banks build empirical forecasting models of the economy to help anticipate outcomes such as high inflation or economic recessions, and to predict the effects of potential policy responses.
  • House prices that are very much higher than those predicted by an empirical model linking house prices to economic fundamentals may indicate imbalances in the economy that require policy intervention.
  • There is a long list of public initiatives undertaken by authorities to encourage certain behaviors in people and firms, or to improve economic, health, educational and other outcomes in populations. To what extent do they work?
  • Many theories in fields such as industrial organization, economic growth, and economic geography assume constant returns to scale in production. Are such assumptions in line with empirical evidence, or do they fail when tested against data?
  • Estimates of the economic effect of climate change must factor in adaptation by industries. While we can expect industries to at least try to adapt, is there evidence that they are able to do so effectively and quickly enough?

Such applications present many challenges. In forecasting, the challenge is to find predictors that have stable relationships with the variable being forecast, and to determine and estimate the form of these relationships. In some cases there are many potential predictors, each of limited predictive ability on its own but perhaps powerful in totality; the task then is to estimate usable forecasting relationships from them.

Causal inference – empirically teasing out causal relations from correlative ones – must deal with confounding effects. For example, the causal link from years of education to earnings is tangled up with the effects of individual characteristics such as ability, work experience at the time of sampling, and family background, among other things, all of which drive both earnings and the decision to pursue more years of education. Any attempt to interpret a correlation between years of education and earnings as a causal effect must somehow control for these factors. Ideally, we would hold everything fixed apart from the candidate “causal” variable \(x\) and observe what happens to the ‘explained’ variable \(y\) when we change \(x\), but of course this is impossible. What are the alternatives? In some applications, one might be able to employ a randomized controlled trial (RCT), in which subjects are randomly assigned to a treatment group and a non-treatment group. The randomization breaks the link between the confounding characteristics and the treatment, and enables one to interpret the correlation between treatment and outcome as evidence of causation. In most cases, however, researchers have to depend on observational data, where information on a sample drawn from a population is observed without any intervention by the researcher. In these cases, clever methods must be devised to tease out causality from correlation.
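To make the confounding problem concrete, here is a minimal R sketch. It is not taken from any chapter of these notes; the variable names and coefficient values are purely illustrative. “Ability” drives both years of education and earnings, so the naive regression slope overstates the true causal effect (set to 1 in the simulation); when education is assigned independently of ability, as in an RCT, the slope recovers the causal effect.

  # Illustration only: a confounder ('ability') drives both education and
  # earnings, so the naive regression slope is biased upward.
  set.seed(1)
  n <- 5000
  ability <- rnorm(n)
  educ    <- 12 + 2 * ability + rnorm(n)         # education partly chosen on ability
  earn    <- 1 * educ + 3 * ability + rnorm(n)   # true causal effect of educ is 1
  coef(lm(earn ~ educ))                          # slope well above 1
  # "Randomized" education, independent of ability (as in an RCT):
  educ_rct <- 12 + rnorm(n)
  earn_rct <- 1 * educ_rct + 3 * ability + rnorm(n)
  coef(lm(earn_rct ~ educ_rct))                  # slope close to 1

The point of the sketch is only to show how randomization breaks the link between the confounder and the treatment; with observational data, no such assignment is available and other methods are needed.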

Econometric methods must also take into consideration the data structures found in economic data – whether the data consist of a sample from a population taken at some point in time (we call this “cross-sectional data”), or several cross-sections resampled over multiple periods (“pooled cross-sections”), or the same cross-sectional sample re-observed over multiple periods (“panel or longitudinal data”), or observations of variables taken over multiple time periods (“time-series data”), and so on. In some applications, the researcher has to take special steps to counter the complications that arise because of this structure. In other cases, the features of certain data structures can be exploited to assist in empirical causal inference. Other data-related issues include measurement error, and the fact that we are often only able to employ data that are, at best, proxies of the actual variables we would like to study.
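As a small illustration of two of these structures (a sketch with made-up numbers, not data from the course webpages), a cross-section has one row per unit at a single point in time, while a panel re-observes the same units over several periods:

  # Illustration only; the values are made up.
  # Cross-sectional data: one observation per individual at one point in time.
  cross_section <- data.frame(
    id     = 1:4,
    income = c(3200, 4100, 2800, 5000)
  )
  # Panel (longitudinal) data: the same individuals observed in several periods.
  panel <- data.frame(
    id     = rep(1:4, each = 2),
    year   = rep(c(2022, 2023), times = 4),
    income = c(3200, 3300, 4100, 4150, 2800, 2900, 5000, 5100)
  )
  head(panel)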

Econometricians have always relied on computers to implement their formulas. This reliance has increased further as computer-based statistical methods – in which algorithms replace formulas – have become more important over the past few decades. The econometrician must now add computing skills to economic theory, mathematics, and statistics on her list of competencies.

Mathematical Prerequisites

I assume that the reader is able to do simple differentiation and integration. What we need from optimization theory is reviewed in a section of the mathematics review chapter. We will use a considerable amount of matrix algebra, and detailed notes are provided on this topic. We will, of course, use probability theory and statistics extensively; these notes include chapter-length reviews of both.

Software

The computations in these notes were done in R. Data are available from your course webpages, and I assume these are stored in a ‘data’ folder in your working directory. There are many introductions to R on the web. I will proceed on the assumption that you have studied some of these, and that you have a working installation on your computer. I recommend running R within the RStudio Integrated Development Environment (IDE). The chapter “Introduction to R” contains brief instructions on installing R and RStudio, and a quick primer on using R.
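As an example of the setup described above, a dataset downloaded from your course webpage and saved in the ‘data’ folder can be read as follows. The file name ‘earnings.csv’ is only a placeholder; use the files listed in your course outline.

  # Read a dataset stored in the 'data' folder of the working directory.
  # 'earnings.csv' is a placeholder name, not an actual course dataset.
  dat <- read.csv("data/earnings.csv")
  str(dat)       # variable names and types
  summary(dat)   # quick summary statistics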