This course will review the application of machine learning techniques to both prediction problems and so-called causal problems where a firm or policy maker needs to understand the impact of some form of intervention on a heterogeneous population.

One example is a firm that wishes to understand how a change in pricing affects both aggregate demand and demand in different segments of the population. In another example, a policy maker seeks to understand the impact of an intervention both in terms of some form of average effect and in terms of how individuals differ in the magnitude of the effect. Examples include the impact of job training programmes, the impact of education policies in developing economies, and the differential impact of drugs on survival and recovery.

In this context we make the distinction between the ex post assessment of a change and the ex ante identification of characteristics of individuals that are predictive of the likely impact of such a change.

Using Breiman’s (2001) notion of two cultures in the use of statistical modelling, the course begins with a review of the fundamental differences between machine learning and econometrics.

*There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. - Breiman [2001], p199.*

We contrast a modelling approach where the analyst makes certain assumptions about model specification, including functional form, with an approach where the data mechanism is presumed unknown. In this context we consider the econometrician’s concern for internal validity, alongside the focus within machine learning on ensuring that a model is robust in the sense of generalising to unseen data (external validity).

The course will focus upon topics at the intersection of machine learning and econometrics, covering a mix of theory and applications. In making the distinction between models which are used to solve a prediction problem and models which are used to estimate some form of causal effect, we introduce participants to identification strategies in econometrics. Here it is important to demonstrate how empirical strategies such as unconfoundedness, instrumental variables, and difference-in-differences can be used alongside machine learning methods for prediction.
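As a small illustration of one of these identification strategies, the sketch below computes a two-group, two-period difference-in-differences estimate from group means. The numbers are made up purely for illustration and the function name `did_estimate` is hypothetical; the result has a causal interpretation only under the usual parallel-trends assumption.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    change_treated = treated_post - treated_pre
    change_control = control_post - control_pre
    # The control group's change stands in for what would have happened
    # to the treated group without the intervention (parallel trends).
    return change_treated - change_control

# Illustrative (made-up) mean outcomes, e.g. average daily kWh per household.
effect = did_estimate(treated_pre=10.0, treated_post=8.5,
                      control_pre=10.4, control_post=10.1)
print(round(effect, 2))
```

Here the treated group's change (-1.5) is adjusted by the control group's change (-0.3), so the common time trend is differenced out.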

As a point of departure we make reference to the two broad types of machine learning in terms of supervised and unsupervised learning, making the link to nonparametric regression. We then consider a number of fundamental building blocks, starting with error decomposition in terms of bias and variance, the role of training, estimation and test samples, and the role of regularization as a means to avoid overfitting.
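The role of separate training and test samples can be seen in a minimal sketch using k-nearest-neighbour regression, a simple nonparametric estimator. The simulated data-generating process (a quadratic plus noise), the sample sizes and the function names are assumptions made for illustration only. A small `k` gives low bias and high variance, while a large `k` does the reverse.

```python
import random

def knn_predict(train, x, k):
    """Average the outcomes of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, sample, k):
    """Mean squared prediction error on `sample` for a model fitted on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in sample) / len(sample)

random.seed(0)
# Simulated data: y = x^2 + noise (an assumption made for illustration).
data = []
for _ in range(200):
    x = random.uniform(-2, 2)
    data.append((x, x ** 2 + random.gauss(0, 0.5)))
train, test = data[:100], data[100:]

# Columns: k, training-sample error, test-sample error.
for k in (1, 10, 100):
    print(k, round(mse(train, train, k), 3), round(mse(train, test, k), 3))
```

With `k = 1` each training point is its own nearest neighbour, so the training error is zero even though the test error is not: the training error alone says nothing about overfitting. With `k = 100` every prediction is the overall training mean, which underfits.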

In covering the three broad areas where machine learning is used, namely prediction, classification and causal effects, we link the exposition in each case to parametric benchmarks. For prediction we consider the piecewise nonlinear regression model and high dimensional methods; for causal effects we consider the specification of models with instrumental variables and treatment effects.

Participants will also be introduced to the use of ensemble methods as an averaging and regularization device. In this context we will explore a number of general methods for model averaging, including bootstrap sampling (so-called bagging) and random forests. For machine learning models in prediction, classification and causal effects we provide examples using Stata, R and Python.
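Bagging can be sketched in a few lines: fit the same simple learner on many bootstrap resamples and average the predictions. The example below uses a one-split regression tree (a "stump") as the base learner; the simulated linear data and all function names are assumptions for illustration. With a single covariate this is plain bagging; a random forest would additionally sample covariates at each split.

```python
import random

def fit_stump(sample):
    """Fit a one-split regression tree: choose the threshold minimising squared error."""
    sample = sorted(sample)
    best = None
    for i in range(1, len(sample)):
        if sample[i - 1][0] == sample[i][0]:
            continue  # cannot split between identical covariate values
        left, right = sample[:i], sample[i:]
        mean_l = sum(y for _, y in left) / len(left)
        mean_r = sum(y for _, y in right) / len(right)
        sse = (sum((y - mean_l) ** 2 for _, y in left)
               + sum((y - mean_r) ** 2 for _, y in right))
        if best is None or sse < best[0]:
            threshold = (sample[i - 1][0] + sample[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x <= threshold else mean_r

def bagged_predictor(data, n_models, rng):
    """Average stumps fitted on bootstrap resamples of the data (bagging)."""
    models = []
    for _ in range(n_models):
        resample = [rng.choice(data) for _ in data]  # draw with replacement
        models.append(fit_stump(resample))
    return lambda x: sum(m(x) for m in models) / len(models)

rng = random.Random(1)
# Simulated data: y = x + noise on a grid from 0 to 6 (illustrative assumption).
data = [(i / 10, i / 10 + rng.gauss(0, 0.3)) for i in range(61)]
single = fit_stump(data)
bagged = bagged_predictor(data, n_models=50, rng=rng)
print(single(2.9), bagged(2.9))
```

A single stump can only take two values, whereas the bagged average varies more smoothly with `x` because the split points differ across resamples. This is the sense in which averaging acts as a variance-reduction device.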

The introduction of time-of-use (tou) electricity prices is an example of a policy with heterogeneous effects. Consumers in different socioeconomic groups, and with distinct historical intra-day load profiles and behavioural characteristics, may respond differently to the introduction of tariffs that charge different prices for electricity at different times of the day. Customers who can adapt their consumption profile to tou tariffs will accrue a benefit, while those who cannot will incur a cost. Those who consume electricity at more expensive peak periods, and who are unable to change their consumption patterns, could end up paying significantly more.

Analysts often describe subpopulations that are of interest a priori, and which can be defined by a known combination of covariates. However, increasingly researchers face a selection problem given a large number of possible covariates alongside uncertainty as to which covariates are important for heterogeneity, and what functional form best describes the association between these covariates and treatment effects.

In assessing whether demographic variables are informative in terms of the impact of tou tariffs on load profiles, the Customer-Led Network Revolution project noted:

*... a relatively consistent average demand profile across the different demographic groups, with much higher variability within groups than between them. This high variability is seen both in total consumption and in peak demand.*

One reason for this finding might be that asking which individual demographic variables are important when considering the impact of energy policies ignores the fact that many of these variables should be considered together, in a multiplicative fashion. For example, it may be the (unknown) combination of income, household size, education, and daily usage patterns that describes a particularly responsive or unresponsive group.

Throughout the course we make reference to the problem of identifying the distributional effects of some intervention without succumbing to the problems of data mining (multiplicity). Here we examine the empirical problem of identifying the characteristics of winners and losers following the introduction of a Time-of-Use (tou) pricing scheme, where the price per kWh of electricity usage depends on the time of consumption. The pricing scheme is enabled by smart meters, which record consumption every half-hour.

Using machine learning methods we describe the association between the effect of tou pricing schemes on household electricity demand and a range of variables that are observable before the introduction of the new pricing schemes.

- L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Klein-Verlag, 1990.
- J. Friedman, T. Hastie, R. Tibshirani. The Elements of Statistical Learning. Springer, 2009.

Morning Session | Afternoon Session | Evening Session |
---|---|---|
10am-12pm | 1pm-3pm | 3:30pm-5pm |

The course is designed both to provide the tools to undertake projects using machine learning (ml) and, critically, to ensure that participants understand and can communicate how the methods work.

Towards this objective, on Day 1, Session 1 we introduce participants to the vernacular of machine learning tools.

In Session 2 we will further explore the links between ml, econometrics and data mining. We also examine how ml utilises data mining tools, suitably adapted to allow inference. The course is designed in such a way as to ensure that participants are given the necessary context to understand the genesis of ml methods. To this end, the first point of departure reviews the ordinary least squares estimator and provides links to ml using kernel density estimation. We also provide the necessary links to econometrics and nonparametric statistics.
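To make the link between the parametric benchmark and nonparametric methods concrete, the sketch below places the closed-form ordinary least squares fit for a single regressor next to a Nadaraya-Watson kernel regression. Kernel regression is used here as a close relative of the kernel density estimation mentioned above, not as the course's own example; the toy data, the Gaussian kernel and the function names are assumptions.

```python
import math

def ols_fit(xs, ys):
    """Closed-form OLS for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def kernel_regression(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of ys."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 1 + 2x
intercept, slope = ols_fit(xs, ys)
local_fit = kernel_regression(xs, ys, x0=2.0, bandwidth=0.5)
print(intercept, slope, local_fit)
```

OLS imposes a global functional form, while the kernel estimator only averages nearby observations. The bandwidth plays the role of a tuning parameter: a very large bandwidth pushes every prediction towards the overall sample mean.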

Course Notes: Overview, Prediction and Evaluation

- High-level overview of Machine Learning and AI
- Machine Learning: The Vernacular
- The Nature of Prediction Problems
- Prediction, Evaluation and Causal Inference

Course Notes: ML and Econometrics, Point Dep OLS

- Econometrics
- Machine Learning: Tools and Vernacular
- Bias Variance Trade-off
- Regularisation
- Multiplicity and P-values
- Ensemble Learning
- Point of Departure I: The Ordinary Least Squares Estimator

Day 2, Session 1 begins with the second point of departure - high dimensional methods in statistics. These methods are used when analysts face a big data problem in terms of which of a large set of explanatory variables to include in a regression model.

We follow this with a practical where participants can explore the use of regularised regression tools with a number of empirical applications.
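As a flavour of what regularised regression does, the following is a minimal sketch of the lasso fitted by coordinate descent, written in plain Python for transparency rather than speed. The simulated design (eight covariates, only two of which matter), the penalty values and the function names are assumptions for illustration; in practice one would use an established implementation and choose the penalty by cross-validation or a plug-in rule.

```python
import random

def soft_threshold(value, lam):
    """Shrink value towards zero by lam; return exactly zero if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of x_j with the partial residual
            z = 0.0    # sum of squares of x_j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

rng = random.Random(0)
n, p = 100, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Assumed truth: only the first two covariates affect the outcome.
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

beta = lasso(X, y, lam=0.5)
print([round(b, 2) for b in beta])
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why the lasso performs variable selection. The surviving coefficients are shrunk towards zero relative to OLS, which is one motivation for post-selection and double lasso procedures.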

In Session 2 we provide an introduction to a number of machine learning methods including regression trees and forests. This is then followed by a practical where we examine the use of ml methods for prediction.

Course Notes: Point Dep II High Dimens Methods, Applications of Regularised Regression

- High Dimensional Methods
- Least Absolute Shrinkage and Selection Operator (lasso)
- Choosing λ
- Causal Inference in High-Dimensions
- lasso For Treatment Models
- Double lasso
- Practical: Regularized Regression

Course Notes: ML and Decision Trees

- Machine Learning and Decision Trees - 1.5hr
- Machine Learning: Terminology and Concepts
- An Overview of Regression Trees
- The Bias-Variance Trade-off
- Training, Testing and Cross Validation
- Regularization: Variance reduction and Ensemble Learning
- Practical: Machine Learning for Prediction
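The topics in this list can be tied together in one sketch: a regression tree grown by recursive binary splitting, with the minimum leaf size chosen by k-fold cross-validation. The simulated data (a sine curve plus noise), the candidate leaf sizes and the function names are illustrative assumptions, and the tree handles a single covariate only.

```python
import math
import random

def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def fit_tree(sample, min_leaf):
    """Grow a regression tree on one covariate; each leaf keeps >= min_leaf points."""
    sample = sorted(sample)
    ys = [y for _, y in sample]
    best = None
    for i in range(min_leaf, len(sample) - min_leaf + 1):
        if sample[i - 1][0] == sample[i][0]:
            continue  # cannot split between identical covariate values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:  # node too small to split: a leaf predicting its mean
        leaf_mean = sum(ys) / len(ys)
        return lambda x: leaf_mean
    i = best[1]
    threshold = (sample[i - 1][0] + sample[i][0]) / 2
    left, right = fit_tree(sample[:i], min_leaf), fit_tree(sample[i:], min_leaf)
    return lambda x: left(x) if x <= threshold else right(x)

def cv_error(data, min_leaf, n_folds=5):
    """Average out-of-fold mean squared error across n_folds folds."""
    total = 0.0
    for f in range(n_folds):
        held_out = data[f::n_folds]
        training = [pt for i, pt in enumerate(data) if i % n_folds != f]
        tree = fit_tree(training, min_leaf)
        total += sum((y - tree(x)) ** 2 for x, y in held_out) / len(held_out)
    return total / n_folds

rng = random.Random(3)
data = []
for _ in range(100):
    x = rng.uniform(0, 2 * math.pi)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))

errors = {m: cv_error(data, m) for m in (1, 5, 20, 50)}
best_min_leaf = min(errors, key=errors.get)
print(errors, best_min_leaf)
```

A minimum leaf size of 1 lets the tree fit the training data perfectly (high variance), while a value of 50 leaves no admissible split in a training fold of 80 points, so the tree collapses to the sample mean (high bias). Cross-validation picks between these extremes using held-out data only.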

On Day 3, Session 1, we review some of the fundamentals of machine learning that have been introduced. This includes the use of ml for prediction, classification and causal effects, alongside the key methodological concepts such as the bias-variance trade-off and methods to achieve regularisation.

- ml for prediction and causal inference
- Machine Learning: the vernacular
- Bias-Variance trade-off, overfitting and prediction
- Regularisation for prediction and causal inference

Session 2 begins with the third point of departure - programme evaluation and treatment effects. We make reference to the work of the Nobel Laureate Esther Duflo, who has made significant contributions to the use of randomised controlled trials, in addition to the utilisation of machine learning methods in this context.

In Session 2 we examine the use of machine learning methods for causal inference. Relative to some econometric methods, ml techniques have sought to exploit so-called big data to provide a coherent approach to uncovering variation in treatment effects without succumbing to the pitfalls of data mining.

This is followed by a practical where we examine the use of ml methods applied to the impact of time-of-use electricity pricing on individual-level demand response. A key question here is whether it is possible to identify characteristics of households that enable policy makers to identify so-called winners and losers once we move to a price system where prices vary throughout the day.
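The idea of searching for winners and losers without data mining can be sketched with a simulated randomised trial and an "honest" sample split: one half of the data is used to select the household characteristic with the largest difference in effects, and the other half is used to estimate that difference. Everything in the simulation is an assumption made for illustration, including the covariate names, the effect sizes and the data-generating process; it is not the course's smart meter data.

```python
import random
from statistics import mean

def effect(sample):
    """Difference in mean peak demand between treated and control households."""
    treated = [h["kwh"] for h in sample if h["tou"] == 1]
    control = [h["kwh"] for h in sample if h["tou"] == 0]
    return mean(treated) - mean(control)

def effect_gap(sample, covariate):
    """Effect among households with the covariate minus effect among those without."""
    with_c = [h for h in sample if h[covariate]]
    without_c = [h for h in sample if not h[covariate]]
    return effect(with_c) - effect(without_c)

rng = random.Random(42)
covariates = ["flexible", "large_household", "high_income", "urban"]  # hypothetical
households = []
for _ in range(4000):
    h = {name: rng.random() < 0.5 for name in covariates}
    h["tou"] = rng.randint(0, 1)  # randomised tariff assignment
    # Assumed truth: only "flexible" households cut peak demand under the tariff.
    h["kwh"] = 5.0 - (1.0 if h["flexible"] and h["tou"] else 0.0) + rng.gauss(0, 1)
    households.append(h)

# Honest split: select the subgroup on one half, estimate on the other.
selection, estimation = households[:2000], households[2000:]
chosen = max(covariates, key=lambda c: abs(effect_gap(selection, c)))
honest_gap = effect_gap(estimation, chosen)
print(chosen, round(honest_gap, 2))
```

Because the estimation half played no part in choosing the subgroup, the estimated gap is not inflated by the search over covariates. Honest trees and causal forests apply the same principle with data-driven splits in place of a fixed list of candidates.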

- Causal Inference for Treatment Effects
- Chernozhukov et al. Generic Machine Learning Inference on Heterogenous Treatment Effects
- Honest Estimation
- Forests and Variance Reduction Methods
- Instrumental Variables Forests
- Regression and Classification trees versus iv trees
- Athey, Wager and Tibshirani - Generalised Random Forests (grf)
- Applications
- Time of Use Tariffs and Smart Meter Data: Heterogeneous Treatment Effects

- L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Klein-Verlag, 1990.
- Random Forests. https://en.wikipedia.org/wiki/Random_forest.
- Training, Validation, and Test sets. https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
- J. Friedman, T. Hastie, R. Tibshirani. The Elements of Statistical Learning. Springer, 2009.
- G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
- S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach. 3rd edition, 2009.

- L. Breiman. Statistical Modelling: The Two Cultures. Statistical Science, Vol. 16, No. 3, 2001, pp. 199-215.
- S. Athey. The Impact of Machine Learning on Economics. In The Economics of Artificial Intelligence: An Agenda, 2018. National Bureau of Economic Research. See http://bit.ly/2EENtvy
- S. Athey, G. Imbens. Machine Learning Methods Economists Should Know About. Working Paper, 2019, Graduate School of Business, Stanford University.
- S. Mullainathan, J. Spiess. Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, vol. 31, 2017, pp. 87-106.
- A. Belloni, V. Chernozhukov, C. Hansen. High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives, 28(2):29-50, 2014.

- Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
- Additional discounts are available for multiple registrations.
- Delegates are provided with temporary licences for the principal software package(s) used in the delivery of the course. It is essential that these temporary training licences are installed on your computers prior to the start of the course.
- Payment of course fees is required prior to the course start date.
- Registration closes 1 calendar day prior to the start of the course.
- 100% fee returned for cancellations made more than 28 calendar days prior to the start of the course.
- 50% fee returned for cancellations made between 14 and 28 calendar days prior to the start of the course.
- No fee returned for cancellations made less than 14 calendar days prior to the start of the course.

**The number of attendees is restricted. Please register early to guarantee your place.**