Courses

Currently, all of our training courses are being held online.

All of our courses are hosted by expert certified trainers and research professionals who teach through a mix of demonstrative and practical sessions to provide high-class, practical training.

You can register for our courses online. To discuss any of our courses or specific training requirements, please call +44 (0) 20 8697 3377.

Machine Learning for Prediction and Causal Inference - Masterclass

7th - 9th December, 2020 (4 hours, 4 hours, 3 hours - per day) Online 3 days (7th December 2020 - 9th December 2020) Stata

Presented By: Dr. Melvyn Weeks (University of Cambridge)

Course Overview

Course Timetable
  • Day 1, 10am-12pm & 2pm-4pm
  • Day 2, 10am-12pm & 2pm-4pm
  • Day 3, 10am-1pm

This course will review the application of machine learning techniques to both prediction problems and so-called causal problems, where a firm or policymaker needs to understand the impact of some form of intervention on a heterogeneous population.

One example is a firm that wishes to understand how a change in pricing affects both aggregate demand and demand in different segments of the population. In another example, a policymaker seeks to understand the impact of an intervention both in terms of some form of average effect and in terms of how individuals differ in the magnitude of the effect. Examples include the impact of job training programmes, the impact of education policies in developing economies, and the differential impact of drugs on survival and recovery.

In this context we make the distinction between the ex post assessment of a change and the ex ante identification of characteristics of individuals that are predictive of the likely impact of such a change.

Using Breiman’s (2001) notion of two cultures in the use of statistical modelling, the course begins with a review of the fundamental differences between machine learning and econometrics.

There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. - Breiman [2001], p199.

We contrast a modelling approach where the analyst makes certain assumptions about model specification, including functional form, with an approach where the data mechanism is presumed unknown. In this context we consider the econometrician’s concern for internal validity, alongside the focus within machine learning on ensuring that a model is robust in the sense of generalising to unseen data (external validity).

The course will focus on topics at the intersection of machine learning and econometrics, covering a mix of theory and applications. In making the distinction between models which are used to solve a prediction problem and models which are used to estimate some form of causal effect, we introduce participants to identification strategies in econometrics. Here it is important to demonstrate how empirical strategies such as unconfoundedness, instrumental variables, and difference-in-differences can be used alongside machine learning methods for prediction.

As a point of departure we make reference to the two broad types of machine learning in terms of supervised and unsupervised learning, making the link to nonparametric regression. We then consider a number of fundamental building blocks, starting with error decomposition in terms of bias and variance, the role of training, estimation and test samples, and the role of regularization as a means to avoid overfitting.
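As a rough illustration of the bias-variance decomposition and of regularization as shrinkage, the following Python sketch simulates an estimator that multiplies the sample mean by a shrinkage factor. The function name, the simulated data and all parameter values are assumptions chosen for illustration; they are not taken from the course material.

```python
import random
import statistics


def simulate_shrunken_mean(true_mean, noise_sd, n_obs, shrink, n_reps=2000, seed=0):
    """Return (bias squared, variance, mean squared error) of shrink * sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Draw a fresh sample and apply the (hypothetical) shrinkage estimator.
        sample = [rng.gauss(true_mean, noise_sd) for _ in range(n_obs)]
        estimates.append(shrink * statistics.fmean(sample))
    bias_sq = (statistics.fmean(estimates) - true_mean) ** 2
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_mean) ** 2 for e in estimates])
    return bias_sq, variance, mse


# shrink = 1.0 is the plain sample mean; shrink = 0.8 pulls it towards zero.
for shrink in (1.0, 0.8):
    bias_sq, variance, mse = simulate_shrunken_mean(1.0, 2.0, 10, shrink)
    print(f"shrink={shrink}: bias^2={bias_sq:.4f} variance={variance:.4f} mse={mse:.4f}")
```

In theory, with these settings the plain sample mean has variance 0.4 and no bias, while the shrunken version has squared bias 0.04 and variance 0.256. Accepting a little bias therefore lowers the overall error, which is the trade-off that regularization exploits.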

In covering the two broad areas where machine learning is used, namely prediction and classification on the one hand and causal effects on the other, we link the exposition in each case to parametric benchmarks. For prediction we consider the piecewise nonlinear regression model and high-dimensional methods; for causal effects we consider the specification of models with instrumental variables and treatment effects.

Participants will also be introduced to the use of ensemble methods as an averaging and regularization device. In this context we will explore a number of general methods for model averaging, including bootstrap aggregation (so-called bagging) and random forests. For machine learning models in prediction, classification and causal effects we provide examples using Stata, R and Python.
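To make the idea of bagging concrete, the sketch below fits a one-split regression tree (a stump) to each bootstrap resample of a small synthetic dataset and averages the predictions. All function names and the data are hypothetical. This is plain bagging rather than a random forest, which would additionally randomise the variables considered at each split.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising the sum of squared errors."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct x in this resample: fall back to the overall mean.
        overall = statistics.fmean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    """Predict with a fitted stump (threshold, left mean, right mean)."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged_stumps(xs, ys, n_models=50, seed=0):
    """Fit one stump per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows]))
    return models


def predict_bagged(models, x):
    """Average the predictions of all stumps."""
    return statistics.fmean([predict_stump(m, x) for m in models])


# Synthetic data (an assumption): a step at x = 0.5 plus Gaussian noise.
data_rng = random.Random(1)
xs = [i / 20 for i in range(20)]
ys = [(1.0 if x >= 0.5 else 0.0) + data_rng.gauss(0.0, 0.1) for x in xs]

models = fit_bagged_stumps(xs, ys)
print(predict_bagged(models, 0.1), predict_bagged(models, 0.9))
```

Averaging over resamples smooths out the dependence of any single stump on the particular sample it was fitted to, which is the sense in which bagging acts as a regularization device.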

Application: Causal Forest Estimation of Heterogeneous Household Response to Time-Of-Use Electricity Pricing Schemes

The introduction of time-of-use electricity prices is an example of a policy with heterogeneous effects. Consumers in different socioeconomic groups, and with distinct historical intra-day load profiles and behavioural characteristics, may respond differently to the introduction of tariffs that charge different prices for electricity at different times of the day. Customers who can adapt their consumption profile to time-of-use (tou) tariffs will accrue a benefit; those who cannot will incur a cost. Those who consume electricity at more expensive peak periods, and who are unable to change their consumption patterns, could end up paying significantly more.

Analysts often describe subpopulations that are of interest a priori, and which can be defined by a known combination of covariates. However, increasingly researchers face a selection problem given a large number of possible covariates alongside uncertainty as to which covariates are important for heterogeneity, and what functional form best describes the association between these covariates and treatment effects.

In assessing whether demographic variables are informative in terms of the impact of tou tariffs on load profiles, the Customer-Led Network Revolution project noted:

.. a relatively consistent average demand profile across the different demographic groups, with much higher variability within groups than between them. This high variability is seen both in total consumption and in peak demand.

In addition, the question of which demographic variables are important when considering the impact of energy policies ignores the fact that many of these variables should be considered together, in a multiplicative fashion. One reason for this finding might be that, for example, it is the (unknown) combination of income, household size, education, and daily usage patterns that describes a particularly responsive or unresponsive group.

Throughout the course we make reference to the problem of identifying the distributional effects of some intervention without succumbing to the problems of data mining (multiplicity). Here we examine the empirical problem of identifying the characteristics of winners and losers following the introduction of a Time-of-Use (tou) pricing scheme, where the price per kWh of electricity usage depends on the time of consumption. The pricing scheme is enabled by smart meters, which record consumption every half-hour.

Using machine learning methods we describe the association between the effect of tou pricing schemes on household electricity demand and a range of variables that are observable before the introduction of the new pricing schemes.

Readings
  • L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Klein-Verlag, 1990.
  • J. Friedman, T. Hastie, R. Tibshirani. The Elements of Statistical Learning. Springer, 2009.

Regression Modelling using Stata

30th - 31st October 2020 Online 2 days (30th October 2020 - 31st October 2020) Stata

Presented By: Dr. Malvina Marchese

Regression modelling is a fundamental tool in the research toolbox of every economist, econometrician or applied researcher in a variety of fields. Join Dr Malvina Marchese and learn the statistical theory behind linear and non-linear regression methods. These methods are taught with specially chosen datasets, using real examples from macroeconomic and finance research.

This course is for researchers from all academic disciplines who are new to Stata. The course assumes only limited statistical knowledge and experience of using statistical software.

Panel Data Econometrics - in collaboration with Lancaster University (online)

5 - 6 November, 2020 (Online 4 hours per day + 1 hour Q&A) Online 2 days (5th November 2020 - 6th November 2020) Stata

Prof. Sébastien Laurent, Aix-Marseille University

Panel data econometrics has developed rapidly over the last decades.

Longitudinal data are increasingly available to researchers, and methods to analyse these data are in high demand among scholars from different fields.

The course offers a comprehensive overview of panel data methods with Stata, covering static and dynamic linear models.

Each session briefly introduces the different methodologies, discussing strengths and weaknesses with a focus on the interpretation of the results.

By the end of the two-day online course, participants should be able to prepare panel data for analysis in Stata, choose the relevant model, obtain the parameter estimates and interpret the results.

Machine Learning using Stata: Introduction & Advanced - in collaboration with Lancaster University (online)

26th - 27th October & 9th - 10th November 2020 Online 4 days (26th October 2020 - 10th November 2020) Stata

Course Overview: Part one

Recent years have witnessed an unprecedented availability of information on social, economic, and health-related phenomena. Researchers, practitioners, and policymakers now have access to huge datasets (the so-called “Big Data”). This data is collected on people, companies and institutions, web and mobile devices and satellites, at an increasing speed and detail.

Machine learning is a relatively new approach to data analytics, which sits at the intersection of statistics, computer science, and artificial intelligence. Its primary objective is to turn information into knowledge and value by “letting the data speak”. Machine learning limits prior assumptions about data structure, and relies on a model-free philosophy that favours algorithm development, computational procedures, and graphical inspection over tight assumptions, algebraic development, and analytical solutions. Machine learning was computationally infeasible until a few years ago. It has become possible only with today's machines, with their increased computing power and ability to learn, hardware development, and continuous software upgrading.

This course is a primer on machine learning techniques using Stata. Stata has various packages to perform machine learning, which are however little known to many Stata users. This course fills this gap by familiarising participants with Stata's potential to draw knowledge and value from raw, large, and possibly noisy data. The teaching approach will be based mainly on graphical language and intuition rather than on algebra. The training will make use of instructional as well as real-world examples, and will evenly balance theory and practical sessions.

After the course, participants are expected to have an improved understanding of Stata's potential to perform some of the most used machine learning techniques, thus becoming able to master research tasks including, among others:

  • (i) factor-importance detection
  • (ii) signal-from-noise extraction
  • (iii) correct model specification
  • (iv) model-free classification, both from a data-mining and a causal perspective.

Course Overview: Part two

No prior knowledge of machine learning techniques is required to attend this course, as the first session will start from scratch with a fresh introduction to the subject. This course will focus on three specific techniques not covered in the first part of the course: regression and classification trees (including bagging, random forests, and boosting), kernel-based regression, and global methods (step-wise, polynomial, spline, and series regressions).

The teaching approach will be based mainly on graphical language and intuition rather than on algebra. The training will make use of instructional as well as real-world examples, and will evenly balance theory and practical sessions.

After the course, participants are expected to have an improved understanding of Stata's potential to perform some of the most used machine learning techniques, thus becoming able to master research tasks including:

  • (i) factor-importance detection,
  • (ii) signal-from-noise extraction,
  • (iii) model-free regression and classification, both from a data-mining and a causal perspective.

The course is open to people coming from all scientific fields, but it is particularly targeted to researchers working in the medical, epidemiological and socio-economic sciences.

Econometrics of Program Evaluation Using Stata

30 Nov - 1 Dec, 2020 (10am - 12pm & 2pm - 4pm, GMT) Online 2 days (30th November 2020 - 1st December 2020) Stata

Presented By: Dr. Giovanni Cerulli

This course will provide participants with the essential tools, both theoretical and applied, for the proper use of modern micro-econometric methods for policy evaluation and causal counterfactual modelling under the assumptions of both “selection on observables” and “selection on unobservables”. The course will cover these approaches: Regression adjustment (parametric and nonparametric), Matching (on covariates and on propensity score), Reweighting and Double-robust methods, and Difference-in-differences methods.
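As a minimal illustration of the last of these approaches, the Python sketch below computes the simplest two-group, two-period difference-in-differences estimate from group means. The function name and the outcome values are made up for illustration. The estimate has a causal reading only under the parallel-trends assumption, which the code does not check. The course itself works in Stata.

```python
import statistics


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences of outcome means."""
    treated_change = statistics.fmean(treated_post) - statistics.fmean(treated_pre)
    control_change = statistics.fmean(control_post) - statistics.fmean(control_pre)
    return treated_change - control_change


# Made-up outcomes for illustration only.
effect = did_estimate(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0],
    control_post=[9.0, 11.0],
)
print(effect)  # 4.0
```

Here the treated group's mean rises by 5 and the control group's by 1, so the change attributed to the intervention is 4.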

Time Series Analysis & Modelling Using Stata

10 - 11 December 2020 (CT 9am - 5pm) Online 2 days (10th December 2020 - 11th December 2020) Stata

Presented By: Dr George Naufal (Public Policy Research Institute (PPRI), Texas A&M University)

Time series data are now collected for many phenomena in the social and empirical sciences. Initially collected at yearly or quarterly frequency, time series data are now used in marketing analytics, financial technology, and other fields in which data are collected at much smaller intervals (daily, hourly and even by the minute). This course focuses on the fundamental concepts required for the analysis, modelling and forecasting of time series data and provides an introduction to the theoretical foundation of time series models alongside a practical guide to the use of time series analysis techniques implemented in Stata 15. The course is based on the textbook by S. Boffelli and G. Urga (2016), Financial Econometrics Using Stata, Stata Press Publication.

Stata Winter School Online

14th - 19th December, 2020 (5.5 hours of live teaching over 3 sessions: 10.00-12.00; 13.00-15.00; 15.30-17.00 (GMT)) Online 5 days (14th December 2020 - 19th December 2020) Stata

Course Overview

The Stata Winter School consists of a series of one and two-day courses which can be taken individually or as a whole as required. The School is aimed at students, academics and professionals who want to develop and strengthen their data processing, programming, graphics and statistical skills using Stata. All of the courses are taught interactively using a blend of theory, follow-along demonstrations and exercises.

For the first time, we will be running our winter school entirely online, so you can join from the comfort of your home, anywhere in the world.

The course timetable: 5.5 hours of live teaching over 3 sessions: 10.00-12.00; 13.00-15.00; 15.30-17.00 (GMT). Each session will include time for Q&A.


Timberlake Consultants