MACHINE LEARNING or STATISTICAL LEARNING
 
Colin Cameron, Department of Economics, University of California - Davis  October 2023

Machine learning methods for prediction are well-established in the statistical and computer science literature.
Applying machine learning methods for causal inference is a very active area in the economics literature.
A summary such as that in the slides below can become dated very quickly. 

SLIDES: MACHINE LEARNING VERY BRIEF OVERVIEW 2023

This 29-slide overview was presented in October 2023
machlearn2019_Intro_very_brief.pdf

BOOK CHAPTER: 2022

Chapter 28 in A. Colin Cameron and Pravin K. Trivedi, Microeconometrics using Stata: Volume 2 Nonlinear Models and Causal Inference Methods covers Machine Learning Methods for Prediction and for Causal Inference. Click here for book information.

Stata mostly uses the lasso, ridge regression and elastic net. This is enough to provide a good introduction to machine learning methods. Additionally, Stata has some built-in commands for causal inference using the lasso in the partial linear model and the standard binary treatment effects model.

For other machine learners such as neural nets and random forests it is standard to use packages in Python or R. 
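
The three penalized regression methods named above can be illustrated outside Stata as well. The following is a minimal sketch in Python using scikit-learn, not the Stata commands covered in the chapter. The simulated data, the sample sizes, the sparse coefficient vector and the fixed elastic net mixing value l1_ratio=0.5 are all assumptions made only for illustration. Each penalty level is chosen by cross-validation and the methods are compared by out-of-sample mean squared error.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data (an assumption for illustration): 50 regressors,
# of which only the first 5 have nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(size=n)

# Hold out 25% of the sample to measure out-of-sample prediction error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize regressors, then choose the penalty by cross-validation.
models = {
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)),
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))),
    "elastic net": make_pipeline(
        StandardScaler(), ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0)
    ),
}

test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {test_mse[name]:.3f}")
```

Standardizing before penalizing matters because the penalty treats all coefficients symmetrically. The holdout comparison is only one of several reasonable ways to compare the fits.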

SHORT COURSE: 2022

In May 2022 I gave eight hours of lectures on machine learning for econometrics at Simon Fraser University.
Click here for course slides, programs and data sets.

SLIDES: MACHINE LEARNING BRIEF OVERVIEW 2019

This 60-slide overview was presented in June 2019
machlearn2019_Intro_brief.pdf

SLIDES: CAUSAL MACHINE LEARNING FOR ECONOMICS BRIEF OVERVIEW 2020

This 20-slide introduction to causal inference for the partial linear model using the lasso was presented in January 2020
machlearn2020_Causal_Intro_brief.pdf
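
As a rough illustration of the idea behind lasso-based inference in the partial linear model y = alpha*d + g(x) + u, the sketch below uses a simple partialling-out approach in Python with scikit-learn. It is not taken from the slides. The simulated data, the true value alpha = 1, the use of cross-validated lasso for both first-stage fits, and the absence of cross-fitting are all assumptions made for illustration. The steps are: fit y on the controls by lasso, fit d on the controls by lasso, then regress the y residuals on the d residuals.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Simulated data (an assumption for illustration): 100 controls,
# only a few of which matter for the treatment d or the outcome y.
rng = np.random.default_rng(42)
n, p = 1000, 100
X = rng.normal(size=(n, p))
d = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
alpha_true = 1.0
y = alpha_true * d + 1.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)

# Step 1: partial the controls out of y and out of d using the lasso.
res_y = y - LassoCV(cv=5, random_state=0).fit(X, y).predict(X)
res_d = d - LassoCV(cv=5, random_state=0).fit(X, d).predict(X)

# Step 2: OLS of the y residuals on the d residuals (no intercept).
ols = LinearRegression(fit_intercept=False).fit(res_d.reshape(-1, 1), res_y)
alpha_hat = float(ols.coef_[0])

# Heteroskedasticity-robust standard error for the residual regression.
eps = res_y - alpha_hat * res_d
se = float(np.sqrt(np.sum(res_d**2 * eps**2)) / np.sum(res_d**2))

print(f"alpha_hat = {alpha_hat:.3f}, robust se = {se:.3f}")
```

Partialling out both y and d is what makes the estimate of alpha relatively insensitive to small mistakes in the first-stage lasso fits. Published implementations typically add refinements such as theory-driven penalty choices or cross-fitting, which this sketch omits.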

USEFUL TEXTS FOR MACHINE LEARNING (NOT ECONOMICS)

For statistical learning a leading text is the undergraduate / masters level book
ISL2: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2021), An Introduction to Statistical Learning: with Applications in R, Second Edition, Springer.
A free legal pdf is at https://www.statlearning.com/

A Python version of this book is also available.
ISLP: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor (2023), An Introduction to Statistical Learning: With Applications in Python, Springer.

A free legal pdf is at https://www.statlearning.com/

Supplementary material on statistical learning is in the Ph.D. level book
ESL: Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
A free legal pdf is at http://statweb.stanford.edu/~tibs/ElemStatLearn/index.html and a $25 hardcopy can be obtained via http://www.springer.com/gp/products/books/mycopy

Another book that is good but I haven't used is
Bradley Efron and Trevor Hastie (2016)
Computer Age Statistical Inference: Algorithms, Evidence and Data Science,  Cambridge University Press.

USEFUL TEXTS FOR MACHINE LEARNING (FOR ECONOMICS)


The following book is more recent and includes some causal methods

Matt Taddy (2019), Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions, McGraw-Hill.

LEADERS IN ECONOMETRICS

Bringing established machine learning methods into econometrics is currently an active area. The literature focuses on valid statistical inference controlling for first-stage data mining, and causal inference. Leading econometricians include
Victor Chernozhukov    http://web.mit.edu/~vchern/www/
Alex Belloni https://faculty.fuqua.duke.edu/~abn5/belloni-index.html
Christian Hansen http://faculty.chicagobooth.edu/christian.hansen/research/
Susan Athey  https://www.gsb.stanford.edu/faculty-research/faculty/susan-athey   https://people.stanford.edu/athey/research
Guido Imbens    https://www.gsb.stanford.edu/faculty-research/faculty/guido-w-imbens  https://people.stanford.edu/imbens/publications

ONLINE COURSES

Coursera has many courses   https://www.coursera.org/browse/data-science/machine-learning?languages=en

SOME ECONOMICS REFERENCES

This is a very active area. The papers listed below were published between 2011 and 2019.


Machine learning prediction in economics

Hal Varian (2014), "Big Data: New Tricks for Econometrics", Journal of Economic Perspectives, Spring, 3-28.
Sendhil Mullainathan and J. Spiess (2017), "Machine Learning: An Applied Econometric Approach", Journal of Economic Perspectives, Spring, 87-106.
Jon Kleinberg, H. Lakkaraju, Jure Leskovec, Jens Ludwig, Sendhil Mullainathan (2018), "Human Decisions and Machine Predictions", Quarterly Journal of Economics, 237-293.

Surveys of causal inference in economics
Susan Athey (2018), "The Impact of Machine Learning on Economics". http://www.nber.org/chapters/c14009.pdf
Susan Athey and Guido Imbens (2019), "Machine Learning Methods Economists Should Know About."
Alex Belloni, Victor Chernozhukov and Christian Hansen (2014), "High-dimensional methods and inference on structural and treatment effects," Journal of Economic Perspectives, Spring, 29-50. 

Causal inference in economics
Alex Belloni, Victor Chernozhukov and Christian Hansen (2011), "Inference Methods for High-Dimensional Sparse Econometric Models," Advances in Economics and Econometrics, ES World Congress 2010, ArXiv 2011.
Alex Belloni, D. Chen, Victor Chernozhukov and Christian Hansen (2012), "Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain", Econometrica, Vol. 80, 2369-2429.
Alex Belloni, Victor Chernozhukov, Ivan Fernandez-Val and Christian Hansen (2017), "Program Evaluation and Causal Inference with High-Dimensional Data," Econometrica, 233-299.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey and James Robins (2018), "Double/debiased machine learning for treatment and structural parameters," The Econometrics Journal, 21, C1-C68.
Max Farrell (2015), "Robust Estimation of Average Treatment Effect with Possibly more Covariates than Observations", Journal of Econometrics, 189, 1-23.
Max Farrell, Tengyuan Liang and Sanjog Misra (2018), "Deep Neural Networks for Estimation and Inference: Application to Causal Effects and Other Semiparametric Estimands," arXiv:1809.09953v2.
Stefan Wager and Susan Athey (2018), "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," JASA, 1228-1242.

Stata Software
Stata version 16 introduced commands for lasso, ridge, elasticnet and causal inference in the partial linear and related models with exogenous or endogenous regressors.

Python Software
In Spring 2023 I used Python for machine learning at a very introductory level.
Click here for material on getting going with Python and scikit-learn.

A. Colin Cameron / UC-Davis Economics /  http://www.econ.ucdavis.edu/faculty/cameron