Recommended reading



Introductory statistics

Practical Statistics for Medical Research
Doug Altman

The best all-round introduction to statistics, even if you have no intention of working with medical data. The emphasis is on practical tips for managing the data and understanding why you would choose a particular analytical method. It's quite old now, but the basics haven't changed. The only drawback for me is that it pre-dates the GAISE and Cobbist movement.

Statistics: unlocking the power of data
Lock, Lock, Lock, Lock & Lock

Another really good introduction, with lots of real-life data throughout that you can play with to really understand what different methods are doing. This is the first textbook to take the GAISE guidelines' approach to inference through simulation. The accompanying website, StatKey, is a masterpiece of online, simulation-based teaching.

Applied Statistics: principles and examples
David Cox & Joyce Snell

It is worth tracking down a copy of this book, which is still outstanding more than 30 years after publication. The first third of the book is about principles; the remainder is the really unique aspect: a set of 24 detailed examples, where the authors analyse real-life data and explain all the thought processes that go into it. I don't know of any other book or website that will give you an over-the-shoulder view of great statisticians at work. The intervening years have not supplanted the practical methods that they employ.

Exploratory Data Analysis
John Tukey

This epoch-defining, idiosyncratic book from the 1970s remains quite unique as a guide to thinking about data, to devising simple tools to explore them, and to keeping your feet on the ground as things get more complex. It is out of print and second-hand copies tend to be expensive, but you might be able to order a copy into your library. It will be worth the wait!

Introduction to Statistical Investigations
Nathan Tintle et al

A new textbook from highly experienced and forward-thinking stats educators at the heart of the GAISE movement.

Medical applications and study designs

Essential Medical Statistics
Betty Kirkwood & Jonathan Sterne

A more epidemiologically-focussed and mathematical introduction than Altman, but useful for those working with observational study designs in biomedical applications who don't mind a dose of algebra.

Modern Epidemiology
Kenneth Rothman, Sander Greenland & Timothy Lash

The ultimate companion to all observational study designs, their conduct and analysis. This book contains many subtle details on real-life scenarios that are ignored in other textbooks on epidemiological research / observational studies.

Clinical Trials: a practical approach
Stuart Pocock

A classic (and not outdated) short overview of all aspects of clinical trials: design, randomisation, monitoring, ethics, analysis and many common pitfalls along the way.

Data science and machine learning

Computer-Age Statistical Inference
Brad Efron & Trevor Hastie

A bold attempt to bring classical (frequentist and Fisherian), Bayesian and machine learning methods and theory into one book. It largely succeeds! The authors are not shy of algebra and calculus, but as with EoSL (see below), do not use it unless it illuminates some important bit of deeper understanding for the reader. A very fine chapter on neural networks and deep learning.

The Elements of Statistical Learning: data mining, inference and prediction
Trevor Hastie, Rob Tibshirani & Jerome Friedman

An excellent introduction to machine learning / data mining methods and ways of thinking about problems. Comprehensive and logical without being weighed down by theorem-lemma writing. If you have learnt about statistics, this is a natural second step to gain familiarity with – and confidence to use – machine learning methods such as random forests and deep learning. And this entire book is free online!

Professional practice

Past, Present and Future of Statistical Science
various artists

A collection of essays by former winners of COPSS awards. Their insights into what it means to be a statistician are invaluable. There are many words of advice for young people, never patronising, often inspiring. I have a book review of it appearing soon in Significance (April 2017).

Keynote talk at Interaction '15
Mike Monteiro

Monteiro is a designer, and this is a talk about the real-world challenges of being a designer. Not which pencil to pick, but how to make money and not compromise your principles: how to be a professional. If you mentally replace designer with statistician or data scientist, you will learn a lot from this talk. I highly recommend it. There's a lot of swearing, by the way.

Deep Work: Rules for Focused Success in a Distracted World
Cal Newport

Doing data analysis means long periods of work with high levels of concentration. Many workplaces don't get this and expect instant replies to e-mails and such. At the same time, you can distract yourself nowadays more than ever before. You will find this a serious and much under-rated challenge. I took several pragmatic ideas from this book and practice them daily.

Data visualisation

Design For Information
Isabel Meirelles

A unique overview of forms, design, perception and technology, beautifully illustrated throughout. A must for the budding datavizzer.

Interactive Data Visualization for the Web
Scott Murray

Certainly the best single introduction to interactive graphics using D3, this book assumes no prior web coding knowledge (but see below for some JavaScript tips).

Storytelling With Data
Cole Nussbaumer Knaflic

A great all-round introduction and overview, with lots on perception. More trad than Meirelles, easier for the newcomer but maybe not stretching enough for the old hand. Worth reading though.

Exploratory Data Analysis
John Tukey

It's that book again. EDA gets mentioned in two places because it is a great guide both to thinking analytically and to visualising for analysis and inventing new techniques that are simple and quick but informative too.

Bayesian methods

Bayesian Data Analysis
Andrew Gelman et al

A comprehensive and deep account of Bayesian theory and practice in the early 21st century. Insightful and practical. As Tukey's book says EDA down the spine, this says BDA, and it is sufficiently masterful for that not to be pretentious in the least. You need this book, maybe not now, but soon.

The BUGS Book
David Lunn et al

An approachable and practical little book that runs through what you need to know about BUGS software, Bayesian modeling, and then gives loads of code for loads of models. Really, you should be using Stan for most of these, but this is still worth having and studying closely.

Handbook of Markov Chain Monte Carlo
ed. Brooks, Gelman, Jones & Meng

The ideal reference for more technical, methodological users of Bayesian methods and MCMC simulation in general. Clearly written throughout, without waffle.

Programming and software for data analysis

The R Primer
Claus Thorn Ekstrøm

A short introduction to R, focussing on getting practical results rather than programming niceties.

The R Inferno
Pat Burns

A programming book like no other. You'll laugh, you'll cry, you'll burn in hell ... but you'll also become a better R programmer. This would be especially useful for those unfamiliar with functional programming and vector-based operations, like if you've only ever learnt a little C++. It's also free online!

A Short Introduction To Stata For Biostatistics
Michael Hills & Bianca de Stavola

A short and practical introduction to Stata, focussing on techniques for biomedical research but accessible to all; the programming language is used throughout rather than the drop-down menus.

Multilevel and Longitudinal Modeling Using Stata
Sophia Rabe-Hesketh & Anders Skrondal

I've said on several occasions that it is worth learning Stata just so you can learn multilevel models from this book – and I am not joking. I've never encountered such a clear, thorough but totally practical introduction to these essential modeling techniques (and I've tried a lot). Now in two volumes and worth keeping up to date because a lot of new Stata commands have come in for these models.

Eloquent JavaScript
Marijn Haverbeke

A quite brilliantly written introduction to JavaScript that soon gets you writing like the cool kids: functionally. If you don't know what that means, don't worry about it, just know that you will stand out with your intelligent questions at the next hip JS meetup. And it's free online!

Learning Python
Mark Lutz

This is how I (and countless others) learnt some Python, which is pretty essential if you are to pass as a data scientist in the early 21st century. Lutz draws the venom out of those fangs. You'll soon be bitten (that's enough - ed.)

Introductory Statistics For Health & Nursing Using SPSS
Louise Marston

A short introduction to SPSS with a good quantity of screenshots, especially aimed at healthcare applications and taking the reader up to two-group comparisons and correlations.

How To Use SPSS Syntax
Manfred te Grotenhuis & Chris Visscher

A gentle introduction to programming SPSS through its "syntax". An ideal second step after getting to grips with the GUI.

Philosophy of science, and other odds and ends

Inference To The Best Explanation
Peter Lipton

I return to this book over and over, always finding more to think about. It really informs how I use data to learn about the world.

Public Policy In An Uncertain World
Charles Manski

A compelling argument to acknowledge uncertainty and reject the trappings of certainty. Lots of good examples, lots of practical ideas. The author knows this field inside-out.

Matrix algebra: theory, computations, and applications in statistics
James Gentle

To be a serious stats master, you have to grapple with matrix algebra. This is the only book I've come across that introduces the subject with data science in mind. It makes the whole subject much more approachable and memorable. Gentle by name, gentle by nature.