The other day I went through the list of books I have read over the last half year, and I noticed that most of the popular science books on that list were related to data science in one way or another. I found it fascinating how many different subjects and areas are connected to what we do every day as data analysts and scientists. Therefore, I want to take the opportunity to quickly review some of the books, give my opinion on why they matter for us as data analysts, and maybe give you some inspiration about what to read next.
1. The psychological aspect of data science
I start with two books that originally come from the fields of psychology and economics. Daniel Kahneman and Richard Thaler have each won the Nobel Prize in economics, and both books change the way we think about statistics and data.
Daniel Kahneman's “Thinking, Fast and Slow” is a modern classic. He gives a broad overview of his research into how people reason, judge and make decisions. By now, many have heard of the two different modes of thinking: “system 1”, which is instinctive and fast, and “system 2”, which is slower and more rational. Fascinatingly, probably every other book on this list refers to Kahneman in some way or another. That just shows how influential his ideas and this book are. For me the book is essential in a data science curriculum in two ways: Firstly, a lot of data is generated by humans and it is, therefore, important to understand how some of the data is generated and what biases it could contain. Secondly, the book reflects on how we think about statistics. Kahneman claims that statistical thinking is not natural to most people – not even statisticians. This is a good reminder to be careful when communicating results.
The second book, “Nudge – Improving Decisions About Health, Wealth, and Happiness” by Richard Thaler and Cass Sunstein, is about behavioral economics. The goal of nudging is to change people's behavior without imposing explicit restrictions or bans. The concept has become rather influential in areas ranging from marketing to health promotion. In Germany, the concept is very controversial, and most commentators criticise it as paternalistic. The authors themselves call nudging a form of “libertarian paternalism”. I found the book really interesting for another reason: Data science gives us the opportunity to understand the effects of choice architecture in so many parts of life. This can be used for good or for bad, but the potential damage from thinking about choice architectures and designing them deliberately is smaller than the damage from not thinking about them at all.
2. The societal aspect of data science
This brings me to the second batch of books I have read about the relationship between digitalization, big data, and society.
I started with Jaron Lanier’s “Who Owns the Future?”. Even though it is already a couple of years old, it is still a valuable reflection on the relationship between emergent technologies and our economies. His main argument is that the platform economy is disenfranchising the middle class. Platform services are built on our data and we are giving it to them for free (more or less). At the end of the book, he proposes a kind of micro-payment system which he believes could fix the problem. I enjoyed reading the book, especially because it looks at the emergence of tech companies and their business models not only in technological terms but foremost as a macroeconomic challenge. I think it is crucial that we move beyond the fixation on technical aspects to a more holistic understanding of how our societies change in the age of digitalization.
A book that does this even more so is Thomas L. Friedman’s “Thank You for Being Late – An Optimist’s Guide to Thriving in the Age of Accelerations”. Friedman takes a truly holistic perspective. If you want to know what GitHub and MapReduce have to do with the Syrian refugee crisis and climate change, this book is for you! Friedman describes several recent developments, connects them and puts them into perspective. This book was probably the clearest account I have read of how big data is reshaping our economies and societies. What I particularly like is that Friedman ends his book with actual, pragmatic policy proposals (and not idealistic but far-fetched ideas like Jaron Lanier’s).
A book taking a completely different and very critical perspective is Steffen Mau’s “Das metrische Wir – Über die Quantifizierung des Sozialen” (The metric society). Steffen Mau is a professor of macrosociology in Berlin and his book is a harsh criticism of recent trends in “quantifying everything” – what he calls sociometrics. He discusses rankings, votes, and recommendation systems against the backdrop of their implicit sociological effects and epistemological foundations. The book is by far the most critical and pessimistic book I have read (which was actually the reason I wanted to read it in the first place), but it didn’t offer too many new insights. The examples were discussed in an extremely one-sided way and most of the topics have been covered at great length before: Do we really need another book explaining that Facebook’s “Like Button” is eroding our social interactions? And haven’t measurement problems in surveys been discussed in the social science literature for decades? On a meta level, Mau sees the quantification of everything as an expression of our new neo-liberal society. This criticism is valid without a doubt – but reading a book by Colin Crouch is probably more fruitful in that domain.
3. The epistemological aspect of data science
This brings me to the last two books. Both are more directly related to data science.
The first one is “The Signal and the Noise – Why So Many Predictions Fail – but Some Don't” by Nate Silver. It is a beautiful introduction to the art and science of prediction – one of the key goals of data science. Nate Silver, who is best known for his blog fivethirtyeight.com, has deep knowledge of and enthusiasm for scientific predictions, which he conveys in a clear and inspiring way while staying candid about their limitations and problems. Silver also reflects on the epistemological foundation of statistical forecasting and predictions. The second half of the book is a great introduction to Bayesian epistemology. I loved the book so much that I read it twice, and it then inspired me to do a TED Talk about that matter (find the Talk here). Must read!
The last book I read was Pedro Domingos’ “The Master Algorithm – How the Quest for the Ultimate Learning Machine Will Remake Our World”. Bill Gates called it “one of the most important books about machine learning”, so I bought it. It was indeed a very different introduction to machine learning. Domingos introduces the different epistemic communities and thought processes behind different kinds of machine learning techniques. I was rather surprised to learn that decision trees, support vector machines, deep neural networks and Bayesian learning techniques were developed by quite separate communities over the last decades, and that their philosophical ideas about “learning” and the world in general are very different. That was the good part of the book. In the end, Domingos explains his quest to combine all the different learners to create what he calls “the master algorithm”. While Steffen Mau’s book was too negative and too pessimistic, this book is probably too optimistic and uncritical of the developments in AI.
I hope you made it to the end of the list! And I hope you found something interesting and new on it. Data science is not just about data science. I think it is crucial that we as data enthusiasts keep in mind that what we do transcends just “working with data”. At the same time, it is more important than ever to communicate what we do – for example by writing popular science books. Have you read some of the books and have a different opinion? Was there a book about data science you really enjoyed reading? And where are all the female authors writing about this topic? If you have any suggestions, please let me know on Twitter @jj_mllr.