books data scientists should read

Five books every data scientist should read

Whether you’re starting out as a data scientist or you’ve been in the industry for decades, it pays to keep reading alongside your work or studies. We’ve rounded up five books every data scientist should read.

Before we list the books, just a quick recap: data science involves the analysis of data for actionable insights on anything from product development to customer retention to new business opportunities.

And now here are the books data scientists should read.


  1. Data Science from Scratch by Joel Grus

This book covers many of the most fundamental data science tools and algorithms by implementing them from scratch. You’ll need some programming skills and an aptitude for numbers to get to grips with the maths and statistics at the heart of data science.

Some things covered include:

  • A crash course in Python
  • The basics of linear algebra, statistics, and probability, as well as how and when they’re used in data science
  • Collecting, exploring and manipulating data
  • Machine learning

A newer edition of this book is coming out in June 2019.
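
To give a flavour of what “from scratch” means, here is a minimal sketch of a few basic statistics and a dot product written in plain Python, with no libraries beyond the standard `math` module. The function names are our own choices for illustration and are not taken from the book.

```python
from math import sqrt
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    """Sample variance (divides by n - 1); needs at least two values."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standard_deviation(xs: List[float]) -> float:
    """Sample standard deviation, the square root of the variance."""
    return sqrt(variance(xs))


def dot(v: List[float], w: List[float]) -> float:
    """Dot product: the sum of the element-wise products of v and w."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


# Hypothetical sample data, made up for this example
daily_visits = [1.0, 2.0, 3.0, 4.0]
average_visits = mean(daily_visits)
spread = standard_deviation(daily_visits)
```

Writing these small building blocks yourself is slower than calling a library, but it makes clear what the library is doing on your behalf.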


  2. Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic

This is a crucial read for anyone in the data science industry. The book is about organising vast quantities of data and extracting what matters from them. That comes down to getting rid of unclear and excess data, improving data collection processes, and presenting data (data visualization) in relevant and practical ways. It’s a guide to what you should do with the useful data you’ve collected, and how to do it.

Some of the things covered in the book include:

  • Understand the importance of context and audience
  • Determine the appropriate type of graph for your situation
  • Recognize and eliminate the clutter clouding your information
  • Direct your audience’s attention to the most important parts of your data
  • Think like a designer and utilize concepts of design in data visualization
  • Leverage the power of storytelling to help your message resonate with your audience


  3. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Data scientists use machine learning; it’s one of their fundamental techniques. It’s a way to conduct an in-depth analysis of your data by creating a predictive model. This is a great book for beginners, because it assumes you know almost nothing about machine learning. The goal of the book is to supply the concepts, intuitions and tools needed to implement programs capable of learning from data. The techniques covered range from the simplest and most commonly used (e.g. linear regression) to some Deep Learning techniques. The book uses actual production-ready Python frameworks.

Briefly, Scikit-Learn is a great entry point to machine learning. TensorFlow is more complicated and uses data flow graphs to train and run very large, distributed neural networks.
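
To make “creating a predictive model” concrete, here is a minimal sketch of linear regression with a single input, fitted by ordinary least squares. It is written in plain Python rather than Scikit-Learn so that it runs without any installation. The function names and the toy data are our own assumptions, not examples from the book.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Made-up training data that follows y = 2x + 1 exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = predict(slope, intercept, 5.0)
```

The model “learns” the slope and intercept from the data, then uses them to predict a value it has not seen. Libraries such as Scikit-Learn wrap the same idea in a fit-then-predict workflow and scale it to many input features.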


  4. Think Python by Allen Downey

Python is still the leading language for data science. If you’re starting out with Python, this is an excellent first step. This hands-on guide takes you through this important language one step at a time. You’ll start with basic programming concepts and then move on to functions, recursion, data structures and object-oriented design. Every few chapters the book ties the key concepts together and includes relevant case studies. It is aimed at college-level students, as well as self-learners and professionals who need to learn the programming basics.
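
As a taste of the topics mentioned above, here is a short sketch that combines a function, recursion and two built-in data structures (lists and dictionaries). The examples are our own and are not taken from the book.

```python
def nested_sum(items):
    """Recursively add up the numbers in a list that may contain sub-lists."""
    total = 0
    for item in items:
        if isinstance(item, list):
            # Recursion: the function calls itself on the inner list
            total += nested_sum(item)
        else:
            total += item
    return total


def word_counts(text):
    """Count how often each word appears, using a dictionary."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


total = nested_sum([1, [2, 3], [4, [5]]])
counts = word_counts("the cat and the hat")
```

Here `nested_sum` keeps calling itself until it reaches plain numbers, while `word_counts` shows how a dictionary maps each word to a running tally.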


  5. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham and Garrett Grolemund

R is another top programming language for data science. In fact, according to a Stack Overflow blog post, R is growing at a similar rate to Python, though from a smaller initial base. A recent poll in the data science community found that 52.1% of programmers use R versus 52.6% who use Python.

This book will introduce you to R, RStudio and the tidyverse, a collection of R packages which work together to make data science fast and fluent. It’s suitable for readers with no previous programming experience who want to get R-fit as quickly as possible.

You’ll learn the following:

  • Wrangle—transform your datasets into a form convenient for analysis
  • Program—learn powerful R tools for solving data problems with greater clarity and ease
  • Explore—examine your data, generate hypotheses, and quickly test them
  • Model—provide a low-dimensional summary that captures true “signals” in your dataset
  • Communicate—learn R Markdown for integrating prose, code, and results


This article has covered five books data scientists should read. If you’re keen to equip yourself with valuable data science skills and tools required to think like a data scientist, pre-register for HyperionDev’s brand new data science bootcamp today. You’ll learn about machine learning, data analysis and bioinformatics to prepare you for your career as a data scientist.