A collection of job hunt resources for data science roles

This is a collection of resources that I have found useful in my job hunt for data science roles (both new grad and internships).

Books

  • Introduction to Probability by Joseph K. Blitzstein and Jessica Hwang
  • Mathematical Statistics and Data Analysis by John A. Rice
  • Statistical Inference by George Casella and Roger L. Berger
  • An Introduction to Statistical Learning with Applications in R/Python by Gareth James and others
  • The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
  • Probabilistic Machine Learning series by Kevin P. Murphy
  • Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
  • Learning Spark by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee
  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • Ace the Data Science Interview by Kevin Huo and Nick Singh
    • Definitely the best for interview prep. The exercises are awesome and I personally encountered a couple of them in interviews.
  • A Practical Guide to Quantitative Finance Interviews by Xinfeng Zhou
  • Heard on the Street: Quantitative Questions from Wall Street Job Interviews by Timothy F. Crack
    • This one and Zhou’s book are more geared towards quant roles, but the chapters on probability and statistics are still relevant for data science roles. The problems are also just fun to work through.

Prep Websites

  • LeetCode
    • De facto standard for DSA questions. Also useful for SQL/Pandas problems.
  • HackerRank
    • Practice problems are okay, but the focus should be on getting familiar with the platform (e.g. parsing verbose problem statements, debugging code with custom test cases, working with the wonky integrated JupyterLab IDE for data science problems, general time management, etc.). Almost all of my online assessments were through HackerRank.
  • CodeSignal
    • The exercises on CodeSignal are honestly pretty bad, but I highly recommend getting familiar with the platform. You can also find technical briefs for their assessment frameworks, like the “Data Science Skills Evaluation Framework”, floating around online as PDFs, which go into more detail about the types of questions you might see.
    • Unfortunately, it’s kind of hard to simulate the CodeSignal environment without actually doing a CodeSignal assessment, which you can only get when a company invites you to take one. I had to tank a few before I got the hang of it. For the Data Science Framework, brush up on probability, ML fundamentals, and data manipulation and simple modeling in Python (e.g. Pandas, NumPy, scikit-learn); it’s a massive time crunch. For the General Coding Assessment, answer the questions in the order 1 -> 2 -> 4 -> 3: questions 1 and 2 should be freebies that take a couple of minutes total, 3 is implementation-heavy, and 4 requires significantly more thought (it usually involves a dictionary/hashmap trick to avoid TLE, i.e. a time limit exceeded error) but is also worth the most points.
  • Sean Prashad’s LeetCode patterns
  • NeetCode
    • Both NeetCode and Sean Prashad’s list are just collections of LeetCode problems. However, they are organized by topic and difficulty very effectively, which is helpful for targeted practice and gaining intuition for common patterns. I found the NeetCode 150 list most helpful when I was first learning data structures and algorithms, and I used the Blind 75 list to refresh in subsequent years. If you can get to the point where you can solve most mediums comfortably, you should be fine. For data science roles, dynamic programming rarely shows up at most companies, but I have gotten DP questions in online assessments at hedge funds.
  • DataLemur
    • Super useful for practicing SQL problems.
  • StrataScratch
    • Another good resource for SQL problems. They also have a couple of data challenges on the free tier that are good practice for take-home assignments.
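
As a rough illustration of the dictionary/hashmap trick mentioned under CodeSignal above, here is a minimal sketch in Python. The problem (counting pairs that sum to a target) and the function names are my own hypothetical example, not an actual assessment question. The idea is to replace a nested loop with a single pass that looks up previously seen values in a hash map.

```python
from collections import Counter


def count_pairs_brute_force(nums, target):
    """O(n^2): check every pair (i, j) with i < j. Likely to hit TLE on large inputs."""
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                count += 1
    return count


def count_pairs_hashmap(nums, target):
    """O(n): for each value, look up how many earlier values complete the sum."""
    seen = Counter()  # value -> number of times it has appeared so far
    count = 0
    for x in nums:
        count += seen[target - x]  # Counter returns 0 for missing keys
        seen[x] += 1
    return count


# Hypothetical example input
nums = [1, 5, 3, 3, 7, -1]
fast = count_pairs_hashmap(nums, 6)
slow = count_pairs_brute_force(nums, 6)
```

Both functions return the same count; the second just trades a bit of memory for a single pass over the input, which is usually the difference between passing and timing out on the hidden test cases.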

YouTube Channels

  • StatQuest with Josh Starmer
    • Lowkey a bit childish and hand-wavy, but the explanations are very clear and concise. You can get a good high-level understanding of a lot of topics from this channel in a short amount of time.
  • 3Blue1Brown

Blogs

  • Towards Data Science
    • There are an incredible number of garbage articles floating around, but there is some gold if you look hard enough.
  • Substack
    • I don’t have any specific recommendations, but there are a lot of data science newsletters, and the content is usually pretty high quality, unlike Towards Data Science and Medium.
  • Yuan Meng’s website

Job Boards

  • LinkedIn
  • Simplify
    • They also have a Chrome extension that automates a lot of the application process. Do yourself a favor and use it; it will save you a lot of time.
  • Handshake
  • Jobright / newgrad-jobs
  • SimplyHired
  • RippleMatch
    • Super hit or miss. I personally didn’t have any luck with it.
  • WayUp
  • Glassdoor
  • Indeed
  • ZipRecruiter
  • Directly on company websites
  • School career portals
    • Depending on your school, there may be opportunities that are exclusive to students at your school or a certain group of schools and are not publicly posted.
  • Various GitHub repos
    • These are good for organizing a lot of the job postings that are floating around on the internet. However, there’s usually a lag between when a job is posted and when it gets added to these repos, so I would recommend using these in conjunction with the other job boards. You ideally want to apply to a job within the first 24 hours of it being posted, so as much of a pain as it is, you should be checking all the job boards every few hours.

Miscellaneous

  • Jake’s resume template
    • I used this template for my resume and it’s very clean and professional. If you know LaTeX, you have a lot of flexibility to customize it to your liking (e.g. spacing, font size, etc.).
  • Kaggle
    • Cool datasets to play around with, and if you look hard enough, you can find some kernels (notebooks) that are very informative with respect to EDA, feature engineering, different cross-validation strategies, more advanced modeling techniques like stacking and blending, etc.
  • arXiv, NeurIPS, ICML, ICLR, ACL, CVPR, etc.
    • Reading papers is a good way to stay up-to-date with the latest research and trends in the field. It’s also good practice for understanding the nuances of different models and algorithms from a theoretical perspective by working through the math.
  • Reddit
    • r/datascience, r/cscareerquestions, r/csMajors, etc.
    • There are some decent resources, advice, and discussions on these subreddits. However, there is also a lot of low-quality content, so take everything with a grain of salt. There are also a lot of negative people on these subreddits, so I would recommend not spending too much time on them for your mental health.
  • Your peers