40 Questions on Probability for Data Science

Probability is the foundation of the statistical analysis used to examine big data. Data scientists need a deep understanding of the methods and uses of probability to carry out the complex data analysis they perform in the workplace. You can use probability questions to practice solving real-world statistical problems before you interview for a job as a data scientist.

In this article, we provide examples of the types of probability questions you may be asked to prove your knowledge of this mathematical concept when interviewing for a position in data science.

Probability & Statistics Concepts To Review Before Your Data Science Interview

Probability starts with sample spaces, basic counting, and combinatorial principles. Although you do not need to know all the ins and outs of combinatorics, understanding the basics helps simplify problems. One classic example here is the “stars and bars” counting method.

The other core topic to study is random variables. Concepts such as expectation, variance, and covariance, along with the basic probability distributions, are crucial.

Probability Distributions

For modeling random variables, knowing the basics of various probability distributions is essential. Understanding both discrete and continuous examples, combined with expectations and variances, is crucial. The most common distributions discussed in interviews are the Uniform and Normal, but there are plenty of other well-known distributions for particular use cases (Poisson, Binomial, Geometric).

Most of the time, knowing the basics and their applications should suffice. For example, which distribution models a coin flip? What about the waiting time until an event? It never hurts to be able to derive the expectation, variance, or other higher moments.

Conceptual Probability Questions

These probability questions are designed to test your conceptual knowledge of probability theory. You could be quizzed on the types of distribution, asked to explain the Central Limit Theorem, or asked to describe the application of Bayes’ Theorem. The key to this type of question is to demonstrate not only knowledge of formal probability theory but also the ability to communicate this knowledge to a layperson.

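1. What is the difference between the Bernoulli and binomial distributions?

The Bernoulli distribution models a single trial of an experiment with only two outcomes, while the binomial distribution models the number of successes across n independent trials of that experiment.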
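As a minimal sketch of that relationship, the snippet below computes the binomial probability mass function with Python's standard library and shows that it reduces to the Bernoulli case when n = 1. The function name `binomial_pmf` and the value p = 0.3 are illustrative choices, not part of the original question.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# With n = 1 the binomial distribution is just a Bernoulli distribution.
p = 0.3  # illustrative success probability
bernoulli = {0: 1 - p, 1: p}
binomial_n1 = {k: binomial_pmf(k, 1, p) for k in (0, 1)}

# Probability of exactly 2 heads in 4 fair coin flips: 6 / 16 = 0.375
two_heads_in_four = binomial_pmf(2, 4, 0.5)
```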

2. Explain how a probability distribution could be not normal and give an example scenario.

A probability distribution is not normal if it does not follow the symmetric bell curve, in which most observations cluster around the mean and become rarer further away from it. An example of a non-normal probability distribution is a uniform distribution, in which all values within a given range are equally likely to occur, such as the outcome of rolling a fair die.

3. What is Bayes’ Rule?
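Bayes’ Rule relates a conditional probability to its reverse: P(A | B) = P(B | A) * P(A) / P(B). As a minimal sketch, the snippet below applies it to a hypothetical diagnostic test. The function name and all of the numbers are made up for illustration.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(A | B) computed from P(A), P(B | A) and P(B | not A)."""
    # The law of total probability gives the denominator P(B).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare to begin with, which is the kind of intuition interviewers often probe.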


4. What is the difference between covariance and correlation? Provide an example.

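Covariance can take on any real value and depends on the units of the variables, while correlation is covariance rescaled by the two standard deviations, so it only takes values between -1 (perfect inverse linear relationship) and 1 (perfect direct linear relationship). Therefore, two variables can have a covariance that seems high but only a middling correlation. For example, height and weight have a much larger covariance when height is measured in centimeters rather than meters, yet their correlation is the same.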
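To make the unit dependence concrete, here is a small sketch using only the standard library. The height and weight values are hypothetical, and the helper functions are written out by hand for illustration rather than taken from any particular library.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: heights in meters and weights in kilograms.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55.0, 68.0, 66.0, 80.0, 85.0]

# The same heights expressed in centimeters.
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)    # 100 times larger
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)  # unchanged by the unit change
```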

5. What is the difference between the Central Limit Theorem and the Law of Large Numbers?

The Law of Large Numbers says that the sample mean converges to the population mean as the sample size grows, so the error of the estimate shrinks. The Central Limit Theorem says that as the sample size n becomes large, the distribution of the sample mean can be approximated by a normal distribution, regardless of the shape of the underlying distribution (provided it has finite variance).
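A short simulation can illustrate the two statements side by side. This is a sketch under assumed settings: the Uniform(0, 1) population, the sample sizes, and the fixed seed are all arbitrary choices made for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from Uniform(0, 1), whose true mean is 0.5."""
    return sum(random.random() for _ in range(n)) / n

# Law of Large Numbers: a single sample mean gets close to 0.5 as n grows.
error_large = abs(sample_mean(100_000) - 0.5)

# Central Limit Theorem: sample means of size n are roughly normal, so
# about 95% of them fall within 1.96 standard errors of 0.5.
n, repetitions = 50, 2_000
standard_error = (1 / 12) ** 0.5 / n ** 0.5  # sd of Uniform(0, 1) is sqrt(1/12)
means = [sample_mean(n) for _ in range(repetitions)]
share_within = sum(abs(m - 0.5) <= 1.96 * standard_error for m in means) / repetitions
```

The first quantity is about one sample mean settling on the true value; the second is about the shape of the distribution of many sample means.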

6. What is an unbiased estimator? Give an example for a layperson.

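An unbiased estimator is a statistic whose expected value equals the population parameter it is used to estimate: on average, across many samples, it neither overshoots nor undershoots. An example would be polling a random sample of 1000 voters and using the share who support a candidate to estimate that candidate’s support in the total voting population. In practice, a poll is only unbiased if the sample is truly random; bias usually comes from how the sample is collected.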


Probability Case Study Questions

In this type of probability question, you’ll be given an example scenario and asked to use the given information to calculate a probability. One example might be:

7. You are playing a game with a friend to see who can roll a six on a six-sided die first. You roll first. What’s the probability that you win the game?

This question is testing your ability to apply formal probability knowledge to real-world scenarios. While it’s sometimes possible to brute-force questions like these by modeling all of the different possible outcomes of the scenario, most interviewers won’t be satisfied with that response. They’re looking for you to identify the underlying patterns within the problem and match the “correct,” or most elegant, probability concept to the problem in order to solve it.
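For the dice game in question 7, one pattern to spot is that the game restarts whenever both players miss. The sketch below uses that observation to get an exact answer with the standard library, then cross-checks it against the geometric series. The variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 6)  # chance of rolling a six on any single roll
q = 1 - p           # chance of not rolling a six

# You win on your first roll, or both players miss and the game restarts
# with you rolling first again: P = p + q*q*P, so P = p / (1 - q*q).
p_first_player_wins = p / (1 - q * q)

# Cross-check by summing the geometric series p * (q*q)**k directly.
partial_sum = sum(float(p) * float(q * q) ** k for k in range(200))

print(p_first_player_wins)  # 6/11
```

Rolling first is worth a little more than an even chance, about 54.5%.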

8. Let’s say the probability that a specific item X is at location A is 0.6 and the probability that it is at location B is 0.8. What is the probability that item X would be found in locations A or B?

Let’s define our probabilities:

P(Item at location A) = P(A) = 0.6

P(Item at location B) = P(B) = 0.8

We want the probability that item X is found at location A or location B. Since P(A) + P(B) = 1.4 is greater than 1, the events cannot be mutually exclusive, so we need the general addition rule: P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A and B). The question does not give P(A and B); if we assume the two events are independent, then P(A and B) = 0.6 * 0.8 = 0.48 and P(A ∪ B) = 0.6 + 0.8 - 0.48 = 0.92.
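The arithmetic can be sketched as follows. Independence of A and B is an assumption that the question does not state, so the snippet also shows the bounds that hold without it.

```python
p_a, p_b = 0.6, 0.8

# Assumption (not stated in the question): A and B are independent,
# so P(A and B) = P(A) * P(B).
p_both_independent = p_a * p_b
p_union_independent = p_a + p_b - p_both_independent

# Without that assumption, only bounds on P(A or B) are available.
lower_bound = max(p_a, p_b)        # one event entirely contains the other
upper_bound = min(1.0, p_a + p_b)  # a probability cannot exceed 1

print(round(p_union_independent, 2))  # 0.92
```

Stating the independence assumption out loud, and noting the range from 0.8 to 1 without it, is usually worth doing in an interview.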

9. Imagine a deck of 500 cards numbered from 1 to 500. If all the cards are shuffled randomly and you are asked to pick three cards, one at a time, what’s the probability of each subsequent card being larger than the previously drawn card?

Imagine this as a sample space problem, ignoring all other distracting details. If someone randomly picks three differently numbered unique cards without replacement, then we can assume that there will be a lowest card, a middle card, and a high card.

Let’s make this easy and assume we drew the numbers 1, 2, and 3. In our scenario, if we drew (1,2,3) in that exact order, then that would be the winning scenario.

But what’s the full range of outcomes we could draw?
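The possible orderings can be enumerated directly. The sketch below counts them for the cards (1, 2, 3) and then brute-forces a smaller deck as a sanity check. The 10-card deck is an arbitrary stand-in for the 500-card deck, chosen to keep the loop short.

```python
from fractions import Fraction
from itertools import permutations

# Any three distinct cards can be relabelled low/middle/high, so it is
# enough to look at the orderings of (1, 2, 3).
orderings = list(permutations([1, 2, 3]))
increasing = [o for o in orderings if o[0] < o[1] < o[2]]
p_increasing = Fraction(len(increasing), len(orderings))

# Brute-force check on a smaller deck (10 cards, drawn without replacement).
draws = list(permutations(range(1, 11), 3))
p_small_deck = Fraction(sum(a < b < c for a, b, c in draws), len(draws))

print(p_increasing)  # 1/6
```

Only one of the six equally likely orderings is increasing, and the deck size does not change that.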

10. Let’s say you have a function that outputs a random integer between a minimum value, N, and maximum value, M. Now let’s say we take the output from that function and make it the max value of another random number generator with the same min value N. What would the distribution of the samples look like? What would the expected value of the second function be?

Let X be the result of the first run and Y the result of the second run. Since the integer output is “random” and no additional information is given, we can assume all integers between and including N and M have an equal shot at being selected. Thus, X and Y are discrete uniform random variables with bounds N and M, and N and X, respectively.
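From this setup, E[Y | X] = (N + X) / 2 and E[X] = (N + M) / 2, so E[Y] = (3N + M) / 4. The sketch below computes the exact distribution of Y for one pair of bounds to check that result. The bounds N = 1 and M = 6 and the function name are hypothetical choices for illustration.

```python
from fractions import Fraction

def second_draw_distribution(n: int, m: int) -> dict:
    """Exact P(Y = y) when X ~ Uniform{n..m} and Y | X ~ Uniform{n..X}."""
    p_x = Fraction(1, m - n + 1)
    dist = {y: Fraction(0) for y in range(n, m + 1)}
    for x in range(n, m + 1):
        for y in range(n, x + 1):
            dist[y] += p_x * Fraction(1, x - n + 1)
    return dist

# Hypothetical bounds chosen for illustration.
N, M = 1, 6
dist = second_draw_distribution(N, M)
expected_y = sum(y * p for y, p in dist.items())

print(expected_y)  # 9/4
```

The resulting distribution is not uniform: small values can be reached from every X, so the probabilities decrease steadily from N to M.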

11. Three zebras are sitting on an equilateral triangle, one at each corner. Each zebra randomly picks a direction and runs along the outline of the triangle toward one of the other two corners. What is the probability that none of the zebras collide?

Let’s imagine all of the zebras on an equilateral triangle. Each has two directions to choose from when running along the outline. Since each choice is random, let’s count the outcomes in which they do not collide.

There are really only two such outcomes. The zebras either all run clockwise or all run counter-clockwise; if any two choose different directions, some pair runs toward each other along the same side.

Let’s calculate the probability of each. The probability that every zebra goes clockwise is the product of the probabilities that each zebra chooses clockwise. Given there are two choices (clockwise or counter-clockwise), that is 1/2 * 1/2 * 1/2 = 1/8.

The probability of every zebra going counter-clockwise is the same, 1/8. Summing the two gives the probability that no zebras collide: 1/8 + 1/8 = 1/4, or 25%.
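The same count can be checked by listing all eight direction choices, as in this small sketch. The labels "CW" and "CCW" are just illustrative names for the two directions.

```python
from fractions import Fraction
from itertools import product

# Each zebra independently runs clockwise ("CW") or counter-clockwise ("CCW").
outcomes = list(product(["CW", "CCW"], repeat=3))

# No collision happens only when all three pick the same direction.
safe = [o for o in outcomes if len(set(o)) == 1]
p_no_collision = Fraction(len(safe), len(outcomes))

print(p_no_collision)  # 1/4
```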

12. You call 3 random friends of yours who live in Seattle and ask each independently if it’s raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that “Yes” it is raining. What is the probability that it’s actually raining in Seattle?

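Each friend tells the truth with probability 2/3 and lies with probability 1/3, so the chance that all three lie is 1/3 * 1/3 * 1/3 = 1/27 and the chance that all three tell the truth is 2/3 * 2/3 * 2/3 = 8/27. Under a frequentist reading, if you repeated the calls many times, all three friends would lie in one of every 27 trials.

However, since your friends gave the same answer, you’re not actually interested in all 27 of those trials, as that would include events where your friends had differing answers. Only the all-truth and all-lie events are consistent with three “Yes” answers.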
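Restricting attention to the two consistent events is an application of Bayes’ Rule, which also needs a prior probability of rain. The sketch below makes that explicit. The 50% prior is an assumption for illustration, since the question does not say how often it rains in Seattle, and the function name is hypothetical.

```python
from fractions import Fraction

def p_rain_given_all_yes(prior_rain: Fraction) -> Fraction:
    """P(rain | three independent 'Yes' answers) for a given prior."""
    p_truth, p_lie = Fraction(2, 3), Fraction(1, 3)
    yes_and_rain = prior_rain * p_truth**3     # raining, all three tell the truth
    yes_and_dry = (1 - prior_rain) * p_lie**3  # not raining, all three lie
    return yes_and_rain / (yes_and_rain + yes_and_dry)

# Assumed prior, not given in the question: rain is as likely as not.
print(p_rain_given_all_yes(Fraction(1, 2)))  # 8/9
```

With a different prior the answer changes, for example 8/11 if the prior chance of rain is 1/4, so it helps to state the prior you are using.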

13. You flip a fair coin 576 times. Without using a calculator, calculate the probability of flipping at least 312 heads.

This question requires some memorization. At first glance, we can infer that it’s a binomial distribution problem, given that we are counting the number of heads in a fixed number of trials. Therefore, we’ll use a binomial distribution with n trials and probability of success p on each trial.

The expected number of heads for a binomial distribution is the probability of a success (0.5 for a fair coin) multiplied by the total number of trials (576). So 288 is the expected number of times that our coin flips will turn up heads.

Then, you would have to remember that the standard deviation of the binomial distribution is sqrt(n*p*(1-p)). Here that is sqrt(576 * 0.5 * 0.5) = 12, so 312 heads is exactly two standard deviations above the mean. Using the normal approximation and the rule that about 95% of values fall within two standard deviations of the mean, the probability of at least 312 heads is roughly 2.5%.
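Away from the interview, the mental arithmetic can be checked in code. The sketch below uses the normal approximation (without a continuity correction, which is a simplifying choice) and also sums the exact binomial tail for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 576, 0.5
mean = n * p                     # 288 expected heads
std_dev = sqrt(n * p * (1 - p))  # sqrt(144) = 12

# 312 heads sits exactly two standard deviations above the mean.
z = (312 - mean) / std_dev

# Normal approximation to the binomial (no continuity correction).
p_normal_approx = 1 - NormalDist().cdf(z)

# Exact binomial tail: count the outcomes with at least 312 heads.
p_exact = sum(comb(n, k) for k in range(312, n + 1)) / 2**n
```

Both values land between 2% and 3%, in line with the rule-of-thumb estimate of about 2.5% from the two-standard-deviation rule.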

14. You’re given a fair coin. You flip the coin until either Heads Heads Tails (HHT) or Heads Tails Tails (HTT) appears. Is one more likely to appear first? If so, which one and with what probability?

Given the two scenarios, both sequences need H first. Once the first H appears, a second H on the very next flip, which happens with probability 1/2, is enough to guarantee that HHT appears first.

Why is this the case? Because the coin does not reset: we keep flipping in sequence until we see HHT or HTT in a row. After HH, any further H keeps us at HH and the first T completes HHT, so HTT can no longer come first. HTT, by contrast, needs two tails in a row after the H, and an H in between sends us back to waiting with an H in hand. This makes HHT more likely to occur first than HTT; working through the cases gives a probability of 2/3 for HHT and 1/3 for HTT.
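A second H immediately after the first already accounts for a probability of 1/2, and the remaining cases push the total for HHT higher. The sketch below solves for the exact value and checks it with a seeded simulation. The function name, seed, and number of trials are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

# Exact: let x be P(HHT comes first | the last flip was a lone H).
#   next flip H (1/2): state "HH", from which HHT is certain to come first.
#   next flip T (1/2): state "HT"; a T ends the game with HTT, an H returns to x.
# So x = 1/2 + 1/2 * 1/2 * x, which gives x = 2/3.
half = Fraction(1, 2)
p_hht_first = half / (1 - half * half)

def first_pattern(rng: random.Random) -> str:
    """Flip until HHT or HTT shows up as the last three flips."""
    flips = ""
    while True:
        flips += rng.choice("HT")
        if flips.endswith("HHT"):
            return "HHT"
        if flips.endswith("HTT"):
            return "HTT"

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
share_hht = sum(first_pattern(rng) == "HHT" for _ in range(trials)) / trials
```

Tails before the first H only delay the game, so the same 2/3 applies from the very first flip.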

Introduction

Statistical methods are key to finding answers to the questions we ask of data, and they are the pillars of all machine learning approaches.

In this article, I have curated a list of 25 questions related to Statistics and Probability in Data Science.

General questions

Interviews often begin or end with general questions. Here are some sample questions that an interviewer may ask you about generic topics as you apply for a position in data science:

  • Why are you interested in this position?

  • What made you choose this line of work?

  • Can you tell me about a time you overcame a challenge?

  • What do you do if you fail at a task?

  • What three words would you use to describe yourself?

  • Where do you see yourself in the next five years?

  • What qualities do you admire most in a coworker?

  • Can you describe a time when you took on a leadership role?

  • Who do you consider your mentor?

  • Who do you admire and why?

  • What is your idea of a great work environment?

  • How would you explain your job to somebody who isn’t in your field?

  • How do you handle critique?

  • Do you have any questions for me?
