Top 10 Conditional Expectation Interview Questions and Answers

We can't lie – Data Science Interviews are TOUGH. Top tech companies ask very hard probability and statistics questions.

That's why we put together 40 real probability interview questions. The answers to all 40 are in our book, Ace The Data Science Interview, which also covers 161 more questions on SQL, Machine Learning, and Product/Business Sense. You can also practice some of these exact questions in DataLemur's statistics interview questions section.

Conditional expectation is a key concept in probability theory and statistics that often comes up in quantitative interviews, especially for roles in finance, data science, and machine learning. Knowing how to calculate conditional expectations and interpret them is crucial.

In this article, I’ll go over the top 10 conditional expectation interview questions frequently asked, provide sample answers, and explain the intuition behind each one. Mastering these will ensure you ace the probability and stats portion of your next interview!

1. What is meant by conditional expectation and how is it calculated?

The conditional expectation E(X|Y) refers to the expected value of the random variable X given the value of another random variable Y. Intuitively, it tells us the average value we expect X to take based on the specific value or realization of Y.

For a given value y of Y, the conditional expectation in the discrete case is:

E(X|Y=y) = ∑_x x·P(X=x|Y=y)

Where P(X=x|Y=y) is the conditional probability of X taking the value x given Y = y. Note that E(X|Y) itself is a random variable – a function of Y. Averaging it over the distribution of Y recovers the unconditional expectation, a result known as the law of total expectation:

E(X) = ∑_y E(X|Y=y)P(Y=y)

So the unconditional expectation is a weighted average of the conditional expectations of X for each possible value of Y, weighted by the probabilities of those Y values occurring.

2. How do you calculate E(X|Y) when X and Y are discrete random variables?

When X and Y are both discrete, we can calculate the conditional expectation directly using the formula:

E(X|Y=y) = ∑_x xP(X=x|Y=y)

Where P(X=x|Y=y) is the conditional probability mass function of X given the value y of Y.

Averaging over the values of Y then recovers the unconditional expectation via the law of total expectation:

E(X) = ∑_y [ ∑_x x·P(X=x|Y=y) ] P(Y=y)

Which is a double summation over all possible values of X and Y.

For example, if X ~ Binomial(10, 0.3) and Y ~ Binomial(10, 0.5) have some joint distribution, then:

E(X|Y=5) = 0·P(X=0|Y=5) + 1·P(X=1|Y=5) + … + 10·P(X=10|Y=5)

Repeat for the other values of Y to get E(X|Y) as a function of Y. (If X and Y happen to be independent, every such sum simply returns E(X) = 3.)
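To make this concrete, here's a minimal Python sketch that computes E(X|Y=y) from a small joint distribution (the pmf table and function name are made up for illustration):

```python
# Toy joint pmf P(X=x, Y=y) -- values are made up for illustration.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

def cond_expectation(joint, y):
    """E(X | Y=y) = sum_x x * P(X=x | Y=y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # P(Y=y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

print(cond_expectation(joint, 0))  # (0*0.10 + 1*0.30 + 2*0.05) / 0.45
```

Averaging `cond_expectation(joint, y)` over P(Y=y) recovers the unconditional mean E(X), matching the law of total expectation above.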

3. How do you interpret the conditional expectation of one random variable given another?

The conditional expectation E(X|Y) can be interpreted as the best predictor of X given knowledge of Y, or the average value we expect X to take based on the observed value of Y.

Some key properties:

  • E(X|Y) is a function of Y, not a constant. As the value of Y changes, the conditional expectation will change.

  • E(X|Y) is the minimum MSE predictor of X given Y. No other function of Y will give a lower MSE when predicting X.

  • E(X|Y=y) will be equal to E(X) if X and Y are independent. The conditional expectation equals the unconditional expectation if knowing Y provides no additional information about X.

4. If Cov(X,Y) = 0, what can you say about E(X|Y)?

If the covariance between two random variables X and Y is 0, then X and Y are uncorrelated. When X and Y are uncorrelated, the conditional expectation E(X|Y) is simply equal to the unconditional expectation E(X):

E(X|Y) = E(X)

This makes intuitive sense – if X and Y are uncorrelated, then knowing the value of Y provides no additional information about the likely value of X. The best prediction of X is simply its mean, regardless of Y.

So when covariance is zero, the conditional expectation equals the unconditional expectation. This provides a quick way to calculate E(X|Y) when you know X and Y are uncorrelated.
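One caveat worth coding up: without an assumption like joint normality, zero covariance alone does not force E(X|Y) to equal E(X). A quick exact check in Python, taking Y uniform on {−1, 0, 1} and X = Y²:

```python
from fractions import Fraction as F

ys = [F(-1), F(0), F(1)]   # Y uniform on {-1, 0, 1}
p = F(1, 3)

E_Y  = sum(p * y for y in ys)          # 0
E_X  = sum(p * y**2 for y in ys)       # E[Y^2] = 2/3
E_XY = sum(p * y**2 * y for y in ys)   # E[X*Y] = E[Y^3] = 0
cov  = E_XY - E_X * E_Y

print(cov)   # 0 -> X and Y are uncorrelated
print(E_X)   # 2/3, yet E(X|Y=y) = y^2 is not constant in y
```

Even though Cov(X, Y) = 0 here, E(X|Y) = Y² varies with Y, so uncorrelated does not imply mean independent.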

5. How do you calculate the conditional expectation of a normal random variable?

For multivariate normal random variables, the conditional expectation takes on a very specific linear form.

Let X and Y be jointly normal with means μX, μY, variances σX^2, σY^2 , and correlation ρ. Then:

E(X|Y) = μX + ρ(σX/σY)(Y − μY)

The conditional expectation is a linear function of Y, with slope equal to the correlation ρ multiplied by the standard deviation ratio.

Intuitively, as Y increases, our expectation of X will increase linearly if X and Y are positively correlated. The slope represents how strongly X is influenced by movements in Y.

For normal variables, you can quickly write down the conditional expectation using this formula rather than evaluating integrals.
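As a sanity check, here's a small Monte Carlo sketch (all parameter values are made up) confirming that the regression slope of X on Y matches ρσX/σY for jointly normal variables:

```python
import math, random

random.seed(0)
mu_x, mu_y, sx, sy, rho = 1.0, -2.0, 2.0, 0.5, 0.6

# Generate correlated normals from two independent standard normals.
xs, ys = [], []
for _ in range(200_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    ys.append(mu_y + sy * z1)
    xs.append(mu_x + sx * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))

# Empirical slope of X on Y is Cov(X,Y) / Var(Y).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_y = sum((y - my) ** 2 for y in ys) / n

print(cov / var_y, rho * sx / sy)  # both near 2.4
```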

6. Suppose X and Y have a bivariate normal distribution. How do you calculate E(X|Y>k) for some constant k?

When X and Y are bivariate normal, we can use properties of truncated normals to calculate conditional expectations like E(X|Y>k).

Let φ and Φ denote the standard normal PDF and CDF, and let α = (k − μY)/σY be the standardized cutoff. Then:

E(X|Y>k) = μX + ρσX · φ(α)/(1 − Φ(α))

The ratio φ(α)/(1 − Φ(α)) is the inverse Mills ratio – the mean of a standard normal truncated to values above α.

The intuition is we are truncating the joint normal distribution to only where Y > k. This changes the expected value of Y in that truncated region, which then affects E(X|Y) through their correlation ρ.

You can also calculate conditional expectations like E(X|Y<k) or E(X|a<Y<b) using similar logic with truncated normal CDFs and densities. The bivariate normal case allows for many nice closed form conditional expectation solutions.
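Here's a Monte Carlo sketch of this formula using only the standard library (the parameter values are made up; Φ is built from math.erf):

```python
import math, random

def phi(z):  # standard normal PDF
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):  # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(1)
mu_x, mu_y, sx, sy, rho, k = 0.0, 1.0, 1.0, 2.0, 0.5, 2.0

# Simulate (X, Y) jointly normal, keep X only when Y > k.
kept = []
for _ in range(400_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y = mu_y + sy * z1
    if y > k:
        kept.append(mu_x + sx * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))

mc = sum(kept) / len(kept)
alpha = (k - mu_y) / sy
closed_form = mu_x + rho * sx * phi(alpha) / (1 - Phi(alpha))
print(mc, closed_form)  # the two estimates should agree closely
```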

7. What is the significance of conditional expectation in regression analysis?

In the linear regression model:

Y = β0 + β1X + ε

Where ε is the error term with E(ε|X) = 0, the conditional expectation E(Y|X) represents the regression function – the expected value of Y given X.

Specifically:

E(Y|X) = β0 + β1X

So the conditional expectation as a function of X gives the predicted value or fitted values from the regression. The regression line is the conditional mean.

This demonstrates the close connection between conditional expectation and regression. Regression analysis seeks to estimate E(Y|X) by finding the linear function of X that best predicts Y on average.
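A quick simulation sketch (coefficients and noise level are made up) showing that OLS recovers the conditional mean function E(Y|X) = β0 + β1X:

```python
import random

random.seed(2)
b0, b1 = 3.0, -1.5  # true (made-up) regression coefficients

xs = [random.uniform(0, 10) for _ in range(100_000)]
ys = [b0 + b1 * x + random.gauss(0, 2) for x in xs]  # Y = b0 + b1*X + noise

# Closed-form OLS estimates of slope and intercept.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
b0_hat = my - b1_hat * mx

print(b0_hat, b1_hat)  # close to 3.0 and -1.5
```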

8. Suppose {Xt} is a martingale process. What can you say about E(Xt|Xt-1)?

For a martingale process, the conditional expectation of the next value given the entire history is just the current value:

E(Xt | X1, …, Xt−1) = Xt−1

This property characterizes the martingale, and is sometimes used as its definition. It means the best predictor of the next value is simply the current one – the past path provides no extra predictive information.

Intuitively, the changes in a martingale are unpredictable and centered around 0. Knowing the past doesn’t help predict the future. So the conditional expectation is simply equal to the current level.
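A simulation sketch of this property for a simple symmetric ±1 random walk (a martingale): among paths that sit at 1 after five steps, the average next value is again about 1.

```python
import random

random.seed(3)
next_vals = []
for _ in range(200_000):
    x = sum(random.choice((-1, 1)) for _ in range(5))  # X_5 of a +/-1 walk
    if x == 1:                                         # condition on X_5 = 1
        next_vals.append(x + random.choice((-1, 1)))   # record X_6

print(sum(next_vals) / len(next_vals))  # ~1.0
```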

9. What is the conditioning property of conditional expectation?

An important property of conditional expectation is:

E(E(X|Y)|Y) = E(X|Y)

That is, conditioning is idempotent: E(X|Y) is itself a function of Y, so taking its conditional expectation given Y returns it unchanged. The companion result, the tower property E(E(X|Y)) = E(X), is used even more often.

This conditioning property is very useful for iteratively calculating conditional expectations. We can break them down into steps while preserving equality at each step.

10. How do you calculate E(g(X,Y)|X) for some function g?

For a general function g(X,Y), we can use the law of iterative expectations:

E(g(X,Y)|X) = E(E(g(X,Y)|X,Y)|X)

Break it down into two steps:

  1. Calculate the inner conditional expectation E(g(X,Y)|X,Y). With both X and Y held fixed, this is just g(X,Y).

  2. Take the expectation of that given X alone.

Intuitively, we hold X fixed at a value x and average g(x, Y) over the conditional distribution of Y given X = x:

E(g(X,Y)|X=x) = ∑_y g(x,y)P(Y=y|X=x)

This allows calculating complex conditional expectations by iteratively conditioning on subsets of the variables.
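The two steps can be sketched in Python on a toy joint pmf (the table, function name, and g below are made-up examples):

```python
# Toy joint pmf P(X=x, Y=y) -- values made up for illustration.
joint = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.10, (1, 1): 0.40,
}

def cond_exp_g(g, x):
    """E(g(X,Y) | X=x) = sum_y g(x,y) * P(Y=y | X=x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)  # P(X=x)
    return sum(g(x, y) * p / p_x for (xx, y), p in joint.items() if xx == x)

g = lambda x, y: x + y * y
print(cond_exp_g(g, 1))  # (1)(0.2) + (2)(0.8) = 1.8
```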

Solutions To Probability Interview Questions

Problem #1 Solution:

We can use Bayes' Theorem here. Let U be the event that we flip the unfair coin and F the event that we flip the fair coin. Since the coin is chosen randomly, we know that P(U) = P(F) = 0.5. Let 5T denote the event of flipping 5 tails in a row. Then we are interested in solving for P(U|5T), i.e., the chance that we are flipping the unfair coin, given that we have seen five tails in a row.

We know P(5T|U) = 1, since by definition the unfair coin always lands tails. Additionally, we know that P(5T|F) = 1/2^5 = 1/32 by definition of a fair coin. By Bayes' Theorem we have:

\[P(U|5T) = \frac{P(5T|U)P(U)}{P(5T|U)P(U) + P(5T|F)P(F)} = \frac{0.5}{0.5 + 0.5 \cdot \frac{1}{32}} = \frac{32}{33} \approx 0.97\]

Therefore the probability we picked the unfair coin is about 97%.
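The arithmetic is easy to verify exactly with Python's fractions module:

```python
from fractions import Fraction as F

p_U = p_F = F(1, 2)    # coin chosen at random
p_5T_U = F(1)          # unfair coin always lands tails
p_5T_F = F(1, 2) ** 5  # fair coin: (1/2)^5 = 1/32

# Bayes' Theorem
posterior = p_5T_U * p_U / (p_5T_U * p_U + p_5T_F * p_F)
print(posterior, float(posterior))  # 32/33 ≈ 0.97
```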

Problem #5 Solution:

By definition, a chord is a line segment whose two endpoints lie on the circle. Two arbitrary chords can therefore be represented by four points chosen on the circle. Choosing which two of the four points form the first chord gives:

\[{4 \choose 2} = 6\]

ways to pick the two points for chord 1 (the remaining two points form chord 2). However, keep in mind that this counts each pairing twice, because a chord with endpoints p1 and p2 is the same as a chord with endpoints p2 and p1. Therefore the proper number of distinct pairings is 6/2 = 3.

Among these three configurations, exactly one makes the chords intersect (the one in which the endpoints alternate around the circle), hence the desired probability is:

\[\frac{1}{3}\]
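A Monte Carlo sketch of the same problem: drop four uniform points on a circle, join the first two as one chord and the last two as the other, and count how often the chords cross. The estimate should hover around 1/3.

```python
import random

random.seed(4)
trials, hits = 100_000, 0
for _ in range(trials):
    angles = [random.random() for _ in range(4)]      # 4 uniform points on the circle
    order = sorted(range(4), key=angles.__getitem__)  # their cyclic order
    # Chord 1 joins points 0,1; chord 2 joins points 2,3.
    # The chords cross iff their endpoints interleave around the circle.
    i, j = sorted((order.index(0), order.index(1)))
    hits += (j - i == 2)

print(hits / trials)  # hovers around 1/3
```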

Problem #13 Solution:

Let X be the number of coin flips needed to get two heads in a row, so we want to solve for E[X]. Let H denote a first flip of heads and T a first flip of tails. Write E[X|H] and E[X|T] for the expected number of additional flips needed, conditioned on the first flip being heads or tails respectively.

Conditioning on the first flip, we have:

\[E[X] = \frac{1}{2}(1 + E[X|H]) + \frac{1}{2}(1 + E[X|T])\]

Keep in mind that E[X|T] = E[X] because we have to start over to get two heads in a row if a tail is flipped.

To find E[X|H], condition on the next flip, which results in either HH or HT.

Therefore, we have:

\[E[X|H] = \frac{1}{2}(1 + E[X|HH]) + \frac{1}{2}(1 + E[X|HT])\]

If the outcome is HH, then E[X|HH] = 0 because the goal has been met. If the outcome is HT, then E[X|HT] = E[X], because after a tail we must start over.

\[E[X|H] = \frac{1}{2}(1 + 0) + \frac{1}{2}(1 + E[X]) = 1 + \frac{1}{2}E[X]\]

Plugging this into the original equation and solving yields E[X] = 6 coin flips.
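The E[X] = 6 answer is easy to confirm by simulation:

```python
import random

random.seed(5)
trials, total = 200_000, 0
for _ in range(trials):
    flips, streak = 0, 0
    while streak < 2:  # flip until two heads in a row
        flips += 1
        streak = streak + 1 if random.random() < 0.5 else 0
    total += flips

print(total / trials)  # ~6.0
```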

Problem #15 Solution:

Consider the first n coins that A flips, versus the n coins that B flips.

There are three possible scenarios:

  • A has more heads than B
  • A and B have an equal amount of heads
  • A has fewer heads than B

In case 1, A will always win (no matter what coin comes up), and in case 3, A will always lose (no matter what coin comes up). By symmetry, these two scenarios have an equal probability of occurring.

Denote the probability of either scenario as x, and the probability of scenario 2 as y.

We know that 2x + y = 1 since these 3 scenarios are the only possible outcomes. Now consider coin n+1. If it lands heads, with probability 0.5, and the first n flips were tied (scenario 2, which happens with probability y), then A wins. Therefore, A's total chance of winning the game increases by 0.5y.

Thus, the probability that A will win the game is:

\[x + \frac{1}{2}y = x + \frac{1}{2}(1 - 2x) = \frac{1}{2}\]
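A quick simulation (n = 10 below is an arbitrary choice) confirming that A wins with probability exactly 1/2:

```python
import random

random.seed(6)
n, trials, wins = 10, 100_000, 0
for _ in range(trials):
    a = sum(random.random() < 0.5 for _ in range(n + 1))  # A's heads in n+1 flips
    b = sum(random.random() < 0.5 for _ in range(n))      # B's heads in n flips
    wins += a > b  # A wins with strictly more heads

print(wins / trials)  # ~0.5
```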

Problem #18 Solution:

Let B_r be the event that all n rolls have a value less than or equal to r. Then:

\[P(B_r) = \frac{r^n}{6^n}\]

since each of the n independent rolls must land on one of the r values {1, …, r}. Let A_r be the event that the largest number rolled is exactly r. We have:

\[B_r = B_{r-1} \cup A_r\]

and since the two events on the right hand side are disjoint, we have:

\[P(B_r) = P(B_{r-1}) + P(A_r)\]

Therefore, the probability of A_r is given by:

\[P(A_r) = P(B_r) - P(B_{r-1}) = \frac{r^n}{6^n} - \frac{(r-1)^n}{6^n}\]
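For small n the formula can be verified exactly by enumerating all 6^n outcomes (n = 3, r = 4 below are arbitrary choices):

```python
from itertools import product

n, r = 3, 4
# Count rolls of n dice whose maximum is exactly r.
count = sum(1 for roll in product(range(1, 7), repeat=n) if max(roll) == r)

print(count, 6 ** n)          # favorable outcomes out of 6^n
print(r ** n - (r - 1) ** n)  # matches the closed-form numerator
```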

20 Statistics Problems Asked By FAANG & Hedge Funds

  • [Facebook – Easy] How would you explain a confidence interval to someone who doesn’t know much about statistics?
  • Say you are running a multiple linear regression and believe several of the predictors are correlated. How can you tell whether they really are, how would that affect the regression results, and what would you do about it?
  • [Uber – Easy] Describe p-values in layman’s terms.
  • [Facebook – Easy] How would you design and test a metric to compare two users’ ranked lists of favorite movies and TV shows?
  • [Microsoft – Easy] Explain the statistical background behind power.
  • [Twitter – Easy] Describe A/B testing. What are some common pitfalls?
  • [Google – Medium] How would you derive a confidence interval from a series of coin flips?
  • Say you model the lifetimes of a group of customers with an exponential distribution with parameter λ, and you have the lifetime history (in months) of n customers. What is your best estimate of λ?
  • Derive the mean and variance of the uniform distribution U(a, b).
  • Say X and Y are each uniform on (0, 1). What is the expected value of the minimum of X and Y?
  • [Spotify – Medium] You draw n samples from a uniform distribution on [0, d]. What is your best estimate of d?
  • Each day you sample once from a normally distributed random variable X ~ N(0, 1). Approximately how many days do you expect to wait before observing a value greater than 2?
  • [Facebook – Medium] Derive the expected value of a geometrically distributed random variable.
  • A coin was flipped 1,000 times and came up heads 550 times. Do you think the coin is fair? If not, explain.
  • [Robinhood – Medium] Say you have n integers 1…n and take a random permutation of them. For integers i ≠ j, a “swap” occurs when i is in the jth position and j is in the ith position. What is the expected total number of swaps?
  • [Uber – Hard] What is the mathematical difference between MLE and MAP?
  • Given the means and standard deviations of two subsets of a dataset, how do you find the mean and standard deviation of the whole dataset? Can this be extended to K subsets?
  • [Lyft – Hard] How do you uniformly sample a random point from a circle with radius 1?
  • [Two Sigma – Hard] Say you keep sampling i.i.d. uniform random variables on [0, 1] until their sum exceeds 1. How many samples do you expect to draw?
  • Given a random Bernoulli trial generator, how do you produce a value sampled from a normal distribution?
