In E1Q11, you computed the MLE using a single sample.
We have two biased coins with probabilities \(p_1\) and \(p_2\). We are going to perform \(N\) coin tosses, starting with the first coin and switching to the second coin after \(K\) tosses (where \(K\) is chosen at the beginning).
In the lecture notes, you derived the MLE for the Binomial distribution when \(p\) is unknown. Now, we will investigate the case where \(n\) is unknown.
Suppose that \(k\) is a realisation of a random sample from a binomial distribution with parameters \((n, p)\), where \(p\) is known and \(n\) is unknown. (For example, this could correspond to flipping a coin with a known probability, but not knowing how many times the coin was flipped).
Show that
\[\frac{\mathrm{lik}(n+1)}{\mathrm{lik}(n)} \geq 1 \Rightarrow n \leq \frac{k}{p} - 1\]
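If you want to sanity-check the bound numerically, here is a minimal sketch (with made-up values of \(k\) and \(p\)) that brute-forces the likelihood over candidate values of \(n\) and compares the maximiser with \(\lfloor k/p \rfloor\), which the inequality above suggests:

```python
import numpy as np
from scipy import stats

k, p = 7, 0.3                          # made-up observed count and known p
ns = np.arange(k, 200)                 # candidate values of n (need n >= k)
loglik = stats.binom.logpmf(k, ns, p)  # log-likelihood at each candidate n
print(ns[np.argmax(loglik)])           # brute-force maximiser
print(int(k // p))                     # floor(k / p) for comparison
```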
In Exercise 1.1.2 of the lecture notes, you used the plug-in principle, also known as the invariance principle.
Are the MLE estimators biased?
Download the dataset containing the total expenses per MP. Using the analysis of E1Q5, compare the average expenses for MPs in London and MPs outside London. Do the assumptions apply here?
(optional) Perform a similar analysis for the average expenses between parties (or among genders). (You will need to find another dataset that maps MPs to their parties).
(+) The unigram model in natural language processing models the probability of a sentence \(s\) as \(\mathbb{P}(s) = p_{s_1} \cdot p_{s_2} \cdot \ldots \cdot p_{s_n}\), where \(s_1, \ldots, s_n\) are the \(n\) words of the sentence. Given \(M\) sentences \(s^1, \ldots , s^M\), show that the MLE for the parameter \(p_{w}\) is \(\frac{c_w}{W}\), where \(c_w\) is the number of times word \(w\) occurs in any sentence and \(W\) is the total number of words in all sentences.
Hint: This question is essentially asking you for the MLE of the multinomial distribution.
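Once you have the derivation, a small sketch (with a hypothetical toy corpus) makes the claimed estimate concrete:

```python
from collections import Counter

# A hypothetical toy corpus standing in for the M sentences
sentences = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]

counts = Counter(w for s in sentences for w in s)  # c_w for each word w
W = sum(counts.values())                           # total number of words
p_hat = {w: c / W for w, c in counts.items()}      # the claimed MLE c_w / W
print(p_hat["the"])                                # 2 / 7
```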
A very inexperienced archer shoots an arrow \(n\) times at a disc of (unknown) radius \(\theta\). The disc is hit every time, but at completely random places. Let \(r_1, \ldots, r_n\) be the distances of the various hits to the center of the disc. Determine the maximum likelihood estimator for \(\theta\).
Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a Rayleigh distribution, which is the continuous distribution with pdf \(f_\theta(x) = \frac{x}{\theta^2}\exp{\lbrace -\frac{1}{2} \frac{x^2}{\theta^2} \rbrace}\) for \(x\geq 0\). Determine the maximum likelihood estimator for \(\theta\).
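To check your answer, you can maximise the log-likelihood numerically on simulated data and compare with your closed form; a sketch with a made-up \(\theta\) (note that NumPy's rayleigh sampler uses exactly this \(\theta\) as its scale argument):

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
theta_true = 2.5                                   # made-up true parameter
x = rng.rayleigh(scale=theta_true, size=10_000)

def neg_loglik(t):
    # Negative log-likelihood of the Rayleigh pdf given above
    return -np.sum(np.log(x) - 2 * np.log(t) - x**2 / (2 * t**2))

res = optimize.minimize_scalar(neg_loglik, bounds=(0.1, 10), method="bounded")
print(res.x)   # compare with your closed-form estimator on the same data
```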
Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf
\[f_\theta(x) = \frac{\theta}{(x + 1)^{\theta + 1}} \text{ for } x > 0\]
Determine the MLE for \(\theta\).
Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf
\[f_\theta(x) = \begin{cases} e^{\theta - x} & \text{for }\ x > \theta \\ 0 & \text{for }\ x \leq \theta \end{cases}\]
Determine the maximum likelihood estimator for \(\theta\).
(+) Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf
\[f_\theta(x) = \frac{1}{2} e^{-\lvert x - \theta \rvert} \text{ for }-\infty < x < \infty\]
Determine the maximum likelihood estimator for \(\theta\).
Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf
\[f_{\mu, \lambda}(x) = \left( \frac{\lambda}{2\pi x^3} \right)^{1/2} \exp{\lbrace - \lambda (x - \mu)^2 / (2\mu^2 x)\rbrace} \text{ for } x>0\]
Determine the maximum likelihood estimator for \(\mu\) and \(\lambda\).
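This is the inverse Gaussian (Wald) distribution, and NumPy can sample from it directly, so you can again check your derivation numerically; a sketch with made-up parameters:

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
mu_true, lam_true = 2.0, 3.0                  # made-up true parameters
x = rng.wald(mean=mu_true, scale=lam_true, size=10_000)

def neg_loglik(params):
    # Negative log-likelihood of the pdf given above
    mu, lam = params
    return -np.sum(0.5 * np.log(lam / (2 * np.pi * x**3))
                   - lam * (x - mu) ** 2 / (2 * mu**2 * x))

res = optimize.minimize(neg_loglik, x0=[1.0, 1.0],
                        bounds=[(1e-3, None), (1e-3, None)])
print(res.x)   # compare with your closed-form estimators of mu and lambda
```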
(+) Create your own MLE exercise.
Exercise [Linear transformation]: Consider r.v. \(X\) and \(Y = aX + b\) for constants \(a\) (\(a \neq 0\)) and \(b\). Find the cdf of \(Y\) in terms of the cdf of \(X\).
Exercise [Constant transformation]: Consider r.v. \(X\) with cdf \(F_X\) and the transformation \(Y = f(X) = a\) where \(a\) is a constant. Find the pdf of \(Y\).
Exercise [Cubic transformation]: Consider r.v. \(X \sim U[0, 1]\) and \(Y = X^3\). Find the cdf and pdf of \(Y\).
Exercise: Redo exercise 1 from Example Sheet 1.
Exercise [Sine transformation]: Suppose \(X \sim U[0, 2\pi]\). Consider \(Y = \sin^2(X)\). Find the cdf for \(Y\).
Exercise [Square transformation]: Suppose \(X\) is a continuous random variable. Find the cdf and pdf for \(Y = X^2\).
Exercise [Normal-gamma squared relationship]: Let \(X \sim \mathcal{N}(0, 1)\), show that \(Y = X^2\) follows a gamma distribution.
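Once you have identified the gamma parameters, a quick Kolmogorov–Smirnov check on simulated data can confirm them (the shape and scale below are placeholders to substitute with your derived values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=100_000) ** 2        # samples of Y = X^2 with X ~ N(0, 1)

shape, scale = 0.5, 2.0                  # substitute your derived parameters
# A large p-value means the sample is consistent with Gamma(shape, scale)
print(stats.kstest(y, "gamma", args=(shape, 0, scale)))
```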
Write efficient NumPy expressions for the following:
Counterexamples for fmin: find example functions where
- scipy.optimize.fmin finds a local minimum.
- scipy.optimize.fmin fails to find a local minimum.
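As a starting point, here is a minimal sketch (with a made-up double-well function) where fmin, started at \(x_0 = 1\), settles in a local rather than the global minimum:

```python
import numpy as np
from scipy import optimize

def f(x):
    # Made-up double-well function: global minimum near x = -1.30,
    # local minimum near x = 1.13
    return x**4 - 3 * x**2 + x

print(optimize.fmin(f, x0=1.0, disp=False))   # lands near 1.13, not -1.30
```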
Repeat E1Q14, but this time don’t assume that the inflection point is at \(1980\).
Extension of E1Q17.
Recommended reading: Chapter 7 from “The Science of Uncertainty”.
Extension of E2Q1.
We are given a random sample \(x_1, \ldots , x_n\) from the Poisson distribution \(\mathrm{Po}(\theta)\), where \(\theta \sim \Gamma(\alpha, \beta)\) for constant \(\alpha\) and \(\beta\). Determine the posterior \(\Pr(\theta \mid x_1, \ldots , x_n)\).
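If you want to verify your posterior numerically, here is a minimal sketch (assuming the shape–rate parameterisation of \(\Gamma(\alpha, \beta)\) and made-up values) that evaluates the unnormalised posterior on a grid; the same template adapts directly to the exponential version below:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.0             # made-up prior shape and rate
x = rng.poisson(lam=3.0, size=50)  # synthetic data from a made-up theta

# Unnormalised log-posterior on a grid: log prior + log likelihood
theta = np.linspace(1e-3, 10, 2000)
log_post = (stats.gamma.logpdf(theta, a=alpha, scale=1.0 / beta)
            + stats.poisson.logpmf(x[:, None], theta).sum(axis=0))

post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])   # normalise numerically

# Compare `post` pointwise with the pdf of the closed-form posterior you
# derived, e.g. stats.gamma.pdf(theta, a=..., scale=...)
```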
We are given a random sample \(x_1, \ldots , x_n\) from the exponential distribution \(\mathcal{E}(\theta)\), where \(\theta \sim \Gamma(\alpha, \beta)\) for constant \(\alpha\) and \(\beta\). Determine the posterior \(\Pr(\theta \mid x_1, \ldots , x_n)\).
Example 7.14 from “The Science of Uncertainty”.
Are the following vectors linearly independent?
Show that if a vector \(v\) belongs to the span of vectors \(v_1, v_2, v_3\), then \(\lbrace v, v_1, v_2, v_3 \rbrace\) is not linearly independent.
Using NumPy (np.linalg.matrix_rank), determine whether the following vectors are linearly dependent:
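A minimal usage sketch, with hypothetical vectors standing in for the exercise's ones:

```python
import numpy as np

# Hypothetical example vectors (not the ones from the exercise)
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])
v3 = np.array([7.0, 8.0, 9.0])    # v3 = 2*v2 - v1, so the set is dependent

A = np.stack([v1, v2, v3])        # one vector per row
rank = np.linalg.matrix_rank(A)
print(rank, rank < len(A))        # rank 2 < 3 vectors -> linearly dependent
```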
Given \(n + 1\) vectors in \(\mathbb{R}^n\), can they be linearly independent? (Do not give a proof for this).
What is the minimum number \(k\) of vectors in \(\mathbb{R}^n\) that can be linearly dependent?
Give \(n\) vectors in \(\mathbb{R}^n\) which are linearly independent.
Extension of E3Q8. Create a similar confidence band for some other dataset from the lecture notes where you did prediction, e.g. the Iris dataset.
Perform a hypothesis test for whether the dataset's variance changes with the square of the mean.
[Normal confidence interval] Derive a confidence interval for the mean of the normal distribution (with known variance) given \(n\) samples. Why do we usually choose confidence intervals of equal tails?
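Once you have the derivation, here is a short sketch (with made-up data and a known \(\sigma\)) that computes the equal-tailed interval your formula should reproduce:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, n, alpha = 2.0, 100, 0.05          # made-up known sd, size and level
x = rng.normal(loc=5.0, scale=sigma, size=n)

z = stats.norm.ppf(1 - alpha / 2)         # equal-tail critical value
half = z * sigma / np.sqrt(n)
print(x.mean() - half, x.mean() + half)   # the confidence interval
```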
[Uniform distribution] In this exercise, you will construct (meaningful) confidence intervals for the parameter of the uniform distribution. You are given i.i.d. r.vs. \(X_1, \ldots , X_n \sim U[0, \theta]\).
For each of the datasets presented in the course, collect the inference tasks and visualisations that you applied to them. Are there some tasks that you can apply only to certain types of datasets?
Now it is your turn to look in more depth at the datasets you have seen.
For the police stop-and-search dataset, find an interesting hypothesis or some sort of modelling task and apply the techniques you learnt to solve it. Use visualisations that support your insights.
For the iris dataset, find an interesting hypothesis or some sort of modelling task and apply the techniques you learnt to solve it. Use visualisations that support your insights.
Repeat the previous exercise with several datasets, but there is no need to write code.
(optional) [Extended multiplication rule] Show that \(\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1) \, \mathbb{P}(A_2 \mid A_1) \cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1})\).
[Decomposition] Show that if \(X\) is independent of \((A, B)\) then it is independent of \(A\) and independent of \(B\).
[Weak Union] Show that if \(X\) is independent of \((A, B)\), then \(X\) is independent of \(A\) given \(B\).
[Contraction] Show that if \(X\) is independent of \(A\) given \(B\), and \(X\) is independent of \(B\), then \(X\) is independent of \((A, B)\).
[Machine translation] (optional) In this exercise, you will prove the famous probability formulations of the IBM models. In the context of statistical machine translation, we have a source sequence \(\mathbf{s}\) (where \(s_i\) is the \(i\)-th element and \(s_i^j\) denotes the elements \(s_i , \ldots , s_j\)), a target sequence \(\mathbf{t}\) and an alignment \(\mathbf{a}\), indicating connections between the source and the target sequences.
Show that assuming independence between \(a\) and \(t\) given \(s\),
\[\mathbb{P}(t, a, m \vert s) = \mathbb{P}(m \vert s) \prod_{i=1}^{\lvert t \rvert} \mathbb{P}(a_i \vert a_1^{i-1}, s, m) \, \mathbb{P}(t_i \vert t_1^{i-1}, s, m)\]
Show that
\[\mathbb{P}(t, a, m \vert s) = \mathbb{P}(m \vert s) \prod_{i=1}^{\lvert t \rvert} \mathbb{P}(t_i, a_i \vert a_1^{i-1}, s, m)\]
Show that \(\mathbb{P}(t_j \vert t_1^{j-1}, a_1^j, s, m) \cdot \mathbb{P}(a_j \vert t_1^{j-1}, a_1^{j-1}, s, m) = \mathbb{P}(a_j, t_j \vert t_1^{j-1}, a_1^{j-1}, s, m)\). Deduce that
\[\mathbb{P}(t, a, m \vert s) = \mathbb{P}(m \vert s) \prod_{i=1}^{\lvert t \rvert} \mathbb{P}(t_i \vert t_1^{i-1}, a_1^i, s, m) \cdot \mathbb{P}(a_i \vert t_1^{i-1}, a_1^{i-1}, s, m)\]
(Optional) At the station there are three payphones which accept 20p pieces. One never works, another always works, while the third works with probability 1/2. On my way to the metropolis for the day, I wish to identify the reliable phone, so that I can use it on my return. The station is empty and I have just three 20p pieces. I try one phone and it does not work. I try another twice in succession and it works both times. What is the probability that this second phone is the reliable one?
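You can check your answer to the payphone puzzle by simulation; a sketch (with a made-up trial count) that conditions on the observed evidence:

```python
import numpy as np

rng = np.random.default_rng(0)
p_works = {"never": 0.0, "always": 1.0, "half": 0.5}
consistent = reliable = 0

for _ in range(200_000):
    first, second, _ = rng.permutation(list(p_works))  # random identities
    # Keep only runs matching the evidence:
    # the first phone fails once, the second works twice
    if (rng.random() >= p_works[first]
            and rng.random() < p_works[second]
            and rng.random() < p_works[second]):
        consistent += 1
        reliable += second == "always"

print(reliable / consistent)   # compare with your exact answer
```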
Parliament contains a proportion \(p\) of Party A members, who are incapable of changing their minds about anything, and a proportion \(1−p\) of Party B members who change their minds completely at random (with probability \(r\)) between successive votes on the same issue. A randomly chosen member is noticed to have voted twice in succession in the same way. What is the probability that this member will vote in the same way next time?
[Polya Urn] The Polya Urn model is as follows. We start with an urn which contains one white ball and one black ball. At each second we choose a ball at random from the urn and replace it together with one more ball of the same colour. Calculate the probability that when \(n\) balls are in the urn, \(i\) of them are white.
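In the spirit of checking results by simulation, here is a sketch (with a made-up urn size and run count) estimating the distribution of the number of white balls:

```python
import numpy as np

rng = np.random.default_rng(0)
n, runs = 10, 100_000     # made-up: stop when the urn holds n balls
freq = np.zeros(n)        # freq[i]: how many runs end with i white balls

for _ in range(runs):
    white, total = 1, 2
    while total < n:
        if rng.random() < white / total:   # a white ball is drawn
            white += 1
        total += 1                         # one ball is added either way
    freq[white] += 1

print(freq / runs)        # compare these frequencies with your formula
```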
This section is under construction
The following handouts have a lot of problems on Markov Chains (some outside the scope of this class):
When you solve one of these problems, try to think about computational aspects of the Markov Chain. Run simulations and see if these match the theoretical result.
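For instance, here is a minimal sketch (with a made-up 3-state chain) comparing empirical visit frequencies from a long simulation against the stationary distribution obtained by linear algebra:

```python
import numpy as np

rng = np.random.default_rng(0)
# A made-up 3-state transition matrix (each row sums to 1)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.4, 0.4]])

# Empirical state frequencies along a long simulated trajectory
steps, state = 100_000, 0
visits = np.zeros(3)
for _ in range(steps):
    state = rng.choice(3, p=P[state])
    visits[state] += 1

# Theoretical stationary distribution: left eigenvector for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()
print(visits / steps, pi)   # the two vectors should roughly agree
```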