Data Science: Supervision 1 (Additional Exercises)

The following questions are intended for further practice.

Maximum Likelihood Estimate (MLE)

A very inexperienced archer shoots an arrow \(n\) times at a disc of (unknown) radius \(\theta\). The disc is hit every time, but at completely random places. Let \(r_1, \ldots, r_n\) be the distances of the various hits to the center of the disc.

  1. Show that, given \(\theta\), the pdf of the distance to the center is \(f_\theta(r) = \frac{2r}{\theta^2}\) for \(0 \leq r \leq \theta\).
  2. Determine the Maximum Likelihood estimate for \(\theta\).
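As a sanity check for part 1 (not part of the exercise), a short simulation can confirm the stated pdf: under \(f_\theta(r) = 2r/\theta^2\) the mean distance is \(\mathbb{E}[r] = 2\theta/3\), which the empirical mean should match. The sampling recipe \(r = \theta\sqrt{U}\) follows from inverting the cdf \((r/\theta)^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0        # true (unknown to the archer) disc radius
n = 100_000

# Uniform hits on the disc: the distance has cdf (r/theta)^2,
# so r = theta * sqrt(U) with U ~ Uniform(0, 1).
r = theta * np.sqrt(rng.uniform(size=n))

# Under the claimed pdf f(r) = 2r/theta^2, E[r] = 2*theta/3.
print(np.mean(r))  # should be close to 2*theta/3
```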

Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a Rayleigh distribution, which is the continuous distribution with pdf \(f_\theta(x) = \frac{x}{\theta^2}\exp{\lbrace -\frac{1}{2} \frac{x^2}{\theta^2} \rbrace}\) for \(x\geq 0\). Determine the maximum likelihood estimator for \(\theta\).
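Whatever closed form you derive here can be checked numerically: simulate Rayleigh data and maximise the log-likelihood over a grid of candidate \(\theta\) values. This sketch assumes nothing about the answer; the inverse-cdf sampler \(x = \theta\sqrt{-2\ln U}\) follows from the given pdf.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 1.5
# Rayleigh samples via the inverse cdf: x = theta * sqrt(-2 ln U)
x = theta_true * np.sqrt(-2 * np.log(rng.uniform(size=10_000)))

def log_lik(theta):
    # log f_theta(x) = log x - 2 log theta - x^2 / (2 theta^2), summed over the sample
    return np.sum(np.log(x) - 2 * np.log(theta) - x**2 / (2 * theta**2))

grid = np.linspace(0.5, 3.0, 2001)
theta_hat = grid[np.argmax([log_lik(t) for t in grid])]
print(theta_hat)  # should land near theta_true
```

The grid maximiser should agree (up to grid resolution) with whatever closed-form estimator you obtain by differentiating the log-likelihood.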

Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf

\[f_\theta(x) = \frac{\theta}{(x + 1)^{\theta + 1}} \text{ for } x > 0\]

Determine the MLE for \(\theta\).

Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf

\[f_\theta(x) = \begin{cases} e^{\theta - x} & \text{for }\ x > \theta \\ 0 & \text{for }\ x \leq \theta \end{cases}\]
  1. Determine the MLE for \(\theta\).
  2. Is there anything weird about this distribution?

(+) Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf

\[f_\theta(x) = \frac{1}{2} e^{-\lvert x - \theta \rvert} \text{ for }-\infty < x < \infty\]

Determine the maximum likelihood estimator for \(\theta\).

Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a distribution with pdf

\[f_{\mu, \lambda}(x) = \left( \frac{\lambda}{2\pi x^3} \right)^{1/2} \exp{\lbrace - \lambda (x - \mu)^2 / (2\mu^2 x)\rbrace} \text{ for } x>0\]

Determine the maximum likelihood estimator for \(\mu\) and \(\lambda\).

Suppose that \(x_1, \ldots, x_n\) is a dataset, which is a realisation of a random sample from a binomial distribution with parameters \((k, p)\), where \(p\) is known and \(k\) is unknown. (For example, this could correspond to flipping a coin with a known probability of heads, but not knowing how many times the coin was flipped.)

  1. Write an expression \(L(k \vert \mathbf{x}, p)\) for the likelihood.
  2. Show that \(L(k \vert \mathbf{x}, p) = 0\) for \(k < \max_i x_i\).
  3. Explain why if the MLE is \(k\), then

    \[\frac{L(k \vert \mathbf{x}, p)}{L(k-1 \vert \mathbf{x}, p)} \geq 1 \text{ and } \frac{L(k + 1 \vert \mathbf{x}, p)}{L(k \vert \mathbf{x}, p)} < 1\]
  4. (+++) Show that there is a unique \(k\) that satisfies this. [Do not spend too much time on this]
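Parts 2 and 3 suggest a direct numerical procedure: start at \(\max_i x_i\) and increase \(k\) while the likelihood ratio stays \(\geq 1\). A sketch on made-up data (the sample `xs` and the value of `p` below are purely illustrative):

```python
import math

def log_lik(k, xs, p):
    # log L(k | x, p) = sum_i [ log C(k, x_i) + x_i log p + (k - x_i) log(1 - p) ]
    if k < max(xs):
        return float("-inf")   # part 2: C(k, x_i) = 0 when x_i > k
    return sum(math.log(math.comb(k, x)) + x * math.log(p) + (k - x) * math.log(1 - p)
               for x in xs)

def mle_k(xs, p):
    # Walk k upward from max(x_i); stop once the likelihood starts decreasing,
    # which by part 3 characterises the MLE.
    k = max(xs)
    while log_lik(k + 1, xs, p) >= log_lik(k, xs, p):
        k += 1
    return k

xs = [7, 4, 6, 5, 8, 6]   # hypothetical observed counts
print(mle_k(xs, 0.5))
```

The loop terminates because \((1-p)^{nk}\) eventually drives the likelihood to zero as \(k\) grows; part 4 is what guarantees the first local maximum found this way is the global one.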

(+) The unigram model in natural language processing models the probability of a sentence \(s\) as \(\mathbb{P}(s) = p_{s_1} \cdot p_{s_2} \cdot \ldots \cdot p_{s_n}\), where \(s_1, \ldots, s_n\) are the \(n\) words of the sentence. Given \(M\) sentences \(s^1, \ldots, s^M\), show that the MLE for the parameter \(p_{w}\) is \(\frac{c_w}{W}\), where \(c_w\) is the number of times \(w\) occurs in all sentences and \(W\) is the total number of words in all sentences.
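The estimator you are asked to derive is just relative frequency, so computing it is a one-liner over a toy corpus (the sentences below are made up):

```python
from collections import Counter

sentences = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]  # toy corpus

counts = Counter(w for s in sentences for w in s)  # c_w for every word w
W = sum(counts.values())                           # total number of words
p = {w: c / W for w, c in counts.items()}          # MLE: p_w = c_w / W

print(p["the"])  # "the" occurs 2 times out of W = 7 words
```

Note the estimates form a probability distribution over the vocabulary by construction, since the counts sum to \(W\).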

Linear independence

Are the following sets of vectors linearly independent?

  1. \(v_1 = (1, 2)\) and \(v_2 = (-5, 3)\)
  2. \(v_1 = (1, 2)\) and \(v_2 = (-4, -8)\)
  3. \(v_1 = (2, -1, 1)\), \(v_2 = (3, -4, 2)\) and \(v_3 = (5, -10, -8)\)
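After working these out by hand, you can verify your answers numerically: a set of vectors is linearly independent iff the matrix having them as rows has rank equal to the number of vectors.

```python
import numpy as np

cases = {
    1: [(1, 2), (-5, 3)],
    2: [(1, 2), (-4, -8)],
    3: [(2, -1, 1), (3, -4, 2), (5, -10, -8)],
}

# Full row rank <=> the vectors are linearly independent.
results = {i: np.linalg.matrix_rank(np.array(vs)) == len(vs)
           for i, vs in cases.items()}
print(results)
```

Beware that numerical rank uses a floating-point tolerance, so this check is a convenience rather than a proof.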

Show that if a vector \(v\) belongs to the span of the vectors \(v_1, v_2, v_3\), then \(\lbrace v, v_1, v_2, v_3 \rbrace\) is not linearly independent.

Given \(n + 1\) vectors in \(\mathbb{R}^n\), can they be independent? (Do not give a proof for this).

What is the minimum number \(k\) of vectors in \(\mathbb{R}^n\) that can be linearly dependent?

Give \(n\) vectors in \(\mathbb{R}^n\) which are linearly independent.

Variable transformations

Explain why the softmax transformation solves Exercise 1.6. Find a different transformation that also solves this problem.
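For reference, softmax maps an unconstrained vector \(z \in \mathbb{R}^n\) to a strictly positive vector summing to one. This sketch assumes that is the constraint Exercise 1.6 asks about (the exercise itself is not reproduced here):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)   # shift for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / e.sum()  # entries are positive and sum to 1

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p, p.sum())
```

Note the output preserves the ordering of the inputs, and any additive shift of \(z\) leaves it unchanged, which is worth keeping in mind when hunting for a different transformation with the same property.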

(Optional +) Find a linear-time algorithm to find the shortest distance between any pair of points in a given set of points in 2D, given that the distance between \((x_1, y_1)\) and \((x_2, y_2)\) is \(\lvert x_1 - x_2\rvert + \lvert y_1-y_2\rvert\).