5: Probability Density Estimation
After reviewing the slides, I have a question about slides 13f. It's not quite clear to me why ML (maximum likelihood) doesn't give us a useful estimate, since the suggested remedy is to put a prior on the mean.
1. How do we compute/estimate the prior?
--> Do we just compute the mean of our data X and then multiply it by p(Theta)?
2. Why do we want to put the prior on the mean?
3. I don't understand the trick of rewriting p(x|X) as a conditional probability. Why do we rewrite p(x|X) as an integral over p(x,Theta|X)? And why do we disregard the normalization term (i.e. the total probability) p(x) [I think it gets cancelled because of the chain rule for probabilities(?)]?
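
For reference, this is how I would write out the step in question 3. It is only a sketch in my own notation, not copied from the slides, and it assumes that the new point x is conditionally independent of the data X once Theta is known:

```latex
% Sketch of the predictive distribution; \theta stands for Theta.
\begin{align*}
p(x \mid X) &= \int p(x, \theta \mid X)\, d\theta
  && \text{(marginalization over } \theta\text{)} \\
&= \int p(x \mid \theta, X)\, p(\theta \mid X)\, d\theta
  && \text{(chain rule)} \\
&= \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta
  && \text{(assumption: } x \text{ independent of } X \text{ given } \theta\text{)}
\end{align*}
% Posterior over the parameter, by Bayes' rule:
\[
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)},
\qquad p(X) = \int p(X \mid \theta)\, p(\theta)\, d\theta
\]
```

Written this way, no separate division by p(x) shows up. The only normalization is p(X), and it sits inside the posterior p(Theta|X). That may be what the slides mean, but I am not sure.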