For the loss function \(J(\theta)\), the \((i, j)\) element of the Hessian
matrix is: \[
H_{ij} = \frac{\partial^{2} J(\theta)}{\partial \theta_i \partial \theta_j}.
\] For the specific logistic regression loss \[
J(\theta) = -\frac{1}{m}\sum_{i = 1}^{m}\left[y^{(i)}\log(h_{\theta}(x^{(i)})) + (1 - y^{(i)})\log(1 - h_{\theta}(x^{(i)}))\right],
\] the first derivative (writing \(h_{\theta}(x) = g(\theta^{T}x)\) for the sigmoid \(g\)) is \[
\frac{\partial J(\theta)}{\partial \theta_i} = -\frac{1}{m}\sum_{k = 1}^{m}\left(x_{i}^{(k)}y^{(k)} - x_{i}^{(k)}g(\theta^{T}x^{(k)})\right),
\] so the second derivative is: \[
\frac{\partial^{2} J(\theta)}{\partial \theta_i \partial \theta_j} = \sum_{k = 1}^{m}\frac{x_{i}^{(k)}x_{j}^{(k)}}{m}\,g(\theta^{T}x^{(k)})\left(1 - g(\theta^{T}x^{(k)})\right).
\] We want to prove that \(z^{T}Hz \geq 0\) for all \(z \in \mathbb{R}^{n}\): \[
z^{T}Hz = \sum_{k = 1}^{m}\sum_{i = 1}^{n}\sum_{j = 1}^{n}\frac{z_{i}x_{i}^{(k)}x_{j}^{(k)}z_{j}}{m}\,g(\theta^{T}x^{(k)})\left(1 - g(\theta^{T}x^{(k)})\right).
\] For each fixed \(k\), the inner double sum is a perfect square: \[
\sum_{i = 1}^{n}\sum_{j = 1}^{n}z_{i}x_{i}^{(k)}x_{j}^{(k)}z_{j} = \left((x^{(k)})^{T}z\right)^{2} \geq 0.
\] The remaining factor satisfies \(0 < g(\theta^{T}x^{(k)}) < 1\), so every term is nonnegative, which finishes the proof.
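As a quick numerical sanity check of this result (a sketch of my own, not part of the assignment; it is self-contained and uses randomly generated data):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Random instance: m examples with n features, arbitrary theta.
    rng = np.random.default_rng(0)
    m, n = 50, 5
    x = rng.normal(size=(m, n))
    theta = rng.normal(size=n)

    # H = (1/m) * X^T diag(g(1 - g)) X, matching the derivation above.
    g = sigmoid(x @ theta)
    hessian = (x.T * (g * (1 - g))) @ x / m

    # Every eigenvalue of the symmetric Hessian should be >= 0 (up to roundoff).
    print(np.linalg.eigvalsh(hessian).min() >= -1e-12)  # expect: True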
Sub Problem (b)

    def main(train_path, eval_path, pred_path):
        """Problem 1(b): Logistic regression with Newton's Method.

        Args:
            train_path: Path to CSV file containing dataset for training.
            eval_path: Path to CSV file containing dataset for evaluation.
            pred_path: Path to save predictions.
        """
        x_train, y_train = util.load_dataset(train_path, add_intercept=True)

        # *** START CODE HERE ***
        model = LogisticRegression(eps=1e-5)
        model.fit(x_train, y_train)
        util.plot(x_train, y_train, model.theta, 'output/p01b.png')
        x_eval, y_eval = util.load_dataset(eval_path, add_intercept=True)
        prediction = model.predict(x_eval)
        np.savetxt(pred_path, prediction > 0.5, fmt='%d')
        # *** END CODE HERE ***
    class LogisticRegression(LinearModel):
        """Logistic regression with Newton's Method as the solver."""
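The class body is cut off here. A minimal sketch of what the solver could look like, directly implementing the gradient and Hessian derived in sub problem (a) (the `LinearModel` base class supplying `theta` and `eps` is assumed from the starter code; this is a sketch, not the graded solution):

        # Sketch only: methods that would sit inside LogisticRegression.
        def fit(self, x, y):
            """Run Newton's Method until the update is smaller than self.eps."""
            m, n = x.shape
            self.theta = np.zeros(n)
            while True:
                g = 1 / (1 + np.exp(-x @ self.theta))    # h_theta(x) for all examples
                gradient = -x.T @ (y - g) / m            # first derivative from (a)
                hessian = (x.T * (g * (1 - g))) @ x / m  # Hessian from (a)
                step = np.linalg.solve(hessian, gradient)
                self.theta = self.theta - step
                if np.linalg.norm(step, 1) < self.eps:
                    break

        def predict(self, x):
            """Return p(y = 1 | x) for each row of x."""
            return 1 / (1 + np.exp(-x @ self.theta))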
Problem 2
Sub Problem (a)
From the known facts \[
p(t = 1|y = 1) = 1\\
p(y = 1|t = 1, x) = p(y = 1|t = 1),
\] and since \(p(t = 1|x)\,p(y = 1|t = 1, x) = p(y = 1, t = 1|x) = p(y = 1|x)\,p(t = 1|y = 1, x)\), we can derive: \[
\frac{p(t = 1|x)}{p(y = 1|x)} = \frac{p(t = 1|y = 1, x)}{p(y = 1|t = 1, x)} = \frac{1}{p(y = 1|t = 1)}.
\] From the problem we know this ratio is a constant.
Sub Problem (b)
From sub problem (a), for validation examples with \(y = 1\) (and hence \(t = 1\), so that \(p(t = 1|x) \approx 1\)) we have: \[
h(x) \approx p(y = 1|x) = \alpha\, p(t = 1|x) \approx \alpha,
\] so \(\alpha\) can be estimated by averaging \(h(x)\) over the labeled positive examples of the validation set.
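In code, this estimate might look as follows (a sketch; `model`, `x_valid`, and `y_valid` are assumed names for the trained classifier and the labeled validation split):

    # Estimate alpha as the mean prediction over labeled positive examples.
    y_pred = model.predict(x_valid)
    alpha = np.mean(y_pred[y_valid == 1])

The original snippet then rescales the predictions by this constant to recover \(p(t = 1|x)\):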
    # Correct the predictions by the constant alpha.
    t_pred_e = y_pred / alpha
    np.savetxt(pred_path_e, t_pred_e > 0.5, fmt='%d')
    # *** END CODE HERE ***
Problem 3
Sub Problem (a)
The Poisson distribution can be rewritten as: \[
p(y;\lambda) = \frac{\lambda^{y}e^{-\lambda}}{y!} = \frac{1}{y!}\exp\{y\log\lambda - \lambda\}.
\] Matching the parameters of the exponential family gives: \[
b(y) = \frac{1}{y!}\\
\eta = \log\lambda\\
T(y) = y\\
a(\eta) = \exp\{\eta\}.
\] Proof finished.
Sub Problem (c)
Taking the derivative of the log-likelihood with respect to \(\theta_j\): \[
\frac{\partial \log p(y^{(i)}|x^{(i)};\theta)}{\partial \theta_j} = x_{j}^{(i)}y^{(i)} - x_{j}^{(i)}\exp\{\theta^{T}x^{(i)}\}.
\] The stochastic gradient ascent update for \(\theta_j\) is therefore: \[
\theta_j := \theta_j + \alpha\left(x_{j}^{(i)}y^{(i)} - x_{j}^{(i)}\exp\{\theta^{T}x^{(i)}\}\right).
\]
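A minimal sketch of how the `PoissonRegression` class used below could implement this rule (assuming the `LinearModel` base class with `step_size` and `eps` from the starter code, and written as full-batch rather than stochastic gradient ascent):

        # Sketch only: methods that would sit inside PoissonRegression.
        def fit(self, x, y):
            """Batch gradient ascent on the Poisson log-likelihood."""
            m, n = x.shape
            self.theta = np.zeros(n)
            while True:
                # Gradient summed over all examples, averaged by m.
                gradient = x.T @ (y - np.exp(x @ self.theta)) / m
                step = self.step_size * gradient
                self.theta = self.theta + step
                if np.linalg.norm(step, 1) < self.eps:
                    break

        def predict(self, x):
            """Expected count: the canonical response E[y|x] = exp(theta^T x)."""
            return np.exp(x @ self.theta)

The driver below then fits this model and plots predicted against true counts.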
Sub Problem (d)

    def main(lr, train_path, eval_path, pred_path):
        """Problem 3(d): Poisson regression with gradient ascent.

        Args:
            lr: Learning rate for gradient ascent.
            train_path: Path to CSV file containing dataset for training.
            eval_path: Path to CSV file containing dataset for evaluation.
            pred_path: Path to save predictions.
        """
        # Load training set
        x_train, y_train = util.load_dataset(train_path, add_intercept=True)

        # *** START CODE HERE ***
        model = PoissonRegression(step_size=lr, eps=1e-5)
        model.fit(x_train, y_train)
        x_out, y_out = util.load_dataset(eval_path, add_intercept=True)
        prediction = model.predict(x_out)
        # Poisson predictions are expected counts, so save them directly
        # rather than thresholding at 0.5.
        np.savetxt(pred_path, prediction)
        plt.figure()
        plt.plot(y_out, prediction, 'bx')
        plt.xlabel('True count')
        plt.ylabel('Predicted count')
        plt.savefig('p03d.png')
        # *** END CODE HERE ***
Problem 4
So the derivative of the negative log-likelihood is: \[
\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i = 1}^{m}\left(x_{j}^{(i)}\,a'(\theta^{T}x^{(i)}) - x_{j}^{(i)}y^{(i)}\right).
\] The positive semi-definiteness of the resulting Hessian follows by the same argument as in Problem 1, so we do not repeat it here.
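For completeness, differentiating once more gives the Hessian, which has the same perfect-square structure as in Problem 1; for exponential-family distributions \(a''(\eta) = \mathrm{Var}(T(y)\,|\,x;\theta) \geq 0\), so: \[
H_{ij} = \frac{\partial^{2} \ell(\theta)}{\partial \theta_i \partial \theta_j} = \sum_{k = 1}^{m} x_{i}^{(k)}x_{j}^{(k)}\,a''(\theta^{T}x^{(k)}), \qquad z^{T}Hz = \sum_{k = 1}^{m} a''(\theta^{T}x^{(k)})\left((x^{(k)})^{T}z\right)^{2} \geq 0.
\]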
Problem 5
Sub Problem (a)
Problem i
Vectorizing \(X, Y, \theta\), we define the diagonal weight matrix: \[
W_{ij} = \begin{cases}\frac{1}{2}w^{(i)} & i = j\\ 0 & i \neq j.\end{cases}
\] With this \(W\), \[
J(\theta) = \frac{1}{2}\sum_{i = 1}^{m}w^{(i)}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2} = (X\theta - Y)^{T}W(X\theta - Y),
\] which is exactly the required form.
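Setting the gradient of this quadratic form to zero gives the closed-form minimizer that the LWR code below relies on (the standard weighted normal equation, stated here for reference): \[
\nabla_{\theta}J(\theta) = 2X^{T}W(X\theta - Y) = 0 \quad\Longrightarrow\quad \theta = (X^{T}WX)^{-1}X^{T}WY.
\]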
Sub Problem (b)

    def main(tau, train_path, eval_path):
        """Problem 5(b): Locally weighted regression (LWR).

        Args:
            tau: Bandwidth parameter for LWR.
            train_path: Path to CSV file containing dataset for training.
            eval_path: Path to CSV file containing dataset for evaluation.
        """
        # Load training set
        x_train, y_train = util.load_dataset(train_path, add_intercept=True)
        # *** START CODE HERE ***
        # Fit a LWR model
        # Get MSE value on the validation set
        # Plot validation predictions on top of training set
        # No need to save predictions
        # Plot data
        # *** END CODE HERE ***
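These placeholders were left unfilled in the write-up; one possible completion (a sketch using the `LocallyWeightedLinearRegression` class imported in the next file; the output path is illustrative):

        # Sketch only: one way to fill in the placeholders above.
        model = LocallyWeightedLinearRegression(tau=tau)
        model.fit(x_train, y_train)

        x_eval, y_eval = util.load_dataset(eval_path, add_intercept=True)
        y_pred = model.predict(x_eval)
        print('Validation MSE: {}'.format(np.mean((y_pred - y_eval) ** 2)))

        # Plot validation predictions on top of the training set.
        plt.figure()
        plt.plot(x_train[:, -1], y_train, 'bx', label='train')
        plt.plot(x_eval[:, -1], y_pred, 'ro', label='prediction')
        plt.legend()
        plt.savefig('p05b.png')

The next file tunes the bandwidth \(\tau\) by validation MSE: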
    import matplotlib.pyplot as plt
    import numpy as np
    import util

    from p05b_lwr import LocallyWeightedLinearRegression
    def main(tau_values, train_path, valid_path, test_path, pred_path):
        """Problem 5(b): Tune the bandwidth parameter tau for LWR.

        Args:
            tau_values: List of tau values to try.
            train_path: Path to CSV file containing training set.
            valid_path: Path to CSV file containing validation set.
            test_path: Path to CSV file containing test set.
            pred_path: Path to save predictions.
        """
        # Load training set
        x_train, y_train = util.load_dataset(train_path, add_intercept=True)
        # *** START CODE HERE ***
        # Search tau_values for the best tau (lowest MSE on the validation set)
        # Fit a LWR model with the best tau value
        # Run on the test set to get the MSE value
        # Save predictions to pred_path
        # Plot data
        x_eval, y_eval = util.load_dataset(valid_path, add_intercept=True)
        x_test, y_test = util.load_dataset(test_path, add_intercept=True)
        model = LocallyWeightedLinearRegression(tau=0.5)
        model.fit(x_train, y_train)
        MSE = []
        for tau in tau_values:
            model.tau = tau
            pred = model.predict(x_eval)
            squared_estimate = np.mean((pred - y_eval) ** 2)
            MSE.append(squared_estimate)
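The snippet ends mid-search; a possible continuation that carries out the remaining placeholder steps (a sketch, not the author's original code):

        # Sketch only: select the best tau, evaluate on the test set, save predictions.
        best_tau = tau_values[int(np.argmin(MSE))]
        model.tau = best_tau
        test_pred = model.predict(x_test)
        print('Best tau: {}, test MSE: {}'.format(best_tau, np.mean((test_pred - y_test) ** 2)))
        np.savetxt(pred_path, test_pred)
        # *** END CODE HERE ***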