import numpy as np
We want to look at the Spotify music dataset by Brice Vergnou on Kaggle. The training set consists of $150$ songs (randomly selected from the $195$ available on Kaggle); the testing set consists of the remaining $45$ songs. Each song is characterised by $13$ numbers representing its audio features.
Every song in the training set is also assigned a binary ($0$ or $1$) label that encodes the dataset author's music preference for that song.
We first want a function $\mathtt{linear\_model\_function}$ that implements the linear model function for binary logistic regression, defined as $$ f\left(\mathbf{x}, \mathbf{w}\right) = \left\langle \phi\left(\mathbf{x}\right),\mathbf{w}\right\rangle, $$ where $\phi\left(\mathbf{x}\right)$ is the augmented data vector and $\mathbf{w}$ is the weight vector.
def linear_model_function(data_matrix, weights):
    # Each row of data_matrix is an augmented data vector phi(x); all inner
    # products with the weight vector are computed in one matrix product.
    return np.dot(data_matrix, weights)
Next we want a function $\mathtt{binary\_logistic\_activation\_function}$ that takes an argument named inputs and returns the sigmoid function $$ \sigma\left(\mathbf{x}\right) = \frac{1}{1+\mathrm{e}^{-\mathbf{x}}} $$ applied elementwise to the NumPy array inputs. Here $\mathbf{x}$ is the mathematical notation for the argument inputs.
def binary_logistic_activation_function(inputs):
    # Elementwise sigmoid of a NumPy array.
    return 1/(1+np.exp(-inputs))
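As an aside, for strongly negative inputs np.exp(-inputs) can overflow and trigger warnings. If SciPy is available (an assumption; it is not used elsewhere in this notebook), scipy.special.expit is a numerically robust drop-in:
from scipy.special import expit  # assumes SciPy is installed

# expit computes the elementwise sigmoid without overflow warnings:
print(expit(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]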
Now we need two functions $\mathtt{binary\_logistic\_prediction\_function}$ and $\mathtt{classification\_accuracy}$ that turn our predictions into classification results and count how many labels have been classified correctly. The function $\mathtt{binary\_logistic\_prediction\_function}$ takes a single argument logistic_values and returns a vector of class labels with binary values in $\left\{0, 1\right\}$. The function $\mathtt{classification\_accuracy}$ takes two arguments, true_labels and recovered_labels, and returns the fraction of correctly classified labels (the percentage divided by $100$).
def binary_logistic_prediction_function(logistic_values):
    # Threshold at 1/2 and cast the Boolean mask to binary {0, 1} labels.
    return (logistic_values >= 0.5).astype(int)

def classification_accuracy(true_labels, recovered_labels):
    # Fraction of positions where the two label arrays agree.
    return np.mean(true_labels == recovered_labels)
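A quick sanity check on toy values (hypothetical numbers, not taken from the dataset) illustrates how the two functions combine:
toy_logistic_values = np.array([0.9, 0.2, 0.6, 0.4])
toy_predictions = binary_logistic_prediction_function(toy_logistic_values)  # [1 0 1 0]
toy_true_labels = np.array([1, 0, 0, 0])
print(classification_accuracy(toy_true_labels, toy_predictions))  # 0.75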
We now want two functions that implement the cost function for binary logistic regression as well as its gradient, defined below: $$ \mathrm{L}\left(\mathbf{w}\right) = \frac{1}{s} \sum\limits_{i=1}^s \left(\log\left[1+\exp\left(f\left(\mathbf{x}^{(i)},\mathbf{w}\right)\right)\right] - y_i\cdot f\left(\mathbf{x}^{(i)},\mathbf{w}\right)\right), $$ where $\phi\left(\mathbf{x}^{(i)}\right)$ is the augmented $i$-th data vector, $f$ is the model function and $y_i$ are the corresponding data labels. In the case of the linear model function $f\left(\mathbf{x},\mathbf{w}\right) = \left\langle \phi\left(\mathbf{x}\right),\mathbf{w} \right\rangle$ one has $$ \nabla \mathrm{L}\left(\mathbf{w}\right) = \frac{1}{s} \sum\limits_{i=1}^s \left(\sigma \left(\left\langle \phi\left(\mathbf{x}^{(i)}\right),\mathbf{w} \right\rangle \right) - y_i\right) \phi\left(\mathbf{x}^{(i)}\right). $$
def binary_logistic_regression_cost_function(data_matrix, data_labels, weights):
    model_values = np.dot(data_matrix, weights)
    # Mean of log(1 + exp(f)) - y * f over the s samples.
    return np.mean(np.log(1 + np.exp(model_values)) - data_labels * model_values)

def binary_logistic_regression_gradient(data_matrix, data_labels, weights):
    # (1/s) * Phi^T (sigma(Phi w) - y).
    return data_matrix.T @ (binary_logistic_activation_function(np.dot(data_matrix, weights)) - data_labels) / len(data_matrix)
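A side note on numerical stability: np.exp overflows once the model values exceed roughly $700$, making the cost evaluate to infinity. A sketch of a safer variant (not used below) replaces $\log(1+\exp(z))$ with np.logaddexp:
def binary_logistic_regression_cost_function_stable(data_matrix, data_labels, weights):
    model_values = np.dot(data_matrix, weights)
    # np.logaddexp(0, z) evaluates log(exp(0) + exp(z)) = log(1 + exp(z))
    # without ever forming the potentially huge exp(z).
    return np.mean(np.logaddexp(0.0, model_values) - data_labels * model_values)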
We implement the gradient descent algorithm, along with a $\mathtt{gradient\_descent\_v2}$ variant that includes a stopping criterion: the iteration ends as soon as
$$ \left\| \nabla L\left(\mathbf{w}^{(k)}\right)\right\|_2 \leq \mathrm{tolerance} $$ is satisfied. Here $L$ and $\mathbf{w}^{(k)}$ are the mathematical representations of the objective $\mathtt{objective}$ and the weight vector weights at iteration $k$; the parameter tolerance is a non-negative threshold on the Euclidean norm of the gradient. The function $\mathtt{gradient\_descent\_v2}$ takes the arguments $\mathtt{objective}$, $\mathtt{gradient}$, initial_weights, step_size, no_of_iterations, print_output and tolerance. The arguments $\mathtt{objective}$ and $\mathtt{gradient}$ are functions that take (weight-)arrays as arguments and return the scalar value of the objective and the array representation of the corresponding gradient, respectively. The argument initial_weights specifies the initial value of the variable over which we iterate, step_size is the gradient descent step-size parameter, no_of_iterations specifies the maximum number of iterations, print_output determines after how many iterations the function produces a text output, and tolerance controls the norm of the gradient as described in the equation above.
def gradient_descent(objective,
                     gradient,
                     initial_weights,
                     step_size=1,
                     no_of_iterations=100,
                     print_output=10):
    objective_values = []
    weights = np.copy(initial_weights)
    objective_values.append(objective(weights))
    for counter in range(no_of_iterations):
        weights -= step_size * gradient(weights)
        objective_values.append(objective(weights))
        if (counter + 1) % print_output == 0:
            # objective_values[-1] is the value after the current update.
            print(f'Iteration {counter+1}/{no_of_iterations}, objective = {objective_values[-1]}.')
    print(f'Iteration completed after {counter+1}/{no_of_iterations}, objective = {objective_values[-1]}.')
    return weights, objective_values
def gradient_descent_v2(objective, gradient, initial_weights,
                        step_size=1, no_of_iterations=100, print_output=10,
                        tolerance=1e-6):
    objective_values = []
    weights = np.copy(initial_weights)
    objective_values.append(objective(weights))
    gradient_value = gradient(weights)
    for counter in range(no_of_iterations):
        weights -= step_size * gradient_value
        objective_values.append(objective(weights))
        if (counter + 1) % print_output == 0:
            print(f'Iteration {counter+1}/{no_of_iterations}, objective = {objective_values[-1]}.')
        # Stop once the Euclidean norm of the gradient at the new iterate
        # falls below the tolerance.
        gradient_value = gradient(weights)
        if np.linalg.norm(gradient_value) <= tolerance:
            break
    print(f'Iteration completed after {counter+1}/{no_of_iterations}, objective = {objective_values[-1]}.')
    return weights, objective_values
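Before applying gradient descent to the Spotify data, a small sanity check on a quadratic objective (toy values of our choosing) confirms that the stopping criterion fires: the minimiser of $f\left(\mathbf{w}\right) = \frac{1}{2}\left\|\mathbf{w}\right\|_2^2$ is the zero vector.
test_weights, test_objective_values = gradient_descent_v2(
    objective=lambda w: 0.5 * np.sum(w**2),
    gradient=lambda w: w,  # the gradient of 0.5 * ||w||^2 is w itself
    initial_weights=np.array([[1.0], [-2.0]]),
    step_size=0.5,
    no_of_iterations=100,
    print_output=101,  # suppress intermediate printing
    tolerance=1e-8)
print(test_weights)  # approximately the zero vector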
The code in the following cell loads the training and testing data from the corresponding CSV files and splits the training data into inputs and labels:
spotify_training_data = np.genfromtxt('spotify_training.csv',
                                      skip_header=1,
                                      dtype=None,
                                      delimiter=',')
spotify_testing_data_input = np.genfromtxt('spotify_testing.csv',
                                           skip_header=1,
                                           dtype=None,
                                           delimiter=',')
spotify_training_data_input = spotify_training_data[:, :-1]
spotify_training_data_labels = spotify_training_data[:, -1].reshape(-1, 1)
In the following cell we write a function $\mathtt{standardise}$ that standardises the columns of a two-dimensional NumPy array data_matrix. The function returns a triple: the standardised matrix, the row of column averages and the row of column standard deviations. We also include a function $\mathtt{de\_standardise}$ that reverses the process.
def standardise(data_matrix):
    # Shift each column to zero mean, then scale to unit standard deviation.
    means = np.mean(data_matrix, axis=0)
    standardised_matrix = data_matrix - means
    stds = np.std(standardised_matrix, axis=0)
    return (standardised_matrix / stds), means, stds

def de_standardise(standardised_matrix, row_of_means, row_of_stds):
    # Invert the standardisation: rescale by the stds and add the means back.
    return standardised_matrix * row_of_stds + row_of_means
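A round-trip sanity check (again on toy numbers of our choosing) confirms that $\mathtt{de\_standardise}$ inverts $\mathtt{standardise}$:
toy_matrix = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_standardised, toy_means, toy_stds = standardise(toy_matrix)
print(np.allclose(de_standardise(toy_standardised, toy_means, toy_stds), toy_matrix))  # True
print(np.allclose(toy_standardised.mean(axis=0), 0), np.allclose(toy_standardised.std(axis=0), 1))  # True True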
We now standardise our data; note that the testing inputs are transformed with the means and standard deviations computed from the training set:
spotify_training_data_input, spotify_row_of_avgs, spotify_row_of_stds = standardise(spotify_training_data_input)
spotify_testing_data_input = (spotify_testing_data_input - spotify_row_of_avgs) / spotify_row_of_stds
In order to prepare our standardised data for analysis we also need to build an augmented data matrix, i.e. prepend a column of ones so that the first weight acts as an intercept. We implement a function $\mathtt{linear\_regression\_data}$ that computes (and returns) the linear regression data_matrix for a given data_inputs matrix.
def linear_regression_data(data_inputs):
    # Prepend a column of ones to form the augmented data matrix.
    first_column = np.ones((len(data_inputs), 1))
    X_matrix = np.c_[first_column, data_inputs]
    return X_matrix
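For example (toy input of our choosing), a $3 \times 2$ input matrix gains a leading column of ones:
print(linear_regression_data(np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])))
# [[1. 1. 2.]
#  [1. 3. 4.]
#  [1. 5. 6.]]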
We now apply the above to the Spotify dataset:
spotify_training_data_matrix = linear_regression_data(spotify_training_data_input)
spotify_objective = lambda weights: binary_logistic_regression_cost_function(spotify_training_data_matrix, spotify_training_data_labels, weights)
spotify_gradient = lambda weights: binary_logistic_regression_gradient(spotify_training_data_matrix, spotify_training_data_labels, weights)
spotify_initial_weights = np.zeros((spotify_training_data_matrix.shape[1], 1))
# Step size just below 4s/||X||_F^2, i.e. slightly below the reciprocal of
# ||X||_F^2/(4s), an upper bound on the Lipschitz constant of the gradient
# (the sigmoid's derivative is at most 1/4).
spotify_step_size = 3.9 * len(spotify_training_data_matrix) / np.linalg.norm(spotify_training_data_matrix)**2
spotify_optimal_weights, spotify_objective_values = gradient_descent_v2(spotify_objective, spotify_gradient,
spotify_initial_weights,
spotify_step_size, no_of_iterations=1000, print_output=10,
tolerance = 1e-2)
Iteration 10/1000, objective = 0.3656217497255595.
Iteration 20/1000, objective = 0.30596451520120277.
Iteration 30/1000, objective = 0.27905827259944155.
Iteration 40/1000, objective = 0.2627961153435555.
Iteration 50/1000, objective = 0.25161785537553033.
...
Iteration 600/1000, objective = 0.1783355175541878.
Iteration 610/1000, objective = 0.1780428019573889.
Iteration 620/1000, objective = 0.1777576881099966.
Iteration completed after 623/1000, objective = 0.17767359477303019.
We now evaluate the classification accuracy on the training set:
spotify_recovered_labels = binary_logistic_prediction_function(
    binary_logistic_activation_function(
        linear_model_function(spotify_training_data_matrix, spotify_optimal_weights)))
spotify_classification_accuracy = classification_accuracy(spotify_training_data_labels, spotify_recovered_labels)
print(f'Spotify Classification Accuracy: {100*spotify_classification_accuracy:.2f}%')
Spotify Classification Accuracy: 92.67%
We now attempt to increase the classification accuracy by considering ridge binary logistic regression, which adds an $\ell_2$ penalty on the weights.
We define two functions $\mathtt{ridge\_binary\_logistic\_regression\_cost\_function}$ and $\mathtt{ridge\_binary\_logistic\_regression\_gradient}$ that take $4$ arguments: the NumPy arrays data_matrix, data_labels and weights, and a positive float number regularisation_parameter. The modified cost function is defined as $$ \mathrm{L}_{\alpha}\left(\mathbf{w}\right) = \mathrm{L}\left(\mathbf{w}\right) + \frac{\alpha}{2}\left\|\mathbf{w}\right\|_2^2, $$ where $\mathrm{L}\left(\mathbf{w}\right)$ is the cost function for binary logistic regression defined above, and the corresponding gradient is $$ \nabla \mathrm{L}_{\alpha}\left(\mathbf{w}\right) = \nabla \mathrm{L}\left(\mathbf{w}\right) + \alpha \mathbf{w}, $$ where $\alpha$ is the mathematical representation of the regularisation_parameter.
def ridge_binary_logistic_regression_cost_function(data_matrix,
data_labels,
weights,
regularisation_parameter):
return binary_logistic_regression_cost_function(data_matrix, data_labels, weights) + regularisation_parameter * np.linalg.norm(weights)**2 / 2
def ridge_binary_logistic_regression_gradient(data_matrix,
data_labels,
weights,
regularisation_parameter):
return binary_logistic_regression_gradient(data_matrix, data_labels, weights) + regularisation_parameter * weights
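As a sketch (with randomly generated toy data, not the Spotify data), the ridge gradient can be verified against central finite differences of the ridge cost:
rng = np.random.default_rng(0)
toy_matrix = rng.standard_normal((5, 3))
toy_labels = rng.integers(0, 2, size=(5, 1)).astype(float)
toy_weights = rng.standard_normal((3, 1))
alpha, eps = 0.1, 1e-6
numerical_gradient = np.zeros_like(toy_weights)
for j in range(len(toy_weights)):
    shift = np.zeros_like(toy_weights)
    shift[j] = eps
    # Central difference (L(w + eps e_j) - L(w - eps e_j)) / (2 eps).
    numerical_gradient[j] = (
        ridge_binary_logistic_regression_cost_function(toy_matrix, toy_labels, toy_weights + shift, alpha)
        - ridge_binary_logistic_regression_cost_function(toy_matrix, toy_labels, toy_weights - shift, alpha)
    ) / (2 * eps)
analytic_gradient = ridge_binary_logistic_regression_gradient(toy_matrix, toy_labels, toy_weights, alpha)
print(np.allclose(numerical_gradient, analytic_gradient, atol=1e-5))  # True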
We implement a function $\mathtt{grid\_search}$ that searches for the minimum value of a given function over a given set of grid points. It takes two arguments, a function objective and an iterable grid of candidate points, and returns the grid point at which objective is smallest.
def grid_search(objective, grid):
    # Evaluate the objective at every grid point and return the minimiser.
    values = [objective(point) for point in grid]
    return grid[np.argmin(values)]
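Toy usage (the function and grid are of our choosing): on an equally spaced grid, the minimiser of $\left(x - 0.3\right)^2$ is the grid point closest to $0.3$.
print(grid_search(lambda x: (x - 0.3)**2, np.linspace(0.0, 1.0, 11)))  # 0.3 (up to floating point)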
The code below finds the optimal value of the hyperparameter regularisation_parameter on a small grid and then computes the spotify_optimal_weights corresponding to this hyperparameter value. Note that the "validation" error here is measured on the training set itself, as no separate validation split is available.
spotify_regularisation_parameter_grid = np.arange(0, .1, 0.01)
def spotify_validation_error(regularisation_parameter):
    # Train ridge binary logistic regression for this regularisation parameter
    # and return the misclassification rate on the training set.
    weights = gradient_descent_v2(
        objective=lambda weights: ridge_binary_logistic_regression_cost_function(
            spotify_training_data_matrix, spotify_training_data_labels,
            weights, regularisation_parameter),
        gradient=lambda weights: ridge_binary_logistic_regression_gradient(
            spotify_training_data_matrix, spotify_training_data_labels,
            weights, regularisation_parameter),
        initial_weights=np.zeros(shape=(spotify_training_data_matrix.shape[1], 1)),
        step_size=1 / (np.linalg.norm(spotify_training_data_matrix)**2
                       / (3.9 * len(spotify_training_data_matrix))
                       + regularisation_parameter),
        no_of_iterations=2000,
        print_output=2001,
        tolerance=1e-5)[0]
    predicted_labels = binary_logistic_prediction_function(
        binary_logistic_activation_function(
            linear_model_function(spotify_training_data_matrix, weights)))
    return 1 - classification_accuracy(spotify_training_data_labels, predicted_labels)
spotify_optimal_regularisation_parameter = grid_search(
spotify_validation_error, spotify_regularisation_parameter_grid)
spotify_optimal_weights = gradient_descent_v2(
objective=lambda weights: ridge_binary_logistic_regression_cost_function(
spotify_training_data_matrix, spotify_training_data_labels, weights,
spotify_optimal_regularisation_parameter),
gradient=lambda weights: ridge_binary_logistic_regression_gradient(
spotify_training_data_matrix, spotify_training_data_labels, weights,
spotify_optimal_regularisation_parameter),
initial_weights=np.zeros(shape=(spotify_training_data_matrix.shape[1], 1)),
    step_size=1 /
    (np.linalg.norm(spotify_training_data_matrix)**2 /
     (3.9 * len(spotify_training_data_matrix)) +
     spotify_optimal_regularisation_parameter),
no_of_iterations=2000,
print_output=2001,
tolerance=1e-5)[0]
spotify_training_regression_values = linear_model_function(
spotify_training_data_matrix, spotify_optimal_weights)
spotify_training_predicted_labels = binary_logistic_prediction_function(
binary_logistic_activation_function(spotify_training_regression_values))
spotify_classification_accuracy = classification_accuracy(
    true_labels=spotify_training_data_labels,
    recovered_labels=spotify_training_predicted_labels)
print(
"The classification accuracy for the training set is {p:.2f} %. This is achieved for the hyperparameter value {a}"
.format(p=100 * spotify_classification_accuracy,
a=spotify_optimal_regularisation_parameter))
Iteration completed after 2000/2000, objective = 0.16459394936632762.
Iteration completed after 1636/2000, objective = 0.24916613121759873.
Iteration completed after 952/2000, objective = 0.28121024509895964.
Iteration completed after 676/2000, objective = 0.30325224599054196.
Iteration completed after 525/2000, objective = 0.32046027272947586.
Iteration completed after 430/2000, objective = 0.3347162465704719.
Iteration completed after 365/2000, objective = 0.3469480847841871.
Iteration completed after 317/2000, objective = 0.3576911695182459.
Iteration completed after 281/2000, objective = 0.3672861042167601.
Iteration completed after 252/2000, objective = 0.37596462134901215.
Iteration completed after 858/2000, objective = 0.30325224599048967.
The classification accuracy for the training set is 92.67 %. This is achieved for the hyperparameter value 0.03
Ridge regression does not seem to improve our model's accuracy. This is unsurprising: the hyperparameter is selected by its error on the same training data used for fitting, so regularisation, which trades training fit for generalisation, gets no chance to show a benefit. A held-out validation set would be needed for a fair comparison.
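A minimal sketch of how a held-out split could be wired in, reusing the functions above (the split sizes and variable names are our own choices, not part of the original analysis):
rng = np.random.default_rng(0)
shuffled_indices = rng.permutation(len(spotify_training_data_matrix))
train_indices, validation_indices = shuffled_indices[:120], shuffled_indices[120:]
X_train = spotify_training_data_matrix[train_indices]
y_train = spotify_training_data_labels[train_indices]
X_validation = spotify_training_data_matrix[validation_indices]
y_validation = spotify_training_data_labels[validation_indices]
# One would then fit the ridge model on (X_train, y_train) for each grid value
# of the regularisation parameter and pick the value with the highest
# classification_accuracy on (X_validation, y_validation).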