UFLDL Study Notes

Preface

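I recently started working through Andrew Ng's deep learning tutorial. Partly to motivate myself and partly to have something to look back on later, I am keeping study notes, one section per chapter, all collected in this single post so they are easy to look up. I will keep updating it.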

Linear Regression

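This chapter briefly covers the principle of linear regression. The tutorial assumes we already have this background and uses the chapter mainly to review gradient descent; anyone reading these tutorials should already know basic machine learning, so I skip a lot of the basics here. The exercise is to implement the objective function and its gradient with respect to every parameter. My code: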

function [f,g] = linear_regression(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The target value for each example.  y(j) is the target for example j.
  %
  
  m=size(X,2); % number of examples
  n=size(X,1); % feature dimension

  f=0;
  g=zeros(size(theta));

  %
  % TODO:  Compute the linear regression objective by looping over the examples in X.
  %        Store the objective function value in 'f'.
  %
  % TODO:  Compute the gradient of the objective with respect to theta by looping over
  %        the examples in X and adding up the gradient for each example.  Store the
  %        computed gradient in 'g'.
  
%%% YOUR CODE HERE %%%
 for j = 1:m
     f = f + 0.5*(theta'*X(:,j)-y(j))^2;
 end
%  ----------
 for i = 1:n
     for j = 1:m
         g(i) = g(i) + X(i,j)*(theta'*X(:,j)-y(j)); % trailing semicolon keeps MATLAB from printing g(i) on every iteration
     end
 end

Final results:

Optimization took 128.640734 seconds. % this run was slow because I was printing values inside the loop
RMS training error: 4.843147
RMS testing error: 4.151706
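
For reference, here is what the two loops above compute, written out as formulas. The notation is my own shorthand and not taken from the tutorial: x^{(j)} stands for column j of X, y^{(j)} for y(j), and m for the number of examples.

```latex
J(\theta) = \frac{1}{2} \sum_{j=1}^{m} \left( \theta^{\top} x^{(j)} - y^{(j)} \right)^{2},
\qquad
\frac{\partial J}{\partial \theta_i} = \sum_{j=1}^{m} x_i^{(j)} \left( \theta^{\top} x^{(j)} - y^{(j)} \right)
```

The first loop accumulates J into f, and the nested loops accumulate each partial derivative into g(i).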

Logistic Regression

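Despite the name, this is classification rather than regression. This chapter implements a handwritten digit classifier, and only the simplest case of telling 0 from 1, so the accuracy is very high. My code: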

  function [f,g] = logistic_regression(theta, X,y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.  
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %

  m=size(X,2); % number of training images
  n=size(X,1); % number of pixels per image + 1
  
  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));


  %
  % TODO:  Compute the objective function by looping over the dataset and summing
  %        up the objective values for each example.  Store the result in 'f'.
  %
  % TODO:  Compute the gradient of the objective by looping over the dataset and summing
  %        up the gradients (df/dtheta) for each example. Store the result in 'g'.
  %
%%% YOUR CODE HERE %%%
for j = 1:m
     f = f - ( y(j)*log(1/(1+exp(-theta'*X(:,j)))) + (1-y(j))*log(1-(1/(1+exp(-theta'*X(:,j))))) );    
 end
%  ----------
 for i = 1:n
     for j = 1:m
         g(i) = g(i) + X(i,j)*(1/(1+exp(-theta'*X(:,j)))-y(j));
     end
 end
    

Results:

Optimization took 7874.049756 seconds. % it felt like forever
Training accuracy: 100.0%
Test accuracy: 100.0% 
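
The nested loops above recompute theta'*X(:,j) for every (i, j) pair, which likely accounts for a good part of the long runtime. Below is a minimal sketch of a single-loop variant that evaluates the sigmoid once per example and updates the whole gradient vector at once. The toy data (X, y, theta) is made up for illustration and is not the tutorial's dataset.

```matlab
% Toy data (hypothetical): 2 features (first row is the intercept), 4 examples.
X = [1 1 1 1; -2 -1 1 2];
y = [0 0 1 1];
theta = [0.1; 0.3];

m = size(X, 2);
f = 0;
g = zeros(size(theta));

for j = 1:m
    h = 1 / (1 + exp(-theta' * X(:, j)));               % sigmoid, computed once per example
    f = f - (y(j) * log(h) + (1 - y(j)) * log(1 - h));  % cross-entropy contribution of example j
    g = g + X(:, j) * (h - y(j));                       % gradient contribution, all coordinates at once
end

fprintf('f = %g\n', f);
disp(g');
```

This is still a loop over examples, so it stays within the spirit of the exercise while doing far less repeated work.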

Vectorization

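Vectorization is one of the best ways to save time. Put simply, it uses MATLAB's strength in matrix computation to make up for its weakness in loops. My linear regression code: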

function [f,g] = linear_regression_vec(theta, X,y)
  %
  % Arguments:    
  %   theta - A vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The target value for each example.  y(j) is the target for example j.
  %
  m=size(X,2);
  
  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the linear regression objective function and gradient 
  %        using vectorized code.  (It will be just a few lines of code!)
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %
%%% YOUR CODE HERE %%%
 f = sum((theta'*X - y).^2) * 0.5;
 
 y_hat = theta'*X;
 g = X*(y_hat' - y');

Results:

Optimization took 0.108650 seconds.
RMS training error: 4.650101
RMS testing error: 4.856230

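That really does save a lot of time and effort. Still, the i, j subscripts and the transposes can make your head spin. When actually writing the code, you can use the debugger to inspect your data, then fix your subscripts and decide whether a transpose is needed (the goal is simply to make the matrix dimensions conform for multiplication). It also helps to remember what each commonly used variable means for the duration of an exercise: throughout this tutorial, m is the number of examples and n is the feature dimension.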
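
One way to catch subscript and transpose mistakes is to compare the vectorized result with a plain loop on a tiny random problem and to print the sizes along the way. The following is a minimal sketch under that assumption; the problem sizes and variable names (f_loop, g_vec, and so on) are made up for illustration.

```matlab
% Tiny random problem: n features, m examples (columns of X are examples).
n = 3; m = 5;
X = rand(n, m);
y = rand(1, m);
theta = rand(n, 1);

% Reference: loop over examples.
f_loop = 0;
g_loop = zeros(n, 1);
for j = 1:m
    r_j = theta' * X(:, j) - y(j);   % residual of example j (scalar)
    f_loop = f_loop + 0.5 * r_j^2;
    g_loop = g_loop + X(:, j) * r_j;
end

% Vectorized version.
r = theta' * X - y;                  % (1 x n) * (n x m) -> 1 x m
f_vec = 0.5 * sum(r.^2);
g_vec = X * r';                      % (n x m) * (m x 1) -> n x 1

disp(size(r));
disp(size(g_vec));
fprintf('max difference: %g\n', max(abs([f_loop - f_vec; g_loop - g_vec])));
```

If the printed difference is not close to zero, or a size is not what you expected, a subscript or transpose is probably wrong.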

Below is the vectorized code for logistic regression:

function [f,g] = logistic_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.  
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);
  
  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the logistic regression objective function and gradient 
  %        using vectorized code.  (It will be just a few lines of code!)
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %
%%% YOUR CODE HERE %%%
 h = sigmoid(theta'*X);
 f = -sum(y.*log(h) + (1-y).*log(1 - h));
 g = X*(h - y)'; 

Results:

Optimization took 3.064685 seconds.
Training accuracy: 100.0%
Test accuracy: 100.0%

Gradient Checking

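Put simply, we use a numerical approximation of the derivative to verify that the derivative we computed from the formula is correct.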
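
The approximation used in grad_check.m is a central difference. In my own notation (e_j is the j-th unit vector and delta is the step size, which the script sets to 1e-3):

```latex
\frac{\partial J}{\partial \theta_j}
\approx
\frac{J(\theta + \delta e_j) - J(\theta - \delta e_j)}{2\delta}
```

The check then compares this estimate with the j-th entry of the analytic gradient.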

We use grad_check.m:

function average_error = grad_check(fun, theta0, num_checks, varargin)

  delta=1e-3; 
  sum_error=0;

  fprintf(' Iter       i             err');
  fprintf('           g_est               g               f\n')

  for i=1:num_checks
    T = theta0;
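    j = randsample(numel(T),1); % pick a random index into theta
    T0=T; T0(j) = T0(j)-delta; % theta(j-): theta with its j-th element decreased by delta
    T1=T; T1(j) = T1(j)+delta; % theta(j+): theta with its j-th element increased by delta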

    [f,g] = fun(T, varargin{:});
    f0 = fun(T0, varargin{:}); % J(theta(j-))
    f1 = fun(T1, varargin{:}); % J(theta(j+))

    g_est = (f1-f0) / (2*delta);
    error = abs(g(j) - g_est);
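    % columns printed: iteration, theta index, absolute error, analytic value g(j), estimate g_est, objective value
    % (note that the header printed above labels the 4th and 5th columns in the opposite order)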
    fprintf('% 5d  % 6d % 15g % 15f % 15f % 15f\n', ...
            i,j,error,g(j),g_est,f);
            

    sum_error = sum_error + error;
  end

  average_error=sum_error/num_checks;

Add the following to ex1a_linreg.m:

average_error = grad_check(@linear_regression_vec,theta,30,train.X,train.y);  
fprintf('The Average error is :%f\n',average_error);  

Output:

Iter       i             err           g_est               g               f
    1      14      8.0571e-06  1418640.687110  1418640.687102  14517559.734147
    2       3     3.73228e-07  1100385.922200  1100385.922200  14517559.734147
    3       4     2.48384e-06  1236106.996470  1236106.996473  14517559.734147
    4      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
    5      14      8.0571e-06  1418640.687110  1418640.687102  14517559.734147
    6      10      6.0685e-06  1118680.054414  1118680.054408  14517559.734147
    7      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
    8      10      6.0685e-06  1118680.054414  1118680.054408  14517559.734147
    9      14      8.0571e-06  1418640.687110  1418640.687102  14517559.734147
   10      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   11      13     5.16325e-06  38562142.957593  38562142.957588  14517559.734147
   12       2     1.97807e-06   436767.013214   436767.013212  14517559.734147
   13      14      8.0571e-06  1418640.687110  1418640.687102  14517559.734147
   14      14      8.0571e-06  1418640.687110  1418640.687102  14517559.734147
   15      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   16       1     3.02999e-06   106041.865458   106041.865461  14517559.734147
   17       5     1.42339e-06     6344.599333     6344.599332  14517559.734147
   18       9      3.8307e-06   389421.210472   389421.210468  14517559.734147
   19       7     3.66173e-06   660532.159808   660532.159812  14517559.734147
   20       5     1.42339e-06     6344.599333     6344.599332  14517559.734147
   21       4     2.48384e-06  1236106.996470  1236106.996473  14517559.734147
   22       9      3.8307e-06   389421.210472   389421.210468  14517559.734147
   23       7     3.66173e-06   660532.159808   660532.159812  14517559.734147
   24      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   25      11     2.87592e-06  45661592.041328  45661592.041331  14517559.734147
   26      12     2.83984e-06  1978417.905024  1978417.905027  14517559.734147
   27       5     1.42339e-06     6344.599333     6344.599332  14517559.734147
   28      12     2.83984e-06  1978417.905024  1978417.905027  14517559.734147
   29       5     1.42339e-06     6344.599333     6344.599332  14517559.734147
   30      10      6.0685e-06  1118680.054414  1118680.054408  14517559.734147
The Average error is :0.000004

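The errors are tiny compared with the size of the gradient entries, so our gradient computation is correct. The checking code itself could still be optimized: a few lines inside the loop do not depend on the loop variable and could be moved outside it, for example: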

 T = theta0;
 [f,g] = fun(T, varargin{:});
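
A minimal sketch of the hoisting idea, written as a self-contained script rather than as a drop-in replacement for grad_check.m. The toy objective and the handles obj and grad are hypothetical, and randi is used here in place of randsample simply to keep the sketch free of toolbox dependencies.

```matlab
% Toy least-squares problem (hypothetical data).
X = [1 1 1 1; 0 1 2 3];
y = [1 3 5 7];
theta0 = [0.5; -0.2];

obj  = @(t) 0.5 * sum((t' * X - y).^2);   % objective J(theta)
grad = @(t) X * (t' * X - y)';            % analytic gradient

delta = 1e-3;
num_checks = 10;
sum_error = 0;

g = grad(theta0);                         % hoisted: does not depend on the loop index

for i = 1:num_checks
    j = randi(numel(theta0));             % random index into theta
    T0 = theta0; T0(j) = T0(j) - delta;
    T1 = theta0; T1(j) = T1(j) + delta;
    g_est = (obj(T1) - obj(T0)) / (2 * delta);   % central difference
    sum_error = sum_error + abs(g(j) - g_est);
end

average_error = sum_error / num_checks;
fprintf('The average error is: %g\n', average_error);
```

With the hoisting, the analytic gradient is evaluated once instead of num_checks times; only the two perturbed objective evaluations remain inside the loop.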

Softmax Regression

This is simply multi-class logistic regression (as opposed to the two-class case). My code:

function [f,g] = softmax_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.  
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2); % number of examples
  n=size(X,1); % feature dimension

  % theta is a vector;  need to reshape to n x (num_classes-1).
  theta=reshape(theta, n, []);
  num_classes=size(theta,2)+1;
  
  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
%%% YOUR CODE HERE %%%
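  indicator = full(sparse(y, 1:m, 1, num_classes, m)); % indicator (one-hot) matrix, num_classes x m
  theta = [theta,zeros(n,1)]; % restore the full theta by appending a zero column for the last class
  a =  exp(theta'*X);
  p = bsxfun(@rdivide,a,sum(a));  
  l = log(p);
  %f = -sum(indicator*log(p)); % this would produce an overly large matrix, which is not allowed
  f = -indicator(:)'*l(:);
  g = -X * (indicator-p)';
  g = g(:,1:end- 1); % drop the column for the last class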
  
  g=g(:); % make gradient a vector for minFunc

Results:

Optimization took 91.072469 seconds.
Training accuracy: 94.4%
Test accuracy: 92.2%
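
To make the indicator matrix in the softmax code more concrete, here is a small sketch on made-up data. The labels, theta, and X below are hypothetical and chosen only so the result is easy to read; the explicit size arguments to sparse are an assumption I add so the matrix always has one row per class.

```matlab
% Four examples with class labels in 1..3 (hypothetical).
y = [2 1 3 2];
m = numel(y);
num_classes = 3;

% Row k, column j is 1 exactly when example j has label k.
indicator = full(sparse(y, 1:m, 1, num_classes, m));
disp(indicator);

% Softmax probabilities for a toy theta (n x num_classes, last column zero).
theta = [0.2 -0.1 0; 0.4 0.3 0];
X = [1 1 1 1; 0.5 -1 2 0];
a = exp(theta' * X);                      % num_classes x m
p = bsxfun(@rdivide, a, sum(a, 1));       % normalize each column
disp(sum(p, 1));                          % each column of p sums to 1

% Objective: minus the log-probability of the correct class, summed over examples.
l = log(p);
f = -indicator(:)' * l(:);
fprintf('f = %g\n', f);
```

Because the indicator has a single 1 per column, multiplying it elementwise with log(p) picks out the log-probability of the true class for each example.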