Pastimes : Computer Learning

To: Frank Sully who wrote (109400) on 2/27/2022 8:09:47 PM
From: Frank Sully
 
Re: Introduction

FWIW, I got hit by SI's fifteen-minute post-editing limit again, so I didn't get the machine learning videos embedded in time. Here is that section of my post with the videos embedded.

Deep Learning

Modern AI is based on Deep Learning algorithms. Deep learning is a subset of machine learning built on neural networks with three or more layers. These networks attempt to simulate the behavior of the human brain, albeit far from matching its ability, allowing them to "learn" from large amounts of data.
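To make "three or more layers" concrete, here is a minimal sketch in Python/NumPy. The layer sizes and random weights are made up for illustration; a real network would learn its weights from data.

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(x):
        # Rectified linear unit, a common neuron activation.
        return np.maximum(0.0, x)

    # Made-up sizes: 4 inputs -> 8 hidden -> 8 hidden -> 1 output.
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
    W3, b3 = rng.normal(size=(1, 8)), np.zeros(1)

    def forward(x):
        # Each layer is just a weight matrix followed by a nonlinearity.
        h1 = relu(W1 @ x + b1)    # layer 1
        h2 = relu(W2 @ h1 + b2)   # layer 2
        return W3 @ h2 + b3       # layer 3 (output)

    print(forward(np.array([1.0, 0.5, -0.3, 2.0])))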

Deep Learning Algorithms for AI

The first, one-hour video explains how this works. Amazingly, training boils down to least-squares minimization of the neural network's loss function by multi-dimensional gradient descent, a first-order cousin of Newton-Raphson. See the second, half-hour video. Who thought calculus would come in handy? (A toy worked example follows the second video link below.)

1. MIT Introduction to Deep Learning



2. Gradient Descent, Step-by-Step
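Here is a minimal sketch of that idea in Python/NumPy: fitting a straight line by least squares using plain gradient descent. The toy data, learning rate, and step count are my own illustration choices, not from the videos.

    import numpy as np

    # Toy data: points near the line y = 2x + 1, plus a little noise.
    rng = np.random.default_rng(42)
    x = np.linspace(0.0, 1.0, 50)
    y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=x.size)

    w, b = 0.0, 0.0   # initial guess for slope and intercept
    lr = 0.5          # learning rate (step size)

    for step in range(500):
        y_hat = w * x + b   # model prediction
        # Loss L = mean((y_hat - y)^2); its gradient points uphill,
        # so we step the other way.
        grad_w = 2.0 * np.mean((y_hat - y) * x)   # dL/dw
        grad_b = 2.0 * np.mean(y_hat - y)         # dL/db
        w -= lr * grad_w
        b -= lr * grad_b

    print(f"fitted w = {w:.3f}, b = {b:.3f}")   # should land near 2 and 1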



Math Issues: Optimizing With Multiple Peaks Or Valleys



A problem with gradient descent is that it simply follows the slope downhill, so it converges to whatever stationary point happens to be nearby. Worse, a loss surface can have multiple peaks and valleys, so gradient descent more properly finds local minima, whereas in machine learning one is interested in the global minimum; a toy illustration follows. This makes the problem considerably more difficult, particularly since loss functions for deep learning neural networks can have millions or even billions of parameters.
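Here is that trap in miniature (my own example, not from the videos): the very same descent rule lands in different valleys depending only on where it starts.

    def f(x):
        # A 1-D "loss" with two valleys: global minimum near x = -1.30,
        # a shallower local minimum near x = 1.13.
        return x**4 - 3.0 * x**2 + x

    def grad(x):
        # Derivative f'(x).
        return 4.0 * x**3 - 6.0 * x + 1.0

    def descend(x, lr=0.01, steps=2000):
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    print(descend(-2.0))   # ~ -1.30: found the global minimum
    print(descend(2.0))    # ~  1.13: stuck in the local minimum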

Another problem has to do with the size of the data sets used to train deep learning neural networks, which can be huge. Since gradient descent is an iterative process, evaluating the loss function over each and every data point at every step becomes prohibitively time-consuming, even with high-performance AI chips. This leads to stochastic gradient descent: at each iterative step, the gradient of the loss is estimated from a relatively small random sample (a mini-batch) of the data. A minimal sketch follows the video link below.

3. Stochastic Gradient Descent
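A minimal sketch of the stochastic variant, reusing the toy line-fit from above; the only change is that each step estimates the gradient from a small random mini-batch instead of the full data set (the batch size and learning rate are again made-up illustration values).

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data as before, but now 100,000 points near y = 2x + 1.
    x = rng.uniform(0.0, 1.0, size=100_000)
    y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=x.size)

    w, b = 0.0, 0.0
    lr, batch = 0.5, 32

    for step in range(2000):
        # Estimate the gradient from 32 random points, not all 100,000.
        idx = rng.integers(0, x.size, size=batch)
        err = (w * x[idx] + b) - y[idx]
        w -= lr * 2.0 * np.mean(err * x[idx])
        b -= lr * 2.0 * np.mean(err)

    print(f"fitted w = {w:.2f}, b = {b:.2f}")   # noisy, but near 2 and 1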



Any comments/discussion?

Sláinte! (Irish for "Cheers!")
Frank “Sully”
