meta #2268

Open
wants to merge 45 commits into base: master
Changes from all commits
Commits
45 commits
1cd9b25
Update _config.yml
bindugeo Jan 4, 2020
3bc41a0
Create EulersNumber.md
bindugeo Jan 5, 2020
045d1a5
Rename EulersNumber.md to 2020-01-04-EulersNumber.md
bindugeo Jan 5, 2020
6704759
Update and rename 2020-01-04-EulersNumber.md to 2020-1-4-Understandin…
bindugeo Jan 5, 2020
6a6847e
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
620624d
Add files via upload
bindugeo Jan 5, 2020
173a1d5
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
83325e5
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
a6f0439
Add files via upload
bindugeo Jan 5, 2020
117d670
Add files via upload
bindugeo Jan 5, 2020
3fc86c0
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
2d72a5c
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
3ebb9ff
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
f0ae685
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
53198b9
Add files via upload
bindugeo Jan 5, 2020
4941a00
Rename 2020-1-4-UnderstandingEulersNumber.md to 2019-10-2-Understandi…
bindugeo Jan 5, 2020
569a723
Create 2019-10-14-LogisticRegression
bindugeo Jan 5, 2020
72d0b8b
Update and rename 2019-10-14-LogisticRegression to 2019-10-14-Logisti…
bindugeo Jan 8, 2020
a9b70e6
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
91f9bbf
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
782791f
Add files via upload
bindugeo Jan 8, 2020
e57091e
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
f0c11d1
Create 2019-11-1-MLEntropy.md
bindugeo Jan 10, 2020
3502cd7
Update 2019-11-1-MLEntropy.md
bindugeo Jan 12, 2020
120a319
Update 2019-11-1-MLEntropy.md
bindugeo Jan 12, 2020
834b726
Add files via upload
bindugeo Jan 12, 2020
e4b2926
Create 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
e95d63a
Update 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
ad25be6
Update 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
0ab029f
Add files via upload
bindugeo Jan 12, 2020
31b7adb
Create 2020-1-15-NeuralNetwork.md
bindugeo Feb 16, 2020
6e2cca5
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
d589eca
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
5665c4c
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
478b1b6
Add files via upload
bindugeo Feb 17, 2020
06b64d0
Update and rename 2014-3-3-Hello-World.md to 2020-1-30-CNN.md
bindugeo Mar 5, 2020
12a26d5
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
cef8d04
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
00b0702
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
3da79ae
Add files via upload
bindugeo Mar 5, 2020
ca6e063
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
8f2400c
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
3ec7744
Add files via upload
bindugeo Mar 26, 2020
39e67cf
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
464324c
background meta image
bindugeo Apr 20, 2024
4 changes: 2 additions & 2 deletions _config.yml
@@ -3,10 +3,10 @@
#

# Name of your site (displayed in the header)
name: Your Name
name: Bindu George

# Short bio or description (displayed in the header)
description: Web Developer from Somewhere
description: In pursuit of Deep Learning

# URL of your avatar or profile pic (you could use your GitHub profile pic)
avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png
2 changes: 1 addition & 1 deletion _includes/meta.html
@@ -11,7 +11,7 @@
<meta property="og:description" content="{{ site.description }}" />
{% endif %}
<meta name="author" content="{{ site.name }}" />

<meta property="og:image" content="https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/nn.png" />
{% if page.title %}
<meta property="og:title" content="{{ page.title }}" />
<meta property="twitter:title" content="{{ page.title }}" />
10 changes: 0 additions & 10 deletions _posts/2014-3-3-Hello-World.md

This file was deleted.

70 changes: 70 additions & 0 deletions _posts/2019-10-14-LogisticRegression.md
@@ -0,0 +1,70 @@
---
layout: post
title: Logistic Regression
---

A logistic regression unit is the base unit of an Artificial Neural Network. Therefore, an understanding of Logistic Regression is essential to Deep Learning principles.

Deep neural networks and their potential have come to the fore with new verve over the last two decades. Developments ranging from natural language processing and face recognition to recent breakthroughs in pharma, such as protein folding and modelling compound reactions, attest to today's more robust deep learning architectures. The field brings together neuroscience, probability theory, computational geometry and topology, convex analysis and more. At the base of all these architectures sits the logistic regression unit, which is why understanding logistic regression is essential before diving into deep learning.

### Logistic Regression and Activation Functions

![_config.yml]({{ site.baseurl }}/images/lr2-01-reg-vs-class.png)

The figures above show the classic difference between a regression (left) and a classification (right). When classification problems came up in machine learning, we needed to predict the probability of belonging to a class. But the linear equation

![_config.yml]({{ site.baseurl }}/images/lr2-02-eqn.png)

produces values anywhere on the real line (-∞, +∞). In order to depict a probability, we need a device that outputs values between 0 and 1 _(0 ≤ p ≤ 1)_. For example, for a binary classification problem, _P(Y=1|X) = 1 - P(Y=0|X)_. Also, _P(Y) = 0.5_ would mean that the data point could belong to either class with equal chance (in other words, it lies right on the hyperplane dividing the two classes).

How do we transform the equation above to something that restrains itself between 0 and 1, without having to sacrifice the ease of a linear representation? Let’s do that in two steps:

1. Make it greater than 0
From my previous article, one can see that _e^z_ (the exponential function) is always greater than 0. So, if we assign

![_config.yml]({{ site.baseurl }}/images/lr2-03-abovex.png)

2. Bring _p_ down from infinity to 1 or less
Any fraction whose denominator is slightly larger than its numerator is always less than 1. Therefore, if we assign

![_config.yml]({{ site.baseurl }}/images/lr2-04-pbetween.png)

, _p_ duly falls between 0 and 1. Voila, we have grabbed our unruly linear relation by the horns and tamed it into a probability.

### Logistic Sigmoid Function

![_config.yml]({{ site.baseurl }}/images/lr2-05-sigmoidfn.png)

This function is also known as the Sigmoid function. As depicted here, it takes an ‘S’ form, and at 0, _f(0) = 0.5_. The Sigmoid function is usually denoted by _σ_.

![_config.yml]({{ site.baseurl }}/images/lr2-06-siggraph.png)
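As a quick illustrative sketch (my own NumPy snippet, not part of the original post; the name `sigmoid` is my choice), the S-curve can be computed directly:

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid: squashes any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # f(0) = 0.5; large negative/positive z approach 0 and 1
```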

### Tanh Function

The hyperbolic tangent is also used as a sigmoid function in that it too generates an ‘S’ form, except that its range is _-1 < f(x) < 1_.

![_config.yml]({{ site.baseurl }}/images/lr2-07-Hyptanfunction.png)

We’ll discuss more on this when we get to neural networks.

![_config.yml]({{ site.baseurl }}/images/lr2-08-tanhgraph.png)

### Logit Function

![_config.yml]({{ site.baseurl }}/images/lr2-09-logitfn.png)

, which is also called the odds of likelihood - in other words, the ratio of _P(event)_ to _P(non-event)_.

![_config.yml]({{ site.baseurl }}/images/lr2-10-oddslh.png)

This function,

![_config.yml]({{ site.baseurl }}/images/lr2-11-lnf.png)

is also called the logit function, using which a machine learning model adjusts its coefficients

![_config.yml]({{ site.baseurl }}/images/lr2-12-beta.png)

A simple logistic regression model is shown here. The weight set _{w1, w2, w3...}_ is the coefficient set _{β0, β1, β2,...}_ that we try to find with the help of the logit function.

![_config.yml]({{ site.baseurl }}/images/lr2-13-graph.png)
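A minimal sketch of such a unit, assuming a NumPy setting; the weights, bias and inputs below are made-up values purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up example: three features, weights {w1, w2, w3} and a bias term (β0)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2

z = np.dot(w, x) + b            # linear part: can be any real number
p = sigmoid(z)                  # squeezed into (0, 1), interpreted as P(Y=1|X)
log_odds = np.log(p / (1 - p))  # the logit; recovers z
print(p, log_odds)
```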
109 changes: 109 additions & 0 deletions _posts/2019-10-2-UnderstandingEulersNumber.md
@@ -0,0 +1,109 @@
---
layout: post
title: Understanding Euler's Number
---

Discover why mathematicians and statisticians are so enamored by the Exponential Constant _e_

Before one dives into the deep recesses of neural networks, it is worth taking a good look at a peculiar number - Euler’s number (_e_ ≃ 2.71828...). Also called the ‘exponential constant’, it is widely used in infinitesimal calculus - a tool relied upon by all the engineering sciences and now heavily employed in deep learning as well.

Interestingly, though the constant is attributed to Euler - a versatile 18th-century mathematician who made major contributions to fields ranging from calculus, trigonometry and physics to even music - it was Bernoulli who originally stumbled upon this number. Jacob Bernoulli was working on the principle of compounding when he noticed that a dollar ($1) at 100% annual interest yields more and more as the compounding frequency is increased, but that the yield approaches a certain value between 2.69 and 3.

![_config.yml]({{ site.baseurl }}/images/eu1-bernoullis.png)

The value of this limit eluded Bernoulli, but some years later the prodigious Euler, who studied under Jacob's brother Johann Bernoulli, found a clever series approximation for this constant, which he denoted as _e_. He found that

![_config.yml]({{ site.baseurl }}/images/eu2-econstant.png)
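A small numerical sketch of Bernoulli's observation (my own snippet; the chosen values of n are arbitrary):

```python
# Bernoulli's compounding observation: (1 + 1/n)^n approaches e ≈ 2.71828...
for n in (1, 2, 12, 365, 10_000, 1_000_000):
    print(n, (1 + 1 / n) ** n)
```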

He established that the constant _e_ (2.7182818...) is irrational; it was later shown to be transcendental as well. Euler also generalized that

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

which we have now come to call the exponential function, the basis of many calculus problems. To appreciate this function better, let us see how this great mathematician connected abstract mathematics to a physical realm - the complex plane.

### Euler's Formula and Euler's Identity

Let’s see how a complex number plane can be represented by the series

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

Using the __Maclaurin power series__, we know the following

![_config.yml]({{ site.baseurl }}/images/eu4-mclaurins.png)

Substituting **_ix_** for **_x_** in

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

, we get

![_config.yml]({{ site.baseurl }}/images/eu5-preeulers.png)

This gives us,

![_config.yml]({{ site.baseurl }}/images/eu6-eulerseqn.png)

popularly known as **Euler’s Formula**.

This equation in turn gave way to findings like

![_config.yml]({{ site.baseurl }}/images/eu7-posteulers.png)

helpful tools in topics like Signal processing, etc.


We can also see that when _x_ is replaced by _π_, we arrive at Euler’s Identity

![_config.yml]({{ site.baseurl }}/images/eu8-identity.png)

implying that in the imaginary plane, growth means rotating around a circle by that many radians, while real growth scales up the magnitude.

![_config.yml]({{ site.baseurl }}/images/eu9-circle.png)
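As a quick sanity check of Euler's formula and identity (a sketch using Python's built-in complex support; the value of x is arbitrary):

```python
import cmath
import math

x = 1.0
print(cmath.exp(1j * x))                  # e^{ix}
print(complex(math.cos(x), math.sin(x)))  # cos x + i sin x  (same value)

print(cmath.exp(1j * math.pi) + 1)        # Euler's identity: ~0, up to rounding
```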

### Its own Derivative

Given,

![_config.yml]({{ site.baseurl }}/images/eu9a-prediff.png)

Differentiating this expansion, we get

![_config.yml]({{ site.baseurl }}/images/eu10-diff.png)

What does this mean? The rate of growth of e^x is equal to itself, i.e., e^x. This very property makes it unique and popular as the base of functions describing natural growth or continuous change.

![_config.yml]({{ site.baseurl }}/images/eu11-graph.png)

In other words, e^x is the amount to which a quantity starting at 1 grows after _x_ time units of continuous growth at a rate of 100%.

This should explain why _e_ is the preferred base for calculus-based problems, and not other numbers like 2, 3 or 10.
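A rough numerical check of the self-derivative property (a sketch; the finite-difference step h and the sample points are my own choices):

```python
import math

h = 1e-6  # my own choice of step size
for x in (0.0, 1.0, 2.5):
    numeric = (math.exp(x + h) - math.exp(x)) / h  # finite-difference derivative
    print(x, numeric, math.exp(x))                 # the two columns match closely
```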

### Natural Logarithm

We cannot discuss the exponential function without also talking about its inverse - the natural logarithm

![_config.yml]({{ site.baseurl }}/images/eu12-lnx.png)

It can be interpreted as the number of time units an exponential function takes to reach a certain amount of growth. For example, with e^x as the growth function, the number of time units required to reach y = 20 is _ln_ 20.

Given,

![_config.yml]({{ site.baseurl }}/images/eu13-elnx.png)

, then differentiating both sides with respect to x, we get

![_config.yml]({{ site.baseurl }}/images/eu14-lne1.png)

, meaning the horizontal traversal along the x-axis as e^x grows from 1 to _e_ is 1 unit.

### Why e^x is favored for Calculus

Simply put, it is because of its unique property of being its own derivative

![_config.yml]({{ site.baseurl }}/images/eu15-dex.png)

Let’s say we have a function y = 5^x . This can easily be represented in terms of _e_ as

![_config.yml]({{ site.baseurl }}/images/eu16-ybar.png)

This can easily be interpreted as: the rate of growth of 5^x is proportional to _ln_ 5. This ease and beauty is what has made mathematicians and statisticians so enamoured of this magical number called the ‘exponential constant’.
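And a quick numerical check of the 5^x example (a sketch; h and x are my own arbitrary choices):

```python
import math

h = 1e-6  # my own choice of step size
x = 2.0
numeric = (5 ** (x + h) - 5 ** x) / h
print(numeric, 5 ** x * math.log(5))  # d/dx 5^x = 5^x * ln 5
```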
57 changes: 57 additions & 0 deletions _posts/2019-11-1-MLEntropy.md
@@ -0,0 +1,57 @@
---
layout: post
title: Information Theory and Cross Entropy Function
---

Next, let's learn about a concept that is applied across areas as diverse as machine learning, information theory and statistics.

### Best Fitting through Maximum Likelihood

Last time, we saw that a logistic unit adjusts and readjusts the weights {w1, w2, w3...} or coefficients {β_0, β_1, β_2...} to arrive at a generalized solution. This process is also known as best fitting, as you can see in the diagram below - finding the line that best fits the maximum number of data points. That means we are looking for the solution that has the ‘maximum likelihood of correctness‘ of belonging to a certain category.

### Maximum Likelihood Estimation

Remember that our logit function ranges over (-∞, +∞), and that the dependent variable (Y) is binary (0, 1), since we are predicting the probability of belonging to a class. Maximum likelihood can then be pivoted around the number of successes that the σ (i.e., logit⁻¹) plot encapsulates (the probability mass function).

For example, if a fair coin is tossed 4 times and the number of times H (head) appears is 2, then the

![_config.yml]({{ site.baseurl }}/images/ml3-01-prob.png)

Note this formula holds as Y is discrete and binary (Bernoulli Distribution).

Therefore, the likelihood of the logistic function, or _L(W|X)_, where W is the coefficient set {w1, w2, w3...}, for N independent and identically distributed samples can be given as

![_config.yml]({{ site.baseurl }}/images/ml3-02-lwx.png)

The above equation can be simplified even more if we take its logarithm: maximizing the likelihood also maximizes the log-likelihood. Moreover, the log of a product over N samples is the sum of the individual logs, which is much easier to compute and compare. Hence,

![_config.yml]({{ site.baseurl }}/images/ml3-03-log.png)
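A minimal sketch of the likelihood and log-likelihood for Bernoulli-distributed labels; the arrays `y` and `p` below are made-up example values, not from the post:

```python
import numpy as np

# made-up labels and predicted probabilities for N = 5 samples
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

likelihood = np.prod(p ** y * (1 - p) ** (1 - y))                 # product over samples
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # sum of the logs
print(likelihood, log_likelihood, np.log(likelihood))             # the last two agree
```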

### Logistic Cost/Cross Entropy Function

The negative of the log-likelihood gives us the error function (the Cross Entropy or Cost Function), which we denote as J; the point where the likelihood is maximized is the point where J is least.

![_config.yml]({{ site.baseurl }}/images/ml3-04-entfn.png)
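A sketch of computing J for the same made-up values (my own snippet, using the mean rather than the sum so J does not scale with N):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # true labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # predicted P(Y=1|X)

# J: the negative mean log-likelihood, i.e. the binary cross-entropy cost
J = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(J)
```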

### Information Theory and Cross Entropy

In fact, the concept of cross entropy comes from information theory, founded by Claude Shannon, an American mathematician and electrical engineer. In his 1948 paper, ‘A Mathematical Theory of Communication’ - a study of sending data with the least error from one point to another - he writes that the more predictable the outcome of a communication channel is, the less uncertainty there is on the recipient’s side.

Simply put, if there is a communication channel that outputs a sequence of two letters [A, B] and the recipient is asked to predict the next letter that will come through the channel, prediction is easier when there is a heavy bias towards either A or B than when both have an equal chance of being output. In other words, predicting A is easier when its probability is 75% than when it is 50% or less. If this were represented as a binary decision tree, ‘A’ would come up at a much higher level than B. Therefore, the total entropy =

![_config.yml]({{ site.baseurl }}/images/ml3-05-wtbias.png)

It is apparent from the figure (source: Wikipedia) that entropy is maximal when the distribution is uniform. In our example, therefore, the entropy is

![_config.yml]({{ site.baseurl }}/images/ml3-06-entropy.png)
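A sketch of the entropy comparison in the two-letter example (my own snippet; the probabilities are the ones used in the text, measured in bits):

```python
import math

def entropy(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: the fair (uniform) channel, maximum uncertainty
print(entropy([0.75, 0.25]))  # ~0.811 bits: the biased channel is easier to predict
```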

This is the uncertainty of the channel given the true probabilities. But if the recipient’s prediction doesn’t match the true distribution - say A’s probability is wrongly predicted to be 60% - then this divergence from the truth makes the measured entropy rise.
This quantity is the Cross Entropy, given as

![_config.yml]({{ site.baseurl }}/images/ml3-07-xentropy.png)
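And the cross entropy when the recipient assumes 60/40 while the channel is really 75/25 (a sketch under the same bit convention; the helper name is mine):

```python
import math

def cross_entropy(true_p, pred_q):
    # average number of bits needed when coding with the predicted distribution
    return -sum(p * math.log2(q) for p, q in zip(true_p, pred_q))

true_p = [0.75, 0.25]  # the channel's real bias
pred_q = [0.60, 0.40]  # the recipient's wrong estimate
print(cross_entropy(true_p, true_p))  # ~0.811 bits: equals the true entropy
print(cross_entropy(true_p, pred_q))  # ~0.883 bits: the mismatch costs extra bits
```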

It’s amazing to behold how the same concept is applied across areas as diverse as machine learning, information theory and statistics.



66 changes: 66 additions & 0 deletions _posts/2019-11-14-GradientDescent.md
@@ -0,0 +1,66 @@
---
layout: post
title: Gradient Descent and Global Minimum
---

The next concept is important when we do back propagation calculations in neural networks.

![_config.yml]({{ site.baseurl }}/images/gd4-01-entfn.png)

Our aim is to minimize this error function to its least possible value by adjusting the coefficients (β_0, β_1, β_2...), or in other words the weights (w_0, w_1, w_2...), of our logistic unit, because minimizing error → maximizing efficiency.

### Convex Minimization

Convexity of a function can be defined, roughly, as the property that ALL line segments connecting two points on the graph lie on or above the graph.

![_config.yml]({{ site.baseurl }}/images/gd4-02-convex.png)

Let’s translate our error function into a 3D landscape, say a golf course, and assume our current position in the field corresponds to the current coefficients in our error function. In order to optimize the function, we want to find the lowest point in the entire field, which will be our **Global Minimum**. Mind you, there might be low-lying areas which could be categorized as a **Local Minimum**, but your machine learning algorithm should keep looking for the lowest point without getting caught in these mini contours. In order to reach this global minimum, we scale the landscape in small steps using an optimization method called **Gradient Descent**.

![_config.yml]({{ site.baseurl }}/images/gd4-03-gdgraph.png)

In linear regression the cost function usually resorted to is the Mean Squared Error (MSE), where

![_config.yml]({{ site.baseurl }}/images/gd4-04-mse.png)
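For reference, a quick sketch of MSE on made-up arrays (my own snippet, not from the post):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])  # made-up targets
y_pred = np.array([1.1, 1.9, 3.3])  # made-up predictions
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.01 + 0.01 + 0.09) / 3 ≈ 0.0367
```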

Being a quadratic, it promises a topology close to an upward-opening parabola. However, on logistic parameters MSE can generate more than one local minimum, rendering optimization difficult. In contrast, the logistic cost function, when plotted, gives a symmetric convex curve with a single minimum, which is therefore its global minimum. The figure below shows the logistic cost function plotted; the global minimum is at the point 0.5.

![_config.yml]({{ site.baseurl }}/images/gd4-05-logy.png)

But wait a minute: our logistic cost is made up of log(y) and log(1 - y), and logarithmic curves on their own are monotonic - so how come we have a plot similar to a parabola here? The trick is mini-stepping. In this case, I used a step of 0.025 → p = np.arange(0,1,0.025). (Remember, a circle is made up of infinitesimally small straight lines.) So our descent from the top of the plot was in small increments of 0.025. Had I increased the step size, convergence at the global minimum would have been quicker; hence this rate is also called the **Learning rate** in ML lingo.
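A sketch of how such a plot can be produced (assuming matplotlib; only the step p = np.arange(0,1,0.025) comes from the text, the rest is my reconstruction, and I start the range at 0.025 to avoid log(0)):

```python
import numpy as np
import matplotlib.pyplot as plt

# start at 0.025 rather than 0 to avoid log(0); the step size itself is from the text
p = np.arange(0.025, 1, 0.025)
cost = -np.log(p) - np.log(1 - p)  # symmetric convex curve, minimum at p = 0.5

plt.plot(p, cost)
plt.xlabel("p")
plt.ylabel("cost")
plt.show()
```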

### Gradient Descent

This is an optimization technique adopted to learn the parameters and converge to the global minimum. In essence, you find the slope at a point on the plot and descend (or ascend) the plot in search of the minimum. When the direction of the slope changes, that is an indication that we have swung past the minimum and hence need to move back in the opposite direction.

Generally, we learn the optimum parameters (weights or biases) by adjusting and readjusting as below

![_config.yml]({{ site.baseurl }}/images/gd4-06-wnew.png)
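In code, an update step of that kind might look like the sketch below (the learning rate, the helper name and the toy objective are my own choices for illustration):

```python
def gradient_descent_step(w, grad, learning_rate=0.1):
    # move against the gradient: w_new = w_old - learning_rate * dJ/dw
    return w - learning_rate * grad

# toy example: minimize J(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2 * (w - 3))
print(w)  # converges towards 3
```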

### Gradient descent on cross entropy function

![_config.yml]({{ site.baseurl }}/images/gd4-07-lpy.png)

#### i)

![_config.yml]({{ site.baseurl }}/images/gd4-08-step1.png)

#### ii)

![_config.yml]({{ site.baseurl }}/images/gd4-09-step2.png)

#### iii)

![_config.yml]({{ site.baseurl }}/images/gd4-10-step3.png)

More important than the extensive calculation is this: to find the overall effect w_1 has on the loss function, we first derive the effect of z on the loss (the negative log-likelihood), and then chain the pieces together.

In other words, if we denote

![_config.yml]({{ site.baseurl }}/images/gd4-11-dlpy.png)

and so on, we could state as follows

![_config.yml]({{ site.baseurl }}/images/gd4-12-dlw1.png)

In order to find the derivative at the current node, we only need to take into consideration the derivative of its immediate successor node. This makes it easier to start the derivation from the output and work backwards. This is an important concept to keep in mind when we do back-propagation calculations in neural networks.
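A sketch of that backward chaining for a single logistic unit (my own snippet; the data point, weights and the names da, dz, dw, db are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one made-up training example
x = np.array([1.5, -0.3])
w = np.array([0.4, 0.9])
b = 0.1
y = 1.0

# forward pass
z = np.dot(w, x) + b
a = sigmoid(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# backward pass: each node only needs the derivative of its immediate successor
da = -(y / a) + (1 - y) / (1 - a)  # dL/da
dz = da * a * (1 - a)              # dL/dz = dL/da * da/dz (simplifies to a - y)
dw = dz * x                        # dL/dw1 = dL/dz * dz/dw1, and so on
db = dz
print(dz, a - y)                   # the two agree
print(dw, db)
```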