meta #2268

Open
wants to merge 45 commits into base: master
Changes from all commits
Commits
45 commits
1cd9b25
Update _config.yml
bindugeo Jan 4, 2020
3bc41a0
Create EulersNumber.md
bindugeo Jan 5, 2020
045d1a5
Rename EulersNumber.md to 2020-01-04-EulersNumber.md
bindugeo Jan 5, 2020
6704759
Update and rename 2020-01-04-EulersNumber.md to 2020-1-4-Understandin…
bindugeo Jan 5, 2020
6a6847e
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
620624d
Add files via upload
bindugeo Jan 5, 2020
173a1d5
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
83325e5
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
a6f0439
Add files via upload
bindugeo Jan 5, 2020
117d670
Add files via upload
bindugeo Jan 5, 2020
3fc86c0
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
2d72a5c
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
3ebb9ff
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
f0ae685
Update 2020-1-4-UnderstandingEulersNumber.md
bindugeo Jan 5, 2020
53198b9
Add files via upload
bindugeo Jan 5, 2020
4941a00
Rename 2020-1-4-UnderstandingEulersNumber.md to 2019-10-2-Understandi…
bindugeo Jan 5, 2020
569a723
Create 2019-10-14-LogisticRegression
bindugeo Jan 5, 2020
72d0b8b
Update and rename 2019-10-14-LogisticRegression to 2019-10-14-Logisti…
bindugeo Jan 8, 2020
a9b70e6
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
91f9bbf
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
782791f
Add files via upload
bindugeo Jan 8, 2020
e57091e
Update 2019-10-14-LogisticRegression.md
bindugeo Jan 8, 2020
f0c11d1
Create 2019-11-1-MLEntropy.md
bindugeo Jan 10, 2020
3502cd7
Update 2019-11-1-MLEntropy.md
bindugeo Jan 12, 2020
120a319
Update 2019-11-1-MLEntropy.md
bindugeo Jan 12, 2020
834b726
Add files via upload
bindugeo Jan 12, 2020
e4b2926
Create 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
e95d63a
Update 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
ad25be6
Update 2019-11-14-GradientDescent.md
bindugeo Jan 12, 2020
0ab029f
Add files via upload
bindugeo Jan 12, 2020
31b7adb
Create 2020-1-15-NeuralNetwork.md
bindugeo Feb 16, 2020
6e2cca5
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
d589eca
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
5665c4c
Update 2020-1-15-NeuralNetwork.md
bindugeo Feb 17, 2020
478b1b6
Add files via upload
bindugeo Feb 17, 2020
06b64d0
Update and rename 2014-3-3-Hello-World.md to 2020-1-30-CNN.md
bindugeo Mar 5, 2020
12a26d5
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
cef8d04
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
00b0702
Update 2020-1-30-CNN.md
bindugeo Mar 5, 2020
3da79ae
Add files via upload
bindugeo Mar 5, 2020
ca6e063
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
8f2400c
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
3ec7744
Add files via upload
bindugeo Mar 26, 2020
39e67cf
Update 2020-1-30-CNN.md
bindugeo Mar 26, 2020
464324c
background meta image
bindugeo Apr 20, 2024
4 changes: 2 additions & 2 deletions _config.yml
@@ -3,10 +3,10 @@
#

# Name of your site (displayed in the header)
name: Your Name
name: Bindu George

# Short bio or description (displayed in the header)
description: Web Developer from Somewhere
description: In pursuit of Deep Learning

# URL of your avatar or profile pic (you could use your GitHub profile pic)
avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png
2 changes: 1 addition & 1 deletion _includes/meta.html
@@ -11,7 +11,7 @@
<meta property="og:description" content="{{ site.description }}" />
{% endif %}
<meta name="author" content="{{ site.name }}" />

<meta property="og:image" content="https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/nn.png" />
{% if page.title %}
<meta property="og:title" content="{{ page.title }}" />
<meta property="twitter:title" content="{{ page.title }}" />
10 changes: 0 additions & 10 deletions _posts/2014-3-3-Hello-World.md

This file was deleted.

70 changes: 70 additions & 0 deletions _posts/2019-10-14-LogisticRegression.md
@@ -0,0 +1,70 @@
---
layout: post
title: Logistic Regression
---

A logistic regression unit is the base unit of an Artificial Neural Network. Therefore, an understanding of Logistic Regression is essential to Deep Learning principles.

Deep neural networks and their potential have come to the fore with new verve over the last two decades. Developments ranging from natural language processing and face recognition to recent breakthroughs in pharma, such as protein folding and modelling compound reactions, attest to today's more robust deep learning architectures. The field brings together neuroscience, probability theory, computational geometry and topology, convex analysis and more. At the base of all these architectures sits the logistic regression unit, which is why understanding logistic regression is essential before diving into deep learning.

### Logistic Regression and Activation Functions

![_config.yml]({{ site.baseurl }}/images/lr2-01-reg-vs-class.png)

The figures above show the classic difference between a regression (left) and a classification (right). When classification problems came up in machine learning, we needed to predict the probability of belonging to a class. But the linear equation

![_config.yml]({{ site.baseurl }}/images/lr2-02-eqn.png)

produces values anywhere on the real line (-∞, +∞). In order to depict a probability, we need a device that outputs values between 0 and 1 _(0 ≤ p ≤ 1)_. For example, for a binary classification problem, _P(Y=1|X) = 1 - P(Y=0|X)_. Also, _P(Y) = 0.5_ would mean that the data point could belong to either class with equal chance (in other words, it lies right on the hyperplane dividing the two classes).

How do we transform the equation above to something that restrains itself between 0 and 1, without having to sacrifice the ease of a linear representation? Let’s do that in two steps:

1. Make it greater than 0
From my previous article, one can see that _e^z_ (the exponential function) is always greater than 0. So, if we assign

![_config.yml]({{ site.baseurl }}/images/lr2-03-abovex.png)

2. Bring _p_ down from infinity to 1 or less
Any fraction whose denominator is slightly larger than its numerator is always less than 1. Therefore, if we assign

![_config.yml]({{ site.baseurl }}/images/lr2-04-pbetween.png)

, _p_ duly falls between 0 and 1. Voila, we have grabbed our unruly linear relation by the horns and tamed it into a probability.

### Logistic Sigmoid Function

![_config.yml]({{ site.baseurl }}/images/lr2-05-sigmoidfn.png)

This function is also known as the Sigmoid function. As depicted here, it takes an ‘S’ form, and at 0, _f(0) = 0.5_. The Sigmoid function is usually denoted by _σ_.

![_config.yml]({{ site.baseurl }}/images/lr2-06-siggraph.png)
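As a quick illustrative sketch (my own NumPy snippet, not part of the original post; the name `sigmoid` is my choice), the S-curve can be computed directly:

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid: squashes any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # f(0) = 0.5; large negative/positive z approach 0 and 1
```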

### Tanh Function

The hyperbolic tangent is also used as a sigmoid function in that it too generates an ‘S’ form, except that its range is _-1 < f(x) < 1_.

![_config.yml]({{ site.baseurl }}/images/lr2-07-Hyptanfunction.png)

We’ll discuss more on this when we get to neural networks.

![_config.yml]({{ site.baseurl }}/images/lr2-08-tanhgraph.png)

### Logit Function

![_config.yml]({{ site.baseurl }}/images/lr2-09-logitfn.png)

, which is also called the odds of likelihood - in other words, the ratio of _P(event)_ to _P(non-event)_.

![_config.yml]({{ site.baseurl }}/images/lr2-10-oddslh.png)

This function,

![_config.yml]({{ site.baseurl }}/images/lr2-11-lnf.png)

is also called the logit function, using which a machine learning model adjusts its coefficients

![_config.yml]({{ site.baseurl }}/images/lr2-12-beta.png)

A simple logistic regression model is shown here. The weight set _{w1, w2, w3...}_ is the coefficient set _{β0, β1, β2,...}_ that we try to find with the help of the logit function.

![_config.yml]({{ site.baseurl }}/images/lr2-13-graph.png)
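A minimal sketch of such a unit, assuming a NumPy setting; the weights, bias and inputs below are made-up values purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up example: three features, weights {w1, w2, w3} and a bias term (β0)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2

z = np.dot(w, x) + b            # linear part: can be any real number
p = sigmoid(z)                  # squeezed into (0, 1), interpreted as P(Y=1|X)
log_odds = np.log(p / (1 - p))  # the logit; recovers z
print(p, log_odds)
```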
109 changes: 109 additions & 0 deletions _posts/2019-10-2-UnderstandingEulersNumber.md
@@ -0,0 +1,109 @@
---
layout: post
title: Understanding Euler's Number
---

Discover why mathematicians and statisticians are so enamored by the Exponential Constant _e_

Before one dives into the deep recesses of neural networks, it is worth taking a good look at a peculiar number - Euler’s number (_e_ ≃ 2.71828...). Also called the ‘exponential constant’, it is widely used in infinitesimal calculus - a tool relied upon by all the engineering sciences and now heavily employed in deep learning as well.

Interestingly, though the constant is attributed to Euler - a versatile 18th-century mathematician who made major contributions to fields ranging from calculus, trigonometry and physics to even music - it was Bernoulli who originally stumbled upon this number. Jacob Bernoulli was working on the principle of compounding when he noticed that a dollar ($1) at 100% annual interest yields more and more as the compounding frequency is increased, but that the yield approaches a certain value between 2.69 and 3.

![_config.yml]({{ site.baseurl }}/images/eu1-bernoullis.png)

The value of this limit eluded Bernoulli, but some years later the prodigious Euler, who studied under Jacob's brother Johann Bernoulli, found a clever series approximation for this constant, which he denoted as _e_. He found that

![_config.yml]({{ site.baseurl }}/images/eu2-econstant.png)
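A small numerical sketch of Bernoulli's observation (my own snippet; the chosen values of n are arbitrary):

```python
# Bernoulli's compounding observation: (1 + 1/n)^n approaches e ≈ 2.71828...
for n in (1, 2, 12, 365, 10_000, 1_000_000):
    print(n, (1 + 1 / n) ** n)
```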

He established that the constant _e_ (2.7182818...) is irrational; it was later shown to be transcendental as well. Euler also generalized that

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

which we have now come to call the exponential function, the basis of many calculus problems. To appreciate this function better, let us see how this great mathematician connected abstract mathematics to a physical realm - the complex plane.

### Euler's Formula and Euler's Identity

Let’s see how a complex number plane can be represented by the series

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

Using the __Maclaurin power series__, we know the following

![_config.yml]({{ site.baseurl }}/images/eu4-mclaurins.png)

Substituting **_ix_** for **_x_** in

![_config.yml]({{ site.baseurl }}/images/eu3-expfunc.png)

, we get

![_config.yml]({{ site.baseurl }}/images/eu5-preeulers.png)

This gives us,

![_config.yml]({{ site.baseurl }}/images/eu6-eulerseqn.png)

popularly known as **Euler’s Formula**.

This equation in turn gave way to findings like

![_config.yml]({{ site.baseurl }}/images/eu7-posteulers.png)

helpful tools in topics like Signal processing, etc.


We can also see that when _x_ is replaced by _π_, we arrive at Euler’s Identity

![_config.yml]({{ site.baseurl }}/images/eu8-identity.png)

implying that in the imaginary plane, growth means rotating around a circle by that many radians, while real growth scales up the magnitude.

![_config.yml]({{ site.baseurl }}/images/eu9-circle.png)
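As a quick sanity check of Euler's formula and identity (a sketch using Python's built-in complex support; the value of x is arbitrary):

```python
import cmath
import math

x = 1.0
print(cmath.exp(1j * x))                  # e^{ix}
print(complex(math.cos(x), math.sin(x)))  # cos x + i sin x  (same value)

print(cmath.exp(1j * math.pi) + 1)        # Euler's identity: ~0, up to rounding
```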

### Its own Derivative

Given,

![_config.yml]({{ site.baseurl }}/images/eu9a-prediff.png)

Differentiating this expansion, we get

![_config.yml]({{ site.baseurl }}/images/eu10-diff.png)

What does this mean? The rate of growth of e^x is equal to itself, i.e., e^x. This very property makes it unique and popular as the base of functions describing natural growth or continuous change.

![_config.yml]({{ site.baseurl }}/images/eu11-graph.png)

In other words, e^x is the amount to which a quantity starting at 1 grows after _x_ time units of continuous growth at a rate of 100%.

This should explain why _e_ is the preferred base for calculus-based problems, and not other numbers like 2, 3 or 10.
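A rough numerical check of the self-derivative property (a sketch; the finite-difference step h and the sample points are my own choices):

```python
import math

h = 1e-6  # my own choice of step size
for x in (0.0, 1.0, 2.5):
    numeric = (math.exp(x + h) - math.exp(x)) / h  # finite-difference derivative
    print(x, numeric, math.exp(x))                 # the two columns match closely
```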

### Natural Logarithm

We cannot discuss the exponential function without also talking about its inverse - the natural logarithm

![_config.yml]({{ site.baseurl }}/images/eu12-lnx.png)

It can be interpreted as the number of time units an exponential function takes to reach a certain amount of growth. For example, with e^x as the growth function, the number of time units required to reach y = 20 is _ln_ 20.

Given,

![_config.yml]({{ site.baseurl }}/images/eu13-elnx.png)

, then differentiating both sides with respect to x, we get

![_config.yml]({{ site.baseurl }}/images/eu14-lne1.png)

, meaning the horizontal traversal along the x-axis as e^x grows from 1 to _e_ is 1 unit.

### Why e^x is favored for Calculus

Simply put, it is because of its unique property of being its own derivative

![_config.yml]({{ site.baseurl }}/images/eu15-dex.png)

Let’s say we have a function y = 5^x . This can easily be represented in terms of _e_ as

![_config.yml]({{ site.baseurl }}/images/eu16-ybar.png)

This can easily be interpreted as: the rate of growth of 5^x is proportional to _ln_ 5. This ease and beauty is what has made mathematicians and statisticians so enamoured of this magical number called the ‘exponential constant’.
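And a quick numerical check of the 5^x example (a sketch; h and x are my own arbitrary choices):

```python
import math

h = 1e-6  # my own choice of step size
x = 2.0
numeric = (5 ** (x + h) - 5 ** x) / h
print(numeric, 5 ** x * math.log(5))  # d/dx 5^x = 5^x * ln 5
```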
57 changes: 57 additions & 0 deletions _posts/2019-11-1-MLEntropy.md
@@ -0,0 +1,57 @@
---
layout: post
title: Information Theory and Cross Entropy Function
---

Next, let's learn about a concept that is applied across areas as diverse as machine learning, information theory and statistics.

### Best Fitting through Maximum Likelihood

Last time, we saw that a logistic unit adjusts and readjusts the weights {w1, w2, w3...} or coefficients {β_0, β_1, β_2...} to arrive at a generalized solution. This process is also known as best fitting, as you can see in the diagram below - finding the line that best fits the maximum number of data points. That means we are looking for the solution that has the ‘maximum likelihood of correctness‘ of belonging to a certain category.

### Maximum Likelihood Estimation

Remember that our logit function ranges over (-∞, +∞), and that the dependent variable (Y) is binary (0, 1), since we are predicting the probability of belonging to a class. Maximum likelihood can then be pivoted around the number of successes that the σ (i.e., logit⁻¹) plot encapsulates (the probability mass function).

For example, if a fair coin is tossed 4 times and the number of times H (head) appears is 2, then the

![_config.yml]({{ site.baseurl }}/images/ml3-01-prob.png)

Note this formula holds as Y is discrete and binary (Bernoulli Distribution).

Therefore, the likelihood of the logistic function, or _L(W|X)_, where W is the coefficient set {w1, w2, w3...}, for N independent and identically distributed samples can be given as

![_config.yml]({{ site.baseurl }}/images/ml3-02-lwx.png)

The above equation can be simplified even more if we take its logarithm: maximizing the likelihood also maximizes the log-likelihood. Moreover, the log of a product over N samples is the sum of the individual logs, which is much easier to compute and compare. Hence,

![_config.yml]({{ site.baseurl }}/images/ml3-03-log.png)
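A minimal sketch of the likelihood and log-likelihood for Bernoulli-distributed labels; the arrays `y` and `p` below are made-up example values, not from the post:

```python
import numpy as np

# made-up labels and predicted probabilities for N = 5 samples
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

likelihood = np.prod(p ** y * (1 - p) ** (1 - y))                 # product over samples
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # sum of the logs
print(likelihood, log_likelihood, np.log(likelihood))             # the last two agree
```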

### Logistic Cost/Cross Entropy Function

The negative of the log-likelihood gives us the error function (the Cross Entropy or Cost Function), which we denote as J; the point where the likelihood is maximized is the point where J is least.

![_config.yml]({{ site.baseurl }}/images/ml3-04-entfn.png)
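A sketch of computing J for the same made-up values (my own snippet, using the mean rather than the sum so J does not scale with N):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # true labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # predicted P(Y=1|X)

# J: the negative mean log-likelihood, i.e. the binary cross-entropy cost
J = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(J)
```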

### Information Theory and Cross Entropy

In fact, the concept of cross entropy comes from information theory, founded by Claude Shannon, an American mathematician and electrical engineer. In his 1948 paper, ‘A Mathematical Theory of Communication’ - a study of sending data with the least error from one point to another - he writes that the more predictable the outcome of a communication channel is, the less uncertainty there is on the recipient’s side.

Simply put, if there is a communication channel that outputs a sequence of two letters [A, B] and the recipient is asked to predict the next letter that will come through the channel, prediction is easier when there is a heavy bias towards either A or B than when both have an equal chance of being output. In other words, predicting A is easier when its probability is 75% than when it is 50% or less. If this were represented as a binary decision tree, ‘A’ would come up at a much higher level than B. Therefore, the total entropy =

![_config.yml]({{ site.baseurl }}/images/ml3-05-wtbias.png)

It is apparent from the figure (source: Wikipedia) that entropy is maximal when the distribution is uniform. In our example, therefore, the entropy is

![_config.yml]({{ site.baseurl }}/images/ml3-06-entropy.png)
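A sketch of the entropy comparison in the two-letter example (my own snippet; the probabilities are the ones used in the text, measured in bits):

```python
import math

def entropy(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: the fair (uniform) channel, maximum uncertainty
print(entropy([0.75, 0.25]))  # ~0.811 bits: the biased channel is easier to predict
```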

This is the uncertainty of the channel given the true probabilities. But if the recipient’s prediction doesn’t match the true distribution - say A’s probability is wrongly predicted to be 60% - then this divergence from the truth makes the measured entropy rise.
This quantity is the Cross Entropy, given as

![_config.yml]({{ site.baseurl }}/images/ml3-07-xentropy.png)
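And the cross entropy when the recipient assumes 60/40 while the channel is really 75/25 (a sketch under the same bit convention; the helper name is mine):

```python
import math

def cross_entropy(true_p, pred_q):
    # average number of bits needed when coding with the predicted distribution
    return -sum(p * math.log2(q) for p, q in zip(true_p, pred_q))

true_p = [0.75, 0.25]  # the channel's real bias
pred_q = [0.60, 0.40]  # the recipient's wrong estimate
print(cross_entropy(true_p, true_p))  # ~0.811 bits: equals the true entropy
print(cross_entropy(true_p, pred_q))  # ~0.883 bits: the mismatch costs extra bits
```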

It’s amazing to behold how the same concept is applied across areas as diverse as machine learning, information theory and statistics.



66 changes: 66 additions & 0 deletions _posts/2019-11-14-GradientDescent.md
@@ -0,0 +1,66 @@
---
layout: post
title: Gradient Descent and Global Minimum
---

The next concept is important when we do back propagation calculations in neural networks.

![_config.yml]({{ site.baseurl }}/images/gd4-01-entfn.png)

Our aim is to minimize this error function to its least possible value by adjusting the coefficients (β_0, β_1, β_2...), or in other words the weights (w_0, w_1, w_2...), of our logistic unit, because minimizing error → maximizing efficiency.

### Convex Minimization

Convexity of a function can be defined, roughly, as the property that ALL line segments connecting two points on the graph lie on or above the graph.

![_config.yml]({{ site.baseurl }}/images/gd4-02-convex.png)

Let’s translate our error function into a 3D landscape, say a golf course, and assume our current position in the field corresponds to the current coefficients in our error function. In order to optimize the function, we want to find the lowest point in the entire field, which will be our **Global Minimum**. Mind you, there might be low-lying areas which could be categorized as a **Local Minimum**, but your machine learning algorithm should keep looking for the lowest point without getting caught in these mini contours. In order to reach this global minimum, we scale the landscape in small steps using an optimization method called **Gradient Descent**.

![_config.yml]({{ site.baseurl }}/images/gd4-03-gdgraph.png)

In linear regression the cost function usually resorted to is the Mean Squared Error (MSE), where

![_config.yml]({{ site.baseurl }}/images/gd4-04-mse.png)
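For reference, a quick sketch of MSE on made-up arrays (my own snippet, not from the post):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])  # made-up targets
y_pred = np.array([1.1, 1.9, 3.3])  # made-up predictions
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.01 + 0.01 + 0.09) / 3 ≈ 0.0367
```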

Being a quadratic, it promises a topology close to an upward-opening parabola. However, on logistic parameters MSE can generate more than one local minimum, rendering optimization difficult. In contrast, the logistic cost function, when plotted, gives a symmetric convex curve with a single minimum, which is therefore its global minimum. The figure below shows the logistic cost function plotted; the global minimum is at the point 0.5.

![_config.yml]({{ site.baseurl }}/images/gd4-05-logy.png)

But wait a minute: our logistic cost is made up of log(y) and log(1 - y), and logarithmic curves on their own are monotonic - so how come we have a plot similar to a parabola here? The trick is mini-stepping. In this case, I used a step of 0.025 → p = np.arange(0,1,0.025). (Remember, a circle is made up of infinitesimally small straight lines.) So our descent from the top of the plot was in small increments of 0.025. Had I increased the step size, convergence at the global minimum would have been quicker; hence this rate is also called the **Learning rate** in ML lingo.
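A sketch of how such a plot can be produced (assuming matplotlib; only the step p = np.arange(0,1,0.025) comes from the text, the rest is my reconstruction, and I start the range at 0.025 to avoid log(0)):

```python
import numpy as np
import matplotlib.pyplot as plt

# start at 0.025 rather than 0 to avoid log(0); the step size itself is from the text
p = np.arange(0.025, 1, 0.025)
cost = -np.log(p) - np.log(1 - p)  # symmetric convex curve, minimum at p = 0.5

plt.plot(p, cost)
plt.xlabel("p")
plt.ylabel("cost")
plt.show()
```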

### Gradient Descent

This is an optimization technique adopted to learn the parameters and converge to the global minimum. In essence, you find the slope at a point on the plot and descend (or ascend) the plot in search of the minimum. When the direction of the slope changes, that is an indication that we have swung past the minimum and hence need to move back in the opposite direction.

Generally, we learn the optimum parameters (weights or biases) by adjusting and readjusting as below

![_config.yml]({{ site.baseurl }}/images/gd4-06-wnew.png)
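In code, an update step of that kind might look like the sketch below (the learning rate, the helper name and the toy objective are my own choices for illustration):

```python
def gradient_descent_step(w, grad, learning_rate=0.1):
    # move against the gradient: w_new = w_old - learning_rate * dJ/dw
    return w - learning_rate * grad

# toy example: minimize J(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2 * (w - 3))
print(w)  # converges towards 3
```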

### Gradient descent on cross entropy function

![_config.yml]({{ site.baseurl }}/images/gd4-07-lpy.png)

#### i)

![_config.yml]({{ site.baseurl }}/images/gd4-08-step1.png)

#### ii)

![_config.yml]({{ site.baseurl }}/images/gd4-09-step2.png)

#### iii)

![_config.yml]({{ site.baseurl }}/images/gd4-10-step3.png)

More important than the extensive calculation is this: to find the overall effect w_1 has on the loss function, we first derive the effect of z on the loss (the negative log-likelihood), and then chain the pieces together.

In other words, if we denote

![_config.yml]({{ site.baseurl }}/images/gd4-11-dlpy.png)

and so on, we could state as follows

![_config.yml]({{ site.baseurl }}/images/gd4-12-dlw1.png)

In order to find the derivative at the current node, we only need to take into consideration the derivative of its immediate successor node. This makes it easier to start the derivation from the output and work backwards. This is an important concept to keep in mind when we do back-propagation calculations in neural networks.
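A sketch of that backward chaining for a single logistic unit (my own snippet; the data point, weights and the names da, dz, dw, db are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one made-up training example
x = np.array([1.5, -0.3])
w = np.array([0.4, 0.9])
b = 0.1
y = 1.0

# forward pass
z = np.dot(w, x) + b
a = sigmoid(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# backward pass: each node only needs the derivative of its immediate successor
da = -(y / a) + (1 - y) / (1 - a)  # dL/da
dz = da * a * (1 - a)              # dL/dz = dL/da * da/dz (simplifies to a - y)
dw = dz * x                        # dL/dw1 = dL/dz * dz/dw1, and so on
db = dz
print(dz, a - y)                   # the two agree
print(dw, db)
```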