The post Machine Learning part 7: random forests appeared first on Python Members Club.

Machine Learning

♡ supervised learning

♡ unsupervised learning

♡ reinforcement learning

recap:

types of supervised learning

classification

regression

mixed

– tree based

– random forest

– neural networks

– support vector machines

overfitting and the problem with trees

trees classify by drawing square boxes around the data, which does not work well where many separations are needed: the tree ends up overfitting the data

pruning

pruning means to trim (a tree, shrub, or bush) by cutting away dead or overgrown branches or stems (or unwanted parts), especially to encourage growth.

for example, if a single data point (a leaf) is forcing one of your squares to be large, you just remove it so that the tree maintains relevance. pruning improves the accuracy of the tree.

random forests

just like a forest is made up of trees, a random forest is a machine learning method made up of tree methods. it is random because it randomises the input of each tree.

the basic form of it acts on a tally. each tree gives an output; the forest collects all the outputs of the trees and tries to make up an answer. if 10% of the trees said A and 90% said B, it will say B (majority), much like asking 10 people what book they think will be most popular this year or what team will win. here, since the inputs are randomised, the trees are not all the same. it can be further tweaked to give more or less weight to each tree's answer.
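the tally can be sketched in a couple of lines of python (the tree outputs below are made up for illustration; real trees would each be trained on a random sample of the data):

```python
from collections import Counter

def forest_predict(tree_outputs):
    """majority vote over the outputs of the individual trees."""
    votes = Counter(tree_outputs)
    return votes.most_common(1)[0][0]

# say 10 trees voted: 1 said A, 9 said B
outputs = ['A'] + ['B'] * 9
print(forest_predict(outputs))  # B
```

weighted variants replace the raw count with per-tree weights, which is the "tweaking" mentioned above.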

use of random forests

used to identify customer type, whether people will like a product or predict behaviour.

exercise

1) dig more into the uses of random forests

2) compare its implementation across languages (with libraries included). how elegant do you think python is?

next:

#8 support vector machines


The post Machine Learning part 6: entropy and gain appeared first on Python Members Club.

♡ supervised learning

♡ unsupervised learning

♡ reinforcement learning

recap:

types of supervised learning

classification

regression

mixed

- tree based
- random forest
- neural networks
- support vector machines

entropy

entropy is the expected value of the information content of a variable

in the past post, we decided what to split on based on a purity index. we can do the same thing mathematically (which is easier) as follows:

P+ means probability of target (good day in our case)

P- means probability not target (bad day)

log2(P-) means log(P-) base 2

H = -(P+)log2(P+) - (P-)log2(P-)

the above is the formula for the purity of a subset. it measures how likely you are to get a positive element if you select randomly from that subset

let us take teacher’s presence. we had

absent

good 5 bad 1

present

good 2 bad 1

H(absent) = -(5/6)log2(5/6) - (1/6)log2(1/6)

= 0.65

H(present) = -(2/3)log2(2/3) - (1/3)log2(1/3)

= 0.92

0 is extremely pure and 1 is extremely impure

so absent is purer than present
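the two values above can be checked with a small python function, a direct sketch of the H formula (with the convention that 0*log2(0) counts as 0):

```python
from math import log2

def entropy(p_plus, p_minus):
    """H = -(P+)log2(P+) - (P-)log2(P-), taking 0*log2(0) as 0."""
    h = 0
    for p in (p_plus, p_minus):
        if p > 0:
            h -= p * log2(p)
    return h

print(round(entropy(5/6, 1/6), 2))  # 0.65  -> teacher absent
print(round(entropy(2/3, 1/3), 2))  # 0.92  -> teacher present
```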

gain

gain is used to determine what to split on first: find the feature with the highest gain. 0 means irrelevant, 1 means very relevant

gain is defined by

gain = H(S) − Σv (|Sv|/|S|) H(Sv)

S -> the whole set of data points (|S| is its size)

v ranges over the possible values of the feature and Sv -> the subset for which the feature takes value v

H is our enthropy as above

taking teacher's presence, our gain is

our |S| is 9, that is (6+3)

gain = H(S) - sum( |Sv|/9 * H(Sv) )

gain = H(S) - (6/9 * 0.65) - (3/9 * 0.92)

our H(S), the entropy of the whole set (7 good, 2 bad), is -(7/9)log2(7/9) - (2/9)log2(2/9)

= 0.76

gain = 0.76 - (6/9 * 0.65) - (3/9 * 0.92)

gain = 0.02
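the whole calculation can be checked in python; the entropy helper is repeated so the snippet stands alone. H is computed for the full set (7 good, 2 bad) and for each teacher-presence subset:

```python
from math import log2

def entropy(p_plus, p_minus):
    """binary entropy, taking 0*log2(0) as 0."""
    return -sum(p * log2(p) for p in (p_plus, p_minus) if p > 0)

h_s = entropy(7/9, 2/9)        # whole set: 7 good days, 2 bad
h_absent = entropy(5/6, 1/6)   # subset: teacher absent (6 points)
h_present = entropy(2/3, 1/3)  # subset: teacher present (3 points)

gain = h_s - (6/9) * h_absent - (3/9) * h_present
print(round(gain, 2))  # 0.02
```

a gain this close to 0 suggests teacher's presence is a weak feature to split on first.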

exercise:

google up information gain related to decision trees as well as associated concepts

next:

random forest


The post Machine Learning part 5: mixed methods appeared first on Python Members Club.

Machine Learning

♡ supervised learning

♡ unsupervised learning

♡ reinforcement learning

types of supervised learning

classification

regression

mixed

– tree based

– random forest

– neural networks

– support vector machines

mixed methods are used for both classification and regression.

tree based method

trees used for both classification and regression are called Classification And Regression Tree (CART) models

let us say that we want to predict whether an event will be good or bad, the event being having a good day at school

our data looks as follows

t. represents teacher

a means absent

p means present

mood stands for parents’ mood

g good. b bad

hwork means homework

d done

nd not done

t. | mood | hwork | result
---|------|-------|-------
a  | g    | d     | g
p  | b    | d     | g
a  | g    | nd    | g
a  | b    | d     | b
p  | b    | nd    | b
a  | g    | nd    | g
p  | b    | d     | g
a  | g    | nd    | g
a  | g    | d     | g

let us say that today the student entered the school. he wants to know how his day will go. today he has

t. p

mood g

hwork nd

the first step is to split the tree to get a high purity index

if we split by teacher’s presence first, we get

a

good 5 bad 1

p

good 2 bad 1

if we split by parents’ mood we get

g

good 5 bad 0

b

good 2 bad 2

if we split by homework done we get

d

good 4 bad 1

nd

good 3 bad 1

the highest purity index was with parents' mood: 5 good days and 0 bad

we start with it

mood

— g

a | g | d | g

a | g | nd | g

a | g | nd | g

a | g | nd | g

a | g | d | g

good 5 bad 0

— b

p | b | d | g

a | b | d | b

p | b | nd | b

p | b | d | g

good 2 bad 2

so bad mood must be split further, as good mood had 100% purity with 5 good results

now our condition is

t. p

mood g

hwork nd

if we go for mood g, we can stop splitting as our purity is 100%. we'll get a good day
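the split comparison above can be reproduced by counting good/bad results per feature value (rows copied from the table):

```python
from collections import defaultdict

# rows: (teacher, mood, hwork, result) from the table
rows = [
    ('a', 'g', 'd', 'g'), ('p', 'b', 'd', 'g'), ('a', 'g', 'nd', 'g'),
    ('a', 'b', 'd', 'b'), ('p', 'b', 'nd', 'b'), ('a', 'g', 'nd', 'g'),
    ('p', 'b', 'd', 'g'), ('a', 'g', 'nd', 'g'), ('a', 'g', 'd', 'g'),
]

def split_counts(feature_index):
    """count [good, bad] results for each value of the chosen feature."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        value, result = row[feature_index], row[3]
        counts[value][0 if result == 'g' else 1] += 1
    return dict(counts)

print(split_counts(1))  # mood: {'g': [5, 0], 'b': [2, 2]}
```

running it for feature index 0 (teacher) and 2 (hwork) reproduces the other two splits, and mood's [5, 0] bucket stands out as the purest.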

next:

enthropy and gain

random forest

support vector machines (SVM)

neural networks


The post Machine Learning part 4: Gradient Descent and cost function appeared first on Python Members Club.

Machine Learning

♡ supervised learning

♡ unsupervised learning

♡ reinforcement learning

cost function is also called mean squared error.

well, mean means sum of elements / number of elements. here we take the mean of all the squared errors

(error1 ^ 2 + error2 ^ 2 + error3 ^ 2)/3

/3 as there are 3 errors

we define error as the difference in y between your point and the y on the line you are trying to fit

-> error = y of point - y on line

-> error = y of point - (m*x + c)
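the mean squared error is easy to see with a tiny worked example (made-up points, checked against the line y = 2x + 3):

```python
# made-up points (x, y) and the line y = 2x + 3 we are testing against
points = [(1, 6), (2, 6), (3, 10)]
m, c = 2, 3

errors = [y - (m * x + c) for x, y in points]     # y of point minus y on line
cost = sum(e ** 2 for e in errors) / len(errors)  # mean squared error
print(errors, cost)  # [1, -1, 1] 1.0
```

squaring makes every error positive, so errors of +1 and -1 cannot cancel each other out.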

Wikipedia defines gradient descent as:

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.

well, first, that definition has nothing specific to machine learning; it is general maths

iterative means it repeats a process again and again

the minimum of a function is the lowest point of a u shape curve

in machine learning it means finding the line of best fit for a given training data

if you plot cost v/s m or cost v/s c you’ll get a u shape graph

when you plot the above graph, the minimum is the point where the error is the lowest (that’s when you get the line of best fit). now, how exactly do you find it?

well, you plot either of the above graphs, i.e. cost versus m or cost versus c, and you find the minimum by checking the gradient at each point. at the minimum the gradient is 0 (a flat tangent line)

for the curve, you apply calculus to get the gradient function.

well, when you start, you check the gradient after each step. when we see the gradient rising again, we know we have passed our minimum.

too small a step (learning rate) and your program might run too long

too big a step and you miss your minimum

the code

```python
import numpy as np

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.08
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        md = -(2/n) * sum(x * (y - y_predicted))
        bd = -(2/n) * sum(y - y_predicted)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print("m {}, b {}, cost {} iteration {}".format(m_curr, b_curr, cost, i))

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)
```

notes

md means the derivative (gradient) of the cost with respect to m, and bd with respect to b

the above calculates the m and b needed to fit the relationship between 1 and 5, 2 and 7, etc. (see the arrays in the code)

exercise:

1. google up stochastic gradient descent

code credits: code basics


The post Machine Learning part 2: supervised learning appeared first on Python Members Club.

**♡ supervised learning**

♡ unsupervised learning

♡ reinforcement learning

**#2 supervised learning**

in supervised learning we have labelled data available. the machine just sees and does as we do.

types of supervised learning

classification :

– logistic regression

– supervised clustering

regression

– linear regression (single value)

– multivariate linear regression

mixed

– tree based

– random forest

– neural networks

– support vector machine

**classification** :

– **logistic regression**

classifying data into categories. instead of predicting continuous values, we predict discrete values.

*continuous and discrete values*

continuous values means with y = mx + c you can have y for x: 34, x: 34.1, x: 34.12 etc., as many as you want. discrete values means the next value after a is b, then c, etc.

to sum it, there are two ways to see counting. one is you say: 1 2 3 4 (discrete), and one you see infinity between 1 and 2 like 1.1, 1.11, 1.111111 etc (continuous)

logistic here refers to the logistic (sigmoid) function, which squashes values between 0 and 1 (not to logistics, the careful organisation of a complicated activity)

regression means prediction

let us say we have a table of watermelon and apple weights. let us put a 0 for watermelon and a 1 for apple

our data in kg/(0,1) :

9.5 | 0

10.0 | 0

0.1 | 1

9.4 | 0

0.2 | 1

etc

we build a model according to our data. so, light weights (0.1 to 0.2) seem to be 1 and heavy ones (9.4 to 10) seem to be 0. that's our model.

now we throw in a weight, let us say 10.4; the model will classify it and say 0.
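under the hood, logistic regression squashes a score through the sigmoid function to get a probability between 0 and 1. here is a minimal sketch for the fruit data; the weights w and b are hand-picked for illustration, not learned from the data:

```python
from math import exp

def sigmoid(z):
    """squash any real number into (0, 1)."""
    return 1 / (1 + exp(-z))

# hand-picked (assumed) weights: light weights score high, i.e. apple = 1
w, b = -2.0, 10.0

def predict(weight_kg):
    p = sigmoid(w * weight_kg + b)  # probability of being an apple
    return 1 if p >= 0.5 else 0

print(predict(0.2))   # 1 (apple)
print(predict(10.4))  # 0 (watermelon)
```

real logistic regression would learn w and b from the labelled data (for instance with gradient descent, as in part 4) instead of fixing them by hand.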

– **supervised clustering methods (kNN)**

well, that is a k-nearest neighbours method. let us say we have some data


petal – plant type

length | width | type
-------|-------|-----
2      | 0.5   | A
4      | 1     | B
1.5    | 0.3   | A
6      | 1.7   | B

let us have a sample of petal length 5 and petal width 1.5 and we want to say if it’s of type A or B. if we plot the graph, we can see two points (a cluster) at the bottom left corner and two points (another cluster) at the top right corner. 5,1.5 is nearer to the top right corner, hence type B

but how do we define near? nearness is measured by distance. we calculate the distance from the sample to each point, and if it is nearer to more type-B points, we say it is of type B. now we can add more data and it will be classified the same way
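the distance-and-majority idea, sketched for the petal table (k = 3 is an illustrative choice):

```python
from math import sqrt

# (length, width, type) rows from the table above
data = [(2, 0.5, 'A'), (4, 1, 'B'), (1.5, 0.3, 'A'), (6, 1.7, 'B')]

def knn(sample, k=3):
    """classify by majority type among the k nearest points."""
    by_dist = sorted(
        data,
        key=lambda p: sqrt((p[0] - sample[0])**2 + (p[1] - sample[1])**2),
    )
    nearest_types = [t for _, _, t in by_dist[:k]]
    return max(set(nearest_types), key=nearest_types.count)

print(knn((5, 1.5)))  # B
```

for the sample (5, 1.5) the two nearest points are both type B, so the majority answer is B, matching the eyeballed clusters above.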

next:

**regression**


The post Machine Learning part 1: introduction appeared first on Python Members Club.

Wikipedia defines machine learning as:

_the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task_

data -> train -> do tasks

since data is used, it must be cleaned. knowing what data to collect, how to collect it, and which formulas are needed is part of data science

machine learning is 100% applied maths; more specifically, statistics plays a very important part. a computer is just there to calculate and repeat faster

applications include image recognition (recognising digits and letters from pictures), book recommendations on some sites, criminal identification on anonymous networks, speech recognition etc.

♡ supervised learning

♡ unsupervised learning

♡ reinforcement learning


The post calculating distance for four features appeared first on Python Members Club.

A (1,2,5,1)

B (7, 1, 3, 0)

```python
import math

dist = math.sqrt((7-1)**2 + (1-2)**2 + (3-5)**2 + (0-1)**2)
print(dist)
# >>> 6.4807...
```


The post home machine learning project : identifying cutlery items appeared first on Python Members Club.

from start to finish, machine learning is just statistics, equations, calculations and repetitions. you just code an algorithm. tensors are just a big word for (multi-dimensional) matrices.

we are identifying items from my kitchen so that when presented with an unidentified cutlery, we can attempt to classify it

we will be identifying plates, mugs and bowls. here is an overview of their common characteristics

here is our measurement sheet; it could have been a spreadsheet. measurements in cm

we reformat it into cutlery.csv as follows :

```
D,h,type
17,5,bowl
15.5,5,bowl
22.5,1,plate
8,7.5,mug
8,9,mug
6,11,mug
6,10,mug
24,2.5,plate
26,2,plate
11,8,mug
18,5.5,bowl
14,8,bowl
```

let us plot our data :

now to select the entire D column we do :

df['D']

outputs :

```
0     17.0
1     15.5
2     22.5
3      8.0
4      8.0
5      6.0
6      6.0
7     24.0
8     26.0
9     11.0
10    18.0
11    14.0
Name: D, dtype: float64
```

same for df['h']

please see this article on annotating data

code :

```python
import pandas as pd
import matplotlib.pyplot as plt

# df is short for dataframe
df = pd.read_csv("<path here>/Desktop/cutlery.csv")

D = df['D']
h = df['h']

for i, type_ in enumerate(df['type']):
    D_ = D[i]
    h_ = h[i]
    if type_ == 'bowl':
        plt.scatter(D_, h_, marker='o', color='red')
    elif type_ == 'mug':
        plt.scatter(D_, h_, marker='o', color='blue')
    elif type_ == 'plate':
        plt.scatter(D_, h_, marker='o', color='green')
    plt.text(D_+0.3, h_+0.3, type_, fontsize=9)

plt.show()
```

output :

now let us say that we have this :

D : 8 and h : 7.5

without seeing the image, having D:8 and h:7.5, how do we guess what type of cutlery it is?

since these are but points on a graph, we'll measure the distance from D:8 and h:7.5, i.e. (8, 7.5), to each point

we’ll use the simple distance formula :

distance = square_root( ( X2-X1)^2 + (Y2-Y1)^2 )

code :

```python
import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt

# df is short for dataframe
df = pd.read_csv("<path here>/cutlery.csv")

D = df['D']
h = df['h']

target = (8, 7.5)  # tuple
Dt = target[0]     # Dt for D-target
ht = target[1]     # ht for h-target

plt.figure(figsize=(14, 5))  # size of plot in inches

for i, type_ in enumerate(df['type']):  # i for index, type_ for list element
    D_ = D[i]
    h_ = h[i]
    dist = sqrt((Dt-D_)**2 + (ht-h_)**2)  # distance formula
    label = '{} \ndist:{}'.format(type_, round(dist, 2))
    if type_ == 'bowl':
        plt.scatter(D_, h_, marker='o', color='red')
    elif type_ == 'mug':
        plt.scatter(D_, h_, marker='o', color='blue')
    elif type_ == 'plate':
        plt.scatter(D_, h_, marker='o', color='green')
    plt.text(D_+0.3, h_, label, fontsize=9)

plt.scatter(Dt, ht, marker='x', color='green')  # target point
plt.annotate('target', ha='center', va='bottom',
             xytext=(Dt-2.5, ht-2), xy=(Dt, ht),
             arrowprops={'facecolor': 'green', 'shrink': 0.05})  # target arrow
plt.xlabel('diameter')
plt.ylabel('height')
plt.show()
```

output :

we'll see that the target's distance to all the mug samples (from 1.5 to 4) is smaller than to the nearest bowl (dist:6.02) or plate (dist:15.89)

so we can say that it is a mug / cup

of course, since we are only calculating dist, **we can tweak our code to do everything without graphs**

consider the following cup with D:9 and h:14

we set our target to

```
target = (9, 14) # tuple
```

and our output is :

this is a k-nearest neighbour application, which falls under classification, the basics of machine learning.

learning type: supervised learning

this was a demo project; normally much more data has to be collected!

i wanted to put up something that would not scare beginners off. my first title was :

**identifying home cutlery items with a k-nearest neighbours inspired method**

but probably no uninitiated reader would want to click on the link!

