The post Machine Learning part 7: random forests appeared first on Python Members Club.

Machine Learning

➡ supervised learning

➡ unsupervised learning

➡ reinforcement learning

recap:

types of supervised learning

classification

regression

mixed

– tree based

– random forest

– neural networks

– support vector machines

overfitting and the problem with trees

trees classify by drawing rectangular boxes around the data, which does not work well where many separations are needed: the tree overfits the data

pruning

pruning means to trim (a tree, shrub, or bush) by cutting away dead or overgrown branches or stems (or unwanted parts), especially to encourage growth.

for example, if a single data point (a leaf) is forcing one of your boxes to be a large one, you can just remove it so that the tree stays relevant. this improves the accuracy of the tree.

random forests

just like a forest is made up of trees, a random forest is a machine learning method made up of tree methods. it is random because it randomises the input given to each tree.

the basic form of it acts on a tally. each tree gives an output, and the forest collects the outputs of all the trees and combines them into an answer. if 10% of the trees said A and 90% said B, it will say B (majority vote). it is much like asking 10 people which book they think will be most popular this year, or which team will win. since the inputs are randomised, no two trees are the same. the method can be further tweaked to give more or less weight to individual trees' answers.
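to sketch the tally step in python (the votes below are made-up stand-ins for the outputs of real trees):

```python
from collections import Counter

def forest_predict(tree_outputs):
    """majority vote over the outputs of the individual trees"""
    tally = Counter(tree_outputs)
    return tally.most_common(1)[0][0]

# 9 trees say "B", 1 tree says "A" -> the forest answers "B"
votes = ["B"] * 9 + ["A"]
print(forest_predict(votes))  # B
```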

use of random forests

random forests are used to identify customer types, to predict whether people will like a product, or to predict behaviour.

exercise

1) dig more into the uses of random forests

2) compare its implementation across languages (with libraries included). how elegant do you think python is?

next:

#8 support vector machines


The post Machine Learning part 6: entropy and gain appeared first on Python Members Club.

➡ supervised learning

➡ unsupervised learning

➡ reinforcement learning

recap:

types of supervised learning

classification

regression

mixed

- tree based 🎈
- random forest
- neural networks
- support vector machines

entropy

entropy is, roughly, the expected amount of information (surprise) in an outcome

in the past post, we decided what to split on based on a purity index. we can do the same thing mathematically (more easily) as follows

P+ means probability of target (good day in our case)

P- means probability not target (bad day)

log2(P-) means log(P-) base 2

H = -(P+)log2(P+) - (P-)log2(P-)

the above is the formula for the purity of a subset. it measures how likely you are to get a positive element if you select randomly from a particular subset

let us take teacher’s presence. we had

absent

good 5 bad 1

present

good 2 bad 1

H(absent) = -(5/6)log2(5/6) - (1/6)log2(1/6)

= 0.65

H(present) = -(2/3)log2(2/3) - (1/3)log2(1/3)

= 0.92

0 is extremely pure and 1 is extremely impure

so absent is purer than present
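the two numbers can be checked with a small python helper (a sketch using the formula as written above):

```python
from math import log2

def entropy(p_pos, p_neg):
    """H = -(P+)log2(P+) - (P-)log2(P-); 0 is pure, 1 is impure"""
    h = 0.0
    for p in (p_pos, p_neg):
        if p > 0:  # log2(0) is undefined; a zero-probability term contributes nothing
            h -= p * log2(p)
    return h

print(round(entropy(5/6, 1/6), 2))  # 0.65 -> teacher absent
print(round(entropy(2/3, 1/3), 2))  # 0.92 -> teacher present
```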

gain

gain is used to determine what to split on first by finding the feature with the highest gain. 0 means irrelevant, 1 means very relevant

gain is defined by

gain = H(S) - Σ (SV/S) H(SV)

S -> total number of points in the leaf

v ranges over the possible values of the feature, and SV -> the subset of points having value v

H is our entropy as above

taking teacher's presence, our gain is

our S is 9, that is (6+3)

gain = H(presence) - sum( SV/9 * H(SV) )

gain = H(presence) - (6/9 * 0.65) - (3/9 * 0.92)

our H(presence) is -(7/9)log2(7/9) - (2/9)log2(2/9)

= 0.76

gain = 0.76 - (6/9 * 0.65) - (3/9 * 0.92)

gain = 0.02
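the whole computation can be checked in a few lines (a sketch; the counts are the ones from the table above):

```python
from math import log2

def entropy(counts):
    """entropy of a subset given its class counts, e.g. (good, bad)"""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

# teacher's presence: absent -> 5 good / 1 bad, present -> 2 good / 1 bad
subsets = [(5, 1), (2, 1)]
total = sum(sum(s) for s in subsets)      # 9 days in all
h_s = entropy((5 + 2, 1 + 1))             # entropy of the whole set: 7 good, 2 bad
gain = h_s - sum(sum(s) / total * entropy(s) for s in subsets)
print(round(gain, 2))  # 0.02
```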

exercise:

google up information gain related to decision trees as well as associated concepts

next:

random forest


The post Machine Learning part 5: mixed methods appeared first on Python Members Club.

Machine Learning

➡ supervised learning

➡ unsupervised learning

➡ reinforcement learning

types of supervised learning

classification

regression

mixed

– tree based

– random forest

– neural networks

– support vector machines

mixed methods are used for classification and regression.

tree based method

trees used for both classification and regression are called Classification And Regression Tree (CART) models

let us say that we want to predict whether an event will be good or bad, the event being having a good day at school

our data looks as follows

t. represents teacher

a means absent

p means present

mood stands for parents’ mood

g good. b bad

hwork means homework

d done

nd not done

t. | mood | hwork | result
---|------|-------|-------
a | g | d | g
p | b | d | g
a | g | nd | g
a | b | d | b
p | b | nd | b
a | g | nd | g
p | b | d | g
a | g | nd | g
a | g | d | g

let us say the student enters school today and wants to know how his day will go. today he has

t. p

mood g

hwork nd

the first step is to split the tree to get a high purity index

if we split by teacher’s presence first, we get

a

good 5 bad 1

p

good 2 bad 1

if we split by parents’ mood we get

g

good 5 bad 0

b

good 2 bad 2

if we split by homework done we get

d

good 4 bad 1

nd

good 3 bad 1

the highest index of purity was with parents' mood: 5 good and 0 bad days

we start with it

mood

— g

a | g | d | g

a | g | nd | g

a | g | nd | g

a | g | nd | g

a | g | d | g

good 5 bad 0

— b

p | b | d | g

a | b | d | b

p | b | nd | b

p | b | d | g

good 2 bad 2

so bad mood must be split further, as good mood had 100% purity with 5 good results

now our condition is

t. p

mood g

hwork nd

if we go for mood g, we can stop splitting as our purity is 100%. we'll get a good day
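the purity tallies above can be reproduced with a short sketch (the rows are typed in from the table):

```python
from collections import Counter

# rows of the table above: (teacher, mood, homework, result)
data = [
    ("a", "g", "d", "g"), ("p", "b", "d", "g"), ("a", "g", "nd", "g"),
    ("a", "b", "d", "b"), ("p", "b", "nd", "b"), ("a", "g", "nd", "g"),
    ("p", "b", "d", "g"), ("a", "g", "nd", "g"), ("a", "g", "d", "g"),
]

def split_counts(feature_index):
    """tally good/bad results for each value of the chosen feature"""
    counts = {}
    for row in data:
        counts.setdefault(row[feature_index], Counter())[row[3]] += 1
    return counts

# splitting by parents' mood: 'g' -> 5 good 0 bad, 'b' -> 2 good 2 bad
print(split_counts(1))
```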

next:

enthropy and gain

random forest

support vector machines (SVM)

neural networks


The post Machine Learning part 4: Gradient Descent and cost function appeared first on Python Members Club.

Machine Learning

➡ supervised learning

➡ unsupervised learning

➡ reinforcement learning

the cost function we use here is the mean squared error (MSE).

well, a mean is the sum of elements divided by the number of elements. here we take the mean of all the squared errors

(error1 ^ 2 + error2 ^ 2 + error3 ^ 2)/3

/3 as there are 3 errors

we define error as the difference in y between your point and the y on the line you are trying to fit

-> error = y of the point - y on the line

-> error = y - (m*x + c)
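for a quick hand-check (the points and the line y = 2x + 3 are made up for illustration):

```python
# a tiny mean-squared-error check against the line y = 2x + 3
points = [(1, 5.5), (2, 6.5), (3, 9.5)]   # hypothetical (x, y) data
m, c = 2, 3

errors = [y - (m * x + c) for x, y in points]      # difference from the line
mse = sum(e ** 2 for e in errors) / len(errors)    # mean of the squared errors
print(mse)  # 0.25
```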

Wikipedia defines gradient descent as:

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.

note first that this definition has nothing specific to machine learning; it is a general mathematical idea

iterative means it repeats a process again and again

the minimum of a function is the lowest point of a u-shaped curve

in machine learning it means finding the line of best fit for a given training data

if you plot cost v/s m or cost v/s c you’ll get a u shape graph

when you plot the above graph, the minimum is the point where the error is the lowest (that’s when you get the line of best fit). now, how exactly do you find it?

well, you plot either of the above graphs, i.e. cost versus m or cost versus c, and you find the minimum by checking the gradient at each point. at the minimum the gradient is 0 (the tangent is a horizontal line)

for the curve, you apply calculus to get the gradient function.

well, when you start, you check the gradient after each step. when we see the gradient rising again, we know we've found our minimum.

too small a step and your program might run too long

too big a step and you miss your minimum

the code

```python
import numpy as np

def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.08
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y - y_predicted)])
        md = -(2/n) * sum(x * (y - y_predicted))   # derivative of the cost w.r.t. m
        bd = -(2/n) * sum(y - y_predicted)         # derivative of the cost w.r.t. b
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print("m {}, b {}, cost {} iteration {}".format(m_curr, b_curr, cost, i))

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)
```

notes

md means the derivative (gradient) of the cost with respect to m; bd likewise for b

the above calculates the m and b needed to fit the relationship between 1 and 5, 2 and 7, etc. (see the arrays in the code)

exercise:

1. google up stochastic gradient descent

code credits: code basics


The post Plotting Hotspots in Mauritius with Python and Folium appeared first on Python Members Club.

download the data file here

```python
# Import libraries
import folium
import pandas as pd

# Load data
data = pd.read_csv("hotspot.csv")
lats = data['Latitude']
lons = data['Longitude']
locations = data['Location']

# Create base map
hotspot_map = folium.Map(location=[37.296933, -121.9574983],
                         zoom_start=5, tiles="Mapbox bright")

# Plot markers
for lat, lon, location in zip(lats, lons, locations):
    folium.Marker(location=[lat, lon], popup=str(location),
                  icon=folium.Icon(color='gray')).add_to(hotspot_map)

# Save the map
hotspot_map.save("hotspots.html")
```


The post Machine Learning part 3: regression appeared first on Python Members Club.

**➡ supervised learning**

➡ unsupervised learning

➡ reinforcement learning

**#3 supervised learning: regression**

note: **independent variables** are also called *features*

regression simply means prediction. there are many types of regression methods:

– simple linear regression

– multivariate linear regression

– polynomial regression

– ridge regression

– lasso regression

**simple linear regression** means predicting for only two variables, a dependent and an independent one

let us say that we have many points on a graph of x,y and want to draw a line of best fit. we draw the line of best fit then we predict for further values of x and y. i.e. it calculates the values of m and c in **y = m * x + c**
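as a sketch with numpy (the points here are made up and lie exactly on y = 2x + 3, so the fit is exact):

```python
import numpy as np

# hypothetical points on a line; polyfit with degree 1 gives m and c
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

m, c = np.polyfit(x, y, 1)   # least-squares line of best fit y = m*x + c
print(m, c)                  # close to 2 and 3
```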

**multivariate linear regression** means predicting for one dependent variable and two or more independent variables

the formula is in the format

ind. var. means independent variable

stuff = m1 * ind. var. 1 + m2 * ind. var. 2 + … + b

or

stuff = m1*feature1 + m2*feature2 + … + b

b is the intercept

many real-world problems fit better with multivariate regression. for example, you might want to calculate the fuel cost of a trip based on many factors such as the state of the engine, its age, and so on

we train our data to get the m1 m2 m3 etc then we can predict our target value for such and such situations
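a sketch of the same idea with two features, using numpy's least-squares solver (the data is made up so that stuff = 2*feature1 + 3*feature2 + 1):

```python
import numpy as np

# hypothetical data: target = 2*feature1 + 3*feature2 + 1
X = np.array([[1, 1], [2, 1], [1, 2], [3, 2], [2, 3]], dtype=float)
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

A = np.column_stack([X, np.ones(len(X))])     # extra column of ones for the intercept b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, b = coef
print(m1, m2, b)   # close to 2, 3, 1
```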

**ridge regression**

definition of ridge:

A ridge or a mountain ridge is a geological feature consisting of a chain of mountains or hills that form a continuous elevated crest for some distance.

just like the tops of mountains go up and down, in cases where a fitted curve wiggles up and down, ridge regression is used as a *regularisation* method: it penalises large coefficients to get a simpler curve and better predictions

another name: tikhonov regularization
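a sketch of ridge regression in closed form with numpy; the data here is made up (two nearly collinear feature columns) and alpha is the regularisation strength:

```python
import numpy as np

# ridge (Tikhonov) regression in closed form: w = (X.T X + alpha*I)^-1 X.T y
# alpha = 0 reduces to ordinary least squares
def ridge_fit(X, y, alpha):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

X = np.array([[1.0, 1.0], [2.0, 1.9], [3.0, 3.1]])   # nearly collinear features
y = np.array([1.0, 2.0, 3.0])

print(ridge_fit(X, y, alpha=0.1))   # coefficients shrunk toward zero
```

the larger alpha is, the more the coefficients are shrunk; that is the "ignore large coefficients" effect described above.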

exercise:

1) dig into the maths of how

i. linear regression works (how the line of best fit is drawn)

ii. multivariate regression works

iii. polynomial regression works

next:

cost function and gradient descent


The post Machine Learning part 2: supervised learning appeared first on Python Members Club.

**➡ supervised learning**

➡ unsupervised learning

➡ reinforcement learning

**#2 supervised learning**

in supervised learning we have labelled data available. the machine just sees what we do and does the same.

types of supervised learning

classification :

– logistic regression

– supervised clustering

regression

– linear regression (single value)

– multivariate linear regression

mixed

– tree based

– random forest

– neural networks

– support vector machine

**classification** :

– **logistic regression**

classifying data into categories. instead of predicting continuous values, we predict discrete values.

*continuous and discrete values*

continuous values means if you have y = mx + c, you can have y for x: 34, x:34.1, x:34.12 etc, as much as you want. discrete values means the next value after a is b then c etc

to sum up, there are two ways to see counting: one where you say 1, 2, 3, 4 (discrete), and one where you see infinity between 1 and 2, like 1.1, 1.11, 1.111111, etc. (continuous)

logistics: the careful organization of a complicated activity so that it happens in a successful and effective way (though the "logistic" in logistic regression actually comes from the logistic function)

regression means prediction

let us say we have a table of watermelon and apple weights. let us put a 0 for watermelon and a 1 for apple

our data in kg/(0,1) :

9.5 | 0

10.0 | 0

0.1 | 1

9.4 | 0

0.2 | 1

etc

we build a model according to our data. so, light weights (0.1 to 0.2) seem to be 1 and heavy ones (9.4 to 10) seem to be 0. that's our model.

now we throw in a new weight, say 10.4; the model will output 0.
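a minimal logistic regression can be trained by gradient descent on the weights above; this is a sketch, and the learning rate and iteration count are illustrative choices:

```python
from math import exp

# the weight data from above: (weight in kg, label) with 0 = watermelon, 1 = apple
data = [(9.5, 0), (10.0, 0), (0.1, 1), (9.4, 0), (0.2, 1)]

def sigmoid(z):
    return 1 / (1 + exp(-z))

w = b = 0.0
lr = 0.1
for _ in range(2000):
    for x, label in data:
        p = sigmoid(w * x + b)
        w -= lr * (p - label) * x   # gradient of the log loss w.r.t. w
        b -= lr * (p - label)       # gradient of the log loss w.r.t. b

predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
print(predict(10.4))  # 0 -> watermelon
print(predict(0.15))  # 1 -> apple
```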

– **supervised clustering methods (kNN)**

well, that is a nearest-neighbour method. let us say we have some data

🌸

petal – plant type

length | width | type
-------|-------|-----
2 | 0.5 | A
4 | 1 | B
1.5 | 0.3 | A
6 | 1.7 | B

let us have a sample of petal length 5 and petal width 1.5 and we want to say if it’s of type A or B. if we plot the graph, we can see two points (a cluster) at the bottom left corner and two points (another cluster) at the top right corner. 5,1.5 is nearer to the top right corner, hence type B

but how do we define near? near is by distance. we calculate the distance to each point, and if the nearest points are mostly of type B, we say it is of type B. now we can add more data and it will be classified the same way
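a minimal sketch of the distance-and-vote step, using the table above:

```python
from math import dist   # Euclidean distance (Python 3.8+)

# the petal data from the table above: ((length, width), type)
points = [((2, 0.5), "A"), ((4, 1), "B"), ((1.5, 0.3), "A"), ((6, 1.7), "B")]

def knn_predict(sample, k=3):
    """vote among the k nearest neighbours of the sample"""
    nearest = sorted(points, key=lambda p: dist(sample, p[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(knn_predict((5, 1.5)))  # B
```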

next:

**regression**


The post Machine Learning part 1: introduction appeared first on Python Members Club.

Wikipedia defines machine learning as:

_the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task_

data -> train -> do tasks

since data is used, it must be cleaned first. knowing what data to collect, how to collect it, and which formulas are needed is part of data science

machine learning is 100% applied maths; more specifically, statistics 📊 plays a very important part. a computer is there to calculate and repeat faster

applications include image recognition (like recognising digits and letters from pictures), book recommendations on some sites, criminal identification on anonymous networks, speech recognition, etc.

➡ supervised learning

➡ unsupervised learning

➡ reinforcement learning


The post Checking For Prime appeared first on Python Members Club.

```python
def is_prime(num):
    if num > 1:
        for i in range(2, num):
            if num % i == 0:
                return False
        return True
    return False  # numbers below 2 are not prime
```

we take all the numbers from 2 up to num and try dividing by 2, 3, 4, 5, …

but really we only need to go up to half the number, i.e. up to n//2 + 1

```python
def is_prime(num):
    if num > 1:
        for i in range(2, (num // 2) + 1):
            if num % i == 0:
                return False
        return True
    return False
```

then we can use it like that:

```python
for i in range(100):
    prime = is_prime(i)
    if prime:
        print(i)
```

for primes up to 100. pretty easy.
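a common further refinement (not in the original post) is to test divisors only up to the square root of num, since any factor above √num pairs with one below it:

```python
from math import isqrt

def is_prime_fast(num):
    """trial division up to the integer square root of num"""
    if num < 2:
        return False
    for i in range(2, isqrt(num) + 1):
        if num % i == 0:
            return False
    return True

print([n for n in range(20) if is_prime_fast(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```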

```python
primes = []
for i in range(10000):
    prime = is_prime(i)
    if prime:
        primes.append(i)

def first_100_diff(primes):
    for primeA in primes:
        for primeB in primes:
            if abs(primeA - primeB) == 100:
                print(primeA, primeB)
                return

first_100_diff(primes)
```

the above prints the first pair of primes with a difference of 100. modify the loop to check for the next 10 occurrences.


The post Reviving Bertrand Russell Through Python appeared first on Python Members Club.

markov chains are probabilistic pattern generating models. 'patterns' can be events, text or geometrical arrangements

let us take the example of a person going to a fast food outlet where there are only three types of foods, rounder, burger and panini.

the probability of him taking a rounder after already taking one is 0.2 or 2/10

after a rounder, the probability of taking a burger is 0.4 or 4/10

after a rounder, the probability of taking a panini is 0.4 or 4/10

notice that there are three arrows going out for three choices, which when added make up 1

let us make up the remaining parts for burger and panini in the same way (the completed transition diagram from the original post is not reproduced here)

following the arrows step by step generates a sequence of meals such as

B P P R P P …

the chain can also be written as a transition matrix, with one row of outgoing probabilities per food. but what does that mean by itself?

that’s why we have state space to specify from where we begin

**state space {1 = rounder, 2 = burger, 3 = panini}**

meaning state 1 is rounder, state 2 is burger and state 3 is panini

the idea is to:

- build a markov model
- then to generate text through the model

we’ll have a program that scans every word and records the next word

let us take the text:

*the meadow was green. the sun was shining. the boy went away. the glass was shining*

tabulating next words we get:

the -> meadow, sun, boy, glass

meadow -> was

was -> green, shining, shining

sun -> was

boy -> went

went -> away

glass -> was

from those we can generate the text

the sun was green

or, following another path,

the meadow was shining

well, see

was -> green, shining, shining

when we randomly choose from [green, shining, shining] we actually have a greater chance of getting shining than green, as it occurs twice; there is no need to write out the probabilities. the only drawback is programming efficiency, which we'll clean up in another post

```python
import random

def format_text(file_read):
    # getting the data ready for generation
    data = {}
    lines = file_read.replace('\n', ' ').split('.')
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if i + 1 < len(words):
                next_word = words[i + 1]
                if word not in data:
                    data[word] = [next_word]
                else:
                    data[word].append(next_word)
    return data

def rand(data):
    # chooses random element from dictionary
    return random.choice(list(data))

def generate(times, data):
    # specifies length of generated phrase
    current = rand(data)
    gens = [current]
    try:
        for i in range(times):
            current = random.choice(data[current])
            gens.append(current)
    except KeyError:
        pass  # the last word of the text may have no recorded successor
    return gens

with open('source.txt', 'r') as source:
    file = source.read()

data = format_text(file)
print(' '.join(generate(10, data)))
```

format text returns a dictionary. here is a snapshot:

{ 'is,': ['besides'], 'unsupported': ['by'], 'one': ['which', 'of', 'at', 'reason', 'proposition', 'form', 'of', 'which', 'thing', 'thing', 'magnitude', 'kind', 'of', 'relation', 'fact,', 'fact,', 'involving', 'case', 'of', 'kind', 'atomic', 'of', 'happens,', 'something', 'is', 'would'], ...

i ran the program over some paragraphs of

*Our Knowledge of the External World as a Field for Scientific Method in Philosophy*

from

Mathematical logic, even in its most modern form, is not directly of philosophical importance except in its beginnings. After ...

by Bertrand Russell

to

… Charles I. and death and his bed, are objective, but they are not, except in my thought, put together as

my false belief supposes. It is therefore necessary, in analysing a

belief, to look for some other logical form than a two-term relation.

Failure to realise this necessity has, in my opinion, vitiated almost

everything that has hitherto been written on the theory of knowledge,

making the problem of error insoluble and the difference between belief

and perception inexplicable.

by Bertrand Russell

the core is here

```python
current = rand(data)
gens = [current]
try:
    for i in range(times):
        current = random.choice(data[current])
        gens.append(current)
except KeyError:
    pass
return gens
```

we start by adding a random word (a random element of the dictionary) to the list

```python
current = rand(data)
gens = [current]
```

then we choose a random word from the words coming next

here are some samples the code produced (the original post also showed screenshots of shorter runs, of lengths 10 and 20). the phrases are not always strictly correct, but they do imitate the style

infinite number of which we have the one thing having some other terms or opinion about Socrates--that he feels his insight, we can discover, for example, two classifications we mean that two terms, being equally true when one of the most

generated text - length 40

inductive principle, which need not known form made space and so that there were none except prejudice, so long as follows that, if they may sometimes know that we saw that I shall bring my umbrella if you could hardly be outlined in the whole theory of general truths by no

generated text - length 50

return to asymmetrical relations, such as follows that, if this way to be in the existent world, and red, and pure logic, no particular subject-matter otherwise than the case of the supposed common constituent, but are true or "one inch taller" or deny this is true, we are more than a "fact," I have that membership of have knowledge is any

generated text - length 60

Thus "father" is the difference between B is something altogether more difficult, and common world would be inferred from sense, and mortality that certain relation is apt to be asserted or deny this hypothetical form, but the second being equally true in order to have accepted and B, and philosophically it is said to properties becomes obviously impossible to infer all propositions are and B, also knew that all things have

generated text - length 70

From poverty in any such as regards the subject-predicate form--in other thing is in its constituents and were less anxious to deal throughout the other property, then B and most of unreality of the weather had been written on the marks of a certain known objects are indispensable in the everyday world with completely general propositions, and so that there is independent of three things, it should be explained would not transitive, but to give rise to exist

generated text - length 80

well, we can improve many things, such as: choosing a proper first word, storing probabilities as numbers rather than repeated list entries, being sensitive to punctuation, improving context, etc. rough code lands a pretty nice result

