### Machine Learning with Python

One of the fundamental data mining techniques is finding ‘association rules’, which lets retailers identify the purchase patterns of their customers. Here I am going to demonstrate a simple technique, using Python at ‘Vissicomp Technologies’, to uncover such a pattern.

Let us assume that a store sells only three items: ‘bread’, ‘butter’ and ‘jam’. A customer can purchase all three items, i.e. ['bread', 'butter', 'jam'] (a list in Python), or any combination of them. Therefore I need to find all subsets of this set. A program to find the subsets is given here.

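    # recursion on lists: build the power set of l
    def subs(l):
        # base case: the only subset of an empty list is the empty list
        if l == []:
            return [[]]
        # all subsets of the tail, i.e. those without l[0]
        x = subs(l[1:])
        # keep those subsets and add a copy of each with l[0] prepended
        return x + [[l[0]] + y for y in x]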

The function subs() defined above generates the power set of ['bread', 'butter', 'jam']. The screenshot below shows both the program and its output.
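As a quick check, here is a minimal sketch of calling subs() on the three items. The function is repeated so the snippet runs on its own, and the names `items` and `power_set` are my own choices for illustration.

```python
def subs(l):
    # base case: the empty list has exactly one subset, the empty list
    if l == []:
        return [[]]
    # subsets that do not contain the first element
    x = subs(l[1:])
    # plus the same subsets with the first element prepended
    return x + [[l[0]] + y for y in x]

items = ['bread', 'butter', 'jam']   # hypothetical variable name
power_set = subs(items)

# a set of 3 items has 2**3 = 8 subsets, including the empty one
for s in power_set:
    print(s)
print(len(power_set))
```

The subsets come out in the order in which the recursion builds them, starting with the empty list and ending with the full set of three items.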

The output shows all possible subsets of the universal set. For further analysis, and to find the “large itemsets”, I will label these sets l1, l2, l3, … and so on. Therefore, let

l1 = ['bread'], l2 = ['butter'], l3 = ['jam'], l4 = ['bread', 'butter'], l5 = ['bread', 'jam'], l6 = ['jam', 'butter'], l7 = ['bread', 'butter', 'jam']. We leave out the empty set. Let us say that on a particular day we have the following sales data:

daysale = ['l1', 'l2', 'l3', 'l4', 'l7', 'l4', 'l5', 'l5', 'l6', 'l4', 'l4', 'l5', 'l7', 'l4', 'l6']

The data shows that we had 15 sales on that particular day. To determine the large itemsets, we need the frequency of each of l1, l2, l3, … in the data. This is done by the following program.

The output shows that ‘l4’ occurred 5 times, ‘l5’ occurred 3 times, and so on. So if we set the threshold at 3, only ‘l4’ and ‘l5’ are candidates for large itemsets.
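The counting and thresholding described above can be sketched as follows. This is one possible way to do it, assuming the standard library's `collections.Counter`, and is not necessarily the program used for the output discussed above. The names `counts`, `threshold` and `large` are my own.

```python
from collections import Counter

# sales data for the day, one label per sale
daysale = ['l1', 'l2', 'l3', 'l4', 'l7', 'l4', 'l5', 'l5',
           'l6', 'l4', 'l4', 'l5', 'l7', 'l4', 'l6']

# frequency of each labelled itemset
counts = Counter(daysale)
for label, n in counts.most_common():
    print(label, n)

# keep only the labels that occur at least `threshold` times
threshold = 3   # assumed cut-off, as in the text
large = {label: n for label, n in counts.items() if n >= threshold}
print(large)
```

With a threshold of 3, only ‘l4’ (5 sales) and ‘l5’ (3 sales) remain. Raising or lowering `threshold` changes which itemsets count as large.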

I welcome suggestions from all of you.