# Assignment

Use the following table to answer question #1.

1a. Using an Excel spreadsheet, create a binarized version of the data set with the following categories:

(Note: the following are also the itemset names in the spreadsheet.)

Sky Fair, Sky Stormy, Status Impaired, Status Sober, Violation None, Violation Speeding, Violation Stop, Violation Signal, Restraint = No, Restraint = Yes, Crash Major, Crash Minor
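As a cross-check on the spreadsheet, the binarization step can be sketched in Python. This is only an illustration: the two records are hypothetical placeholders, not rows from the assignment table, and the attribute keys (`Sky`, `Status`, `Violation`, `Restraint`, `Crash`) and the `Attribute Value` item naming are assumptions derived from the itemset names listed above.

```python
# Itemset names from the assignment, grouped by source attribute (assumed grouping).
CATEGORIES = {
    "Sky": ["Fair", "Stormy"],
    "Status": ["Impaired", "Sober"],
    "Violation": ["None", "Speeding", "Stop", "Signal"],
    "Restraint": ["No", "Yes"],
    "Crash": ["Major", "Minor"],
}


def binarize(record):
    """Return a dict mapping each 'Attribute Value' item to 0 or 1."""
    row = {}
    for attribute, values in CATEGORIES.items():
        for value in values:
            row[f"{attribute} {value}"] = int(record[attribute] == value)
    return row


# Hypothetical records, for illustration only.
records = [
    {"Sky": "Fair", "Status": "Sober", "Violation": "None",
     "Restraint": "Yes", "Crash": "Minor"},
    {"Sky": "Stormy", "Status": "Impaired", "Violation": "Speeding",
     "Restraint": "No", "Crash": "Major"},
]
binary_rows = [binarize(r) for r in records]
```

Each entry of `binary_rows` corresponds to one spreadsheet row with twelve 0/1 columns.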

Paste the Excel spreadsheet into this document here.

1b. What is the maximum width of each transaction in the binarized data?

1c. How did you determine the answer for item 1b?

1d. Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?
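For reference, the width of a transaction is the number of items present in it, which in a binarized row is the count of cells equal to 1. A minimal sketch, using hypothetical rows rather than the assignment data:

```python
def transaction_width(row):
    """Width = number of items present (cells equal to 1) in a binarized row."""
    return sum(1 for value in row if value == 1)


# Hypothetical binarized rows (not the assignment data).
rows = [
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
]
max_width = max(transaction_width(r) for r in rows)
```

In Excel the same quantity is a row-wise `SUM` over the 0/1 columns, followed by `MAX` over the resulting column.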
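One way to check a hand count is a small level-wise (Apriori-style) counter. The sketch below makes several assumptions: every single item counts as a candidate 1-itemset, larger candidates are formed by joining frequent (k-1)-itemsets and pruning any candidate with an infrequent (k-1)-subset, and the transactions are hypothetical. A different candidate-generation convention would give a different candidate count.

```python
from itertools import combinations


def apriori_counts(transactions, min_support):
    """Return (number of candidate itemsets, number of frequent itemsets)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    n_candidates = n_frequent = 0
    k = 1
    while candidates:
        n_candidates += len(candidates)
        # Keep candidates whose support (fraction of transactions) meets the threshold.
        frequent = [
            c for c in candidates
            if sum(1 for t in transactions if c <= t) / n >= min_support
        ]
        n_frequent += len(frequent)
        frequent_set = set(frequent)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent_set for s in combinations(c, k - 1))
        ]
    return n_candidates, n_frequent


# Hypothetical transactions (not the assignment data), 30% support threshold.
transactions = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B"},
    {"A", "B"},
]
n_cand, n_freq = apriori_counts(transactions, min_support=0.3)
```

To use it on the spreadsheet data, each row would be converted to the set of column names whose cell is 1.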

1e. Again using Excel, create a data set that contains only the following asymmetric binary attributes:

(Weather = Bad, Impaired, Traffic violation = Yes, Restraint = No, Crash Severity = Major).

The itemset headings are: Bad, Impaired, Violation, NoRestraint, and Major

For Traffic violation, only None has a value of 0. All other attribute values are assigned 1.
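The mapping to asymmetric binary attributes can be sketched as follows. The record keys and the assumption that `Weather = Bad` corresponds to a `Stormy` sky are illustrative guesses, since the source table is not reproduced here; the records themselves are hypothetical.

```python
def to_asymmetric(record):
    """Map one record to the five asymmetric binary items (1 = condition present)."""
    return {
        "Bad": int(record["Sky"] == "Stormy"),            # assumed: Stormy means bad weather
        "Impaired": int(record["Status"] == "Impaired"),
        "Violation": int(record["Violation"] != "None"),  # any violation other than None -> 1
        "NoRestraint": int(record["Restraint"] == "No"),
        "Major": int(record["Crash"] == "Major"),
    }


# Hypothetical records, for illustration only.
clear_day = {"Sky": "Fair", "Status": "Sober", "Violation": "None",
             "Restraint": "Yes", "Crash": "Minor"}
bad_day = {"Sky": "Stormy", "Status": "Impaired", "Violation": "Stop",
           "Restraint": "No", "Crash": "Major"}
rows = [to_asymmetric(clear_day), to_asymmetric(bad_day)]
```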

Copy and paste the Excel spreadsheet here:

Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?

1f. Compare the number of candidate and frequent itemsets generated in 1(d) and 1(e). What is your analysis?

2. Find all the frequent subsequences with support >= 50% given the sequence shown below. Assume there are no timing constraints imposed on the sequence.
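With no timing constraints, a pattern is contained in a data sequence if its elements can be matched, in order, to distinct elements of the data sequence that are supersets of them. A minimal sketch of that containment test and the resulting support, assuming support is the fraction of data sequences that contain the pattern and using a hypothetical sequence database:

```python
def is_subsequence(pattern, sequence):
    """True if each pattern element is a subset of a distinct sequence element, in order."""
    position = 0
    for element in pattern:
        # Greedily take the earliest remaining element that contains this pattern element.
        while position < len(sequence) and not element <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True


def support(pattern, database):
    """Fraction of data sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


# Hypothetical sequence database (not the assignment data).
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}, {"c"}],
    [{"a"}, {"d"}],
]
s_ac = support([{"a"}, {"c"}], database)
s_ba = support([{"b"}, {"a"}], database)
```

A pattern would be reported as frequent when its support is at least 0.5.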

**Answer:**

**3.** For each of the sequences $w = \langle e_1 e_2 \ldots e_i \ldots e_{i+1} \ldots e_{\text{last}} \rangle$ given below, determine whether they are subsequences of the sequence

$\langle \{1, 2, 3\}\ \{2, 4\}\ \{2, 4, 5\}\ \{3, 5\}\ \{6\} \rangle$

subjected to the following timing constraints:

mingap = 0 (interval between last event in $e_i$ and first event in $e_{i+1}$ is $> 0$)

maxgap = 3 (interval between first event in $e_i$ and last event in $e_{i+1}$ is $\le 3$)

maxspan = 5 (interval between first event in $e_1$ and last event in $e_{\text{last}}$ is $\le 5$)

$ws$ = 1 (time between first and last events in $e_i$ is $\le 1$)
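The four constraints above can be checked mechanically with a brute-force search. The sketch below is one reading of them, not a definitive implementation. It assumes that each data element carries an integer timestamp (the source does not give timestamps), and that a pattern element may be matched to the union of data elements inside a time window $[l, u]$ with $u - l \le ws$. The function names and the timed sequence are hypothetical.

```python
def union_window(data, lo, hi):
    """Union of all events whose timestamp lies in [lo, hi]."""
    events = set()
    for t, element in data:
        if lo <= t <= hi:
            events |= element
    return events


def satisfies(data, pattern, mingap, maxgap, maxspan, ws):
    """Search for windows, one per pattern element, that meet all four constraints."""
    times = [t for t, _ in data]
    # Every window (l, u) whose width does not exceed the window size ws.
    windows = [(l, u) for l in times for u in times if 0 <= u - l <= ws]

    def search(i, first_l, prev_l, prev_u):
        if i == len(pattern):
            return True
        for l, u in windows:
            if not pattern[i] <= union_window(data, l, u):
                continue
            if i > 0 and not l - prev_u > mingap:    # mingap: gap must exceed mingap
                continue
            if i > 0 and not u - prev_l <= maxgap:   # maxgap
                continue
            start = l if i == 0 else first_l
            if u - start > maxspan:                  # maxspan
                continue
            if search(i + 1, start, l, u):
                return True
        return False

    return search(0, None, None, None)


# Hypothetical timed sequence: (timestamp, events).
data = [(1, {"a"}), (2, {"b"}), (3, {"c"}), (6, {"d"})]
limits = dict(mingap=0, maxgap=3, maxspan=5, ws=1)
ok_ac = satisfies(data, [{"a"}, {"c"}], **limits)
ok_ad = satisfies(data, [{"a"}, {"d"}], **limits)
```

Here `{"a"} {"d"}` fails only because of maxgap: the interval from timestamp 1 to timestamp 6 is 5, which exceeds 3.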

• $w = \langle \{1\}\ \{2\}\ \{3\} \rangle$

**Answer:**

• $w = \langle \{1, 2, 3, 4\}\ \{5, 6\} \rangle$

**Answer:**

**Answer:**

• $w = \langle \{1\}\ \{2, 4\}\ \{6\} \rangle$

**Answer:**

• $w = \langle \{1, 2\}\ \{3, 4\}\ \{5, 6\} \rangle$

**Answer:**