DataMining

Data mining for TCGA data (with Rileen, postdoc from Chris Sander's lab). Estimated time of completion: Feb 14th for the INITIAL PROTOTYPE data set. Total time per week: 4 hours.
 * I have managed to acquire segmented data from Rileen for 200 samples from the TCGA data. (Time spent: 4 hours over lunch time in 4 weeks)
 * Next step is to run data mining algorithms on this data to come up with co-occurrence statistics. (Estimated time required: 24 hours)
 * Finally, after presenting this data to Scott and Chris and getting approval from both, we can go ahead and do similar runs for all data sets in TCGA. (Estimated time required: 80 hours)
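As a rough illustration of what "co-occurrence statistics" could mean here, the sketch below counts how often pairs of copy-number events appear in the same sample and derives support, confidence and lift from those counts. It is written in Python purely for illustration (the actual work in these notes was done in R), and the sample IDs and event labels such as 7p_gain are made-up placeholders, not results from the TCGA data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each sample maps to the set of CNV events called in it.
samples = {
    "S1": {"7p_gain", "9p_loss"},
    "S2": {"7p_gain", "9p_loss", "10q_loss"},
    "S3": {"10q_loss"},
    "S4": {"7p_gain", "9p_loss"},
}


def cooccurrence_stats(samples):
    """Return support, confidence and lift for every pair of co-occurring events."""
    n = len(samples)
    single = Counter()  # number of samples containing each event
    pair = Counter()    # number of samples containing both events of a pair
    for events in samples.values():
        single.update(events)
        # Sort so that each pair is always stored in the same order.
        pair.update(combinations(sorted(events), 2))

    stats = {}
    for (a, b), count in pair.items():
        support = count / n
        stats[(a, b)] = {
            "count": count,
            "support": support,
            "confidence_a_to_b": count / single[a],
            # lift > 1 means the pair co-occurs more often than expected by chance
            "lift": support / ((single[a] / n) * (single[b] / n)),
        }
    return stats


stats = cooccurrence_stats(samples)
for events, values in sorted(stats.items()):
    print(events, values)
```

Counting all pairs directly is only practical for a modest number of distinct events; for larger itemsets a pruning approach such as Apriori is the usual choice.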

Got the data from Rileen. Now in the process of writing code for Association Rule mining. CNV data acquired.

Started work. Did the initial coding. The program did not run due to lack of memory on the server. Need to parallelize this in R and on the cluster. 20 hours
 * Figure out with Joanne which server to use for parallelization

Step1: Got the CNV data in order from Rileen, postdoc in Chris Sander's lab. DONE

Step2: Format the data to be used with Apriori algorithm DONE
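A minimal sketch of what the formatting step might look like: Apriori expects one "transaction" (a set of items) per sample, so each segment is turned into a discrete gain/loss item. The record layout, the region labels and the +/-0.3 log2-ratio cutoffs are all assumptions for illustration; the real segmented files and thresholds used for this project are not described in these notes.

```python
from collections import defaultdict

# Hypothetical segment records: (sample ID, region label, mean log2 copy ratio).
segments = [
    ("S1", "7p", 0.8),
    ("S1", "9p", -1.1),
    ("S1", "1q", 0.05),
    ("S2", "7p", 0.6),
    ("S2", "10q", -0.7),
]

# Assumed cutoffs for calling a gain or a loss; tune these for real data.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def segments_to_transactions(segments, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Group segments by sample and keep only those called as gain or loss."""
    transactions = defaultdict(set)
    for sample, region, seg_mean in segments:
        if seg_mean >= gain:
            transactions[sample].add(f"{region}_gain")
        elif seg_mean <= loss:
            transactions[sample].add(f"{region}_loss")
        # Segments between the two cutoffs are treated as neutral and skipped.
    return dict(transactions)


transactions = segments_to_transactions(segments)
print(transactions)
```

Note that a sample with no gain or loss calls does not appear in the result at all, which matters if support is later computed as a fraction of all samples.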

Step3: Ran the algorithm in R. It takes too long and runs out of memory, so now find a C implementation. Also, for a small subset, perform the prototype task. DONE
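For reference, the core idea of Apriori on a small subset can be sketched as follows. This is a plain-Python illustration, not the R code or the C implementation mentioned above, and the transactions are made-up placeholders. The memory problem noted in this step comes from the number of candidate itemsets, which the min_support cutoff is meant to keep in check.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates, keeping only
        # those whose (k-1)-subsets are all frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions, one set of CNV events per sample.
toy = [
    {"7p_gain", "9p_loss"},
    {"7p_gain", "9p_loss", "10q_loss"},
    {"10q_loss"},
    {"7p_gain", "9p_loss"},
]
frequent = apriori(toy, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Raising min_support shrinks the candidate set at every level, which is the simplest lever to try before moving to a faster implementation or a cluster.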

Step4: For this prototype, make a circos plot.

New Plan: Data mining (with Rileen and Giovanni from the Sander Lab and Vishal from the Lowe Lab). See meeting notes. Deadline: April 6th 2012