Agustin

lane1=H3K27shp53 lane2=H3K27shEZH2 lane3=input DNA

Did the alignment using Bowtie with 2 mismatches allowed. See script.

Calling the script again to align, this time with no mismatches allowed. Want to make it as stringent as possible to get the best possible peaks. The script directory is /data/lowe/thapar/agustindata0.0/scripts

perl sge_bowtie_agstn_1.3.pl --query /data/lowe/thapar/agustindata4/s_1_sequence.txt --mismatches 0 --jobs 64 --output p53 --format fastq

Stats:

Total Reads:
|| Experiment || TotalReads || Aligned ||
|| H3K27shp53 || 11616835 || 72% ||
|| H3K27shEZH2 || 6842994 || 69.25% ||
|| input DNA || 27544024 || 73% ||

lane1=H3K27shp53
 * reads with at least one reported alignment: 8,402,100 (72.33%)
 * reads that failed to align: 657,100 (5.66%)
 * reads with alignments suppressed due to -m: 2,557,600 (22.02%)

lane2=H3K27shEZH2
 * reads with at least one reported alignment: 5,267,800 (69.25%)
 * reads that failed to align: 552,900 (7.27%)
 * reads with alignments suppressed due to -m: 1,786,200 (23.48%)

lane3=input DNA
 * reads with at least one reported alignment: 20,197,300 (73.33%)
 * reads that failed to align: 679,500 (2.47%)
 * reads with alignments suppressed due to -m: 6,667,200 (24.21%)
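A quick sanity check on the numbers above: the three Bowtie categories for a lane should add up to that lane's total, and the percentages should follow from the counts. This is only a sketch; the dictionary layout and the `summarize` helper are made up for illustration, and the counts are copied from the per-lane stats.

```python
# Read counts copied from the per-lane Bowtie stats in this entry.
lanes = {
    "H3K27shp53": {"aligned": 8_402_100, "failed": 657_100, "suppressed": 2_557_600},
    "H3K27shEZH2": {"aligned": 5_267_800, "failed": 552_900, "suppressed": 1_786_200},
    "input DNA": {"aligned": 20_197_300, "failed": 679_500, "suppressed": 6_667_200},
}


def summarize(counts):
    """Return (total reads, percentage per category rounded to 2 decimals)."""
    total = sum(counts.values())
    percents = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, percents


for lane, counts in lanes.items():
    total, percents = summarize(counts)
    print(lane, total, percents)
```

In this sketch the lane 2 categories sum to 7,606,900, which differs from the 6,842,994 listed for H3K27shEZH2 in the total reads table, so one of the two figures may be worth rechecking.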

Now for peak calling. Used MACS with multiple option sets, setting the threshold at 2-fold as well as 10-fold.

macs14 -t /datafc/lowe/thapar/Ag.1/bowtie_out/Ag_4_s_1.sorted.bam -c /datafc/lowe/thapar/Ag.ctrl/bowtie_out/Ag_4_s_3.sorted.bam -f BAM -g mm -n P53_CTRL_0.001 --diag -p 1e-3

macs14 -t /datafc/lowe/thapar/Ag.2/bowtie_out/Ag_4_s_2.sorted.bam -c /datafc/lowe/thapar/Ag.ctrl/bowtie_out/Ag_4_s_3.sorted.bam -f BAM -g mm -n ezh2_CTRL_nomodel -p 1e-3 --bw=250 --shiftsize=125 --nomodel --call-subpeaks --wig
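One way to apply the 2-fold and 10-fold cutoffs after the fact is to filter the MACS peak table on its fold-enrichment column. The sketch below assumes the tab-separated `NAME_peaks.xls` layout written by MACS 1.4 (a `#` comment header, then a column row that includes `fold_enrichment`); the `filter_peaks` helper and the demo rows are hypothetical and are not the actual results.

```python
import csv


def filter_peaks(lines, min_fold):
    """Return peak rows whose fold_enrichment is at least min_fold."""
    # Skip the '#' comment header and blank lines that precede the column names.
    rows = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return [r for r in reader if float(r["fold_enrichment"]) >= min_fold]


# Made-up rows for illustration only; real input would be an open file handle
# on a peaks table such as P53_CTRL_0.001_peaks.xls (assumed output name).
demo = [
    "# This file is generated by MACS\n",
    "\n",
    "chr\tstart\tend\tlength\tsummit\ttags\t-10*log10(pvalue)\tfold_enrichment\tFDR(%)\n",
    "chr1\t100\t600\t501\t250\t30\t65.2\t12.4\t0.5\n",
    "chr2\t900\t1300\t401\t180\t12\t40.1\t3.1\t2.0\n",
    "chr3\t50\t400\t351\t120\t8\t31.0\t1.6\t9.5\n",
]

print(len(filter_peaks(demo, 2)))   # peaks at or above 2-fold
print(len(filter_peaks(demo, 10)))  # peaks at or above 10-fold
```

Looking the column up by name rather than by position keeps the sketch working whether or not the FDR column is present.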

Now using SISSRs to call peaks.

sissrs.pl -i /datafc/lowe/thapar/Ag.1/bowtie_out/Ag_4_s_1.sorted.bed -o ./p53_ctrl -s 1870000000 -b /datafc/lowe/thapar/Ag.ctrl/bowtie_out/Ag_4_s_3.sorted.bed -p 0.05 -u -D 0.05 -w 10 -F 200

Microarray analysis for EZH2

 * E2H2.1.1 - SLAC120410_01
 * E2H2.1.2 - SLAC120410_02
 * E2H2.2.1 - SLAC120410_03
 * E2H2.2.2 - SLAC120410_04
 * E2H2.3.1 - SLAC120410_05
 * E2H2.3.2 - SLAC120410_06
 * MYC.1 - SLAC120410_07
 * MYC.2 - SLAC120410_08
 * P53.1 - SLAC120410_09
 * P53.2 - SLAC120410_10
 * WT.1 - SLAC120410_11
 * WT.2 - SLAC120410_12
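The sample labels encode the design: three-part labels carry a hairpin number followed by a replicate number, while two-part labels carry only a replicate. A small sketch of splitting them apart, for example to pick out the samples of one hairpin; `parse_label` and the field names are hypothetical helpers, not part of the R scripts.

```python
def parse_label(label):
    """Split a sample label into condition, hairpin (or None) and replicate."""
    parts = label.split(".")
    if len(parts) == 3:
        # e.g. E2H2.3.1: condition, hairpin number, replicate number
        condition, hairpin, replicate = parts
        return {"condition": condition, "hairpin": int(hairpin), "replicate": int(replicate)}
    # e.g. WT.2: condition and replicate number only
    condition, replicate = parts
    return {"condition": condition, "hairpin": None, "replicate": int(replicate)}


labels = ["E2H2.1.1", "E2H2.1.2", "E2H2.2.1", "E2H2.2.2", "E2H2.3.1", "E2H2.3.2",
          "MYC.1", "MYC.2", "P53.1", "P53.2", "WT.1", "WT.2"]

# Keep only the samples from hairpin 3.
hairpin3 = [lab for lab in labels if parse_label(lab)["hairpin"] == 3]
print(hairpin3)
```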

For ezh2, the first digit is the hairpin and the second is the replicate. We have three hairpins with different knockdown efficiencies; the best is ezh2.3, so comparisons that use ezh2.3.1/2 and exclude hairpins ezh2.1 and ezh2.2 should be more significant.

Done microarray comparisons for ezh2/p53, and now in particular for wt/myc, myc/ezh2 and myc/p53. Done for all conditions. Sent results to Agustin. Presented at lab meeting.

Summary:
 * Microarray analysis:
   * Scripts in R
   * Get a list of differentially expressed genes
 * ChIP-Seq analysis:
   * Quality control of data
   * Alignment of reads using Bowtie
   * Statistical analysis with 2 different approaches
     * Using MACS: published and widely used peak caller for ChIP-Seq
     * Using edgeR: R-based package for differential analysis of count data
   * Combine both results to look for overlap
   * Presentation of results by making graphs
   * Pathway analysis using DAVID
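The "combine both results" step can be sketched as a plain set intersection, assuming each method's output has already been reduced to a list of gene identifiers. The notes do not say how the overlap was actually computed, so this is one simple possibility; the `overlap` helper and the gene names are placeholders, not real results.

```python
def overlap(macs_genes, edger_genes):
    """Genes reported by both methods, sorted for stable output."""
    return sorted(set(macs_genes) & set(edger_genes))


# Placeholder identifiers, not real results.
macs_genes = ["geneA", "geneB", "geneC"]
edger_genes = ["geneB", "geneC", "geneD"]

print(overlap(macs_genes, edger_genes))
```

The shared list is what would then go into the graphs and the DAVID pathway analysis.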

Conclusion: This experiment lacked replicates. At least 3 replicates are needed for each condition to get statistically meaningful results. It seems many more genes could have come up, but they were missed for this reason.

Feb 7th, 3:55-4:00 pm: Agustin came to ask how to install a genome browser and display the output of MACS peak finding. I explained it to him and sent the first set of instructions via email:
 * Go to the Broad Institute website
 * Download the genome browser: Integrative Genomics Viewer (IGV)
 * Next, connect to smb://skimcs/lowelab
 * Your data is in the Lowelab/Vishal/ForAgustin/ directory