Zhen

RNA-Seq Analysis for Zhen:

 * M: Mx1-Cre. Cre recombinase is expressed under the Mx1 promoter.
 * Mp: Mx1-Cre; P53 loxP/loxP. The P53 allele is deleted between exon 1 and exon 10 (conditional P53 deletion).
 * Mk: Mx1-Cre; LSL-KrasG12D. Conditional oncogenic K-Ras activation; LSL is a loxP-flanked stop codon.
 * Mpk: all three alleles together: Mx1-Cre; P53 loxP/loxP; LSL-KrasG12D.
 * Cre: it can only be turned on at a chosen time point. Mx1 is the promoter.

Comparison between samples of the same treatment: can we normalize the data within the treatment samples with respect to housekeeping genes (Beta-Actin, Gapdh, or Beta-Tubulin)?

Report the % of reads that mapped and the % that did not.


**DATA ANALYSIS for Zhen Zhao at MSKCC in Lowe Lab**

|| FC || Lane || Library || Sample ||
|| FC64E59AAXX || 1 || LID45508 || MK #3 ||
|| FC64E59AAXX || 2 || LID45507 || MPK #2 ||
|| FC64E59AAXX || 3 || LID45509 || MPK #3 ||
|| FC64E59AAXX || 5 || LID45504 || M #2 ||
|| FC64E59AAXX || 6 || LID45505 || MP #2 ||
|| FC64M4JAAXX || 1 || LID45491 || M #1 ||
|| FC64M4JAAXX || 2 || LID45501 || MP #1 ||
|| FC64M4JAAXX || 3 || LID45502 || MK #1 ||
|| FC64M4JAAXX || 5 || LID45506 || MK #2 ||
|| FC64M4JAAXX || 6 || LID45503 || MPK #1 ||

|| Group || Library amount || Comparison || Groups ||
|| M || 2 || Comparison 1 || MK vs M ||
|| MP || 2 || Comparison 2 || MPK vs MK ||
|| MK || 3 || Comparison 3 || MPK vs M ||
|| MPK || 3 || Comparison 4 || MP vs M ||


 * Step 1: Run a quality check on all sequence sets: /home/lowe/thapar/downloads/FastQC/fastqc filename.txt
 * The check shows that the quality of Mpk_2 is not good.
 * Convert each file to standard FASTQ using: fq_all2std.pl sol2std2 M_1.txt > M_1.fastq
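
As a rough illustration of what a Solexa-to-standard FASTQ conversion involves, the sketch below re-encodes a quality string. It is not the code of fq_all2std.pl. It assumes the input uses Solexa-scaled qualities with ASCII offset 64 and that the target is Phred-scaled qualities with offset 33; the function names are made up for this example.

```python
import math

def solexa_to_phred(q_solexa: int) -> int:
    """Convert one Solexa-scaled quality score to the nearest Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))

def solexa_qual_to_sanger(qual: str) -> str:
    """Re-encode a Solexa quality string (offset 64) as Sanger/Phred (offset 33)."""
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in qual)

if __name__ == "__main__":
    # 'h' encodes Solexa quality 40, '@' encodes Solexa quality 0.
    print(solexa_qual_to_sanger("h@"))
```

High qualities are almost unchanged by the conversion; only low Solexa scores shift noticeably, which is why a file converted with the wrong assumption mostly looks fine until the low-quality tail is inspected.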

 * Step 2: First we want to find out what the reference genome is going to be. For that we go to this link: []
 * At this link you will see all the sequences that reads can be aligned against for mouse. There are two references that we can use for RNA-Seq:
 * 1) mrna.fa.gz - Mouse mRNA from GenBank. This sequence data is updated once a week via automatic GenBank updates.
 * 2) refMrna.fa.gz - RefSeq mRNA from the same species as the genome. This sequence data is updated once a week via automatic GenBank updates.

 * Step 3: Create an index for both of the reference sequence sets. Put the code below in a shell script and run:

#!/bin/tcsh
#$ -l virtual_free=7.8G
#$ -cwd
#$ -N bwa.index.mouse
#$ -V
bowtie-build .fasta mouse_genome


 * 1) run it
 * sge_bowtie_zhen.pl --query /data/lowe/thapar/zhen/zhenData/M_1.txt --mismatches 0 --jobs 64 --output M_1 --format fastq --fiveprimebase 26 &

 * 1) The last step failed and still needs to be looked into. I wanted to install TopHat, but that was a problem on Bluehelix, so I ended up using Galaxy for the TopHat alignment and then Cufflinks/Cuffdiff for differential expression.
 * 2) Before TopHat, the FASTX-Toolkit was used to:
 * 3) Trim the adapter sequence, using the custom sequence identified earlier to trim reads.
 * 4) Params: keep both clipped and non-clipped sequences. Minimum length = 15, below which reads are discarded.
 * 5) Collapse the clipped file.
 * 6) For TopHat: --max-multihits = 40. Reference genome was mm9. Gene model was mm9 UCSC RefSeq.
 * 7) tophat -p 10 -o /outDir/ -g 40 --GTF /data/hannon/gordon/databases/gtf/mm9_refGene_2011_02_15.gtf --no-novel-juncs /localdata1/genomes/bowtie/mm9_genome/mm9_genome /localdata1/galaxy/database_prod/files/009/dataset_9519.dat
 * 1) For Cufflinks: perform upper-quartile normalization, which removes the top 25% of genes from the FPKM denominator to improve the accuracy of differential expression calls for low-abundance transcripts. Normalize to the total number of mapped reads. Perform bias correction (-b) using the standard genome, mm9 canonical. Mask out transcripts (-M): no masking.
 * 2) Cuffdiff with replicates was used. Transcript annotation was based on the mm9 gene models. Added 2 groups, with replicates for each. Standard Illumina library. Normalized to the total number of mapped reads. Bias correction: -b = mm9 (standard genome). It produced both the gene list and the isoform list for the contrast.

Cuffdiff output, example rows for Wnt5b:

Gene FPKM tracking:
|| tracking_id || class_code || nearest_ref_id || gene_id || gene_short_name || tss_id || locus || length || coverage || status || Mpk_FPKM || Mpk_conf_lo || Mpk_conf_hi || Mk_FPKM || Mk_conf_lo || Mk_conf_hi ||
|| Wnt5b || - || - || Wnt5b || Wnt5b || - || chr6:119382548-119494365 || - || - || OK || 0.0318101 || 0.0296045 || 0.0340157 || 0 || 0 || 0 ||

Isoform FPKM tracking:
|| tracking_id || class_code || nearest_ref_id || gene_id || gene_short_name || tss_id || locus || length || coverage || status || Mpk_FPKM || Mpk_conf_lo || Mpk_conf_hi || Mk_FPKM || Mk_conf_lo || Mk_conf_hi ||
|| NM_009525 || - || - || Wnt5b || Wnt5b || - || chr6:119382548-119494365 || 2371 || - || OK || 0.0318101 || 0.0296044 || 0.0340158 || 0 || 0 || 0 ||

Gene expression:
|| test_id || gene_id || gene || locus || sample_1 || sample_2 || status || value_1 || value_2 || ln(fold_change) || test_stat || p_value || q_value || significant ||
|| Wnt5b || Wnt5b || Wnt5b || chr6:119382548-119494365 || Mk || Mpk || OK || 0 || 0.031558 || 1.79769e+308 || 1.79769e+308 || 2.94E-183 || 1.93E-180 || yes ||

Isoform expression:
|| test_id || gene_id || gene || locus || sample_1 || sample_2 || status || value_1 || value_2 || ln(fold_change) || test_stat || p_value || q_value || significant ||
|| NM_009525 || Wnt5b || Wnt5b || chr6:119382548-119494365 || Mpk || Mp || OK || 0.0321314 || 0.0469977 || 0.380266 || -6.86032 || 6.87E-12 || 1.72E-09 || yes ||
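
The ln(fold_change) column can be checked by hand. The sketch below recomputes it from value_1 and value_2 for the NM_009525 row. It also shows why the Wnt5b gene row reports 1.79769e+308: that number is the largest finite double, which is most likely what gets printed when value_1 is 0 and the ratio is infinite. The function name and the handling of zeros are assumptions for this example, not Cuffdiff's code.

```python
import math
import sys

def ln_fold_change(value_1: float, value_2: float) -> float:
    """Natural log of value_2 / value_1; infinite when exactly one side is zero."""
    if value_1 == 0 and value_2 == 0:
        return 0.0
    if value_1 == 0:
        return math.inf
    if value_2 == 0:
        return -math.inf
    return math.log(value_2 / value_1)

if __name__ == "__main__":
    # NM_009525 row: value_1 = 0.0321314, value_2 = 0.0469977
    print(ln_fold_change(0.0321314, 0.0469977))
    # Wnt5b gene row: value_1 = 0, value_2 = 0.031558
    print(ln_fold_change(0, 0.031558))
    # Largest finite double, for comparison with the value in the table
    print(f"{sys.float_info.max:.5e}")
```

Rows with an infinite fold change are worth filtering separately, since their test statistics and q-values are driven by one group having no detected expression at all.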


 * Final step. I now have the following files as results.

Per-sample files:
 * Mpk_3: //Mpk_3_trans_expr.tabular//, //Mpk_3_gene_expr.tabular//, //Mpk_3_assmbl_transcripts.gtf//, //Mpk_3_spliced.bed//
 * Mpk_2: //Mpk_2_trans_expr.tabular//, //Mpk_2_gene_expr.tabular//, //Mpk_2_assmbl_transcripts.tabular//, //Mpk_2_spliced.bed//
 * Mpk_1: //Mpk_1_trans_expr.tabular//, //Mpk_1_gene_expr.tabular//, //Mpk_1_assmbl_transcripts.gtf//, //Mpk_1_spliced.bed//
 * Mk_3: //Mk_3_trans_expr.tabular//, //Mk_3_gene_expr.tabular//, //Mk_3_assmbl_transcripts.gtf//, //Mk_3_spliced.bed//
 * Mk_2: //Mk_2_trans_expr.tabular//, //Mk_2_gene_expr.tabular//, //Mk_2_assmbl_transcripts.gtf//, //Mk_2_spliced.bed//
 * Mk_1: //Mk_1_trans_expr.tabular//, //Mk_1_gene_expr.tabular//, //Mk_1_assmbl_transcripts.gtf//, //Mk_1_spliced.bed//
 * M_2: //M_2_trans_expr.tabular//, //M_2_cuff_gene_expr.tabular//, //M_2_assmbl_transcripts.gtf//, //M_2_spliced.bed//
 * M_1: //M_1_trans_expr.tabular//, //M_1_gene_expr.tabular//, //M_1_assmbl_transcripts.gtf//, //M_1_spliced.bed//

Comparison files:
 * M_Mpk: //M_Mpk_iso_fpkm_trk.tabular//, //M_Mpk_iso_expr.tabular//, //M_Mpk_gene_fpkm_trk.tabular//, //M_Mpk_gene_expr.tabular//
 * Mp2_M: //Mp2_M_iso_fpkm_trk.tabular//, //Mp2_M_iso_expr.tabular//, //Mp_2_M_gene_fpkm_trk.tabular//, //Mp2_M_gene_expr.tabular//
 * Mk_M: //Mk_M_iso_fpkm_trk.tabular//, //Mk_M_iso_expr.tabular//, //Mk_M_gene_fpkm_trk.tabular//, //Mk_M_gene_expr.tabular//
 * Mk_Mpk: //Mk_Mpk_iso_fpkm_trk.tabular//, //Mk_Mpk_iso_expr.tabular//, //Mk_Mpk_gene_fpkm_trk.tabular//, //Mk_Mpk_gene_expr.tabular//

Once you have the list of differentially expressed genes from each tabular file, run it through DAVID (search for "DAVID gene" on Google). That gives the pathways for each gene list. Next we need to see how these pathways are related.

STEP 1) CLIPPING NEEDS TO BE DONE AGAIN WITH A BETTER CLIPPER (cutadapt):

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o M_1_cutadapt.fastq -f fastq M_1.txt > M_1_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o M_2_cutadapt.fastq -f fastq M_2.txt > M_2_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mp_1_cutadapt.fastq -f fastq Mp_1.txt > Mp_1_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mp_2_cutadapt.fastq -f fastq Mp_2.txt > Mp_2_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mk_1_cutadapt.fastq -f fastq Mk_1.txt > Mk_1_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mk_2_cutadapt.fastq -f fastq Mk_2.txt > Mk_2_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mk_3_cutadapt.fastq -f fastq Mk_3.txt > Mk_3_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mpk_1_cutadapt.fastq -f fastq Mpk_1.txt > Mpk_1_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mpk_2_cutadapt.fastq -f fastq Mpk_2.txt > Mpk_2_cutout.txt

cutadapt -a TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT -a TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC -a TTCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGCG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGA -a TGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGG -a TCGGAAGACGGTTCAGCAGGAATGCCGAGATCGGAA -a TTGCGGTTCAGCAGGAATGCCGAAGATCGGAAGAGC -a TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAA -a TCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAA -a TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAA -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC -a TTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAAAAA -m 20 -o Mpk_3_cutadapt.fastq -f fastq Mpk_3.txt > Mpk_3_cutout.txt
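
The cutadapt commands above differ only in the sample name, so they could be generated rather than copied by hand. The sketch below builds the same command string for each sample. The function name and the SAMPLES and ADAPTERS variables are made up for this example, the flags are copied from the commands above, and ADAPTERS lists only the first two adapters (the rest would be filled in from the commands above).

```python
import shlex

SAMPLES = ["M_1", "M_2", "Mp_1", "Mp_2", "Mk_1", "Mk_2", "Mk_3",
           "Mpk_1", "Mpk_2", "Mpk_3"]

# First two adapters from the commands above; extend with the remaining ones.
ADAPTERS = [
    "TCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGT",
    "TCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGC",
]

def cutadapt_command(sample, adapters, min_length=20):
    """Build the cutadapt shell command for one sample, mirroring the notes."""
    args = ["cutadapt"]
    for adapter in adapters:
        args += ["-a", adapter]
    args += ["-m", str(min_length),
             "-o", f"{sample}_cutadapt.fastq",
             "-f", "fastq",
             f"{sample}.txt"]
    quoted = " ".join(shlex.quote(a) for a in args)
    # The cutadapt report on stdout is redirected to a per-sample file.
    return f"{quoted} > {shlex.quote(sample + '_cutout.txt')}"

if __name__ == "__main__":
    for sample in SAMPLES:
        print(cutadapt_command(sample, ADAPTERS))
```

Printing the commands instead of running them keeps the script safe to test; the output can be pasted into a shell script or submitted to the cluster once it looks right.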
|| Rep || Processed reads || Trimmed reads (% of processed reads) || Too short reads (% of processed reads) ||
|| M_1 || 23,680,165 || 12,418,339 (52.4%) || 8,678,063 (36.6%) ||
|| M_2 || 25,860,966 || 13,400,867 (51.8%) || 8,975,733 (34.7%) ||
|| Mp_1 || 21,957,697 || 12,926,318 (58.9%) || 9,684,471 (44.1%) ||
|| Mp_2 || 22,207,042 || 10,778,602 (48.5%) || 7,134,235 (32.1%) ||
|| Mk_1 || 21,158,729 || 12,872,673 (60.8%) || 9,435,105 (44.6%) ||
|| Mk_2 || 17,225,770 || 8,686,878 (50.4%) || 5,542,417 (32.2%) ||
|| Mk_3 || 24,795,561 || 13,716,183 (55.3%) || 9,573,935 (38.6%) ||
|| Mpk_1 || 21,498,946 || 11,556,158 (53.8%) || 7,642,046 (35.5%) ||
|| Mpk_2 || 23,096,686 || 9,205,995 (39.9%) || 4,609,454 (20.0%) ||
|| Mpk_3 || 25,558,852 || 15,812,724 (61.9%) || 12,217,200 (47.8%) ||
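
The percentages in this table can be recomputed from the raw counts, and the number of reads kept after clipping is the processed count minus the too-short count. The sketch below does both for three of the samples. For these samples the kept count equals the "Input" count reported in the collapsing step, which is a useful consistency check. The dictionary and function names are made up for this example.

```python
# sample: (processed reads, trimmed reads, too-short reads), copied from the table
CUTADAPT_COUNTS = {
    "M_1": (23680165, 12418339, 8678063),
    "Mpk_2": (23096686, 9205995, 4609454),
    "Mpk_3": (25558852, 15812724, 12217200),
}

def summarize(processed, trimmed, too_short):
    """Return percentages relative to processed reads and the reads kept."""
    return {
        "trimmed_pct": round(100 * trimmed / processed, 1),
        "too_short_pct": round(100 * too_short / processed, 1),
        "kept": processed - too_short,
    }

if __name__ == "__main__":
    for sample, counts in CUTADAPT_COUNTS.items():
        print(sample, summarize(*counts))
```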

After collapsing:

 * Mpk_3: 7,094,724 sequences. Info: Input: 13341652 sequences (representing 13341652 reads); Output: 7094724 sequences (representing 13341652 reads)
 * Mpk_2: 11,184,552 sequences. Info: Input: 18487232 sequences (representing 18487232 reads); Output: 11184552 sequences (representing 18487232 reads)
 * Mpk_1: 7,463,204 sequences (format: fasta, database: mm9). Info: Input: 13856900 sequences (representing 13856900 reads); Output: 7463204 sequences (representing 13856900 reads)
 * Mk_3: 8,416,317 sequences (format: fasta, database: mm9). Info: Input: 15221626 sequences (representing 15221626 reads); Output: 8416317 sequences (representing 15221626 reads)
 * Mk_2: 7,035,630 sequences (format: fasta, database: mm9). Info: Input: 11683353 sequences (representing 11683353 reads); Output: 7035630 sequences (representing 11683353 reads)
 * Mk_1: 6,321,790 sequences (format: fasta, database: mm9). Info: Input: 11723624 sequences (representing 11723624 reads); Output: 6321790 sequences (representing 11723624 reads)
 * Mp_2: 7,975,710 sequences (format: fasta, database: mm9). Info: Input: 15072807 sequences (representing 15072807 reads); Output: 7975710 sequences (representing 15072807 reads)
 * Mp_1: 6,849,247 sequences (format: fasta, database: mm9). Info: Input: 12273226 sequences (representing 12273226 reads); Output: 6849247 sequences (representing 12273226 reads)
 * M_2: 8,987,233 sequences (format: fasta, database: mm9). Info: Input: 16885233 sequences (representing 16885233 reads); Output: 8987233 sequences (representing 16885233 reads)
 * M_1: 8,933,204 sequences (format: fasta, database: mm9). Info: Input: 15002102 sequences (representing 15002102 reads); Output: 8933204 sequences (representing 15002102 reads)
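
For orientation, collapsing means merging identical read sequences into one record that remembers how many reads it represents. That is why the output has fewer sequences but still "represents" the same number of reads. The sketch below shows the general idea on a toy input and computes the unique fraction for Mpk_3 from the counts above. It is not the FASTX-Toolkit implementation, and the function names are made up for this example.

```python
from collections import Counter

def collapse_reads(sequences):
    """Merge identical sequences; return (sequence, count) pairs, most abundant first."""
    return Counter(sequences).most_common()

def unique_fraction(n_input, n_output):
    """Fraction of input reads that remain as distinct sequences after collapsing."""
    return n_output / n_input

if __name__ == "__main__":
    toy_reads = ["ACGT", "ACGT", "TTTT"]  # toy input, not real data
    print(collapse_reads(toy_reads))
    # Mpk_3: 13341652 input reads collapsed to 7094724 sequences
    print(unique_fraction(13341652, 7094724))
```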

Comparisons:

 * Mp_M
 * Mpk_M
 * Mk_M
 * ---Mk_Mp
 * Mp_Mpk
 * Mk_Mpk

In this step we want to compare and find the intersection of the following.

Let's call M = 1, Mp = 2, Mk = 3, Mpk = 4:

 * A --> 1 vs 4
 * B --> 1 vs 3
 * C --> 3 vs 4
 * D --> 2 vs 4

We want to find the intersection of the significant genes in A, B, C, D:

 * A AND B AND C AND D
 * A OR B OR C OR D
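
The AND and OR combinations map directly onto set intersection and set union. A minimal sketch, assuming each comparison's significant genes have already been read into a set; the gene identifiers below are placeholders, not results, and the function name is made up for this example.

```python
def combine_significant(gene_sets):
    """Return (genes significant in every comparison, genes significant in any)."""
    sets = [set(genes) for genes in gene_sets.values()]
    in_all = set.intersection(*sets)  # A AND B AND C AND D
    in_any = set.union(*sets)         # A OR B OR C OR D
    return in_all, in_any

if __name__ == "__main__":
    # Placeholder identifiers only; real sets would come from the Cuffdiff output.
    significant = {
        "A": {"g1", "g2", "g3"},
        "B": {"g1", "g3", "g4"},
        "C": {"g1", "g3"},
        "D": {"g1", "g5"},
    }
    in_all, in_any = combine_significant(significant)
    print(sorted(in_all))
    print(sorted(in_any))
```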

Meeting with Zhen, 1/30, 9:00 to 9:30. What needs to be done:

 * 1) Take the gene IDs that are significant in DESeq and edgeR and find them in the Cuffdiff output: 1 hour
 * 2) Take the gene IDs above, find the RefSeq IDs for them, and send the list back: 1 hour

To be delivered Jan 31, by 5:00 pm.
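
Both tasks amount to filtering tab-separated Cuffdiff tables by gene ID. The sketch below assumes the gene-level and isoform-level expression files have the columns shown in the Wnt5b example rows earlier in these notes, and that the isoform-level test_id is the RefSeq accession (as in the NM_009525 row). The function names are made up for this example, and the example rows are copied from the notes.

```python
import csv
import io

HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
          "value_1", "value_2", "ln(fold_change)", "test_stat", "p_value",
          "q_value", "significant"]
GENE_ROW = ["Wnt5b", "Wnt5b", "Wnt5b", "chr6:119382548-119494365", "Mk", "Mpk", "OK",
            "0", "0.031558", "1.79769e+308", "1.79769e+308", "2.94E-183",
            "1.93E-180", "yes"]
ISO_ROW = ["NM_009525", "Wnt5b", "Wnt5b", "chr6:119382548-119494365", "Mpk", "Mp", "OK",
           "0.0321314", "0.0469977", "0.380266", "-6.86032", "6.87E-12",
           "1.72E-09", "yes"]

def as_tsv(*rows):
    """Join rows into tab-separated text, one row per line."""
    return "\n".join("\t".join(row) for row in rows) + "\n"

def lookup_in_cuffdiff(cuffdiff_tsv, gene_ids):
    """Return the Cuffdiff rows whose gene_id is in gene_ids."""
    wanted = set(gene_ids)
    reader = csv.DictReader(cuffdiff_tsv, delimiter="\t")
    return [row for row in reader if row["gene_id"] in wanted]

def refseq_ids_for(isoform_tsv, gene_ids):
    """Map each requested gene_id to the sorted isoform test_ids listed for it."""
    found = {}
    for row in lookup_in_cuffdiff(isoform_tsv, gene_ids):
        found.setdefault(row["gene_id"], set()).add(row["test_id"])
    return {gene: sorted(ids) for gene, ids in found.items()}

if __name__ == "__main__":
    genes = ["Wnt5b"]  # would be the DESeq/edgeR significant gene IDs
    print(lookup_in_cuffdiff(io.StringIO(as_tsv(HEADER, GENE_ROW)), genes))
    print(refseq_ids_for(io.StringIO(as_tsv(HEADER, ISO_ROW)), genes))
```

With real files, the `io.StringIO(...)` arguments would be replaced by open file handles for the gene-level and isoform-level tabular files.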

Feb 12th 2013

For the SCOR grant: analyse the M vs Mk comparison. Compare these two and then search for the following genes:

 * DUSP (1-20 or more)
 * SPRY (1-4 or more) (full name: Sprouty)
 * SPRED (1-x)
 * RASGAP (any variant)
 * NF1
 * GAPDH (control gene)
 * MYOGLOBIN (control gene)
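
One way to pull these families out of a gene list is a small set of case-insensitive patterns. This is a sketch under assumptions: the patterns follow the names as written in these notes, the annotation may list some genes under other symbols (for example, mouse myoglobin is annotated as Mb, and RasGAP family members may not contain the string "RASGAP"), and the names FAMILY_PATTERNS, classify_gene, and genes_of_interest are made up for this example.

```python
import re

FAMILY_PATTERNS = {
    "DUSP": re.compile(r"^DUSP\d+", re.IGNORECASE),
    "SPRY": re.compile(r"^SPRY\d+", re.IGNORECASE),
    "SPRED": re.compile(r"^SPRED\d+", re.IGNORECASE),
    "RASGAP": re.compile(r"RASGAP", re.IGNORECASE),          # assumption: literal match only
    "NF1": re.compile(r"^NF1$", re.IGNORECASE),
    "GAPDH": re.compile(r"^GAPDH$", re.IGNORECASE),          # control gene
    "MYOGLOBIN": re.compile(r"^(MB|MYOGLOBIN)$", re.IGNORECASE),  # control gene
}

def classify_gene(name):
    """Return the first family whose pattern matches the gene name, else None."""
    for family, pattern in FAMILY_PATTERNS.items():
        if pattern.search(name):
            return family
    return None

def genes_of_interest(gene_names):
    """Group the matching gene names by family, keeping input order."""
    hits = {}
    for name in gene_names:
        family = classify_gene(name)
        if family is not None:
            hits.setdefault(family, []).append(name)
    return hits

if __name__ == "__main__":
    # Example symbols used only to exercise the patterns, not results.
    print(genes_of_interest(["Dusp6", "Spry2", "Spryd3", "Spred1", "Nf1", "Gapdh", "Wnt5b"]))
```

Requiring a digit after DUSP, SPRY, and SPRED keeps unrelated symbols that merely start with the same letters (such as Spryd3) out of the hit list.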