Fungal Genomes
Fungal genomes mini-hackathon
New procedure [added 2007/09/22 by AVU]
The entire process has been condensed into something like:
cd /nfs/projects/fungiscreen
./env-fungi01.bat
make screen
See How To Run Pipeline for more info, including some fungi-specific notes.
Data source
alignments.tar.bz2 | Mercator output with Pecan alignments in output.mfa |
genomes.tgz | FASTA format |
gff.tgz | GFF format |
prank_alignments.tar.bz2 | PRANK alignments in prank.mfa |
treefile | Tree, four species, Newick format |
Who
Add yourself...
Jason Stajich, Andrew Uzilov, Ian Holmes, Robert Bradley
Makefile walkthrough /home/projects/hackathon
Environment variables
SCREEN=fungi00 # name of the screen; does not include the model name, since a screen can run on different models
MODEL=ncRnaDualStrand
NULL_MODEL=ncRnaDualStrandNull
Makefile.sge -> Sun Grid Engine (SGE) Makefile wrappers
- automatically breaks things into windows, does the transformations, loads data into the database
- wrappers you don't need to touch
- handles dependencies
- windowlicker breaks things into subdirectories
Add rules that work on hardcoded names for testing; once they work, they can be submitted to SGE.
ADD tree
- defined a variable FUNGI-TREE in the main makefile
fungi00.%.xrate:
	$(ADD-TREE) $(FUNGI-TREE) segment.stock > segment.withtree.stock
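For anyone curious what "adding a tree" to a Stockholm file can look like, here is a minimal sketch. It is not the actual $(ADD-TREE) command (we have not reproduced that here); it only assumes the common Stockholm convention of storing a Newick tree on a "#=GF NH" line. The taxon names A to D, the toy alignment, and the temp-directory layout are all made up for illustration.

```shell
# Hypothetical stand-ins for treefile and segment.stock.
tmp=$(mktemp -d)
printf '%s\n' '((A:0.1,B:0.1):0.05,(C:0.2,D:0.2):0.05);' > "$tmp/treefile"
printf '%s\n' '# STOCKHOLM 1.0' 'A ACGU' 'B ACGU' 'C AC-U' 'D ACGA' '//' > "$tmp/segment.stock"

# Read the Newick tree as a single line.
tree=$(tr -d '\n' < "$tmp/treefile")

# Copy the alignment, inserting the tree right after the Stockholm header.
awk -v tree="$tree" '
  { print }
  /^# STOCKHOLM/ { print "#=GF NH " tree }
' "$tmp/segment.stock" > "$tmp/segment.withtree.stock"

cat "$tmp/segment.withtree.stock"
```

The sequence names in the alignment have to match the leaf names in the tree, so it is worth checking that before running anything expensive on the result.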
WINDOWLICKER params:
- w windowsize
- gr reference sequence name
- g min percentage of reference sequence that can be
- r reference sequence name
- b low complexity filter (bit content or fraction)
- -- passed to xrate
- -s make log xrate inside score
- -l maximum suffix length in DP matrix
- -g grammars
Andrew looked for the smallest alignment to set a lower bound and run a test:
$ ls -lrS `find . -name output.mfa` | head
2419 was the smallest.
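The backtick form can run into the shell's argument-length limit when there are very many output.mfa files. A rough alternative, sketched below on a made-up directory tree (the numbered directories and toy files are hypothetical), lets find call wc once per file and sorts by byte count. Like the original command, it uses file size as a proxy for alignment size, and it assumes paths contain no spaces.

```shell
# Hypothetical layout standing in for the unpacked alignments.
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2" "$tmp/3"
printf '>a\nACGT\n'     > "$tmp/1/output.mfa"
printf '>a\nAC\n'       > "$tmp/2/output.mfa"
printf '>a\nACGTACGT\n' > "$tmp/3/output.mfa"

# wc -c prints "<bytes> <path>" for each file; sort numerically, smallest first.
smallest=$(find "$tmp" -name output.mfa -exec wc -c {} \; \
  | sort -n | head -n 1 | awk '{ print $2 }')

echo "smallest alignment file: $smallest"
```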
We can use fasta2stockholm.pl to convert multi-FASTA files to Stockholm.
The segments rule in fungiscreen/Makefile takes care of this.
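To make the format mapping concrete, here is a small awk sketch of a multi-FASTA to Stockholm conversion. It is only an illustration of what the conversion does, not the pipeline's own script, and the sequence names spA and spB and the toy file are invented. It writes one "name sequence" line per record between the Stockholm header and the closing "//".

```shell
# Hypothetical two-sequence alignment with wrapped sequence lines.
tmp=$(mktemp -d)
printf '%s\n' '>spA' 'ACGU' 'AC' '>spB' 'AC-U' 'GG' > "$tmp/output.mfa"

awk '
  BEGIN { print "# STOCKHOLM 1.0" }
  /^>/ {
    name = substr($1, 2)                   # first word of the header, minus ">"
    if (!(name in seq)) order[++n] = name  # remember input order
    next
  }
  {
    gsub(/[ \t\r]/, "")                    # drop stray whitespace
    seq[name] = seq[name] $0               # join wrapped sequence lines
  }
  END {
    for (i = 1; i <= n; i++) print order[i], seq[order[i]]
    print "//"
  }
' "$tmp/output.mfa" > "$tmp/segment.stock"

cat "$tmp/segment.stock"
```

Gap characters are passed through untouched, so the rows keep their alignment as long as the input FASTA was already aligned.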
Ready to run
In the end, we should be able to run make screen in the /home/projects/hackathon/ directory, because we have set the SCREEN environment variable.
Once something is happening, look in the out directory.
After finishing
Genomic-coordinate annotations will be in gff/${SCREEN}/hitsGenomic_${MODEL}.${SPECIES}.gff
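As a quick way to see which files to expect, the path template can be expanded in the shell. This is just a sketch: the SCREEN and MODEL values are the ones from the environment variables section, while the species names speciesA and speciesB and the helper hits_path are placeholders, not names used by the pipeline.

```shell
SCREEN=fungi00
MODEL=ncRnaDualStrand

# Hypothetical helper: print the expected annotation path for one species.
hits_path() {
  printf 'gff/%s/hitsGenomic_%s.%s.gff\n' "$SCREEN" "$MODEL" "$1"
}

# speciesA and speciesB are placeholder names.
for SPECIES in speciesA speciesB; do
  f=$(hits_path "$SPECIES")
  if [ -s "$f" ]; then
    echo "ok      $f"
  else
    echo "missing $f"
  fi
done
```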
To get a .gff dump sorted by lgOdds and restricted to hits <= 130 nt, run:
/home/projects/pipeline/perl/dump-to-gff.pl -h sheridan -d fungi01 \
  -t hitsAlign_ncRnaDualStrand_v15 \
  --seqid segment --start start --end end --strand strand --score lgOdds \
  source=ncRnaDualStrand_v15 type=ncRNA \
  --sql "where end-start+1<=130 order by lgOdds desc" \
  > gff/fungi01/hitsAlign_ncRnaDualStrand_v15.gff
and then call the make rule to convert to genomic coordinates:
make gff/${SCREEN}/hitsGenomic_${MODEL}.${SPECIES}.gff
This is hacky (we really should keep everything wrapped in Makefiles), but it works for now.
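If a GFF file already exists and you only want to redo the length cutoff and score ordering from the --sql clause without going back to the database, something like the following should work. It is a sketch on made-up hits (the three records and file names are hypothetical) and assumes standard tab-separated GFF with start, end, and score in columns 4, 5, and 6.

```shell
# Hypothetical hits: seqid, source, type, start, end, score, strand, phase, attributes.
tmp=$(mktemp -d)
printf 'seg1\tncRnaDualStrand_v15\tncRNA\t10\t100\t12.5\t+\t.\t.\n'  > "$tmp/hits.gff"
printf 'seg2\tncRnaDualStrand_v15\tncRNA\t1\t500\t40.0\t-\t.\t.\n'  >> "$tmp/hits.gff"
printf 'seg3\tncRnaDualStrand_v15\tncRNA\t5\t134\t30.1\t+\t.\t.\n'  >> "$tmp/hits.gff"

tab=$(printf '\t')

# Keep hits of at most 130 nt, then sort by score (column 6), highest first.
awk -F '\t' '!/^#/ && $5 - $4 + 1 <= 130' "$tmp/hits.gff" \
  | LC_ALL=C sort -t "$tab" -k6,6nr > "$tmp/hits.filtered.gff"

cat "$tmp/hits.filtered.gff"
```

Here seg2 is dropped for being 500 nt long, and seg3 (exactly 130 nt) sorts ahead of seg1 because of its higher score.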
TODO: we still need to get the structures/sequences out; alignmonkey can do this easily.
---
Older results:
- fungi00.hitsGenomic.logOdds.sorted.gff: fungal predictions, in genomic coords (sorted by log-odds score)
- fungi00.hitsAlign.structures.stock: alignments and secondary structures of the predictions