Usage and Examples

autobuild

The input of the autobuild module is a TXT file containing abbreviated KEGG species names, one per line, for example organism_example_list.txt.

$ PhySpeTree autobuild -i organism_example_list.txt [options]*

options

option Description
-h Print help message and exit.
-i Input a TXT file containing abbreviated species names.
-o A directory to store outputs. The default is "Outdata".
-t Number of processing threads (CPUs). The default is 1.
-e FASTA format files used to extend the tree with the --ehcp or --esrna option.
-db The absolute path to a local database.
--hcp HCP (highly conserved protein) method (default).
--ehcp HCP method with extended HCP sequences.
--srna SSU rRNA method.
--esrna SSU rRNA method with extended SSU rRNA sequences.

Example

Download the example input file:

$ wget "https://yangfangs.github.io/physpetools/example/organism_example_list.txt"

--2016-10-29 19:41:53--  https://yangfangs.github.io/physpetools/example/organism_example_list.txt
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.24.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.24.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39 [text/plain]
Saving to: ‘organism_example_list.txt’

organism_example_list.txt     100%[==============================================>]      39  --.-KB/s    in 0s      

2016-10-29 19:41:54 (19.0 MB/s) - ‘organism_example_list.txt’ saved [39/39]

$ cat organism_example_list.txt
aca
ace
acl
acn
aco
acp
adg
adk
aeh
aeq
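
Before starting a long run, the input list can be checked for obvious problems. The following is a minimal sketch, not part of PhySpeTree: the helper names are hypothetical, and the pattern assumes that KEGG abbreviations are three or four lowercase alphanumeric characters starting with a letter.

```python
import re
from pathlib import Path

# Assumption for this sketch: a KEGG abbreviation is three or four lowercase
# alphanumeric characters and starts with a letter (e.g. "aca").
CODE_PATTERN = re.compile(r"^[a-z][a-z0-9]{2,3}$")


def read_organism_list(path):
    """Return the abbreviated names in the file, one per non-empty line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_problems(names):
    """Return readable messages for malformed or duplicated names."""
    problems = []
    seen = set()
    for name in names:
        if not CODE_PATTERN.match(name):
            problems.append(f"unexpected format: {name!r}")
        if name in seen:
            problems.append(f"duplicate entry: {name!r}")
        seen.add(name)
    return problems
```

An empty list returned by find_problems(read_organism_list("organism_example_list.txt")) means that no problem was found under these assumptions. Whether a name is actually supported is a separate question, answered by the check module.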

Automatically reconstruct species trees by HCP

$ PhySpeTree autobuild -i organism_example_list.txt --hcp
Loading organisms names success.....

The result are store in:Outdata

Now loading data and constructing phylogenetic tree......
2016-10-29 19:44:11,660 KEGG INDEX DB INFO: Read organisms names success
2016-10-29 19:44:17,296 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Ribosomal protein L1' was successful store in p1.fasta file
2016-10-29 19:44:17,919 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'DNA-directed RNA polymerase subunit alpha' was successful store in p2.fasta file
2016-10-29 19:44:18,369 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Leucyl-tRNA synthetase' was successful store in p3.fasta file
2016-10-29 19:44:18,943 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Metal-dependent proteases with chaperone activity' was successful store in p4.fasta file
2016-10-29 19:44:19,660 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Phenylalanine-tRNA synthethase alpha subunit' was successful store in p5.fasta file
2016-10-29 19:44:20,114 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Predicted GTPase probable translation factor' was successful store in p6.fasta file
2016-10-29 19:44:20,505 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Ribosomal protein L11' was successful store in p7.fasta file
2016-10-29 19:44:20,917 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Ribosomal protein L13' was successful store in p8.fasta file
2016-10-29 19:44:21,333 KEGG INDEX DB INFO: Retrieve and download of highly conserved protein 'Ribosomal protein L14' was successful store in p9.fasta file
......

Outputs:

log.log
Outdata/
        RAxML_bestTree.T1
        RAxML_bipartitions.T1
        RAxML_bipartitionsBranchLabels.T1
        RAxML_bootstrap.T1
        RAxML_info.T1
   temp/
        conserved_protein20161029194411/
                                        p1.fasta
                                        p2.fasta
                                        p3.fasta
                                        ......
        alignment20161029194429/
                                    p1.fasta
                                    p2.fasta
                                    p3.fasta
                                    ......
        concatenate20161029194432/
                                  concatenate.fasta
                                  concatenate.fasta-gb1
                                  concatenate.fasta-gb1.htm
                                  concatenate.fasta-gb1.phy
  • log.log: logs.
  • Outdata: tree files.

    • RAxML_bestTree.T1: best-scoring ML tree found by RAxML.
    • RAxML_bipartitions.T1: best-scoring ML tree with bootstrap support values.
    • RAxML_bipartitionsBranchLabels.T1: best-scoring ML tree with support values written as branch labels.
    • RAxML_bootstrap.T1: bootstrap trees.
    • RAxML_info.T1: RAxML run log.
  • temp: temporary data used to check the quality of outputs in each step.

    • conserved_protein: highly conserved proteins retrieved from the KEGG database.
    • alignment: aligned sequences.
    • concatenate: concatenated sequences and conserved blocks.

      • concatenate.fasta: concatenated HCP sequences.
      • concatenate.fasta-gb1: conserved blocks (by Gblocks).
      • concatenate.fasta-gb1.htm: conserved blocks displayed in html.
      • concatenate.fasta-gb1.phy: conserved blocks in the PHYLIP format.
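
The concatenation step listed above can be pictured with a short sketch. This is an illustration of the general idea, not PhySpeTree's own code: the function name is hypothetical, and padding a missing species with gap characters is an assumption made here.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-protein alignments into one long sequence per species.

    alignments: list of dicts mapping species name -> aligned sequence.
    All sequences inside one dict must have the same length.
    A species missing from one alignment is padded with gap characters
    (an assumption of this sketch).
    """
    species = sorted({name for aln in alignments for name in aln})
    combined = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        width = lengths.pop()
        for name in species:
            combined[name].append(aln.get(name, gap * width))
    return {name: "".join(parts) for name, parts in combined.items()}


# Two toy alignments standing in for p1.fasta and p2.fasta.
p1 = {"aca": "MK-R", "ace": "MKAR"}
p2 = {"aca": "GT", "ace": "GS", "acl": "G-"}
print(concatenate_alignments([p1, p2]))
```

Each species ends up with one sequence whose columns come from p1 followed by p2, which is the shape of input a tree builder expects from concatenate.fasta.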

Automatically reconstruct species trees by SSU rRNA

$ PhySpeTree autobuild -i organism_example_list.txt --srna
Loading organisms names success.....

The result are store in:Outdata

Now loading data and constructing phylogenetic tree......
2016-10-29 20:12:49,353 SSU rRNA DB INFO: Read organisms names success
2016-10-29 20:12:54,582 SSU rRNA DB INFO: Retrieve and download of organism 'aca' SSU rRNA sequence was successful
2016-10-29 20:12:56,831 SSU rRNA DB INFO: Retrieve and download of organism 'ace' SSU rRNA sequence was successful
2016-10-29 20:12:59,182 SSU rRNA DB INFO: Retrieve and download of organism 'acl' SSU rRNA sequence was successful
2016-10-29 20:13:01,545 SSU rRNA DB INFO: Retrieve and download of organism 'acn' SSU rRNA sequence was successful
2016-10-29 20:13:04,096 SSU rRNA DB INFO: Retrieve and download of organism 'aco' SSU rRNA sequence was successful
2016-10-29 20:13:06,972 SSU rRNA DB INFO: Retrieve and download of organism 'acp' SSU rRNA sequence was successful
2016-10-29 20:13:09,943 SSU rRNA DB INFO: Retrieve and download of organism 'adg' SSU rRNA sequence was successful
2016-10-29 20:13:12,707 SSU rRNA DB INFO: Retrieve and download of organism 'adk' SSU rRNA sequence was successful
2016-10-29 20:13:16,015 SSU rRNA DB INFO: Retrieve and download of organism 'aeh' SSU rRNA sequence was successful
2016-10-29 20:13:18,969 SSU rRNA DB INFO: Retrieve and download of organism 'aeq' SSU rRNA sequence was successful

Outputs:

log.log
Outdata/
        RAxML_bestTree.T1
        RAxML_bipartitions.T1
        RAxML_bipartitionsBranchLabels.T1
        RAxML_bootstrap.T1
        RAxML_info.T1
   temp/
        rna_sequence20161029201249/
                                 rna_sequence.fasta

        rna_alignment20161029201319/
                                  rna_sequence.fasta
                                  rna_sequence.fasta-gb1
                                  rna_sequence.fasta-gb1.htm
                                  rna_sequence.fasta-gb1.phy
  • log.log: logs.
  • Outdata: tree files, the same as for the HCP method.
  • temp: temporary data used to check the quality of outputs in each step.

    • rna_sequence: SSU rRNA sequences retrieved from the SILVA database.
    • rna_alignment: aligned sequences and conserved blocks.

      • rna_sequence.fasta: aligned SSU rRNA sequences.
      • rna_sequence.fasta-gb1: conserved blocks (by Gblocks).
      • rna_sequence.fasta-gb1.htm: conserved blocks displayed in html.
      • rna_sequence.fasta-gb1.phy: conserved blocks in the PHYLIP format.
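
After a run, it can be useful to confirm that the tree files from the listings above were written. The sketch below is a hypothetical helper, not part of PhySpeTree; it assumes the default RAxML method and the T1 suffix shown in the listings.

```python
from pathlib import Path

# File names copied from the output listings in this section.
EXPECTED_TREE_FILES = (
    "RAxML_bestTree.T1",
    "RAxML_bipartitions.T1",
    "RAxML_bipartitionsBranchLabels.T1",
    "RAxML_bootstrap.T1",
    "RAxML_info.T1",
)


def missing_tree_files(outdir, expected=EXPECTED_TREE_FILES):
    """Return the expected file names that are not present in outdir."""
    folder = Path(outdir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # "Outdata" is the default output directory of autobuild.
    print(missing_tree_files("Outdata"))
```

An empty list means every expected file exists. The check does not apply when another tree builder such as FastTree or IQ-TREE is selected.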

Advanced options

Advanced options of the third-party software called by PhySpeTree can be set through the corresponding options ending in _p. The option string must be enclosed in single quotes and must start with a space.

Here is an example of setting RAxML advanced options with --raxml_p:

$ PhySpeTree autobuild -i organism_example_list.txt -o test --srna --raxml --raxml_p ' -f a -m GTRGAMMA  -p 12345 -x 12345 -# 100 -n T1'
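
When the command is assembled from a script, the leading space is easy to lose. The sketch below builds the same argument list in Python; the function name is hypothetical, and only flags shown in this section are used. The leading space presumably keeps the argument parser from reading the string as another flag, which is an assumption rather than something stated here.

```python
import shlex


def autobuild_command(species_file, outdir, raxml_params):
    """Build the argument list for autobuild with custom RAxML options."""
    if not raxml_params.startswith(" "):
        # The option string must start with a space.
        raxml_params = " " + raxml_params
    return [
        "PhySpeTree", "autobuild",
        "-i", species_file,
        "-o", outdir,
        "--srna", "--raxml",
        "--raxml_p", raxml_params,
    ]


cmd = autobuild_command(
    "organism_example_list.txt",
    "test",
    "-f a -m GTRGAMMA -p 12345 -x 12345 -# 100 -n T1",
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would start the run; that call is left out so the sketch has no side effects.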

--muscle

Multiple sequence alignment by MUSCLE (default).

--muscle_p

Set MUSCLE advanced parameters; see the MUSCLE Manual.

The default option:

option description
-maxiter Maximum number of iterations to run. The default is 100.

--clustalw

Multiple sequence alignment by ClustalW2.

--clustalw_p

Set ClustalW2 advanced parameters; see the ClustalW help.

--mafft

Multiple sequence alignment by MAFFT.

--mafft_p

Set MAFFT advanced parameters; see the MAFFT algorithms documentation. By default, MAFFT runs with its own default parameters.

--gblocks

Trim by Gblocks (default).

--gblocks_p

Set Gblocks advanced parameters, please see Gblocks documentation.

The default option:

option description
-t Type of sequence (default).
-e Generic file extension. The default in PhySpeTree is "-gb1".

--trimal

Trim by trimAl.

--trimal_p

Set trimAl advanced parameters; see the trimAl command line documentation.

--raxml

Reconstruct species tree by RAxML (default).

--raxml_p

Set RAxML advanced parameters, please see RAxML Manual.

The default option:

option description
-f Select the algorithm. The default in PhySpeTree is a: rapid bootstrap analysis and search for the best-scoring ML tree in one program run.
-m Model of binary (morphological), nucleotide, multi-state, or amino acid substitution. The default in PhySpeTree is PROTGAMMAJTTX.
-p Specify a random number seed for the parsimony inferences. The default in PhySpeTree is 12345.
-x Specify an integer number (random seed) and turn on rapid bootstrapping. The default in PhySpeTree is 12345.
-N Same as -#: specify the number of alternative runs on distinct starting trees. The default in PhySpeTree is 100.

--fasttree

Reconstruct species tree by FastTree.

--fasttree_p

Set FastTree advanced parameters; see the FastTree help.

--iqtree

Reconstruct species tree by IQ-TREE.

--iqtree_p

Set IQ-TREE advanced parameters; see the IQ-TREE documentation.

build

The build module is used to reconstruct species trees from manually prepared sequences. Advanced options are the same as in the autobuild module.

# multiple method
$ PhySpeTree build -i example_hcp -o output --multiple

# single method
$ PhySpeTree build -i example_16s_ssurna.fasta -o output --single

build options

option Description
-h Print help message and exit.
-i Input a directory of FASTA files (--multiple) or a single FASTA file (--single).
-o A directory to store outputs. The default is "Outdata".
-t Number of processing threads (CPUs). The default is 1.
--multiple Use the concatenated highly conserved protein method to reconstruct the phylogenetic tree.
--single Use SSU rRNA data to reconstruct the phylogenetic tree.

Example

Build species trees by manually prepared HCP

HCP sequences belonging to the same class are stored in one FASTA file, and all FASTA files are placed in the same folder. For example, the folder example_build_hcp contains 10 classes of HCP (p1 to p10), each holding sequences from 10 different organisms. There is no limit on the number of HCP sequences. We recommend using at least 10 highly conserved proteins to ensure the accuracy of the reconstructed phylogenetic tree.

Download and decompress the example input file:


$ wget "https://yangfangs.github.io/physpetools/example/example_build_hcp.tar.gz"

--2016-10-29 20:40:41--  https://yangfangs.github.io/physpetools/example/example_build_hcp.tar.gz
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17419 (17K) [application/octet-stream]
Saving to: ‘example_build_hcp.tar.gz’

example_build_hcp.tar.gz     100%[==============================================>]  17.01K  --.-KB/s    in 0.009s  

2016-10-29 20:40:42 (1.92 MB/s) - ‘example_build_hcp.tar.gz’ saved [17419/17419]


$ tar -zxvf example_build_hcp.tar.gz

example_build_hcp/                                                                                                   
example_build_hcp/p1.fasta
example_build_hcp/p2.fasta
example_build_hcp/p3.fasta
example_build_hcp/p4.fasta
example_build_hcp/p5.fasta
example_build_hcp/p6.fasta
example_build_hcp/p7.fasta
example_build_hcp/p8.fasta
example_build_hcp/p9.fasta
example_build_hcp/p10.fasta

Check HCP:

$ cd  example_build_hcp/
$ cat p1.fasta 
>aeh
MARLTKRQKAIREKIDPAQQYPVAEALGLLRELPGAKFTESVEVAVNLGVDPRKSDQIVR
GSTVLPNGTGKTVRVAVFAQGDAAEAAKEAGADIVGMDDLAEQVKGGNLDFDVVVAAPDA
MGVVGRLGPILGPRGLMPNPKVGTVTPDVAGAVKNAKAGQVRYRTDKGGIIHCAIGKVDF
EVEALQQNLQALITDLQKLKPANSKGVYLKKVAVSTTMGPGLAVDLASLET
>adk
MAKLTKKQKAQQGKVDSTKLYPFAEAVALVKEAATAKFDESIDVAVQLGVDAKKSDQVVR
GAVVLPNGTGKTTRVAVFAQGAKAEEAKAAGADVVGMDDLAAQVKAGDMPFDVVIAAPDA
MRVVGTLGQILGPRGLMPNPKVGTVTPDVATAVKNAKAGQVQFRVDKAGIVHTTIGRRSF
ADDKLQGNLAALIEALNKAKPATSKGVYLRKVAVSSTMGVGVRVDTQSIAA
>acp
MAHVAKKYKAAAEKVDRTKRYKLDEAMSLVKQTATKKFDETVDASINLGVDPKHADQVVR
GAVVLPHGMGKTVRLAVFAKGDKAKEAQEAGADIVGAEDLAEKIQGGFMDFDKLIATPDM
MGVVGRLGKILGPRGLMPNPKVGTVTMDLARAVKEQKAGKVEFRVEKAGIVHVPFGKASF
DPDKLKANFSAIMEVIYKAKPQTAKGVYVKNVTLSTTMGPGIKVDLAELAAQHA
>acn
MSGDGSSYSAEEGIRELLQSAKAKFRESVDVAIKLSVADSKSGESVRGAVVLPKGLGREV
RVAVFAKGEHAKHASDAGADVVGDEELIEEIKKGRKLDVDWCIATPDFMPQISAIAKILG
PRGLMPNPKFGTVTLELAKMVGVIKSGQVKFKSDRYGIVHVKIGDVSFTPEDLLENFNAV
VVAVQNLKPATIKGSYVRGVFVNSTMGRSFRIAGIG
>adg
MPKHGKKYLEAKKQVDRTKLYDPYEALELVKRLASAKFDETVEVAVRLGVDPRHADQQVR
GAVVLPHGTGKTRRVLVFARGEKAKEAEAAGADYVGAEDLIARIQGGWLDFDVAIATPDM
MAMVGRIGRILGPRGLMPNPKTGTVTFDVAQAVAEAKAGRVEYRTDKAGIVHAPIGKVSF
EVEKLVENLKALVDALVRAKPPAAKGQYLRSITVSSTMGPGVKVNPAKLLAS
>acl
MKRGKKYLEAVKLYDKSVAYTGLEAVELAKKTSVAKFDATVEVAFRLNVDPRKADQNLRG
AISLPHGTGKTVRVVVIAKPEKAKEALAAGALEAGDVELIDKIGKGWFDFDVMVATPDMM
AQLGKLGRVLGPKGLMPNPKTGTVTLDVAKAVEEIKAGKIEYRTDKVGNIHAPIGKVSFD
SNKLHENMLAIYNQLVRIKPATVKGTYIKKIALSTTMGPGIMVEENNIKK
>ace
MKRGKKYRAAAQLVDRTKLYSPLEAMRLAKQTNTMRVPATVEVAMRLGVDPRKADQMVRG
TVNLPHGTGKTPRVLVFATAERAEEARAAGADYVGADELIEQVANGFLDFDAVVATPDLM
GKVGRLGRILGPRGLMPNPKTGTVTNDVAKAVADIKSGKIEFRVDRQANLHLVIGKTDFT
EQQLVENYAAALDEVLRLKPPTAKGRYLKKVTISTTMGPGIPVDPNRVRNLLAEETAAA
>aeq
MTKHGKKYVEAEKQIPAEPVSPLAAMKLLKEISVANFDETVTGDFRLGIDTRQADQQLRG
TVSLPNGSGKTVRVAVFAEGAAAQAAEEAGADIVGTDELMQQIQAGEFNFDAAVATPDQM
GKVGRLGKILGPRGLMPNPKLGTVTNDVAKAIKELKGGRVEYRADRYGIAHVVLGKVSFT
PEQLAENYGAVYDEILRMKPAAAKGKYVKSITVSGTMTPGVSVDSSVTRAYTESAE
>aca
MSKKVSKNVAKARAAVEPRPYTLQDAVPLLQQVKFAKFDETVDLTMRLGVDPRHADQMVR
GTVVLPHGLGKTKKVAVITTGDRQKEAEAAGAEIVGGEELVEKIQKESWTDFDALIATPD
MMRSVGRLGKVLGPRGLMPNPKTGTVTNDVAAAVKEIKAGKIEYRTDKTALVHVPVGKLS
FPAEKLIDNAMTVITSVVRAKPSAAKGKYIKGITLSSTMGPGIPLDGSVADAAAKA
>aco
MAKKSKRYSEIAAKVDSTKLYGLREAVDLYKEVATAKFDESLEVHIRLGVDPRHADQQVR
GTIVLPHGTGITKRVLVLAVGEKVKEAEDAGADIVGGDDLIQKISTGWLDFDAVIATPDM
MKSVGRLGKILGPRGLMPSAKAGTVTFDVADAIKEIKAGRVEFRVDKTAIIHNMVGKKSF
EAEKLFENLKVLYRAILKARPASAKGTYVRSFYIAPTMGVGIKIDPVAASKEVAEA

Reconstruct species tree and store outputs in the build_hcp_tree folder:

$ PhySpeTree build -i example_build_hcp -o build_hcp_tree --multiple
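
When the FASTA files are prepared by hand, a species can easily be left out of one of them. The sketch below reports such gaps. It is a hypothetical helper, not part of PhySpeTree, and it assumes that every species should appear in every file and that the FASTA header holds only the species abbreviation, as in p1.fasta above.

```python
from pathlib import Path


def fasta_ids(path):
    """Return the sequence IDs (text after '>') found in a FASTA file."""
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                ids.append(line[1:].strip())
    return ids


def missing_species(folder):
    """Map file name -> species absent from it but present in another file."""
    per_file = {
        path.name: set(fasta_ids(path))
        for path in sorted(Path(folder).glob("*.fasta"))
    }
    all_species = set()
    for ids in per_file.values():
        all_species |= ids
    return {
        name: sorted(all_species - ids)
        for name, ids in per_file.items()
        if all_species - ids
    }
```

Calling missing_species("example_build_hcp") returns an empty dict when all files cover the same species.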

Build species trees by manually prepared SSU rRNA

All SSU rRNA sequences are prepared in one FASTA format file, for example example_build_srna.fasta.

Download the example input file:

$ wget "https://yangfangs.github.io/physpetools/example/example_build_srna.fasta"

--2016-10-29 20:56:31--  https://yangfangs.github.io/physpetools/example/example_build_srna.fasta
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14982 (15K) [application/octet-stream]
Saving to: ‘example_build_srna.fasta’

example_build_srna.fasta     100%[==============================================>]  14.63K  --.-KB/s    in 0.005s  

2016-10-29 20:56:33 (3.14 MB/s) - ‘example_build_srna.fasta’ saved [14982/14982]

$ grep '>' example_build_srna.fasta 
>aca
>ace
>acl
>acn
>aco
>acp
>adg
>adk
>aeh
>aeq

Reconstruct species tree and store outputs in the build_srna_tree folder:

$ PhySpeTree build -i example_build_srna.fasta -o build_srna_tree --single --fasttree

combine

The combine module is used to combine trees generated by different methods. It takes two steps. First, merge the tree files into a single file, for example with the cat command on Linux:

$ cat tree1.tree tree2.tree > combineTree.tree

Then, use combine:

$ PhySpeTree combine -i combineTree.tree [options]*

combine options

option Description
-h Print help message and exit.
-i Input PHYLIP format file containing multiple trees.
-o Output directory. The default is "combineTree".
--mr Majority rule trees.
--mre Extended majority rule trees.
--strict Strict consensus trees.
--astral Use ASTRAL to combine multiple gene trees.
--supertree Use Spr_Supertree to combine trees with conflicting evolutionary histories caused by lateral gene transfer (LGT).
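
The difference between the --strict and --mr options can be illustrated with a small sketch. It is a simplified picture and not how the consensus is computed internally: trees are written as rooted nested tuples rather than parsed from files, all names are hypothetical, and the extended rule (--mre) is not covered.

```python
from collections import Counter


def clades(tree):
    """Return the non-trivial clades of a rooted nested-tuple tree.

    Each clade is a frozenset of leaf names. The clade holding all
    leaves is dropped because every tree contains it.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))
    return found


def consensus_clades(trees, rule="mr"):
    """Keep clades found in all trees ("strict") or in more than half ("mr")."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    total = len(trees)
    if rule == "strict":
        return {clade for clade, n in counts.items() if n == total}
    if rule == "mr":
        return {clade for clade, n in counts.items() if n > total / 2}
    raise ValueError("rule must be 'strict' or 'mr'")


trees = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), "c", "d"),
    (("a", "c"), ("b", "d")),
]
print(consensus_clades(trees, "mr"))
print(consensus_clades(trees, "strict"))
```

In this toy input the clade {a, b} appears in two of three trees, so the majority rule keeps it while the strict rule keeps nothing.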

Example

example_combine_tree.tar.gz contains tree1.tree and tree2.tree, reconstructed by the HCP and SSU rRNA methods, respectively.

Download and decompress the example input file:


$ wget "https://yangfangs.github.io/physpetools/example/example_combine_tree.tar.gz"

--2016-10-30 13:32:06--  https://yangfangs.github.io/physpetools/example/example_combine_tree.tar.gz
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 661 [application/octet-stream]
Saving to: ‘example_combine_tree.tar.gz’

example_combine_tree.tar.gz  100%[==============================================>]     661  --.-KB/s    in 0s      

2016-10-30 13:32:07 (380 MB/s) - ‘example_combine_tree.tar.gz’ saved [661/661] 


$ tar -zxvf example_combine_tree.tar.gz 
example_combine_tree/
example_combine_tree/tree2.tree
example_combine_tree/tree1.tree

Merge tree1.tree and tree2.tree:

$ cd example_combine_tree/
$ cat tree1.tree tree2.tree > combine.tree
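
On systems without cat, the merge can be done with a few lines of Python. This is a sketch under two assumptions: the tree files are in Newick format, where each tree ends with a semicolon, and labels contain no whitespace or semicolons. The function name is hypothetical.

```python
from pathlib import Path


def merge_tree_files(inputs, output):
    """Write all trees from the input files to output, one tree per line.

    Returns the number of trees written.
    """
    trees = []
    for path in inputs:
        text = Path(path).read_text(encoding="utf-8")
        # Each Newick tree ends with ';', so split on it.
        for chunk in text.split(";"):
            tree = "".join(chunk.split())  # drop line breaks and spaces
            if tree:
                trees.append(tree + ";")
    Path(output).write_text("\n".join(trees) + "\n", encoding="utf-8")
    return len(trees)
```

For the example, merge_tree_files(["tree1.tree", "tree2.tree"], "combine.tree") produces the same kind of file as the cat command, and the returned count is a quick check that both trees were read.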

Combine trees:

$ PhySpeTree combine -i combine.tree -o combineTree

Outputs:

combineTree/
        RAxML_info.T1
        RAxML_MajorityRuleConsensusTree.T1
  • RAxML_info.T1: RAxML run log.
  • RAxML_MajorityRuleConsensusTree.T1: the majority rule consensus tree.

Using --astral option

Notice: The --astral option calls the third-party software ASTRAL, which requires a Java Runtime Environment (JRE). Make sure a JRE is installed in your environment. Users running the PhySpeTree Docker image do not need to install one.

$ PhySpeTree combine -i combine.tree -o combineTree --astral

Outputs:

combineTree/
           combine.tree

Using --supertree option

  • Use Spr_Supertree to combine trees with conflicting evolutionary histories caused by lateral gene transfer (LGT).

Download example tree:

$ wget "https://yangfangs.github.io/physpetools/example/trees.tree"
--2019-10-02 15:49:42--  https://yangfangs.github.io/physpetools/example/trees.tree
Resolving yangfangs.github.io (yangfangs.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to yangfangs.github.io (yangfangs.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 140 [application/octet-stream]
Saving to: ‘trees.tree’

trees.tree                100%[=====================================>]     140  --.-KB/s    in 0s

2019-10-02 15:49:44 (168 MB/s) - ‘trees.tree’ saved [140/140]

Run combine with the --supertree option:

$ PhySpeTree combine -i trees.tree -o Supertree --supertree

Outputs:

Supertree/
          spr_supertree.tree

iview

PhySpeTree provides the iview module to annotate output trees with taxonomic information (kingdom, phylum, class, or order) and to generate configuration files for iTOL.

$ PhySpeTree iview -i organism_example_list.txt --range

iview options

option Description
-h Print help message and exit.
-i Input a TXT file containing abbreviated species names.
-o A directory to store outputs. The default is "iview".
-a Taxonomic level used for coloring: kingdom, phylum, class, or order.
-r/--range Annotate labels with colored ranges by kingdom, phylum, class, or order. The default is phylum.
-c/--color Annotate labels with colors but without ranges, by kingdom, phylum, class, or order. The default is phylum.
-l/--labels Change species labels from abbreviated names to full names.

Example

Download the example file:

$ wget "https://yangfangs.github.io/physpetools/example/organism_example_list.txt"

--2016-10-30 13:40:48--  https://yangfangs.github.io/physpetools/example/organism_example_list.txt
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39 [text/plain]
Saving to: ‘organism_example_list.txt’

organism_example_list.txt    100%[==============================================>]      39  --.-KB/s    in 0s      

2016-10-30 13:40:50 (21.5 MB/s) - ‘organism_example_list.txt’ saved [39/39]

Annotate the tree by kingdom

$ PhySpeTree iview -i organism_example_list.txt --range -a kingdom
Color range by kingdom was complete.

The color range file is stored in the iview folder:

$ cd iview
$ cat range_color_by_kingdom.txt 
TREE_COLORS
SEPARATOR TAB
DATA
aca     range   #BEBF5A Prokaryotes
ace     range   #BEBF5A Prokaryotes
acl     range   #BEBF5A Prokaryotes
acn     range   #BEBF5A Prokaryotes
aco     range   #BEBF5A Prokaryotes
acp     range   #BEBF5A Prokaryotes
adg     range   #BEBF5A Prokaryotes
adk     range   #BEBF5A Prokaryotes
aeh     range   #BEBF5A Prokaryotes
aeq     range   #BEBF5A Prokaryotes

Annotate the tree by phylum

$ PhySpeTree iview -i organism_example_list.txt --range -a phylum
Color range by phylum was complete.

The color range file is stored in the iview folder:

$ cd iview
$ cat range_color_by_phylum.txt 
TREE_COLORS
SEPARATOR TAB
DATA
aca     range   #865142 Bacteria
ace     range   #865142 Bacteria
acl     range   #865142 Bacteria
acn     range   #865142 Bacteria
aco     range   #865142 Bacteria
acp     range   #865142 Bacteria
adg     range   #865142 Bacteria
adk     range   #865142 Bacteria
aeh     range   #865142 Bacteria
aeq     range   #865142 Bacteria

Annotate the tree by class

$ PhySpeTree iview -i organism_example_list.txt --range -a class
Color range by class was complete.

The color range file is stored in the iview folder:

$ cd iview
$ cat range_color_by_class.txt 
TREE_COLORS
SEPARATOR TAB
DATA
aca     range   #9AB7F3 Acidobacteria
ace     range   #99D1DB Actinobacteria
acl     range   #A5E58D Tenericutes
acn     range   #94F1C1 Alphaproteobacteria
aco     range   #D67A21 Synergistetes
acp     range   #DD9284 Deltaproteobacteria
adg     range   #3E70B8 Firmicutes - Clostridia
adk     range   #DDC8B7 Betaproteobacteria
aeh     range   #72E137 Gammaproteobacteria - Others
aeq     range   #99D1DB Actinobacteria

Annotate the tree by order

$ PhySpeTree iview -i organism_example_list.txt --range -a order
Color range by order was complete.

The color range file is stored in the iview folder:

$ cd iview
$ cat range_color_by_order.txt 
TREE_COLORS
SEPARATOR TAB
DATA
aca     range   #AA8761 Acidobacterium
ace     range   #8770BC Acidothermus
acl     range   #3BD26B Acholeplasma
acn     range   #D1B487 Anaplasma
aco     range   #D96D21 Aminobacterium
acp     range   #AC4E16 Anaeromyxobacter
adg     range   #287AD8 Ammonifex
adk     range   #C8184E Alicycliphilus
aeh     range   #57A569 Alkalilimnicola
aeq     range   #F1A2B7 Adlercreutzia
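
The color range files are plain text, so they can be inspected or adjusted with a short script before they are used in iTOL. The sketch below reads the DATA section into a dictionary. The function names are hypothetical, and the parser assumes that fields are separated by tab characters, as the SEPARATOR TAB line indicates.

```python
def parse_tree_colors(text):
    """Parse the DATA section into {species: (type, color, label)}."""
    entries = {}
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "DATA":
            in_data = True
            continue
        if in_data:
            # Four tab-separated fields: species, type, color, label.
            species, kind, color, label = (
                field.strip() for field in line.split("\t", 3)
            )
            entries[species] = (kind, color, label)
    return entries


def species_by_label(entries):
    """Group species names by their annotation label."""
    groups = {}
    for species, (_kind, _color, label) in entries.items():
        groups.setdefault(label, []).append(species)
    return groups
```

Reading range_color_by_class.txt this way and passing the result to species_by_label shows at a glance which species share a class.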

check

The check module is used to check whether input organisms are in pre-built databases.

$ PhySpeTree check -i organism_example_list.txt --ehcp

check options

option Description
-h Print help message and exit.
-i Input a TXT file containing abbreviated species names.
-o A directory to store outputs. The default is "check".
--hcp Check whether organisms are supported in the KEGG database.
--ehcp List the HCP classes that must be prepared to extend a tree with the autobuild --ehcp option.
--srna Check whether organisms are supported in the SILVA database.

Example

Check extended organisms in autobuild

Determine which proteins must be prepared for the autobuild module with the --ehcp option, for example using organism_example_list.txt.

Download the example file:

$ wget "https://yangfangs.github.io/physpetools/example/organism_example_list.txt"

--2016-10-30 13:40:48--  https://yangfangs.github.io/physpetools/example/organism_example_list.txt
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39 [text/plain]
Saving to: ‘organism_example_list.txt’

organism_example_list.txt    100%[==============================================>]      39  --.-KB/s    in 0s      

2016-10-30 13:40:50 (21.5 MB/s) - ‘organism_example_list.txt’ saved [39/39]

Check:

$ PhySpeTree check -i organism_example_list.txt --ehcp

'Ribosomal protein L1' ----------------------------------> p1.fasta

'DNA-directed RNA polymerase subunit alpha' ----------------------------------> p2.fasta

'Leucyl-tRNA synthetase' ----------------------------------> p3.fasta

'Metal-dependent proteases with chaperone activity' ----------------------------------> p4.fasta

'Phenylalanine-tRNA synthethase alpha subunit' ----------------------------------> p5.fasta

'Predicted GTPase probable translation factor' ----------------------------------> p6.fasta

'Ribosomal protein L11' ----------------------------------> p7.fasta

'Ribosomal protein L13' ----------------------------------> p8.fasta

'Ribosomal protein L14' ----------------------------------> p9.fasta

'Ribosomal protein L22' ----------------------------------> p10.fasta

'Ribosomal protein L3' ----------------------------------> p11.fasta

'Ribosomal protein L5' ----------------------------------> p12.fasta

'Ribosomal protein S11' ----------------------------------> p13.fasta

'Ribosomal protein S17' ----------------------------------> p14.fasta

'Ribosomal protein S2' ----------------------------------> p15.fasta

'Ribosomal protein S3' ----------------------------------> p16.fasta

'Ribosomal protein S4' ----------------------------------> p17.fasta

'Ribosomal protein S5' ----------------------------------> p18.fasta

'Ribosomal protein S7' ----------------------------------> p19.fasta

'Ribosomal protein S8' ----------------------------------> p20.fasta

'Ribosomal protein S9' ----------------------------------> p21.fasta

'Seryl-tRNA synthetase' ----------------------------------> p22.fasta

'Arginyl-tRNA synthetase' ----------------------------------> p23.fasta

'DNA-directed RNA polymerase beta subunit' ----------------------------------> p24.fasta

'Ribosomal protein S13' ----------------------------------> p25.fasta

Check extend highly conserved protein is completed.

The check result is stored in the check folder. The file physpe_echp_extend.txt lists each class of HCP and its corresponding file name, which are used to prepare extended HCP sequences.

$ cd check
$ cat physpe_echp_extend.txt 
'Ribosomal protein L1' ----------------------------------> p1.fasta
'DNA-directed RNA polymerase subunit alpha' ----------------------------------> p2.fasta
'Leucyl-tRNA synthetase' ----------------------------------> p3.fasta
'Metal-dependent proteases with chaperone activity' ----------------------------------> p4.fasta
'Phenylalanine-tRNA synthethase alpha subunit' ----------------------------------> p5.fasta
'Predicted GTPase probable translation factor' ----------------------------------> p6.fasta
'Ribosomal protein L11' ----------------------------------> p7.fasta
'Ribosomal protein L13' ----------------------------------> p8.fasta
'Ribosomal protein L14' ----------------------------------> p9.fasta
'Ribosomal protein L22' ----------------------------------> p10.fasta
'Ribosomal protein L3' ----------------------------------> p11.fasta
'Ribosomal protein L5' ----------------------------------> p12.fasta
'Ribosomal protein S11' ----------------------------------> p13.fasta
'Ribosomal protein S17' ----------------------------------> p14.fasta
'Ribosomal protein S2' ----------------------------------> p15.fasta
'Ribosomal protein S3' ----------------------------------> p16.fasta
'Ribosomal protein S4' ----------------------------------> p17.fasta
'Ribosomal protein S5' ----------------------------------> p18.fasta
'Ribosomal protein S7' ----------------------------------> p19.fasta
'Ribosomal protein S8' ----------------------------------> p20.fasta
'Ribosomal protein S9' ----------------------------------> p21.fasta
'Seryl-tRNA synthetase' ----------------------------------> p22.fasta
'Arginyl-tRNA synthetase' ----------------------------------> p23.fasta
'DNA-directed RNA polymerase beta subunit' ----------------------------------> p24.fasta
'Ribosomal protein S13' ----------------------------------> p25.fasta
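
When the extended sequences are prepared by script, the list can be turned into a lookup table. The sketch below parses lines of the form shown above. It is a hypothetical helper and assumes that every line keeps the quoted name, the dashed arrow, and the file name.

```python
import re

# One line looks like: 'Ribosomal protein L1' ------> p1.fasta
LINE_PATTERN = re.compile(r"^'(?P<name>.+)'\s*-+>\s*(?P<file>\S+)$")


def parse_extend_list(text):
    """Return {file name: protein name} for every matching line."""
    mapping = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            mapping[match.group("file")] = match.group("name")
    return mapping
```

With the file above, the entry for p3.fasta is Leucyl-tRNA synthetase, so sequences of that protein from the additional organisms belong in p3.fasta.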

Check whether input organisms are supported in PhySpeTree

Check whether input species are supported by the KEGG database when using the --hcp method, for example with 191speciesnames.txt.

Download the example file:

$ wget "https://yangfangs.github.io/physpetools/example/191speciesnames.txt"

--2016-10-30 14:48:21--  https://yangfangs.github.io/physpetools/example/191speciesnames.txt
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 773 [text/plain]
Saving to: ‘191speciesnames.txt’

191speciesnames.txt          100%[==============================================>]     773  --.-KB/s    in 0s      

2016-10-30 14:48:22 (322 MB/s) - ‘191speciesnames.txt’ saved [773/773]

The check results show one organism named 'ges' is not supported in PhySpeTree:

$ PhySpeTree check -i 191speciesnames.txt --hcp
WARNING: The following species are not supported by KEGG DATABASE:
ges
Checked  whether the input species names in KEGG DATABASE completed.

Check whether input species are supported by the SILVA database when using the --srna method, for example with 191speciesnames.txt.

Download the example file:

$ wget "https://yangfangs.github.io/physpetools/example/191speciesnames.txt"

--2016-10-30 14:48:21--  https://yangfangs.github.io/physpetools/example/191speciesnames.txt
Resolving yangfangs.github.io (yangfangs.github.io)... 151.101.48.133
Connecting to yangfangs.github.io (yangfangs.github.io)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 773 [text/plain]
Saving to: ‘191speciesnames.txt’

191speciesnames.txt          100%[==============================================>]     773  --.-KB/s    in 0s      

2016-10-30 14:48:22 (322 MB/s) - ‘191speciesnames.txt’ saved [773/773]

The check results show 28 organisms are not supported in PhySpeTree:


$ PhySpeTree check -i 191speciesnames.txt --srna
WARNING: The following species are not supported by SILVA DATABASE:
neq
ape
tac
mmp
gla
tps
cho
ddi
spo
aga
tru
mpu
lin
ban
bce
ljo
san
spg
ges
lis
sco
cdi
mle
wsu
rpr
bpe
bpa
ppr
Checked  whether the input species names in SILVA DATABASE completed.
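
One way to proceed after such a warning is to build the tree from the supported species only. The sketch below extracts the unsupported names from the check output and filters a species list. It is a hypothetical helper and relies on the output layout shown above: a WARNING line, one name per line, then a line starting with Checked.

```python
def unsupported_from_check_output(output):
    """Return the names listed between the WARNING and Checked lines."""
    names = []
    collecting = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("WARNING:"):
            collecting = True
        elif line.startswith("Checked"):
            collecting = False
        elif collecting and line:
            names.append(line)
    return names


def drop_unsupported(species, unsupported):
    """Return the species list without the unsupported names, order kept."""
    blocked = set(unsupported)
    return [name for name in species if name not in blocked]
```

The filtered list can be written to a new TXT file, one name per line, and passed to autobuild with -i.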

For organisms not in the pre-built databases, PhySpeTree provides the extend options (--ehcp or --esrna) to insert manually prepared sequences.