1. What is the input of PhySpeTree?

  • Users only need to prepare a TXT file containing KEGG abbreviated species names. For example, organism_example_list.

  • Now, PhySpeTree not only supported the user input three- or four-letter organism codes (KEGG database organism codes) but also the NCBI taxonomy id (for example human taxonomy id is 9606). You can search taxonomy id form NCBI taxonomy database.

2. How to explain PhySpeTree outputs?

PhySpeTree returns two folders, Outdata contains the output species tree and temp includes temporary data. Files in temp can be used to check the quality of outputs in each step. If HCP method (--hcp) is selected, the temp folder includes:

  • conserved_protein: highly conserved proteins retrieved from the KEGG database.
  • alignment: aligned sequences.
  • concatenate: concatenated sequences and conserved blocks.

If SSU rRNA method (--srna) is selected, the temp folder includes:

  • rna_sequence: SSU rRNA sequences retrieved from the SILVA database.
  • rna_alignment: aligned sequences and conserved blocks.

3. What classes of HCP are selected?

PhySpeTree uses 31 HCP without horizontal transferred genes according to Ciccarelli et al..

cite:

Ciccarelli FD, Doerks T, Von Mering C, et al. Toward automatic reconstruction of a highly resolved tree of life[J]. science, 2006, 311(5765): 1283-1287.

The 31 HCP and corresponding KEGG KO number are shown in the following table:

Protein Names Eukaryotes KO Prokaryotes KO
DNA-directed RNA polymerase subunit alpha K03040 K03040
Ribosomal protein L1 K02865 K02863
Leucyl-tRNA synthetase K01869 K01869
Metal-dependent proteases with chaperone activity K01409 K01409
Phenylalanine-tRNA synthethase alpha subunit K01889 K01889
Predicted GTPase probable translation factor K06942 K06942
Preprotein translocase subunit SecY K10956 K10956
Ribosomal protein L11 K02868 K02867
Ribosomal protein L13 K02873 K02871
Ribosomal protein L14 K02875 K02874
Ribosomal protein L15 K02877 K17437
Ribosomal protein L16/L10E K02866 K02872
Ribosomal protein L18 K02883 K02882
Ribosomal protein L22 K02891 K02890
Ribosomal protein L3 K02925 K02906
Ribosomal protein L5 K02932 K02931
Ribosomal protein L6P/L9E K02940 K02939
Ribosomal protein S11 K02949 K02948
Ribosomal protein S15P/S13E K02958 K02956
Ribosomal protein S17 K02962 K02961
Ribosomal protein S2 K02981 K02967
Ribosomal protein S3 K02985 K02982
Ribosomal protein S4 K02987 K02986
Ribosomal protein S5 K02989 K02988
Ribosomal protein S7 K02993 K02992
Ribosomal protein S8 K02995 K02994
Ribosomal protein S9 K02997 K02996
Seryl-tRNA synthetase K01875 K01875
Arginyl-tRNA synthetase K01887 K01887
DNA-directed RNA polymerase beta subunit K03043 K03043
Ribosomal protein S13 K02953 K02952

4. How are SSU rRAN created?

The SSU rRAN sequences are created from the SILVA database (Release 132, Released: 13.12.2017). Sequences haven been truncated, which means unaligned nucleotides are removed.

5. How do I use PhySpeTree when I can't connect to the Internet?

When users can't connect to the Internet. They can download the HCP or SSU rRNA database to local and reconstruct species tree.

  • SSU rRNA database: database16s.tar.gz
  • HCP database: databasehcp.tar.gz

    If you can not clink the hyperlink to obtain SSU rRNA and HCP databases, you can download from ftp://23.105.208.65 by FTP tools.

Use $ tar -zxvf database16s.tar.gz decompress the download database.

Use -db option setting the absolute path to decompression directory.