What is UBCG2?


UBCG2 stands for Up-to-date Bacterial Core Genes. It is a pipeline that infers phylogenetic relationships using core genes that we have defined from up-to-date genome databases.

The UBCG2 pipeline provides the following functions:

  • Extraction of pre-defined core genes from genome assemblies
  • Multiple alignment and concatenation of the genes
  • Phylogenetic analysis using core gene profiles
  • Calculation and embedding of the gene support index (GSI)
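
The concatenation step listed above can be sketched as follows. This is only an illustration, not the pipeline's own code: the function name concatenate_alignments, the dictionary-based input, and the choice to pad a genome that lacks a gene with gap characters are all assumptions made for the example.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments into one concatenated alignment per genome.

    Each element of gene_alignments maps a genome name to its aligned
    sequence for one gene. A genome missing from a gene alignment is padded
    with gap characters (an assumption of this sketch).
    """
    genomes = sorted({name for aln in gene_alignments for name in aln})
    parts: Dict[str, List[str]] = {name: [] for name in genomes}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for name in genomes:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data: two already-aligned genes across three hypothetical genomes.
gene1 = {"A": "ATG-", "B": "ATGC"}
gene2 = {"A": "GG", "C": "GA"}
concatenated = concatenate_alignments([gene1, gene2])
```

Every genome ends up with a sequence of the same total length, which is what a tree-building program expects from a concatenated alignment.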

Genes used in the pipeline


The UBCG2 pipeline uses core genes, the most widely used basis for genome-based phylogenetic tree reconstruction. Core genes can be defined as:

  • Genes that are present in a majority of species
  • Genes that are present as a single copy (likely orthologous but not paralogous)
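
The two criteria above can be expressed as a simple filter over a table of gene copy numbers. The sketch below is illustrative only: the function name select_core_genes, the input layout, and the min_fraction threshold are assumptions, and the thresholds actually used to compile the UBCG2 gene set are not stated here.

```python
from typing import Dict, List


def select_core_genes(
    copy_counts: Dict[str, Dict[str, int]],
    genomes: List[str],
    min_fraction: float = 0.9,  # assumed threshold, for illustration only
) -> List[str]:
    """Return genes found as exactly one copy in at least min_fraction of genomes.

    copy_counts maps a gene name to {genome name: copy number}; a genome that
    is absent from the inner dictionary is treated as having zero copies.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    core = []
    for gene, counts in copy_counts.items():
        single_copy = sum(1 for genome in genomes if counts.get(genome, 0) == 1)
        if single_copy / len(genomes) >= min_fraction:
            core.append(gene)
    return sorted(core)


# Toy data with hypothetical gene and genome names.
genomes = ["g1", "g2", "g3", "g4"]
copy_counts = {
    "geneA": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # single copy everywhere
    "geneB": {"g1": 1, "g2": 2, "g3": 1, "g4": 1},  # duplicated in g2
    "geneC": {"g1": 1, "g4": 1},                    # missing from g2 and g3
}
strict = select_core_genes(copy_counts, genomes, min_fraction=0.9)
relaxed = select_core_genes(copy_counts, genomes, min_fraction=0.75)
```

With the strict threshold only geneA qualifies; relaxing it to 0.75 also admits geneB, which shows how the "majority of species" cutoff trades gene count against coverage.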

We have compiled the core gene set (UBCG2) using the latest versions of high-quality genome databases.

Concept of the pipeline


We designed the pipeline for users who wish to analyze hundreds of whole-genome assemblies. The concept behind our design is summarized here to help you understand the pipeline and get the most out of it.

  • The UBCG2 pipeline extracts domain-specific core genes using the corresponding HMM profiles.
  • The UBCG profile extracted from each genome is stored in a single .ucg-formatted file. See below for more information about this file format.
  • The UBCGtree pipeline carries out phylogenetic analysis using a set of .ucg files from different species. The UBCG2 pipeline automatically aligns, concatenates, and filters the genes, calculates GSIs, and builds a phylogenetic tree of the species.
  • If you want to run the pipeline with another set of .ucg profiles, gather them in another directory and relaunch the UBCGtree pipeline.
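
To make the GSI calculation mentioned above more concrete, the sketch below counts, for each branch of a concatenated tree, how many individual gene trees contain the same branch. It is a simplified illustration and not the pipeline's implementation: trees are written as nested Python tuples instead of Newick, every tree is assumed to contain the same set of leaves, and the names leaf_set, splits, and gene_support_index are hypothetical.

```python
def leaf_set(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions (branches) of a tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same branch always gets the same key.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - members if reference in members else members
        # Keep only branches with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return members

    visit(tree)
    return found


def gene_support_index(concatenated_tree, gene_trees):
    """Map each branch of the concatenated tree to the number of gene trees sharing it."""
    gene_splits = [splits(tree) for tree in gene_trees]
    return {
        branch: sum(branch in one_gene for one_gene in gene_splits)
        for branch in splits(concatenated_tree)
    }


# Toy example with five hypothetical species and three gene trees.
concatenated_tree = ((("A", "B"), "C"), ("D", "E"))
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
]
gsi = gene_support_index(concatenated_tree, gene_trees)
```

In this toy case the branch separating D and E from the rest is found in all three gene trees, while the branch separating A and B from the rest is found in only one, so the second branch is supported by far fewer genes.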

File formats used in UBCG2 pipeline

Format            Input to   Output of   Description
.fa .fna .fasta   UBCG2      -           Standard FASTA file format for genome sequences. The UBCG2 pipeline converts these into .ucg files.
.ucg              UBCGtree   UBCG2       JSON-formatted profile containing the extracted core gene sequences, along with the metadata of the genome. These files can also be read and edited with any text editor.
.nwk              -          UBCGtree    Standard Newick format file for phylogenetic trees.
.trm              -          UBCGtree    JSON-formatted file containing Newick-formatted trees and the metadata of the individual gene trees and the concatenated UBCG tree.
.log              -          UBCGtree    Plain-text log file containing detailed information about the UBCGtree pipeline run.
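
Because .ucg files are JSON, they can be inspected with any JSON library. The sketch below deliberately assumes nothing about the key names inside a .ucg file, since the schema is not described here; it only reports the top-level keys and the type of each value. The function names summarize_json_profile and summarize_directory are hypothetical.

```python
import json
from pathlib import Path


def summarize_json_profile(path):
    """Return {top-level key: value type name} for one JSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # The file is valid JSON but its root is not an object.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


def summarize_directory(directory):
    """Summarize every *.ucg file found directly inside a directory."""
    return {
        ucg_path.name: summarize_json_profile(ucg_path)
        for ucg_path in sorted(Path(directory).glob("*.ucg"))
    }
```

Running summarize_directory on a folder of profiles is a quick way to confirm that every file parses as JSON before the folder is handed to the UBCGtree pipeline.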

UBCG TEAM

UBCG2

CA

Jihyeon Kim (1,2)
Seongin Na (1)
Dongwook Kim (1,2)
Jon Jongsik Chun (1,2,3)

(1) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 00826, Republic of Korea

(2) Institute of Molecular Biology and Genetics, Seoul National University, Seoul 00826, Republic of Korea

(3) School of Biological Sciences, Seoul National University, Seoul 00826, Republic of Korea