Why is ggtree special?
Innovations within ggtree include:
- parsing data from several evolutionary analysis software packages
  - not only for visualization in ggtree, but also to bring these data to R users for further analysis (e.g. summarization, visualization)
- viewing and annotating phylogenetic trees programmatically in R
  - other R packages that can view phylogenetic trees only contain plot functions for special cases, including those implemented with ggplot2
- supporting the grammar of graphics implemented in ggplot2
  - only ggtree supports the grammar of graphics for phylogenetic tree annotation
  - unlike other packages, which define functions for specific cases with pre-defined styles, the functions implemented in ggtree are only building blocks that help users create their own tree views
  - users face no restrictions when annotating a tree, even with their own data
- two-dimensional trees
  - re-scale the y-axis to view changes along branches
  - to my knowledge, not implemented elsewhere

See user comments.
It's different from other tree viewers, which all limit a user to pre-defined specific cases of tree views. ggtree doesn't define how annotation should be presented. Users have no restrictions on presenting data in their favorite way, and complex tree views can be achieved via multiple layers of annotation.
The ggtree grammar extends ggplot2, which is widely used in biomedicine and ecology. Many researchers in these fields are already familiar with the grammar of graphics.
There are several packages that implement tree viewers using ggplot2, including ggphylo, OutbreakTools and phyloseq.
Using ggplot2 doesn't guarantee that the grammar of graphics is supported. Among these packages, only ggtree fully supports the grammar of graphics, while the others only implement a limited tree viewer designed for a specific need.
ggphylo is designed for viewing phylogenetic trees with alignments.
It has not been updated since 2012, and the alignment part is not yet implemented.
PS: Viewing a phylogenetic tree with an alignment is supported in ggtree.
The ggphylo function is complex, and how a tree is viewed is pre-defined, with parameters to control its behavior.
As shown in the screenshot, it creates several data.frame objects and the tree is drawn by q <- ggplot(lines.df). ggphylo parses a tree as a collection of lines, which carry no meaning of their own (the information is related only to taxa).
OutbreakTools is designed for disease outbreak analysis, and viewing phylogenetic trees is not its major focus.
The tree view function plotggphy is only applicable to the obkData class defined within this package. It can't be used to view a phylogenetic tree parsed from a newick file directly.
As shown in the screenshot, it has a design similar to that of ggphylo: it creates several data.frame objects and draws the tree via p <- ggplot(data=df.edge). It also parses a tree as a collection of lines.
phyloseq is designed for viewing microbiome census data.
The tree viewer defined in phyloseq only applies to the phyloseq class. It also can't be used to view a tree parsed from a newick file directly.
Internally, it calls ape to calculate edge positions.
It draws horizontal lines followed by vertical lines.
- designed for specific needs
  - ggphylo for alignment (not implemented yet)
  - OutbreakTools for outbreak data
  - phyloseq for microbiome census data
- not applicable to widely used tree file formats
  - plotggphy in OutbreakTools assumes the input is an instance of obkData
  - plot_tree in phyloseq assumes the input is an instance of phyloseq
- not extensible
  - the tree is drawn as lines, but the information is related to taxa (nodes & tips)
  - tree data (lines, nodes, labels) are separated into different data.frame/data.table objects, making it impossible for users to further modify the tree
Using ggplot2 doesn't guarantee that the grammar of graphics is supported. Among these packages, only ggtree supports the grammar of graphics, while the others only implement a tree viewer for a specific need.
As I mentioned at the beginning, only ggtree supports the grammar of graphics.
In ggphylo:
lines.df <- subset(layout.df, type=='line')
nodes.df <- subset(layout.df, type=='node')
labels.df <- subset(layout.df, type=='label')
internal.labels.df <- subset(layout.df, type=='internal.label')
q <- ggplot(lines.df)
geom.fn <- switch(aes.type,
  line='geom_joinedsegment',
  node='geom_point',
  label='geom_text',
  internal.label='geom_text'
)
q <- q + do.call(geom.fn, geom.list)
In OutbreakTools:
ggphy <- phylo2ggphy(phylo, tip.dates = tip.dates, branch.unit = branch.unit)
##TODO: allow edge and node attributes and merge with df.edge and df.node
df.tip <- ggphy[[1]]
df.node <- ggphy[[2]]
df.edge <- ggphy[[3]]
p <- ggplot(data = df.edge)
p <- p + geom_segment(data = df.edge, aes(x = x.beg, xend = x.end, y = y.beg, yend = y.end), lineend = "round")
p <- p + scale_y_continuous("", breaks = NULL)
if (show.tip.label) {
  p <- p + geom_text(data = df.tip, aes(x = x, y = y, label = label), hjust = 0, size = tip.label.size)
}
In phyloseq:
treeSegs <- tree_layout(phy_tree(physeq), ladderize=ladderize)
edgeMap = aes(x=xleft, xend=xright, y=y, yend=y)
vertMap = aes(x=x, xend=x, y=vmin, yend=vmax)
# Initialize phylogenetic tree.
# Naked, lines-only, unannotated tree as first layers. Edge (horiz) first, then vertical.
p = ggplot(data=treeSegs$edgeDT) + geom_segment(edgeMap) +
  geom_segment(vertMap, data=treeSegs$vertDT)
if(!is.null(label.tips)){
  # `tiplabDT` has only one row per tip, the farthest horizontal
  # adjusted position (one for each taxa)
  tiplabDT = dodgeDT
  tiplabDT[, xfartiplab:=max(xdodge), by=OTU]
  tiplabDT <- tiplabDT[h.adj.index==1, .SD, by=OTU]
  if(!is.null(color)){
    if(color %in% sample_variables(physeq, errorIfNULL=FALSE)){
      color <- NULL
    }
  }
  labelMap <- NULL
  if(justify=="jagged"){
    labelMap <- aes_string(x="xfartiplab", y="y", label=label.tips, color=color)
  } else {
    labelMap <- aes_string(x="max(xfartiplab, na.rm=TRUE)", y="y", label=label.tips, color=color)
  }
  # Add labels layer to plotting object.
  p <- p + geom_text(labelMap, tiplabDT, size=I(text.size), hjust=-0.1, na.rm=TRUE)
}
These tree view functions are just ordinary plot functions. They do use ggplot2, so we can, for example, use theme to change the background or the scale_X functions to change the x and y axes, and we can add a nonsense layer above the tree, just as we can produce a grammatically correct sentence that is nonsense. But this is not the philosophy of the grammar of graphics. We want to add layers that are related to the taxa in the tree, which is mostly impossible with these implementations.
The tree view can only be controlled via pre-defined parameters. As the code above shows, if we create a tree without labels, we can't even add a layer of tip labels afterwards, since that information is created within the function and is not available to us (we only have the positions of the lines after the tree is drawn).
For example, in OutbreakTools:
if (show.tip.label) {
  p <- p + geom_text(data = df.tip, aes(x = x, y = y, label = label), hjust = 0, size = tip.label.size)
}
If show.tip.label = FALSE, df.tip is thrown away when p is returned, and it is then impossible to add tip labels. The only way is to pass show.tip.label = TRUE at the very beginning when calling plotggphy. The implementations in ggphylo and phyloseq are similar. Users have no way to add related information if it is not pre-defined in those functions.
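The pattern described here can be reproduced in a few lines. The following is a minimal sketch using only ggplot2; the function plot_lines and the two small data frames are hypothetical stand-ins, not code from OutbreakTools, ggphylo or phyloseq.

```r
library(ggplot2)

# Hypothetical stand-ins for the layout tables built inside a plot function.
df.edge <- data.frame(x.beg = c(0, 0), x.end = c(1, 1),
                      y.beg = c(1, 2), y.end = c(1, 2))
df.tip  <- data.frame(x = c(1, 1), y = c(1, 2), label = c("A", "B"))

# Hypothetical plot function following the pattern above: df.tip only
# reaches the plot when the flag is set at call time.
plot_lines <- function(df.edge, df.tip, show.tip.label = FALSE) {
  p <- ggplot(df.edge) +
    geom_segment(aes(x = x.beg, xend = x.end, y = y.beg, yend = y.end))
  if (show.tip.label) {
    p <- p + geom_text(data = df.tip,
                       aes(x = x, y = y, label = label), hjust = 0)
  }
  p
}

p <- plot_lines(df.edge, df.tip, show.tip.label = FALSE)

# The returned plot only knows about the segments; no label column survives.
names(p$data)
```

A caller holding only p has no label column to map in a later geom_text() layer, so the labels can only be obtained by calling the function again with the flag switched on.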
All these implementations parse a tree as a collection of lines. If we want to annotate taxa with related information, we have to calculate node positions from the positions of the lines. It's even harder if lines.df doesn't contain information that maps each line to a node. Most users don't have that expertise, so it's almost impossible for them to add a new layer of related information.
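To make the asymmetry concrete, here is a small base R sketch. The node table, its column names and the five-node toy tree are all hypothetical and are not taken from any of the packages discussed; the point is only that segments can be derived from a per-node table, while the reverse requires guesswork.

```r
# Hypothetical taxa-centred layout: one row per node, with its parent.
# Node 5 is the root (its own parent); nodes 1-3 are tips.
nodes <- data.frame(
  node   = 1:5,
  parent = c(4, 4, 5, 5, 5),
  x      = c(2, 2, 1, 1, 0),
  y      = c(1, 2, 3, 1.5, 2.25),
  label  = c("A", "B", "C", NA, NA),
  stringsAsFactors = FALSE
)

# Lines follow directly from the node table: join each node to its parent.
idx  <- match(nodes$parent, nodes$node)
segs <- data.frame(x = nodes$x[idx], xend = nodes$x,
                   y = nodes$y[idx], yend = nodes$y)
segs <- segs[nodes$node != nodes$parent, ]   # drop the root's self-edge

# Going the other way is guesswork: tips have to be inferred as segment
# ends that never start another segment, and the labels are simply gone.
ends   <- paste(segs$xend, segs$yend)
starts <- paste(segs$x, segs$y)
tips_from_lines <- segs[!(ends %in% starts), c("xend", "yend")]
```

With the node table, annotating a taxon is a lookup by node or label. With only segs, even this simple inference relies on exact coordinate matches and recovers positions but none of the taxon information.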
ggtree is different, with the following features:
- extending ggplot
- parsing a tree as a collection of taxa
First, we separate tree parsing (including the output of common software) from visualization. Second, we didn't create a complex plot function; instead, we extend ggplot to support tree objects.
A tree is viewed via the geom_tree layer created in ggtree, and complex tree views can be achieved by adding annotation layers that are freely controlled by users.
library(ggplot2)
library(ggtree)
tr <- ape::rtree(30)  # random tree with 30 tips; rtree() comes from ape
ggplot(tr, aes(x, y)) + geom_tree()
The ggtree function is just a shortcut for ggplot() + geom_tree() + xlab(NULL) + ylab(NULL) + theme_tree().
We parse a tree as a collection of taxa, and only the taxa (node) positions are recorded. This makes it easy to add information related to taxa (label, clade probability, bootstrap value, dN/dS, etc.).
ggplot(tr, aes(x, y)) + geom_tree() + geom_point() + geom_tiplab()
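As a short illustration of what "a collection of taxa" means in practice, the sketch below inspects the data carried by a ggtree plot and adds layers after the plot has been created. It assumes the ape, ggplot2 and ggtree packages are installed; the random tree and the seed are arbitrary choices for the example.

```r
library(ape)
library(ggplot2)
library(ggtree)

set.seed(1)
tr <- rtree(10)          # random rooted tree with 10 tips

p <- ggtree(tr)          # no labels requested at creation time

# The plot data holds one row per node (tips and internal nodes),
# including its label and position.
head(p$data)

# Because the taxa information travels with the plot, labels and
# node markers can still be added afterwards.
p2 <- p + geom_tiplab() + geom_point(aes(color = isTip))
```

This is the opposite of the situation with the lines-only viewers: nothing has to be decided when the plot is first created.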
ggtree provides many helper functions (e.g. geom_tiplab() in the example, for adding tip labels) to highlight clades, annotate clades, classify taxa, rotate clades, rescale clades, etc. Although these functionalities can be found elsewhere, unlike in other software, the functions implemented in ggtree are only building blocks, and users face no restrictions in employing them to create their own style of tree view.
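The building-block idea can be sketched as follows. The example assumes ape, ggplot2 and ggtree are installed; the metadata table (columns host and load) and the choice of clade are made up purely for illustration. It attaches user data with the %<+% operator and then combines several helper layers.

```r
library(ape)
library(ggplot2)
library(ggtree)

set.seed(2)
tr <- rtree(8)

# Hypothetical per-taxon metadata; the first column holds the tip labels.
meta <- data.frame(
  label = tr$tip.label,
  host  = rep(c("avian", "swine"), length.out = 8),
  load  = runif(8)
)

# Pick a clade to decorate: the most recent common ancestor of two tips.
nd <- getMRCA(tr, c("t1", "t2"))

p <- ggtree(tr) %<+% meta +                        # attach user data to the tree
  geom_hilight(node = nd, fill = "steelblue", alpha = 0.3) +
  geom_cladelabel(node = nd, label = "clade of interest", offset = 0.3) +
  geom_tiplab(aes(color = host), offset = 0.05) +  # colour labels by user data
  geom_tippoint(aes(size = load), alpha = 0.6)     # size points by user data
```

Each layer is optional and can be reordered or replaced, which is what distinguishes this approach from a single plot function with a fixed set of parameters.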
Output files of common software are supported, and evolutionary evidence can be viewed directly in a new layer. Results from different software analyzing the same tree can be merged. This makes it easy to combine and compare results from different software.
Plot functions defined in ggphylo, OutbreakTools and phyloseq are all special cases that can easily be implemented with a few layers using ggtree.
Please refer to the following links:
- reproduce ggphylo example using ggtree
- reproduce OutbreakTools example using ggtree
- reproduce phyloseq example using ggtree
As we mentioned above, tree views produced by ggphylo, OutbreakTools and phyloseq are not extensible. This is actually not because they create complex functions for specific data and needs. We can create equivalent functions using ggtree for those specific needs and assume specific data input. With ggtree, such implementations remain extensible: we still have the ability to add new layers to create more complex tree views.
ggphylo, OutbreakTools and phyloseq are all good implementations for answering specific questions, while ggtree is designed as a general framework for viewing and annotating trees with diverse metadata. We aim to provide a flexible and powerful tree annotator without restrictions.
The difference between ggtree and other packages lies not only in the design but, more fundamentally, in how we parse a tree (ggtree parses a tree as a collection of taxa, while the others parse a tree as a collection of lines). It's different from other tree viewers from the very beginning.
We use ggtree to reproduce examples from other packages and compare the running times. The following figure summarizes the comparison, with run times normalized to that of ggtree.