R/VDJ_phylogenetic_trees.R
VDJ_phylogenetic_trees.Rd
Creates phylogenetic trees as tidytree dataframes from an input VDJ dataframe. The resulting phylogenetic trees can be plotted using VDJ_phylogenetic_trees_plot. Both of these functions require the tidytree and ggtree packages.
VDJ_phylogenetic_trees(
VDJ,
sequence.type,
as.nucleotide,
trimmed,
include.germline,
global.clonotype,
VDJ.VJ.1chain,
additional.feature.columns,
filter.na.columns,
maximum.lineages,
minimum.sequences,
maximum.sequences,
tree.algorithm,
tree.level,
n.trees.combined,
germline.scale.factor,
output.format,
parallel
)
VDJ: VDJ or VDJ.GEX.matrix[[1]] object, as obtained from the VDJ_GEX_matrix function in Platypus.
sequence.type: string - sequences used to build the phylogenetic trees. 'cdr3' for CDR3s of both VDJs and VJs, 'cdrh3' for VDJ CDR3s, 'VDJ.VJ' for pasted full sequences of both VDJ and VJ, 'VDJ' for full VDJ sequences, 'VJ' for full VJ sequences.
as.nucleotide: boolean - if T, uses the nucleotide sequences specified by sequence.type; otherwise uses the amino acid sequences.
trimmed: boolean - for full VDJ or VJ nucleotide sequences, whether the trimmed sequences (trimmed=T) or the raw ones should be considered. Using trimmed sequences requires calling MIXCR first on the VDJ dataframe with VDJ_call_MIXCR().
include.germline: boolean - if T, a germline sequence is included in the trees as the root, obtained by pasting the VDJ_trimmed_ref and VJ_trimmed_ref sequences. This requires calling MIXCR first on the VDJ dataframe with VDJ_call_MIXCR().
global.clonotype: boolean - if T, ignores the sample_id column, creating global clonotypes across samples.
VDJ.VJ.1chain: boolean - if T, removes aberrant cells from the VDJ matrix.
additional.feature.columns: list of strings or NULL - VDJ column names to include as per-sequence features in the tidytree dataframe. These features can be used to label nodes or to determine their color, size, etc. See also the VDJ_phylogenetic_trees_plot function.
filter.na.columns: list of strings - VDJ column names: if all elements of a phylogenetic tree/tidytree dataframe are NA in that feature, the tree is removed entirely.
maximum.lineages: integer or 'all' - maximum number of clonotypes to create trees for. If 'all', trees are created for all clonotypes.
minimum.sequences: integer - lower bound on the number of sequences in a tree. Defaults to 3. Trees with fewer sequences are removed automatically.
maximum.sequences: integer - upper bound on the number of sequences in a tree. Sequences are ordered by their total frequency, and those beyond the bound are removed.
tree.algorithm: string - the algorithm used to construct the phylogenetic trees: 'nj' (Neighbour-Joining), 'bionj', 'fastme.bal', or 'fastme.ols'.
tree.level: string - level at which to build phylogenetic trees. 'intraclonal' builds one tree per clonotype, per sample; 'global.clonotype' builds global clonotype trees, irrespective of sample (include.germline must be F); 'combine.first.trees' combines the trees of the most expanded clonotypes, per sample (include.germline must be F).
n.trees.combined: integer - number of trees to combine if tree.level='combine.first.trees'.
germline.scale.factor: numeric - as germlines are usually very distant from their closest neighbours in the tree, this scale factor adjusts the germline branch length for more intelligible downstream plotting.
output.format: string - 'tree.df.list' returns a nested list of tidytree dataframes, per clonotype and per sample; 'lineage.df.list' returns a list of lineage dataframes (unique sequences per clonotype).
parallel: string - parallelization method used to accelerate computations: 'none', 'mclapply', or 'parlapply'.
Nested list of tidytree dataframes or lineage dataframes, depending on output.format.
if (FALSE) {
  VDJ_phylogenetic_trees(VDJ=VDJ, sequence.type='VDJ.VJ',
                         trimmed=TRUE, as.nucleotide=TRUE, include.germline=TRUE,
                         additional.feature.columns=NULL, tree.level='intraclonal',
                         output.format='tree.df.list')
}
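
The sequence filtering and tree construction described in the arguments can be illustrated with a small standalone sketch. This is not the function's internal implementation. It assumes the ape package is installed, uses a hypothetical lineage data frame with made-up CDR3 sequences and frequencies, and picks Levenshtein distance (utils::adist) as the distance measure, which the documentation above does not specify. The conversion to a tidytree dataframe runs only if tidytree is available.

```r
library(ape)

# Hypothetical lineage: unique CDR3 amino acid sequences with their cell counts.
# These sequences and frequencies are invented for illustration only.
lineage <- data.frame(
  sequence  = c("CARDYW", "CARDFW", "CAKDYW", "CTRDYW", "CARDYY"),
  frequency = c(10, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

maximum.sequences <- 4
minimum.sequences <- 3

# Order by frequency and keep only the top maximum.sequences sequences.
lineage <- lineage[order(lineage$frequency, decreasing = TRUE), ]
lineage <- head(lineage, maximum.sequences)

# A lineage with fewer than minimum.sequences sequences would be dropped
# rather than turned into a tree (neighbour-joining needs at least 3 tips).
stopifnot(nrow(lineage) >= minimum.sequences)

# Pairwise Levenshtein distances (an assumed metric), then neighbour-joining,
# corresponding to tree.algorithm = 'nj'.
d <- utils::adist(lineage$sequence)
dimnames(d) <- list(lineage$sequence, lineage$sequence)
tree <- ape::nj(as.dist(d))

# Convert the phylo object to a tidytree dataframe if tidytree is installed.
if (requireNamespace("tidytree", quietly = TRUE)) {
  tree_df <- tidytree::as_tibble(tree)
  print(tree_df)
}
```

Replacing ape::nj with ape::bionj, ape::fastme.bal, or ape::fastme.ols in this sketch gives the analogues of the other tree.algorithm options.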