Creates phylogenetic trees as tidytree dataframes from an input VDJ dataframe. The resulting phylogenetic trees can be plotted using VDJ_phylogenetic_trees_plot. Both of these functions require the tidytree and ggtree packages.

VDJ_phylogenetic_trees(
  VDJ,
  sequence.type,
  as.nucleotide,
  trimmed,
  include.germline,
  global.clonotype,
  VDJ.VJ.1chain,
  additional.feature.columns,
  filter.na.columns,
  maximum.lineages,
  minimum.sequences,
  maximum.sequences,
  tree.algorithm,
  tree.level,
  n.trees.combined,
  germline.scale.factor,
  output.format,
  parallel
)

Arguments

VDJ

VDJ or VDJ.GEX.matrix[[1]] object, as obtained from the VDJ_GEX_matrix function in Platypus.

sequence.type

string - sequences which will be used when creating the phylogenetic trees. 'cdr3' for CDR3s of both VDJs and VJs, 'cdrh3' for VDJ CDR3s, 'VDJ.VJ' for pasted full sequences of both VDJ and VJ, 'VDJ' for full VDJ sequences, 'VJ' for full VJ.

as.nucleotide

boolean - if T, uses the nucleotide sequences selected by sequence.type; otherwise uses the amino acid sequences.

trimmed

boolean - for full VDJ or VJ nucleotide sequences, whether the trimmed sequences should be considered (trimmed=T) or the raw ones. You need to call MIXCR first on the VDJ dataframe using VDJ_call_MIXCR().

include.germline

boolean - if T, a germline sequence will be included in the trees (root), obtained by pasting the VDJ_trimmed_ref and VJ_trimmed_ref sequences. You need to call MIXCR first on the VDJ dataframe using VDJ_call_MIXCR().

global.clonotype

boolean - if T, will ignore samples from the sample_id column, creating global clonotypes.

VDJ.VJ.1chain

boolean - if T, will remove aberrant cells (cells with more than one VDJ or VJ chain) from the VDJ matrix.

additional.feature.columns

list of strings or NULL - VDJ column names to include as per-sequence features in the tidytree dataframe. These features can be used to label nodes or to determine their color, size, etc. See also the VDJ_phylogenetic_trees_plot function.

filter.na.columns

list of strings - VDJ column names: if all elements of a phylogenetic tree/tidytree dataframe are NA for one of these features, that tree will be removed entirely.
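
The filtering rule described for filter.na.columns can be pictured with a small base R sketch. This is only an illustration of the stated behaviour, not the package's internal code: the `trees` list, its `label` and `isotype` columns, and the helper `drop_all_na()` are hypothetical names introduced here.

```r
# Hypothetical stand-ins for per-clonotype tidytree dataframes.
trees <- list(
  clonotype1 = data.frame(label = c("a", "b"),
                          isotype = c("IGHG1", NA),
                          stringsAsFactors = FALSE),
  clonotype2 = data.frame(label = c("c", "d"),
                          isotype = c(NA_character_, NA_character_),
                          stringsAsFactors = FALSE)
)

# Hypothetical helper (not part of Platypus): drop every tree in which
# at least one of the listed columns contains only NA values.
# A column that is absent from a dataframe is treated as all-NA here.
drop_all_na <- function(trees, filter.na.columns) {
  keep <- vapply(trees, function(df) {
    all_na <- vapply(filter.na.columns,
                     function(col) all(is.na(df[[col]])),
                     logical(1))
    !any(all_na)
  }, logical(1))
  trees[keep]
}

filtered <- drop_all_na(trees, filter.na.columns = "isotype")
print(names(filtered))
```

In this sketch clonotype1 is kept because only some of its isotype values are NA, while clonotype2 is dropped because all of them are.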

maximum.lineages

integer or 'all' - maximum number of clonotypes to create trees for. If 'all', will create trees for all clonotypes.

minimum.sequences

integer - lower bound of sequences for a tree. Defaults to 3. Trees with a lower number will be automatically removed.

maximum.sequences

integer - upper bound of sequences for a tree. Additional sequences will be removed, after being ordered by their total frequency.
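
As a rough picture of how the minimum.sequences and maximum.sequences bounds interact, the base R sketch below applies both to a single lineage. It assumes a hypothetical lineage table with `sequence` and `frequency` columns and a hypothetical helper `bound_lineage()`; the actual implementation inside Platypus may differ.

```r
# Hypothetical lineage table: one row per unique sequence with its cell count.
lineage <- data.frame(
  sequence  = c("CARDYW", "CARDFW", "CARGYW", "CTRDYW", "CARDYF"),
  frequency = c(12, 3, 7, 1, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of Platypus).
bound_lineage <- function(lineage, minimum.sequences = 3, maximum.sequences = 4) {
  # Lineages below the lower bound yield no tree at all.
  if (nrow(lineage) < minimum.sequences) return(NULL)
  # Order by total frequency, most frequent first, then cut at the upper bound.
  ordered <- lineage[order(lineage$frequency, decreasing = TRUE), , drop = FALSE]
  head(ordered, maximum.sequences)
}

kept <- bound_lineage(lineage)
print(kept)
```

Here the least frequent sequence is the one discarded, and a lineage with only two unique sequences would return NULL under the default lower bound of 3.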

tree.algorithm

string - the algorithm used when constructing the phylogenetic trees: 'nj' for Neighbour-Joining, 'bionj', 'fastme.bal', or 'fastme.ols'.
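
The four tree.algorithm values share their names with distance-based tree builders in the ape package (nj, bionj, fastme.bal, fastme.ols). The sketch below shows how such a choice could be dispatched on a pairwise distance matrix. It is an assumption-laden illustration: the toy sequences, the use of utils::adist (Levenshtein distance) as the distance measure, and the helper `build_tree()` are all introduced here and are not confirmed to match what VDJ_phylogenetic_trees does internally. The tree-building part only runs if ape is installed.

```r
# Toy sequences; the "germline" entry stands in for a reference sequence.
seqs <- c(germline = "CARDYW", s1 = "CARDFW", s2 = "CARGYW", s3 = "CTRDYF")

# Pairwise Levenshtein distances (assumed distance measure for this sketch).
d <- utils::adist(seqs)
dimnames(d) <- list(names(seqs), names(seqs))

# Hypothetical dispatcher from the tree.algorithm string to an ape function.
build_tree <- function(d, tree.algorithm = "nj") {
  switch(tree.algorithm,
    nj         = ape::nj(d),
    bionj      = ape::bionj(d),
    fastme.bal = ape::fastme.bal(d),
    fastme.ols = ape::fastme.ols(d),
    stop("unknown tree.algorithm: ", tree.algorithm)
  )
}

if (requireNamespace("ape", quietly = TRUE)) {
  tree <- build_tree(d, tree.algorithm = "nj")
  # Root the tree at the germline tip, mirroring the idea of include.germline.
  rooted <- ape::root(tree, outgroup = "germline")
  print(rooted)
}
```

All four builders take a distance matrix, so switching algorithms only changes which function receives `d`.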

tree.level

string - level at which to build phylogenetic trees. 'intraclonal' builds one tree per clonotype, per sample; 'global.clonotype' builds global clonotype trees, irrespective of sample (include.germline must be F); 'combine.first.trees' combines the trees of the most expanded clonotypes, per sample (include.germline must be F).

n.trees.combined

integer - number of trees to combine if tree.level='combine.first.trees'.

germline.scale.factor

numeric - as germlines are incredibly distant from their closest neighbours (in the tree), this controls the scale factor for the germline tree branch length for more intelligible downstream plotting.

output.format

string - 'tree.df.list' returns a nested list of tidytree dataframes, per clonotype and per sample; 'lineage.df.list' returns a list of lineage dataframes (the unique sequences per clonotype).

parallel

string - parallelization method to be used to accelerate computations, 'none', 'mclapply', or 'parlapply'.

Value

Nested list of tidytree dataframes or lineage dataframes.

Examples

if (FALSE) {
VDJ_phylogenetic_trees(VDJ=VDJ, sequence.type='VDJ.VJ',
trimmed=TRUE, as.nucleotide=TRUE, include.germline=TRUE,
additional.feature.columns=NULL, tree.level='intraclonal',
output.format='tree.df.list')
}