AntibodyForest takes the output of ConvertStructure, CsvToDf, SubRepertoires or RemoveNets and returns B cell phylogenetic networks in tree format. Alternatively, when no prior data transformation is desired, the full-length list of clonal lineages, which contains both isotype and transcriptional cluster information, can be supplied directly.

Each network represents a clonal lineage, i.e. the set of B cell receptor sequences originating from an independent V(D)J recombination event. Each vertex represents a unique recovered full-length variable heavy and light chain antibody sequence of a clonal family. Edges connect clonal variants that are closely related according to their Levenshtein distance. Edge weights are extracted from the distance matrix, except for edges leaving the unmutated germline: their weights are either set to 1 or to the corresponding distance in the matrix minus the absolute difference between the sequence lengths of the germline and the connected node (see the weight argument).

During tree building, starting from the reference ancestral germline, each node is connected to the nodes that can be reached with the minimum distance in the distance matrix. As a result, edges pointing back to earlier tree layers, as well as bidirectional cycles, are eliminated. Polytomies, i.e. B cell clones giving rise to multiple distinct offspring, are resolved when several nodes are reached with an equal minimum distance: the algorithm removes edges from the recipient nodes either at random, or based on which node is closest to or farthest from the germline (measured by the number of intermediate nodes or by the edge path length), or based on the highest or lowest cell count on the node. Any remaining ties are settled by random edge selection. Consequently, parsimony holds in the sense that each daughter node has exactly one parent.
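The layer-by-layer construction described above can be illustrated with a small base R sketch. It is a simplification under stated assumptions, not the Platypus implementation: the helper name sketch_tree_parents is hypothetical, the first sequence is assumed to be the germline, distances come from utils::adist (Levenshtein), and ties are broken by keeping the closest candidate parents and then picking one at random. It returns, for each sequence, the index of its single parent.

```r
# Hypothetical sketch of layered tree building (NOT the Platypus code).
# seqs[1] is assumed to be the germline; returns parent index per node.
sketch_tree_parents <- function(seqs, seed = 42) {
  set.seed(seed)
  d <- utils::adist(seqs)                # pairwise Levenshtein distances
  n <- length(seqs)
  parent <- rep(NA_integer_, n)          # parent[i] = index of the parent of node i
  visited <- c(TRUE, rep(FALSE, n - 1))  # only the germline is placed at the start
  layer <- 1L
  while (any(!visited)) {
    rest <- which(!visited)              # nodes not yet in the tree (no back edges)
    cand <- list()                       # child index -> candidate parents
    for (p in layer) {
      dist_p <- d[p, rest]
      # each layer node reaches the unplaced nodes at its minimum distance
      for (child in rest[dist_p == min(dist_p)]) {
        key <- as.character(child)
        cand[[key]] <- c(cand[[key]], p)
      }
    }
    for (key in names(cand)) {
      child <- as.integer(key)
      ps <- cand[[key]]
      ps <- ps[d[ps, child] == min(d[ps, child])]      # keep closest parents only
      parent[child] <- ps[sample.int(length(ps), 1)]   # remaining tie: random pick
    }
    layer <- as.integer(names(cand))     # newly attached nodes form the next layer
    visited[layer] <- TRUE
  }
  parent
}

sketch_tree_parents(c("AAAA", "AAAT", "AATT", "ATTT"))
```

For this chain of single substitutions the result is NA, 1, 2, 3: each variant hangs off the previous one. With c("AAAA", "AAAT", "AATA", "AATT") the last variant is equally close to nodes 2 and 3, so its parent is chosen at random between them, mirroring the 'rand' tie strategy.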
Distinct tree topologies enable visual investigation of the trade-off between balance and evolution, and further quantification of how much the detected clonally expanded clones diversify during somatic hypermutation and class switching. The minimum-based decision criterion determines the amount of balance in the tree, while the maximum-based criterion determines the amount of evolution. A single color, or a color distribution, on each node shows the proportion of B cells with the specific isotype(s) or transcriptional cluster(s), while vertex size can be set based on the number of unique sequences per clone, vertex betweenness or vertex closeness. Scaling nodes by their relative clonal expansion helps pinpoint identical antibody sequences shared by many B cells. Node labels can display clonal frequency.
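As an illustration of the node coloring and sizing described above, the following base R sketch computes per-node isotype proportions (the slices of a pie-colored vertex) and a vertex size derived from clonal frequency. The data.frame columns variant and isotype, the helper scale_sizes and the linear mapping onto the range 10 to 30 are hypothetical choices for illustration, not Platypus internals.

```r
# Hypothetical per-cell annotation: the variant (node) each cell belongs to
# and its isotype. These column names are illustrative only.
cells <- data.frame(
  variant = c("v1", "v1", "v1", "v2", "v2", "v3"),
  isotype = c("IGHM", "IGHM", "IGHG1", "IGHG1", "IGHG1", "IGHA1")
)

# Fraction of each isotype within each node (rows sum to 1); these would be
# the slices of a pie-colored vertex.
iso_prop <- prop.table(table(cells$variant, cells$isotype), margin = 1)

# Cells per variant (clonal frequency), usable as vertex labels.
cells_per_variant <- table(cells$variant)

# Linear mapping of clonal frequency onto [min_size, max_size];
# every vertex gets min_size if all nodes have the same frequency.
scale_sizes <- function(x, min_size = 10, max_size = 30) {
  x <- as.numeric(x)
  if (max(x) == min(x)) return(rep(min_size, length(x)))
  min_size + (x - min(x)) / (max(x) - min(x)) * (max_size - min_size)
}
vertex_size <- scale_sizes(cells_per_variant)
```

Here v1 holds three cells (two IGHM, one IGHG1), so it is drawn largest and two thirds of its pie is IGHM. A linear mapping is only one option; a square-root mapping would make area, rather than diameter, proportional to frequency.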

AbForests_AntibodyForest(
  full_list,
  csv,
  files,
  distance_mat,
  clonal_frequency,
  scaleByClonalFreq,
  weight,
  tie_flag,
  scaleBybetweenness,
  scaleByclocloseness_metr,
  opt,
  random.seed,
  alg_opt,
  cdr3
)

Arguments

full_list

a list of clonal lineages, each represented as a data.frame

csv

a logical indicator: TRUE if the full_list argument is a list of csv files, FALSE otherwise.

files

a list of data.frames. Each data.frame contains two columns: one with the sequences and the other with the type of information (isotype or cluster) considered in the analysis, as determined by the user.

distance_mat

a custom integer distance matrix, or NULL to use the default distance matrix (calculated from the Levenshtein distance, which counts the number of mutations, i.e. single-character substitutions, insertions and deletions, between sequences).

clonal_frequency

a logical value: TRUE if vertices are labeled by clonal frequency, FALSE otherwise.

scaleByClonalFreq

a logical value: TRUE if vertex size is scaled by the number of unique sequences per clone, FALSE otherwise.

weight

a logical value. If FALSE, the weights of outgoing edges from the germline node are set to 1. If TRUE, each such weight is set to the number of mutations between the germline and the connected node (the corresponding value in the distance matrix) minus the absolute difference between their sequence lengths. In both cases, the weights of all remaining edges are taken from the distance matrix. Outgoing edges from the germline thus represent the number of mutations of sequences that have the germline as their common ancestor.

tie_flag

a string with options 'rand', 'full', 'close_to_germ', 'far_from_germ', 'close_path_to_germ', 'far_path_from_germ', 'most_expanded' and 'least_expanded', specifying how edges are removed when there is a tie (equal distance) in the distance matrix. 'rand' means random pruning of one of the nodes; 'full' means keeping all nodes; 'close_to_germ' means pruning the node(s) farthest from the germline (based on the number of intermediate nodes); 'far_from_germ' means pruning the node(s) closest to the germline (based on the number of intermediate nodes); 'close_path_to_germ' means pruning the node(s) farthest from the germline (based on edge path length); 'far_path_from_germ' means pruning the node(s) closest to the germline (based on edge path length); 'most_expanded' means pruning the node(s) with the lowest B cell count (clonal frequency); and 'least_expanded' means pruning the node(s) with the highest B cell count (clonal frequency). Any subsequent ties are broken by selecting a random node.

scaleBybetweenness

a logical value: TRUE if vertex size is scaled by vertex betweenness centrality, FALSE otherwise.

scaleByclocloseness_metr

a logical value: TRUE if vertex size is scaled by the closeness centrality of vertices in the graph, FALSE otherwise.

opt

a string with options "isotype" and "cluster". "isotype" selects an isotype analysis, while "cluster" selects an analysis based on transcriptional clusters.

random.seed

a random seed, specified by the user, used whenever sequences are sampled at random in any of the cases described in the tie_flag argument.

alg_opt

a string denoting the version of the edge selection algorithm used in the construction of networks. Possible choices: "naive", "two-step".

cdr3

0 to use full-length sequences or 1 to use only the CDR3 sequences (both options apply only when the input is a list of csv files), and NULL otherwise.
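For the distance_mat argument, one way to prepare a custom integer matrix is sketched below in base R. It assumes that rows and columns follow the order of the sequences in the lineage with the germline included first; the exact ordering AbForests_AntibodyForest expects is an assumption here. The Hamming variant is an illustrative alternative metric for equal-length sequences, and the helper hamming is hypothetical.

```r
# Illustrative sequences; the germline is assumed to be listed first.
seqs <- c(germline = "CAGGTGCAG", v1 = "CAGGTGCAA", v2 = "CAGCTGCAA")

# Levenshtein distances (substitutions, insertions, deletions), as integers
lev <- utils::adist(seqs)
dimnames(lev) <- list(names(seqs), names(seqs))
storage.mode(lev) <- "integer"

# Alternative custom metric for equal-length sequences: Hamming distance
hamming <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
ham <- sapply(seqs, function(b) sapply(seqs, function(a) hamming(a, b)))
```

For these substitution-only variants both metrics agree (v2 is two mutations away from the germline); they diverge as soon as sequences differ in length, where only the Levenshtein matrix is defined.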
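The two germline weighting schemes of the weight argument can be written as a small base R function. The name germline_edge_weights is hypothetical; it only applies the rule stated for weight, using utils::adist for the Levenshtein distance.

```r
# Hypothetical helper applying the rule described for the weight argument.
germline_edge_weights <- function(germline, children, weight = TRUE) {
  if (!weight) {
    return(rep(1, length(children)))           # weight = FALSE: all set to 1
  }
  d <- utils::adist(germline, children)[1, ]   # Levenshtein distances to germline
  # weight = TRUE: distance minus absolute sequence-length difference
  d - abs(nchar(germline) - nchar(children))
}

germline_edge_weights("CAGGTG", c("CAGGTA", "CAGGTGAA", "CAGG", "CTGGTGA"))
germline_edge_weights("CAGGTG", c("CAGGTA", "CAGGTGAA"), weight = FALSE)
```

In the first call, "CAGGTGAA" differs from the germline only by two insertions, so its distance of 2 is fully offset by the length difference and its weight becomes 0, whereas "CTGGTGA" keeps a weight of 1 for its substitution.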
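The tie_flag strategies can be summarized as a filter over the tied nodes followed by a random pick among any survivors. The base R sketch below returns the node(s) kept after pruning; the helper resolve_tie and its columns hops (intermediate nodes to the germline), path_len (edge path length to the germline) and cells (clonal frequency) are hypothetical, and how Platypus computes these quantities internally is not shown here.

```r
# Hypothetical tie-breaking helper (NOT the Platypus implementation).
# cands: one row per tied node, with illustrative columns hops, path_len, cells.
resolve_tie <- function(cands, tie_flag = "rand") {
  keep <- switch(tie_flag,
    full               = rep(TRUE, nrow(cands)),
    rand               = rep(TRUE, nrow(cands)),
    close_to_germ      = cands$hops == min(cands$hops),         # prune farthest
    far_from_germ      = cands$hops == max(cands$hops),         # prune closest
    close_path_to_germ = cands$path_len == min(cands$path_len),
    far_path_from_germ = cands$path_len == max(cands$path_len),
    most_expanded      = cands$cells == max(cands$cells),       # prune lowest count
    least_expanded     = cands$cells == min(cands$cells),       # prune highest count
    stop("unknown tie_flag: ", tie_flag)
  )
  kept <- cands[keep, , drop = FALSE]
  if (tie_flag == "full") return(kept)             # keep every tied node
  kept[sample.int(nrow(kept), 1), , drop = FALSE]  # subsequent ties: random node
}

cands <- data.frame(node = c("a", "b", "c"), hops = c(1, 2, 2),
                    path_len = c(5, 3, 4), cells = c(2, 7, 7))
set.seed(1)
resolve_tie(cands, "close_to_germ")$node
resolve_tie(cands, "most_expanded")$node
```

With 'close_to_germ' only node a survives, while 'most_expanded' leaves b and c tied on cell count, so one of them is drawn at random, which is where random.seed matters.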
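To make the scaleBybetweenness option concrete, the base R sketch below computes vertex betweenness for a connected, unweighted tree given as a 0/1 adjacency matrix. It is an illustration only: the helper name is hypothetical, edges are treated as undirected, and in practice a graph library such as igraph would normally be used. Because every pair of vertices in a tree has exactly one shortest path, the betweenness of v is simply the number of vertex pairs whose path passes through v.

```r
# Hypothetical sketch: vertex betweenness in a connected, unweighted tree.
betweenness_tree_sketch <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0 | t(adj) > 0, 1, Inf)  # edges treated as undirected
  diag(d) <- 0
  for (k in seq_len(n)) {                    # Floyd-Warshall shortest paths
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  sapply(seq_len(n), function(v) {
    count <- 0
    for (s in seq_len(n)) {
      for (u in seq_len(n)) {
        # pair (s, u) counts if its unique shortest path runs through v
        if (s < u && s != v && u != v && is.finite(d[s, u]) &&
            d[s, v] + d[v, u] == d[s, u]) {
          count <- count + 1
        }
      }
    }
    count
  })
}

# Star-shaped tree: germline (1) with three children (2, 3, 4)
adj <- matrix(0, 4, 4)
adj[1, 2:4] <- 1
betweenness_tree_sketch(adj)
```

In the star, every path between two children passes through the germline, so it scores 3 while the leaves score 0; scaling by betweenness therefore emphasizes branching hubs.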
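Similarly, for scaleByclocloseness_metr, closeness centrality is the reciprocal of the sum of shortest-path distances from a vertex to all others. The base R sketch below computes it for an unweighted graph given as a 0/1 adjacency matrix; the helper name is hypothetical, edges are treated as undirected, and igraph's closeness() would be the usual tool in practice.

```r
# Hypothetical sketch: closeness = 1 / (sum of shortest-path distances).
closeness_sketch <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0 | t(adj) > 0, 1, Inf)  # edges treated as undirected
  diag(d) <- 0
  for (k in seq_len(n)) {                    # Floyd-Warshall shortest paths
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  1 / rowSums(d)
}

# Chain germline (1) -> 2 -> 3
adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[2, 3] <- 1
closeness_sketch(adj)  # c(1/3, 1/2, 1/3)
```

The middle variant is one step from both ends, so it has the highest closeness and would be drawn largest.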

Value

graphs: a list of lists, with one inner list per network. For example, for the first network:
graphs[[1]][[1]] network: an igraph object containing the network in tree format.
graphs[[1]][[2]] legend: the legend parameters.
graphs[[1]][[3]] count.rand: the number of randomly considered nodes.
graphs[[1]][[4]] adj.matrix: the adjacency matrix.
graphs[[1]][[5]] distance.matrix: the distance matrix.
graphs[[1]][[6]] cells.per.network: the number of cells.
graphs[[1]][[7]] variants.per.network: the number of variants.
graphs[[1]][[8]] variant.sequences: the sequences of the variants.
graphs[[1]][[9]] cells.per.variant: the number of cells per variant (clonal frequency).
graphs[[1]][[10]] cell.indicies.per.variant: the indices of the cells per variant.
graphs[[1]][[11]] new.variant.names: the names of the variants.
graphs[[1]][[12]] germline.index: the index of the germline sequence.
graphs[[1]][[13]] isotype.per.variant: the isotypes corresponding to each variant.
graphs[[1]][[14]] transcriptome.cluster.per.variant: the transcriptional clusters corresponding to each variant.
graphs[[1]][[15]] isotype.per.cell: the isotype corresponding to each cell.
graphs[[1]][[16]] transcriptome.cluster.per.cell: the transcriptional cluster corresponding to each cell.

See also

ConvertStructure, CsvToDf, SubRepertoires, RemoveNets

Examples

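if (FALSE) {
  # Not run: networks colored by transcriptional cluster; ties are resolved by
  # pruning the tied node(s) farthest from the germline
  AbForests_AntibodyForest(full_list = Platypus::new, csv = FALSE, files = files,
    clonal_frequency = TRUE, scaleByClonalFreq = TRUE, weight = TRUE,
    tie_flag = 'close_to_germ', scaleBybetweenness = FALSE,
    scaleByclocloseness_metr = FALSE, opt = "cluster",
    alg_opt = "two-step", cdr3 = NULL)
}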