When predicting antibody structures from a large data set, it can be useful to restrict the prediction to the most expanded clonotypes. This function selects the most expanded clonotypes according to the desired clone strategy. Within these clonotypes, the cells are ranked by UMI count and the top unique sequences are selected for prediction. The function takes the Platypus VGM object as input. To integrate UMI counts into the data, the raw data, which is the output of the PlatypusDB_fetch() function, is needed in addition. For the selected clonotypes, the germline reference sequences are obtained by calling MIXCR. This requires a local installation of MIXCR on your computer. !FOR WINDOWS USERS THE EXECUTABLE MIXCR.JAR HAS TO BE PRESENT IN THE CURRENT WORKING DIRECTORY!
The output of the VDJ_select_clonotypes function can directly be used for structure prediction by the AlphaFold_prediction() function.
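The selection logic described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy data frame, its column names (clonotype_id, sequence, umi_count) and the helper select_top_sequences() are hypothetical, and per-sample selection and the MIXCR step are left out.

```r
# Toy data: one row per cell. All column names are hypothetical.
cells <- data.frame(
  clonotype_id = c("c1", "c1", "c1", "c2", "c2", "c3", "c1"),
  sequence     = c("AAA", "AAA", "AAT", "CCC", "CCG", "GGG", "AAC"),
  umi_count    = c(12, 30, 8, 5, 9, 4, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the top unique sequences of the most expanded clonotypes.
select_top_sequences <- function(cells, top.clonotypes = 2, seq.per.clonotype = 2) {
  # Expansion = number of cells per clonotype; keep the most expanded ones.
  expansion <- sort(table(cells$clonotype_id), decreasing = TRUE)
  top_ids <- names(expansion)[seq_len(min(top.clonotypes, length(expansion)))]

  picked <- lapply(top_ids, function(id) {
    clone <- cells[cells$clonotype_id == id, ]
    # Rank the cells of this clonotype by UMI count, highest first.
    clone <- clone[order(clone$umi_count, decreasing = TRUE), ]
    # Keep the best-ranked cell for each unique sequence, then take the top n.
    clone <- clone[!duplicated(clone$sequence), ]
    head(clone, seq.per.clonotype)
  })
  do.call(rbind, picked)
}

selected <- select_top_sequences(cells)
print(selected)
```

In this toy example the two most expanded clonotypes are c1 and c2, and for each of them the two unique sequences with the highest UMI counts are kept.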
VDJ_select_clonotypes(
VGM,
raw.data,
clone.strategy,
VDJ.VJ.1chain,
donut.plot,
clonotypes.per.sample,
top.clonotypes,
seq.per.clonotype,
mixcr.directory,
species,
platypus.version,
operating.system,
simplify
)
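As a rough illustration of how these arguments fit together, a call could be wrapped as follows. This is only a sketch: the wrapper name select_for_prediction() is hypothetical, the argument values are illustrative rather than recommended defaults, and it assumes that the function is exported by an installed package named Platypus and that a VGM object, the matching PlatypusDB_fetch() output and a local MIXCR directory are already available.

```r
# Hypothetical wrapper around VDJ_select_clonotypes(); values are illustrative only.
# Assumes the function is exported by an installed package named Platypus.
select_for_prediction <- function(VGM, raw.data, mixcr.directory) {
  Platypus::VDJ_select_clonotypes(
    VGM = VGM,                         # Platypus VGM object
    raw.data = raw.data,               # output of PlatypusDB_fetch()
    clone.strategy = "cdr3.aa",
    VDJ.VJ.1chain = TRUE,              # keep cells with one VDJ and one VJ sequence
    donut.plot = FALSE,
    clonotypes.per.sample = TRUE,      # select top clonotypes per sample
    top.clonotypes = 3,
    seq.per.clonotype = 1,
    mixcr.directory = mixcr.directory, # directory with an executable MIXCR
    species = "mmu",
    platypus.version = "v3",
    simplify = TRUE
    # operating.system is left out so that it is detected automatically
  )
}
```

Defining the wrapper does not call MIXCR; the selection only runs once the wrapper is invoked with real inputs.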
The Platypus VGM object, which is used as input for the function.
To integrate the UMI counts per cell, the raw data, which is the output of the PlatypusDB_fetch() function, has to be specified as a second input to the function.
The desired clone strategy, specified as a string. Possible options are '10x.default', 'cdr3.nt', 'cdr3.aa', 'VDJJ.VJJ', 'VDJJ.VJJ.cdr3length', 'VDJJ.VJJ.cdr3length.cdr3.homology', 'VDJJ.VJJ.cdr3length.VDJcdr3.homology', 'cdr3.homology' and 'VDJcdr3.homology'. The default is '10x.default'. 'cdr3.aa' converts the default Cell Ranger clonotyping to amino acid sequences. 'VDJJ.VJJ' groups B cells with identical germline genes (V and J segments for both heavy chain and light chain). Options that include 'cdr3length' group all sequences with identical VDJ and VJ CDR3 sequence lengths. Options that include 'cdr3.homology' additionally impose a homology requirement on the CDRH3 and CDRL3 sequences. 'cdr3.homology' and 'VDJcdr3.homology' group sequences based on homology only (of the whole CDR3 sequence or of the VDJ CDR3 sequence, respectively). All homology calculations are performed on the amino acid level.
If the VDJ.VJ.1chain argument is set to TRUE, only cells with one VDJ and one VJ sequence are included in the selection.
If donut.plot is set to TRUE, a donut plot visualizing the clonotypes is returned.
By default, the top clonotypes are selected per sample. To select the top clonotypes across all samples instead, set the clonotypes.per.sample argument to FALSE.
The number of top clonotypes to select, either per sample if clonotypes.per.sample = TRUE or across all samples if clonotypes.per.sample = FALSE.
The number of unique sequences to select per clonotype. Within each clonotype, the cells are ranked by UMI count before the top sequences are selected.
The path to the directory containing an executable version of MIXCR.
Either "mmu" for mouse or "hsa" for human. These use the default germline genes for both species contained in MIXCR. Default is set to "hsa".
Character. Defaults to "v3". Can be "v2" or "v3", depending on the input format.
Can be either "Windows", "Darwin" (for macOS) or "Linux". If left empty, this is detected automatically.
Only relevant when platypus.version = "v3". Boolean. Defaults to TRUE. If FALSE, the full MIXCR output and the computed SHM column are appended to the VDJ. If TRUE, only the framework and CDR3 region columns and the computed SHM column are appended. To discriminate between VDJ and VJ chains, prefixes are added to all MIXCR output columns.
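To make the clone strategy options more concrete, the following base R sketch mimics the grouping idea behind 'VDJJ.VJJ.cdr3length': cells share a clonotype if they have identical V and J genes on both chains and identical CDR3 lengths on both chains. It is an illustration only, not the package's code. The toy data frame and its column names are hypothetical, and measuring the CDR3 length on the amino acid sequence is an assumption made here. The homology-based options are not covered.

```r
# Toy data: one row per cell. Column names and values are hypothetical.
cells <- data.frame(
  VDJ_vgene   = c("IGHV1-1", "IGHV1-1", "IGHV1-1", "IGHV2-3"),
  VDJ_jgene   = c("IGHJ2", "IGHJ2", "IGHJ2", "IGHJ4"),
  VJ_vgene    = c("IGKV4-1", "IGKV4-1", "IGKV4-1", "IGKV1-5"),
  VJ_jgene    = c("IGKJ1", "IGKJ1", "IGKJ1", "IGKJ2"),
  VDJ_cdr3_aa = c("CARDYW", "CARGYW", "CARDGYW", "CTRSFW"),
  VJ_cdr3_aa  = c("CQQYNT", "CQQYST", "CQQYNT", "CLQHDT"),
  stringsAsFactors = FALSE
)

# Germline key: identical V and J genes for heavy (VDJ) and light (VJ) chain.
germline_key <- paste(cells$VDJ_vgene, cells$VDJ_jgene,
                      cells$VJ_vgene, cells$VJ_jgene, sep = "_")

# Length key: identical CDR3 lengths on both chains
# (amino acid length is an assumption of this sketch).
length_key <- paste(nchar(cells$VDJ_cdr3_aa), nchar(cells$VJ_cdr3_aa), sep = "_")

# Cells with the same combined key fall into the same clonotype.
clone_key <- paste(germline_key, length_key, sep = "_")
cells$clonotype <- match(clone_key, unique(clone_key))

print(cells[, c("VDJ_cdr3_aa", "VJ_cdr3_aa", "clonotype")])
```

Here the first two cells share a clonotype because only their CDR3 sequences differ, not their germline genes or CDR3 lengths. The third cell has a longer VDJ CDR3 and the fourth uses different germline genes, so each forms its own clonotype.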