The Platypus family of packages are meant to provide potential pipelines and examples relevant to the broad field of computational immunology. The core set of functions can be found at https://github.com/alexyermanos/Platypus and examples of use can be found in the publications https://doi.org/10.1093/nargab/lqab023 and insert biorxiv database manuscript here, insert biorxiv AntibodyForests manuscript here insert biorxiv Steropodon manuscript here.
Steropodon is a pipeline developed for the local large-scale inference of immune receptor structures from a Platypus VGM object and the comparative analysis across multiple repertoires. Steropodon employs several existing deep-learning methods for folding protein structures, with some specialized for antibody structures: IgFold, DeepAb, OmegaFold, and ColabFold are currently supported. Moreover, in expanding the Steropodon pipeline, we plan to add more models as made available by their developers (e.g., ESMFold, HelixFold, OpenFold, FastFold, etc.). As most models are based on protein language models (pLMs) and do not require performing MSA as done by AlphaFold, Steropodon allows for a faster local inference and a lighter package without pre-downloading any sequence databases. For example, AlphaFold’s sequence MSA databases, which add up to 1TB of memory, are not required to be installed, as Steropodon uses ColabFold as an AlphaFold proxy, which performs the MSA step on a separate server. Therefore, Steropodon allows users to quickly fold several structures across multiple repertoires on a local setup with minimal install requirements and perform exploratory analyses in an expedite manner.
As such, Steropodon also integrates a suite of methods for sequence alignment, structural superposition, obtaining structural features, multimodal integration of structural features and gene expression information (GEX/Seurat object), docking and complex analysis (e.g., obtaining the binding interface, epitope and paratope annotations added to the VGM object), refining existing structures and predicting CDRs (via OpenMM and ABlooper, respectively), comparing structures using a wide range of distance metrics, clustering structures, calculating physico-chemical properties and visualizing them across samples or clonotypes, and, finally, saving the structures as PDB files.
In this vignette, we will first showcase the folding models included in Steropodon and visualizing the resulting structures, then move to the comparative analyses modules: superposing structures, calculating distances between sets of coordinates, obtaining structural features and integrating them with GEX information, calculating physicochemical properties, docking and complex analysis, clustering, refinement, trimming, and saving. This vignette will present, therefore, a comprehensive introduction into the aims of Steropodon and its uses. To quickly parse through this text, we recommend first reading the Steropodon_model presentation and understanding the Steropodon object, as well as the VGM object (sections 2 and 3), then ski;pping to the section of interest for your analysis case (e.g., section 5 for RMSD distances across repertoires via Steropodon_distances).
library(Platypus)
##
## Attaching package: 'Platypus'
## The following object is masked _by_ '.GlobalEnv':
##
## VDJ_call_MIXCR
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.6 ✔ dplyr 1.0.7
## ✔ tidyr 1.1.4 ✔ stringr 1.4.0
## ✔ readr 2.1.1 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(foreach)
##
## Attaching package: 'foreach'
## The following objects are masked from 'package:purrr':
##
## accumulate, when
library(reticulate)
#Fetch only samples 1 (bone marrow, 3 months-old mice), 6 (bone marrow, 12 months-old), 9 (bone marrow, 18 months-old), 12 (spleen, 3 months-old mice), 17 (spleen, 12 months-old mice), and 20 (spleen, 18 months-old mice) - one sample from every group.
PlatypusDB_fetch(PlatypusDB.links = c("agrafiotis2021a/TNFR2.BM.3m.S1/ALL",
"agrafiotis2021a/TNFR2.BM.12m.S6/ALL",
"agrafiotis2021a/TNFR2.BM.18m.S9/ALL",
"agrafiotis2021a/TNFR2.SP.3m.S12/ALL",
"agrafiotis2021a/TNFR2.SP.12m.S17/ALL",
"agrafiotis2021a/TNFR2.SP.18m.S20/ALL"
), load.to.enviroment = T, combine.objects = T)
#Integrate samples into a single VGM object
<- VDJ_GEX_matrix(Data.in = list(agrafiotis2021a_TNFR2.BM.3m.S1_VDJGEXdata,
VGM .12m.S6_VDJGEXdata,
agrafiotis2021a_TNFR2.BM.18m.S9_VDJGEXdata,
agrafiotis2021a_TNFR2.BM.3m.S12_VDJGEXdata,
agrafiotis2021a_TNFR2.SP.12m.S17_VDJGEXdata,
agrafiotis2021a_TNFR2.SP.18m.S20_VDJGEXdata
agrafiotis2021a_TNFR2.SP
),parallel.processing = "mclapply", trim.and.align = T)
The output of the VDJ_GEX_matrix is the main object for all downstream functions in Platypus, as well as Steropodon. We can create this directly into the R session by using the public data available on PlatypusDB. Using PlatypusDB_fetch, we were able to download only a subset of our dataset.
The dataset we will be focusing on throughout this vignette comprises 22 total mouse single-cell repertoires (with both sequenced receptors and transcriptome information) following immunizations with the tumor necrosis factor 2 (TNFR2) - https://www.biorxiv.org/content/10.1101/2021.11.09.467876v2: 5 bone-marrow samples from 3-month old mice (s1-s5), 3 bone-marrow samples from 12-month old mice (s6-s8), 3 bone-marrow samples from 18-month old mice (s9-s11), 5 spleen samples from 3-month old mice (s12-s16), 3 spleen samples from 12-month old mice (s17-s19), and 3 spleen samples from 18-month old mice (s17-s19). For a faster computation time and to potentially observe a greater structural diversity, we have decided to select a single sample from each group (s1, s6, s9, s12, s7, and s20 were selected) when creating our VGM object.
#Annotate the VDJ and GEX objects with the site and age information
1]]$site <- NA
VGM[[1]]$age <- NA
VGM[[
1]]$site[VGM[[1]]$sample_id %in% c('s1', 's2', 's3')] <- 'BM'
VGM[[1]]$site[VGM[[1]]$sample_id %in% c('s4', 's5', 's6')] <- 'SP'
VGM[[1]]$age[VGM[[1]]$sample_id %in% c('s1', 's4')] <- '3 months'
VGM[[1]]$age[VGM[[1]]$sample_id %in% c('s2', 's5')] <- '12 months'
VGM[[1]]$age[VGM[[1]]$sample_id %in% c('s3', 's6')] <- '18 months'
VGM[[
2]]@meta.data$site <- NA
VGM[[2]]@meta.data$age <- NA
VGM[[
2]]@meta.data$site[VGM[[2]]@meta.data$sample_id %in% c('s1', 's2', 's3')] <- 'BM'
VGM[[2]]@meta.data$site[VGM[[2]]@meta.data$sample_id %in% c('s4', 's5', 's6')] <- 'SP'
VGM[[2]]@meta.data$age[VGM[[2]]@meta.data$sample_id %in% c('s1', 's4')] <- '3 months'
VGM[[2]]@meta.data$age[VGM[[2]]@meta.data$sample_id %in% c('s2', 's5')] <- '12 months'
VGM[[2]]@meta.data$age[VGM[[2]]@meta.data$sample_id %in% c('s3', 's6')] <- '18 months' VGM[[
We have annotated the VDJ and GEX objects (parts of the VGM object, see the Platypus introductory vignette!) with the site and age information, which will be used in the downstream plots and analyses.
Lastly, we will check whether our annotations are viable in both the VDJ and GEX, by plotting the richness of CDR3 sequences per age and the UMAP projection per site, for each object respectively.
#Check the VDJ annotations
::VDJ_diversity(VGM[[1]], feature.columns = c('VDJ_cdr3s_aa', 'VJ_cdr3s_aa'), grouping.column = 'age', metric = 'richness') Platypus
## Used group 3 months as a reference for subsampling with 5193 entries
## Subsampled group 12 months to 5193 entries
## Subsampled group 18 months to 5193 entries
#Check the GEX annotations
::DimPlot(VGM[[2]], group.by = 'site') Seurat
Next, we will move to the main function in the Steropodon framework: Steropodon_model, which outputs the Steropodon objects used in the downstream analyses. Steropodon_model employs several deep-learning models for protein folding, mostly relying on alignment/MSA-free models for a lighter local setup and, usually, faster inference. For example, IgFold, OmegaFold, and DeepAb make use of protein language model-based (pLM) architectures, completely avoiding the MSA step used in models such as AlphaFold, whereas ColabFold uses a public server for MSA queries. These can be installed directly on a local machine using Reticulate and called with a minimal download requirement. Moreover, by specifying an environment name for either conda or Python’s virtualenv, Steropodon will install all dependencies of a given model when first calling Steropodon_model. If this fails, we recommend following the installation instructions for your desired protein folding model and then using that environment as input to env.name. For ColabFold, we recommend first installing following these instructions https://github.com/YoshitakaMo/localcolabfold, as Steropodon_model requires the colabfold_batch command to be exported to your path variable.
Therefore, we have highlighted the main design philosophy of the Steropodon pipeline: it can be directly used with a single call on a local machine - it does not require extensive installations of either libraries or sequence databases. This enables the fast exploratory analysis of a given repertoire’s structural features and, most importantly, allows researchers to employ this pipeline on their local machines. Of course, this comes with a computational cost, and the number of structures that can be folded in a reliable amount of time depends on each individual machine. However, Steropodon can also be set up on a cluster, while the gpu parameter in Steropodon_model will enable GPU usage and faster inference.
As Steropodon uses MIXCR to determine variable regions and productive sequences, we first must call VDJ_call_MIXCR on our VDJ object!
#Input your mixcrtool directory
<- '../tools/mixcrtool'
mixcr_directory
<- VGM[[1]]
VDJ <- VDJ_call_MIXCR(VDJ = VDJ, mixcr.directory = mixcr_directory, species = 'mmu', simplify = F)
VDJ
#VGM[[1]] <- VDJ
#saveRDS(VGM, 'custom_tnfr2.rds')
IgFold (https://www.biorxiv.org/content/10.1101/2022.04.20.488972v1) predicts antibody structures (either heavy, light, or both chains) from a given sequence by first converting the amino-acid tokens into a vector embedding by using the BERT-based AntiBERTy antibody language model, then using several graph transformer and invariant point attention layers to predict the spatial coordinates from these embeddings. Its inference time is demonstrably faster compared to AlphaFold’s, with its prediction accuracy coming close to the mentioned SOTA (with the worse overall prediction in the CDR3 regions).
IgFold will be installed with all its prerequisites and its weights will be downloaded when first calling Steropodon_model with model = ‘igfold’. Moreover, several other parameters for inference can be added in the additional.model.parameters parameter (such as whether structures should be refined using OpenMM: additional.model.parameters = list(igfold.refine = T, igfold.use.openmm = T)), all documented in the call_igfold subroutine.
<- Steropodon_model(VDJ, model = 'igfold', sequence.type = 'VDJ.VJ',
steropodon_igfold max.clonotypes = 10, max.per.clonotype = 1,
additional.model.parameters = list(igfold.refine = T, igfold.use.openmm = T),
save.rds = T, save.dir = './steropodon_RDS',
use.conda = T, env.name = 'vignette_env'
)
$s1 steropodon_igfold
## $clonotype1
## $clonotype1$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype1 Rank: 1
## Number of cells: 587
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANSYGNTYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQGTHQPRTFGGGTKLEIK
##
##
##
##
## $clonotype2
## $clonotype2$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype2 Rank: 1
## Number of cells: 211
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYGSSYWYFDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
##
##
##
##
## $clonotype3
## $clonotype3$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype3 Rank: 1
## Number of cells: 92
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYSNYWYFDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
##
##
##
##
## $clonotype4
## $clonotype4$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype4 Rank: 1
## Number of cells: 147
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: QVQLKQSGAELVRPGASVKLSCKASGYTFTDYYINWVKQRPGQGLEWIARIYPGSGNTYYNEKFKGKATLTAEKSSSTAYMQLSSLTSEDSAVYFCARDDGYYPYYFDYWGQGTTLTVSSDIVLTQSPASLAVSLGQRATISCKASQSVDYDGDSYMNWYQQKPGQPPKLLIYAASNLESGIPARFSGSGSGTDFTLNIHPVEEEDAATYYCQQSNEDPYTFGGGTKLEIK
##
##
##
##
## $clonotype5
## $clonotype5$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype5 Rank: 1
## Number of cells: 58
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYPGSSYWYFDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
##
##
##
##
## $clonotype6
## $clonotype6$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype6 Rank: 1
## Number of cells: 54
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYGSSWYFDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
##
##
##
##
## $clonotype7
## $clonotype7$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype7 Rank: 1
## Number of cells: 4
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLQQSGPDLVKPGASVKISCRASGYTFTDYNMDWVKQRHGTSLEWIGDINPRNGGTIYDQKFKGKATLTVDKSSSTAYMELRSLTSEDTAVYYCARLTGTSYDAMSHWGQGTSVTVSSDIQMTQSPASLSASVGETVTITCRPSENIYSYLAWYQQKQGKSPQLLVYNGKILAEGVPSRFSGSGSGTHFSLRINSLQPEDFGTYYCQHHYGVPRTFGGGSKLEIE
##
##
##
##
## $clonotype8
## $clonotype8$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype8 Rank: 1
## Number of cells: 6
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EMKFLQSGGGLVQPGGSLRLSCAASGIDFSRYWMSWVRRAPGKGLEWIGEINPDSSTINYAPSLKDKFIISRDNAKNTLYLQMSKVRSEDTALYYCARHYYGRGGHFDVWGTGTTVTVSSQIVLSQSPAIMSASLGERVTMTCTASSSVSSTYFHWYQQKPGSSPKLWIYGTSNLASGVPTRFSGSGSGTSYSLTISSMEAEDAATYYCHQYHRSPLTFGAGTKLELK
##
##
##
##
## $clonotype9
## $clonotype9$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype9 Rank: 1
## Number of cells: 8
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVQLVESGGGLVKPGGSLKLSCAASGFTFSDYGMHWVRQAPEKGLEWVVYISSGSSIIYYADTVKGRFTISRDNAKNTLFLQMTSLRSEDTAMYYCARSYYYTMDYWGQGTSVTVSSDIVMTQSPSSLAMSVGQKVTMSCKSSQSLLNSSNQKNYLAWYQQKPGQSPKLLVYFASTRESGVPDRFIGSGSGTDFTLTISSVQAEDLADYFCQQHYGFPFTFGSGTKLEIK
##
##
##
##
## $clonotype10
## $clonotype10$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype10 Rank: 1
## Number of cells: 38
## Structures available: main, H, L, CDRH3, CDRL3
## Sequence: EVKLVESGGGLVQSGRSLRLSCATSGFTFSDFYMEWVRQAPGKGLEWIAASRNKANDYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDGYDWYFDVWGTGTTVTVSSDIQMNQSPSSLSASLGDTITITCHASQNINVWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQGQSYPFTFGSGTKLEIK
As we have seen, only the most frequent sequence from the most expanded 10 clonotypes has been selected, per each sample. The sequence.type parameter dictates whether to fold full structures (both heavy and light chains, sequence.type = ‘VDJ.VJ’), heavy chains only (sequence.type = ‘VDJ’), or only light chains (sequence.type = ‘VJ’). The resulting nested list of objects can be saved as an RDS if save.rds = T in the directory specified in save.dir.
We can also plot individual structures using Steropodon_visualize. By default, the VDJ and VJ CDR regions will be highlighted: this can be customized by changing the color.by and additional.color.features parameters, as we will see shortly. The r3dmol R package must be installed!
$s1$clonotype1$`1` %>% Steropodon_visualize(additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
<- Steropodon_model(VDJ, model = 'omegafold', model.folder = '/Users/tudorcotet/OmegaFold',
steropodon_omegafold sequence.type = 'VDJ',
max.clonotypes = 1, max.per.clonotype = 1,
save.rds = T, save.dir = './steropodon_RDS',
use.conda = T, env.name = 'vignette_env'
)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(VDJ_regions)` instead of `VDJ_regions` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Finished folding using omegafold
$s1$clonotype1$`1` %>% Steropodon_visualize(additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_omegafold
As OmegaFold can only model a single chain, the main structure in the structure slot of our Steropodon object corresponds to the heavy chain only.
As we have seen above, the Steropodon_model function outputs a nested list of Steropodon objects: the first nested layer is determined by the sample ID, followed by clonotype and by the sequence rank per clonotype (rank 1 signifies the most frequent sequence/structure in a given clonotype from a specific sample).
The Steropodon object is a container of multiple structures and analysis results: it contains the main structure (in our case, the full-chain antibodies modelled by IgFold) in the structure slot, the heavy chain structure in the H slot, light chain in L, as well as the CDRH3 and CDRL3 regions in their respective slots. All structures are saved as bio3d’s pdb objects, therefore all bio3d methods are compatible with Steropodon.
Lastly, each structure is assigned a unique ID, determined by the structure_id slot. The unique cell barcodes per each structure are also saved in the barcodes slot.
#The heavy chain pdb object from bio3d
$s1$clonotype1$`1`@H steropodon_igfold
##
## Call: trim.pdb(pdb = pdb, inds = idx)
##
## Total Models#: 1
## Total Atoms#: 567, XYZs#: 1701 Chains#: 1 (values: VDJ)
##
## Protein Atoms#: 567 (residues/Calpha atoms#: 115)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 0 (residues: 0)
## Non-protein/nucleic resid values: [ none ]
##
## Protein sequence:
## EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYY
## LDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVS
##
## + attr: atom, helix, sheet, seqres, xyz,
## calpha, call
#The atom dataframe from the bio3d object
head(steropodon_igfold$s1$clonotype1$`1`@H$atom)
## type eleno elety alt resid chain resno insert x y z o b
## 1 ATOM 1 N <NA> GLU VDJ 1 <NA> -17.259 8.461 -1.762 1 0.42
## 2 ATOM 2 CA <NA> GLU VDJ 1 <NA> -15.838 8.544 -1.447 1 0.42
## 3 ATOM 3 C <NA> GLU VDJ 1 <NA> -15.273 7.177 -1.086 1 0.42
## 4 ATOM 4 CB <NA> GLU VDJ 1 <NA> -15.602 9.529 -0.312 1 0.42
## 5 ATOM 5 O <NA> GLU VDJ 1 <NA> -15.942 6.369 -0.441 1 0.42
## 6 ATOM 6 N <NA> VAL VDJ 2 <NA> -14.042 6.865 -1.477 1 0.37
## segid elesy charge region
## 1 <NA> <NA> <NA> FR1
## 2 <NA> <NA> <NA> FR1
## 3 <NA> <NA> <NA> FR1
## 4 <NA> <NA> <NA> FR1
## 5 <NA> <NA> <NA> FR1
## 6 <NA> <NA> <NA> FR1
We can see that Steropodon automatically renames each chain as either VDJ or VJ, and it also add the variable regions annotations in the region column
unique(steropodon_igfold$s1$clonotype1$`1`@H$atom$region)
## [1] "FR1" "CDR1" "FR2" "CDR2" "FR3" "CDR3" "FR4"
We will only plot the heavy chain stucture, saved in the ‘H’ slot of our Steropodon object.
#Plotting the heavy chain structure
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'H', additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
CDR3 regions are saved in the ‘CDRH3’ and ‘CDRL3’ slots - we have also colored this structure by residues, by setting color.by to ‘resid’.
#Plotting the CDRH3 structure
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'CDRH3', color.by = 'resid', specific.color.values = NULL, additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
As we have seen above, the Steropodon_visualize function outputs and interactive plot of the desired structure (or a simple plot if additional.r3dmol.code = ‘r3dmol::m_png()’), taking as input a single Steropodon object, a list of objects, or a nested list as obtained from Steropodon_model.
Two or more structures can be visualized simultaneously using Steropodon_visualize if the input is a list. In this case, we will only visualize the heavy chains.
list(steropodon_igfold$s1$clonotype1$`1`, steropodon_igfold$s2$clonotype1$`1`) %>% Steropodon_visualize(structure = 'H', show.struct.label = T, show.label = F, additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)')
#show.label controls the feature labels determined by the color.by and specific.color.values parameters
#show.struct.label controls the unique indentifier of a structure (its index in the input list)
The color.by and specific.color.values paramaters control which residue-level features to be plotted. By default, color.by will plot the chain and variable regions pasted together (e.g., ‘VDJ_CDR3’).
We will next highlight each individual chain in our full-chain structure.
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'structure', color.by = 'chain', specific.color.values = c('darkred' = 'VDJ', 'teal' = 'VJ'), additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
#Only coloring by chains, with new colors specified in specific.color.values (as a named vector of color = feature value).
The ‘region’ column in each PDB object separates each structure into its constituent framework ad complementarity-determining regions. These were assigned using the region annotations obtained from the MIXCR software.
In the following example, we have highlighted each region on the light chain.
<- readRDS('./steropodon_RDS/steropodon_object.rds')
steropodon_igfold $s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'L', color.by = 'region', specific.color.values = NULL, additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
#If specific.color.values is set to NULL, Steropodon_visualize will automatically pick colors for each unique features from the grDevices::rainbow palette.
Steropodon_visualize will automatically plot continuous/numeric features that are specified in color.by, such as the ones obtained from Steropodon_properties (e.g., charge, hydrophobicity). We will also showcase this when presenting the Steropodon_properties function.
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'structure', color.by = 'resno', additional.r3dmol.code = 'r3dmol::m_png(width = 850, height = 500)') steropodon_igfold
#Colored by the residue number/id
Steropodon includes two methods to perform structural superposition in the Steropodon_superposition function: a wrapper to the the Kabsch algorithm as implemented in the bio3d::fit.xyz function of coordinate-only superposition (structural alignment, indicated in the structure.superpose parameter), and a modified version of the iterative sequence and structure alignment method implemented in bio3d::struct.aln (sequence alignment is performed using MAFFT in our case, which should be installed following the instructions https://mafft.cbrc.jp/alignment/software/).
The superposition template can be indicated in the steropodon.template parameter - it should be a Steropodon object which should serve as template for aligning all subsequent structures in the nested list. If not specified, it will select the first Steropodon object from the nested list. Lastly, alignment can be done to an invariable core structure as obtained from the Steropodon_find_core function (if fit.to.core is set to T and Steropodon_find_core was called before/ the core slot of a Steropodon object contains a structure).
We recommend performing superposition as a structure pre-processing step before obtaining coordinate features from Steropodon_coordinates or before other comparisons. Superposition can also be performed internally in the Steropodon_distances function between each pair of structures, as we will see.
In the following code, we have performed structure-only superposition, then sequence alignment followed by the iterative superposition algorithm. We have also compared the superposition results by visualizing 2 structures in the same plot (first structure was also chosen as the superposition template).
#Structure-only alignment using the Kabsch algorithm
<- Steropodon_superposition(steropodon_igfold, sequence.structure.superpose = F, structure.superpose = T) superposed_struct
## MAC system detected
#Sequence and structure iterative alignment - the maximum number of fitting cycles can be specified in max.cycles, and the RMSD threshold between the 2 fitted structures in the cutoff parameter.
<- Steropodon_superposition(steropodon_igfold, sequence.structure.superpose = T, structure.superpose = F,
superposed_seq_struct max.cycles = 10, cutoff = 0.5)
## MAC system detected
Coordinate-only superposition via the Kabsch algorithm:
list(superposed_struct$s1$clonotype1$`1`, superposed_struct$s5$clonotype1$`1`) %>% Steropodon_visualize(structure = 'structure', show.struct.label = T, show.label = F)
Sequence alignment and iterative superposition:
list(superposed_seq_struct$s1$clonotype1$`1`, superposed_seq_struct$s5$clonotype1$`1`) %>% Steropodon_visualize(structure = 'structure', show.struct.label = T, show.label = F)
Structures from a nested list of Steropodon objects can be separated into multiple structures or trimmed into a single region, as determined by the grouping and specific.values parameters in Steropodon_trim.
For example, grouping = ‘chain’ and specific.values = ‘VDJ’ will only keep the heavy chain of our full-chain structures. These new structures can be used in obtaining coordinate features or in distance comparisons, as we will see.
$s1$clonotype1$`1` %>%
steropodon_igfoldSteropodon_trim(structure = 'structure', grouping = 'chain', specific.values = 'VDJ') %>% #Will modify the full-chain PDB in the 'structure' slot
Steropodon_visualize(structure = 'structure', color.by = 'chain', specific.color.values = NULL)
The combine.values parameter determines whether the regions inserted in specific.values should all be combined in the same final structure. For example, to obtain a new structure with only VDJ and VJ CDR regions we can set combine.values to T and combine.grouping to T (which will paste together all regions if the grouping parameter is a vector of structural region/feature columns). This is useful for calculating distances only across the CDR regions using Steropodon_distances.
$s1$clonotype1$`1` %>%
steropodon_igfoldSteropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>% #combine.values will output all regions in specific.values in the same structure
Steropodon_visualize(structure = 'structure', color.by = c('chain', 'region'))
The Steropodon_distances function incorporates a wide range of distance metrics for protein structures, defined by the distance.metric parameter, for intra and cross-repertoire structural comparisons. It currently includes the Root Mean Square Deviation/ RMSD (distance.metric = ‘rmsd’), the Alphafold implementation of the Local Distance Difference Test/ lDDT (distance.metric = ‘lddt’ modified from https://github.com/deepmind/alphafold/blob/main/alphafold/model/lddt.py), the Template Modelling Score/ TM-score (distance.metric = ‘tmscore’), as well as a custom weighted sum of RMSD distances per region, which can be normalized by the number of residues (as originally implemented in https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009675, distance.metric = ‘custom’).
Root Mean Square Deviation/ RMSD:
#distance.metric = 'rmsd'
%>% Steropodon_distances(distance.metric = 'rmsd') superposed_seq_struct
Local Distance Difference Test/ lDDT requires the shared residues between each pair of structures, therefore structures must be sequence and structure superposed (seq.struct.superpose = T; struct.superpose will only perform structural superposition). It is, therefore, calculated only for the C-alpha atoms.
#distance.metric = 'lddt'
%>% Steropodon_distances(distance.metric = 'lddt', seq.struct.superpose = T)
superposed_seq_struct #Each pair of structures needs to be superposed
Similarly, the TM-score also requires the common residues between two structures, therefore seq.struct.superpose must be set to TRUE.
#distance.metric = 'tmscore'
%>% Steropodon_distances(distance.metric = 'tmscore', seq.struct.superpose = T) superposed_seq_struct
The weighted RMSD sum by region offers a greater flexibility and can incorporate some meaningful B-cell biology priors, such as the greater structural and sequence diversity of the CDR regions. The custom.specific.values defines which regions will be used from the entire structure, with the column names defining those regions in our PDB object being specified in custom.features. The custom.weights parameter is a vector of weights for each region in the custom.specific.values - in this case, we will assign a greater weight to the heavy chain CDR3 region. Lastly, each region’s RMSD distance can be normalized by its number of residues if custom.normalize.sequence.length is set to T.
This metric also requires each pair of structures that is compared to be superposed beforehand, therefore seq.struct.superpose must be TRUE.
#distance.metric = 'custom'
#WILL ONLY CALCULATE DISTANCES BETWEEN THE C-ALPHA ATOMS OF SHARED RESIDUES IN THE HEAVY CHAIN CDR REGIONS!
%>% Steropodon_distances(distance.metric = 'custom',
superposed_seq_struct custom.specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3'), #Selecting only the VDJ CDR regions
custom.features = c('chain', 'region'), #Region names (will paste together)
custom.weights = c(1,1,3), #Greater weights for CDR3s
custom.normalize.sequence.length = T, #Normalizing regions by number of residues
seq.struct.superpose = T)
#Each pair of structures needs to be superposed
We can also trim our structures beforehand (using Steropodon_trim) and then calculate the RMSD - here we select only the CDR regions of both the heavy and light chains! In this case, we have kept only the CDR regions of both heavy and light chains - this increases the overall RMSD as opposed to the previous RMSD heatmap on whole structure.
%>%
superposed_seq_struct Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_distances()
#Each pair of structures needs to be superposed
If plot.results is set to F, Steropodon_distances will output the symmetric distance matrix instead of a heatmap. This distance matrix can be used for multidimensional scaling in Steropodon_distances_MDS, or for hierarchical and k-medoids clustering in Steropodon_cluster.
The resulting scatter plot will be colored as specified in color.by - for this, we require the VGM/VDJ object as well with the columns mentioned in color.by.
#MDS FOR ENTIRE STRUCTURE
<-superposed_seq_struct %>%
distance_matrix Steropodon_distances(distance.metric = 'rmsd', plot.results = F) #Setting plot results to F to obtain a distance matrix
Steropodon_distances_MDS(steropodon.object = superposed_seq_struct,
distance.matrix = distance_matrix,
color.by = c('site', 'age', 'sample', 'VDJ_vgene'),
VGM = VGM
)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
When considering MDS distances across the entire structure, we can see how performing MDS is not very informative, with no specific clusters forming across our structures.
We will next trim the structures and keep only the CDR regions.
#MDS FOR CDR REGIONS
<- superposed_seq_struct %>%
distance_matrix Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_distances(distance.metric = 'rmsd', plot.results = F)
Steropodon_distances_MDS(steropodon.object = superposed_seq_struct,
distance.matrix = distance_matrix,
color.by = c('site', 'age', 'sample', 'VDJ_vgene'),
VGM = VGM
)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
Invariant cores can be inferred using bio3d’s find.core implementation in the Steropodon_find_core wrapper. Cores can be predicted across the entire nested list of Steropodon object (core.per = ‘all’), across only samples (core.per = ‘sample’), or across both samples and clonotypes (single core per unique clonotype, core.per = ‘clonotype’).
#Predicting the invariant core - a new 'core' slot is added to each object in the nested list
<- steropodon_igfold %>% Steropodon_find_core(structure = 'structure', core.per = 'all') steropodon_igfold_core
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
## core size 205 of 206 vol = 15.168
## core size 204 of 206 vol = 9.2
## core size 203 of 206 vol = 7.426
## core size 202 of 206 vol = 5.993
## core size 201 of 206 vol = 5.371
## core size 200 of 206 vol = 4.791
## core size 199 of 206 vol = 4.49
## core size 198 of 206 vol = 4.31
## core size 197 of 206 vol = 4.154
## core size 196 of 206 vol = 3.992
## core size 195 of 206 vol = 3.867
## core size 194 of 206 vol = 3.756
## core size 193 of 206 vol = 3.673
## core size 192 of 206 vol = 3.563
## core size 191 of 206 vol = 3.456
## core size 190 of 206 vol = 3.347
## core size 189 of 206 vol = 3.251
## core size 188 of 206 vol = 3.168
## core size 187 of 206 vol = 3.104
## core size 186 of 206 vol = 3.026
## core size 185 of 206 vol = 2.977
## core size 184 of 206 vol = 2.902
## core size 183 of 206 vol = 2.856
## core size 182 of 206 vol = 2.792
## core size 181 of 206 vol = 2.727
## core size 180 of 206 vol = 2.671
## core size 179 of 206 vol = 2.615
## core size 178 of 206 vol = 2.562
## core size 177 of 206 vol = 2.502
## core size 176 of 206 vol = 2.441
## core size 175 of 206 vol = 2.378
## core size 174 of 206 vol = 2.328
## core size 173 of 206 vol = 2.28
## core size 172 of 206 vol = 2.233
## core size 171 of 206 vol = 2.183
## core size 170 of 206 vol = 2.135
## core size 169 of 206 vol = 2.088
## core size 168 of 206 vol = 2.054
## core size 167 of 206 vol = 2.015
## core size 166 of 206 vol = 1.972
## core size 165 of 206 vol = 1.931
## core size 164 of 206 vol = 1.887
## core size 163 of 206 vol = 1.847
## core size 162 of 206 vol = 1.815
## core size 161 of 206 vol = 1.78
## core size 160 of 206 vol = 1.75
## core size 159 of 206 vol = 1.722
## core size 158 of 206 vol = 1.687
## core size 157 of 206 vol = 1.661
## core size 156 of 206 vol = 1.63
## core size 155 of 206 vol = 1.602
## core size 154 of 206 vol = 1.571
## core size 153 of 206 vol = 1.541
## core size 152 of 206 vol = 1.511
## core size 151 of 206 vol = 1.481
## core size 150 of 206 vol = 1.454
## core size 149 of 206 vol = 1.422
## core size 148 of 206 vol = 1.399
## core size 147 of 206 vol = 1.371
## core size 146 of 206 vol = 1.343
## core size 145 of 206 vol = 1.311
## core size 144 of 206 vol = 1.287
## core size 143 of 206 vol = 1.259
## core size 142 of 206 vol = 1.237
## core size 141 of 206 vol = 1.214
## core size 140 of 206 vol = 1.19
## core size 139 of 206 vol = 1.166
## core size 138 of 206 vol = 1.141
## core size 137 of 206 vol = 1.116
## core size 136 of 206 vol = 1.09
## core size 135 of 206 vol = 1.067
## core size 134 of 206 vol = 1.043
## core size 133 of 206 vol = 1.022
## core size 132 of 206 vol = 0.995
## core size 131 of 206 vol = 0.97
## core size 130 of 206 vol = 0.947
## core size 129 of 206 vol = 0.922
## core size 128 of 206 vol = 0.898
## core size 127 of 206 vol = 0.874
## core size 126 of 206 vol = 0.842
## core size 125 of 206 vol = 0.823
## core size 124 of 206 vol = 0.804
## core size 123 of 206 vol = 0.786
## core size 122 of 206 vol = 0.769
## core size 121 of 206 vol = 0.744
## core size 120 of 206 vol = 0.729
## core size 119 of 206 vol = 0.7
## core size 118 of 206 vol = 0.668
## core size 117 of 206 vol = 0.647
## core size 116 of 206 vol = 0.615
## core size 115 of 206 vol = 0.581
## core size 114 of 206 vol = 0.563
## core size 113 of 206 vol = 0.532
## core size 112 of 206 vol = 0.512
## core size 111 of 206 vol = 0.482
## FINISHED: Min vol ( 0.5 ) reached
## # 112 positions (cumulative volume <= 0.5 Angstrom^3)
## start end length
## 1 20 22 3
## 2 35 40 6
## 3 45 48 4
## 4 50 50 1
## 5 78 81 4
## 6 89 96 8
## 7 105 113 9
## 8 118 122 5
## 9 129 142 14
## 10 154 161 8
## 11 165 175 11
## 12 179 187 9
## 13 190 197 8
## 14 199 200 2
## 15 202 211 10
## 16 217 226 10
$s1$clonotype1$`1` steropodon_igfold_core
## Steropodon object
## Sample: s1 Clonotype: clonotype1 Rank: 1
## Number of cells: 587
## Structures available: main, PDBs, core, H, L, CDRH3, CDRL3
## Sequence: EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANSYGNTYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQGTHQPRTFGGGTKLEIK
Furthermore, the core structure is saved in the core slot of a Steropodon object and can be visualized using Steropodon_visualize. Core annotations have also been added to the main structure, in the ‘core’ column.
We will next visualize the core structure.
#Visualizing the core structure
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'core') steropodon_igfold_core
We can also color an existing structure by its core regions.
#Visualizing the full-chain structure with the core colored in
$s1$clonotype1$`1` %>% Steropodon_visualize(structure = 'structure', color.by = 'core', specific.color.values = NULL) steropodon_igfold_core
Lastly, cores can be used in structure superposition via Steropodon_superpositions if fit.to.core is set to T.
<- steropodon_igfold_core %>% Steropodon_superposition(structure = 'structure', fit.to.core = T, structure.superpose = T, sequence.structure.superpose = F) core_fit
## MAC system detected
The Steropodon_coordinates function can be used to extract the spatial coordinate vectors per structure, resulting in a matrix that can be used for downstream analyses/clustering. Moreover, several dimensionality reduction methods are also implemented (PCA if dim.reduction = ‘pca’, ‘tsne’ for T-SNE, ‘umap’ for UMAP) to visualize these feature vectors in a scatter plot, with points colored by the VDJ/VGM column specified in the color.by parameter.
The coordinate vectors can be aligned by performing sequence alignment on the Steropodon object, resulting in a bio3d pdbs object. If use.pdbs is set to T in Steropodon_coordinates, these vectors will be used in the scatter plot or will be in the matrix output (plot.results set to F) - these vectors correspond only to the C-alpha atoms. If use.pdbs is set to F, the resulting feature matrix will include all unaligned coordinate vectors, matching the length of the largest coordinate vector from the Steropodon nested list (with NA values added to match lengths).
If use.pdbs is T, we need the bio3d pdbs object, obtained by calling Steropodon_sequence_alignment on your nested list of superposed Steropodon objects.
<- superposed_seq_struct %>% Steropodon_sequence_alignment(structure = 'structure') superposed_w_pdbs
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
<- superposed_seq_struct %>% Steropodon_sequence_alignment(structure = 'CDRH3') superposed_w_pdbs_CDRH3
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
#A pdbs slot was added to the Steropodon object
$s1$clonotype1$`1` superposed_w_pdbs
## Steropodon object
## Sample: s1 Clonotype: clonotype1 Rank: 1
## Number of cells: 587
## Structures available: main, PDBs, H, L, CDRH3, CDRL3
## Sequence: EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANSYGNTYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQGTHQPRTFGGGTKLEIK
We can see how a new ‘pdbs’ slot has been added to our Steropodon object. This must be non-null if use.pdbs is set to TRUE in Steropodon_coordinates.
$s1$clonotype1$`1`@pdbs superposed_w_pdbs
## 1 . . . . . 60
## s1.clonotype1.1 EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYM--AWVRQVPEKGLEWVANI--N-YDG
## s1.clonotype2.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s1.clonotype3.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s1.clonotype4.1 QVQLKQSGAELVRPGASVKLSCKASGYTFTDYYI--NWVKQRPGQGLEWIARI--Y-PGS
## s1.clonotype5.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s1.clonotype6.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s1.clonotype7.1 EVQLQQSGPDLVKPGASVKISCRASGYTFTDYNM--DWVKQRHGTSLEWIGDI--N-PRN
## s1.clonotype8.1 EMKFLQSGGGLVQPGGSLRLSCAASGIDFSRYWM--SWVRRAPGKGLEWIGEI--N-PDS
## s1.clonotype9.1 EVQLVESGGGLVKPGGSLKLSCAASGFTFSDYGM--HWVRQAPEKGLEWVVYI--S-SGS
## s1.clonotype10.1 EVKLVESGGGLVQSGRSLRLSCATSGFTFSDFYM--EWVRQAPGKGLEWIAAS--RNKAN
## s2.clonotype1.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s2.clonotype1.2 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s2.clonotype1.3 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s2.clonotype2.1 EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYM--AWVRQVPEKGLEWVANI--N-YDG
## s2.clonotype3.1 QVQLQQPGTEMVKPGTSVKLSCKASGYVLTNYWM--HWVRQRPGQGLEWIGNI--S-PNN
## s2.clonotype4.1 QVQLQQSGPELVKPGASVKISCKASGYSFSSSWM--NWVKQRPGKGLEWIGRI--Y-PGD
## s2.clonotype5.1 QVQLQQPGAELVKPGASVKLSCKTSGYTFTSYWM--QWVKQRPGQGLEWIGEI--D-PSD
## s2.clonotype6.1 QVQLKQSGTELVKPGASVKISCKASGYTFTDNFI--NWVKQRPGQGLEWIGKI--G-PGS
## s2.clonotype7.1 QVQLKQSGAELVRPGASVKLSCKASGYTFTDYYI--NWVKQRPGQGLEWIARI--Y-PGS
## s2.clonotype8.1 QVQLQQSGAEMARPGASVKMSCLTSGYTFTSYTM--HWIRQRPGQGLEWIGYI--N-LSS
## s2.clonotype9.1 EVKLLQSGGGLVQPGGSLKLSCAASGIDFSRYWM--SWVRRAPGKGLEWIGEI--N-PDS
## s2.clonotype10.1 EVQLQQSGAELVRPGASVKLSCTASGFNIKDYYM--HWVKQRPEQGLEWIGRI--D-PED
## s3.clonotype1.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s3.clonotype1.2 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s3.clonotype2.1 QVQLQQSGAELVKPGASVKMSCKASGYTFTTYPI--EWMKQNHGKSLEWIGNF--H-PYN
## s3.clonotype3.1 QVQLQQPGTEVVKPGSSVKLSCKTSGFRFTNYWM--EWVKQRPGQGLEWLGMI--H-PNS
## s3.clonotype4.1 EVKLVESGGGLVQPGGSLSLSCAASGFTFTDYYM--SWVRQPPGKALEWLGFIRNK-ANG
## s3.clonotype5.1 EVMLVESGGGLVKPGGSLKLSCAASGFTFSSYTM--SWVRQTPEKRLEWVATI--S-GGG
## s3.clonotype5.2 EVMLVESGGGLVKPGGSLKLSCAASGFTFSSYTM--SWVRQTPEKRLEWVATI--S-GGG
## s3.clonotype6.1 EVKLLQSGGGLVQPGGSLKLSCAASGIDFSRYWM--SWVRRAPGKGLEWIGEI--N-PDS
## s3.clonotype7.1 QVQLKESGPGLVAPSQNLSITCTVSGFSLTTSGV--TWMRQPPGKGLEWLGVI----WGG
## s3.clonotype8.1 QVQLKESGPGLVAPSQSLSITCTVSGFSLTSYAI--SWVRQPPGKGLEWLGVI----WTG
## s3.clonotype9.1 QVQVQQPGAELVKPGSSVKMSCKASGYTFTSYWI--TWVRRRPGQGLEWIGDI--Y-PGS
## s3.clonotype10.1 QVQLQQSGAELVRPGTSVKVSCKASGYAFTNYLI--EWVKQRPGQGLEWIGVI--N-PGS
## s4.clonotype1.1 EVQLVESGGDLVKPGGSLKLSCAASGFTFRSYDM--SWVRQTPDKRLEWVATI--S-SGG
## s4.clonotype2.1 QVQLQQPGAELVKPGASVKMSCKASGYTFTSYWI--TWVKQRPGQGLEWIGDI--Y-PGS
## s4.clonotype4.1 QVQLQQSGAELVRPGTSVKMSCKASGYTFTNYWI--GWAKQRPGHGLEWIGDI--Y-PGG
## s4.clonotype6.1 QVTLKESGPGILQSSQTLSLTCSFSGFSLNTSGMGVSWIRQPSGKGLEWLAHIY----WD
## s4.clonotype7.1 EVQLVASGGGLVRPEGSLKLSCAASGFSLNTYAM--NWVRQAPGKGLEWVACM--RSKSN
## s4.clonotype8.1 EVQLQQSGAELVRPGSSVKMSCKTSGYTFTSYGI--NWVKQRPGQGLEWIGYI--Y-LGN
## s4.clonotype11.1 QVQLVETGGGLVRPGNSLKLSCVTSGFTFSNYRM--HWLRQPPGKRLEWIAVI--TVKSD
## s4.clonotype13.1 QVQLQQPGAELIKPGAPVKLSCKASGYTFTNYYI--HWVRQRPGQGLEWIGMI--H-PKS
## s4.clonotype17.1 EVQLQQSGPELVKPGASVKISCKASGYTFTDYYM--NWVKQSHGKSLEWIGDI--N-PNN
## s4.clonotype18.1 EVQLQQSGPVLVKPGASVKMSCKASGYTFTDYYM--NWVKQSHGKSLEWIGVI--N-PYN
## s5.clonotype1.1 EVQLVESGGGLVQPKGSLKLSCAASGFSFNTYAM--NWVRQAPGKGLEWVARI--RSKSN
## s5.clonotype2.1 QVQLQQSGADLVKPGASVKMSCKASGYTFTAYPI--EWMKQSHGKSLEWIGIF--H-PYN
## s5.clonotype3.1 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s5.clonotype3.2 EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWM--SWVRQTPGKTLEWIGDI----NSD
## s5.clonotype4.1 EVQLQQSGPELVKPGASVKISCKASGYTFTDYYM--NWVKQSHGKSLEWIGDI--N-PNN
## s5.clonotype5.1 EVKLEESGGGLVQPGGSMKLSCVASGFTFSNYWM--NWVRQSPDKGLEWISQI--KLKSD
## s5.clonotype6.1 QVQLKQSGAELVRPGASVKLSCKASGYTFTDYYI--NWVKQRPGQGLEWIARI--Y-PGS
## s5.clonotype7.1 QVQLQQSGTELVKPGASVKISCKASGYAFSDYWM--NWVKQRPGKGLEWIGQI--F-PGD
## s5.clonotype8.1 QVQLQQSGAELVRPGASVTLSCKASGYTFTDYEM--YWVKQTPVHGLEWIGAI--D-PET
## s5.clonotype9.1 QVQLQQSGAELVRPGASVTLSCKASGYIFTDYEM--HWVKQTPVHGLEWIGGI--D-PEP
## s5.clonotype11.1 QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWM--HWVKQRPGRGLEWIGRI--D-PNS
## s6.clonotype1.1 QAQLQQSGAELARPGASVKLSCKASGYTFTNYAI--NWVKQRTGQGLEWVGEI--S-PRS
## s6.clonotype2.1 EVMLVESGGGLVKPGGSLKLSCAASGFTFSSYTM--SWVRQTPEKRLEWVATI--S-GGG
## s6.clonotype2.2 EVMLVESGGGLVKPGGSLKLSCAASGFTFSSYTM--SWVRQTPEKRLEWVATI--S-GGG
## s6.clonotype3.1 EVQLFQSGPELVKPGTSVKISCKASGYTFTDYYI--NWVKQSHGQSLEWIGSV--N-PKN
## s6.clonotype4.1 EVKLEESGGGLVKPGGSMKLSCVASGFTFNNYWM--NWVRQSPEKGLEWVAQI--RLKSD
## s6.clonotype5.1 QVQLQQSGAELVRPGASVTLSCKASGYTFTDYEM--YWVKQTPVHGLEWIGAI--D-PET
## s6.clonotype6.1 QVQLQQPGTEVVKPGSSVKLSCKTSGFRFTNYWM--EWVKQRPGQGLEWLGMI--H-PNS
## s6.clonotype7.1 EVQLQQSGPELVQFGTSLKISCMTSGFSFTAYYL--NWVKQGPEKSLEWLGQI--N-PTT
## s6.clonotype10.1 EVQLVESGGDLVKPGGSLKLSCAASGFTFSSYGM--SWVRQTPDKRLEWVATI--S-SGG
## s6.clonotype12.1 EVQLQQSGAELVKPGASVKLSCTASGFNIKDYYI--HWMKQRTDQGLEWIGRI--D-PEN
## s6.clonotype13.1 QAYLQQSGAELVRPGASVKISCKASGYTFTSDNV--HWIKQTPRQGLEWIGAI--Y-PGN
## ^^* ** ^ * ^ ***^
## 1 . . . . . 60
##
## 61 . . . . . 120
## s1.clonotype1.1 SS-TYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDL-D------------
## s1.clonotype2.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSSY-----
## s1.clonotype3.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY-------SNY-----
## s1.clonotype4.1 GN-TYYNEKFKGKATLTAEKSSSTAYMQLSSLTSEDSAVYFCARDD-------GYYPY--
## s1.clonotype5.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYP-----GSSY-----
## s1.clonotype6.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSS------
## s1.clonotype7.1 GG-TIYDQKFKGKATLTVDKSSSTAYMELRSLTSEDTAVYYCAR---LT--GTSYDA---
## s1.clonotype8.1 ST-INYAPSLKDKFIISRDNAKNTLYLQMSKVRSEDTALYYCARHY-YG----------R
## s1.clonotype9.1 SI-IYYADTVKGRFTISRDNAKNTLFLQMTSLRSEDTAMYYCAR---------------S
## s1.clonotype10.1 DYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDG-------------Y
## s2.clonotype1.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY-------SNY-----
## s2.clonotype1.2 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSSY-----
## s2.clonotype1.3 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSSY-----
## s2.clonotype2.1 SS-TYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDL-Y------------
## s2.clonotype3.1 GN-TNYNEKFKTKATLTVDQSSSTAYMLLRSLTSEASAVYYCARTG--------------
## s2.clonotype4.1 GD-TNYNGKFKGKATLTADKSSSTAYMQLSSLTSEDSAVYFCARD---------------
## s2.clonotype5.1 TY-TNYIQNFKGKATLTVDTSSSTAYMQLSSLTSEDSAVYYCARRN-L-----------L
## s2.clonotype6.1 DT-TYYNAKFKGKATLTADKSSCTAYMQLSSLTSEDSAIYFCARGP--------------
## s2.clonotype7.1 GN-TYYNEKFKGKATLTAEKSSSTAYMQLSSLTSEDSAVYFCARE-----------HYSN
## s2.clonotype8.1 DY-TEYNQKFKDKATLTADRSSSTAYLQLSSLTSEDSSVYFCARSG------DY-----D
## s2.clonotype9.1 ST-INYAPSLKDKFIISRDNAKNTLYLQMSKVRSEDTALYYCARNW--------------
## s2.clonotype10.1 GD-TEYAPKFQGKATMTADTSSNTAYLQLSSLTSEDTAVYYC----------TTW----L
## s3.clonotype1.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY-------SNY-----
## s3.clonotype1.2 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSSY-----
## s3.clonotype2.1 DD-TKYNEKFKGKATLTVEKSSSTVYLELSRLTSDDSAVYYCA-----------------
## s3.clonotype3.1 GN-TNFTEKFKTKATMTVDRSSNTAYLQLISLTSEDSAVYFCARER-D------------
## s3.clonotype4.1 YT-TEYSASVKGRFTISRDNSQSILYLQMNALRAEDSATYYCA--S-YY--GNY------
## s3.clonotype5.1 GN-TYYPDSVKGRFTISRDNAKNTLYLQMSSLRSEDTALYYCARDY-Y----G------N
## s3.clonotype5.2 GN-TYYPDSVKGRFTISRDNAKNTLYLQMSSLRSEDTALYYCARDY-Y----G------N
## s3.clonotype6.1 ST-INYAPSLKDKFIISRDNAKNTLYLQMSKVRSEDTALYYCARNW--------------
## s3.clonotype7.1 GS-ANYNSALMSRLSISKDNSKSQVFLRIDSLQTDDTALYYCAK---HGWDDT-------
## s3.clonotype8.1 GG-TNYNSALKSRLSISKDNSKSQVFLKMNSLQTDDTARYYCARKV-HDYDGAW------
## s3.clonotype9.1 GS-SNFNEKFKNTATLTVDTSSSTAYLQLSSLTSEDSAVYYCAR----------------
## s3.clonotype10.1 GG-TNYNEKFKGKAILTEDKSSSTAYMQLSSLTSEDSAVYFCARGK-----GIYY-----
## s4.clonotype1.1 SY-TYYLDSVKGRFTISRDNAKNTLYLQMSSLKSEDTAMYYCVRHR-I----SS-----T
## s4.clonotype2.1 GS-TNYNEKFKSKATLTVDTSSSTAYMQLSSLTSEDSAVYYCARGG-L-----I-----S
## s4.clonotype4.1 GY-TNYNEKFKGKATLTADKSSSTAYMQFSSLTSEDSAIYYCARYD-----GPSW-----
## s4.clonotype6.1 DD-KRYNPSLKSRLTISKDTSRNQVFLKITSVDAADSATYYCARRS-----GRV------
## s4.clonotype7.1 NYATYYADSVKDRVTISRDDSENMFYLQMDNLKTEDTGMYFCV-----------------
## s4.clonotype8.1 GY-TEYNEKFKGMATLTSDTSSSTAYMQLSSLTSDDSAIYFCARQG-------------G
## s4.clonotype11.1 NYGANYAESVKGRFAISRDDSKSSVYLEMNRLREEDTATYFCRG----------------
## s4.clonotype13.1 GT-TLYSEKFKSKATLTVDKSSSTAYMHLNSLTSDDSAVYYCARDY--------------
## s4.clonotype17.1 GG-TSYNQKFKGKATLTVDKSSSTAYMELRSLTSEDSAVYYCARS-----------PYGN
## s4.clonotype18.1 GG-TSYNQKFKGKATLTVDKSSSTAYMELNSLTSEDSAVYYCAREE----------VLIT
## s5.clonotype1.1 NYATYYADSVKDRFTISRDDSESMLYLQMNNLKTEDTAMYYCVRPA--------------
## s5.clonotype2.1 DE-TKYSTNFKNRATLTVEKSSNTVYLELRRLTSEDSAVYFCARGG--------------
## s5.clonotype3.1 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY------GSSY-----
## s5.clonotype3.2 GSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRY-------SNY-----
## s5.clonotype4.1 GG-TSYNQKFKGKATLTVDKSSSTAYMELRSLTSEDSAVYYCARSGEYF--GSRYDPY--
## s5.clonotype5.1 NYATHYAESVKGRFTISRDDSKSTVYLQMNNLRTEDTGIYYCS-----------------
## s5.clonotype6.1 GN-TYYNEKFKGKATLTAEKSSSTAYMQLSSLTSEDSAVYFCARE-----------HYSN
## s5.clonotype7.1 GD-INYNGKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCARGA--------------
## s5.clonotype8.1 GG-TAYNQKFKGKAILTADKSSSTAYMELRSLTSEDSAVYYCTRKE-V------------
## s5.clonotype9.1 GA-SAYYKKFKGKAILTADTSSNTAYMELRSLTSEDSAVYYCTRRN-Y---G-------K
## s5.clonotype11.1 GG-TKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCARPL-I---TTV-----G
## s6.clonotype1.1 GN-TYYTGEFRGKATLTADKSSSTAYMELRSLTSEDSAVYFCARGG-FD--G-------N
## s6.clonotype2.1 GN-TYYPDSVKGRFTISRDNAKNTLYLQMSSLRSEDTALYYCARDY-Y----G------N
## s6.clonotype2.2 GN-TYYPDSVKGRFTISRDNAKNTLYLQMSSLRSEDTALYYCARDY-Y----G------N
## s6.clonotype3.1 GD-DKYNHRFTDKATLNVDKSTTTAYMDLRSLTSEDSAVYYCARWG-F---GS-------
## s6.clonotype4.1 NYAAHYAESVKGRVTISRDDSKSSVYLQMNNLRVEDTGIYYCTG-Y--------------
## s6.clonotype5.1 GG-TAYNQKFKGKAILTADKSSSTAYMELRSLTSEDSAVYYCTRD---------------
## s6.clonotype6.1 GN-TNFTEKFKTKATMTVDRSSNTAYLQLISLTSEDSAVYFCARER-D------------
## s6.clonotype7.1 GG-AVYNQKFKARATMSVDKSASCAYVELTNLTYEDSAIYYCARVA-Y---GSRL-----
## s6.clonotype10.1 SY-TYYPDSVKGRFTISRDNAKNTLYLQMSSLKSEDTAMYYCARHE-YY--GSS-----Y
## s6.clonotype12.1 GE-TKSAPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCA--PIYSYGSSHW-----
## s6.clonotype13.1 GD-TTYNQKFKDKATLTADKSSSTANMQLSSLTSEDSAVYFCAREDLYF---GDF-----
## ^ ^ ^ ^ ^ *^*
## 61 . . . . . 120
##
## 121 . . . . . 180
## s1.clonotype1.1 -YF--------DYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANS-YGN
## s1.clonotype2.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s1.clonotype3.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s1.clonotype4.1 --Y-------FDYWGQGTTLTVSSDIVLTQSPASLAVSLGQRATISCKASQSVDYD--GD
## s1.clonotype5.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s1.clonotype6.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s1.clonotype7.1 ----------MSHWGQGTSVTVSSDIQMTQSPASLSASVGETVTITCRPSENI------Y
## s1.clonotype8.1 GGH-------FDVWGTGTTVTVSSQIVLSQSPAIMSASLGERVTMTCTASSSVSS-----
## s1.clonotype9.1 YYYT------MDYWGQGTSVTVSSDIVMTQSPSSLAMSVGQKVTMSCKSSQSLLNSSNQK
## s1.clonotype10.1 DWY-------FDVWGTGTTVTVSSDIQMNQSPSSLSASLGDTITITCHASQNI------N
## s2.clonotype1.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s2.clonotype1.2 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s2.clonotype1.3 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s2.clonotype2.1 -YF--------DYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANS-YGN
## s2.clonotype3.1 TDY-------FDYWGQGTTLTVSSDIVMTQSPATLSVTPGERVSLSCRASQSISDH----
## s2.clonotype4.1 -YYGSS----FDYWGQGTTLTVSSDIVMTQATFSNPVTLGTSASISCRSSKSLLHS-NGI
## s2.clonotype5.1 WSF-------FDYWGQGTTLTVSSQIVLSQSPAILSASPGEKVTMTCRASSSV-------
## s2.clonotype6.1 TDY-------FDYWGQGTTLTVSSDIVMTQSPAILSVTPGDRVSLSCRASQSISDF----
## s2.clonotype7.1 YGY-------FDVWGTGTTVTVSSDIQMTQTTSSLSASLGDRVTISCRASQDI------S
## s2.clonotype8.1 GYYA------IGYWGQGTSVTVSSDILLTQSPAFLSVNPGERVNFSCRASQGIGTS----
## s2.clonotype9.1 DVY-------FDVWGTGTTVTVSSDIVLTQSPASLAVSLGQRATISCRASKSVSTS--GY
## s2.clonotype10.1 LQY-------FDVWGTGTTVTVSSNIVMTQSPKSMSMSVGERVTLSCKASENV------G
## s3.clonotype1.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s3.clonotype1.2 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s3.clonotype2.1 -IY-------YGNWGQGTTLTVSSQILLTQSPAIMSASPGEKVTMTCSASSSV-------
## s3.clonotype3.1 -YFGSG----FAYWGQGTLVTVSADVLMTQIPLSLSVSLGDQASISCKSSQSIVHS-NGN
## s3.clonotype4.1 -WY-------FDVWGTGTTVTVSSDIVLTQSPATLSVTPGDSVSLSCRASQSISNN----
## s3.clonotype5.1 IYYV------MDYWGQGTSVTVSSDVLMTQTPLSLPVSLGDQASISCRSSQSIVHS-NGN
## s3.clonotype5.2 IYYA------MDYWGQGTSVTVSSDVLMTQTPLSLPVSLGDQASISCRSSQSIVHS-NGN
## s3.clonotype6.1 DVG-------FDYWGQGTTLTVSSDIQMTQTTSSLSASLGDRVTISCRASQDIS------
## s3.clonotype7.1 ----------LDYWGQGTSVTVSSEIVVSQSPTTMTASPGEKITITCSASSSF-----SS
## s3.clonotype8.1 ----------FAYWGQGTLVTVSADIVMTQSHKFMSTSVGDRVSITCKASQDV-----ST
## s3.clonotype9.1 -----GDLGVFAYWGQGTLVSVSADVLMTQTPLSLPVSLGDQASISCRSSQNIVHS-NRN
## s3.clonotype10.1 --YGSS---SFAYWGQGTLVTVSADIVMTQSHKFMSTSVGDRVSITCKASQDV------S
## s4.clonotype1.1 GYYA------MDYWGQGTSVTVSSDIVMTQSQKFMSTSVGDRVSVTCKASQ------NVG
## s4.clonotype2.1 SPY-------FDYWGQGTTLTVSSDIVMTQSPATLSVTPGDRVSLSCRASQSISDY----
## s4.clonotype4.1 --Y-------FDVWGTGTTVTVSSDIVMTQSHKFMSTSVGDRVSITCKASQDV------S
## s4.clonotype6.1 RMY-------FDVWGTGTTVTVSSDIVVSQSPATLSVTPGDSVSLSCRASQSISNS----
## s4.clonotype7.1 --RAYA----MDYWGQGTSVTVSSDVVMTQTPLTLSVTIGQPASISCKSSQSLLDS-DGK
## s4.clonotype8.1 LLA-------MDYWGQGTSITVSSDIQMNQSPSSLSASLGDTITITCHASQHI------N
## s4.clonotype11.1 -----G----FAYWGQGTLVTVSASIVMTQTPKFLLVSAGDRVTITCKASQSV------S
## s4.clonotype13.1 -------SNCLDYWGQGTTLTVSSDVQITQSPSYLAASPGETIRINCRASKSI------H
## s4.clonotype17.1 RGY-------FDVWGTGTTVTVSSDIQMTQTTSSLSASLGDRVTISCRASQDI------S
## s4.clonotype18.1 TVA-------MDYWGQGTSVTVSSDIQMTQSPSSLSASLGERVSLTCRASQEI------S
## s5.clonotype1.1 -YYSNP----FAYWGQGTLVTVSADIQMTQSSSYLSVSLGGRVTITCKASDHI------N
## s5.clonotype2.1 EFYT------LDYWGQGTSVTVSSDIVMTQTSPSVIVSPGESVSISCRSNTSLLHS-NGN
## s5.clonotype3.1 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s5.clonotype3.2 -WY-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGERVTITCKASQDI------K
## s5.clonotype4.1 --Y-------FDYWGQGTTLTVSSDIQMTQSPASLSVSVGETVTITCRASENI------Y
## s5.clonotype5.1 -DFNPL----FAYWGQGTTLTVFSDIVITQDELSTLVTSGESASISCRSSKSLLYG-DGK
## s5.clonotype6.1 YGY-------FDVWGTGTTVTVSSDIQMTQTTSSLSASLGDRVTISCRASQDI------S
## s5.clonotype7.1 ------------YWGQGTLVTVSAQIVLTQSPAIMSASPGEKVTMTCSASSSV-------
## s5.clonotype8.1 -YYYGSSHWYFDVWGTGTTVTVSSDVLMTQTPLSLPVSLGDQASISCRSSQSIVHS-NGN
## s5.clonotype9.1 KDA-------MDYWGQGTSITVSSQIVLTQSPAIMSATLGERVTMTCTVTSSMS-----S
## s5.clonotype11.1 GYY-------FDYWGQGTTLTVSSDILLTQSPAILSVSPGERVSFSCRASQSIGTS----
## s6.clonotype1.1 YAA-------MEHWGQGTSVTVSSENVLTQSPAIMSASPGEKVTMTCSASSSV-------
## s6.clonotype2.1 IYYA------MDYWGQGTSVTVSSDVLMTQTPLSLPVSLGDQASISCRSSQSIVHS-NGN
## s6.clonotype2.2 IYYV------MDYWGQGTSVTVSSDVLMTQTPLSLPVSLGDQASISCRSSQSIVHS-NGN
## s6.clonotype3.1 ----------FDHWGQGTILTVSSDIVLTQSPASLAVSQGQRATISCRASESADDF--GI
## s6.clonotype4.1 -DYAWY----FDVWGTGTTVTVSSDVVMTQTPLTLSVTIGQPASISCKSSQSLLHS-DGK
## s6.clonotype5.1 -YRYA-----MDYWGQGTSVTVSSDVVMTQTPLSLPVSLGDQASISCRSSQSLVHS-NGN
## s6.clonotype6.1 -YFGSG----FAYWGQGTLVTVSADVLMTQIPLSLSVSLGDQASISCKSSQSIVHS-NGN
## s6.clonotype7.1 ---------TMDCWGQGTSVTVSSDIVLTQSPVSLAVSLGQRATISCRASESVDTY--GI
## s6.clonotype10.1 GWY-------FDVWGTGTTVTVSSDIVMTQSPATLSVTPGDRVSLSCRASQSISDY----
## s6.clonotype12.1 --Y-------FDVWGTGTTVTVSSDIKMTQSPSSMYASLGQRVTITCKASQDI------N
## s6.clonotype13.1 VRN-------FGVWGTGTTVTVSSDIRMTQSPSSMFASLGERVTITCRASRDI------N
## ** ** ^^* ^ * * *
## 121 . . . . . 180
##
## 181 . . . . . 240
## s1.clonotype1.1 TYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQ
## s1.clonotype2.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s1.clonotype3.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s1.clonotype4.1 SYMNWYQQKPGQPPKLLIYAASNLESGIPARFSGSGSGTDFTLNIHPVEEEDAATYYCQQ
## s1.clonotype5.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s1.clonotype6.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s1.clonotype7.1 SYLAWYQQKQGKSPQLLVYNGKILAEGVPSRFSGSGSGTHFSLRINSLQPEDFGTYYCQH
## s1.clonotype8.1 TYFHWYQQKPGSSPKLWIYGTSNLASGVPTRFSGSGSGTSYSLTISSMEAEDAATYYCHQ
## s1.clonotype9.1 NYLAWYQQKPGQSPKLLVYFASTRESGVPDRFIGSGSGTDFTLTISSVQAEDLADYFCQQ
## s1.clonotype10.1 VWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQ
## s2.clonotype1.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s2.clonotype1.2 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s2.clonotype1.3 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s2.clonotype2.1 TYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQ
## s2.clonotype3.1 --LHWYQQKSHESPRLLIKYASQSISGVPSRFSGSGSGSAFTLNINSVEPEDVGVYFCQH
## s2.clonotype4.1 TYLYWYLQKPGQSPQLLIYQMSNLASGVPDRFSGSGSGTDFTLGISRVEAEDVGVYYCAQ
## s2.clonotype5.1 SYMDWYQQKPGSSPKPWIYATSDLASGVPTRFSGSGSGTSYSLTISGVEAEDAATYYCQQ
## s2.clonotype6.1 --LHWYYQKSHESPRLLIKYASQSISGIPSRFSGSGSGSDFTLTINSVEPEDVGVYYCQN
## s2.clonotype7.1 NYLNWYQQKPDGTVKLLIYYTSRLHSGVPSRFSGSGSGTDYSLTISNLEQEDIATYFCQQ
## s2.clonotype8.1 --IHWYQQRTNGSPRLLIKYASDSMSGIPSRFRGSGSGTDFTLTINSVESEDIADYYCQQ
## s2.clonotype9.1 SYMHWYQQKPGQPPKLLIYLASNLESGVPARFSGSGSGTDFTLNIHPVEEEDAATYYCQH
## s2.clonotype10.1 TYVSWYQQKPEQSPKLLIYGASNRYTGVPDRFTGSGSATDFTLTISSVQAEDLADYHCGQ
## s3.clonotype1.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s3.clonotype1.2 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s3.clonotype2.1 SYMHWYQQKPGSSPKPWIYDTSNLASGFPARFSGSGSGTSYSLIISSMEAEDAATYYCHQ
## s3.clonotype3.1 IYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISKVEAEDLGLYYCSQ
## s3.clonotype4.1 --LHWYQQKSHESPRLLIKYASQSISGIPSRFSGSGSGTDFTLSINSVETEDFGMYFCQQ
## s3.clonotype5.1 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s3.clonotype5.2 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s3.clonotype6.1 NYLNWYQQKPDGTVKLLIYYTSRLHSGVPSRFSGSGSGTDYSLTISNLEQEDIATYFCQQ
## s3.clonotype7.1 NYLHWYQQKPGFSPTLLIYRTSYLASGVPPRFSGTGSGTSYSLTIDTMEAEDVATYYCQQ
## s3.clonotype8.1 A-VAWYQQKPGQSPKLLIYWASTRHTGVPDRFTGSGSGTDYTLTISSVQAEDLALYYCQQ
## s3.clonotype9.1 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s3.clonotype10.1 TAVAWYQQKPGQSPKLLIYSASYRYTGVPDRFTGSGSGTDFTFTISSVQAEDLAVYYCQQ
## s4.clonotype1.1 TNVAWYQQKPGQSPKALIYSASYRYSGVPDRFTGSGSGTDFTLTISNVQSEDLTEYFCHQ
## s4.clonotype2.1 --LHWYQQKSHESPRLLIKYASQSISGIPSRFSGSGSGSDFTLSINSVEPEDVGVYYCQN
## s4.clonotype4.1 TAVAWYQQKPGQSPKLLIYSASYRYTGVPDRFTGSGSGTDFTFTISSVQAEDLAVYYCQQ
## s4.clonotype6.1 --LHWYQQKSHESPRLLIKYSSQSISGIPSRFSGSGSGTDFTLSINSVETEDFGMYFCQQ
## s4.clonotype7.1 TYLNWLFQRPGQSPKRLIYLVSKLDSGVPDRFTGSGSGTDFTLKISRVEAEDLGVYYCWQ
## s4.clonotype8.1 VWLTWYQQKPGNFPKLLIYKTSNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQ
## s4.clonotype11.1 NDVAWYQQKPGQSPKLLIYYASNRYTGVPDRFTGSGYGTDFTFTISTVQAEDLAVYFCQQ
## s4.clonotype13.1 KFLAWYQEKPGKTNKLLIYYGSSLQSGIPSRFSGSGSGTDFSLTISSLEPEDFAMYYCQQ
## s4.clonotype17.1 NYLNWYQQKPDGTVKLLIYYTSRLHSGVPSRFSGSGSGTDYSLTISNLEQEDIATYFCQQ
## s4.clonotype18.1 GYLSWLQQKPDGTIKRLIYAASTLDSGVPKRFSGSRSGSDYSLTISSLESEDFADYYCLQ
## s5.clonotype1.1 NWLAWYQQKPGNAPRLLISGATSLETGVPSRFSGSGSGKDYTLSITSLQTEDVATYYCQQ
## s5.clonotype2.1 TFLSWFLQRPGQSPQLLIYRMSTLASGVPYRFSGSGSGTVFTLRISRVKAEDVGIYYCMQ
## s5.clonotype3.1 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s5.clonotype3.2 SYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQ
## s5.clonotype4.1 SNLAWYQQKQGKSPQLLVYVATNLADGVPSRFSGSGSGTQYSLKINSLQSEDFGSYYCQH
## s5.clonotype5.1 TYLNWFLQRPGQSPQLLIYLMSTRLSGVSDRFSGSGSGTDFTLEISRVQAEDVGLYYCQQ
## s5.clonotype6.1 NYLNWYQQKPDGTVKLLIYYTSRLHSGVPSRFSGSGSGTDYSLTISNLEQEDIATYFCQQ
## s5.clonotype7.1 SHMYWYQQKPGSSPRLLIYDTSNLASGVPVRFSGSGSGTSYSLTISRMEAEDAAIYYCQQ
## s5.clonotype8.1 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s5.clonotype9.1 SYLHWFQQKPGSSPKLWIYSTSNLASGVPVRFSGSGSGTSYSLTISSMEAEDAATYYCHQ
## s5.clonotype11.1 --IHWYQQRTNGSPRLLIKYASESISGIPSRFSGSGSGTDFTLSINSVESEDIADYYCQQ
## s6.clonotype1.1 SYMHWYQQRSSTSPKLWIYDTSKLASGVPGRFSGSGSGNSYSLTIRTMEAEDVATYYCFQ
## s6.clonotype2.1 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s6.clonotype2.2 TYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYYCFQ
## s6.clonotype3.1 SFLTWFQQKAGQPPKVLIFAASNRGSGVPARFSGSGSGTDFTLNIHPLEDDDSAMYFCQQ
## s6.clonotype4.1 TCLNWLLQRPGQSPKRLIYLVSKLDSGVPDRFTGSGSGTDFKLSISRVEAEDLGIYYCWQ
## s6.clonotype5.1 TYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQ
## s6.clonotype6.1 IYLEWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISKVEAEDLGLYYCSQ
## s6.clonotype7.1 SFMNWFQQKPGQPPKLLIYVASKQGSGVPARFTGSGSRTDFSLNIHPMEKDDVAMYFCQQ
## s6.clonotype10.1 --LHWYQQKSHESPRLLIKYASQSISGIPSRFSGSGSGSDFTLSINSVEPEDVGVYYCQN
## s6.clonotype12.1 SYLSWFHQKPGKSPKTLIYRANRLVDGVPSRFSGSGSGQDYSLTISSLDYEDMGIYYCLQ
## s6.clonotype13.1 SYLSWFQQKPGKSPKTLIYRANSLVDGVPSRFSGSGSGLDYSLTISSLEYDDVGFYYCLQ
## * ^ ^ * ** *^ ^ * ^ ^* * *
## 181 . . . . . 240
##
## 241 . 259
## s1.clonotype1.1 -GTHQP-RTFGGGTKLEIK
## s1.clonotype2.1 -HGESP-YTFGGGTKLEIK
## s1.clonotype3.1 -HGESP-YTFGGGTKLEIK
## s1.clonotype4.1 -SNEDP-YTFGGGTKLEIK
## s1.clonotype5.1 -HGESP-YTFGGGTKLEIK
## s1.clonotype6.1 -HGESP-YTFGGGTKLEIK
## s1.clonotype7.1 -HYGVP-RTFGGGSKLEIE
## s1.clonotype8.1 -YHRSP-LTFGAGTKLELK
## s1.clonotype9.1 -HYGFP-FTFGSGTKLEIK
## s1.clonotype10.1 -GQSYP-FTFGSGTKLEIK
## s2.clonotype1.1 -HGESP-YTFGGGTKLEIK
## s2.clonotype1.2 -HGESP-YTFGGGTKLEIK
## s2.clonotype1.3 -HGESP-WTFGGGTKLEIK
## s2.clonotype2.1 -GTHQP-YTFGGGTKLEIK
## s2.clonotype3.1 -GLSFP-YTFGGGTKLEIK
## s2.clonotype4.1 -NLEL--WTFGGGTKLEIK
## s2.clonotype5.1 -WSSNP-FTFGSGTKLEIK
## s2.clonotype6.1 -GLSFP-RTFGGGTKVEIK
## s2.clonotype7.1 -GNTLP-PTFGGGTKLEIK
## s2.clonotype8.1 -SKTWPTLTFGAGTKLELK
## s2.clonotype9.1 -SRELP-RTFGGGTKLEIK
## s2.clonotype10.1 SYSY-P-YTFGGGTKLEIK
## s3.clonotype1.1 -HGESP-YTFGGGTKLEIK
## s3.clonotype1.2 -HGESP-YTFGGGTKLEIK
## s3.clonotype2.1 -RSSYP-WTFGGGTKLEIK
## s3.clonotype3.1 -GSHVP-ITFGSGTKLEIE
## s3.clonotype4.1 -SNSWP-LTFGAGTKLELK
## s3.clonotype5.1 -GSHVP-LTFGAGTKLELK
## s3.clonotype5.2 -GSHVP-LTFGAGTKLELK
## s3.clonotype6.1 -GNTLP-RTFGGGTKLEIK
## s3.clonotype7.1 -ARTIP-FTFGSGTKLGIK
## s3.clonotype8.1 -HYSTP-PTFGGGTKLEIK
## s3.clonotype9.1 -GSHVP-YTFGGGTKLELK
## s3.clonotype10.1 -HYST--RTFGGGTKLEIK
## s4.clonotype1.1 -YNNYP-LTFGAGTRLELK
## s4.clonotype2.1 -GHSFP-FTFGSGTKLEIK
## s4.clonotype4.1 -HYSTP-PTFGGGTKLEIK
## s4.clonotype6.1 -SNSWP-LTFGAGTKLELK
## s4.clonotype7.1 -YRHFP-YTFGGGTKLAIK
## s4.clonotype8.1 -GQSFP-LTFGAGTKLELK
## s4.clonotype11.1 -DYSSP-LTFGAGTKLELK
## s4.clonotype13.1 -HNEYP-LTFGAGTKLEVK
## s4.clonotype17.1 -GNTLP-YTFGGGTKLEIK
## s4.clonotype18.1 -YASYPRWTFGGGTKLEIK
## s5.clonotype1.1 -YWSTP-YTFGGGTKLEIK
## s5.clonotype2.1 -HLGYP-FTFGGGTKVEIK
## s5.clonotype3.1 -HGESP-WTFGGGTKLEIK
## s5.clonotype3.2 -HGESP-YTFGGGTKLEIK
## s5.clonotype4.1 -FWGTP-WTFGGGTKLEIK
## s5.clonotype5.1 -LLEYP-YTFGGGTKLEIK
## s5.clonotype6.1 -GNTLP-PTFGGGTKLEIK
## s5.clonotype7.1 -WNSYP-FTFGSGTKLEIK
## s5.clonotype8.1 -GSHVP-YTFGGGTKLEIK
## s5.clonotype9.1 -YHRSP-LTFGAGTKLELK
## s5.clonotype11.1 -SNSWP-LTFGAGTKLELK
## s6.clonotype1.1 -GSGYP-FTFGAGTKLELK
## s6.clonotype2.1 -GSHVP-LTFGAGTKLELK
## s6.clonotype2.2 -GSHVP-LTFGAGTKLELK
## s6.clonotype3.1 -SKEVP-YTFGGGTKLKIK
## s6.clonotype4.1 -GTHFP-HTFGGGTKLEIK
## s6.clonotype5.1 -STHVP-YTFGGGTKLEIK
## s6.clonotype6.1 -GSHVP-ITFGSGTKLEIE
## s6.clonotype7.1 -SKEVP-WTFGGGTKLESK
## s6.clonotype10.1 -GHSFP-LTFGAGTKLELK
## s6.clonotype12.1 -YDE-L-YTFGGGTKLEIK
## s6.clonotype13.1 -YDEFP-YTFGGGTKLEIK
## *** *^^^
## 241 . 259
##
## Call:
## bio3d::read.fasta.pdb(aln = aln, pdblist = structure.list)
##
## Class:
## pdbs, fasta
##
## Alignment dimensions:
## 66 sequence rows; 259 position columns (206 non-gap, 53 gap)
##
## + attr: xyz, resno, b, chain, id, ali, resid, sse, call,
## elety, segid, elesy, charge, region
If plot.results is set to T, we also need to provide a VDJ or VGM object to obtain the custom color columns used in color.by. In this example, we have plotted the aligned coordinate vectors of the entire structures and colored them by site, age, and the V gene. We can observe no specific clustering across sites or sample age - however, two distinct clusters have formed, corresponding to specific V genes, as plotted below.
#FOR THE FULL-CHAIN STRUCTURE
#Steropodon_coordinates with use.pdbs = T
#dim.reduction = 'pca'
%>% Steropodon_coordinates(use.pdbs = T, color.by = c('site', 'age', 'VDJ_vgene'), dim.reduction = 'pca', VGM = VGM) superposed_w_pdbs
## [[1]]
##
## [[2]]
##
## [[3]]
We have also done the same analysis as above, but for the aligned coordinate vectors of the CDRH3 structures. However, these vectors are less informative, mostly due to many coordinates being removed when performing sequence alignment on the highly variable CDRH3 regions (as the pdbs object keeps only the C-alpha coordinates of the aligned residues).
#FOR THE CDRH3 REGION
#Steropodon_coordinates with use.pdbs = T
#dim.reduction = 'pca'
%>% Steropodon_coordinates(use.pdbs = T, color.by = c('site', 'age', 'VDJ_vgene'), dim.reduction = 'pca', VGM = VGM) superposed_w_pdbs_CDRH3
## [[1]]
##
## [[2]]
##
## [[3]]
However, when using UMAP for dimensionality reduction and 2D projection, we can notice some distinct clusters forming, with some novel clusters also for each sampling site (first plot).
#FOR THE CDRH3 REGION
#Steropodon_coordinates with use.pdbs = T
#dim.reduction = 'umap'
%>% Steropodon_coordinates(use.pdbs = T, color.by = c('site', 'age', 'VDJ_vgene'), dim.reduction = 'umap', VGM = VGM) superposed_w_pdbs_CDRH3
## [[1]]
##
## [[2]]
##
## [[3]]
We are also interested in the unaligned vector embeddings and how informative they are. However performing PCA on the unaligned vectors of the full-chain structures seems less informative compared to when projecting the aligned vector embeddings as done above.
#FOR THE FULL-CHAIN STRUCTURE
#Steropodon_coordinates with use.pdbs = F
#dim.reduction = 'pca'
%>% Steropodon_coordinates(use.pdbs = F, color.by = c('site', 'age', 'VDJ_vgene'), dim.reduction = 'pca', VGM = VGM) superposed_w_pdbs
## [[1]]
##
## [[2]]
##
## [[3]]
We have done the same for the CDRH3 structures - some clusters appear, yet they are not correlated with our features of interested (sample site and age specifically).
#FOR THE CDRH3 REGION
#Steropodon_coordinates with use.pdbs = F
#dim.reduction = 'pca'
%>% Steropodon_coordinates(structure = 'CDRH3', use.pdbs = F, color.by = c('site', 'age', 'VDJ_vgene'), dim.reduction = 'pca', VGM = VGM) superposed_w_pdbs_CDRH3
## [[1]]
##
## [[2]]
##
## [[3]]
We can also trim the structures before to only plot the variable CDR3 regions. In this case, we will use T-SNE for dimensionality reduction using the Rtsne function - additonal parameters can be inserted via additional.dim.reduction.parameters as a named list. We are also using the aligned vectors in this case and color by site, age, and sample to see if the clusters obtained are correlated with these features of interest.
#ALL CDR REGIONS
#Steropodon_coordinates with use.pdbs = T and trimmed structures
#dim.reduction = 'tsne'
<- superposed_seq_struct %>% #Make sure the structures were superposed before
steropodon_trimmed Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_sequence_alignment(structure = 'structure')
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
%>% Steropodon_coordinates(use.pdbs = T,
steropodon_trimmed color.by = c('site', 'age', 'VDJ_vgene', 'sample_id'),
dim.reduction = 'tsne',
VGM = VGM,
additional.dim.reduction.parameters = list(perplexity = 10)) #lower perplexity value
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
Again, we can see that the obtained clusters are mostly correlated with the V gene of each structure, as expected. We can see no specific clustering across sample sites or age for our TNFR2-immunized mice.
If plot.results is set to F, Steropodon_coordinates will output a feature matrix to be used in clustering via Steropodon_cluster. This will be used as the feature.matrix in Steropodon_cluster, as we will see.
<- superposed_w_pdbs %>% Steropodon_coordinates(use.pdbs = T, plot.results = F)
feature_matrix class(feature_matrix)
## [1] "matrix" "array"
Moreover, Steropodon_coordinates supports several feature imputation methods to deal with missing values in the coordinate matrix, controlled by the feature.imputation parameter. This is specifically required as the coordinate vectors obtained for aligned residues are often sparse, resulting in loss of information. For example, feature.imputation = ‘iterative_pca’ implements the imputePCA method from the missMDA R package, which should preserve the overall topology following PCA.
We will use this method on the trimmed regions with only CDR regions, with the aligned coordinate vectors.
#ALL CDR REGIONS
#Steropodon_coordinates with use.pdbs = T and TRIMMED STRUCTURES
#dim.reduction = 'tsne'
#feature.imputation = 'iterative_pca'
<- superposed_seq_struct %>%
steropodon_trimmed Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_sequence_alignment(structure = 'structure')
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
%>% Steropodon_coordinates(use.pdbs = T,
steropodon_trimmed color.by = c('site', 'age', 'VDJ_vgene'),
feature.imputation = 'impute_pca',
dim.reduction = 'tsne',
VGM = VGM,
additional.dim.reduction.parameters = list(perplexity = 10))
## [[1]]
##
## [[2]]
##
## [[3]]
This plot illustrates the best clustering - with two distinct clusters forming depending on the V gene used. However, our other 2 features of interest are still not correlated with the overall receptor structure, especially with the structures of only the CDR regions.
Lastly, the structural features can be integrated with the gene expression values in a multimodal embedding, using Seurat’s weighted nearest neighbours method (https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html). This can be done withing the Steropodon_multimodal function, by supplying a coordinate matrix per cell/unique barcode (Steropodon_coordinates with per.cell = T). In expanding the Steropodon ecosystem, we plan to include more multimodal integration methods of both structural coordinates and gene expression vectors (as well as learned vector representations for the receptor sequences).
We will first obtain the per-cell feature matrix from the structure PDBs object created earlier - the full-chain structures and aligned coordinate vectors after multiple sequence alignment.
<- superposed_w_pdbs %>% Steropodon_coordinates(use.pdbs = T, plot.results = F, per.cell = T)
per_cell_features
ncol(per_cell_features) # 618 coordinate features per cell
## [1] 618
Next, we will call Steropodon_multimodal on the feature matrix and VGM object. The VGM object must contain the Seurat object as well (VGM[[2]] must be non-null)! The Steropodon_multimodal function will output 3 scatter plots - cells/GEX only (left), structures only (middle), and multimodal (right), colored by the feature specified in color.by (must be found in the VGM object).
The number of variable features selected using Seurat::FindVariableFeatures can be controlled via the n.variable.features parameter - a vector with 2 values, with the first one for the structural variable features, second one for gene expression. The number of PCA dimension to be considered in the weighted nearest-neighbour analysis can be modulated via pca.dims - a vector with 2 values, first for structure PCA, second for GEX. The additional.integration.parameters and additional.clustering.parameters options can also be used for a finer control of the clustering the multimodal embeddings, or for integrating structural and gene expression features, by giving a named list (in the form of list(option_name = option)) as input.
source('Steropodon_multimodal.R')
Steropodon_multimodal(VGM = VGM[[2]], feature.matrix = per_cell_features, multimodal.integration = 'wnn', #only Seurat's weighted nearest-neighbour method is supported
pca.dims = c(10, 20), #10 principal components for structure embeddings, 20 for GEX
n.variable.features = c(100, 2000), #selects 100 variable features for structure vector, 2000 for GEX
color.by = c('site', 'age', 'sample_id', 'seurat_clusters', 'VDJ_vgene')) #if plot.results = F, will output a multimodal Seurat object for clustering/downstream analyses
## Centering and scaling data matrix
## PC_ 1
## Positive: V313, V31, V310, V15, V37, V40, V34, V30, V215, V72
## V316, V9, V3, V43, V6, V321, V46, V78, V177, V218
## V77, V81, V80, V174, V83, V75, V186, V2, V86, V183
## Negative: V261, V258, V264, V42, V267, V315, V312, V255, V252, V213
## V239, V219, V194, V318, V112, V197, V27, V23, V26, V477
## V241, V193, V474, V20, V471, V468, V24, V127, V121, V538
## PC_ 2
## Positive: V124, V160, V164, V163, V121, V502, V24, V159, V580, V165
## V147, V157, V92, V218, V127, V425, V2, V578, V171, V77
## V338, V341, V23, V112, V422, V186, V83, V183, V180, V80
## Negative: V364, V532, V161, V155, V125, V538, V22, V162, V193, V154
## V158, V84, V46, V43, V241, V460, V128, V468, V75, V81
## V471, V477, V190, V316, V197, V9, V194, V6, V72, V474
## PC_ 3
## Positive: V156, V159, V92, V162, V190, V338, V341, V89, V155, V580
## V161, V502, V75, V78, V2, V80, V86, V77, V81, V24
## V158, V83, V3, V6, V84, V9, V163, V425, V218, V321
## Negative: V157, V189, V180, V171, V183, V165, V147, V174, V186, V177
## V538, V364, V164, V532, V160, V316, V477, V154, V313, V310
## V46, V128, V578, V31, V37, V40, V125, V34, V213, V43
## PC_ 4
## Positive: V576, V578, V468, V471, V580, V474, V341, V338, V538, V532
## V428, V477, V158, V364, V86, V83, V160, V80, V30, V128
## V502, V25, V163, V190, V155, V147, V24, V89, V154, V177
## Negative: V574, V84, V460, V582, V431, V165, V81, V22, V425, V75
## V20, V239, V219, V218, V422, V162, V127, V27, V215, V124
## V112, V241, V194, V78, V193, V213, V315, V72, V312, V42
## PC_ 5
## Positive: V582, V428, V341, V338, V574, V190, V84, V538, V163, V160
## V468, V471, V159, V474, V124, V171, V92, V81, V24, V364
## V532, V30, V164, V174, V321, V128, V121, V177, V477, V186
## Negative: V578, V431, V22, V576, V425, V460, V580, V20, V422, V83
## V219, V86, V80, V213, V2, V25, V77, V197, V194, V42
## V125, V239, V189, V127, V252, V218, V161, V3, V6, V9
## Centering and scaling data matrix
## PC_ 1
## Positive: XBP1, CRELD2, SEL1L, LY6C2, MANF, TMEM176B, SDF2L1, PRDX4, HSP90B1, SLPI
## SLC3A2, DERL3, PON3, HSPA5, PDIA4, DNAJC3, HERPUD1, FOS, TXNDC5, EDEM1
## GGH, MT1, RGCC, ISG20, P4HB, TNFRSF17, NDUFA1, JUN, CALR, SLAMF9
## Negative: H2-AA, H2-DMB2, H2-EB1, LY86, APOE, TMSB4X, FCMR, LTB, H2-OA, H2-OB
## CD19, BLK, SIGLECG, CORO1A, H2-AB1, PRKCB, CTSH, NAPSA, H2-DMA, CD72
## SH3BP5, RIPOR2, FCRL5, EBF1, AFF3, BTG1, GIMAP3, FAM3C, PAX5, TMSB10
## PC_ 2
## Positive: PCLAF, BIRC5, STMN1, CCNA2, RRM2, CDCA8, TOP2A, SPC24, CDK1, UBE2C
## CCNB2, CDCA3, TPX2, HMMR, KIF22, NUSAP1, AURKB, SHCBP1, KNL1, KIF11
## NEIL3, CDKN3, HIST1H2AE, CENPF, CDCA2, HMGB2, PBK, MKI67, STIL, CDC20
## Negative: APOE, LTB, ARHGAP24, FCMR, BTG1, BCL2, CD79A, H2-OA, BE692007, IMMP2L
## PLAC8, FCRL5, SNN, MYO1E, CD72, LY86, CCR1, CCR7, PLBD1, BLK
## BANK1, FCER1G, GM31718, ALDH2, AFF3, MS4A6C, MARCH1, ZEB2, NR3C2, 1810046K07RIK
## PC_ 3
## Positive: PRG2, CD79A, SHISA5, IL2RG, FAM214A, SLC3A2, UBC, ST6GAL1, MZB1, TNFRSF17
## SERPINA3G, RILPL2, NFKBIA, SELENOP, CD69, TXNDC5, JUNB, ZFP36, MAN1A, HERPUD1
## ARHGDIB, PTPN6, FOS, CD28, JUN, SQSTM1, DENND4A, GGH, SUB1, EGR1
## Negative: KLRA17, NBEA, PGAM2, RPIA, GPRC5B, CLDN7, OPCML, FILIP1, PGM2L1, CCDC68
## PHACTR3, CCND2, 2300002M23RIK, AKR1B3, CPED1, SPON1, NEGR1, OVOL3, COLCA2, PLD4
## ERMN, F11R, AI427809, COX4I2, SNTG2, CSN3, HSD11B1, RPL37A, LY6C2, WDR49
## PC_ 4
## Positive: JUNB, JUN, ZFP36, FOSB, FOS, IER2, DUSP1, RHOB, NEGR1, JMJD1C
## KLF6, MALAT1, LNCPINT, CLIC4, NFKBIZ, CD274, IRF1, IL5RA, LY6C2, PPP1R15A
## EPCAM, ZFP36L1, GADD45B, BTG2, BCAR3, PDE4B, CDH2, SPON1, DNAJB1, UBC
## Negative: MZB1, RPS26, ATF5, RPL37, RPLP2, COX6A2, PRG2, HSP90B1, RPL35, GLIPR1
## RPS21, PCK2, EEF1B2, SHMT2, RPS29, RPS19, PSAT1, RPS27, RPL39, RPL36A
## RPS2, MTHFD2, RPS28, RPL36, EIF4EBP1, SUB1, ASNS, ST6GAL1, CHAC1, RPS12
## PC_ 5
## Positive: SCIMP, GM4316, ST6GAL1, FGL2, PIK3R5, FAS, GIMAP7, GNG3, CD2, CXCR3
## MALAT1, CRIP1, IL2RB, CMAH, JAKMIP1, PRF1, PYCARD, VIM, SAMHD1, TRP53INP1
## SERINC5, TEX14, PITPNC1, HVCN1, OSBPL3, FLNA, RRBP1, A, GLIPR1, S1PR4
## Negative: RPS2, LY6D, RPS12, CBLC, MS4A6C, GGH, EEF1B2, AXL, HSP90AB1, RPS18
## MARCKSL1, CD72, HSPA8, SLC3A2, TNFRSF17, FCRL5, RPL36, PPIA, EIF3F, EIF4EBP1
## RPS26, RPS19, RPL36A, EBI3, NFKBIE, ASNS, NPM1, 1810046K07RIK, MTHFD2, RPL35
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 15:05:39 UMAP embedding parameters a = 0.9922 b = 1.112
## 15:05:39 Read 3795 rows and found 10 numeric columns
## 15:05:39 Using Annoy for neighbor search, n_neighbors = 30
## 15:05:39 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 15:05:39 Writing NN index file to temp file /var/folders/4v/1r1vx0ws03n62xksn8vv49wr0000gn/T//RtmpDKKhuF/file1734231bca72c
## 15:05:39 Searching Annoy index using 1 thread, search_k = 3000
## 15:05:40 Annoy recall = 22.19%
## 15:05:40 Commencing smooth kNN distance calibration using 1 thread
## 15:05:40 3791 smooth knn distance failures
## 15:05:41 Found 15 connected components, falling back to 'spca' initialization with init_sdev = 1
## 15:05:41 Initializing from PCA
## 15:05:41 Using 'irlba' for PCA
## 15:05:41 PCA: 2 components explained 72.29% variance
## 15:05:41 Commencing optimization for 500 epochs, with 200542 positive edges
## 15:06:13 Optimization finished
## 15:06:13 UMAP embedding parameters a = 0.9922 b = 1.112
## 15:06:13 Read 3795 rows and found 20 numeric columns
## 15:06:13 Using Annoy for neighbor search, n_neighbors = 30
## 15:06:13 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 15:06:13 Writing NN index file to temp file /var/folders/4v/1r1vx0ws03n62xksn8vv49wr0000gn/T//RtmpDKKhuF/file1734258835336
## 15:06:13 Searching Annoy index using 1 thread, search_k = 3000
## 15:06:14 Annoy recall = 100%
## 15:06:14 Commencing smooth kNN distance calibration using 1 thread
## 15:06:14 Initializing from normalized Laplacian + noise
## 15:06:14 Commencing optimization for 500 epochs, with 149618 positive edges
## 15:06:20 Optimization finished
## Calculating cell-specific modality weights
## Finding 20 nearest neighbors for each modality.
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from normstruct.pca_ to normstructpca_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to normstructpca_
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from normseurat.pca_ to normseuratpca_
## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to normseuratpca_
## Calculating kernel bandwidths
## Finding multimodal neighbors
## Constructing multimodal KNN graph
## Constructing multimodal SNN graph
## Warning: Invalid name supplied, making object name syntactically valid. New
## object name is X; see ?make.names for more details on syntax validity
## 15:07:26 UMAP embedding parameters a = 0.9922 b = 1.112
## 15:07:27 Commencing smooth kNN distance calibration using 1 thread
## 15:07:27 297 smooth knn distance failures
## 15:07:28 Initializing from normalized Laplacian + noise
## 15:07:28 Commencing optimization for 500 epochs, with 115820 positive edges
## 15:07:33 Optimization finished
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
We can see how the multimodal integration refines the overall cell embeddings when considering all the features highlighted (site, age, sample, transcriptomic clusters, and V genes). Surprisingly, it appears that the Seurat/transcriptomic clusters are more clearly defined in the multimodal embedding, and that structural clustering on the coordinate vectors is also correlated with transcriptome identity (e.g., clusters 10 and 11 clearly separating from the largest mass of points when comparing the right side plot to the left side, colored by the Seurat cluster identity obtained for the gene expression of all 22 TNFR2 samples). This illustrates a tight correlation between the structural features of a receptor and the gene expression patterns of its corresponding immune cell.
In the coordinate scatter plots of the trimmed structures (with only CDR3 region), we can identify some clearly defined clusters. These can be highlighted by clustering on the 2D reduced embeddings in the Steropodon_coordinates, by setting the cluster.method parameter to a clustering method from the bluster R package, such as bluster::HclustParam.
<- superposed_seq_struct %>% #Make sure the structures were superposed before
steropodon_trimmed Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_sequence_alignment(structure = 'structure')
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
%>% Steropodon_coordinates(use.pdbs = T,
steropodon_trimmed color.by = c('site', 'age', 'VDJ_vgene', 'sample_id'),
cluster.method = bluster::HclustParam(),
dim.reduction = 'tsne',
VGM = VGM,
additional.dim.reduction.parameters = list(perplexity = 10)) #last plot will be the clustered one using the hierarchical clustering method from bluster
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
Moreover, the function Steropodon_cluster integrates multiple clustering methods on both the feature matrices obtained from Steropodon_coordinates, as well as the distance matrices from Steropodon_distances. These cluster annotations can then be added to the VGM object via Steropodon_annotate_VGM to be used for downstream analyses (e.g., plotting clusters using Seurat::DimPlot() or diversity analyses via Platypus::VDJ_diversity() grouped by structural clusters).
We will first obtain a distance matrix and a feature matrix from the trimmed Steropodon objects
<- superposed_seq_struct %>% #Make sure the structures were superposed before
steropodon_trimmed Steropodon_trim(structure = 'structure', grouping = c('chain', 'region'), specific.values = c('VDJ_CDR1','VDJ_CDR2','VDJ_CDR3','VJ_CDR1','VJ_CDR2','VJ_CDR3'), combine.values = T, combine.groupings = T) %>%
Steropodon_sequence_alignment(structure = 'structure')
## pdb/seq: 1 name: s1.clonotype1.1
## pdb/seq: 2 name: s1.clonotype2.1
## pdb/seq: 3 name: s1.clonotype3.1
## pdb/seq: 4 name: s1.clonotype4.1
## pdb/seq: 5 name: s1.clonotype5.1
## pdb/seq: 6 name: s1.clonotype6.1
## pdb/seq: 7 name: s1.clonotype7.1
## pdb/seq: 8 name: s1.clonotype8.1
## pdb/seq: 9 name: s1.clonotype9.1
## pdb/seq: 10 name: s1.clonotype10.1
## pdb/seq: 11 name: s2.clonotype1.1
## pdb/seq: 12 name: s2.clonotype1.2
## pdb/seq: 13 name: s2.clonotype1.3
## pdb/seq: 14 name: s2.clonotype2.1
## pdb/seq: 15 name: s2.clonotype3.1
## pdb/seq: 16 name: s2.clonotype4.1
## pdb/seq: 17 name: s2.clonotype5.1
## pdb/seq: 18 name: s2.clonotype6.1
## pdb/seq: 19 name: s2.clonotype7.1
## pdb/seq: 20 name: s2.clonotype8.1
## pdb/seq: 21 name: s2.clonotype9.1
## pdb/seq: 22 name: s2.clonotype10.1
## pdb/seq: 23 name: s3.clonotype1.1
## pdb/seq: 24 name: s3.clonotype1.2
## pdb/seq: 25 name: s3.clonotype2.1
## pdb/seq: 26 name: s3.clonotype3.1
## pdb/seq: 27 name: s3.clonotype4.1
## pdb/seq: 28 name: s3.clonotype5.1
## pdb/seq: 29 name: s3.clonotype5.2
## pdb/seq: 30 name: s3.clonotype6.1
## pdb/seq: 31 name: s3.clonotype7.1
## pdb/seq: 32 name: s3.clonotype8.1
## pdb/seq: 33 name: s3.clonotype9.1
## pdb/seq: 34 name: s3.clonotype10.1
## pdb/seq: 35 name: s4.clonotype1.1
## pdb/seq: 36 name: s4.clonotype2.1
## pdb/seq: 37 name: s4.clonotype4.1
## pdb/seq: 38 name: s4.clonotype6.1
## pdb/seq: 39 name: s4.clonotype7.1
## pdb/seq: 40 name: s4.clonotype8.1
## pdb/seq: 41 name: s4.clonotype11.1
## pdb/seq: 42 name: s4.clonotype13.1
## pdb/seq: 43 name: s4.clonotype17.1
## pdb/seq: 44 name: s4.clonotype18.1
## pdb/seq: 45 name: s5.clonotype1.1
## pdb/seq: 46 name: s5.clonotype2.1
## pdb/seq: 47 name: s5.clonotype3.1
## pdb/seq: 48 name: s5.clonotype3.2
## pdb/seq: 49 name: s5.clonotype4.1
## pdb/seq: 50 name: s5.clonotype5.1
## pdb/seq: 51 name: s5.clonotype6.1
## pdb/seq: 52 name: s5.clonotype7.1
## pdb/seq: 53 name: s5.clonotype8.1
## pdb/seq: 54 name: s5.clonotype9.1
## pdb/seq: 55 name: s5.clonotype11.1
## pdb/seq: 56 name: s6.clonotype1.1
## pdb/seq: 57 name: s6.clonotype2.1
## pdb/seq: 58 name: s6.clonotype2.2
## pdb/seq: 59 name: s6.clonotype3.1
## pdb/seq: 60 name: s6.clonotype4.1
## pdb/seq: 61 name: s6.clonotype5.1
## pdb/seq: 62 name: s6.clonotype6.1
## pdb/seq: 63 name: s6.clonotype7.1
## pdb/seq: 64 name: s6.clonotype10.1
## pdb/seq: 65 name: s6.clonotype12.1
## pdb/seq: 66 name: s6.clonotype13.1
#Get the feature matrix
<- steropodon_trimmed %>% Steropodon_coordinates(use.pdbs = T, #use.pdbs = T will use the aligned coordinate vectors
feature_matrix plot.results = F
)
#Get the distance matrix
<- steropodon_trimmed %>% Steropodon_distances(distance.metric = 'rmsd',
distance_matrix plot.results = F
)
Next we will perform clustering using Steropodon_cluster and compare the number of unique clusters from each matrix. For clustering on the feature matrix, we need to provide a bluster clustering method in the cluster.method parameter (by default will use bluster::HclustParam), whereas for the distance matrix, we need to either specify ‘hclust’ or ‘medoids’ for the currently implemented algorithms. PCA or other dimensionality reduction methods can be applied before clustering, if dim.reduction is ‘pca’, ‘tsne’, or ‘umap’.
Clustering results (either scatter plots for feature matrices or dendrograms for distance matrices) can be plotted if plot.results is set to T.
Lastly, the output of Steropodon_cluster is the nested list of Steropodon objects given as input, with a new cluster slot added.
#Clustering on the feature matrix
<- Steropodon_cluster(steropodon.object = steropodon_trimmed,
feature_clusters feature.matrix = feature_matrix,
dim.reduction = 'tsne',
plot.dim.reduction = 'tsne', #dim reduction method for scatter plots
cluster.method = bluster::HclustParam,
additional.dim.reduction.parameters = list(perplexity = 10),
plot.results = T
)
$s1$clonotype1$`1` feature_clusters
## Steropodon object
## Sample: s1 Clonotype: clonotype1 Rank: 1
## Number of cells: 587
## Structures available: main, PDBs, H, L, CDRH3, CDRL3, cluster
## Sequence: EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANSYGNTYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQGTHQPRTFGGGTKLEIK
Hierarchical clustering on a distance matrix will output the resulting dendrogram. 3 clusters are defined, apparently not correlated with the sample identity.
#Clustering on the distance matrix
<- Steropodon_cluster(steropodon.object = steropodon_trimmed,
distance_clusters distance.matrix = distance_matrix,
cluster.method = 'hclust',
plot.results = T
)
The ‘cluster’ slots contains the cluster ID for each structure.
$s1$clonotype1$`1`@cluster distance_clusters
## [1] 1
The cluster annotations can be included in the VGM object using Steropodon_annotate_VGM. This function can also be used to add the invariant core sequence, epitope, or paratope information in both the VDJ and GEX data frames.
<- Steropodon_annotate_VGM(steropodon.object = feature_clusters,
VGM_feature_clusters VGM = VGM)
<- Steropodon_annotate_VGM(steropodon.object = distance_clusters,
VGM_distance_clusters VGM = VGM)
1]]$structure_cluster[1:50] VGM_distance_clusters[[
## [1] 2 1 1 1 NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA 1 NA NA
## [26] 1 NA NA NA NA NA NA 1 NA NA NA NA NA NA 2 NA NA 1 1 2 NA NA 1 NA 1
Lastly, these cluster annotations can be used in all downstream analyses available in the Platypus ecosystem or in Seurat. We might be interested, for example, to see if the structural cluster identity is also contained withing the UMAP projection of transcriptomic features (UMAP of all cells corresponding to a structure from our steropodon_igfold object).
<- Seurat::DimPlot(VGM_feature_clusters[[2]], group.by = 'structure_cluster')
p1 <- Seurat::DimPlot(VGM_distance_clusters[[2]], group.by = 'structure_cluster')
p2 + p2 p1
We might also be interested in investigating the CDR3 sequence diversity across structural clusters, using the VDJ_diversity function from the Platpyus package.
<- VDJ_diversity(VDJ = VGM_feature_clusters[[1]], feature.columns = c('VDJ_cdr3s_aa', 'VJ_cdr3s_aa'), grouping.column = 'structure_cluster', metric = 'richness') p1
## Used group 2 as a reference for subsampling with 882 entries
## Subsampled group 1 to 882 entries
## Used group NA as a reference for subsampling with 882 entries
<- VDJ_diversity(VDJ = VGM_distance_clusters[[1]], feature.columns = c('VDJ_cdr3s_aa', 'VJ_cdr3s_aa'), grouping.column = 'structure_cluster', metric = 'richness') p2
## Subsampled group 2 to 330 entries
## Subsampled group 1 to 330 entries
## Used group NA as a reference for subsampling with 330 entries
## Used group 3 as a reference for subsampling with 330 entries
+ p2 p1
## Warning: Removed 1 rows containing missing values (position_stack).
## 10. Calculating physicochemical properties via Steropodon_properties The Steropodon_properties function can be used to calculate physicochemical and developability parameters such as the solvent-accessible surface area (SASA), charge, hydrophobicity, pKa, as well as obtain secondary structure annotations. The resulting Steropodon object will have a modified bio3d pdb object, with new columns corresponding to the aforementioned properties. More calculators/properties will be added!
Properties to be calculated can be specified via properties. For pKa calculation, Steropodon_properties uses the propka tool - its directory must be included in propka.directory. For secondary structure annotations, Steropodon_properties requires the DSSP tool and its exefile in dssp.exefile (mkdssp exe, https://manpages.ubuntu.com/manpages/xenial/man1/dssp.1.html).
<- steropodon_igfold %>% Steropodon_properties(structure = 'structure',
steropodon_properties properties = c('SASA', 'charge', 'hydrophobicity', 'DSSP'),
dssp.exefile = '/opt/homebrew/Caskroom/miniforge/base/bin/mkdssp',
parallel = F
)
## MAC system detected
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
## /opt/homebrew/Caskroom/miniforge/base/bin/mkdssp
Moreover, structural properties can be visualized via Steropodon_visualize.
Visualizing charge:
#CHARGE
$s1$clonotype1$`1` %>% Steropodon_visualize(color.by = 'charge') steropodon_properties
Visualizing hydrophobicity:
#HYDROPHOBICITY
$s1$clonotype1$`1` %>% Steropodon_visualize(color.by = 'hydrophobicity') steropodon_properties
Visualizing SASA:
#SASA
$s1$clonotype1$`1` %>% Steropodon_visualize(color.by = 'SASA') steropodon_properties
Coloring structures by the secondary structure annotations:
#Secondary structure DSSP naming
$s1$clonotype1$`1` %>% Steropodon_visualize(color.by = 'sse', specific.color.values = NULL) steropodon_properties
Properties and other metrics can be plotted using Steropodon_metrics: the resulting plots can be either bar plots of number of residues per regions (determined by the feature and grouping parameters, if plot.format = ‘bar’), or line plots of continuous values per ordered residues in the structure (if plot.format = ‘line’).
For bar plots, residues can be counted at the level specified by plot.level: plot.level = ‘global’ to count across all objects, plot.level = ‘sample’ to group structures and count by sample, plot.level = ‘clonotype’ to group structures and count by both sample and clonotype. The feature parameter determines the features we are interested in getting the residue counts for, whereas the grouping parameter imposes a primary grouping: e.g., to count the number of residues per region, grouped by chain, we set feature = ‘region’ and grouping = ‘chain’.
This function was designed to be highly flexible and tailored to the analysis interest of each user.
%>% Steropodon_metrics(plot.format = 'bar',
steropodon_igfold feature = 'region',
grouping = 'chain',
proportions = F,
plot.level = 'global'
)
## $global
We can get proportions for the residue counts above by setting proportions to T (proportions of variable regions across each group = across each chain).
%>% Steropodon_metrics(plot.format = 'bar',
steropodon_igfold feature = 'region',
grouping = 'chain',
proportions = T,
plot.level = 'global'
)
## $global
We can also get the proportions of residues per region across other grouping factors obtained via Steropodon, such as the core column from Steropodon_find_core.
%>% Steropodon_metrics(plot.format = 'bar',
steropodon_igfold_core feature = 'region',
grouping = 'core',
proportions = T,
plot.level = 'global'
)
## $global
If the feature parameter is given a column/feature name for continuous features, then it will calculate the mean of that feature across the values determined by the grouping parameter. For example, we can see the B-value averages across the variable regions, pooled for all structures in the nested list of Steropodon objects (plot.level = ‘global’).
%>% Steropodon_metrics(plot.format = 'bar',
steropodon_igfold feature = 'b',
grouping = 'region',
plot.level = 'global'
)
## $global
To get the mean of B-values across regions pooled across each sample, we can set plot.level to ‘sample’.
%>% Steropodon_metrics(plot.format = 'bar',
steropodon_igfold feature = 'b',
grouping = 'region',
plot.level = 'sample'
)
## $s1
##
## $s2
##
## $s3
##
## $s4
##
## $s5
##
## $s6
Line plots are useful to visualize variations in continuous features across the ordered residues (x-axis). For example, we can see how the charge property we calculated via Steropodon_properties varies across all structures (plot.level = ‘global’ will average across all structures, per each residue number).
%>% Steropodon_metrics(plot.format = 'line',
steropodon_properties feature = 'charge',
plot.level = 'global'
)
## [[1]]
If plot.level is not set to ‘global’, we can obtain a list of plots per sample/ samples and clonotypes.
%>% Steropodon_metrics(plot.format = 'line',
steropodon_properties feature = 'charge',
plot.level = 'sample',
combine.lineplot = F
)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
Lastly, if combine.lineplot is set to T, the line plots from above are combined into a single plot and colored by the factor in plot.level (sample in this case).
%>% Steropodon_metrics(plot.format = 'line',
steropodon_properties feature = 'charge',
plot.level = 'sample',
combine.lineplot = T
)
## [[1]]
Same as above, but for B-values.
%>% Steropodon_metrics(plot.format = 'line',
steropodon_properties feature = 'b',
plot.level = 'sample',
combine.lineplot = T
)
## [[1]]
## 12. Docking structures via Steropodon_dock We can obtain receptor-ligand complexes via docking (with ZDOCK) using Steropodon_dock. This requires a single Steropodon object or a nested list as obtained from the Steropodon_model function, as well as the PDB ID or PDB file of the ligand we wish to dock. In this case, we will try docking the experimental structure of the tumor necrosis factor used in immunizing our mice - PDB ID = 2tnf.
Docking is computationally extensive: to enable faster docking, we have parallelized the docking call - if parallel is set to T. In this vignette, we will dock a single structure.
Lastly, unique options for each docking tool can be specified via additional.docking.params as a named list. The parameters for each docking tool are well described in their respective documentations - as only ZDOCK is supported currently, these options are avaialable for further modulation: zdock.n.predictions, zdock.specific.regions.to.mask, zdock.region.columns, zdock.fixed.receptor.
<- steropodon_igfold$s1$clonotype1$`1` %>% Steropodon_dock(docking.tool = 'zdock',
steropodon_docked tool.directory = '/Users/tudorcotet/Desktop/zdock', #Must specify the ZDOCK directory with the executables
ligand = '2tnf', #Either a PDB ID or a path to a PDB file
ligand.name = 'TNFR2',
structure = 'structure', #Will dock the full-chain structures
additional.docking.params = list(zdock.n.predictions = 10,
zdock.fixed.receptor = T),
parallel = F
)
## MAC system detected
## Note: Accessing on-line PDB file
## Warning in bio3d::read.pdb(file_name): duplicated element numbers ('eleno')
## detected
steropodon_docked
## $s1
## $s1$clonotype1
## $s1$clonotype1$`1`
## Steropodon object
## Sample: s1 Clonotype: clonotype1 Rank: 1
## Number of cells: 587
## Structures available: main, complex, antigen, H, L, CDRH3, CDRL3
## Sequence: EVKLVESEGGLVQPGSSMKLSCTASGFTFSDYYMAWVRQVPEKGLEWVANINYDGSSTYYLDSLKSRFIISRDNAKNILYLQMSSLKSEDTATYYCARDLDYFDYWGQGTTLTVSSDVVVTQTPLSLPVSFGDQVSISCRSSQSLANSYGNTYLSWYLHKPGQSPQLLIYGISNRFSGVPDRFSGSGSGTDFTLKISTIKPEDLGMYYCLQGTHQPRTFGGGTKLEIK
%>% Steropodon_visualize(structure = 'complex', color.by = 'chain', specific.color.values = NULL) steropodon_docked
##13. Obtaining binding interfaces via Steropodon_interface Steropodon_interface can be used to determine the epitope and paratope regions from a complex structure (obtainef either form Steropodon_dock or from Steropodon_model using ColabFold). The distance.threshold parameter sets the maximum ångström distance for the contact surface.
This function will add a new epitope and paratope slot for the respective structures, as well as annotate the paratope region on the main structure.
<- steropodon_docked %>% Steropodon_interface(structure = 'structure', distance.threshold = 5) steropodon_docked_interface
We can visualize the docked structure from the ‘complex’ slot (receptor-ligand) and color it by the newly added ‘interface’ feature from Steropodon_interface.
%>% Steropodon_visualize(structure = 'complex', color.by = 'interface', specific.color.values = NULL) steropodon_docked_interface
The same ‘interface’ feature was also added to the main structure (correspoding to the paratope in this case).
%>% Steropodon_visualize(structure = 'structure', color.by = 'interface', specific.color.values = NULL) steropodon_docked_interface
Paratopes and epitopes have also been isolated from the complex structure for a better visualization of each of these structures of interest. We can visualize the paratope and annotate it by the CDR regions.
%>% Steropodon_visualize(structure = 'paratope', specific.color.values = NULL) steropodon_docked_interface
Or we can visualize the epitope and color it by the unique residues.
%>% Steropodon_visualize(structure = 'epitope', specific.color.values = NULL, color.by = 'resid') steropodon_docked_interface
Lastly, Steropodon_annotate will add the paratope and epitope sequences as new columns in the VGM object, to be used in any downstream analysis available in the Platypus ecosystem.
<- Steropodon_annotate_VGM(steropodon.object = steropodon_docked_interface, VGM = VGM) VGM_w_interface
unique(VGM_w_interface[[1]]$paratope)
## [1] NA "PVSFGDRSTIKPEEI"
unique(VGM_w_interface[[1]]$epitope)
## [1] NA "SDKPPADYLSDKPPADYLSDKPPADYL"
Structures can be refined using OpenMM (fixing interactions, adding H atoms), whereas CDR loops can be folded again via ABlooper (https://github.com/oxpig/ABlooper) in the Steropodon_refine function (refining.tools = ‘openmm’ and refining.tools = ‘ablooper’, respectively). Both of these options will modify the structures in the nested list of Steropodon objects specified in the structure paramater (e.g., structure = ‘H’ to modify the heavy chains, ‘structure’ to modify the main/full-chain structure). However, ABlooper requires both chains to accurately refine the CDR loops!
Refining structures via OpenMM:
<- steropodon_igfold$s1$clonotype1$`1` %>% Steropodon_refine(refining.tools = 'openmm', parallel = F, env.name = 'vignette_env', use.conda = T) steropodon_refined_openmm
## MAC system detected
#H atoms were added
head(steropodon_refined_openmm$s1$clonotype1$`1`@structure$atom)
## type eleno elety alt resid chain resno insert x y z o b segid
## 1 ATOM 1 N <NA> GLU VDJ 2 <NA> -17.203 8.596 -1.990 1 0 <NA>
## 2 ATOM 1 N <NA> GLU VDJ 2 <NA> -17.203 8.596 -1.990 1 0 <NA>
## 3 ATOM 1 N <NA> GLU VDJ 2 <NA> -17.203 8.596 -1.990 1 0 <NA>
## 4 ATOM 1 N <NA> GLU VDJ 2 <NA> -17.203 8.596 -1.990 1 0 <NA>
## 5 ATOM 1 N <NA> GLU VDJ 2 <NA> -17.203 8.596 -1.990 1 0 <NA>
## 6 ATOM 2 H <NA> GLU VDJ 2 <NA> -17.593 9.514 -2.143 1 0 <NA>
## elesy charge region
## 1 N <NA> FR1
## 2 N <NA> FR1
## 3 N <NA> FR1
## 4 N <NA> FR1
## 5 N <NA> FR1
## 6 H <NA> FR1
Lastly, structures from a nested list of Steropodon objects can be saved as PDB files and as annotated CSV files (if save.full is TRUE). The structure slot/type can be determined from the structure parameter.
%>% Steropodon_save(structure = 'structure',
steropodon_igfold save.dir = 'vignette_saved',
save.full = T
)
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.5.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] reticulate_1.25 foreach_1.5.2 forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4
## [9] tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1 Platypus_3.3.5
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.4.1 plyr_1.8.6
## [4] igraph_1.3.1 lazyeval_0.2.2 splines_4.1.2
## [7] BiocParallel_1.28.3 listenv_0.8.0 scattermore_0.7
## [10] digest_0.6.29 htmltools_0.5.2 viridis_0.6.2
## [13] fansi_1.0.2 magrittr_2.0.3 doParallel_1.0.17
## [16] tensor_1.5 cluster_2.1.4 bigassertr_0.1.5
## [19] bigstatsr_1.5.6 ROCR_1.0-11 openxlsx_4.2.5
## [22] tzdb_0.2.0 fastcluster_1.2.3 globals_0.14.0
## [25] modelr_0.1.8 matrixStats_0.61.0 askpass_1.1
## [28] spatstat.sparse_2.1-0 vanddraabe_1.1.1 colorspace_2.0-2
## [31] rvest_1.0.2 ggrepel_0.9.1 haven_2.4.3
## [34] xfun_0.29 crayon_1.4.2 jsonlite_1.8.0
## [37] spatstat.data_2.1-2 flock_0.7 survival_3.2-13
## [40] zoo_1.8-9 iterators_1.0.14 glue_1.6.2
## [43] polyclip_1.10-0 gtable_0.3.0 seqinr_4.2-8
## [46] leiden_0.3.9 future.apply_1.8.1 BiocGenerics_0.40.0
## [49] abind_1.4-5 scales_1.1.1 pheatmap_1.0.12
## [52] bigparallelr_0.3.2 bio3d_2.4-3 DBI_1.1.2
## [55] Peptides_2.4.4 miniUI_0.1.1.1 Rcpp_1.0.9
## [58] viridisLite_0.4.0 xtable_1.8-4 spatstat.core_2.3-2
## [61] stats4_4.1.2 umap_0.2.8.0 htmlwidgets_1.5.4
## [64] httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.2
## [67] Seurat_4.1.0 ica_1.0-2 pkgconfig_2.0.3
## [70] farver_2.1.0 sass_0.4.0 uwot_0.1.11
## [73] dbplyr_2.1.1 deldir_1.0-6 here_1.0.1
## [76] utf8_1.2.2 tidyselect_1.1.2 labeling_0.4.2
## [79] rlang_1.0.2 reshape2_1.4.4 later_1.3.0
## [82] munsell_0.5.0 cellranger_1.1.0 tools_4.1.2
## [85] cli_3.3.0 generics_0.1.1 ade4_1.7-19
## [88] broom_0.7.11 ggridges_0.5.3 evaluate_0.14
## [91] fastmap_1.1.0 yaml_2.3.5 goftest_1.2-3
## [94] knitr_1.37 fs_1.5.2 fitdistrplus_1.1-6
## [97] zip_2.2.0 RANN_2.6.1 dendextend_1.16.0
## [100] pbapply_1.5-0 future_1.25.0 nlme_3.1-155
## [103] mime_0.12 xml2_1.3.3 compiler_4.1.2
## [106] rstudioapi_0.13 plotly_4.10.0 png_0.1-7
## [109] spatstat.utils_2.3-0 reprex_2.0.1 bslib_0.3.1
## [112] stringi_1.7.6 ps_1.7.0 highr_0.9
## [115] RSpectra_0.16-0 bluster_1.4.0 lattice_0.20-45
## [118] Matrix_1.4-0 vctrs_0.4.1 pillar_1.6.4
## [121] lifecycle_1.0.1 spatstat.geom_2.3-1 lmtest_0.9-39
## [124] jquerylib_0.1.4 BiocNeighbors_1.12.0 RcppAnnoy_0.0.19
## [127] data.table_1.14.2 cowplot_1.1.1 irlba_2.3.5
## [130] httpuv_1.6.5 patchwork_1.1.1 R6_2.5.1
## [133] promises_1.2.0.1 r3dmol_0.1.2 KernSmooth_2.23-20
## [136] gridExtra_2.3 parallelly_1.30.0 codetools_0.2-18
## [139] MASS_7.3-55 assertthat_0.2.1 openssl_2.0.2
## [142] rprojroot_2.0.3 withr_2.5.0 SeuratObject_4.0.4
## [145] sctransform_0.3.3 S4Vectors_0.32.3 mgcv_1.8-38
## [148] parallel_4.1.2 hms_1.1.1 grid_4.1.2
## [151] rpart_4.1-15 rmarkdown_2.11 Rtsne_0.16
## [154] shiny_1.7.1 lubridate_1.8.0