PlatypusV3 Structure Vignette

Lucas Stalder

2022-07-04

library(Platypus)
library(tidyverse)
library(stringr)
library(utils)
library(tidyr)
library(dplyr)
library(stats, warn.conflicts = F)
library(ssh)
library(Seurat)
library(bio3d)
library(r3dmol)
library(knitr) # provides kable(), used below to print the MIXCR output
source("VDJ_select_clonotypes.R")
source("AlphaFold_prediction.R")
source("VDJ_structure_analysis.R")

1. Introduction

The Platypus package contains various tools and pipelines for computational immunology. This vignette focuses on using Platypus for the structural analysis of antibodies. Starting from loading VDJ data into a VGM object, it guides you through obtaining germline reference sequences with MIXCR, determining the most frequent clonotypes, and predicting antibody structures with AlphaFold. Platypus also contains functions for visualizing antibody structures and for determining structural and binding metrics.

2. AlphaFold prediction

To predict a structure with AlphaFold, you need access to a server or HPC cluster with AlphaFold installed and GPU access. For Platypus users with an ETH user account, the prediction function can connect to your account on the Euler cluster, start the prediction automatically, and later import the structures directly from the cluster back into the VGM object. For non-ETH members, the function writes the FASTA files in the format required as input for running AlphaFold on a custom server. Once the prediction is done, the function can import the structures from a local directory back into the VGM object.

2.1 Prerequisites for ETH members to run AlphaFold on Euler

To run AlphaFold on the ETH Euler cluster, you need ETH credentials to connect to the cluster. You also need GPU access, which is only granted to research groups. Make sure that your research group has GPU access and that you are a member of the group on Euler. If this is not the case, contact your department's IT support and ask them to add you to the Euler user list of your group. To use the function, you must be connected to the ETH network, either directly or through a VPN connection.

2.2 Prediction from VGM

The antibody structure can be predicted directly from a VGM object. In this vignette the OVA data set from Neumeier et al. 2021 is used.

# Download the PlatypusDB raw data in list format
# For the structure of PlatypusDB links, please refer to the PlatypusDB vignette
neumeier2021_raw <- PlatypusDB_fetch(PlatypusDB.links = 
                                     c("neumeier2021b/ALL/VDJ"), 
                                     load.to.enviroment = FALSE,
                                     load.to.list = TRUE)

# Run the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(Data.in = neumeier2021_raw,
                      verbose = TRUE,
                      select.excess.chains.by.umi.count = TRUE,
                      excess.chain.confidence.count.threshold = 1000)

2.2.1 Selection of top clonotypes with the VDJ_select_clonotypes function.

Prior to structure prediction, the antibody sequences of the most expanded clonotypes can be selected with the VDJ_select_clonotypes function. This function also calls MIXCR to determine the germline reference sequences, which are used to annotate the predicted sequences. A copy of MIXCR must therefore be installed locally on your computer, and its path has to be specified in the mixcr.directory argument. To integrate the UMI counts into the VGM data, the function needs both the VGM and the raw data from the PlatypusDB_fetch function as input. By default, the function selects the most expanded clonotypes per sample (the overall most expanded clonotypes can be selected by setting clonotypes.per.sample to FALSE). Within each clonotype, the entries are ranked by UMI count, and the most prominent unique VDJ/VJ sequences are selected. The number of clonotypes is set with the top.clonotypes argument and the number of sequences per clonotype with the seq.per.clonotype argument.

Here the top three most expanded clonotypes per sample with the two most prominent sequences per clonotype are selected.

VDJ_top_clonotypes_mixcr_out <- VDJ_select_clonotypes(VGM = vgm, 
                                                      raw.data = neumeier2021_raw, 
                                                      clone.strategy = "10x.default", 
                                                      top.clonotypes = 3, 
                                                      seq.per.clonotype = 2, 
                                                      donut.plot = FALSE, 
                                                      mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2", 
                                                      species = "mmu")
## MAC system detected
## Clonotyping strategy: 10x.default
## Processing sample 1 of 5
## Processing sample 2 of 5
## Processing sample 3 of 5
## Processing sample 4 of 5
## Processing sample 5 of 5
## Backing up 10x default clonotyping in columns clonotype_id_10x and clonotype_frequency_10x before updating clonotype_id and clonotype_frequency columns
kable(VDJ_top_clonotypes_mixcr_out %>% head(),caption = "MIXCR output")
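
The selection logic described in this section can be illustrated on a toy data frame. The following sketch uses base R only and is not the implementation of VDJ_select_clonotypes: the helper select_top_sketch and all values are hypothetical, and only the column names sample_id, clonotype_id, clonotype_frequency, umis and VDJ_sequence_aa mirror columns of the VGM.

```r
# Toy stand-in for vgm[[1]]; all values are invented for illustration.
toy <- data.frame(
  sample_id           = c("s1", "s1", "s1", "s1", "s1", "s2", "s2"),
  clonotype_id        = c("clonotype1", "clonotype1", "clonotype1",
                          "clonotype2", "clonotype3", "clonotype1", "clonotype2"),
  clonotype_frequency = c(3, 3, 3, 1, 1, 1, 1),
  umis                = c(10, 50, 20, 5, 7, 9, 4),
  VDJ_sequence_aa     = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: per sample, keep the most expanded clonotypes and,
# within each of them, the unique sequences with the highest UMI counts.
select_top_sketch <- function(df, top.clonotypes = 1, seq.per.clonotype = 2) {
  per_sample <- lapply(split(df, df$sample_id), function(s) {
    # Rank clonotypes by frequency (ties broken by clonotype ID).
    clon <- unique(s[, c("clonotype_id", "clonotype_frequency")])
    clon <- clon[order(-clon$clonotype_frequency, clon$clonotype_id), ]
    keep <- head(clon$clonotype_id, top.clonotypes)
    per_clonotype <- lapply(keep, function(id) {
      cells <- s[s$clonotype_id == id, ]
      cells <- cells[order(-cells$umis), ]                  # highest UMI count first
      cells <- cells[!duplicated(cells$VDJ_sequence_aa), ]  # unique sequences only
      head(cells, seq.per.clonotype)
    })
    do.call(rbind, per_clonotype)
  })
  out <- do.call(rbind, per_sample)
  rownames(out) <- NULL
  out
}

selected <- select_top_sketch(toy, top.clonotypes = 1, seq.per.clonotype = 2)
print(selected[, c("sample_id", "clonotype_id", "umis", "VDJ_sequence_aa")])
```

The real function additionally considers the VJ chain and adds the MIXCR annotation; the sketch only shows the two-level ranking, first by clonotype expansion and then by UMI count.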

2.2.2 Custom selection of VGM entries

Instead of using the VDJ_select_clonotypes function, sequences can also be selected based on custom filters from the vgm object. The germline reference sequences can then be added by calling the VDJ_call_MIXCR_full function. Here only unique reads from sample s1 with the clonotype ID “clonotype4” are considered.

vgm_selection <- vgm[[1]] %>%  
                 filter(sample_id == "s1" & clonotype_id_10x == "clonotype4") %>%  
                 distinct(VDJ_raw_ref, .keep_all = TRUE)

VDJ_mixcr_out <- VDJ_call_MIXCR_full(VDJ = vgm_selection, 
                                     mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2", 
                                     species = "mmu")
## MAC system detected

2.3 AlphaFold prediction function

2.3.1 Structure prediction from the vgm MIXCR output on Euler

The MIXCR output can be used as input for the AlphaFold_prediction function. For ETH members with access to the Euler cluster and GPU access, the function connects to the cluster and submits the AlphaFold structure predictions to the queue. To do so, the function asks for your password, which is handled securely by the ssh package for R.

The VDJ.mixcr.out argument takes the MIXCR output data frame, either from the VDJ_select_clonotypes function or from the VDJ_call_MIXCR_full function. The cells.to.predict argument optionally takes the barcodes of the cells whose structures should be predicted. If it is set to “ALL”, the structures of all cells in the input data frame are predicted.

The euler.user.name argument takes your username on the Euler cluster, which is your regular ETH username. To connect to the server, you must either be on the ETH network or use a VPN connection. The function then creates a folder on your computer containing all FASTA files and copies it to your scratch directory on the Euler cluster. By default the directory is named AlphaFold_Fasta; the dir.name argument lets you choose a different name.

Here we use the output from the VDJ_select_clonotypes function and predict the structures of the three cells specified in cells.to.predict.
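
As a rough illustration of the FASTA step, the following sketch writes one FASTA file per cell containing the heavy and light chain amino acid sequences. It is not the code used by AlphaFold_prediction: the helper write_fasta_sketch, the header names and the file naming are assumptions, as is the removal of a trailing "_" from the translated sequences. Only the column names barcode, VDJ_aa_mixcr and VJ_aa_mixcr are taken from the MIXCR output data frame; the sequences are shortened toy values.

```r
# Hypothetical helper: one two-chain FASTA file per row of the input data frame.
write_fasta_sketch <- function(df, dir.name = "AlphaFold_Fasta") {
  dir.create(dir.name, showWarnings = FALSE, recursive = TRUE)
  paths <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    # Assumption: a trailing "_" is not a residue and is stripped before writing.
    heavy <- sub("_+$", "", df$VDJ_aa_mixcr[i])
    light <- sub("_+$", "", df$VJ_aa_mixcr[i])
    paths[i] <- file.path(dir.name, paste0(df$barcode[i], ".fasta"))
    writeLines(c(">heavy_chain", heavy, ">light_chain", light), paths[i])
  }
  invisible(paths)
}

# Toy input with shortened sequences; a temporary directory keeps the example tidy.
toy <- data.frame(
  barcode          = c("s1_AAACCTGCATTGGCGC-1", "s2_AAAGATGAGCATCATC-1"),
  VDJ_aa_mixcr     = c("EVQLLE_", "EVQLQQ_"),
  VJ_aa_mixcr      = c("DIKMTQ_", "DITMTQ_"),
  stringsAsFactors = FALSE
)
out_dir <- file.path(tempdir(), "AlphaFold_Fasta")
fasta_files <- write_fasta_sketch(toy, dir.name = out_dir)
cat(readLines(fasta_files[1]), sep = "\n")
```

Writing both chains into one file reflects the multi-sequence FASTA input that AlphaFold's multimer mode expects; whether the function uses exactly this layout should be checked against the files it creates.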

#AlphaFold structure prediction
AlphaFold_prediction(VDJ.mixcr.out = VDJ_top_clonotypes_mixcr_out, 
                     cells.to.predict = c("s1_AAACCTGCATTGGCGC-1", 
                                          "s2_AAAGATGAGCATCATC-1",
                                          "s3_AAAGATGTCTCTAGGA-1"),
                     euler.user.name = "lucstalder", 
                     dir.name = "top_clonotypes_OVA")

This generates one AlphaFold prediction job per structure. Depending on the size and number of the structures, the predictions take several hours to complete. Once they are finished, the structures can be imported with the same function.
In the VDJ.mixcr.out argument, specify the same input that was used for the prediction, in this case the VDJ_top_clonotypes_mixcr_out data frame. To import data, the import argument must be set. Here the structures are imported directly from the Euler cluster, so import = “euler” is used.
As the directory was previously named “top_clonotypes_OVA”, the euler.dirname argument must be set to this name to tell the function where to find the predicted outputs.
AlphaFold produces multiple outputs for each structure, ranked by overall confidence; the prediction with the highest confidence is named “ranked_0”. The n.ranked argument sets how many ranked predictions are imported per structure, so n.ranked = 5 imports the top five. The structures are copied from the server to a directory on your local computer before they are loaded into the environment. By default, the local copy is deleted afterwards. Here we want to keep it, so rm.local.output is set to FALSE.

#AlphaFold import
VDJ_top_clonotypes_structure <- AlphaFold_prediction(VDJ.mixcr.out = VDJ_top_clonotypes_mixcr_out, 
                                                     import = "euler",
                                                     euler.user.name = "lucstalder",
                                                     euler.dirname = "top_clonotypes_OVA",
                                                     n.ranked = 5, 
                                                     rm.local.output = FALSE)
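
AlphaFold writes its ranked models as ranked_0.pdb, ranked_1.pdb and so on into the output directory of each prediction. The following sketch shows how the first n.ranked of these files could be picked from a directory listing. The helper pick_ranked_sketch and the mock directory are hypothetical and do not reflect how AlphaFold_prediction is implemented.

```r
# Hypothetical helper: return the paths of the n.ranked best-ranked PDB files.
pick_ranked_sketch <- function(pred.dir, n.ranked = 1) {
  files <- list.files(pred.dir, pattern = "^ranked_[0-9]+\\.pdb$")
  # Sort by the numeric rank so that ranked_10 would follow ranked_9.
  ranks <- as.integer(sub("^ranked_([0-9]+)\\.pdb$", "\\1", files))
  files <- files[order(ranks)]
  file.path(pred.dir, head(files, n.ranked))
}

# Mock prediction directory with empty placeholder files.
mock_dir <- file.path(tempdir(), "s1_AAACCTGCATTGGCGC-1")
dir.create(mock_dir, showWarnings = FALSE, recursive = TRUE)
invisible(file.create(file.path(mock_dir,
                                c(sprintf("ranked_%d.pdb", 0:4), "features.pkl"))))

top2 <- pick_ranked_sketch(mock_dir, n.ranked = 2)
print(basename(top2))
```

Files that do not match the ranked_<number>.pdb pattern, such as features.pkl, are ignored by the pattern filter.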

The function's output is a list. Its first element is the MIXCR output data frame that was used as input for the structure prediction, and its second element is a list of the predicted structures. As there are multiple ranked predictions per structure, the second element contains one list per structure holding its ranked predictions.

Let's have a look at the first list entry, which is just the VDJ data frame with the MIXCR information.
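
The nesting can be pictured with a mock object of the same shape. The element names used here (barcodes, ranked_0, ranked_1) are assumptions made for illustration, and the character strings stand in for the structure objects; check names() on the real output before indexing by name.

```r
# Mock object mimicking the described shape: list(data frame, list of structures).
mock_out <- list(
  data.frame(barcode = c("s1_AAACCTGCATTGGCGC-1", "s2_AAAGATGAGCATCATC-1"),
             stringsAsFactors = FALSE),
  list(
    "s1_AAACCTGCATTGGCGC-1" = list(ranked_0 = "structure A0", ranked_1 = "structure A1"),
    "s2_AAAGATGAGCATCATC-1" = list(ranked_0 = "structure B0", ranked_1 = "structure B1")
  )
)

vdj_df     <- mock_out[[1]]  # first element: VDJ data frame with MIXCR information
structures <- mock_out[[2]]  # second element: one list of ranked predictions per structure

# Positional indexing avoids relying on the assumed element names:
# take the first (highest-confidence) prediction of every structure.
best <- lapply(structures, function(s) s[[1]])
str(best)
```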
Columns of the data frame:
barcode orig_barcode sample_id_clonotype sample_id FB_assignment group_id celltype Nr_of_VDJ_chains Nr_of_VJ_chains VDJ_cdr3s_aa VJ_cdr3s_aa VDJ_cdr3s_nt VJ_cdr3s_nt VDJ_chain_contig VJ_chain_contig VDJ_chain VJ_chain VDJ_vgene VJ_vgene VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene VDJ_sequence_nt_raw VJ_sequence_nt_raw VDJ_sequence_nt_trimmed VJ_sequence_nt_trimmed VDJ_sequence_aa VJ_sequence_aa VDJ_raw_ref VJ_raw_ref VDJ_trimmed_ref VJ_trimmed_ref VDJ_raw_consensus_id VJ_raw_consensus_id clonotype_frequency specifity affinity batches clonotype_id umis reads clonotype_frequency_10x.default clonotype_id_10x.default clonal_feature_10x.default clonotype_frequency_10x clonotype_id_10x VDJ_nSeqFR1 VDJ_nSeqCDR1 VDJ_nSeqFR2 VDJ_nSeqCDR2 VDJ_nSeqFR3 VDJ_nSeqCDR3 VDJ_nSeqFR4 VDJ_aaSeqFR1 VDJ_aaSeqCDR1 VDJ_aaSeqFR2 VDJ_aaSeqCDR2 VDJ_aaSeqFR3 VDJ_aaSeqCDR3 VDJ_aaSeqFR4 VDJ_bestVAlignment VDJ_bestJAlignment VDJ_bestDAlignment VDJ_descrsR1 VDJ_SHM VJ_nSeqFR1 VJ_nSeqCDR1 VJ_nSeqFR2 VJ_nSeqCDR2 VJ_nSeqFR3 VJ_nSeqCDR3 VJ_nSeqFR4 VJ_aaSeqFR1 VJ_aaSeqCDR1 VJ_aaSeqFR2 VJ_aaSeqCDR2 VJ_aaSeqFR3 VJ_aaSeqCDR3 VJ_aaSeqFR4 VJ_bestVAlignment VJ_bestJAlignment VJ_descrsR1 VJ_SHM VDJ_nt_mixcr VJ_nt_mixcr VDJ_aa_mixcr VJ_aa_mixcr

First four rows, selected columns (the nucleotide and amino acid sequence columns, the per-region MIXCR columns and the alignment strings are not shown):

barcode                sample_id  clonotype_id  celltype  VDJ_cdr3s_aa   VJ_cdr3s_aa
s1_AAACCTGCATTGGCGC-1  s1         clonotype2    B cell    CMRYSAYWYFDVW  CLQHGESPYTF
s1_AAAGCAATCTTTACAC-1  s1         clonotype2    B cell    CMRYGNYWYFDVW  CLQHGESPFTF
s1_AACACGTAGGGTCGAT-1  s1         clonotype3    B cell    CTRSEDYYWFAYW  CLQYDEFPYTF
s1_AACTCAGGTGGTTTCA-1  s1         clonotype3    B cell    CTRSEDYYWFAYW  CLQYDEFPYTF

barcode                VDJ_vgene  VDJ_jgene  VDJ_cgene  VJ_vgene    VJ_jgene  VJ_cgene
s1_AAACCTGCATTGGCGC-1  IGHV11-2   IGHJ1      IGHM       IGKV14-126  IGKJ2     IGKC
s1_AAAGCAATCTTTACAC-1  IGHV11-2   IGHJ1      IGHM       IGKV14-126  IGKJ4     IGKC
s1_AACACGTAGGGTCGAT-1  IGHV1-5    IGHJ3      IGHG1      IGKV14-111  IGKJ2     IGKC
s1_AACTCAGGTGGTTTCA-1  IGHV1-5    IGHJ3      IGHG1      IGKV14-111  IGKJ2     IGKC

barcode                clonotype_frequency  umis  reads  VDJ_SHM  VJ_SHM
s1_AAACCTGCATTGGCGC-1  283                  1007  3328   3        0
s1_AAAGCAATCTTTACAC-1  283                  209   752    4        0
s1_AACACGTAGGGTCGAT-1  116                  1935  6006   12       1
s1_AACTCAGGTGGTTTCA-1  116                  544   1728   15       2
s2_AAAGATGAGCATCATC-1 AAAGATGAGCATCATC s2_clonotype1 s2 Not assignable 2 B cell 1 1 CARDAYDWYFDVW CQQGQSYPLTF TGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGG TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC AAAGATGAGCATCATC-1_contig_1 AAAGATGAGCATCATC-1_contig_2 IGH IGK IGHV7-1 IGKV15-103 IGHJ1 IGKJ1 IGHM IGKC AGTTGACGTTTTCTTATATGGGGGGGATCCTGTCCTGAGTTCCCCAATCTTCACATTCAGAAATCAGCACTCAGTCCTGTCACTATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC 
ATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGTCTGGGCGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAAGCTCCAGGGAAGGGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT 
TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT clonotype1_concat_ref_1 clonotype1_concat_ref_7 220 NA NA Unspecified clonotype1 1367 5508 220 clonotype1 clonotype1 225 clonotype1 GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCT GGGTTCACCTTCAGTGATTTCTAC ATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCA AGTAGAAACAAAGCTAATGATTATACAACA GAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTAC TGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGG GGCGCAGGGACCACGGTCACCGTCTCCTCAG EVKLVESGGGLVQPGGSLRLSCATS GFTFSDFY MEWVRQPPGKRLEWIAA SRNKANDYTT EYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYY CARDAYDWYFDVW GAGTTVTVSS_ 0|305|326|141|446|ST39CSC45GSA116GSG117CSG129A|1455.0 22|73|73|451|502|SA45G|241.0 27|32|51|446|451||25.0 s2_AAAGATGAGCATCATC-1 6 GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGT CAGAACATTAATGTTTGG TTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTAT AAGGCTTCC AACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTAC TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC GGTGGAGGCACCAAGCTGGAAATCAAAC DIQMNQSPSSLSASLGDTITITCHAS QNINVW LSWYQQKPGNIPKLLIY KAS NLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYY CQQGQSYPLTF GGGTKLEIK_ 0|287|307|88|375||1435.0 23|58|58|375|410||175.0 
s2_AAAGATGAGCATCATC-1 0 GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAAC EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKANDYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDAYDWYFDVWGAGTTVTVSS_ DIQMNQSPSSLSASLGDTITITCHASQNINVWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQGQSYPLTFGGGTKLEIK_
s2_AAAGCAACAGCTGTGC-1 AAAGCAACAGCTGTGC s2_clonotype1 s2 Not assignable 2 B cell 1 1 CARDAWDWYFDVW CQQGQSYPLTF TGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGG TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC AAAGCAACAGCTGTGC-1_contig_2 AAAGCAACAGCTGTGC-1_contig_1 IGH IGK IGHV7-1 IGKV15-103 IGHJ1 IGKJ1 IGHM IGKC TGGGGAGTGGGATCCTGTCCTGAGTTCCCCAATCTTCACATTCAGAAATCAGCACTCAGTCCTGTCACTATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC 
ATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGTCTGGGCGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAAGCTCCAGGGAAGGGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT 
TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT clonotype1_concat_ref_1 clonotype1_concat_ref_7 220 NA NA Unspecified clonotype1 477 1828 220 clonotype1 clonotype1 225 clonotype1 GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCT GGGTTCACCTTCAGTGATTTCTAC ATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCA AGTAGAAACAAAGCTAATGATTATACAACA GAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTAC TGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGG GGCGCAGGGACCACGGTCACCGTCTCCTCAG EVKLVESGGGLVQPGGSLRLSCATS GFTFSDFY MEWVRQPPGKRLEWIAA SRNKANDYTT EYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYY CARDAWDWYFDVW GAGTTVTVSS_ 0|308|326|126|434|ST39CSC45GSA116GSG117CSG129A|1470.0 22|73|73|436|487|SA45G|241.0 s2_AAAGCAACAGCTGTGC-1 6 GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGT CAGAACATTAATGTTTGG TTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTAT AAGGCTTCC AACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTAC TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC GGTGGAGGCACCAAGCTGGAAATCAAAC DIQMNQSPSSLSASLGDTITITCHAS QNINVW LSWYQQKPGNIPKLLIY KAS NLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYY CQQGQSYPLTF GGGTKLEIK_ 0|287|307|88|375||1435.0 23|58|58|375|410||175.0 s2_AAAGCAACAGCTGTGC-1 0 
GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAAC EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKANDYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDAWDWYFDVWGAGTTVTVSS_ DIQMNQSPSSLSASLGDTITITCHASQNINVWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQGQSYPLTFGGGTKLEIK_

The second element contains one list per structure, holding the ranked models. Here the structure from cell “s1_AAACCTGCATTGGCGC-1” is selected and, of all predictions, the one with the highest confidence, “ranked_0.pdb”, is chosen. This is a pdb object with the attributes atom, xyz, calpha and call.

VDJ_top_clonotypes_structure[[2]]$`s1_AAACCTGCATTGGCGC-1`$ranked_0.pdb
## 
##  Call:  bio3d::read.pdb(file = paste0(CurDir, "/", OutDir, "/", out[i], 
##     "_ranked/", "ranked_", n - 1, ".pdb"))
## 
##    Total Models#: 1
##      Total Atoms#: 3419,  XYZs#: 10257  Chains#: 2  (values: A B)
## 
##      Protein Atoms#: 3419  (residues/Calpha atoms#: 225)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 0  (residues: 0)
##      Non-protein/nucleic resid values: [ none ]
## 
##    Protein sequence:
##       EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINY
##       APSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYSAYWYFDVWGAGTTVTVSSDI
##       KMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRF
##       SGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
## 
## + attr: atom, xyz, calpha, call
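The nested layout described above can be explored with ordinary list indexing. The following is a minimal sketch using a mock list (`mock_structures`, invented here) that stands in for `VDJ_top_clonotypes_structure[[2]]`. The real entries are bio3d pdb objects. The sketch assumes only that each one carries an `atom` table with `chain` and `elety` columns, as bio3d pdb objects do.

```r
# Mock stand-in for VDJ_top_clonotypes_structure[[2]] (hypothetical data):
# one sub-list per cell barcode, one entry per ranked model.
mock_structures <- list(
  "s1_AAACCTGCATTGGCGC-1" = list(
    ranked_0.pdb = list(atom = data.frame(chain = c("A", "A", "B"),
                                          elety = c("CA", "CB", "CA"))),
    ranked_1.pdb = list(atom = data.frame(chain = c("A", "B", "B"),
                                          elety = c("CA", "CA", "CB")))
  )
)

# Barcodes for which a structure is available
barcodes <- names(mock_structures)

# Ranked models available for the first barcode
models <- names(mock_structures[[barcodes[1]]])

# Number of C-alpha atoms (one per residue) per chain in the top-ranked model
top_model <- mock_structures[[barcodes[1]]][["ranked_0.pdb"]]
ca_per_chain <- table(top_model$atom$chain[top_model$atom$elety == "CA"])
print(ca_per_chain)
```

Using `[[ ]]` with the barcode as a string avoids the backticks that `$` needs for names containing a hyphen.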

2.3.2 Structure prediction from local FASTA files

The AlphaFold_prediction function can also predict a protein of interest from a local FASTA file. For this purpose, set the fasta.directory.path argument to the path of a directory containing the FASTA files. The function predicts a structure for every file in that directory with the ending .fasta, so many structures can be predicted with a single call and without passing a separate path for every file. Here we are only interested in the structure of ovalbumin, so fasta.directory.path points to a directory that contains only the ovalbumin FASTA file.

#AlphaFold structure prediction from local fasta file
AlphaFold_prediction(fasta.directory.path = "OVA", 
                     euler.user.name = "lucstalder", 
                     dir.name = "OVA_only")
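Before starting a prediction it can be useful to check which files in a directory would be picked up. The sketch below only illustrates the idea of selecting files by their .fasta ending with base R. How AlphaFold_prediction selects files internally is not shown in this vignette, so the suffix match is an assumption, and the file names and sequences are placeholders.

```r
# Create a fresh temporary directory with two FASTA files and one unrelated file
fasta_dir <- tempfile("fasta_demo_")
dir.create(fasta_dir)
writeLines(c(">protein_one", "MGSIGAASMEF"), file.path(fasta_dir, "protein_one.fasta"))
writeLines(c(">protein_two", "EVQLLETGGGL"), file.path(fasta_dir, "protein_two.fasta"))
writeLines("not a sequence", file.path(fasta_dir, "notes.txt"))

# Keep only files whose name ends in ".fasta"
fasta_files <- list.files(fasta_dir, pattern = "\\.fasta$")
print(fasta_files)
```

Files with other endings, such as notes.txt here, are ignored by the pattern.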

The structure can then be imported with the same function that was used for prediction. If no vgm object is specified, the output is simply a list of structures with a sublist for each prediction containing the top-ranked pdb files.

OVA_structure <- AlphaFold_prediction(import = "euler", 
                                      euler.user.name = "lucstalder", 
                                      euler.dirname = "OVA", 
                                      n.ranked = 5)
OVA_structure[[1]]$OVA$ranked_0.pdb
## 
##  Call:  bio3d::read.pdb(file = paste0(CurDir, "/", OutDir, "/", out[i], 
##     "_ranked/", "ranked_", n - 1, ".pdb"))
## 
##    Total Models#: 1
##      Total Atoms#: 5999,  XYZs#: 17997  Chains#: 1  (values: A)
## 
##      Protein Atoms#: 5999  (residues/Calpha atoms#: 386)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 0  (residues: 0)
##      Non-protein/nucleic resid values: [ none ]
## 
##    Protein sequence:
##       MGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRF
##       DKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQ
##       CVKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIV
##       FKGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMAS...<cut>...CVSP
## 
## + attr: atom, xyz, calpha, call

2.3.3 Structure prediction with Antigen interaction

It might be of interest not only to get the structure of the antibody but also to see AlphaFold’s prediction of antigen interaction. Therefore the antigen can be added to the prediction as an additional domain.
This can be easily done by specifying the path to the antigen FASTA file in the antigen.fasta.path argument of the function.

Here we are interested in the prediction for some experimentally identified binders of the OVA protein, so we first filter the vgm object for the given samples.

VDJ_OVA_binder <- vgm[[1]] %>% 
                  filter(barcode %in% c("s1_AAACCTGCACGTCTCT-1", 
                                        "s1_AACCGCGGTTCAGTAC-1", 
                                        "s2_TTTGCGCTCGGCTACG-1", 
                                        "s2_AACTGGTAGTGCAAGC-1", 
                                        "s5_ATTACTCAGGCTCTTA-1"))

For prediction we need the full-length amino acid HC and LC sequences, which are obtained with MIXCR.

VDJ_OVA_binder_mixcr_out <- VDJ_call_MIXCR_full(VDJ = VDJ_OVA_binder, 
                                                mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2", 
                                                species = "mmu")
## MAC system detected

Now the structure of these antibodies can be predicted with the OVA antigen as an additional domain. The only thing to be added is the path to the antigen amino acid FASTA file in the antigen.fasta.path argument of the function.

#AlphaFold structure prediction
AlphaFold_prediction(VDJ.mixcr.out = VDJ_OVA_binder_mixcr_out, 
                     cells.to.predict = "ALL", 
                     euler.user.name = "lucstalder", 
                     dir.name = "OVA_binder_with_OVA", 
                     antigen.fasta.path = "OVA.fasta")
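When the antigen is added as an additional domain, the chain letters in the predicted structure follow the order of the sequences in the FASTA file. The sketch below illustrates that mapping for the order heavy chain, light chain, antigen, which matches the chain letters A, B and C used later in this vignette. The sequences are truncated placeholders, and the exact FASTA file that AlphaFold_prediction writes is not shown here, so the record names are assumptions.

```r
# Placeholder sequences (hypothetical, truncated for illustration only)
chains <- c(heavy_chain = "EVQLQQSG",
            light_chain = "DITMTQSP",
            antigen     = "MGSIGAAS")

# One FASTA record (header line plus sequence line) per chain, in this order
fasta_lines <- as.vector(rbind(paste0(">", names(chains)), chains))

# Chain identifiers follow the record order: first record is A, second is B, ...
chain_ids <- setNames(LETTERS[seq_along(chains)], names(chains))
print(fasta_lines)
print(chain_ids)
```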

After prediction is finished the data is imported from the server:

#AlphaFold  import
OVA_binder_with_OVA_structure <- AlphaFold_prediction(VDJ.mixcr.out = VDJ_OVA_binder_mixcr_out, 
                                                      import = "euler", 
                                                      euler.user.name = "lucstalder", 
                                                      euler.dirname = "OVA_binder_with_OVA", 
                                                      n.ranked = 5, 
                                                      rm.local.output = F)

2.3 Visualisation and Analysis

2.3.1 Antibody visualisation

The predicted structures can be visualized and analysed in R using a variety of packages for protein structure analysis. For easy visualization, annotation and analysis of antibody structures, the Platypus VDJ_structure_analysis function can be used. It is mainly based on the bio3d and r3dmol packages. The function visualizes the antibody structure and automatically annotates the framework and CDR regions.

Let’s have a look at the structures of the three most expanded clonotypes that were predicted earlier in 2.3.1.

out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure, 
                              cells.to.vis = "ALL")

The function automatically annotates the structure based on the VDJ MIXCR sequences. The annotation can be disabled by setting the VDJ.anno argument to FALSE.

out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure, 
                       cells.to.vis = "s1_AAACCTGCATTGGCGC-1", 
                       VDJ.anno = F)

The function has a variety of options, which are listed in its description. For example, the annotation label can be disabled, the label size can be changed, and the color of the different regions can be set individually.

The framework color can be set to gray for a better visualization of the CDRs.

out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure, 
                       cells.to.vis = "s1_AAACCTGCATTGGCGC-1", 
                       label = F, 
                       color.frameworks = "#C0C0C0", 
                       angle.x = 0, 
                       angle.z = -10, 
                       angle.y = 170)

The function can also be used to visualize a structure directly from a pdb object. The structure of ovalbumin was previously predicted with AlphaFold and can now be visualized via the PDB.file argument.

out <- VDJ_structure_analysis(PDB.file = OVA_structure[[1]]$OVA$ranked_0.pdb)

To visualize a local .pdb file, first read it in with the bio3d::read.pdb() function as follows:

VDJ_structure_analysis(PDB.file = bio3d::read.pdb("Path/to/file.pdb"))

2.3.2 Binding Interaction

For the OVA binders the structure was predicted together with the antigen. The Platypus VDJ_structure_analysis function can be used to visualize the binding interaction and determine some binding site metrics.

First, let’s have a glimpse at the structure.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       angle.x = 40, 
                       angle.y = 200, 
                       angle.z = -50, 
                       label = F)

Some distinct epitopes on the ovalbumin antigen were determined experimentally. The anno.seq argument of the VDJ_structure_analysis function makes it possible to annotate custom regions on the sequence. It takes either a single annotation or a list of several annotations. Every annotation has four mandatory elements: the residue index of the annotation’s start, the residue index of the annotation’s end, the protein chain (alphabetic, A to Z, in the same order as the sequences in the FASTA file) and the color of the annotation. An optional fifth element sets a label for the annotation.

For OVA there are three bins to annotate, so a list of annotations is specified. First the whole OVA structure is set to gray, then the annotations for Bin1 to Bin3 are specified with their respective color and label. Bin3 consists of two regions, so it gets a second annotation with the same color but no label.

OVA_Bins <- list(
  c(1,392,"C","#b3b3b3"),
  c(186,201,"C","#9900ff","Bin1"),
  c(133,148,"C","#ffcc00","Bin2"),
  c(265,280,"C","#0000ff","Bin3"),
  c(214,229,"C","#0000ff")
)
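Because c() mixes numbers and strings, every annotation ends up as a character vector, and a typo in an index or color is easy to miss. The following sketch checks annotations against the four mandatory elements and the optional label described above. The helper `check_annotation` is hypothetical and not part of Platypus, and the hex-color check is an assumption, since the function may accept other color formats.

```r
# Check one annotation vector: start, end, chain, color and optional label.
check_annotation <- function(anno) {
  if (!length(anno) %in% c(4, 5)) return(FALSE)
  start <- suppressWarnings(as.integer(anno[1]))
  end   <- suppressWarnings(as.integer(anno[2]))
  if (is.na(start) || is.na(end)) return(FALSE)
  start <= end &&
    anno[3] %in% LETTERS &&
    grepl("^#[0-9A-Fa-f]{6}$", anno[4])
}

annotations <- list(
  c(1, 392, "C", "#b3b3b3"),
  c(186, 201, "C", "#9900ff", "Bin1"),
  c(201, 186, "C", "#9900ff")          # start after end: rejected
)

# TRUE for every annotation that passes all checks
is_valid <- vapply(annotations, check_annotation, logical(1))
print(is_valid)
```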

Now the created custom annotation can be added by the anno.seq argument.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       angle.x = 40, 
                       angle.y = 200, 
                       angle.z = -50, 
                       label = F, 
                       anno.seq = OVA_Bins)

The three bins are now visible, and the antibody seems to interact with the antigen somewhere close to Bin3. More information about the binding site is available through the antigen.interaction option. It determines the binding site residues with the bio3d::binding.site function and then calculates some metrics of these residues, which are summarized in a data frame.

By setting the BindingResidues.plot argument to TRUE, a plot is shown where the binding residues are colored on the antigen and on the antibody.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       structure.plot = F, 
                       anno.seq = OVA_Bins, 
                       antigen.interaction = T, 
                       BindingResidues.plot = T, 
                       angle.x = 40, 
                       angle.y = 200, 
                       angle.z = -50)

AlphaFold reports a confidence score for every residue in the form of a pLDDT value. This is a scale from 0 to 100, where a higher score means more confidence in the model prediction. The per-residue pLDDT scores can also be visualized with the VDJ_structure_analysis function by setting the plddt.plot argument to TRUE. In this plot, blue means high confidence and red low confidence. It can be seen that the binding residues generally have lower confidence than the rest of the structure.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       structure.plot = F, 
                       anno.seq = OVA_Bins, 
                       antigen.interaction = T, 
                       plddt.plot = T, 
                       angle.x = 40, 
                       angle.y = 200, 
                       angle.z = -50)
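AlphaFold stores the pLDDT value of each residue in the B-factor field of the PDB file, so in a bio3d pdb object it is available in the `b` column of the `atom` table. The sketch below uses a small mock atom table with invented values to show how per-residue scores could be extracted and grouped into the confidence bands used by the AlphaFold database. It is an illustration only, not the code used by VDJ_structure_analysis.

```r
# Mock atom table with the columns used here; values are invented.
atoms <- data.frame(
  chain = c("A", "A", "A", "B", "B", "B"),
  resno = c(1, 1, 2, 1, 2, 2),
  b     = c(95, 95, 62, 88, 45, 45)
)

# One pLDDT value per residue (all atoms of a residue share the same value)
per_residue <- aggregate(b ~ chain + resno, data = atoms, FUN = mean)

# Group the scores into confidence bands
per_residue$band <- cut(per_residue$b,
                        breaks = c(0, 50, 70, 90, 100),
                        labels = c("very low", "low", "confident", "very high"),
                        include.lowest = TRUE)
print(per_residue[order(per_residue$chain, per_residue$resno), ])
```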

Now let’s have a look at the binding site metrics data frame that is produced when the antigen.interaction argument is set to TRUE. The first two columns are the cell barcode and the rank of the model that was analyzed. Next, the mean confidence (pLDDT) is given for the non-binding residues over the whole structure, and for the binding residues separately for heavy chain, light chain and antigen.
The Mean_dist_bind_resi column shows the mean minimal distance: for every binding residue on the antibody, the distance to its closest partner on the antigen is calculated, and these minimal distances are then averaged over all binding residues.
The last three columns list the residue indices of all binding site residues as semicolon-separated lists.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       structure.plot = F, 
                       anno.seq = OVA_Bins, 
                       antigen.interaction = T)
barcode:                        s1_AACCGCGGTTCAGTAC-1
rank:                           ranked_0.pdb
Mean_plddt.non_bind_resi.:      92.15
Mean_plddt_bind_resi_HC.:       68.25
Mean_plddt_bind_resi_LC.:       68.83
Mean_plddt_bind_resi_antigen.:  67.64
Mean_dist_bind_resi:            8.03
Bind_res_HC:                    28;30;31;32;33;35;50;52;54;55;56;57;58;59;99;100;101;102;103
Bind_res_LC:                    28;29;30;31;32;50;53;56;92;93;94
Bind_res_antigen:               97;186;187;217;218;219;232;233;234;235;236;237;238;239;241;271;274;275;276;277;278;340;342;343;344;345;346;347;348;349;350;351;354;355;372
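The mean minimal distance can be illustrated with a few lines of base R. The sketch below uses invented coordinates with one point per binding residue. Which atoms VDJ_structure_analysis actually uses for the distance is not stated in this vignette, so the one-point-per-residue representation is an assumption.

```r
# Invented coordinates (in angstroms), one row per binding residue
antibody_xyz <- rbind(c(0, 0, 0),
                      c(3, 0, 0))
antigen_xyz  <- rbind(c(0, 4, 0),
                      c(3, 0, 5),
                      c(10, 10, 10))

# Euclidean distance from every antibody residue (rows)
# to every antigen residue (columns)
pair_dist <- apply(antigen_xyz, 1, function(ag) {
  sqrt(rowSums(sweep(antibody_xyz, 2, ag)^2))
})

# Closest antigen partner for each antibody residue, then the mean of those minima
min_dist  <- apply(pair_dist, 1, min)
mean_dist <- mean(min_dist)
print(mean_dist)
```

In this toy example the minimal distances are 4 and 5, so the mean minimal distance is 4.5.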

It might be of interest to see which region of the antibody the binding site residues belong to. One would expect that mainly the CDRs are involved in binding, most prominently the CDR3. The Platypus VDJ_structure_analysis function has an argument binding.residue.barplot, which plots the distribution of the binding site residues over the regions of the antibody in a bar plot.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       structure.plot = F, 
                       anno.seq = OVA_Bins, 
                       antigen.interaction = T, 
                       binding.residue.barplot = T)
          Sum_nr_bind_res  region    Sum_nr_bind_res_pct
VDJ_CDR1                5  VDJ_CDR1                 0.20
VDJ_FR2                 2  VDJ_FR2                  0.08
VDJ_CDR2                6  VDJ_CDR2                 0.24
VDJ_FR3                 1  VDJ_FR3                  0.04
VDJ_CDR3                5  VDJ_CDR3                 0.20
VDJ_FR4                 0  VDJ_FR4                  0.00
VJ_FR1                  0  VJ_FR1                   0.00
VJ_FR2                  0  VJ_FR2                   0.00
VJ_CDR2                 1  VJ_CDR2                  0.04
VJ_FR3                  2  VJ_FR3                   0.08
VJ_CDR3                 3  VJ_CDR3                  0.12
VJ_FR4                  0  VJ_FR4                   0.00

A bar plot with the absolute counts of binding residues for each region of the antibody is produced for every analyzed structure. Furthermore, a summary bar plot is produced in which the percentage of binding site residues is given for each region. This plot is the average over all analyzed structures. In this example only one structure was analyzed, so the summary plot is based on this one structure. The data for the summary bar plot is also returned as a data frame.

For this vignette we might be interested in the summary plot of all the predicted structures.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                              cells.to.vis = "ALL", 
                              structure.plot = F, 
                              anno.seq = OVA_Bins, 
                              antigen.interaction = T, 
                              binding.residue.barplot = T)

It can be nicely seen that for the heavy chain most of the binding residues are part of the CDR3, and in the light chain they are part of the CDRs as well.

The bar plot can also be shown in a simplified version in which only the distribution between framework and CDR regions is shown. For this, set the binding.residue.barplot.style argument to “FR_CDR”.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                              cells.to.vis = "ALL", 
                              structure.plot = F, 
                              anno.seq = OVA_Bins, 
                              antigen.interaction = T, 
                              binding.residue.barplot = T, 
                              binding.residue.barplot.style = "FR_CDR")
## Not for every entry a structure is defined! All the defined structures are analysed.
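The relation between the per-region counts and the framework versus CDR view can be reproduced by hand. The sketch below uses the counts printed earlier for the single cell s1_AACCGCGGTTCAGTAC-1, converts them to fractions, and collapses the regions into two groups by name. Grouping by the substring "CDR" is an assumption made for illustration, and the function may compute its summary differently.

```r
# Binding-residue counts per region for one structure (from the table printed earlier)
counts <- c(VDJ_CDR1 = 5, VDJ_FR2 = 2, VDJ_CDR2 = 6, VDJ_FR3 = 1,
            VDJ_CDR3 = 5, VDJ_FR4 = 0, VJ_FR1 = 0, VJ_FR2 = 0,
            VJ_CDR2 = 1, VJ_FR3 = 2, VJ_CDR3 = 3, VJ_FR4 = 0)

# Fraction of all binding residues that falls into each region
pct <- round(counts / sum(counts), 2)

# Collapse the regions into framework (FR) and CDR by their names
region_type  <- ifelse(grepl("CDR", names(counts)), "CDR", "FR")
by_type      <- tapply(counts, region_type, sum)
frac_by_type <- by_type / sum(by_type)
print(pct)
print(frac_by_type)
```

For this one structure, 20 of the 25 tabulated binding residues lie in CDR regions.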

Here it can be seen that, for the models analyzed in this vignette, 80% of all binding residues are part of the CDR regions. It is quite astonishing to see AlphaFold model the binding of an antibody with the CDRs as the main interaction partners.

3 Appendix

One additional feature of the VDJ_structure_analysis function is the r3dmol.code argument, which allows additional custom lines of code to be added to the r3dmol visualizations. This gives the highest flexibility. The code has to be in single quotes ’ ’ and start with a pipe %>%. For example, the binding site residues alone can be shown as a stick model by adding the following lines of code in the r3dmol.code argument.

out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure, 
                       cells.to.vis = "s1_AACCGCGGTTCAGTAC-1", 
                       angle.x = 40, 
                       angle.y = 200, 
                       angle.z = -50, 
                       label = F, 
                       anno.seq = OVA_Bins,
                       r3dmol.code = ' %>% 
                       m_set_style(
                        style = m_style_stick(), 
                        sel = m_sel(
                          chain = "A",resi = 
                            c(28,30,31,32,33,35,50,52,54,55,56,57,58,59,99,100,101,102,103)
                          )
                        ) %>% 
                       
                       m_set_style(
                        style = m_style_stick(), 
                        sel = m_sel(
                          chain = "B",resi = 
                            c(28,29,30,31,32,50,53,56,92,93,94)
                          )
                        ) %>%
                        
                        m_set_style(
                        style = m_style_stick(), 
                        sel = m_sel(
                          chain = "C",resi = 
                          c(97,186,187,217,218,219,232,233,234,235,236,237,238,239,241,271,274,275,276,277,278,340,342,343,344,345,346,347,348,349,350,351,354,355,372)
                          )
                        )')
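Typing long residue lists by hand is error-prone, so the string for the r3dmol.code argument could also be assembled from the semicolon-separated lists in the binding site metrics data frame. The following is a minimal sketch with shortened residue lists. It assumes that chain A is the heavy chain, chain B the light chain and chain C the antigen, as in the example above, and it only builds the string without calling r3dmol itself.

```r
# Semicolon-separated residue indices per chain (shortened for illustration)
bind_res <- c(A = "28;30;31", B = "28;29", C = "97;186;187")

# Build one m_set_style() call per chain
style_calls <- vapply(names(bind_res), function(chain) {
  resi <- paste(strsplit(bind_res[[chain]], ";", fixed = TRUE)[[1]], collapse = ",")
  sprintf('m_set_style(style = m_style_stick(), sel = m_sel(chain = "%s", resi = c(%s)))',
          chain, resi)
}, character(1))

# Join the calls with pipes; the string has to start with a pipe as well
r3dmol_code <- paste("%>%", paste(style_calls, collapse = " %>% "))
cat(r3dmol_code, "\n")
```

The resulting string can then be passed as r3dmol.code = r3dmol_code.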