```r
library(Platypus)
library(tidyverse)
library(stringr)
library(utils)
library(tidyr)
library(dplyr)
library(stats, warn.conflicts = FALSE)
library(ssh)
library(Seurat)
library(bio3d)
library(r3dmol)
library(knitr) # provides kable(), used below to print tables
source("VDJ_select_clonotypes.R")
source("AlphaFold_prediction.R")
source("VDJ_structure_analysis.R")
```
The Platypus package contains various tools and pipelines for computational immunology. This vignette focuses on using Platypus for the structural analysis of antibodies. Starting by loading VDJ data into a VGM object, it guides you through obtaining germline reference sequences with MIXCR, determining the most frequent clonotypes and predicting antibody structures with AlphaFold. Furthermore, Platypus contains functions for visualizing antibody structures and for determining structural and binding metrics.
To predict a structure with AlphaFold, you need access to a server or HPC cluster with AlphaFold installed and GPU access. For Platypus users with an ETH user account, the prediction function can connect to their account on the Euler cluster, start the prediction automatically and later import the structures directly from the cluster back into the VGM object. For users without an ETH account, the function outputs FASTA files in the right format to be used as input for running AlphaFold on a custom server. Once the prediction is done, the function can import the structures from a local directory back into the VGM object.
To run AlphaFold on the ETH Euler cluster, you need ETH credentials to connect to the Euler cluster (euler.ethz.ch). Furthermore, you need GPU access, which can only be acquired by research groups. Make sure that your research group has GPU access and that you are a member of the group on Euler. If this is not the case, contact your department's IT support and ask them to add you to your group's Euler user list. To use the function, you must be connected to the ETH network or have a VPN connection.
The antibody structure can be predicted directly from a VGM object. In this vignette, the OVA data set from Neumeier et al. 2021 is used.
```r
# Download PlatypusDB raw data in list format
# For the structure of PlatypusDB links, please refer to the PlatypusDB vignette
neumeier2021_raw <- PlatypusDB_fetch(PlatypusDB.links = c("neumeier2021b/ALL/VDJ"),
                                     load.to.enviroment = FALSE,
                                     load.to.list = TRUE)

# Run the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(Data.in = neumeier2021_raw,
                      verbose = TRUE,
                      select.excess.chains.by.umi.count = TRUE,
                      excess.chain.confidence.count.threshold = 1000)
```
Prior to the structure prediction, the antibody sequences of the most expanded clonotypes can be selected with the VDJ_select_clonotypes function. This function also calls MIXCR to determine the germline reference sequences, which are used to annotate the predicted sequences. Therefore, a copy of MIXCR must be installed locally on your computer, and its path has to be specified in the mixcr.directory argument. To integrate the UMI counts into the VGM data, this function needs both the VGM and the raw data from the PlatypusDB_fetch function as input. By default, the function selects the most expanded clonotypes per sample (the overall most expanded clonotypes can be selected by setting clonotypes.per.sample to FALSE). Within each selected clonotype, the entries are ranked by UMI count and the most prominent unique VDJ/VJ sequences are selected. The number of clonotypes can be specified in the top.clonotypes argument and the number of sequences per clonotype in the seq.per.clonotype argument.
Here, the three most expanded clonotypes per sample are selected, together with the two most prominent sequences per clonotype.
```r
VDJ_top_clonotypes_mixcr_out <- VDJ_select_clonotypes(VGM = vgm,
                                                      raw.data = neumeier2021_raw,
                                                      clone.strategy = "10x.default",
                                                      top.clonotypes = 3,
                                                      seq.per.clonotype = 2,
                                                      donut.plot = FALSE,
                                                      mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2",
                                                      species = "mmu")
```
```
## MAC system detected
## Clonotyping strategy: 10x.default
## Processing sample 1 of 5
## Processing sample 2 of 5
## Processing sample 3 of 5
## Processing sample 4 of 5
## Processing sample 5 of 5
## Backing up 10x default clonotyping in columns clonotype_id_10x and clonotype_frequency_10x before updating clonotype_id and clonotype_frequency columns
```
```r
kable(VDJ_top_clonotypes_mixcr_out %>% head(), caption = "MIXCR output")
```
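To make the selection logic described above more concrete, here is a minimal base R sketch of one way to rank clonotypes and sequences. It is not the VDJ_select_clonotypes implementation: the helper select_top_sequences() and the toy data are hypothetical, and the sketch assumes that each unique VDJ/VJ amino acid pair is scored by the sum of UMIs over the cells carrying it.

```r
# Hypothetical helper: NOT part of Platypus, only an illustration of the ranking idea.
# Assumes columns sample_id, clonotype_id, clonotype_frequency,
# VDJ_sequence_aa, VJ_sequence_aa and umis (as in a VGM-like data frame).
select_top_sequences <- function(vdj, top.clonotypes = 3, seq.per.clonotype = 2) {
  selected <- list()
  for (s in unique(vdj$sample_id)) {
    vs <- vdj[vdj$sample_id == s, , drop = FALSE]
    # One frequency per clonotype, ranked from most to least expanded
    freq <- vapply(split(vs$clonotype_frequency, vs$clonotype_id), max, numeric(1))
    top <- head(names(sort(freq, decreasing = TRUE)), top.clonotypes)
    for (cl in top) {
      vc <- vs[vs$clonotype_id == cl, , drop = FALSE]
      # Assumption: score each unique VDJ/VJ pair by its summed UMI count
      key <- paste(vc$VDJ_sequence_aa, vc$VJ_sequence_aa, sep = "|")
      umi <- vapply(split(vc$umis, key), sum, numeric(1))
      best <- head(names(sort(umi, decreasing = TRUE)), seq.per.clonotype)
      # Keep the first cell carrying each selected sequence pair
      selected[[length(selected) + 1]] <- vc[match(best, key), , drop = FALSE]
    }
  }
  do.call(rbind, selected)
}

# Toy input (made-up values)
toy <- data.frame(
  sample_id           = c("s1", "s1", "s1", "s1", "s1", "s1", "s2"),
  clonotype_id        = c("c1", "c1", "c1", "c2", "c2", "c3", "c4"),
  clonotype_frequency = c(3, 3, 3, 2, 2, 1, 1),
  VDJ_sequence_aa     = c("A", "A", "B", "C", "C", "D", "E"),
  VJ_sequence_aa      = c("a", "a", "b", "c", "c", "d", "e"),
  umis                = c(100, 50, 80, 20, 10, 5, 30),
  stringsAsFactors    = FALSE
)

picked <- select_top_sequences(toy, top.clonotypes = 2, seq.per.clonotype = 2)
picked[, c("sample_id", "clonotype_id", "VDJ_sequence_aa")]
```

In this toy example, c3 in sample s1 is dropped because only the two most expanded clonotypes are kept, and clonotype c1 contributes two sequence pairs because it carries two distinct VDJ/VJ combinations.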
Instead of using the VDJ_select_clonotypes function, sequences can also be selected from the vgm object with custom filters. The germline reference sequences can then be added by calling the VDJ_call_MIXCR_full function. Here, only cells from sample s1 with the 10x clonotype ID “clonotype4” are considered, keeping one entry per unique VDJ_raw_ref.
```r
vgm_selection <- vgm[[1]] %>%
  filter(sample_id == "s1" & clonotype_id_10x == "clonotype4") %>%
  distinct(VDJ_raw_ref, .keep_all = TRUE)

VDJ_mixcr_out <- VDJ_call_MIXCR_full(VDJ = vgm_selection,
                                     mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2",
                                     species = "mmu")
```
```
## MAC system detected
```
The MIXCR output can be used as input for the AlphaFold_prediction function. For ETH members with access to the Euler cluster and GPU access, the function connects to the Euler cluster and submits the AlphaFold structure predictions to the queue. The function will therefore ask for your password, which is handled securely by the ssh R package.
The VDJ.mixcr.out argument takes the MIXCR output data frame, either from the VDJ_select_clonotypes function or from calling the VDJ_call_MIXCR_full function. The cells.to.predict argument can additionally be used to specify the barcodes of the cells whose structures should be predicted. If it is set to “ALL”, the structures of all entries in the input data frame are predicted.
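The two modes of cells.to.predict can be illustrated with a small base R sketch. The helper subset_cells_to_predict() is hypothetical and only mirrors the behaviour described above; it assumes that the barcodes are stored in a barcode column, as in the MIXCR output table shown later in this vignette.

```r
# Hypothetical helper illustrating the cells.to.predict semantics (not Platypus code)
subset_cells_to_predict <- function(mixcr_out, cells.to.predict = "ALL") {
  if (identical(cells.to.predict, "ALL")) {
    return(mixcr_out)  # "ALL": every entry of the input is kept
  }
  not_found <- setdiff(cells.to.predict, mixcr_out$barcode)
  if (length(not_found) > 0) {
    warning("Barcodes not found in input: ", paste(not_found, collapse = ", "))
  }
  # Otherwise keep only the rows whose barcode was requested
  mixcr_out[mixcr_out$barcode %in% cells.to.predict, , drop = FALSE]
}

toy_mixcr <- data.frame(
  barcode = c("s1_AAACCTGCATTGGCGC-1", "s1_AAAGCAATCTTTACAC-1", "s2_AAAGATGAGCATCATC-1"),
  stringsAsFactors = FALSE
)

all_cells <- subset_cells_to_predict(toy_mixcr, "ALL")
two_cells <- subset_cells_to_predict(toy_mixcr,
                                     c("s1_AAACCTGCATTGGCGC-1", "s2_AAAGATGAGCATCATC-1"))
```

Warning about unknown barcodes, instead of silently skipping them, is a design choice of this sketch that makes typos in long barcode strings easier to spot.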
The euler.user.name argument takes your username for the Euler cluster, which is your regular ETH username. To connect to the server, you must either be connected to the ETH network or use a VPN connection. The function then creates a folder on your computer containing all the FASTA files, which are copied to your scratch directory on the Euler cluster. By default, this directory is named AlphaFold_Fasta, but it can be given any name via the dir.name argument.
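If you run AlphaFold on your own server, it can help to see what a per-cell input file might look like. The sketch below writes one FASTA file per cell with the heavy (VDJ) and light (VJ) chain as two records, the multi-chain format that AlphaFold-Multimer accepts. The helper write_antibody_fasta(), the record names and the removal of the “_” that MIXCR uses to mark an incomplete codon are assumptions for illustration; the files written by AlphaFold_prediction may differ.

```r
# Hypothetical helper (not Platypus code): one FASTA file per cell,
# heavy chain (VDJ) and light chain (VJ) as separate records.
write_antibody_fasta <- function(mixcr_out, dir.name = "AlphaFold_Fasta") {
  dir.create(dir.name, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (i in seq_len(nrow(mixcr_out))) {
    id <- mixcr_out$barcode[i]
    # Assumption: drop the "_" that MIXCR uses to mark an incomplete codon
    heavy <- gsub("_", "", mixcr_out$VDJ_aa_mixcr[i], fixed = TRUE)
    light <- gsub("_", "", mixcr_out$VJ_aa_mixcr[i], fixed = TRUE)
    path <- file.path(dir.name, paste0(id, ".fasta"))
    writeLines(c(paste0(">", id, "_VDJ"), heavy,
                 paste0(">", id, "_VJ"), light), path)
    paths <- c(paths, path)
  }
  paths
}

# Toy input with shortened example sequences
toy_mixcr <- data.frame(
  barcode      = "s1_AAACCTGCATTGGCGC-1",
  VDJ_aa_mixcr = "EVQLLETGGGLVQ_",
  VJ_aa_mixcr  = "DIKMTQSPSS_",
  stringsAsFactors = FALSE
)
fasta_files <- write_antibody_fasta(toy_mixcr, file.path(tempdir(), "AlphaFold_Fasta"))
readLines(fasta_files[1])
```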
Here we use the output from the VDJ_select_clonotypes function and predict the structures of the three cells specified in cells.to.predict:
```r
# AlphaFold structure prediction
AlphaFold_prediction(VDJ.mixcr.out = VDJ_top_clonotypes_mixcr_out,
                     cells.to.predict = c("s1_AAACCTGCATTGGCGC-1",
                                          "s2_AAAGATGAGCATCATC-1",
                                          "s3_AAAGATGTCTCTAGGA-1"),
                     euler.user.name = "lucstalder",
                     dir.name = "top_clonotypes_OVA")
```
This generates a separate AlphaFold prediction job for each structure.
Depending on the size and the number of structures, this can take
several hours to complete. Once the prediction has finished, the
structures can be imported with the same function.
For the VDJ.mixcr.out argument, specify the same input that
was used for the prediction, in this case the
VDJ_top_clonotypes_mixcr_out data frame. To import data, the
import argument must be specified. Here the structures
are imported directly from the Euler cluster, so import =
“euler” is used.
As the directory was named “top_clonotypes_OVA” previously, the
euler.dirname argument must be specified to tell the
function in which directory the predicted outputs can be found.
AlphaFold produces multiple predictions for each structure, which are
ranked by overall confidence; the prediction with the highest
confidence is named “ranked_0”. The number of ranked predictions to
import per structure is specified in the n.ranked argument, where
n.ranked = 5 imports the top five ranked predictions per structure.
The structures are copied from the server to a directory on your
local computer before they are loaded into the R environment. By
default, this local copy is deleted afterwards. Here we want to keep
it, so rm.local.output is set to FALSE.
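AlphaFold writes its ranked models as ranked_0.pdb, ranked_1.pdb, and so on inside one output directory per input FASTA file, so n.ranked effectively chooses how many of these files are read. The sketch below builds those paths in base R. The helper ranked_model_paths(), the use of barcodes as directory names and the assumption that the local copy keeps AlphaFold's layout are illustrative only and do not describe the internals of AlphaFold_prediction. The commented line shows how the files could then be parsed with bio3d::read.pdb().

```r
# Hypothetical helper: paths of the top n.ranked AlphaFold models per structure.
# Assumes AlphaFold's layout <output.dir>/<fasta name>/ranked_<k>.pdb.
ranked_model_paths <- function(output.dir, structure.names, n.ranked = 5) {
  stopifnot(n.ranked >= 1)
  # ranked_0 is the most confident model, ranked_1 the next, and so on
  ranked_files <- paste0("ranked_", seq_len(n.ranked) - 1, ".pdb")
  paths <- lapply(structure.names, function(s) file.path(output.dir, s, ranked_files))
  names(paths) <- structure.names
  paths
}

paths <- ranked_model_paths("top_clonotypes_OVA",
                            c("s1_AAACCTGCATTGGCGC-1", "s2_AAAGATGAGCATCATC-1"),
                            n.ranked = 5)
paths[["s1_AAACCTGCATTGGCGC-1"]]
# With real files present, each model could be parsed with bio3d:
# models <- lapply(paths, function(p) lapply(p, bio3d::read.pdb))
```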
```r
# AlphaFold import
VDJ_top_clonotypes_structure <- AlphaFold_prediction(VDJ.mixcr.out = VDJ_top_clonotypes_mixcr_out,
                                                     import = "euler",
                                                     euler.user.name = "lucstalder",
                                                     euler.dirname = "top_clonotypes_OVA",
                                                     n.ranked = 5,
                                                     rm.local.output = FALSE)
```
The function's output is a list whose first element is the MIXCR output data frame used as input for the structure prediction and whose second element is a list of the predicted structures. As there are multiple ranked predictions per structure, this second element contains one list of ranked predictions per structure.
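Navigating this nested output is ordinary R list indexing. The sketch below uses a small mock object with the layout described above; the character placeholders stand in for real structure objects, so all values are hypothetical.

```r
# Mock object with the described layout: [[1]] data frame, [[2]] one list of
# ranked predictions per structure. Placeholders stand in for real structures.
mock_output <- list(
  data.frame(barcode = c("s1_AAACCTGCATTGGCGC-1", "s2_AAAGATGAGCATCATC-1"),
             stringsAsFactors = FALSE),
  list(
    list("ranked_0", "ranked_1", "ranked_2"),
    list("ranked_0", "ranked_1", "ranked_2")
  )
)

input_df   <- mock_output[[1]]       # MIXCR data frame used for the prediction
structures <- mock_output[[2]]       # one element per predicted structure
best_first <- structures[[1]][[1]]   # highest-ranked prediction of the first structure
n_ranked   <- lengths(structures)    # number of ranked predictions per structure
```

On the real object, VDJ_top_clonotypes_structure[[2]][[1]][[1]] would correspondingly hold the highest-ranked prediction of the first structure.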
Let's have a look at the first list entry, which is the VDJ data frame with the MIXCR information.

barcode | orig_barcode | sample_id_clonotype | sample_id | FB_assignment | group_id | celltype | Nr_of_VDJ_chains | Nr_of_VJ_chains | VDJ_cdr3s_aa | VJ_cdr3s_aa | VDJ_cdr3s_nt | VJ_cdr3s_nt | VDJ_chain_contig | VJ_chain_contig | VDJ_chain | VJ_chain | VDJ_vgene | VJ_vgene | VDJ_dgene | VDJ_jgene | VJ_jgene | VDJ_cgene | VJ_cgene | VDJ_sequence_nt_raw | VJ_sequence_nt_raw | VDJ_sequence_nt_trimmed | VJ_sequence_nt_trimmed | VDJ_sequence_aa | VJ_sequence_aa | VDJ_raw_ref | VJ_raw_ref | VDJ_trimmed_ref | VJ_trimmed_ref | VDJ_raw_consensus_id | VJ_raw_consensus_id | clonotype_frequency | specifity | affinity | batches | clonotype_id | umis | reads | clonotype_frequency_10x.default | clonotype_id_10x.default | clonal_feature_10x.default | clonotype_frequency_10x | clonotype_id_10x | VDJ_nSeqFR1 | VDJ_nSeqCDR1 | VDJ_nSeqFR2 | VDJ_nSeqCDR2 | VDJ_nSeqFR3 | VDJ_nSeqCDR3 | VDJ_nSeqFR4 | VDJ_aaSeqFR1 | VDJ_aaSeqCDR1 | VDJ_aaSeqFR2 | VDJ_aaSeqCDR2 | VDJ_aaSeqFR3 | VDJ_aaSeqCDR3 | VDJ_aaSeqFR4 | VDJ_bestVAlignment | VDJ_bestJAlignment | VDJ_bestDAlignment | VDJ_descrsR1 | VDJ_SHM | VJ_nSeqFR1 | VJ_nSeqCDR1 | VJ_nSeqFR2 | VJ_nSeqCDR2 | VJ_nSeqFR3 | VJ_nSeqCDR3 | VJ_nSeqFR4 | VJ_aaSeqFR1 | VJ_aaSeqCDR1 | VJ_aaSeqFR2 | VJ_aaSeqCDR2 | VJ_aaSeqFR3 | VJ_aaSeqCDR3 | VJ_aaSeqFR4 | VJ_bestVAlignment | VJ_bestJAlignment | VJ_descrsR1 | VJ_SHM | VDJ_nt_mixcr | VJ_nt_mixcr | VDJ_aa_mixcr | VJ_aa_mixcr |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s1_AAACCTGCATTGGCGC-1 | AAACCTGCATTGGCGC | s1_clonotype2 | s1 | Not assignable | 1 | B cell | 1 | 1 | CMRYSAYWYFDVW | CLQHGESPYTF | TGTATGAGATATTCTGCTTACTGGTACTTCGATGTCTGG | TGTCTACAGCATGGTGAGAGCCCGTACACGTTC | AAACCTGCATTGGCGC-1_contig_1 | AAACCTGCATTGGCGC-1_contig_2 | IGH | IGK | IGHV11-2 | IGKV14-126 | IGHJ1 | IGKJ2 | IGHM | IGKC | GCAAATAGGGCCTCTTTCTCCTCATGAAACGCAGACCAACCTATCCTTGCAGTTCAGACATAGGAGCTTGGCTCTGGTTCCCAAGACCTCTCACTCACTTCTCAACATGGAGTGGGAACTGAGCTTAATTTTCATTTTTGCTCTTTTAAAAGATGTCCAGTGTGAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTTACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTCTGTATGAGATATTCTGCTTACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA | GGGGTTGTCATTGCAGTCAGGACTCAGCATGGACATGAGGGCCCCTGCTCAGTTTTTTGGGATCTTGTTGCTCTGGTTTCCAGGTATCAGATGTGACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
ATGCAAATAGGGCCTCTTTCTACTCATGAAACGCAGACCAACCTATCCTTGCAGTTCAGACAGAGGAGCTTGGCTCTGGTTCCCAAGACCTCTTACTCACTTCTCAACATGGAGTGGGAACTGAGCTTAATTTTCATTTTTGCTCTTTTAAAAGATGTCCAGTGTGAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTCACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCGGAGGACACAGCCACGTATTTCTGTATGAGATACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT | 
ATGGACATGAGGGCCCCTGCTCAGTTTTTTGGGATCTTGTTGCTCTGGTTTCCAGGTATCAGATGTGACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCTCCATTCACGTTCGGCTCGGGGACAAAGTTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype2_concat_ref_1 | clonotype2_concat_ref_2 | 283 | NA | NA | Unspecified | clonotype2 | 1007 | 3328 | 283 | clonotype2 | clonotype2 | 287 | clonotype2 | GAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCA | GGGTTTACTTTTAGTGGCTTCTGG | ATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGAC | ATTAATTCTGATGGCAGTGCAATA | AACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTC | TGTATGAGATATTCTGCTTACTGGTACTTCGATGTCTGG | GGCGCAGGGACCACGGTCACCGTCTCCTCAG | EVQLLETGGGLVQPGGSRGLSCEGS | GFTFSGFW | MSWVRQTPGKTLEWIGD | INSDGSAI | NYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYF | CMRYSAYWYFDVW | GAGTTVTVSS_ | 0|301|316|163|463|SC80TSG263TDA297|1460.0 | 21|73|73|466|518|SA45G|246.0 | s1_AAACCTGCATTGGCGC-1 | 3 | GACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGT | CAGGACATTAAAAGCTAT | TTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTAT | TATGCAACA | AGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTAC | TGTCTACAGCATGGTGAGAGCCCGTACACGTTC | GGAGGGGGGACCAAGCTGGAAATAAAAC | DIKMTQSPSSMYASLGERVTITCKAS | QDIKSY | LSWYQQKPWKSPKTLIY | YAT | SLADGVPSRFSGSGSGQDYSLTISSLESDDTATYY | CLQHGESPYTF | GGGTKLEIK_ | 0|284|307|94|378||1420.0 | 
21|59|59|378|416||190.0 | s1_AAACCTGCATTGGCGC-1 | 0 | GAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTTACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTCTGTATGAGATATTCTGCTTACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG | GACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAAC | EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYSAYWYFDVWGAGTTVTVSS_ | DIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK_ | ||||||||
s1_AAAGCAATCTTTACAC-1 | AAAGCAATCTTTACAC | s1_clonotype2 | s1 | Not assignable | 1 | B cell | 1 | 1 | CMRYGNYWYFDVW | CLQHGESPFTF | TGTATGAGATATGGTAACTACTGGTACTTCGATGTCTGG | TGTCTACAGCATGGTGAGAGCCCATTCACGTTC | AAAGCAATCTTTACAC-1_contig_1 | AAAGCAATCTTTACAC-1_contig_2 | IGH | IGK | IGHV11-2 | IGKV14-126 | IGHJ1 | IGKJ4 | IGHM | IGKC | CAGACCAACCTATCCTTGCAGTTCAGACATAGGAGCTTGGCTCTGGTTCCCAAGACCTCTCACTCACTTCTCAACATGGAGTGGGAACTGAGCTTAATTTTCATTTTTGCTCTTTTAAAAGATGTCCAGTGTGAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTTACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTCTGTATGAGATATGGTAACTACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA | GGGGTTGTCATTGCAGTCAGGACTCAGCATGGACATGAGGGCCCCTGCTCAGTTTTTTGGGATCTTGTTGCTCTGGTTTCCAGGTATCAGATGTGACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCATTCACGTTCGGCTCGGGGACAAAGTTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
ATGCAAATAGGGCCTCTTTCTACTCATGAAACGCAGACCAACCTATCCTTGCAGTTCAGACAGAGGAGCTTGGCTCTGGTTCCCAAGACCTCTTACTCACTTCTCAACATGGAGTGGGAACTGAGCTTAATTTTCATTTTTGCTCTTTTAAAAGATGTCCAGTGTGAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTCACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCGGAGGACACAGCCACGTATTTCTGTATGAGATACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT | 
ATGGACATGAGGGCCCCTGCTCAGTTTTTTGGGATCTTGTTGCTCTGGTTTCCAGGTATCAGATGTGACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCTCCATTCACGTTCGGCTCGGGGACAAAGTTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype2_concat_ref_1 | clonotype2_concat_ref_2 | 283 | NA | NA | Unspecified | clonotype2 | 209 | 752 | 283 | clonotype2 | clonotype2 | 287 | clonotype2 | GAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCA | GGGTTTACTTTTAGTGGCTTCTGG | ATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGAC | ATTAATTCTGATGGCAGTGCAATA | AACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTC | TGTATGAGATATGGTAACTACTGGTACTTCGATGTCTGG | GGCGCAGGGACCACGGTCACCGTCTCCTCAG | EVQLLETGGGLVQPGGSRGLSCEGS | GFTFSGFW | MSWVRQTPGKTLEWIGD | INSDGSAI | NYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYF | CMRYGNYWYFDVW | GAGTTVTVSS_ | 0|297|316|132|429|SC80TSG263T|1457.0 | 16|73|73|430|487|SG19ASA45G|257.0 | s1_AAAGCAATCTTTACAC-1 | 4 | GACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGT | CAGGACATTAAAAGCTAT | TTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTAT | TATGCAACA | AGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTAC | TGTCTACAGCATGGTGAGAGCCCATTCACGTTC | GGCTCGGGGACAAAGTTGGAAATAAAAC | DIKMTQSPSSMYASLGERVTITCKAS | QDIKSY | LSWYQQKPWKSPKTLIY | YAT | SLADGVPSRFSGSGSGQDYSLTISSLESDDTATYY | CLQHGESPFTF | GSGTKLEIK_ | 0|284|307|94|378||1420.0 | 
20|58|58|378|416||190.0 | s1_AAAGCAATCTTTACAC-1 | 0 | GAAGTGCAGCTGTTGGAGACTGGAGGAGGCTTGGTGCAACCTGGGGGGTCACGGGGACTCTCTTGTGAAGGCTCAGGGTTTACTTTTAGTGGCTTCTGGATGAGCTGGGTTCGACAGACACCTGGGAAGACCCTGGAGTGGATTGGAGACATTAATTCTGATGGCAGTGCAATAAACTACGCACCATCCATAAAGGATCGATTCACTATCTTCAGAGACAATGACAAGAGCACCCTGTACCTGCAGATGAGCAATGTGCGATCTGAGGACACAGCCACGTATTTCTGTATGAGATATGGTAACTACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG | GACATCAAGATGACCCAGTCTCCATCCTCCATGTATGCATCGCTGGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAAAAGCTATTTAAGCTGGTACCAGCAGAAACCATGGAAATCTCCTAAGACCCTGATCTATTATGCAACAAGCTTGGCAGATGGGGTCCCATCAAGATTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTAACCATCAGCAGCCTGGAGTCTGACGATACAGCAACTTATTACTGTCTACAGCATGGTGAGAGCCCATTCACGTTCGGCTCGGGGACAAAGTTGGAAATAAAAC | EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINYAPSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYGNYWYFDVWGAGTTVTVSS_ | DIKMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPFTFGSGTKLEIK_ | ||||||||
s1_AACACGTAGGGTCGAT-1 | AACACGTAGGGTCGAT | s1_clonotype3 | s1 | Not assignable | 1 | B cell | 1 | 1 | CTRSEDYYWFAYW | CLQYDEFPYTF | TGTACAAGATCAGAGGATTATTACTGGTTTGCTTACTGG | TGTCTACAGTATGATGAGTTTCCGTACACGTTC | AACACGTAGGGTCGAT-1_contig_1 | AACACGTAGGGTCGAT-1_contig_2 | IGH | IGK | IGHV1-5 | IGKV14-111 | IGHJ3 | IGKJ2 | IGHG1 | IGKC | GATCAGTATCCTCTTCACAGTCACTGAAAACATTGACTCTAATCATGGAATGTAACTGGATACTTCCTTTTATTCTGTCGGTAACTTCAGGGGTCTACTCAGAGGTTCAGCTCCAGCAGTCTGGGACTATGCTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGGCTTCTGGCTACACCTTTACCAACTACTGGATACACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCTATTTATCCTGGAAATAGTGATACTAGGTACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCACCACTGCCTACATGGATCTCAGCAGCCTGACAAATGAGGACTCTGCGGTCTATTGCTGTACAAGATCAGAGGATTATTACTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGCCAAAACGACACCCCCATCTGTCTATCCACTGGCCCCTGGATCTGCTGCCCAAACTAACTCCATGGTGACCCTGGGATGCCTGGTCAAGG | CACCCGTTTCTTATATGGGGATTGTCATTGCAGCCAGGACTCAGCATGGACATGAGGACCCCTGCTCAGTTTCTTGGAATCTTGTTGCTCTGGTTTCCAGGTATCAAATGTGACATCAAGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAATAGATATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
GATCAGTATCCTCTTCACAGTCACTGAAAACATTGACTCTAATCATGGAATGTAACTGGATACTTCCTTTTATTCTGTCGGTAACTTCAGGGGTCTACTCAGAGGTTCAGCTCCAGCAGTCTGGGACTGTGCTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGACTTCTGGCTACACATTTACCAGCTACTGGATGCACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATAGGGGCTATTTATCCTGGAAATAGTGATACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCGCCAGCACTGCCTACATGGAGCTCAGCAGCCTGACAAATGAGGACTCTGCGGTCTATTACTGTACAAGACCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGCCAAAACGACACCCCCATCTGTCTATCCACTGGCCCCTGGATCTGCTGCCCAAACTAACTCCATGGTGACCCTGGGATGCCTGGTCAAGGGCTATTTCCCTGAGCCAGTGACAGTGACCTGGAACTCTGGATCCCTGTCCAGCGGTGTGCACACCTTCCCAGCTGTCCTGCAGTCTGACCTCTACACTCTGAGCAGCTCAGTGACTGTCCCCTCCAGCACCTGGCCCAGCCAGACCGTCACCTGCAACGTTGCCCACCCGGCCAGCAGCACCAAGGTGGACAAGAAAATTGTGCCCAGGGATTGTGGTTGTAAGCCTTGCATATGTACAGTCCCAGAAGTATCATCTGTCTTCATCTTCCCCCCAAAGCCCAAGGATGTGCTCACCATTACTCTGACTCCTAAGGTCACGTGTGTTGTGGTAGACATCAGCAAGGATGATCCCGAGGTCCAGTTCAGCTGGTTTGTAGATGATGTGGAGGTGCACACAGCTCAGACGAAACCCCGGGAGGAGCAGATCAACAGCACTTTCCGTTCAGTCAGTGAACTTCCCATCATGCACCAGGACTGGCTCAATGGCAAGGAGTTCAAATGCAGGGTCAACAGTGCAGCTTTCCCTGCCCCCATCGAGAAAACCATCTCCAAAACCAAAGGCAGACCGAAGGCTCCACAGGTGTACACCATTCCACCTCCCAAGGAGCAGATGGCCAAGGATAAAGTCAGTCTGACCTGCATGATAACAAACTTCTTCCCTGAAGACATTACTGTGGAGTGGCAGTGGAATGGGCAGCCAGCGGAGAACTACAAGAACACTCAGCCCATCATGGACACAGATGGCTCTTACTTCGTCTACAGCAAGCTCAATGTGCAGAAGAGCAACTGGGAGGCAGGAAATACTTTCACCTGCTCTGTGTTACATGAGGGCCTGCACAACCACCATACTGAGAAGAGCCTCTCCCACTCTCCTGGTAAA | 
ATGGACATGAGGACCCCTGCTCAGTTTCTTGGAATCTTGTTGCTCTGGTTTCCAGGTATCAAATGTGACATCAAGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAATAGCTATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype3_concat_ref_1 | clonotype3_concat_ref_2 | 116 | NA | NA | Unspecified | clonotype3 | 1935 | 6006 | 116 | clonotype3 | clonotype3 | 118 | clonotype3 | GAGGTTCAGCTCCAGCAGTCTGGGACTATGCTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGGCTTCT | GGCTACACCTTTACCAACTACTGG | ATACACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCT | ATTTATCCTGGAAATAGTGATACT | AGGTACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCACCACTGCCTACATGGATCTCAGCAGCCTGACAAATGAGGACTCTGCGGTCTATTGC | TGTACAAGATCAGAGGATTATTACTGGTTTGCTTACTGG | GGCCAAGGGACTCTGGTCACTGTCTCTGCAG | EVQLQQSGTMLARPGASVKMSCKAS | GYTFTNYW | IHWVKQRPGQGLEWIGA | IYPGNSDT | RYNQKFKGKAKLTAVTSTTTAYMDLSSLTNEDSAVYC | CTRSEDYYWFAYW | GQGTLVTVSA_ | 0|296|314|101|397|SG27ASA69GSA83CSG91ASG101ASA143TSG146CSC176GSG225ASG229CSG245TSA283G|1312.0 | 21|68|68|409|456||235.0 | 24|30|69|403|409||30.0 | s1_AACACGTAGGGTCGAT-1 | 12 | GACATCAAGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTATCACTTGCAAGGCGAGT | CAGGACATTAATAGATAT | TTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTAT | CGTGCAAAC | AGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTAT | TGTCTACAGTATGATGAGTTTCCGTACACGTTC | GGAGGGGGGACCAAGCTGGAAATAAAAC | DIKMTQSPSSMYASLGERVTITCKAS | QDINRY | LSWFQQKPGKSPKTLIY | RAN | 
RLVDGVPSRFSGSGSGQDYSLTISSLEYEDMGIYY | CLQYDEFPYTF | GGGTKLEIK_ | 0|284|307|111|395|SC92A|1406.0 | 21|59|59|395|433||190.0 | s1_AACACGTAGGGTCGAT-1 | 1 | GAGGTTCAGCTCCAGCAGTCTGGGACTATGCTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGGCTTCTGGCTACACCTTTACCAACTACTGGATACACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCTATTTATCCTGGAAATAGTGATACTAGGTACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCACCACTGCCTACATGGATCTCAGCAGCCTGACAAATGAGGACTCTGCGGTCTATTGCTGTACAAGATCAGAGGATTATTACTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAG | GACATCAAGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAATAGATATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAAC | EVQLQQSGTMLARPGASVKMSCKASGYTFTNYWIHWVKQRPGQGLEWIGAIYPGNSDTRYNQKFKGKAKLTAVTSTTTAYMDLSSLTNEDSAVYCCTRSEDYYWFAYWGQGTLVTVSA_ | DIKMTQSPSSMYASLGERVTITCKASQDINRYLSWFQQKPGKSPKTLIYRANRLVDGVPSRFSGSGSGQDYSLTISSLEYEDMGIYYCLQYDEFPYTFGGGTKLEIK_ | |||||||
s1_AACTCAGGTGGTTTCA-1 | AACTCAGGTGGTTTCA | s1_clonotype3 | s1 | Not assignable | 1 | B cell | 1 | 1 | CTRSEDYYWFAYW | CLQYDEFPYTF | TGTACAAGATCAGAGGATTACTACTGGTTTGCTTACTGG | TGTCTACAGTATGATGAGTTTCCGTACACGTTC | AACTCAGGTGGTTTCA-1_contig_1 | AACTCAGGTGGTTTCA-1_contig_2 | IGH | IGK | IGHV1-5 | IGKV14-111 | IGHJ3 | IGKJ2 | IGHG1 | IGKC | GATCAGTATCCTCTTCACAGTCACTGAAAACATTGACTCTAATCATGGAATGTAACTGGATACTTCCTTTTATTCTGTCGGTAACTTCAGGGGTCTACTCAGAGGTTCAGCTCCAGCAGTCTGGGACTGCGGTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGACTTCTGGCTACACCTTTACCAACTACTGGATGAACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCTATTTATCCTGGAAATAGTGACACTAGGCACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCAGTACTGCCTACATGGACCTCAGCAGCCTGACAAACGAGGACTCTGCGGTCTATTACTGTACAAGATCAGAGGATTACTACTGGTTTGCTTACTGGGGCCAAGGGACCCTGGTCACTGTCTCTGCAGCCAAAACGACACCCCCATCTGTCTATCCACTGGCCCCTGGATCTGCTGCCCAAACTAACTCCATGGTGACCCTGGGATGCCTGGTCAAGGG | GGGGTTGGTTTCTTATATGGGGATTGTCATTGCAGCCAGGACTCAGCATGGACATGAGGACCCCTGCTCAGTTTCTTGGAATCTTGTTGCTCTGGTTTCCAGGTATCAAATGTGACATCACGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTTTCACTTGCAAGGCGAGTCAGGACATTAATAGCTATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
GATCAGTATCCTCTTCACAGTCACTGAAAACATTGACTCTAATCATGGAATGTAACTGGATACTTCCTTTTATTCTGTCGGTAACTTCAGGGGTCTACTCAGAGGTTCAGCTCCAGCAGTCTGGGACTGTGCTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGACTTCTGGCTACACATTTACCAGCTACTGGATGCACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATAGGGGCTATTTATCCTGGAAATAGTGATACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCGCCAGCACTGCCTACATGGAGCTCAGCAGCCTGACAAATGAGGACTCTGCGGTCTATTACTGTACAAGACCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGCCAAAACGACACCCCCATCTGTCTATCCACTGGCCCCTGGATCTGCTGCCCAAACTAACTCCATGGTGACCCTGGGATGCCTGGTCAAGGGCTATTTCCCTGAGCCAGTGACAGTGACCTGGAACTCTGGATCCCTGTCCAGCGGTGTGCACACCTTCCCAGCTGTCCTGCAGTCTGACCTCTACACTCTGAGCAGCTCAGTGACTGTCCCCTCCAGCACCTGGCCCAGCCAGACCGTCACCTGCAACGTTGCCCACCCGGCCAGCAGCACCAAGGTGGACAAGAAAATTGTGCCCAGGGATTGTGGTTGTAAGCCTTGCATATGTACAGTCCCAGAAGTATCATCTGTCTTCATCTTCCCCCCAAAGCCCAAGGATGTGCTCACCATTACTCTGACTCCTAAGGTCACGTGTGTTGTGGTAGACATCAGCAAGGATGATCCCGAGGTCCAGTTCAGCTGGTTTGTAGATGATGTGGAGGTGCACACAGCTCAGACGAAACCCCGGGAGGAGCAGATCAACAGCACTTTCCGTTCAGTCAGTGAACTTCCCATCATGCACCAGGACTGGCTCAATGGCAAGGAGTTCAAATGCAGGGTCAACAGTGCAGCTTTCCCTGCCCCCATCGAGAAAACCATCTCCAAAACCAAAGGCAGACCGAAGGCTCCACAGGTGTACACCATTCCACCTCCCAAGGAGCAGATGGCCAAGGATAAAGTCAGTCTGACCTGCATGATAACAAACTTCTTCCCTGAAGACATTACTGTGGAGTGGCAGTGGAATGGGCAGCCAGCGGAGAACTACAAGAACACTCAGCCCATCATGGACACAGATGGCTCTTACTTCGTCTACAGCAAGCTCAATGTGCAGAAGAGCAACTGGGAGGCAGGAAATACTTTCACCTGCTCTGTGTTACATGAGGGCCTGCACAACCACCATACTGAGAAGAGCCTCTCCCACTCTCCTGGTAAA | 
ATGGACATGAGGACCCCTGCTCAGTTTCTTGGAATCTTGTTGCTCTGGTTTCCAGGTATCAAATGTGACATCAAGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTATCACTTGCAAGGCGAGTCAGGACATTAATAGCTATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype3_concat_ref_1 | clonotype3_concat_ref_2 | 116 | NA | NA | Unspecified | clonotype3 | 544 | 1728 | 116 | clonotype3 | clonotype3 | 118 | clonotype3 | GAGGTTCAGCTCCAGCAGTCTGGGACTGCGGTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGACTTCT | GGCTACACCTTTACCAACTACTGG | ATGAACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCT | ATTTATCCTGGAAATAGTGACACT | AGGCACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCAGTACTGCCTACATGGACCTCAGCAGCCTGACAAACGAGGACTCTGCGGTCTATTAC | TGTACAAGATCAGAGGATTACTACTGGTTTGCTTACTGG | GGCCAAGGGACCCTGGTCACTGTCTCTGCAG | EVQLQQSGTAVARPGASVKMSCKTS | GYTFTNYW | MNWVKQRPGQGLEWIGA | IYPGNSDT | RHNQKFKGKAKLTAVTSTSTAYMDLSSLTNEDSAVYY | CTRSEDYYWFAYW | GQGTLVTVSA_ | 0|296|314|101|397|ST28CSC30GSA83CSG91ASC102ASA143TSG146CST170CSC176GST177CSG225ASC230TSG245CST263C|1284.0 | 21|68|68|409|456|ST48C|221.0 | 26|33|69|402|409||35.0 | s1_AACTCAGGTGGTTTCA-1 | 15 | GACATCACGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTTTCACTTGCAAGGCGAGT | CAGGACATTAATAGCTAT | TTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTAT | CGTGCAAAC | AGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTAT | TGTCTACAGTATGATGAGTTTCCGTACACGTTC | GGAGGGGGGACCAAGCTGGAAATAAAAC | DITMTQSPSSMYASLGERVTFTCKAS | QDINSY | LSWFQQKPGKSPKTLIY | RAN | 
RLVDGVPSRFSGSGSGQDYSLTISSLEYEDMGIYY | CLQYDEFPYTF | GGGTKLEIK_ | 0|284|307|113|397|SA7CSA60T|1392.0 | 21|59|59|397|435||190.0 | s1_AACTCAGGTGGTTTCA-1 | 2 | GAGGTTCAGCTCCAGCAGTCTGGGACTGCGGTGGCAAGGCCTGGGGCTTCAGTGAAGATGTCCTGCAAGACTTCTGGCTACACCTTTACCAACTACTGGATGAACTGGGTAAAACAGAGGCCTGGACAGGGTCTGGAATGGATTGGCGCTATTTATCCTGGAAATAGTGACACTAGGCACAACCAGAAGTTCAAGGGCAAGGCCAAACTGACTGCAGTCACATCCACCAGTACTGCCTACATGGACCTCAGCAGCCTGACAAACGAGGACTCTGCGGTCTATTACTGTACAAGATCAGAGGATTACTACTGGTTTGCTTACTGGGGCCAAGGGACCCTGGTCACTGTCTCTGCAG | GACATCACGATGACCCAGTCTCCATCTTCCATGTATGCATCTCTAGGAGAGAGAGTCACTTTCACTTGCAAGGCGAGTCAGGACATTAATAGCTATTTAAGCTGGTTCCAGCAGAAACCAGGGAAATCTCCTAAGACCCTGATCTATCGTGCAAACAGATTGGTAGATGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGCAAGATTATTCTCTCACCATCAGCAGCCTGGAGTATGAAGATATGGGAATTTATTATTGTCTACAGTATGATGAGTTTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAAC | EVQLQQSGTAVARPGASVKMSCKTSGYTFTNYWMNWVKQRPGQGLEWIGAIYPGNSDTRHNQKFKGKAKLTAVTSTSTAYMDLSSLTNEDSAVYYCTRSEDYYWFAYWGQGTLVTVSA_ | DITMTQSPSSMYASLGERVTFTCKASQDINSYLSWFQQKPGKSPKTLIYRANRLVDGVPSRFSGSGSGQDYSLTISSLEYEDMGIYYCLQYDEFPYTFGGGTKLEIK_ | |||||||
s2_AAAGATGAGCATCATC-1 | AAAGATGAGCATCATC | s2_clonotype1 | s2 | Not assignable | 2 | B cell | 1 | 1 | CARDAYDWYFDVW | CQQGQSYPLTF | TGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGG | TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC | AAAGATGAGCATCATC-1_contig_1 | AAAGATGAGCATCATC-1_contig_2 | IGH | IGK | IGHV7-1 | IGKV15-103 | IGHJ1 | IGKJ1 | IGHM | IGKC | AGTTGACGTTTTCTTATATGGGGGGGATCCTGTCCTGAGTTCCCCAATCTTCACATTCAGAAATCAGCACTCAGTCCTGTCACTATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA | TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
ATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGTCTGGGCGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAAGCTCCAGGGAAGGGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT | 
TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype1_concat_ref_1 | clonotype1_concat_ref_7 | 220 | NA | NA | Unspecified | clonotype1 | 1367 | 5508 | 220 | clonotype1 | clonotype1 | 225 | clonotype1 | GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCT | GGGTTCACCTTCAGTGATTTCTAC | ATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCA | AGTAGAAACAAAGCTAATGATTATACAACA | GAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTAC | TGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGG | GGCGCAGGGACCACGGTCACCGTCTCCTCAG | EVKLVESGGGLVQPGGSLRLSCATS | GFTFSDFY | MEWVRQPPGKRLEWIAA | SRNKANDYTT | EYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYY | CARDAYDWYFDVW | GAGTTVTVSS_ | 0|305|326|141|446|ST39CSC45GSA116GSG117CSG129A|1455.0 | 22|73|73|451|502|SA45G|241.0 | 27|32|51|446|451||25.0 | s2_AAAGATGAGCATCATC-1 | 6 | GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGT | CAGAACATTAATGTTTGG | TTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTAT | AAGGCTTCC | AACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTAC | TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC | GGTGGAGGCACCAAGCTGGAAATCAAAC | DIQMNQSPSSLSASLGDTITITCHAS | QNINVW | LSWYQQKPGNIPKLLIY | KAS | 
NLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYY | CQQGQSYPLTF | GGGTKLEIK_ | 0|287|307|88|375||1435.0 | 23|58|58|375|410||175.0 | s2_AAAGATGAGCATCATC-1 | 0 | GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCTTACGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG | GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAAC | EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKANDYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDAYDWYFDVWGAGTTVTVSS_ | DIQMNQSPSSLSASLGDTITITCHASQNINVWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQGQSYPLTFGGGTKLEIK_ | |||||||
s2_AAAGCAACAGCTGTGC-1 | AAAGCAACAGCTGTGC | s2_clonotype1 | s2 | Not assignable | 2 | B cell | 1 | 1 | CARDAWDWYFDVW | CQQGQSYPLTF | TGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGG | TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC | AAAGCAACAGCTGTGC-1_contig_2 | AAAGCAACAGCTGTGC-1_contig_1 | IGH | IGK | IGHV7-1 | IGKV15-103 | IGHJ1 | IGKJ1 | IGHM | IGKC | TGGGGAGTGGGATCCTGTCCTGAGTTCCCCAATCTTCACATTCAGAAATCAGCACTCAGTCCTGTCACTATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA | TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC | 
ATGAAGTTGTGGTTAAACTGGGTTTTTCTTTTAACACTTTTACATGGTATCCAGTGTGAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGTCTGGGCGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAAGCTCCAGGGAAGGGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT | 
TCACTCTCAGTGAGGATACACCATCAGCATGAGGGTCCTTGCTGAGCTCCTGGGGCTGCTGCTGTTCTGCTTTTTAGGTGTGAGATGTGACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT | clonotype1_concat_ref_1 | clonotype1_concat_ref_7 | 220 | NA | NA | Unspecified | clonotype1 | 477 | 1828 | 220 | clonotype1 | clonotype1 | 225 | clonotype1 | GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCT | GGGTTCACCTTCAGTGATTTCTAC | ATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCA | AGTAGAAACAAAGCTAATGATTATACAACA | GAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTAC | TGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGG | GGCGCAGGGACCACGGTCACCGTCTCCTCAG | EVKLVESGGGLVQPGGSLRLSCATS | GFTFSDFY | MEWVRQPPGKRLEWIAA | SRNKANDYTT | EYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYY | CARDAWDWYFDVW | GAGTTVTVSS_ | 0|308|326|126|434|ST39CSC45GSA116GSG117CSG129A|1470.0 | 22|73|73|436|487|SA45G|241.0 | s2_AAAGCAACAGCTGTGC-1 | 6 | GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGT | CAGAACATTAATGTTTGG | TTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTAT | AAGGCTTCC | AACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTAC | TGTCAACAGGGTCAAAGTTATCCTCTGACGTTC | GGTGGAGGCACCAAGCTGGAAATCAAAC | DIQMNQSPSSLSASLGDTITITCHAS | QNINVW | LSWYQQKPGNIPKLLIY | KAS | NLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYY | CQQGQSYPLTF | 
GGGTKLEIK_ | 0|287|307|88|375||1435.0 | 23|58|58|375|410||175.0 | s2_AAAGCAACAGCTGTGC-1 | 0 | GAGGTGAAGCTGGTGGAATCTGGAGGAGGCTTGGTACAGCCTGGGGGTTCTCTGAGACTCTCCTGTGCAACTTCTGGGTTCACCTTCAGTGATTTCTACATGGAGTGGGTCCGCCAGCCTCCAGGGAAGAGACTGGAGTGGATTGCTGCAAGTAGAAACAAAGCTAATGATTATACAACAGAGTACAGTGCATCTGTGAAGGGTCGGTTCATCGTCTCCAGAGACACTTCCCAAAGCATCCTCTACCTTCAGATGAATGCCCTGAGAGCTGAGGACACTGCCATTTATTACTGTGCAAGAGATGCATGGGACTGGTACTTCGATGTCTGGGGCGCAGGGACCACGGTCACCGTCTCCTCAG | GACATCCAGATGAACCAGTCTCCATCCAGTCTGTCTGCATCCCTTGGAGACACAATTACCATCACTTGCCATGCCAGTCAGAACATTAATGTTTGGTTAAGCTGGTACCAGCAGAAACCAGGAAATATTCCTAAACTATTGATCTATAAGGCTTCCAACTTGCACACAGGCGTCCCATCAAGGTTTAGTGGCAGTGGATCTGGAACAGGTTTCACATTAACCATCAGCAGCCTGCAGCCTGAAGACATTGCCACTTACTACTGTCAACAGGGTCAAAGTTATCCTCTGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAAC | EVKLVESGGGLVQPGGSLRLSCATSGFTFSDFYMEWVRQPPGKRLEWIAASRNKANDYTTEYSASVKGRFIVSRDTSQSILYLQMNALRAEDTAIYYCARDAWDWYFDVWGAGTTVTVSS_ | DIQMNQSPSSLSASLGDTITITCHASQNINVWLSWYQQKPGNIPKLLIYKASNLHTGVPSRFSGSGSGTGFTLTISSLQPEDIATYYCQQGQSYPLTFGGGTKLEIK_ |
The second element of the returned list contains one sublist per predicted structure (cell), holding the ranked models. Here, the structure of cell “s1_AAACCTGCATTGGCGC-1” is selected and, of all its predictions, the model with the highest confidence (“ranked_0.pdb”) is chosen. This is a bio3d pdb object with the attributes atom, xyz, calpha and call:
VDJ_top_clonotypes_structure[[2]]$`s1_AAACCTGCATTGGCGC-1`$ranked_0.pdb
##
## Call: bio3d::read.pdb(file = paste0(CurDir, "/", OutDir, "/", out[i],
## "_ranked/", "ranked_", n - 1, ".pdb"))
##
## Total Models#: 1
## Total Atoms#: 3419, XYZs#: 10257 Chains#: 2 (values: A B)
##
## Protein Atoms#: 3419 (residues/Calpha atoms#: 225)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 0 (residues: 0)
## Non-protein/nucleic resid values: [ none ]
##
## Protein sequence:
## EVQLLETGGGLVQPGGSRGLSCEGSGFTFSGFWMSWVRQTPGKTLEWIGDINSDGSAINY
## APSIKDRFTIFRDNDKSTLYLQMSNVRSEDTATYFCMRYSAYWYFDVWGAGTTVTVSSDI
## KMTQSPSSMYASLGERVTITCKASQDIKSYLSWYQQKPWKSPKTLIYYATSLADGVPSRF
## SGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIK
##
## + attr: atom, xyz, calpha, call
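Assuming the nested layout described above (element [[2]] holds one sublist per cell barcode, each containing models named ranked_0.pdb, ranked_1.pdb, ...), the top-ranked model of every cell can be extracted in one step with lapply(). In this minimal sketch the mock object is hypothetical, and placeholder strings stand in for the bio3d pdb objects so that it runs without a prediction.

```r
# Hypothetical mock of the imported structure list: element [[2]] holds one
# sublist per cell barcode with its ranked models (strings replace pdb objects)
structures_mock <- list(
  NULL,  # element [[1]] is not used in this sketch
  list(
    `s1_AAACCTGCATTGGCGC-1` = list(ranked_0.pdb = "model A0", ranked_1.pdb = "model A1"),
    `s1_AACTCAGGTGGTTTCA-1` = list(ranked_0.pdb = "model B0", ranked_1.pdb = "model B1")
  )
)

# Keep only the highest-confidence model (ranked_0.pdb) of every cell
top_models <- lapply(structures_mock[[2]], function(models) models[["ranked_0.pdb"]])
names(top_models)
```

With the real import, the same lapply() call would return one bio3d pdb object per cell.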
The AlphaFold_prediction function can also be used to predict the structure of a protein of interest from a local FASTA file. To do so, set the fasta.directory.path argument to a directory containing the FASTA files. The structure is predicted for every file in that directory whose name ends in .fasta, so many structures can be predicted with a single function call, without specifying a separate path for each file. Here we are only interested in the structure of ovalbumin (OVA), so fasta.directory.path points to a directory that contains only the ovalbumin FASTA file.
#AlphaFold structure prediction from local fasta file
AlphaFold_prediction(fasta.directory.path = "OVA",
euler.user.name = "lucstalder",
dir.name = "OVA_only")
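A minimal sketch of preparing such a directory, assuming the function picks up every file whose name ends in .fasta, as described above. The helper write_fasta() and the directory name are hypothetical, and the sequence is a truncated ovalbumin fragment used only as a placeholder.

```r
# Hypothetical helper: write one amino acid sequence as a single-record FASTA
write_fasta <- function(seq, name, path) {
  writeLines(c(paste0(">", name), seq), path)
}

fasta_dir <- file.path(tempdir(), "OVA_demo")
dir.create(fasta_dir, showWarnings = FALSE)

# Truncated ovalbumin fragment (placeholder, not the full protein)
write_fasta("MGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRF",
            "OVA", file.path(fasta_dir, "OVA.fasta"))
# A non-FASTA file in the same directory is not matched by the pattern below
writeLines("not a sequence", file.path(fasta_dir, "notes.txt"))

# Files ending in ".fasta" (assumed matching rule)
fasta_files <- list.files(fasta_dir, pattern = "\\.fasta$")
fasta_files
```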
The structure can then be imported with the same function that was used for the prediction. If no VGM object is specified, the output is simply a list of structures, with one sublist per prediction containing the top-ranked PDB files.
OVA_structure <- AlphaFold_prediction(import = "euler",
                                      euler.user.name = "lucstalder",
                                      euler.dirname = "OVA",
                                      n.ranked = 5)
OVA_structure[[1]]$OVA$ranked_0.pdb
##
## Call: bio3d::read.pdb(file = paste0(CurDir, "/", OutDir, "/", out[i],
## "_ranked/", "ranked_", n - 1, ".pdb"))
##
## Total Models#: 1
## Total Atoms#: 5999, XYZs#: 17997 Chains#: 1 (values: A)
##
## Protein Atoms#: 5999 (residues/Calpha atoms#: 386)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 0 (residues: 0)
## Non-protein/nucleic resid values: [ none ]
##
## Protein sequence:
## MGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRF
## DKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQ
## CVKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIV
## FKGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMAS...<cut>...CVSP
##
## + attr: atom, xyz, calpha, call
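The Call: line in the output above shows how the models are located on import: model n is read from <out>_ranked/ranked_<n-1>.pdb. As an illustration with hypothetical values for CurDir, OutDir and the prediction name, the paths read for n.ranked = 5 would be:

```r
# Rebuild the paths following the printed call:
# paste0(CurDir, "/", OutDir, "/", out[i], "_ranked/", "ranked_", n - 1, ".pdb")
CurDir   <- "/home/user/project"  # hypothetical working directory
OutDir   <- "OVA"                 # hypothetical local output directory
out_i    <- "OVA"                 # hypothetical name of one prediction
n.ranked <- 5

ranked_paths <- paste0(CurDir, "/", OutDir, "/", out_i, "_ranked/",
                       "ranked_", seq_len(n.ranked) - 1, ".pdb")
basename(ranked_paths)
```

The zero-based file names explain why ranked_0.pdb, not ranked_1.pdb, is the top model.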
It may be of interest not only to obtain the structure of the antibody but also to see AlphaFold’s prediction of its interaction with the antigen. For this, the antigen can be added to the prediction as an additional domain, simply by passing the path of the antigen FASTA file to the antigen.fasta.path argument of the function.
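The exact input layout that AlphaFold_prediction writes is not shown in this vignette. As a sketch, assuming one FASTA record per chain (heavy chain, light chain, then antigen), a combined multi-record file can be assembled in base R; the chain letters follow the record order, as described for the anno.seq argument later in this vignette. The sequences are truncated fragments used as placeholders.

```r
# Truncated placeholder sequences (first residues only, not full chains)
chains <- c(
  HC      = "EVQLQQSGTAVARPGASVKMSCKTS",
  LC      = "DITMTQSPSSMYASLGERVTFTCKAS",
  antigen = "MGSIGAASMEFCFDVFKELKVHHANE"
)

# Assumed convention: chain IDs are assigned A, B, C, ... in record order
chain_ids <- setNames(LETTERS[seq_along(chains)], names(chains))

# Interleave headers and sequences into one multi-record FASTA
fasta_lines <- as.vector(rbind(paste0(">", names(chains)), unname(chains)))
fasta_path <- file.path(tempdir(), "HC_LC_antigen.fasta")
writeLines(fasta_lines, fasta_path)
chain_ids
```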
Here we are interested in the predictions for several experimentally identified binders of the OVA protein, so we first filter the VGM object for the corresponding cell barcodes:
VDJ_OVA_binder <- vgm[[1]] %>%
  filter(barcode %in% c("s1_AAACCTGCACGTCTCT-1",
                        "s1_AACCGCGGTTCAGTAC-1",
                        "s2_TTTGCGCTCGGCTACG-1",
                        "s2_AACTGGTAGTGCAAGC-1",
                        "s5_ATTACTCAGGCTCTTA-1"))
For the prediction, we need the full-length amino acid sequences of the heavy chain (HC) and light chain (LC), which are obtained with MIXCR:
VDJ_OVA_binder_mixcr_out <- VDJ_call_MIXCR_full(VDJ = VDJ_OVA_binder,
                                                mixcr.directory = "/usr/local/Cellar/mixcr/3.0.13-2",
                                                species = "mmu")
## MAC system detected
Now the structures of these antibodies can be predicted with the OVA antigen as an additional domain. The only addition is the path to the antigen’s amino acid FASTA file in the antigen.fasta.path argument.
#AlphaFold structure prediction
AlphaFold_prediction(VDJ.mixcr.out = VDJ_OVA_binder_mixcr_out,
cells.to.predict = "ALL",
euler.user.name = "lucstalder",
dir.name = "OVA_binder_with_OVA",
antigen.fasta.path = "OVA.fasta")
After the prediction has finished, the structures are imported from the server:
#AlphaFold import
OVA_binder_with_OVA_structure <- AlphaFold_prediction(VDJ.mixcr.out = VDJ_OVA_binder_mixcr_out,
                                                      import = "euler",
                                                      euler.user.name = "lucstalder",
                                                      euler.dirname = "OVA_binder_with_OVA",
                                                      n.ranked = 5,
                                                      rm.local.output = FALSE)
The predicted structures can be visualized and analyzed in R with a variety of packages for protein structure analysis. For easy visualization, annotation and analysis of antibody structures, the Platypus VDJ_structure_analysis function can be used. It is mainly based on the bio3d and r3dmol packages, visualizes the antibody structure and automatically annotates the framework and CDR regions.
Let’s have a look at the structures of the three most expanded clonotypes that were predicted previously in 2.3.1.
out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure,
                              cells.to.vis = "ALL")
The function automatically annotates the structure based on the VDJ MIXCR sequences. The annotation can be disabled by setting the VDJ.anno argument to FALSE.
out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure,
                              cells.to.vis = "s1_AAACCTGCATTGGCGC-1",
                              VDJ.anno = FALSE)
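To illustrate how framework and CDR regions can be located on a model, the sketch below turns the amino acid region sequences of one heavy chain (cell s1_AACTCAGGTGGTTTCA-1, copied from the VGM table above) into residue ranges via cumulative lengths. It assumes the model numbers the residues of each chain consecutively from 1, and it is not necessarily how VDJ_structure_analysis computes its annotation.

```r
# Heavy-chain amino acid regions of s1_AACTCAGGTGGTTTCA-1 (from the VGM table)
hc_regions <- c(
  FR1  = "EVQLQQSGTAVARPGASVKMSCKTS",
  CDR1 = "GYTFTNYW",
  FR2  = "MNWVKQRPGQGLEWIGA",
  CDR2 = "IYPGNSDT",
  FR3  = "RHNQKFKGKAKLTAVTSTSTAYMDLSSLTNEDSAVYY",
  CDR3 = "CTRSEDYYWFAYW",
  FR4  = "GQGTLVTVSA_"
)

# Drop the trailing "_" (not an amino acid) and count residues per region
seqs <- sub("_$", "", unname(hc_regions))
len <- nchar(seqs)

# Consecutive numbering from 1: each region ends at the cumulative length
end <- cumsum(len)
start <- end - len + 1L
anno <- data.frame(region = names(hc_regions), start = start, end = end)
anno

# Sanity check: the CDR3 range points at the CDR3 sequence
hc_full <- paste(seqs, collapse = "")
substr(hc_full, anno$start[6], anno$end[6])
```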
The function has many more options, which are listed in its description. For example, the annotation labels can be disabled, the label size can be changed, and the color of each region can be set individually.
Setting the framework color to gray, for instance, makes the CDRs stand out more clearly:
out <- VDJ_structure_analysis(VDJ.structure = VDJ_top_clonotypes_structure,
                              cells.to.vis = "s1_AAACCTGCATTGGCGC-1",
                              label = FALSE,
                              color.frameworks = "#C0C0C0",
                              angle.x = 0,
                              angle.z = -10,
                              angle.y = 170)
The function can also visualize structures directly from PDB files. The ovalbumin structure predicted earlier with AlphaFold can now be visualized with the PDB.file argument:
out <- VDJ_structure_analysis(PDB.file = OVA_structure[[1]]$OVA$ranked_0.pdb)
To visualize a local .pdb file, first read it in with bio3d::read.pdb() as follows:
VDJ_structure_analysis(PDB.file = bio3d::read.pdb("Path/to/file.pdb"))
For the OVA binders the structure was predicted together with the antigen. The Platypus VDJ_structure_analysis function can be used to visualize the binding interaction and determine some binding site metrics.
First, let’s take a look at the structure:
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              angle.x = 40,
                              angle.y = 200,
                              angle.z = -50,
                              label = FALSE)
Several distinct epitopes on the ovalbumin antigen were determined experimentally. With the anno.seq argument of the VDJ_structure_analysis function, custom regions of the sequence can be annotated. The argument takes either a single region or a list of multiple regions. Every annotation has four mandatory elements: the residue index where the annotation starts, the residue index where it ends, the protein chain (a letter from A to Z, in the same order as the sequences in the FASTA file) and the color of the annotation. An optional fifth element adds a label to the annotation.
For OVA, three bins are annotated, so a list of annotations is specified. First, the whole OVA chain is colored gray; then annotations for Bin1 to Bin3 are added with their respective colors and labels. Bin3 consists of two regions, so it has a second annotation with the same color but no label.
OVA_Bins <- list(
  c(1, 392, "C", "#b3b3b3"),
  c(186, 201, "C", "#9900ff", "Bin1"),
  c(133, 148, "C", "#ffcc00", "Bin2"),
  c(265, 280, "C", "#0000ff", "Bin3"),
  c(214, 229, "C", "#0000ff")
)
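Because c() coerces all elements to a common type, every annotation in OVA_Bins is a character vector, so the start and end indices have to be converted back to numbers before use. The hypothetical helper below, a sketch rather than part of Platypus, checks each annotation against the rules described above and collects the annotations in a data frame.

```r
OVA_Bins <- list(
  c(1, 392, "C", "#b3b3b3"),
  c(186, 201, "C", "#9900ff", "Bin1"),
  c(133, 148, "C", "#ffcc00", "Bin2"),
  c(265, 280, "C", "#0000ff", "Bin3"),
  c(214, 229, "C", "#0000ff")
)
class(OVA_Bins[[1]])  # c() has turned the numbers into strings

# Hypothetical validation helper (not a Platypus function)
check_anno <- function(anno) {
  if (!(length(anno) %in% c(4, 5))) stop("an annotation needs 4 or 5 elements")
  start <- as.numeric(anno[1])
  end <- as.numeric(anno[2])
  if (is.na(start) || is.na(end) || start > end) stop("invalid residue range")
  if (!(anno[3] %in% LETTERS)) stop("chain must be a single letter A-Z")
  data.frame(start = start, end = end, chain = anno[3], color = anno[4],
             label = if (length(anno) == 5) anno[5] else NA_character_,
             stringsAsFactors = FALSE)
}

anno_df <- do.call(rbind, lapply(OVA_Bins, check_anno))
anno_df
```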
The custom annotation can now be added with the anno.seq argument:
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              angle.x = 40,
                              angle.y = 200,
                              angle.z = -50,
                              label = FALSE,
                              anno.seq = OVA_Bins)
The three bins are now visible, and the antibody appears to interact with the antigen close to Bin3. More information about the binding site can be obtained with the antigen.interaction option: it determines the binding site residues with the bio3d::binding.site function and then calculates several metrics of these residues, which are summarized in a data frame.
By setting the BindingResidues.plot argument to TRUE, a plot is shown where the binding residues are colored on the antigen and on the antibody.
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE,
                              BindingResidues.plot = TRUE,
                              angle.x = 40,
                              angle.y = 200,
                              angle.z = -50)
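bio3d::binding.site identifies interacting residues by a distance cutoff between the atoms of two selections. The base R sketch below illustrates that idea on toy coordinates; it is a simplified re-implementation, not bio3d's code, and the 5 Å cutoff is an assumption chosen for illustration.

```r
# Toy atom coordinates in Angstrom (one row per atom)
ab <- data.frame(resno = c(1, 1, 2, 3), x = c(0, 1, 10, 20), y = 0, z = 0)
ag <- data.frame(resno = c(50, 51), x = c(3, 24), y = 0, z = 0)
cutoff <- 5  # assumed contact cutoff (Angstrom)

# Pairwise Euclidean distances: rows = antibody atoms, columns = antigen atoms
d <- sqrt(outer(ab$x, ag$x, "-")^2 +
          outer(ab$y, ag$y, "-")^2 +
          outer(ab$z, ag$z, "-")^2)

# A residue binds if any of its atoms lies within the cutoff of the partner
contact <- d <= cutoff
ab_bind <- sort(unique(ab$resno[rowSums(contact) > 0]))
ag_bind <- sort(unique(ag$resno[colSums(contact) > 0]))
ab_bind
ag_bind
```

Residue 2 is excluded because its only atom is 7 Å and 14 Å away from the antigen atoms.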
AlphaFold provides a per-residue confidence score, the pLDDT, on a scale from 0 to 100, where a higher score means higher confidence in the prediction. The per-residue pLDDT scores can be visualized with VDJ_structure_analysis by setting the plddt.plot argument to TRUE; blue indicates high and red low confidence. In this example, the binding residues have lower confidence than the rest of the structure.
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE,
                              plddt.plot = TRUE,
                              angle.x = 40,
                              angle.y = 200,
                              angle.z = -50)
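AlphaFold writes the pLDDT of each residue into the B-factor column of its PDB output, repeating the value on every atom of the residue; after bio3d::read.pdb() it is available as pdb$atom$b. The sketch below parses that column from a few toy ATOM records with base R and compares the mean pLDDT of a hypothetical binding residue with the rest, similar in spirit to the pLDDT columns of the metrics data frame described below.

```r
# Toy residues: AlphaFold repeats each residue's pLDDT on all of its atoms
atoms <- data.frame(
  name  = c("N", "CA", "N", "CA"),
  resid = c("GLU", "GLU", "VAL", "VAL"),
  chain = "A",
  resno = c(1L, 1L, 2L, 2L),
  plddt = c(91, 91, 61, 61),
  stringsAsFactors = FALSE
)

# Write fixed-width PDB ATOM records (B-factor in columns 61-66)
pdb_lines <- sprintf("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f",
                     seq_len(nrow(atoms)), atoms$name, atoms$resid, atoms$chain,
                     atoms$resno, 0, 0, 0, 1, atoms$plddt)

# Read back residue number (columns 23-26) and B-factor = pLDDT (columns 61-66)
resno <- as.integer(substr(pdb_lines, 23, 26))
b <- as.numeric(substr(pdb_lines, 61, 66))
per_res <- tapply(b, resno, mean)

bind_res <- 2  # hypothetical binding residue
is_bind <- as.integer(names(per_res)) %in% bind_res
plddt_summary <- c(binding = mean(per_res[is_bind]),
                   non_binding = mean(per_res[!is_bind]))
plddt_summary
```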
Now let’s have a look at the binding site metrics data frame that is produced when the antigen.interaction argument is set to TRUE. The first two columns contain the cell barcode and the rank of the analyzed model. Next, the mean confidence (pLDDT) is given for the non-binding residues of the whole structure and for the binding residues, separately for the heavy chain, light chain and antigen.
The Mean_dist_bind_resi column contains the mean minimal distance: for every binding residue on the antibody, the distance to its closest partner on the antigen is calculated, and these minimal distances are then averaged over all binding residues.
The last three columns list the residue indices of all binding sites as semicolon-separated lists.
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE)
barcode | rank | Mean_plddt.non_bind_resi. | Mean_plddt_bind_resi_HC. | Mean_plddt_bind_resi_LC. | Mean_plddt_bind_resi_antigen. | Mean_dist_bind_resi | Bind_res_HC | Bind_res_LC | Bind_res_antigen |
---|---|---|---|---|---|---|---|---|---|
s1_AACCGCGGTTCAGTAC-1 | ranked_0.pdb | 92.15 | 68.25 | 68.83 | 67.64 | 8.03 | 28;30;31;32;33;35;50;52;54;55;56;57;58;59;99;100;101;102;103 | 28;29;30;31;32;50;53;56;92;93;94 | 97;186;187;217;218;219;232;233;234;235;236;237;238;239;241;271;274;275;276;277;278;340;342;343;344;345;346;347;348;349;350;351;354;355;372 |
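The mean minimal distance can be reproduced on toy data: for each antibody binding residue, take the distance to the closest antigen binding residue, then average. The sketch represents each residue by a single point, which is an assumption; the Platypus function may measure distances between atoms instead.

```r
# Toy coordinates (Angstrom), one representative point per residue (assumption)
ab_xyz <- rbind(c(0, 0, 0), c(10, 0, 0))   # antibody binding residues
ag_xyz <- rbind(c(3, 4, 0), c(10, 0, 6))   # antigen binding residues

# Distances between every antibody residue (rows) and antigen residue (columns)
n_ab <- nrow(ab_xyz)
all_d <- as.matrix(dist(rbind(ab_xyz, ag_xyz)))
dmat <- all_d[seq_len(n_ab), -seq_len(n_ab), drop = FALSE]

# Minimal distance per antibody residue, then the mean over all of them
min_dist <- apply(dmat, 1, min)
mean_dist <- mean(min_dist)
mean_dist
```

Here the minimal distances are 5 and 6 Å, so the mean minimal distance is 5.5 Å.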
It may also be of interest to see which regions of the antibody the binding site residues belong to. One would expect mainly the CDRs, most prominently the CDR3, to be involved in binding. The binding.residue.barplot argument of VDJ_structure_analysis plots the distribution of the binding site residues across the regions of the antibody as a bar plot.
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE,
                              binding.residue.barplot = TRUE)
(row name) | Sum_nr_bind_res | region | Sum_nr_bind_res_pct |
---|---|---|---|
VDJ_CDR1 | 5 | VDJ_CDR1 | 0.20 |
VDJ_FR2 | 2 | VDJ_FR2 | 0.08 |
VDJ_CDR2 | 6 | VDJ_CDR2 | 0.24 |
VDJ_FR3 | 1 | VDJ_FR3 | 0.04 |
VDJ_CDR3 | 5 | VDJ_CDR3 | 0.20 |
VDJ_FR4 | 0 | VDJ_FR4 | 0.00 |
VJ_FR1 | 0 | VJ_FR1 | 0.00 |
VJ_FR2 | 0 | VJ_FR2 | 0.00 |
VJ_CDR2 | 1 | VJ_CDR2 | 0.04 |
VJ_FR3 | 2 | VJ_FR3 | 0.08 |
VJ_CDR3 | 3 | VJ_CDR3 | 0.12 |
VJ_FR4 | 0 | VJ_FR4 | 0.00 |
A bar plot with the absolute counts of binding residues per antibody region is produced for every analyzed structure. In addition, a summary bar plot gives the percentage of binding site residues in each region, averaged over all analyzed structures. In this example only one structure was analyzed, so the summary plot is based on that structure alone. The data of the summary bar plot is also returned as a data frame.
For this vignette, we are interested in the summary plot over all predicted structures:
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "ALL",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE,
                              binding.residue.barplot = TRUE)
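The residue lists in the metrics table are plain semicolon-separated strings, so they can be split with strsplit() and assigned to regions with findInterval(). In this sketch the region start positions are hypothetical; in practice they would come from the MIXCR annotation of the analyzed cell.

```r
# Heavy-chain binding residues as printed in the metrics table (Bind_res_HC)
bind_res_hc <- "28;30;31;32;33;35;50;52;54;55;56;57;58;59;99;100;101;102;103"
res <- as.integer(strsplit(bind_res_hc, ";", fixed = TRUE)[[1]])

# Hypothetical region start positions (must be increasing for findInterval)
region_start <- c(VDJ_FR1 = 1, VDJ_CDR1 = 26, VDJ_FR2 = 34, VDJ_CDR2 = 51,
                  VDJ_FR3 = 59, VDJ_CDR3 = 96, VDJ_FR4 = 109)

# Assign each residue to the last region that starts at or before it
region <- names(region_start)[findInterval(res, region_start)]
counts <- tabulate(match(region, names(region_start)), nbins = length(region_start))
names(counts) <- names(region_start)
counts
round(counts / sum(counts), 2)
```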
The plot shows that most heavy-chain binding residues are part of the CDR3, and that in the light chain the binding residues also lie mainly in the CDRs.
The bar plot can also be shown in a simplified version that only shows the distribution between framework and CDR regions. For this, set the binding.residue.barplot.style argument to “FR_CDR”:
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "ALL",
                              structure.plot = FALSE,
                              anno.seq = OVA_Bins,
                              antigen.interaction = TRUE,
                              binding.residue.barplot = TRUE,
                              binding.residue.barplot.style = "FR_CDR")
## Not for every entry a structure is defined! All the defined structures are analysed.
For the models analyzed in this vignette, 80% of all binding residues are part of the CDRs. It is striking that AlphaFold models the antibody-antigen binding with the CDRs as the main interaction partners.
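The FR_CDR style collapses the per-region counts into two classes. As a small worked example of the arithmetic, the sketch below applies this to the single-structure counts printed in the region table above; it only illustrates the calculation, since the 80% stated above refers to all analyzed models.

```r
# Binding residue counts per region, from the single-structure table above
counts <- c(VDJ_CDR1 = 5, VDJ_FR2 = 2, VDJ_CDR2 = 6, VDJ_FR3 = 1,
            VDJ_CDR3 = 5, VDJ_FR4 = 0, VJ_FR1 = 0, VJ_FR2 = 0,
            VJ_CDR2 = 1, VJ_FR3 = 2, VJ_CDR3 = 3, VJ_FR4 = 0)

# Collapse every region into "CDR" or "FR" and sum the counts per class
region_class <- ifelse(grepl("CDR", names(counts)), "CDR", "FR")
fr_cdr <- tapply(counts, region_class, sum)
fr_cdr_share <- fr_cdr / sum(fr_cdr)
fr_cdr_share
```

For these rows, 20 of 25 binding residues fall into CDRs.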
An additional feature of the VDJ_structure_analysis function is the r3dmol.code argument, through which custom lines of code can be appended to the r3dmol visualization, giving maximum flexibility. The code must be enclosed in single quotes (' ') and start with a pipe (%>%). For example, the binding site residues alone can be shown as sticks by passing the following code to the r3dmol.code argument:
out <- VDJ_structure_analysis(VDJ.structure = OVA_binder_with_OVA_structure,
                              cells.to.vis = "s1_AACCGCGGTTCAGTAC-1",
                              angle.x = 40,
                              angle.y = 200,
                              angle.z = -50,
                              label = FALSE,
                              anno.seq = OVA_Bins,
                              r3dmol.code = ' %>%
  m_set_style(
    style = m_style_stick(),
    sel = m_sel(
      chain = "A", resi =
        c(28,30,31,32,33,35,50,52,54,55,56,57,58,59,99,100,101,102,103)
    )
  ) %>%
  m_set_style(
    style = m_style_stick(),
    sel = m_sel(
      chain = "B", resi =
        c(28,29,30,31,32,50,53,56,92,93,94)
    )
  ) %>%
  m_set_style(
    style = m_style_stick(),
    sel = m_sel(
      chain = "C", resi =
        c(97,186,187,217,218,219,232,233,234,235,236,237,238,239,241,271,274,275,276,277,278,340,342,343,344,345,346,347,348,349,350,351,354,355,372)
    )
  )')
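Typing residue vectors by hand is error-prone. The hypothetical helper stick_code() below builds the same kind of r3dmol.code string from the semicolon-separated residue lists of the metrics table, assuming (as described above) that the string is appended as R code to the r3dmol pipeline. Here the result is only parsed to check its syntax, not evaluated.

```r
# Hypothetical helper: r3dmol.code snippet showing the given residues of one
# chain as sticks (same pattern as the hand-written code above)
stick_code <- function(chain, resi) {
  sprintf(' %%>%% m_set_style(style = m_style_stick(), sel = m_sel(chain = "%s", resi = c(%s)))',
          chain, paste(resi, collapse = ","))
}

# Binding residues per chain, copied from the metrics table (A = HC, B = LC)
bind_res <- list(
  A = "28;30;31;32;33;35;50;52;54;55;56;57;58;59;99;100;101;102;103",
  B = "28;29;30;31;32;50;53;56;92;93;94"
)

code <- paste(vapply(names(bind_res), function(ch) {
  stick_code(ch, strsplit(bind_res[[ch]], ";", fixed = TRUE)[[1]])
}, character(1)), collapse = "")

# Parse (without evaluating) to check that the snippet is valid R syntax
invisible(parse(text = paste0("m", code)))
# The string could then be passed as r3dmol.code = code
```

The antigen chain C could be added to bind_res in the same way.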