1. VGM Introduction

The VGM is the central data object of the current version of Platypus and is produced by the VDJ_GEX_matrix function. The main idea was to create a single object that contains all relevant immune information and can be supplied to all other functions in the package. An additional benefit is that for custom data formats (e.g. pre-existing Seurat objects or non-10x single-cell repertoire data), users can adapt the necessary column names and still use the downstream functions of Platypus.
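To illustrate what adapting column names can look like, the sketch below renames a small custom repertoire table to column names that appear in the VGM VDJ output shown in section 2.2 (barcode, sample_id, VDJ_cdr3s_aa, VJ_cdr3s_aa, VDJ_vgene, VJ_vgene). The custom column names (cell_id, heavy_cdr3, …), the mapping and the helper rename_to_vgm() are hypothetical, and the values are borrowed from the example output. Which columns a given downstream function actually requires should be checked in that function's documentation.

```r
# Hypothetical custom repertoire table (not 10x output); column names are made up
custom_vdj <- data.frame(
  cell_id    = c("s1_AAACGGGGTTTAGGAA", "s1_AAGGCAGTCTCTGCTG"),
  sample     = c("s1", "s1"),
  heavy_cdr3 = c("CARIGYAMDYW", "CARDFTTVVARGYFDVW"),
  light_cdr3 = c("CQQGNTLPPTF", "CQQDYSSPWTF"),
  heavy_v    = c("IGHV8-8", "IGHV1-9"),
  light_v    = c("IGKV10-96", "IGKV6-32"),
  stringsAsFactors = FALSE
)

# Map custom names (left) to column names used in the VGM VDJ data.frame (right)
name_map <- c(
  cell_id    = "barcode",
  sample     = "sample_id",
  heavy_cdr3 = "VDJ_cdr3s_aa",
  light_cdr3 = "VJ_cdr3s_aa",
  heavy_v    = "VDJ_vgene",
  light_v    = "VJ_vgene"
)

# Hypothetical helper: rename only the columns listed in the map,
# leaving any other columns unchanged
rename_to_vgm <- function(df, map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

vdj_like <- rename_to_vgm(custom_vdj, name_map)
print(names(vdj_like))
```

Keeping the mapping in a single named vector makes it easy to review and extend when a downstream function expects additional columns.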

For downstream examples of how the VGM interacts with other Platypus functions, please refer to the Platypus Quickstart vignette. This vignette describes how the VGM is created from the output of 10x Genomics cellranger and how this process can be adjusted through the various function arguments.

As the function processes both GEX and VDJ data, this vignette is divided into the following parts:
2. General information
3.1 Gene expression (GEX)
3.2 Feature Barcodes (FB)
4. Immune receptor repertoire (VDJ)

Most examples use the yermanos2021a dataset from PlatypusDB, which contains B and T cells (GEX + VDJ) and, owing to its low number of cells, is also featured in the Quickstart vignette. For more information, please refer to the corresponding publication: https://doi.org/10.1098/rspb.2020.2793

2. General Information

2.1 Input formats

The VGM function accepts three different input formats, each covered in its respective section:
1. Local paths to cellranger output files (covered below).
2. A Data.in list produced by either PlatypusDB_fetch() or PlatypusDB_load_from_disk() (covered in the PlatypusDB vignette). This is intended for users who would like to download raw PlatypusDB datasets and integrate them with local data.
3. A processed Seurat object supplied as Seurat.in (covered in section 3.1.4). This is intended for users who would like to use an existing Seurat object, for example when applying custom normalization and integration methods to GEX data.

Local paths should point to cellranger output directories, which…
…for GEX: correspond to the “outs” folder from the cellranger count function. Under default parameters, the directory supplied as input to the VGM function should contain the filtered_feature_bc_matrix folder from 10x cellranger count.
…for VDJ: correspond to the “outs” folder from the cellranger vdj function. Under default parameters, the directory supplied as input to the VGM function should contain files such as clonotypes.csv from 10x cellranger vdj.
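Before a long VGM run, it can help to confirm that each path actually looks like a cellranger “outs” folder. The sketch below is a hypothetical base R helper, check_outs_dirs(), that is not part of Platypus. It assumes the standard cellranger names filtered_feature_bc_matrix (GEX) and clonotypes.csv (VDJ); adjust the expected entry if your setup differs. For demonstration it builds a mock folder layout in a temporary directory.

```r
# Hypothetical helper (not part of Platypus): TRUE for each path that
# contains the expected file or folder
check_outs_dirs <- function(paths, expected) {
  paths <- unlist(paths)
  ok <- file.exists(file.path(paths, expected))
  names(ok) <- paths
  ok
}

# Build a throwaway mock layout in a temporary directory
root    <- file.path(tempdir(), "cellranger_demo")
gex_dir <- file.path(root, "GEX_S1", "outs")
vdj_dir <- file.path(root, "VDJ_S1", "outs")
dir.create(file.path(gex_dir, "filtered_feature_bc_matrix"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(vdj_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(vdj_dir, "clonotypes.csv")))

# Check the mock directories; stop early if anything is missing
gex_ok <- check_outs_dirs(list(gex_dir), "filtered_feature_bc_matrix")
vdj_ok <- check_outs_dirs(list(vdj_dir), "clonotypes.csv")
if (!all(gex_ok, vdj_ok)) stop("Some directories lack the expected cellranger output")
```

With real data, the same helper can be applied directly to GEX.out.directory.list and VDJ.out.directory.list.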

Below is an example of a basic run. Users will need to replace the input directories with the paths to their own local cellranger output.

#Creating a list with local paths to cellranger directories

VDJ.out.directory.list <- list(
  "C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.3m.Bcell.S1",
  "C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.12m.Bcell.S2")
GEX.out.directory.list <- list(
  "C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.3m.Bcell.S1",
  "C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.12m.Bcell.S2")
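With many samples, typing each path by hand is error-prone. The following sketch builds the same two lists from a base directory and a vector of sample names using base R's file.path(). It assumes that the VDJ and GEX folders share sample names, and it keeps both lists in the same sample order, as in the example above. The variables base_dir and samples are illustrative and should be replaced with your own values.

```r
# Illustrative base directory and sample names; replace with your own
base_dir <- "C:/Users/PlatypusDB"
samples  <- c("Aged.CNS.pool.3m.Bcell.S1", "Aged.CNS.pool.12m.Bcell.S2")

# file.path() is vectorised over samples; as.list() yields one entry per sample,
# in the same order for VDJ and GEX
VDJ.out.directory.list <- as.list(file.path(base_dir, "yermanos2021b__VDJ_RAW", samples))
GEX.out.directory.list <- as.list(file.path(base_dir, "yermanos2021b__GEX_RAW", samples))
```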


#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      verbose = TRUE) #For more detailed runtime messages
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

The function can also be run with only GEX or only VDJ input. The output object has the same format as when both GEX and VDJ folders are supplied, which ensures compatibility with all downstream functions. To do this, provide only the desired input:

#Only VDJ run
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list) 

#Only GEX run
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list) 
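When the same script should handle VDJ-only, GEX-only and combined runs, one option is to collect only the inputs that are available and pass them with do.call(). The helper build_vgm_args() below is hypothetical and not part of Platypus; it only uses the argument names VDJ.out.directory.list and GEX.out.directory.list from the calls above, and forwards any further arguments (such as verbose) unchanged.

```r
# Hypothetical helper (not part of Platypus): build an argument list that
# contains only the inputs that were supplied
build_vgm_args <- function(VDJ.dirs = NULL, GEX.dirs = NULL, ...) {
  if (is.null(VDJ.dirs) && is.null(GEX.dirs)) {
    stop("Supply at least one of VDJ.dirs or GEX.dirs")
  }
  args <- list(...)
  if (!is.null(VDJ.dirs)) args$VDJ.out.directory.list <- VDJ.dirs
  if (!is.null(GEX.dirs)) args$GEX.out.directory.list <- GEX.dirs
  args
}

vdj_only_args <- build_vgm_args(VDJ.dirs = list("path/to/VDJ/outs"))
print(names(vdj_only_args))

# With Platypus loaded, the run would then be:
# vgm <- do.call(VDJ_GEX_matrix, vdj_only_args)
```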

2.2 Output format

Regardless of the input, the output format stays the same. This ensures compatibility with all downstream Platypus functions as well as with custom functions.

names(vgm)
## [1] "VDJ"            "GEX"            "VDJ.GEX.stats"  "Running params"
## [5] "sessionInfo"

The VGM output is a list of five elements, as seen above. Certain downstream functions may append further list elements but will always keep the first five.
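Because the first five elements are always kept, a custom function can cheaply check that it received a VGM-like object before doing any work. The sketch below is a hypothetical base R helper, is_vgm_like(), that compares the leading element names against those printed by names(vgm) above. The mock object uses placeholders in place of the real contents.

```r
# Hypothetical helper: does x start with the five core VGM elements, in order?
is_vgm_like <- function(x) {
  core <- c("VDJ", "GEX", "VDJ.GEX.stats", "Running params", "sessionInfo")
  is.list(x) && length(x) >= length(core) &&
    identical(names(x)[seq_along(core)], core)
}

# Mock object; placeholders stand in for the real contents of each element
mock_vgm <- list(
  VDJ              = data.frame(barcode = "s1_AAACGGGGTTTAGGAA", sample_id = "s1"),
  GEX              = "placeholder for the GEX object",
  VDJ.GEX.stats    = "placeholder",
  `Running params` = "placeholder",
  sessionInfo      = "placeholder"
)

is_vgm_like(mock_vgm)                            # TRUE
is_vgm_like(c(mock_vgm, list(extra = "added")))  # TRUE: extra elements are appended
```

Checking by position rather than only by name mirrors the statement that downstream functions add elements after the first five.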

VDJ is a data.frame with a standard set of columns:

head(vgm[[1]])
##               barcode sample_id group_id clonotype_id_10x clonotype_id
## 1 s1_AAACGGGGTTTAGGAA        s1        1       clonotype7   clonotype7
## 2 s1_AAGGCAGTCTCTGCTG        s1        1      clonotype58  clonotype58
## 3 s1_ACAGCTACAGTCGTGC        s1        1      clonotype65  clonotype65
## 4 s1_ACTTGTTGTACTTAGC        s1        1      clonotype43  clonotype43
## 5 s1_AGAGCTTGTCATGCAT        s1        1      clonotype33  clonotype33
## 6 s1_AGAGTGGAGGAGTCTG        s1        1      clonotype49  clonotype49
##   clonotype_frequency celltype Nr_of_VDJ_chains Nr_of_VJ_chains
## 1                   1   B cell                1               1
## 2                   1   B cell                1               1
## 3                   1   B cell                1               1
## 4                   1   B cell                1               1
## 5                   1   B cell                1               1
## 6                   1   B cell                1               1
##         VDJ_cdr3s_aa  VJ_cdr3s_aa
## 1        CARIGYAMDYW  CQQGNTLPPTF
## 2  CARDFTTVVARGYFDVW  CQQDYSSPWTF
## 3 CARGITTVVAYYYAMDYW   CLQYDNLYTF
## 4     CTTGPYDYYAMDYW  CQQHYSTPYTF
## 5      CARPHDYDGVDYW CSQSTHVPPWTF
## 6    CARRYYSNYAWFAYW  CQQWSSYPYTF
##                                             VDJ_cdr3s_nt
## 1                      TGTGCTCGAATAGGATATGCTATGGACTACTGG
## 2    TGTGCAAGAGACTTTACTACGGTAGTAGCCCGGGGGTACTTCGATGTCTGG
## 3 TGTGCAAGAGGGATTACTACGGTAGTAGCTTATTACTATGCTATGGACTACTGG
## 4             TGTACTACGGGCCCTTATGATTACTATGCTATGGACTACTGG
## 5                TGTGCAAGGCCCCATGATTACGACGGAGTTGACTACTGG
## 6          TGTGCAAGACGCTACTATAGTAACTACGCCTGGTTTGCTTACTGG
##                            VJ_cdr3s_nt VDJ_umis VJ_umis
## 1    TGCCAACAGGGTAATACGCTTCCTCCGACGTTC        2       8
## 2    TGTCAGCAGGATTATAGCTCTCCGTGGACGTTC        1      11
## 3       TGTCTACAGTATGATAATCTGTACACGTTC       51     163
## 4    TGTCAGCAACATTATAGCACTCCGTACACGTTC        2       9
## 5 TGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTC       20      58
## 6    TGCCAGCAGTGGAGTAGTTACCCGTACACGTTC        7       8
##              VDJ_chain_contig             VJ_chain_contig VDJ_chain VJ_chain
## 1 AAACGGGGTTTAGGAA-1_contig_2 AAACGGGGTTTAGGAA-1_contig_1       IGH      IGK
## 2 AAGGCAGTCTCTGCTG-1_contig_2 AAGGCAGTCTCTGCTG-1_contig_1       IGH      IGK
## 3 ACAGCTACAGTCGTGC-1_contig_1 ACAGCTACAGTCGTGC-1_contig_2       IGH      IGK
## 4 ACTTGTTGTACTTAGC-1_contig_1 ACTTGTTGTACTTAGC-1_contig_2       IGH      IGK
## 5 AGAGCTTGTCATGCAT-1_contig_2 AGAGCTTGTCATGCAT-1_contig_1       IGH      IGK
## 6 AGAGTGGAGGAGTCTG-1_contig_2 AGAGTGGAGGAGTCTG-1_contig_1       IGH      IGK
##   VDJ_vgene  VJ_vgene VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene
## 1   IGHV8-8 IGKV10-96               IGHJ4    IGKJ1      IGHM     IGKC
## 2   IGHV1-9  IGKV6-32               IGHJ1    IGKJ1      IGHD     IGKC
## 3  IGHV1-26 IGKV19-93               IGHJ4    IGKJ2      IGHD     IGKC
## 4  IGHV14-1  IGKV8-24               IGHJ4    IGKJ2      IGHD     IGKC
## 5  IGHV5-17 IGKV1-110               IGHJ2    IGKJ1      IGHM     IGKC
## 6   IGHV1-9  IGKV4-55   IGHD2-5     IGHJ3    IGKJ2      IGHM     IGKC
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 VDJ_sequence_nt_raw
## 1                                                                                                                                                      TGGGAAGTGTGCAGCCATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## 2                                                                                                           TTGGGGAGTGTCCTCTCCAAAGTCCTTGAACATAGACTCTAACCATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGAGACTTTACTACGGTAGTAGCCCGGGGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 3                                                                                              GGGAACATATGTACAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGGGATTACTACGGTAGTAGCTTATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 4 GGGACTCAACTTCCTTCTTCTCCAGCCAGAATGTCCTTATGTAAGAAAGATCCTGTATGCAAATCATGTGAGACTGTGATGATTAATATAGGGATATCCACACCAAACATCATATGAGCCCTGTCTTCTCTACAGCCACTGAATCTCAAGATCCTTACAATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGATACTGAATATGCCCCGAAGTTCCAGGGCAAGGCCACTATGACTGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTACGGGCCCTTATGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 5                                                                                                             TGGATTCCCAGGTCCTCACATTCAGTGATCAGCACTGAACACAGACCACTCACCATGGACTCCAGGCTCAATTTAGTTTTCCTTGTCCTTATTTTAAAAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTGAAGCCTGGAGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTGACTATGGAATGCACTGGGTTCGTCAGGCTCCAGAGAAGGGGCTGGAGTGGGTTGCATACATTAGTAGTGGCAGTAGTACCATCTACTATGCAGACACAGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACACCCTGTTCCTGCAAATGACCAGTCTGAGGTCTGAGGACACGGCCATGTATTACTGTGCAAGGCCCCATGATTACGACGGAGTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## 6                                                                                                       TGGGGAGCATATGATCAGTGTCCTCTCCAAAGTCCTTGAACATAGACTCTAACCATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACGCTACTATAGTAACTACGCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        VJ_sequence_nt_raw
## 1                                                                                 TTGGGGCATTGTAATTGAAGTCAAGACTCAGCCTGGACATGATGTCCTCTGCTCAGTTCCTTGGTCTCCTGTTGCTCTGTTTTCAAGGTACCAGATGTGATATCCAGATGACACAGACTACATCCTCCCTGTCTGCCTCTCTGGGAGACAGAGTCACCATCAGTTGCAGGGCAAGTCAGGACATTAGCAATTATTTAAACTGGTATCAGCAGAAACCAGATGGAACTGTTAAACTCCTGATCTACTACACATCAAGATTACACTCAGGAGTCCCATCAAGGTTCAGTGGCAGTGGGTCTGGAACAGATTATTCTCTCACCATTAGCAACCTGGAGCAAGAAGATATTGCCACTTACTTTTGCCAACAGGGTAATACGCTTCCTCCGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 2                                                                                                    GGCAGGCAAGGGCATCAAGATGAAGTCACAGACCCAGGTCTTCGTATTTCTACTGCTCTGTGTGTCTGGTGCTCATGGGAGTATTGTGATGACCCAGACTCCCAAATTCCTGCTTGTATCAGCAGGAGACAGGGTTACCATAACCTGCAAGGCCAGTCAGAGTGTGAGTAATGATGTAGCTTGGTACCAACAGAAGCCAGGGCAGTCTCCTAAACTGCTGATATACTATGCATCCAATCGCTACACTGGAGTCCCTGATCGCTTCACTGGCAGTGGATATGGGACGGATTTCACTTTCACCATCAGCACTGTGCAGGCTGAAGACCTGGCAGTTTATTTCTGTCAGCAGGATTATAGCTCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 3                                                                                  TTATTTGGGGAGTCATTCTTGGTCAGGAGACGTTGTAGAAATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 4 TTCCGATCTACTTGTTGTACTTAGCCTGTGTTCACATTCTTATTTGGGGAGTGTTGCTGGTGTCCAGGCATGATGAGCATCAGACAGGCTGGGCAGCAAGATGGAATCACAGACCCAGGTCCTCATGTTTCTTCTGCTCTGGGTATCTGGTGCCTGTGCAGACATTGTGATGACACAGTCTCCATCCTCCCTGGCTATGTCAGTAGGACAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGCCTTTTAAATAGTAGCAATCAAAAGAACTATTTGGCCTGGTACCAGCAGAAACCAGGACAGTCTCCTAAACTTCTGGTATACTTTGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCATAGGCAGTGGATCTGGGACAGATTTCACTCTTACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGATTACTTCTGTCAGCAACATTATAGCACTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 5                   AGAGCTTGTCATGCATTTTGGGCCTATTTCTTTTTTGGGGACTGATCAGTCTCCTCAGGCTGTCTCCTCAGGTTGCCTCCTCAAAATGAAGTTGCCTGTTAGGCTGTTGGTGCTGATGTTCTGGATTCCTGCTTCCAGCAGTGATGTTGTGATGACCCAAACTCCACTCTCCCTGCCTGTCAGTCTTGGAGATCAAGCCTCCATCTCTTGCAGATCTAGTCAGAGCCTTGTACACAGTAATGGAAACACCTATTTACATTGGTACCTGCAGAAGCCAGGCCAGTCTCCAAAGCTCCTGATCTACAAAGTTTCCAACCGATTTTCTGGGGTCCCAGACAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACACTCAAGATCAGCAGAGTGGAGGCTGAGGATCTGGGAGTTTATTTCTGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 6                                                             TTGGGGACTTATGAGAATAGTAGTAATTAGCTAGGGACCAAAGTTCAAAGACAAAATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATACTGTCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGTACTGGTACCAGCAGAAGCCAGGATCCTCCCCCAGACTCCTGATTTATGACACATCCAACCTGGCTTCTGGAGTCCCTGTTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCCGAATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTTACCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
##   VDJ_sequence_nt_trimmed VJ_sequence_nt_trimmed VDJ_sequence_aa VJ_sequence_aa
## 1                                                                              
## 2                                                                              
## 3                                                                              
## 4                                                                              
## 5                                                                              
## 6                                                                              
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 VDJ_raw_ref
## 1                                          ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGATACTGAATATGCCCCGAAGTTCCAGGGCAAGGCCACTATGACTGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTACAATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 5 TGGATTCCCAGGTCCTCACATTCAGTGATCAGCACTGAACACAGACCACTCACCATGGACTCCAGGCTCAATTTAGTTTTCCTTGTCCTTATTTTAAAAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTGAAGCCTGGAGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTGACTATGGAATGCACTGGGTTCGTCAGGCTCCAGAGAAGGGGCTGGAGTGGGTTGCATACATTAGTAGTGGCAGTAGTACCATCTACTATGCAGACACAGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACACCCTGTTCCTGCAAATGACCAGTCTGAGGTCTGAGGACACGGCCATGTATTACTGTGCAAGGACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## 6                                      ATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACCTACTATAGTAACTACCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             VJ_raw_ref
## 1                    ATGATGTCCTCTGCTCAGTTCCTTGGTCTCCTGTTGCTCTGTTTTCAAGGTACCAGATGTGATATCCAGATGACACAGACTACATCCTCCCTGTCTGCCTCTCTGGGAGACAGAGTCACCATCAGTTGCAGGGCAAGTCAGGACATTAGCAATTATTTAAACTGGTATCAGCAGAAACCAGATGGAACTGTTAAACTCCTGATCTACTACACATCAAGATTACACTCAGGAGTCCCATCAAGGTTCAGTGGCAGTGGGTCTGGAACAGATTATTCTCTCACCATTAGCAACCTGGAGCAAGAAGATATTGCCACTTACTTTTGCCAACAGGGTAATACGCTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 2 GGCAGGCAAGGGCATCAAGATGAAGTCACAGACCCAGGTCTTCGTATTTCTACTGCTCTGTGTGTCTGGTGCTCATGGGAGTATTGTGATGACCCAGACTCCCAAATTCCTGCTTGTATCAGCAGGAGACAGGGTTACCATAACCTGCAAGGCCAGTCAGAGTGTGAGTAATGATGTAGCTTGGTACCAACAGAAGCCAGGGCAGTCTCCTAAACTGCTGATATACTATGCATCCAATCGCTACACTGGAGTCCCTGATCGCTTCACTGGCAGTGGATATGGGACGGATTTCACTTTCACCATCAGCACTGTGCAGGCTGAAGACCTGGCAGTTTATTTCTGTCAGCAGGATTATAGCTCTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 3                   ATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTTCTACCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 4 ATGGAATCACAGACCCAGGTCCTCATGTTTCTTCTGCTCTGGGTATCTGGTGCCTGTGCAGACATTGTGATGACACAGTCTCCATCCTCCCTGGCTATGTCAGTAGGACAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGCCTTTTAAATAGTAGCAATCAAAAGAACTATTTGGCCTGGTACCAGCAGAAACCAGGACAGTCTCCTAAACTTCTGGTATACTTTGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCATAGGCAGTGGATCTGGGACAGATTTCACTCTTACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGATTACTTCTGTCAGCAACATTATAGCACTCCTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 5        ATGAAGTTGCCTGTTAGGCTGTTGGTGCTGATGTTCTGGATTCCTGCTTCCAGCAGTGATGTTGTGATGACCCAAACTCCACTCTCCCTGCCTGTCAGTCTTGGAGATCAAGCCTCCATCTCTTGCAGATCTAGTCAGAGCCTTGTACACAGTAATGGAAACACCTATTTACATTGGTACCTGCAGAAGCCAGGCCAGTCTCCAAAGCTCCTGATCTACAAAGTTTCCAACCGATTTTCTGGGGTCCCAGACAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACACTCAAGATCAGCAGAGTGGAGGCTGAGGATCTGGGAGTTTATTTCTGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 6              ATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATACTGTCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGTACTGGTACCAGCAGAAGCCAGGATCCTCCCCCAGACTCCTGATTTATGACACATCCAACCTGGCTTCTGGAGTCCCTGTTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCCGAATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTTACCCACCCATGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
##   VDJ_trimmed_ref VJ_trimmed_ref     VDJ_raw_consensus_id
## 1                                 clonotype7_concat_ref_1
## 2                                clonotype58_concat_ref_1
## 3                                clonotype65_concat_ref_1
## 4                                clonotype43_concat_ref_1
## 5                                clonotype33_concat_ref_1
## 6                                clonotype49_concat_ref_1
##        VJ_raw_consensus_id     orig_barcode specifity affinity GEX_available
## 1  clonotype7_concat_ref_2 AAACGGGGTTTAGGAA        NA       NA         FALSE
## 2 clonotype58_concat_ref_2 AAGGCAGTCTCTGCTG        NA       NA          TRUE
## 3 clonotype65_concat_ref_2 ACAGCTACAGTCGTGC        NA       NA          TRUE
## 4 clonotype43_concat_ref_2 ACTTGTTGTACTTAGC        NA       NA          TRUE
## 5 clonotype33_concat_ref_2 AGAGCTTGTCATGCAT        NA       NA          TRUE
## 6 clonotype49_concat_ref_2 AGAGTGGAGGAGTCTG        NA       NA          TRUE
##      orig.ident seurat_clusters      PC_1      PC_2    UMAP_1    UMAP_2
## 1          <NA>            <NA>        NA        NA        NA        NA
## 2 SeuratProject               5 -6.257892 -2.522445 -4.813108 0.6205567
## 3 SeuratProject               0 -6.375512 -5.164284 -6.385532 1.9866081
## 4 SeuratProject               0 -7.476580 -4.585612 -7.730432 0.4828904
## 5 SeuratProject               0 -6.710098 -6.034031 -6.045865 3.3426221
## 6 SeuratProject               0 -6.667460 -2.615907 -5.995199 0.3335993
##     tSNE_1     tSNE_2     batches
## 1       NA         NA Unspecified
## 2 2.188638  -3.411929 Unspecified
## 3 6.511564  -7.350143 Unspecified
## 4 2.640636 -11.282818 Unspecified
## 5 9.943012  -6.504101 Unspecified
## 6 1.924303  -6.548645 Unspecified

The GEX object vgm[[2]] is a Seurat object. Metadata can be accessed as shown below. Depending on integration parameters (Section 2.3), the GEX object can also contain information from the immune receptor VDJ data.

Seurat::DimPlot(vgm[[2]])

names(vgm[[2]]@meta.data)
##  [1] "orig.ident"              "nCount_RNA"             
##  [3] "nFeature_RNA"            "orig_barcode"           
##  [5] "VDJ_available"           "sample_id"              
##  [7] "group_id"                "percent.mt"             
##  [9] "RNA_snn_res.0.5"         "seurat_clusters"        
## [11] "clonotype_id_10x"        "clonotype_id"           
## [13] "clonotype_frequency"     "celltype"               
## [15] "Nr_of_VDJ_chains"        "Nr_of_VJ_chains"        
## [17] "VDJ_cdr3s_aa"            "VJ_cdr3s_aa"            
## [19] "VDJ_cdr3s_nt"            "VJ_cdr3s_nt"            
## [21] "VDJ_umis"                "VJ_umis"                
## [23] "VDJ_chain_contig"        "VJ_chain_contig"        
## [25] "VDJ_chain"               "VJ_chain"               
## [27] "VDJ_vgene"               "VJ_vgene"               
## [29] "VDJ_dgene"               "VDJ_jgene"              
## [31] "VJ_jgene"                "VDJ_cgene"              
## [33] "VJ_cgene"                "VDJ_sequence_nt_raw"    
## [35] "VJ_sequence_nt_raw"      "VDJ_sequence_nt_trimmed"
## [37] "VJ_sequence_nt_trimmed"  "VDJ_sequence_aa"        
## [39] "VJ_sequence_aa"          "VDJ_raw_ref"            
## [41] "VJ_raw_ref"              "VDJ_trimmed_ref"        
## [43] "VJ_trimmed_ref"          "VDJ_raw_consensus_id"   
## [45] "VJ_raw_consensus_id"     "specifity"              
## [47] "affinity"                "PC_1"                   
## [49] "PC_2"                    "UMAP_1"                 
## [51] "UMAP_2"                  "tSNE_1"                 
## [53] "tSNE_2"                  "batches"

The third element, vgm[[3]] (VDJ.GEX.stats), is a table of statistics about the processed datasets, which is useful for quality control. Many of its values are imported from the metrics summary tables (metrics_summary.csv) provided by Cellranger. If these tables are not available, the corresponding entries are NA.

The generation of this table can be turned off by setting get.VDJ.stats = F.

names(vgm[[3]])
##  [1] "Repertoir path"                                    
##  [2] "Sample name"                                       
##  [3] "Nr unique barcodes"                                
##  [4] "Nr barcodes is_cell"                               
##  [5] "Nr cells 1VDJ 1VJ"                                 
##  [6] "Nr cells 1VDJ 0VJ"                                 
##  [7] "Nr cells 0VDJ 1VJ"                                 
##  [8] "Nr cells 2 or more VDJ 1VJ"                        
##  [9] "Nr cells 1VDJ 2 or more VJ"                        
## [10] "Nr cells 2 or more VDJ 2 or more VJ"               
## [11] "Nr cells full_length"                              
## [12] "Nr cells productive"                               
## [13] "Nr cells high_confidence"                          
## [14] "Nr cells all true"                                 
## [15] "Nr cells all true and 1VDJ 1VJ"                    
## [16] "Nr clonotypes"                                     
## [17] "Nr clonotypes 1VDJ 1VJ"                            
## [18] "Nr clonotypes < 1VDJ 1VJ"                          
## [19] "Nr clonotypes > 1VDJ 1VJ"                          
## [20] "% Nr unique barcodes"                              
## [21] "% Nr barcodes is_cell"                             
## [22] "% Nr cells 1VDJ 1VJ"                               
## [23] "% Nr cells 1VDJ 0VJ"                               
## [24] "% Nr cells 0VDJ 1VJ"                               
## [25] "% Nr cells 2 or more VDJ 1VJ"                      
## [26] "% Nr cells 1VDJ 2 or more VJ"                      
## [27] "% Nr cells 2 or more VDJ 2 or more VJ"             
## [28] "% Nr cells full_length"                            
## [29] "% Nr cells productive"                             
## [30] "% Nr cells high_confidence"                        
## [31] "% Nr cells all true"                               
## [32] "% Nr cells all true and 1VDJ 1VJ"                  
## [33] "% Nr clonotypes"                                   
## [34] "% Nr clonotypes 1VDJ 1VJ"                          
## [35] "% Nr clonotypes < 1VDJ 1VJ"                        
## [36] "% Nr clonotypes > 1VDJ 1VJ"                        
## [37] "Estimated.Number.of.Cells"                         
## [38] "Mean.Read.Pairs.per.Cell"                          
## [39] "Number.of.Cells.With.Productive.V.J.Spanning.Pair" 
## [40] "Number.of.Read.Pairs"                              
## [41] "Valid.Barcodes"                                    
## [42] "Q30.Bases.in.Barcode"                              
## [43] "Q30.Bases.in.RNA.Read.1"                           
## [44] "Q30.Bases.in.RNA.Read.2"                           
## [45] "Q30.Bases.in.UMI"                                  
## [46] "Reads.Mapped.to.Any.V.D.J.Gene"                    
## [47] "Reads.Mapped.to.IGH"                               
## [48] "Reads.Mapped.to.IGK"                               
## [49] "Reads.Mapped.to.IGL"                               
## [50] "Mean.Used.Read.Pairs.per.Cell"                     
## [51] "Fraction.Reads.in.Cells"                           
## [52] "Median.IGH.UMIs.per.Cell"                          
## [53] "Median.IGK.UMIs.per.Cell"                          
## [54] "Median.IGL.UMIs.per.Cell"                          
## [55] "Cells.With.Productive.V.J.Spanning.Pair"           
## [56] "Cells.With.Productive.V.J.Spanning..IGK..IGH..Pair"
## [57] "Cells.With.Productive.V.J.Spanning..IGL..IGH..Pair"
## [58] "Paired.Clonotype.Diversity"                        
## [59] "Cells.With.IGH.Contig"                             
## [60] "Cells.With.IGK.Contig"                             
## [61] "Cells.With.IGL.Contig"                             
## [62] "Cells.With.CDR3.annotated.IGH.Contig"              
## [63] "Cells.With.CDR3.annotated.IGK.Contig"              
## [64] "Cells.With.CDR3.annotated.IGL.Contig"              
## [65] "Cells.With.V.J.Spanning.IGH.Contig"                
## [66] "Cells.With.V.J.Spanning.IGK.Contig"                
## [67] "Cells.With.V.J.Spanning.IGL.Contig"                
## [68] "Cells.With.Productive.IGH.Contig"                  
## [69] "Cells.With.Productive.IGK.Contig"                  
## [70] "Cells.With.Productive.IGL.Contig"                  
## [71] "rep_id"
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      get.VDJ.stats = F) #Turn off VDJ stats 
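The chain-count categories in this table (e.g. "Nr cells 1VDJ 1VJ") summarise how many VDJ and VJ chains were recovered per cell. The base R sketch below produces a tally of that kind from the Nr_of_VDJ_chains and Nr_of_VJ_chains columns. The toy data frame is hypothetical. Using all cells in the table as the denominator for the percentages is an assumption and is not necessarily how VDJ_GEX_matrix computes them.

```r
# Toy per-cell chain counts (hypothetical); a real VGM stores these columns in vgm[[1]]
cells <- data.frame(
  Nr_of_VDJ_chains = c(1, 1, 0, 2, 1, 1),
  Nr_of_VJ_chains  = c(1, 0, 1, 1, 2, 1)
)

# Label each cell with a category in the style of the stats table
vdj_label <- ifelse(cells$Nr_of_VDJ_chains >= 2, "2 or more VDJ",
                    paste0(cells$Nr_of_VDJ_chains, "VDJ"))
vj_label <- ifelse(cells$Nr_of_VJ_chains >= 2, "2 or more VJ",
                   paste0(cells$Nr_of_VJ_chains, "VJ"))
category <- paste(vdj_label, vj_label)

# Absolute counts and percentages (assumption: relative to all cells in the table)
counts <- table(category)
percent <- round(100 * counts / nrow(cells), 1)
print(counts)
print(percent)
```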

The VGM also stores the parameters used in the function call, in case a user saves a VGM but deletes or overwrites the initial code... although this should not happen... right?

When the VGM is created with default parameters, the function input arguments are stored in the fourth list element.

vgm[[4]]
##                                                                                                                                sample.path.vdj 
## "C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.3m.Bcell.S1 ; C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.12m.Bcell.S2" 
##                                                                                                                              samples.paths.GEX 
## "C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.3m.Bcell.S1 ; C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.12m.Bcell.S2" 
##                                                                                                                          FB.out.directory.list 
##                                                                                                                                         "none" 
##                                                                                                                                    GEX.read.h5 
##                                                                                                                                        "FALSE" 
##                                                                                                                                    VDJ.combine 
##                                                                                                                                         "TRUE" 
##                                                                                                                                  GEX.integrate 
##                                                                                                                                         "TRUE" 
##                                                                                                                           integrate.GEX.to.VDJ 
##                                                                                                                                         "TRUE" 
##                                                                                                                           integrate.VDJ.to.GEX 
##                                                                                                                                         "TRUE" 
##                                                                                                                         exclude.GEX.not.in.VDJ 
##                                                                                                                                        "FALSE" 
##                                                                                                                filter.overlapping.barcodes.GEX 
##                                                                                                                                         "TRUE" 
##                                                                                                                filter.overlapping.barcodes.VDJ 
##                                                                                                                                         "TRUE" 
##                                                                                                                  exclude.on.cell.state.markers 
##                                                                                                                                         "none" 
##                                                                                                exclude.on.barcodes (TRUE if barcodes provided) 
##                                                                                                                                         "TRUE" 
##                                                                                                                                  get.VDJ.stats 
##                                                                                                                                         "TRUE" 
##                                                                                                                                       numcores 
##                                                                                                                                            "1" 
##                                                                                                                                 trim.and.align 
##                                                                                                                                        "FALSE" 
##                                                                                                                           append.raw.reference 
##                                                                                                                                         "TRUE" 
##                                                                                                              select.excess.chains.by.umi.count 
##                                                                                                                                        "FALSE" 
##                                                                                                        excess.chain.confidence.count.threshold 
##                                                                                                                                         "1000" 
##                                                                                                                              gap.opening.cost, 
##                                                                                                                                           "10" 
##                                                                                                                             gap.extension.cost 
##                                                                                                                                            "4" 
##                                                                                                                            parallel.processing 
##                                                                                                                                         "none" 
##                                                                                                                             integration.method 
##                                                                                                                                   "scale.data" 
##                                                                                                                                VDJ.gene.filter 
##                                                                                                                                         "TRUE" 
##                                                                                                                                    mito.filter 
##                                                                                                                                           "20" 
##                                                                                                                              norm.scale.factor 
##                                                                                                                                        "10000" 
##                                                                                                                                  n.feature.rna 
##                                                                                                                                            "0" 
##                                                                                                                                n.count.rna.min 
##                                                                                                                                            "0" 
##                                                                                                                                n.count.rna.max 
##                                                                                                                                          "Inf" 
##                                                                                                                            n.variable.features 
##                                                                                                                                         "2000" 
##                                                                                                                             cluster.resolution 
##                                                                                                                                          "0.5" 
##                                                                                                                                   neighbor.dim 
##                                                                                                                         "1;2;3;4;5;6;7;8;9;10" 
##                                                                                                                                        mds.dim 
##                                                                                                                         "1;2;3;4;5;6;7;8;9;10" 
##                                                                                                                             subsample.barcodes 
##                                                                                                                                        "FALSE" 
##                                                                                                                                       group.id 
##                                                                                                                                          "1;2" 
##                                                                                                                             FB.count.threshold 
##                                                                                                                                           "10" 
##                                                                                                                             FB.ratio.threshold 
##                                                                                                                                            "2"
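Judging from the printed output above, vgm[[4]] is a named character vector. Multi-valued entries are joined with ";" (for example neighbor.dim), and sample paths are joined with " ; ". The base R sketch below assumes that structure and converts a few entries back to usable R values. The `params` vector is a hypothetical excerpt used as a stand-in for vgm[[4]].

```r
# Stand-in for vgm[[4]] (hypothetical excerpt); in practice use: params <- vgm[[4]]
params <- c(
  samples.paths.GEX  = "path/to/GEX/S1 ; path/to/GEX/S2",
  GEX.integrate      = "TRUE",
  mito.filter        = "20",
  cluster.resolution = "0.5",
  neighbor.dim       = "1;2;3;4;5;6;7;8;9;10"
)

# All values are stored as character strings; convert them back before reuse
gex_paths          <- trimws(strsplit(params[["samples.paths.GEX"]], ";")[[1]])
gex_integrate      <- as.logical(params[["GEX.integrate"]])
mito_filter        <- as.numeric(params[["mito.filter"]])
cluster_resolution <- as.numeric(params[["cluster.resolution"]])
neighbor_dim       <- as.integer(strsplit(params[["neighbor.dim"]], ";")[[1]])

str(list(gex_paths = gex_paths, neighbor_dim = neighbor_dim))
```

Values recovered this way can be passed back to VDJ_GEX_matrix (e.g. neighbor.dim = neighbor_dim) to rerun the function with the same settings.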

The fifth element of the VGM contains the utils::sessionInfo() output to record the versions of R and accompanying packages used during the VGM creation.

class(vgm[[5]])
## [1] "sessionInfo"
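The stored sessionInfo object can be queried like any other. Below is a minimal base R sketch that uses the current session as a stand-in for vgm[[5]]. The output file name is hypothetical.

```r
# Stand-in for vgm[[5]]; with a real VGM use: si <- vgm[[5]]
si <- utils::sessionInfo()

# R version that was used
r_version <- si$R.version$version.string
print(r_version)

# Versions of attached non-base packages (empty if none are attached)
pkgs <- si$otherPkgs
pkg_versions <- if (length(pkgs) == 0) character(0) else
  vapply(pkgs, function(pkg) pkg$Version, character(1))
print(pkg_versions)

# Archive the full report next to a saved VGM (hypothetical file name)
out_file <- file.path(tempdir(), "vgm_sessionInfo.txt")
writeLines(capture.output(print(si)), out_file)
```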

2.3 Parameters for VDJ GEX integration

A key feature of Platypus is the direct pairing of VDJ and GEX data. The VGM achieves this by combining the relevant data and metadata from the VDJ (vgm[[1]]) and GEX (vgm[[2]]) objects using the cell barcode and sample_id information.
Several parameters control this integration:

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      VDJ.combine = T, #Whether to combine all samples into one VDJ dataframe (is highly recommended)
                      GEX.integrate = T, #Whether to integrate all GEX samples. For integration methods see GEX section.  
                      integrate.GEX.to.VDJ = T, #Whether to copy GEX metadata into the VDJ dataframe 
                      integrate.VDJ.to.GEX = T, #Whether to copy VDJ data into GEX
                      exclude.GEX.not.in.VDJ = F) #Whether to exclude cells in GEX, for which no VDJ data is available. Set this to TRUE if you only want gene expression information for those cells with immune receptor sequences.
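Conceptually, the integration options amount to matching rows of the two objects by their sample-prefixed barcodes. The base R sketch below illustrates this on two small hypothetical tables. It is not the VDJ_GEX_matrix implementation, and the columns were chosen for illustration only. VDJ cells without GEX data receive NA values and GEX_available = FALSE, as in the first row of the vgm[[1]] output above.

```r
# Hypothetical VDJ table (one row per cell, barcodes already sample-prefixed)
vdj <- data.frame(
  barcode      = c("s1_AAACGGGGTTTAGGAA", "s1_AAGGCAGTCTCTGCTG", "s1_ACAGCTACAGTCGTGC"),
  VDJ_cdr3s_aa = c("CARGYW", "CTRDYW", "CARSGYW")
)
# Hypothetical GEX metadata with barcodes as row names, as in a Seurat object
gex_meta <- data.frame(
  seurat_clusters = c("5", "0", "2"),
  row.names = c("s1_AAGGCAGTCTCTGCTG", "s1_ACAGCTACAGTCGTGC", "s1_AAAGATGAGTCCGGTC")
)

# integrate.GEX.to.VDJ: copy GEX columns into the VDJ table by barcode;
# VDJ cells without a GEX match receive NA
idx <- match(vdj$barcode, rownames(gex_meta))
vdj$GEX_available   <- !is.na(idx)
vdj$seurat_clusters <- gex_meta$seurat_clusters[idx]

# integrate.VDJ.to.GEX: flag GEX cells that have VDJ data
gex_meta$VDJ_available <- rownames(gex_meta) %in% vdj$barcode

# exclude.GEX.not.in.VDJ = TRUE: keep only GEX cells that also have VDJ data
gex_in_vdj <- gex_meta[gex_meta$VDJ_available, , drop = FALSE]
print(vdj)
```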

In some cases, cell barcodes may not be unique across samples. This may occur by pure chance, through barcode hopping during library construction and sequencing, or because of low barcode diversity during capture. The VGM deals with this in two ways. First, a sample ID prefix (e.g. s1_) is added to the front of every barcode.

vgm[[1]]$barcode[1]
## [1] "s1_AAACGGGGTTTAGGAA"
colnames(vgm[[2]])[1]
## [1] "s1_AAAGATGAGTCCGGTC"
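Below is a minimal base R sketch of the prefixing idea. It assumes, as the output above suggests, that prefixes follow the pattern s1_, s2_, ... in input order. The barcodes are hypothetical examples. The original barcode remains recoverable, and the VGM keeps it in the orig_barcode column.

```r
# Hypothetical raw barcodes from two samples; note the collision on the first barcode
raw_barcodes <- list(
  c("AAACGGGGTTTAGGAA", "AAGGCAGTCTCTGCTG"),   # sample 1
  c("AAACGGGGTTTAGGAA", "ACAGCTACAGTCGTGC")    # sample 2
)

# Prefix each barcode with its sample id (s1_, s2_, ...) following the order of the input list
prefixed <- unlist(lapply(seq_along(raw_barcodes), function(i) {
  paste0("s", i, "_", raw_barcodes[[i]])
}))

anyDuplicated(unlist(raw_barcodes)) > 0  # TRUE: raw barcodes collide
anyDuplicated(prefixed) > 0              # FALSE: prefixed barcodes are unique

# Stripping the prefix recovers the original barcodes
orig_barcode <- sub("^s[0-9]+_", "", prefixed)
```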

Second, duplicated barcodes can be filtered out to prevent the emergence of spurious public clones. This is necessary, for example, when a clone appears in two distinct VDJ samples with exactly the same cell barcode. Given the very large barcode space, a public clone with identical cell barcodes is highly unlikely to be genuine. If this filtering is set to TRUE, the function reports the number of filtered cells.
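The filtering logic can be illustrated with a short base R sketch on hypothetical barcodes. The sketch assumes two things: that every copy of a barcode shared between samples is removed (rather than keeping one), and that the report is a simple message. Neither is taken from the VDJ_GEX_matrix source.

```r
# Hypothetical sample-prefixed VDJ barcodes from two samples
barcodes <- c("s1_AAACGGGGTTTAGGAA", "s1_AAGGCAGTCTCTGCTG",
              "s2_AAACGGGGTTTAGGAA", "s2_ACAGCTACAGTCGTGC")

# Remove the sample prefix to compare the original 10x barcodes across samples
orig_barcode <- sub("^s[0-9]+_", "", barcodes)

# Flag every cell whose original barcode occurs more than once
overlapping <- orig_barcode %in% orig_barcode[duplicated(orig_barcode)]
message("Filtered ", sum(overlapping), " cells with overlapping barcodes")

kept <- barcodes[!overlapping]
print(kept)
```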

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      filter.overlapping.barcodes.GEX = T,
                      filter.overlapping.barcodes.VDJ = T) 

3.1 GEX

The VGM function attempts to simplify the standard GEX processing and integration steps common to the Seurat and Harmony packages, given that most immunological studies use this pipeline. We do not claim that all gene expression datasets should be processed with identical parameters, but we find that this function simplifies the standard copy-paste from the Seurat website.

3.1.1 GEX QC filtering parameters

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list, 
                      VDJ.gene.filter = T, #Remove all VDJ genes from GEX before clustering.
                      mito.filter = 20, #Remove all cells with a higher % of reads mapped to mitochondrial genes than this. Computed via Seurat::PercentageFeatureSet(., pattern = "^MT-")
                      n.count.rna.min = 0, #Remove all cells with a total RNA count below this
                      n.count.rna.max = Inf,  #Remove all cells with a total RNA count above this
                      n.feature.rna = 0) #Remove all cells with a gene count lower than this

The default settings are meant to be inclusive, so as not to impose filtering on the dataset that is not directly apparent to the user.
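The mitochondrial filter and the count-based filters can be illustrated in base R. This sketch computes per-cell metrics from a hypothetical count matrix and then applies thresholds set to the defaults above. Whether the VGM cut-offs are inclusive or exclusive at the boundary is an assumption here. The real function computes percent.mt with Seurat::PercentageFeatureSet, as noted in the code comment above.

```r
# Toy raw count matrix: genes in rows, cells in columns (hypothetical values)
counts <- matrix(c(10, 0, 1, 1,    # cell c1
                    2, 0, 5, 3,    # cell c2
                    0, 7, 0, 1),   # cell c3
                 nrow = 4,
                 dimnames = list(c("CD19", "CD3E", "MT-CO1", "MT-ND1"),
                                 c("c1", "c2", "c3")))

# Per-cell QC metrics, analogous to nCount_RNA, nFeature_RNA and percent.mt in Seurat
nCount_RNA   <- colSums(counts)
nFeature_RNA <- colSums(counts > 0)
mt_genes     <- grepl("^MT-", rownames(counts))
percent.mt   <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / nCount_RNA

# Thresholds set to the VGM defaults shown above
mito.filter     <- 20
n.count.rna.min <- 0
n.count.rna.max <- Inf
n.feature.rna   <- 0

# Keep cells passing every filter (boundary handling is an assumption)
keep <- percent.mt <= mito.filter &
  nCount_RNA >= n.count.rna.min & nCount_RNA <= n.count.rna.max &
  nFeature_RNA >= n.feature.rna
kept_cells <- colnames(counts)[keep]
print(round(percent.mt, 1))
print(kept_cells)
```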

3.1.2 GEX filtering by phenotype

The VGM also offers the option to filter cells based on their gene expression profiles. Removing unwanted cells before clustering can lead to more accurate conclusions about the cell types of interest. For example, when integrating and analyzing repertoire features such as clonal expansion, it is likely better to filter out cells that are neither B nor T cells.

The input format for the exclude.on.cell.state.markers argument is the same as for the GEX_phenotype function. In the example below, we filter out all CD14-positive cells as well as all cells that are double negative for CD3 epsilon and CD3 gamma (CD3E and CD3G).

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list, 
                      exclude.on.cell.state.markers = c("CD14+", "CD3E-;CD3G-")) #Exclude CD14+ cells and CD3E/CD3G double-negative cells
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -2.091
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 0.49627
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 2.8445e-015
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 0.090619
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 118
## Number of edges: 3533
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6521
## Number of communities: 3
## Elapsed time: 0 seconds
#Plotting this confirms the filtering 
Seurat::FeaturePlot(vgm[[2]], c("CD14","CD3E","CD3G"))
## Warning in Seurat::FeaturePlot(vgm[[2]], c("CD14", "CD3E", "CD3G")): All cells
## have the same value (0) of CD14.
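To make the marker syntax concrete, here is a hypothetical base R sketch of how such strings can be evaluated. It rests on three assumptions. First, "+" means expression above 0 and "-" means expression equal to 0. Second, markers joined by ";" must all hold, which matches the CD3E/CD3G double-negative example. Third, a cell is excluded if it matches any element of the vector. GEX_phenotype may use different thresholds. The helper function and the matrix are illustrative only.

```r
# Hypothetical expression matrix: genes in rows, cells in columns
expr <- matrix(c(0, 1.5, 0, 0,     # CD14
                 2, 0,   0, 0,     # CD3E
                 1, 0,   0, 0.8),  # CD3G
               nrow = 3, byrow = TRUE,
               dimnames = list(c("CD14", "CD3E", "CD3G"), c("c1", "c2", "c3", "c4")))

# TRUE for cells matching every marker in one ";"-separated set (hypothetical helper)
matches_marker_set <- function(expr, marker_set) {
  markers  <- strsplit(marker_set, ";")[[1]]
  genes    <- sub("[+-]$", "", markers)
  positive <- endsWith(markers, "+")
  hits <- rep(TRUE, ncol(expr))
  for (i in seq_along(markers)) {
    expressed <- expr[genes[i], ] > 0
    condition <- if (positive[i]) expressed else !expressed
    hits <- hits & condition
  }
  hits
}

# A cell is excluded if it matches any of the marker sets
marker_sets <- c("CD14+", "CD3E-;CD3G-")
exclude <- Reduce(`|`, lapply(marker_sets, matches_marker_set, expr = expr))
kept <- colnames(expr)[!exclude]
print(kept)
```

Here c2 is removed as CD14+, c3 as CD3E/CD3G double negative, and c4 is kept because it expresses CD3G.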

As a workflow, we recommend first clustering all cells and then running a per-cluster differential gene expression analysis with the GEX_cluster_genes() function, which reveals the defining genes of each cluster. If unwanted cells are present (e.g. neutrophil contamination in a B cell dataset), these cluster signatures can be used to choose the best genes for filtering in the initial VGM call.

3.1.3 GEX integration methods

The VGM offers four methods of GEX dataset integration if GEX.integrate is set to TRUE:

  1. "scale.data" integration is based on Seurat LogNormalize normalization followed by the ScaleData function applied to the variable features (number set via n.variable.features).

  2. "anchors" is an integration method that uses similar cell states across datasets to align them. Extensive documentation is provided here: https://satijalab.org/seurat/articles/integration_introduction.html

  3. "sct" employs the SCTransform function from Seurat.

  4. "harmony" uses the Harmony package. This may be better suited to larger datasets, given the runtime and memory requirements of anchors and sct.
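For the default "scale.data" method, the core arithmetic of LogNormalize and ScaleData can be sketched in base R. This is a simplified illustration on a hypothetical count matrix. It applies norm.scale.factor as in Seurat's LogNormalize: counts are divided by the cell total, multiplied by the scale factor, and transformed with the natural log of 1 plus the value. It then z-scores each gene. It omits variable-feature selection (n.variable.features) and any value clipping that Seurat's ScaleData performs.

```r
# Toy raw count matrix: genes in rows, cells in columns (hypothetical values)
counts <- matrix(c(10, 0, 5,
                    2, 8, 0,
                    0, 4, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("c1", "c2", "c3")))
norm.scale.factor <- 10000  # same default as the VGM argument

# LogNormalize: divide by each cell's total counts, multiply by the scale factor, then log1p
lib_size <- colSums(counts)
normalized <- log1p(sweep(counts, 2, lib_size, "/") * norm.scale.factor)

# ScaleData-style z-scores: centre and scale each gene across cells
# (variable-feature selection and clipping of extreme values are omitted)
scaled <- t(scale(t(normalized)))
print(round(scaled, 2))
```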

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      GEX.integrate = T, 
                      integration.method = "scale.data") #Default

BiocManager::install("harmony")

vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      GEX.integrate = T, 
                      integration.method = "harmony")

3.1.3 GEX clustering optimisation parameters

Independently of the integration method, GEX data is normalized and scaled, and variable features and PCA dimensions are used to calculate low-dimensional embeddings and clusters. These steps are controlled by the following parameters:

#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      GEX.integrate = T, 
                      integration.method = "scale.data", #Default
                      norm.scale.factor = 10000, #passed to Seurat::NormalizeData
                      n.variable.features = 2000, #Passed to Seurat::FindVariableFeatures
                      cluster.resolution = 0.5, #Passed to Seurat::FindClusters
                      neighbor.dim = c(1:10), #Passed to Seurat::FindNeighbors
                      mds.dim = c(1:10)) #Passed to Seurat::RunTSNE and Seurat::RunUMAP

3.1.4 Input a processed Seurat object

The VGM offers simple and flexible GEX processing, but remains one of many options for GEX processing. We therefore made it possible to use the VDJ processing and VDJ-GEX integration capabilities of the VGM function with an already processed Seurat object.

For a Seurat object to be compatible as input, it must contain two metadata columns:

  1. sample_id, containing the sample IDs s1, s2, s3, ..., sn as character or factor class. These must be in the same order as the elements of VDJ.out.directory.list.

  2. group_id, of character class.
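Before passing an object via Seurat.in, the two required columns can be checked. The helper below is a hypothetical base R sketch and is not part of Platypus. It operates on a metadata data frame such as preprocessed_GEX@meta.data. It cannot verify that samples are in the same order as VDJ.out.directory.list, so that remains the user's responsibility.

```r
# Hypothetical helper: returns a character vector of problems (empty if none found)
check_vgm_seurat_metadata <- function(meta, n_vdj_samples) {
  problems <- character(0)
  if (!"sample_id" %in% names(meta)) {
    problems <- c(problems, "missing sample_id column")
  } else {
    if (!(is.character(meta$sample_id) || is.factor(meta$sample_id)))
      problems <- c(problems, "sample_id must be character or factor")
    expected   <- paste0("s", seq_len(n_vdj_samples))
    unexpected <- setdiff(unique(as.character(meta$sample_id)), expected)
    if (length(unexpected) > 0)
      problems <- c(problems, paste("unexpected sample ids:", paste(unexpected, collapse = ", ")))
  }
  if (!"group_id" %in% names(meta)) {
    problems <- c(problems, "missing group_id column")
  } else if (!is.character(meta$group_id)) {
    problems <- c(problems, "group_id must be character")
  }
  problems
}

# Usage with two hypothetical metadata tables
meta_ok  <- data.frame(sample_id = factor(c("s1", "s1", "s2")), group_id = c("1", "1", "2"))
meta_bad <- data.frame(sample_id = c("sample1", "sample2"), group_id = c(1, 2))
print(check_vgm_seurat_metadata(meta_ok, n_vdj_samples = 2))
print(check_vgm_seurat_metadata(meta_bad, n_vdj_samples = 2))
```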

The input Seurat object is not reprocessed with respect to normalisation or dimensional embeddings. Nonetheless, the following filtering and integration options are still available, as shown below:

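#Running the VDJ_GEX_matrix function with a preprocessed Seurat object
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list, #same sample order as sample_id
                      Seurat.in = preprocessed_GEX, 
                      exclude.GEX.not.in.VDJ = F, #keep cells without VDJ information
                      integrate.GEX.to.VDJ = T, 
                      integrate.VDJ.to.GEX = T, 
                      filter.overlapping.barcodes.GEX = T, #remove barcodes present in more than one sample
                      exclude.on.cell.state.markers = c("CD3E+"), #exclude CD3E-expressing cells
                      GEX.integrate = T) 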

3.2.1 Feature Barcode import

Platypus supports feature barcode technology (also referred to as hashing barcodes), but does not currently support CITE-seq data (this is on the Platypus team's to-do list).

Feature Barcode (FB) data may be imported in two different modes, depending on how it was processed with Cellranger.

  1. FB data processed independently of GEX data via Cellranger count yields an output folder structure identical to that of GEX. These output directories can be supplied to the VGM via the FB.out.directory.list argument.

  2. FB data can also be processed together with GEX using Cellranger multi and aggr. This yields a single folder structure in which GEX and FB matrices are contained in the same output files. In this case, the function determines the type (GEX or FB) of each matrix from its number of features: any matrix with fewer than 100 features is regarded as FB, and any matrix with more as GEX.

FB.out.directory.list <-  
        list("~path_to_CellrangerCount_outs_directory",                   
             "~path_to_CellrangerCount_outs_directory")

#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      FB.out.directory.list = FB.out.directory.list) 

#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list)
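The feature-count rule used for combined input can be illustrated with a small base R sketch. The helper classify_matrix_type is hypothetical and only mirrors the rule described above; it is not the Platypus implementation. It assumes matrices have features as rows (as in 10x count matrices), and treating a matrix with exactly 100 features as GEX is an assumption.

```r
# Hypothetical helper mirroring the rule described above: a matrix with fewer than
# 100 features (rows) is treated as FB, otherwise as GEX.
classify_matrix_type <- function(matrix_list, fb.max.features = 100) {
  vapply(matrix_list,
         function(m) if (nrow(m) < fb.max.features) "FB" else "GEX",
         character(1))
}

# Toy matrices: 2000 genes x 5 cells and 4 hashtags x 5 cells
mats <- list(gex = matrix(0, nrow = 2000, ncol = 5),
             fb = matrix(0, nrow = 4, ncol = 5))
classify_matrix_type(mats)
# returns c(gex = "GEX", fb = "FB")
```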

3.2.2 Feature Barcode assignment

Feature barcoding attributes a sample or group barcode to each cell based on the number of sequenced counts of that barcode. While this works well in most cases, FB data can be noisy and a subset of cells may be difficult to confidently attribute to a single barcode. The VGM assigns FBs to cells by two criteria:

  1. FB.count.threshold sets the minimum number of counts an FB needs in order to be considered. It defaults to 10. For example, if a cell has fewer than 10 counts for every FB, no FB is assigned (the function returns “Not assignable”).

  2. FB.ratio.threshold sets the minimum ratio between the counts of the most frequent and the second most frequent FB for the most frequent FB to be confidently assigned. It defaults to 2. For clarity, consider the following example:

barcode   FB-1   FB-2   FB-3
Cell 1       3      4      9
Cell 2       1     32     43
Cell 3     100      1     13

For Cell 1: 9/4 > 2, so FB-3 meets the FB.ratio.threshold. But: 9 < 10 so FB-3 does not meet the FB.count.threshold. For this cell the function returns “Not assignable”

For Cell 2: 43 and 32 > 10 so both FB-2 and FB-3 meet the FB.count.threshold. But: 43/32 < 2 so FB-3 does not meet the FB.ratio.threshold. Again the function returns “Not assignable”

For Cell 3: 100 > 10 and 100/13 > 2, so FB-1 meets both criteria. The function returns “FB-1”
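The two criteria and the worked example above can be reproduced with a short base R sketch. assign_fb is a hypothetical re-implementation written for illustration, not the code used inside VDJ_GEX_matrix. It assumes that a count of at least FB.count.threshold, and a ratio of at least FB.ratio.threshold, are sufficient.

```r
# Hypothetical re-implementation of the FB.count.threshold and FB.ratio.threshold
# criteria (illustration only, not the Platypus source).
# counts: numeric matrix with one row per cell and one column per feature barcode
assign_fb <- function(counts, FB.count.threshold = 10, FB.ratio.threshold = 2) {
  apply(counts, 1, function(x) {
    ordered <- sort(x, decreasing = TRUE)
    top <- ordered[[1]]
    second <- if (length(ordered) > 1) ordered[[2]] else 0
    # Criterion 1: the most frequent FB needs enough counts
    if (top < FB.count.threshold) return("Not assignable")
    # Criterion 2: it must clearly dominate the second most frequent FB
    if (second > 0 && top / second < FB.ratio.threshold) return("Not assignable")
    names(ordered)[1]
  })
}

# The example table from above
counts <- matrix(c(  3,  4,  9,
                     1, 32, 43,
                   100,  1, 13),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Cell 1", "Cell 2", "Cell 3"),
                                 c("FB-1", "FB-2", "FB-3")))
assign_fb(counts)
# returns c("Cell 1" = "Not assignable", "Cell 2" = "Not assignable", "Cell 3" = "FB-1")
assign_fb(counts, FB.count.threshold = 5)
# returns c("Cell 1" = "FB-3", "Cell 2" = "Not assignable", "Cell 3" = "FB-1")
```

The second call shows how lowering FB.count.threshold makes Cell 1 assignable, which illustrates the trade-off described in the next paragraph.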

Lowering these thresholds makes barcode assignment more inclusive, but also more susceptible to false assignments.

As a QC step, we recommend verifying that FB coverage is consistent across libraries and samples, and that FB assignments match the expected numbers, e.g. from pre-sorting by FACS.

#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      FB.out.directory.list = FB.out.directory.list,
                      FB.count.threshold = 10, 
                      FB.ratio.threshold = 2) 

#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list,
                      FB.count.threshold = 10, 
                      FB.ratio.threshold = 2)

3.2.3 Feature Barcode filtering

Hashing barcodes are often combined with CITE-seq or other surface barcodes. To process the hashing FBs, all other barcodes need to be filtered out. For this, the VGM can exclude feature barcodes whose names match a regular expression supplied as FB.exclude.pattern. In the example below, all FBs with “CITE” or “TetTCR” in their name are filtered out.

#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
                      FB.out.directory.list = FB.out.directory.list, 
                      FB.exclude.pattern = "(CITE|TetTCR)") 

#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list,
                      FB.exclude.pattern = "(CITE|TetTCR)")
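The effect of a regular expression such as "(CITE|TetTCR)" can be previewed on a vector of FB names with grepl() before running the VGM. The FB names and the helper exclude_fbs below are made up for illustration, and it is an assumption that the VGM matches FB.exclude.pattern against FB names in a comparable way.

```r
# Hypothetical FB names as they might appear in a feature reference
fb_names <- c("Hashtag-1", "Hashtag-2", "CITE-CD4", "CITE-CD8", "TetTCR-OVA")

# Hypothetical helper: drop every FB whose name matches the regular expression
exclude_fbs <- function(fb_names, FB.exclude.pattern) {
  fb_names[!grepl(FB.exclude.pattern, fb_names)]
}

exclude_fbs(fb_names, "(CITE|TetTCR)")
# returns c("Hashtag-1", "Hashtag-2")
```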

4.1. VDJ

The VGM reformats and merges several data frames from the Cellranger output and can additionally return aligned and trimmed receptor sequences.

#Basic run with VDJ only
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list) 

4.1.1 Accommodating cells with aberrant chain numbers

Due to stochastic sampling, variation in mRNA content between cells and biological peculiarities, a single cell barcode may be associated with one, two or more TCR or BCR contigs. A canonical cell contains one VDJ and one VJ chain. Cells with only one chain are frequent, while cells with more than two chains are rare.

The VGM function and format are fully compatible with any combination of chains, without the need for cell filtering. Fields belonging to a missing chain contain an empty string (""). In fields that hold information about two chains (e.g. VDJ_cdr3s_aa in a cell with two VDJ chains), the chains are separated by ";".
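Because multi-chain fields are ";"-separated strings, they can be split back into one entry per chain with base R. The sketch below uses VDJ_cdr3s_aa values copied from the example output further down; count_chains is a hypothetical helper, and its result should agree with the Nr_of_VDJ_chains and Nr_of_VJ_chains columns of the VGM.

```r
# VDJ_cdr3s_aa values from the example output below (one and two VDJ chains),
# plus an empty string as found for a missing chain
cdr3s <- c("CARGAPNWYFDVW", "CTGDYAMDYW;CTLITTVVAKDAMDYW", "")

# Hypothetical helper: number of chains stored in each ";"-separated field
count_chains <- function(field) {
  ifelse(field == "", 0L, lengths(strsplit(field, ";", fixed = TRUE)))
}

count_chains(cdr3s)
# returns c(1L, 2L, 0L)

# One element per chain for the cell with two VDJ chains
strsplit(cdr3s[2], ";", fixed = TRUE)[[1]]
# returns c("CTGDYAMDYW", "CTLITTVVAKDAMDYW")
```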

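For filtering purposes, two numeric columns containing the number of chains per cell are included: Nr_of_VDJ_chains and Nr_of_VJ_chains. For example, cells without a VJ chain and cells with two VJ chains can be inspected as follows: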

head(subset(vgm[[1]], Nr_of_VJ_chains == 0))
##                                 barcode sample_id group_id clonotype_id_10x
## s1_CCGTACTGTCCGACGT s1_CCGTACTGTCCGACGT        s1        1      clonotype18
## s1_CGAGCCACAATGTAAG s1_CGAGCCACAATGTAAG        s1        1      clonotype44
## s1_GAAACTCAGGAATCGC s1_GAAACTCAGGAATCGC        s1        1      clonotype35
## s1_GTACTTTAGAGGGATA s1_GTACTTTAGAGGGATA        s1        1      clonotype11
## s1_GTAGGCCGTCCCGACA s1_GTAGGCCGTCCCGACA        s1        1      clonotype17
## s1_GTAGTCACAGTAAGAT s1_GTAGTCACAGTAAGAT        s1        1      clonotype20
##                     clonotype_id clonotype_frequency celltype Nr_of_VDJ_chains
## s1_CCGTACTGTCCGACGT  clonotype18                   1   B cell                1
## s1_CGAGCCACAATGTAAG  clonotype44                   1   B cell                1
## s1_GAAACTCAGGAATCGC  clonotype35                   1   B cell                1
## s1_GTACTTTAGAGGGATA  clonotype11                   1   B cell                1
## s1_GTAGGCCGTCCCGACA  clonotype17                   1   B cell                1
## s1_GTAGTCACAGTAAGAT  clonotype20                   1   B cell                1
##                     Nr_of_VJ_chains    VDJ_cdr3s_aa VJ_cdr3s_aa
## s1_CCGTACTGTCCGACGT               0   CARRNHPYYFDYW            
## s1_CGAGCCACAATGTAAG               0 CARETAQVPYYFDYW            
## s1_GAAACTCAGGAATCGC               0  CAIGHYYGSSSDVW            
## s1_GTACTTTAGAGGGATA               0     CALYGSSYDYW            
## s1_GTAGGCCGTCCCGACA               0    CVNGIYYYFDYW            
## s1_GTAGTCACAGTAAGAT               0    CARDSSGWFAYW            
##                                                      VDJ_cdr3s_nt VJ_cdr3s_nt
## s1_CCGTACTGTCCGACGT       TGTGCAAGACGGAACCACCCCTACTACTTTGACTACTGG            
## s1_CGAGCCACAATGTAAG TGTGCAAGAGAGACAGCTCAGGTTCCGTACTACTTTGACTACTGG            
## s1_GAAACTCAGGAATCGC    TGTGCAATAGGGCATTACTACGGTAGTAGCTCCGATGTCTGG            
## s1_GTACTTTAGAGGGATA             TGTGCCCTCTACGGTAGTAGCTACGACTACTGG            
## s1_GTAGGCCGTCCCGACA          TGTGTAAATGGGATTTATTACTACTTTGACTACTGG            
## s1_GTAGTCACAGTAAGAT          TGTGCAAGAGACAGCTCAGGCTGGTTTGCTTACTGG            
##                     VDJ_umis VJ_umis            VDJ_chain_contig
## s1_CCGTACTGTCCGACGT        2         CCGTACTGTCCGACGT-1_contig_1
## s1_CGAGCCACAATGTAAG       17         CGAGCCACAATGTAAG-1_contig_1
## s1_GAAACTCAGGAATCGC       12         GAAACTCAGGAATCGC-1_contig_1
## s1_GTACTTTAGAGGGATA       17         GTACTTTAGAGGGATA-1_contig_1
## s1_GTAGGCCGTCCCGACA       13         GTAGGCCGTCCCGACA-1_contig_1
## s1_GTAGTCACAGTAAGAT       14         GTAGTCACAGTAAGAT-1_contig_1
##                     VJ_chain_contig VDJ_chain VJ_chain VDJ_vgene VJ_vgene
## s1_CCGTACTGTCCGACGT                       IGH            IGHV4-1         
## s1_CGAGCCACAATGTAAG                       IGH           IGHV1-26         
## s1_GAAACTCAGGAATCGC                       IGH           IGHV1-74         
## s1_GTACTTTAGAGGGATA                       IGH            IGHV2-3         
## s1_GTAGGCCGTCCCGACA                       IGH            IGHV9-1         
## s1_GTAGTCACAGTAAGAT                       IGH           IGHV1-39         
##                     VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene
## s1_CCGTACTGTCCGACGT               IGHJ2               IGHM         
## s1_CGAGCCACAATGTAAG   IGHD3-2     IGHJ2               IGHM         
## s1_GAAACTCAGGAATCGC   IGHD1-1     IGHJ1               IGHM         
## s1_GTACTTTAGAGGGATA               IGHJ2               IGHM         
## s1_GTAGGCCGTCCCGACA               IGHJ2               IGHM         
## s1_GTAGTCACAGTAAGAT   IGHD3-2     IGHJ3               IGHM         
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    VDJ_sequence_nt_raw
## s1_CCGTACTGTCCGACGT                                       GAAGCAAAGGGGATCAGCCCGAGATTCTCATTCAGTGATCAACACTGAACACACATCCCTTACCATGGATTTTGGGCTGATTTTTTTTATTGTTGCTCTTTTAAAAGGGGTCCAGTGTGAGGTGAAGCTTCTCCAGTCTGGAGGTGGCCTGGTGCAGCCTGGAGGATCCCTGAAACTCTCCTGTGCAGCCTCAGGAATCGATTTTAGTAGATACTGGATGAGTTGGGTTCGGCGGGCTCCAGGGAAAGGACTAGAATGGATTGGAGAAATTAATCCAGATAGCAGTACAATAAACTATGCACCATCTCTAAAGGATAAATTCATCATCTCCAGAGACAACGCCAAAAATACGCTGTACCTGCAAATGAGCAAAGTGAGATCTGAGGACACAGCCCTTTATTACTGTGCAAGACGGAACCACCCCTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CGAGCCACAATGTAAG                         TTGGGGACCCCTGAAAACAACATATGTACAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGAGACAGCTCAGGTTCCGTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GAAACTCAGGAATCGC GTTACTCTGGAATCGCGTCAGACGTGTTTCTTTTTTGGGGAGAAAAACATGAGATCACTGTTCTCTCTACAGTTACTGAGCACACAGGACCTCACCATGAGATGGAGCTGTATCATCCTCTTCTTGGTAGCAACAGCTACAGGTGTCCACTCCCAGGTCCAACTGCAGCAGCCTGGGGCTGAACTGGTGAAGCCTGGGGCTTCAGTGAAGGTGTCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTGGGTGAAGCAGAGGCCTGGCCAAGGCCTTGAGTGGATTGGAAGGATTCATCCTTCTGATAGTGATACTAACTACAATCAAAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTACTGTGCAATAGGGCATTACTACGGTAGTAGCTCCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTACTTTAGAGGGATA                                                         CTAAAGGGGTTCTTATCTGGGGATCCTCTTCTCATAGAGCCTCCATCAGACCATGGCTGTCCTGGCACTGCTCCTCTGCCTGGTGACATTCCCAAGCTGTGTCCTGTCCCAGGTGCAGCTGAAGGAGTCAGGACCTGGCCTGGTGGCGCCCTCACAGAGCCTGTCCATCACATGCACTGTCTCAGGGTTCTCATTAACCAGCTATGGTGTAAGCTGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGGCTGGGAGTAATATGGGGTGACGGGAGCACAAATTATCATTCAGCTCTCATATCCAGACTGAGCATCAGCAAGGATAACTCCAAGAGCCAAGTTTTCTTAAAACTGAACAGTCTGCAAACTGATGACACAGCCACGTACTACTGTGCCCTCTACGGTAGTAGCTACGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTAGGCCGTCCCGACA                                          TTTTGGGGAAGGGAGTGACCAGTTAGTCTTAAGGCACCACTGAGCCCAAGTCTTAGACATCATGGATTGGGTGTGGACCTTGCTATTCCTGATAGCAGCTGCCCAAAGTGCCCAAGCACAGATCCAGTTGGTGCAGTCTGGACCTGAGCTGAAGAAGCCTGGAGAGACAGTCAAGATCTCCTGCAAGGCTTCTGGGTATACCTTCACAGAATATCCAATGCACTGGGTGAAGCAGGCTCCAGGAAAGGGTTTCAAGTGGATGGGCATGATATACACCGACACTGGAGAGCCAACATATGCTGAAGAGTTCAAGGGACGGTTTGCCTTCTCTTTGGAGACCTCTGCCAGCACTGCCTATTTGCAGATCAACAACCTCAAAAATGAGGACACGGCTACATATTTCTGTGTAAATGGGATTTATTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTAGTCACAGTAAGAT                                                                        ACTTATTTGGGGGAAGACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTCCTCCTCTCAGGAACTGCAGGTGTCCACTCTGAGTTCCAGCTGCAGCAGTCTGGACCTGAGCTGGTGAAGCCTGGCGCTTCAGTGAAGATATCCTGCAAGGCTTCTGGTTACTCATTCACTGACTACAACATGAACTGGGTGAAGCAGAGCAATGGAAAGAGCCTTGAGTGGATTGGAGTAATTAATCCTAACTATGGTACTACTAGCTACAATCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACCAATCTTCCAGCACAGCCTACATGCAGCTCAACAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGACAGCTCAGGCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
##                     VJ_sequence_nt_raw VDJ_sequence_nt_trimmed
## s1_CCGTACTGTCCGACGT                                           
## s1_CGAGCCACAATGTAAG                                           
## s1_GAAACTCAGGAATCGC                                           
## s1_GTACTTTAGAGGGATA                                           
## s1_GTAGGCCGTCCCGACA                                           
## s1_GTAGTCACAGTAAGAT                                           
##                     VJ_sequence_nt_trimmed VDJ_sequence_aa VJ_sequence_aa
## s1_CCGTACTGTCCGACGT                                                      
## s1_CGAGCCACAATGTAAG                                                      
## s1_GAAACTCAGGAATCGC                                                      
## s1_GTACTTTAGAGGGATA                                                      
## s1_GTAGGCCGTCCCGACA                                                      
## s1_GTAGTCACAGTAAGAT                                                      
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            VDJ_raw_ref
## s1_CCGTACTGTCCGACGT GAAGCAAAGGGGATCAGCCCGAGATTCTCATTCAGTGATCAACACTGAACACACATCCCTTACCATGGATTTTGGGCTGATTTTTTTTATTGTTGCTCTTTTAAAAGGGGTCCAGTGTGAGGTGAAGCTTCTCCAGTCTGGAGGTGGCCTGGTGCAGCCTGGAGGATCCCTGAAACTCTCCTGTGCAGCCTCAGGAATCGATTTTAGTAGATACTGGATGAGTTGGGTTCGGCGGGCTCCAGGGAAAGGACTAGAATGGATTGGAGAAATTAATCCAGATAGCAGTACAATAAACTATGCACCATCTCTAAAGGATAAATTCATCATCTCCAGAGACAACGCCAAAAATACGCTGTACCTGCAAATGAGCAAAGTGAGATCTGAGGACACAGCCCTTTATTACTGTGCAAGACCACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CGAGCCACAATGTAAG                                                ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAAGACAGCTCAGGCTACACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GAAACTCAGGAATCGC                                    ATGAGATGGAGCTGTATCATCCTCTTCTTGGTAGCAACAGCTACAGGTGTCCACTCCCAGGTCCAACTGCAGCAGCCTGGGGCTGAACTGGTGAAGCCTGGGGCTTCAGTGAAGGTGTCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTGGGTGAAGCAGAGGCCTGGCCAAGGCCTTGAGTGGATTGGAAGGATTCATCCTTCTGATAGTGATACTAACTACAATCAAAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTACTGTGCAATATTTATTACTACGGTAGTAGCTACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTACTTTAGAGGGATA                                                                    ATGGCTGTCCTGGCACTGCTCCTCTGCCTGGTGACATTCCCAAGCTGTGTCCTGTCCCAGGTGCAGCTGAAGGAGTCAGGACCTGGCCTGGTGGCGCCCTCACAGAGCCTGTCCATCACATGCACTGTCTCAGGGTTCTCATTAACCAGCTATGGTGTAAGCTGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGGCTGGGAGTAATATGGGGTGACGGGAGCACAAATTATCATTCAGCTCTCATATCCAGACTGAGCATCAGCAAGGATAACTCCAAGAGCCAAGTTTTCTTAAAACTGAACAGTCTGCAAACTGATGACACAGCCACGTACTACTGTGCCAAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTAGGCCGTCCCGACA                                                                ATGGATTGGGTGTGGACCTTGCTATTCCTGATAGCAGCTGCCCAAAGTGCCCAAGCACAGATCCAGTTGGTGCAGTCTGGACCTGAGCTGAAGAAGCCTGGAGAGACAGTCAAGATCTCCTGCAAGGCTTCTGGGTATACCTTCACAGAATATCCAATGCACTGGGTGAAGCAGGCTCCAGGAAAGGGTTTCAAGTGGATGGGCATGATATACACCGACACTGGAGAGCCAACATATGCTGAAGAGTTCAAGGGACGGTTTGCCTTCTCTTTGGAGACCTCTGCCAGCACTGCCTATTTGCAGATCAACAACCTCAAAAATGAGGACACGGCTACATATTTCTGTGTAAGAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTAGTCACAGTAAGAT                                                ATGGGATGGAGCTGGATCTTTCTCTTCCTCCTCTCAGGAACTGCAGGTGTCCACTCTGAGTTCCAGCTGCAGCAGTCTGGACCTGAGCTGGTGAAGCCTGGCGCTTCAGTGAAGATATCCTGCAAGGCTTCTGGTTACTCATTCACTGACTACAACATGAACTGGGTGAAGCAGAGCAATGGAAAGAGCCTTGAGTGGATTGGAGTAATTAATCCTAACTATGGTACTACTAGCTACAATCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACCAATCTTCCAGCACAGCCTACATGCAGCTCAACAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAAGACAGCTCAGGCTACCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
##                     VJ_raw_ref VDJ_trimmed_ref VJ_trimmed_ref
## s1_CCGTACTGTCCGACGT                                          
## s1_CGAGCCACAATGTAAG                                          
## s1_GAAACTCAGGAATCGC                                          
## s1_GTACTTTAGAGGGATA                                          
## s1_GTAGGCCGTCCCGACA                                          
## s1_GTAGTCACAGTAAGAT                                          
##                         VDJ_raw_consensus_id VJ_raw_consensus_id
## s1_CCGTACTGTCCGACGT clonotype18_concat_ref_1                    
## s1_CGAGCCACAATGTAAG clonotype44_concat_ref_1                    
## s1_GAAACTCAGGAATCGC clonotype35_concat_ref_1                    
## s1_GTACTTTAGAGGGATA clonotype11_concat_ref_1                    
## s1_GTAGGCCGTCCCGACA clonotype17_concat_ref_1                    
## s1_GTAGTCACAGTAAGAT clonotype20_concat_ref_1                    
##                         orig_barcode specifity affinity     batches
## s1_CCGTACTGTCCGACGT CCGTACTGTCCGACGT        NA       NA Unspecified
## s1_CGAGCCACAATGTAAG CGAGCCACAATGTAAG        NA       NA Unspecified
## s1_GAAACTCAGGAATCGC GAAACTCAGGAATCGC        NA       NA Unspecified
## s1_GTACTTTAGAGGGATA GTACTTTAGAGGGATA        NA       NA Unspecified
## s1_GTAGGCCGTCCCGACA GTAGGCCGTCCCGACA        NA       NA Unspecified
## s1_GTAGTCACAGTAAGAT GTAGTCACAGTAAGAT        NA       NA Unspecified
head(subset(vgm[[1]], Nr_of_VJ_chains == 2))
##                                 barcode sample_id group_id clonotype_id_10x
## s1_CACCTTGCAAACGCGA s1_CACCTTGCAAACGCGA        s1        1      clonotype28
## s1_CGATGGCAGGGTATCG s1_CGATGGCAGGGTATCG        s1        1       clonotype2
## s1_CTGATAGGTTCTGTTT s1_CTGATAGGTTCTGTTT        s1        1      clonotype57
## s1_GCTGCTTTCTCTGAGA s1_GCTGCTTTCTCTGAGA        s1        1      clonotype22
## s1_GTCACGGCACGCATCG s1_GTCACGGCACGCATCG        s1        1      clonotype60
## s1_TGACTAGTCGCCTGTT s1_TGACTAGTCGCCTGTT        s1        1      clonotype52
##                     clonotype_id clonotype_frequency celltype Nr_of_VDJ_chains
## s1_CACCTTGCAAACGCGA  clonotype28                   1   B cell                1
## s1_CGATGGCAGGGTATCG   clonotype2                   2   B cell                1
## s1_CTGATAGGTTCTGTTT  clonotype57                   1   B cell                1
## s1_GCTGCTTTCTCTGAGA  clonotype22                   1   B cell                2
## s1_GTCACGGCACGCATCG  clonotype60                   1   B cell                1
## s1_TGACTAGTCGCCTGTT  clonotype52                   1   B cell                1
##                     Nr_of_VJ_chains                VDJ_cdr3s_aa
## s1_CACCTTGCAAACGCGA               2               CARGAPNWYFDVW
## s1_CGATGGCAGGGTATCG               2                 CARWFAWFAYW
## s1_CTGATAGGTTCTGTTT               2           CAKNYGSSYSYWYFDVW
## s1_GCTGCTTTCTCTGAGA               2 CTGDYAMDYW;CTLITTVVAKDAMDYW
## s1_GTCACGGCACGCATCG               2           CARRPYYSNSHYAMDYW
## s1_TGACTAGTCGCCTGTT               2            CARDYYGSSLYYFDYW
##                                     VJ_cdr3s_aa
## s1_CACCTTGCAAACGCGA     CALWYSTHYVF;CWQGTHFPQTF
## s1_CGATGGCAGGGTATCG     CQNDYSYPLTF;CQQYSSYPYTF
## s1_CTGATAGGTTCTGTTT       CKQSYNLYTF;CQQSNSWLTF
## s1_GCTGCTTTCTCTGAGA      CQQWSSNPLTF;CWQGTHFPPF
## s1_GTCACGGCACGCATCG CALWYSNHLVF;CGVGDTIKEQFVYVF
## s1_TGACTAGTCGCCTGTT      CQQYWSTRTF;CLQYDNLLYTF
##                                                                                        VDJ_cdr3s_nt
## s1_CACCTTGCAAACGCGA                                         TGTGCAAGAGGGGCTCCCAACTGGTACTTCGATGTCTGG
## s1_CGATGGCAGGGTATCG                                               TGTGCAAGATGGTTTGCCTGGTTTGCTTACTGG
## s1_CTGATAGGTTCTGTTT                             TGTGCAAAAAACTATGGTAGTAGCTACAGCTACTGGTACTTCGATGTCTGG
## s1_GCTGCTTTCTCTGAGA TGTACCGGGGATTACGCTATGGACTACTGG;TGTACTCTCATTACTACGGTAGTAGCCAAGGATGCTATGGACTACTGG
## s1_GTCACGGCACGCATCG                             TGTGCAAGAAGGCCCTACTATAGTAACTCCCACTATGCTATGGACTACTGG
## s1_TGACTAGTCGCCTGTT                                TGTGCTAGAGATTACTACGGTAGTAGCTTGTACTACTTTGACTACTGG
##                                                                                         VJ_cdr3s_nt
## s1_CACCTTGCAAACGCGA             TGTGCTCTATGGTACAGCACCCATTATGTTTTC;TGCTGGCAAGGTACACATTTTCCTCAGACGTTC
## s1_CGATGGCAGGGTATCG             TGTCAGAATGATTATAGTTATCCGCTCACGTTC;TGTCAGCAATATAGCAGCTATCCGTACACGTTC
## s1_CTGATAGGTTCTGTTT                   TGCAAGCAATCTTATAATCTGTACACGTTC;TGTCAACAGAGTAACAGCTGGCTCACGTTC
## s1_GCTGCTTTCTCTGAGA                TGCCAGCAGTGGAGTAGTAACCCGCTCACGTTC;TGCTGGCAAGGTACACATTTTCCTCCGTTC
## s1_GTCACGGCACGCATCG TGTGCTCTATGGTACAGCAACCATTTGGTGTTC;TGTGGTGTGGGTGATACAATTAAGGAACAATTTGTGTATGTTTTC
## s1_TGACTAGTCGCCTGTT                TGTCAACAGTATTGGAGTACTCGGACGTTC;TGTCTACAGTATGATAATCTTCTGTACACGTTC
##                     VDJ_umis VJ_umis
## s1_CACCTTGCAAACGCGA       14   17;21
## s1_CGATGGCAGGGTATCG        2     5;5
## s1_CTGATAGGTTCTGTTT        7   23;15
## s1_GCTGCTTTCTCTGAGA    20;14   24;45
## s1_GTCACGGCACGCATCG       10   60;29
## s1_TGACTAGTCGCCTGTT       22   34;23
##                                                            VDJ_chain_contig
## s1_CACCTTGCAAACGCGA                             CACCTTGCAAACGCGA-1_contig_1
## s1_CGATGGCAGGGTATCG                             CGATGGCAGGGTATCG-1_contig_3
## s1_CTGATAGGTTCTGTTT                             CTGATAGGTTCTGTTT-1_contig_3
## s1_GCTGCTTTCTCTGAGA GCTGCTTTCTCTGAGA-1_contig_2;GCTGCTTTCTCTGAGA-1_contig_3
## s1_GTCACGGCACGCATCG                             GTCACGGCACGCATCG-1_contig_1
## s1_TGACTAGTCGCCTGTT                             TGACTAGTCGCCTGTT-1_contig_3
##                                                             VJ_chain_contig
## s1_CACCTTGCAAACGCGA CACCTTGCAAACGCGA-1_contig_2;CACCTTGCAAACGCGA-1_contig_3
## s1_CGATGGCAGGGTATCG CGATGGCAGGGTATCG-1_contig_1;CGATGGCAGGGTATCG-1_contig_2
## s1_CTGATAGGTTCTGTTT CTGATAGGTTCTGTTT-1_contig_1;CTGATAGGTTCTGTTT-1_contig_2
## s1_GCTGCTTTCTCTGAGA GCTGCTTTCTCTGAGA-1_contig_1;GCTGCTTTCTCTGAGA-1_contig_4
## s1_GTCACGGCACGCATCG GTCACGGCACGCATCG-1_contig_2;GTCACGGCACGCATCG-1_contig_3
## s1_TGACTAGTCGCCTGTT TGACTAGTCGCCTGTT-1_contig_1;TGACTAGTCGCCTGTT-1_contig_2
##                     VDJ_chain VJ_chain        VDJ_vgene            VJ_vgene
## s1_CACCTTGCAAACGCGA       IGH  IGL;IGK         IGHV1-18     IGLV2;IGKV1-135
## s1_CGATGGCAGGGTATCG       IGH  IGK;IGK         IGHV1-26   IGKV8-19;IGKV6-23
## s1_CTGATAGGTTCTGTTT       IGH  IGK;IGK         IGHV1-80   IGKV8-21;IGKV5-43
## s1_GCTGCTTTCTCTGAGA   IGH;IGH  IGK;IGK IGHV6-6;IGHV14-4  IGKV4-59;IGKV1-135
## s1_GTCACGGCACGCATCG       IGH  IGL;IGL         IGHV1-76         IGLV1;IGLV3
## s1_TGACTAGTCGCCTGTT       IGH  IGK;IGK         IGHV14-2 IGKV13-84;IGKV19-93
##                     VDJ_dgene   VDJ_jgene    VJ_jgene VDJ_cgene    VJ_cgene
## s1_CACCTTGCAAACGCGA   IGHD4-1       IGHJ1 IGLJ2;IGKJ1      IGHM  IGLC2;IGKC
## s1_CGATGGCAGGGTATCG                 IGHJ3 IGKJ5;IGKJ2      IGHM   IGKC;IGKC
## s1_CTGATAGGTTCTGTTT                 IGHJ1 IGKJ2;IGKJ5      IGHM   IGKC;IGKC
## s1_GCTGCTTTCTCTGAGA           IGHJ4;IGHJ4 IGKJ5;IGKJ2 IGHM;IGHM   IGKC;IGKC
## s1_GTCACGGCACGCATCG   IGHD2-5       IGHJ4 IGLJ1;IGLJ2      IGHM IGLC1;IGLC2
## s1_TGACTAGTCGCCTGTT                 IGHJ2 IGKJ1;IGKJ2      IGHM   IGKC;IGKC
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       VDJ_sequence_nt_raw
## s1_CACCTTGCAAACGCGA                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             TTTGGGGAACATATGTCCAATGTCCTCTCCACAGGCACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCCTCTTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAGTCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATACCCTGCAAGGCTTCTGGATACACATTCACTGACTACAACATGGACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTATCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACACTGCAGTCTATTACTGTGCAAGAGGGGCTCCCAACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CGATGGCAGGGTATCG                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             TTATGGGGAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGATGGTTTGCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CTGATAGGTTCTGTTT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          TGGGGACAGTCCCTGAACACACTGACTCTAACCATGGAATGGCCTTTGATCTTTCTCTTCCTCCTGTCAGGAACTGCAGGTGTCCAATCCCAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTGGTGAAGCCTGGGGCCTCAGTGAAGATTTCCTGCAAAGCTTCTGGCTACGCATTCAGTAGCTACTGGATGAACTGGGTGAAGCAGAGGCCTGGAAAGGGTCTTGAGTGGATTGGACAGATTTATCCTGGAGATGGTGATACTAACTACAACGGAAAGTTCAAGGGCAAGGCCACACTGACTGCAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACCTCTGAGGACTCTGCGGTCTATTTCTGTGCAAAAAACTATGGTAGTAGCTACAGCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GCTGCTTTCTCTGAGA TGGGGGACAGATGCACAAACCTGGACTCACAAGTTTTTCTCTTCAGTGACAAACACAGACATAGAACATTCACGATGTACTTGGGACTGAACTGTGTATTCATAGTTTTTCTCTTAAAAGGTGTCCAGAGTGAAGTGAAGCTTGAGGAGTCTGGAGGAGGCTTGGTGCAACCTGGAGGATCCATGAAACTCTCTTGTGCTGCCTCTGGATTCACTTTTAGTGACGCCTGGATGGACTGGGTCCGCCAGTCTCCAGAGAAGGGGCTTGAGTGGGTTGCTGAAATTAGAAACAAAGCTAATAATCATGCAACATACTATGCTGAGTCTGTGAAAGGGAGGTTCACCATCTCAAGAGATGATTCCAAAAGTAGTGTCTACCTGCAAATGAACAGCTTAAGAGCTGAAGACACTGGCATTTATTACTGTACCGGGGATTACGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA;TTATGGGGATGAACACTGTTTTCTCTACAGTCACTGAATCTCAATGTCCTTACAATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTATAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGGTGATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTCTCATTACTACGGTAGTAGCCAAGGATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTCACGGCACGCATCG                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                TCTGTCACGGCACGCATCGTTTTATCCGTTTACTTTTATGGGGATGATCAGTGTCCTCTCTACACAGTCCCTGACGACACTGATTCTAACCATGGGATGGAGCTGGATCTTTTTCTTCCTCCTGTCAGGAACTGCAGGTGTCCACTGTCAGGTCCAGCTGAAGCAGTCTGGGGCTGAGCTGGTGAGGCCTGGGGCTTCAGTGAAGCTGTCCTGCAAGGCTTCTGGCTACACTTTCACTGACTACTATATAAACTGGGTGAAGCAGAGGCCTGGACAGGGACTTGAGTGGATTGCAAGGATTTATCCTGGAAGTGGTAATACTTACTACAATGAGAAGTTCAAGGGCAAGGCCACACTGACTGCAGAAAAATCCTCCAGCACTGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCTGTCTATTTCTGTGCAAGAAGGCCCTACTATAGTAACTCCCACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_TGACTAGTCGCCTGTT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       TATTTGGGGATGAACCCTGTCTTCTCTACAGCCACTGAATCTCAAGGTCCTTACAATGAAATGCAGCTGGATCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAAGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGACTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGAAACTAAATATGCCCCGAAATTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTGCTAGAGATTACTACGGTAGTAGCTTGTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             VJ_sequence_nt_raw
## s1_CACCTTGCAAACGCGA                             ACTCACCTTGCAAACGCGATCAACAGTTGTATCTTTTATGGGGGACCAATATTGAAAATAATAGACTTGGTTTGTGAATTATGGCCTGGACTTCACTTATACTCTCTCTCCTGGCTCTCTGCTCAGGAGCCAGTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGGAACAGTCATACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAGCAACCGAGCTCCAGGTGTTCCTGTCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGATGCAATGTATTTCTGTGCTCTATGGTACAGCACCCATTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTG;ACTGATCACTCTCCTATGTTCATTTCCTCAAAATGATGAGTCCTGCCCAGTTCCTGTTTCTGTTAGTGCTCTGGATTCGGGAAACCAACGGTGATGTTGTGATGACCCAGACTCCACTCACTTTGTCGGTTACCATTGGACAACCAGCCTCCATCTCTTGCAAGTCAAGTCAGAGCCTCTTAGATAGTGATGGAAAGACATATTTGAATTGGTTGTTACAGAGGCCAGGCCAGTCTCCAAAGCGCCTAATCTATCTGGTGTCTAAACTGGACTCTGGAGTCCCTGACAGGTTCACTGGCAGTGGATCAGGGACAGATTTCACACTGAAAATCAGCAGAGTGGAGGCTGAGGATTTGGGAGTTTATTATTGCTGGCAAGGTACACATTTTCCTCAGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_CGATGGCAGGGTATCG                                                                                                               TTTATGGGGACATCTGAAAGGCAGGTGGAGCAAGATGGAATCACAGACTCAGGTCCTCATGTCCCTGCTGTTCTGGGTATCTGGTACCTGTGGGGACATTGTGATGACACAGTCTCCATCCTCCCTGACTGTGACAGCAGGAGAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGTCTGTTAAACAGTGGAAATCAAAAGAACTACTTGACCTGGTACCAGCAGAAACCAGGGCAGCCTCCTAAACTGTTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGAACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGTCAGAATGATTATAGTTATCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;TTATGGGGAAATACATCAGACCAGCATGGGCATCAAGATGGAGACACATTCTCAGGTCTTTGTATACATGTTGCTGTGGTTGTCTGGTGTTGAAGGAGACATTGTGATGACCCAGTCTCACAAATTCATGTCCACATCAGTAGGAGACAGGGTCAGCATCACCTGCAAGGCCAGTCAGGATGTGGGTACTGCTGTAGCCTGGTATCAACAGAAACCAGGGCAATCTCCTAAACTACTGATTTACTGGGCATCCACCCGGCACACTGGAGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATTAGCAATGTGCAGTCTGAAGACTTGGCAGATTATTTCTGTCAGCAATATAGCAGCTATCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_CTGATAGGTTCTGTTT                                                                                                                                        AGACAGGCAGTGGGAGCAAGATGGATTCACAGGCCCAGGTTCTTATATTGCTGCTGCTATGGGTATCTGGTACCTGTGGGGACATTGTGATGTCACAGTCTCCATCCTCCCTGGCTGTGTCAGCAGGAGAGAAGGTCACTATGAGCTGCAAATCCAGTCAGAGTCTGCTCAACAGTAGAACCCGAAAGAACTACTTGGCTTGGTACCAGCAGAAACCAGGGCAGTCTCCTAAACTGCTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGCAAGCAATCTTATAATCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;ATGAGCCACACAAACTCAGGGAAAGCTCGAAGATGGTTTTCACACCTCAGATACTTGGACTTATGCTTTTTTGGATTTCAGCCTCCAGAGGTGATATTGTGCTAACTCAGTCTCCAGCCACCCTGTCTGTGACTCCAGGAGATAGCGTCAGTCTTTCCTGCAGGGCCAGCCAAAGTATTAGCAACAACCTACACTGGTATCAACAAAAATCACATGAGTCTCCAAGGCTTCTCATCAAGTATGCTTCCCAGTCCATCTCTGGGATCCCCTCCAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACTCTCAGTATCAACAGTGTGGAGACTGAAGATTTTGGAATGTATTTCTGTCAACAGAGTAACAGCTGGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_GCTGCTTTCTCTGAGA                                                                                           TTATGGGGAAAGTACTTATGAGAATAGCAGTAATTAGCTAGGGACCAAAATTCAAAGACAAAATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATAATATCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGCACTGGTACCAGCAGAAGTCAGGCACCTCCCCCAAAAGATGGATTTATGACACATCCAAACTGGCTTCTGGAGTCCCTGCTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCAGCATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTAACCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;ACTGATCACTCTCCTATGTTCATTTCCTCAAAATGATGAGTCCTGCCCAGTTCCTGTTTCTGTTAGTGCTCTGGATTCGGGAAACCAACGGTGATGTTGTGATGACCCAGACTCCACTCACTTTGTCGGTTACCATTGGACAACCAGCCTCCATCTCTTGCAAGTCAAGTCAGAGCCTCTTAGATAGTGATGGAAAGACATATTTGAATTGGTTGTTACAGAGGCCAGGCCAGTCTCCAAAGCGCCTAATCTATCTGGTGTCTAAACTGGACTCTGGAGTCCCTGACAGGTTCACTGGCAGTGGATCAGGGACAGATTTCACACTGAAAATCAGCAGAGTGGAGGCTGAGGATTTGGGAGTTTATTATTGCTGGCAAGGTACACATTTTCCTCCGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_GTCACGGCACGCATCG GGGGACCAATATTGAAAAGAATAGACCTGGTTTGTGAATTATGGCCTGGATTTCACTTATACTCTCTCTCCTGGCTCTCAGCTCAGGGGCCATTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGAAACAGTCACACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAACAACCGAGCTCCAGGTGTTCCTGCCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGAGGCAATATATTTCTGTGCTCTATGGTACAGCAACCATTTGGTGTTCGGTGGAGGAACCAAACTGACTGTCCTAGGCCAGCCCAAGTCTTCGCCATCAGTCACCCTGTTTCCACCTTCCTCTGAAGAGCTCGAGACTAACAAGGCCACACTGGTGTGTA;TCTGTCACGGCACGCATCGGACTATCAAATATCTTCTATGGGAGAGAGAACTACAACCTGTCTGTCTCAGCAGAGATCAGTAGTACCTGCATTATGGCCTGGACTCCTCTCTTCTTCTTCTTTGTTCTTCATTGCTCAGGTTCTTTCTCCCAACTTGTGCTCACTCAGTCATCTTCAGCCTCTTTCTCCCTGGGAGCCTCAGCAAAACTCACGTGCACCTTGAGTAGTCAGCACAGTACGTACACCATTGAATGGTATCAGCAACAGCCACTCAAGCCTCCTAAGTATGTGATGGAGCTTAAGAAAGATGGAAGCCACAGCACAGGTGATGGGATTCCTGATCGCTTCTCTGGATCCAGCTCTGGTGCTGATCGCTACCTTAGCATTTCCAACATCCAGCCTGAAGATGAAGCAATATACATCTGTGGTGTGGGTGATACAATTAAGGAACAATTTGTGTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTG
## s1_TGACTAGTCGCCTGTT                                                                                                           TGGGGAATGTCAGGTCACAGCAGAAACATGAAGTTTCCTTCTCAACTTCTGCTCTTACTGCTGTTTGGAATCCCAGGCATGATATGTGACATCCAGATGACACAATCTTCATCCTCCTTTTCTGTATCTCTAGGAGACAGAGTCACCATTACTTGCAAGGCAAGTGAGGACATATATAATCGGTTAGCCTGGTATCAGCAGAAACCAGGAAATGCTCCTAGGCTCTTAATATCTGGTGCAACCAGTTTGGAAACTGGGGTTCCTTCAAGATTCAGTGGCAGTGGATCTGGAAAGGATTACACTCTCAGCATTACCAGTCTTCAGACTGAAGATGTTGCTACTTATTACTGTCAACAGTATTGGAGTACTCGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;TGACTAGTCGCCTGTTCGTACTTCGTTTTATTATTTGGGAGTCATTCTTGGTCAGGAGACGTTGTAGAAATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTTCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
##                     VDJ_sequence_nt_trimmed VJ_sequence_nt_trimmed
## s1_CACCTTGCAAACGCGA                                               
## s1_CGATGGCAGGGTATCG                                               
## s1_CTGATAGGTTCTGTTT                                               
## s1_GCTGCTTTCTCTGAGA                                               
## s1_GTCACGGCACGCATCG                                               
## s1_TGACTAGTCGCCTGTT                                               
##                     VDJ_sequence_aa VJ_sequence_aa
## s1_CACCTTGCAAACGCGA                               
## s1_CGATGGCAGGGTATCG                               
## s1_CTGATAGGTTCTGTTT                               
## s1_GCTGCTTTCTCTGAGA                               
## s1_GTCACGGCACGCATCG                               
## s1_TGACTAGTCGCCTGTT                               
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    VDJ_raw_ref
## s1_CACCTTGCAAACGCGA        ATGGGATGGAGCTGGATCTTTCTCCTCTTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAGTCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATACCCTGCAAGGCTTCTGGATACACATTCACTGACTACAACATGGACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTATCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACACTGCAGTCTATTACTGTGCAAGACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CGATGGCAGGGTATCG                        ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGACCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CTGATAGGTTCTGTTT                   ATGGAATGGCCTTTGATCTTTCTCTTCCTCCTGTCAGGAACTGCAGGTGTCCAATCCCAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTGGTGAAGCCTGGGGCCTCAGTGAAGATTTCCTGCAAAGCTTCTGGCTACGCATTCAGTAGCTACTGGATGAACTGGGTGAAGCAGAGGCCTGGAAAGGGTCTTGAGTGGATTGGACAGATTTATCCTGGAGATGGTGATACTAACTACAACGGAAAGTTCAAGGGCAAGGCCACACTGACTGCAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACCTCTGAGGACTCTGCGGTCTATTTCTGTGCAAGACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GCTGCTTTCTCTGAGA            ATGTACTTGGGACTGAACTGTGTATTCATAGTTTTTCTCTTAAAAGGTGTCCAGAGTGAAGTGAAGCTTGAGGAGTCTGGAGGAGGCTTGGTGCAACCTGGAGGATCCATGAAACTCTCTTGTGCTGCCTCTGGATTCACTTTTAGTGACGCCTGGATGGACTGGGTCCGCCAGTCTCCAGAGAAGGGGCTTGAGTGGGTTGCTGAAATTAGAAACAAAGCTAATAATCATGCAACATACTATGCTGAGTCTGTGAAAGGGAGGTTCACCATCTCAAGAGATGATTCCAAAAGTAGTGTCTACCTGCAAATGAACAGCTTAAGAGCTGAAGACACTGGCATTTATTACTGTACCAGGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTCACGGCACGCATCG ATGGGATGGAGCTGGATCTTTTTCTTCCTCCTGTCAGGAACTGCAGGTGTCCACTGTCAGGTCCAGCTGAAGCAGTCTGGGGCTGAGCTGGTGAGGCCTGGGGCTTCAGTGAAGCTGTCCTGCAAGGCTTCTGGCTACACTTTCACTGACTACTATATAAACTGGGTGAAGCAGAGGCCTGGACAGGGACTTGAGTGGATTGCAAGGATTTATCCTGGAAGTGGTAATACTTACTACAATGAGAAGTTCAAGGGCAAGGCCACACTGACTGCAGAAAAATCCTCCAGCACTGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCTGTCTATTTCTGTGCAAGACCTACTATAGTAACTACATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_TGACTAGTCGCCTGTT                        ATGAAATGCAGCTGGATCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAAGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGACTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGAAACTAAATATGCCCCGAAATTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTGCTAGAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     VJ_raw_ref
## s1_CACCTTGCAAACGCGA                                            ATGGCCTGGACTTCACTTATACTCTCTCTCCTGGCTCTCTGCTCAGGAGCCAGTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGGAACAGTCATACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAGCAACCGAGCTCCAGGTGTTCCTGTCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGATGCAATGTATTTCTGTGCTCTATGGTACAGCACCCATTTCTTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTGGAAGGCAAATGGTACACCTATCACCCAGGGTGTGGACACTTCAAATCCCACCAAAGAGGGCAACAAGTTCATGGCCAGCAGCTTCCTACATTTGACATCGGACCAGTGGAGATCTCACAACAGTTTTACCTGTCAAGTTACACATGAAGGGGACACTGTGGAGAAGAGTCTGTCTCCTGCAGAATGTCTC
## s1_CGATGGCAGGGTATCG                        ATGGAATCACAGACTCAGGTCCTCATGTCCCTGCTGTTCTGGGTATCTGGTACCTGTGGGGACATTGTGATGACACAGTCTCCATCCTCCCTGACTGTGACAGCAGGAGAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGTCTGTTAAACAGTGGAAATCAAAAGAACTACTTGACCTGGTACCAGCAGAAACCAGGGCAGCCTCCTAAACTGTTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGAACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGTCAGAATGATTATAGTTATCCTCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_CTGATAGGTTCTGTTT      AGACAGGCAGTGGGAGCAAGATGGATTCACAGGCCCAGGTTCTTATATTGCTGCTGCTATGGGTATCTGGTACCTGTGGGGACATTGTGATGTCACAGTCTCCATCCTCCCTGGCTGTGTCAGCAGGAGAGAAGGTCACTATGAGCTGCAAATCCAGTCAGAGTCTGCTCAACAGTAGAACCCGAAAGAACTACTTGGCTTGGTACCAGCAGAAACCAGGGCAGTCTCCTAAACTGCTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGCAAGCAATCTTATAATCTTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_GCTGCTTTCTCTGAGA                                     ATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATAATATCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGCACTGGTACCAGCAGAAGTCAGGCACCTCCCCCAAAAGATGGATTTATGACACATCCAAACTGGCTTCTGGAGTCCCTGCTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCAGCATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTAACCCACCCAGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_GTCACGGCACGCATCG GCTGACCAATATTGAAAAGAATAGACCTGGTTTGTGAATTATGGCCTGGATTTCACTTATACTCTCTCTCCTGGCTCTCAGCTCAGGGGCCATTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGAAACAGTCACACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAACAACCGAGCTCCAGGTGTTCCTGCCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGAGGCAATATATTTCTGTGCTCTATGGTACAGCAACCATTTCCTGGGTGTTCGGTGGAGGAACCAAACTGACTGTCCTAGGCCAGCCCAAGTCTTCGCCATCAGTCACCCTGTTTCCACCTTCCTCTGAAGAGCTCGAGACTAACAAGGCCACACTGGTGTGTACGATCACTGATTTCTACCCAGGTGTGGTGACAGTGGACTGGAAGGTAGATGGTACCCCTGTCACTCAGGGTATGGAGACAACCCAGCCTTCCAAACAGAGCAACAACAAGTACATGGCTAGCAGCTACCTGACCCTGACAGCAAGAGCATGGGAAAGGCATAGCAGTTACAGCTGCCAGGTCACTCATGAAGGTCACACTGTGGAGAAGAGTTTGTCCCGTGCTGACTGTTCC
## s1_TGACTAGTCGCCTGTT                                          ATGAAGTTTCCTTCTCAACTTCTGCTCTTACTGCTGTTTGGAATCCCAGGCATGATATGTGACATCCAGATGACACAATCTTCATCCTCCTTTTCTGTATCTCTAGGAGACAGAGTCACCATTACTTGCAAGGCAAGTGAGGACATATATAATCGGTTAGCCTGGTATCAGCAGAAACCAGGAAATGCTCCTAGGCTCTTAATATCTGGTGCAACCAGTTTGGAAACTGGGGTTCCTTCAAGATTCAGTGGCAGTGGATCTGGAAAGGATTACACTCTCAGCATTACCAGTCTTCAGACTGAAGATGTTGCTACTTATTACTGTCAACAGTATTGGAGTACTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
##                     VDJ_trimmed_ref VJ_trimmed_ref     VDJ_raw_consensus_id
## s1_CACCTTGCAAACGCGA                                clonotype28_concat_ref_1
## s1_CGATGGCAGGGTATCG                                 clonotype2_concat_ref_1
## s1_CTGATAGGTTCTGTTT                                clonotype57_concat_ref_1
## s1_GCTGCTTTCTCTGAGA                                clonotype22_concat_ref_1
## s1_GTCACGGCACGCATCG                                clonotype60_concat_ref_1
## s1_TGACTAGTCGCCTGTT                                clonotype52_concat_ref_1
##                          VJ_raw_consensus_id     orig_barcode specifity
## s1_CACCTTGCAAACGCGA clonotype28_concat_ref_3 CACCTTGCAAACGCGA        NA
## s1_CGATGGCAGGGTATCG  clonotype2_concat_ref_2 CGATGGCAGGGTATCG        NA
## s1_CTGATAGGTTCTGTTT clonotype57_concat_ref_2 CTGATAGGTTCTGTTT        NA
## s1_GCTGCTTTCTCTGAGA clonotype22_concat_ref_3 GCTGCTTTCTCTGAGA        NA
## s1_GTCACGGCACGCATCG clonotype60_concat_ref_2 GTCACGGCACGCATCG        NA
## s1_TGACTAGTCGCCTGTT clonotype52_concat_ref_3 TGACTAGTCGCCTGTT        NA
##                     affinity     batches
## s1_CACCTTGCAAACGCGA       NA Unspecified
## s1_CGATGGCAGGGTATCG       NA Unspecified
## s1_CTGATAGGTTCTGTTT       NA Unspecified
## s1_GCTGCTTTCTCTGAGA       NA Unspecified
## s1_GTCACGGCACGCATCG       NA Unspecified
## s1_TGACTAGTCGCCTGTT       NA Unspecified

4.1.2 Filtering chains by UMI counts

While filtering out all cells with an aberrant number of chains is common when processing VDJ data, Platypus offers a format that can accommodate these cells and integrate them into the analysis (see VDJ_clonotype()). A third option was proposed by Zhang W et al. (Sci Adv. 2021, 10.1126/sciadv.abf5835): choosing between excess chains based on the number of unique molecular identifiers (UMIs) of each contig. The VGM function implements this strategy with two parameters:

  1. select.excess.chains.by.umi.count is a boolean. When set to TRUE, the VGM filters excess chains based on their UMI counts.

  2. excess.chain.confidence.count.threshold is an integer that allows the filtering to be tuned. It defaults to 1000 (any sufficiently large value has the same effect, since no chain then reaches the threshold).

To illustrate the filtering behavior, consider the following example of cells with two VJ chains each:

select.excess.chains.by.umi.count = TRUE

barcode  Nr_of_VJ_chains  VJ_UMIs
Cell 1   2                1;1
Cell 2   2                1;5
Cell 3   2                3;3

For excess.chain.confidence.count.threshold = 1000:

Cell 1 -> both chains are below the threshold and therefore subject to filtering. Because both chains have the same UMI count, one contig is eliminated at random.
Cell 2 -> both chains are below the threshold and subject to filtering. Chain 2 has the higher UMI count and is therefore kept.
Cell 3 -> proceeds as for Cell 1.

For excess.chain.confidence.count.threshold = 3:

Cell 1 -> same as above.
Cell 2 -> chain 1 is below the threshold and subject to filtering. Chain 2 is above the threshold, is therefore considered a high-confidence chain and is not filtered.
Cell 3 -> both chains are equal to or above the threshold, so both are considered high-confidence and no chain is filtered.
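The decision rule illustrated by these examples can be sketched in base R as follows. This is a minimal reconstruction from the description above, not the Platypus source code; the function name select_chains_by_umi() is hypothetical. It takes the UMI counts of the excess chains of one type in a single cell and returns the indices of the chains to keep.

```r
# Hypothetical helper (not the Platypus source code): a sketch of the
# UMI-based rule for choosing among excess chains of one type in one cell.
#   umis      - UMI counts of the excess chains, e.g. c(1, 5)
#   threshold - plays the role of excess.chain.confidence.count.threshold
select_chains_by_umi <- function(umis, threshold = 1000) {
  high_conf <- which(umis >= threshold)
  if (length(high_conf) > 0) {
    # High-confidence chains (UMIs >= threshold) are all kept,
    # chains below the threshold are dropped
    return(high_conf)
  }
  # No high-confidence chain: keep the single chain with the most UMIs,
  # breaking ties at random
  best <- which(umis == max(umis))
  best[sample.int(length(best), 1)]
}

# The three example cells from the text (VJ UMI counts per chain)
cells <- list(cell1 = c(1, 1), cell2 = c(1, 5), cell3 = c(3, 3))

set.seed(42)
lapply(cells, select_chains_by_umi, threshold = 1000)
lapply(cells, select_chains_by_umi, threshold = 3)
```

With threshold = 1000, cell2 keeps chain 2 while cell1 and cell3 each keep one randomly chosen chain; with threshold = 3, cell2 again keeps chain 2 and cell3 keeps both chains. The text does not spell out every case (e.g. one chain above and two below the threshold), so treat that branch of the sketch as an assumption.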

UMI distributions can vary between VDJ datasets. To optimize this filtering parameter, we recommend investigating the UMI counts of cells with two chains of the same type.
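One way to do this is sketched below, assuming you work directly with the filtered_contig_annotations.csv file written by cellranger vdj, which has one row per contig with (among others) barcode, chain and umis columns. The helper umis_of_double_vj_cells() and the set of VJ chain labels it uses are assumptions for illustration; the toy table stands in for real data.

```r
# Hypothetical helper: return the contigs of cells that carry exactly two
# VJ chains, using the per-contig table from cellranger vdj
# (filtered_contig_annotations.csv; columns barcode, chain, umis, ...)
umis_of_double_vj_cells <- function(contigs, vj_chains = c("IGK", "IGL", "TRA")) {
  vj <- contigs[contigs$chain %in% vj_chains, c("barcode", "chain", "umis")]
  n_per_cell <- table(vj$barcode)
  double_cells <- names(n_per_cell)[n_per_cell == 2]
  vj[vj$barcode %in% double_cells, ]
}

# Usage with real data (not run here; assumes the "outs" folder layout):
# contigs <- read.csv(file.path(VDJ.out.directory.list[[1]],
#                               "filtered_contig_annotations.csv"))
# double_vj <- umis_of_double_vj_cells(contigs)
# summary(double_vj$umis)
# hist(log10(double_vj$umis))

# Toy example: cells A and C carry two VJ chains, cell B only one
contigs <- data.frame(
  barcode = c("A", "A", "A", "B", "B", "C", "C", "C"),
  chain   = c("IGH", "IGK", "IGK", "IGH", "IGL", "TRB", "TRA", "TRA"),
  umis    = c(12, 1, 5, 8, 4, 20, 3, 3),
  stringsAsFactors = FALSE
)
double_vj <- umis_of_double_vj_cells(contigs)
double_vj
summary(double_vj$umis)
```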

# Select among excess chains by their UMI counts
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      select.excess.chains.by.umi.count = TRUE,
                      excess.chain.confidence.count.threshold = 1000)

4.2 VDJ trimming, aligning and translating

It may be useful to obtain full-length sequences from single-cell sequencing data (e.g., for phylogenetics or for experimental expression and validation; see Pogson M. et al. Nat Commun. 2016, 10.1038/ncomms12535). We have therefore added an option to recover both trimmed and untrimmed sequences as determined by the cellranger vdj function. Setting trim.and.align = TRUE returns processed VDJ contig strings, as detailed below:

  1. Trimming: The raw nt contig sequences are trimmed to span from the start of the V segment to the end of the J segment (VDJ/VJ_sequence_nt_trimmed). These trimmed sequences still contain the leader (signal peptide) sequence. If this is not wanted, we currently use MiXCR to realign the sequences and extract the region spanning FR1 to FR4 for both heavy and light chains. This can be done with the VDJ_call_MiXCR() function within Platypus, which requires a local installation of MiXCR.

  2. Translation: The trimmed contigs are translated to yield full amino acid sequences. (VDJ/VJ_sequence_aa)

  3. Realigning to reference: The untrimmed germline reference sequences (as determined by 10x Genomics cellranger) of each clonotype consensus sequence are returned when trim.and.align = TRUE (VDJ/VJ_raw_ref). These can be submitted to IMGT or processed with MiXCR to obtain a shorter reference sequence that covers only FR1 to FR4. Furthermore, the function aligns the trimmed contig sequences to the reference contigs using Biostrings::pairwiseAlignment() with type = "local" (see the optimization parameters in section 4.2.1). This yields a reference sequence from the start of the V segment to the end of the J segment (VDJ/VJ_trimmed_ref). Most of the time, this trimmed germline sequence will be out of frame because of nucleotide deletions and insertions in the CDR3. If a user needs an in-frame germline sequence (e.g. for expression), the CDR3 will likely need to be replaced. We are currently developing a pipeline that supplies the CDR3s of those recovered sequences that are closest to germline, so stay tuned :)
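Step 2 (translation) can be reproduced for a single sequence with Biostrings::translate(), which uses the standard genetic code by default. The sketch below assumes Biostrings (Bioconductor) is installed and translates the 3' end (CDR3 through FR4) of the trimmed VDJ sequence of barcode s1_AAACGGGGTTTAGGAA shown in this section; it illustrates the idea and is not necessarily the exact code path used inside the VGM.

```r
library(Biostrings)

# 3' end (CDR3 to FR4) of the trimmed VDJ sequence of s1_AAACGGGGTTTAGGAA
nt <- DNAString("TACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA")

# Translation assumes the sequence is in frame (length a multiple of 3)
stopifnot(length(nt) %% 3 == 0)

# Use the explicit namespace: seqinr also exports a translate() function
aa <- Biostrings::translate(nt)
as.character(aa)
# [1] "YCARIGYAMDYWGQGTSVTVSS"
```

The result matches the end of the VDJ_sequence_aa entry for the same cell.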

The reference sequences are taken from the concat_ref.fasta file of the cellranger vdj output. The trimmed contigs are aligned to these sequences and the reference is trimmed accordingly (using Biostrings::pairwiseAlignment() with type = "local"; see the optimization parameters in section 4.2.1).
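The reference-trimming idea can be illustrated with a local alignment in Biostrings. In this sketch, the helper trim_reference() and the toy sequences are hypothetical: a trimmed contig (pattern) is aligned locally against a longer reference (subject), and the reference is cut down to the aligned region. The VGM's internal implementation may differ in details such as the scoring scheme.

```r
library(Biostrings)

# Hypothetical helper: trim a reference to the region that a trimmed contig
# aligns to under local alignment
trim_reference <- function(contig, reference, gap_opening = 10, gap_extension = 4) {
  aln <- pairwiseAlignment(DNAString(contig), DNAString(reference),
                           type = "local",
                           gapOpening = gap_opening,
                           gapExtension = gap_extension)
  # Coordinates of the aligned region on the reference (the subject)
  ref_start <- start(subject(aln))
  ref_end   <- end(subject(aln))
  substr(reference, ref_start, ref_end)
}

# Toy example: the reference carries extra sequence on both sides of the contig
contig    <- "ATGGGCAGGCTTACTTCTTCATTCCTG"
reference <- paste0("TTTTTTTTTT", contig, "CCCCCCCCCC")
trimmed_ref <- trim_reference(contig, reference)
trimmed_ref
```

Here the contig occurs exactly once in the reference, so the trimmed reference is identical to the contig. In real data the contig differs from the reference in the CDR3, which is where the gap costs discussed in section 4.2.1 come into play.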

vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      trim.and.align = TRUE)

vgm[[1]][1,c("VDJ_sequence_nt_raw", "VDJ_sequence_nt_trimmed", "VDJ_sequence_aa", "VDJ_trimmed_ref")]
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              VDJ_sequence_nt_raw
## s1_AAACGGGGTTTAGGAA TGGGAAGTGTGCAGCCATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
##                                                                                                                                                                                                                                                                                                                                                                                                                      VDJ_sequence_nt_trimmed
## s1_AAACGGGGTTTAGGAA ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
##                                                                                                                                              VDJ_sequence_aa
## s1_AAACGGGGTTTAGGAA MGRLTSSFLLLIVPAYVLSQVTLKESGPGILQPSQTLSLTCSFSGFSLSTFGMGVGWIRQPSGKGLEWLAHIWWDDDKYYNPALKSRLTISKDTSKNQVFLKIANVDTADTATYYCARIGYAMDYWGQGTSVTVSS
##                                                                                                                                                                                                                                                                                                                                                                                                                                 VDJ_trimmed_ref
## s1_AAACGGGGTTTAGGAA ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
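Because the trimmed reference is often out of frame (see point 3 above), a quick length-based check can help flag such entries before expression or phylogenetic work. The helper is_in_frame() below is hypothetical and only tests whether a sequence length is a multiple of 3; this is necessary but not sufficient for an intact reading frame.

```r
# Hypothetical helper: TRUE if the length of a nucleotide string is a
# multiple of 3, NA for missing or empty entries
is_in_frame <- function(seqs) {
  seqs <- as.character(seqs)
  out <- nchar(seqs) %% 3 == 0
  out[is.na(seqs) | seqs == ""] <- NA
  out
}

# Usage on a VGM generated with trim.and.align = TRUE (not run here):
# table(is_in_frame(vgm[[1]]$VDJ_trimmed_ref), useNA = "ifany")

is_in_frame(c("ATGGGC", "ATGGG", "", NA))
# [1]  TRUE FALSE    NA    NA
```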

4.2.1 Optimizing VDJ alignment parameters

A key difficulty with TCR and BCR sequences is that the CDR3-containing junction region almost never matches a known reference. This leaves alignment gaps and can cause the highest-scoring alignment to miss the J segment that follows the junction. The VGM therefore allows the alignment parameters gap.opening.cost and gap.extension.cost to be tuned:

vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      trim.and.align = TRUE,
                      gap.opening.cost = 10,
                      gap.extension.cost = 4)
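To get a feeling for how gap costs shape a local alignment, one can call Biostrings::pairwiseAlignment() directly and compare scores under different costs. We assume here that gap.opening.cost and gap.extension.cost play the roles of pairwiseAlignment()'s gapOpening and gapExtension arguments (whose defaults are 10 and 4); the toy sequences, in which the contig carries a 3-nt insertion relative to the reference, are hypothetical.

```r
library(Biostrings)

# Hypothetical toy sequences: the contig carries a 3-nt insertion ("GGG")
# relative to the reference, mimicking a junction that does not match
reference <- DNAString("ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTC")
contig    <- DNAString("ATGGGCAGGCTTACTTGGGCTTCATTCCTGTTACTGATTGTC")

# Local alignment scores under two sets of gap costs (opening, extension)
gap_costs <- list(high = c(10, 4), low = c(2, 1))
scores <- sapply(gap_costs, function(costs) {
  score(pairwiseAlignment(contig, reference, type = "local",
                          gapOpening = costs[1], gapExtension = costs[2]))
})
scores

# Inspect the alignment obtained with the lower gap costs
pairwiseAlignment(contig, reference, type = "local",
                  gapOpening = 2, gapExtension = 1)
```

Lowering the gap costs can never decrease the optimal local score, because every candidate alignment is penalized less; whether the alignment then bridges the insertion and reaches further into the reference depends on the sequences at hand.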

4.2.2 Optimizing alignment runtime

To accelerate alignment, the VGM supports multicore processing via the mclapply() or parLapply() functions of the parallel package, depending on the operating system. Because starting multicore processes takes time, we only recommend this for datasets with more than 2000 cells.

The numcores parameter sets the number of cores to use. It defaults to all available cores, so setting a limit is crucial when running the function on a cluster.

By default, parallel.processing is set to "none" and the function uses the standard lapply().

# For Linux and Windows users
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      trim.and.align = TRUE,
                      parallel.processing = "parlapply",
                      numcores = 8)

# For macOS users
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      trim.and.align = TRUE,
                      parallel.processing = "mclapply",
                      numcores = 8)
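To avoid hard-coding the backend and core count, the settings can be derived at runtime. The helper parallel_settings() below is a hypothetical sketch: it picks "mclapply" on macOS and "parlapply" otherwise, following the recommendation above, and caps numcores using parallel::detectCores(). Note that on a cluster detectCores() may report all cores of the node rather than those allocated to your job, so max_cores should reflect your allocation.

```r
# Hypothetical helper: pick parallel.processing and numcores for VDJ_GEX_matrix
# following the recommendations above ("mclapply" on macOS, "parlapply" otherwise)
parallel_settings <- function(max_cores = 8L) {
  backend <- if (Sys.info()[["sysname"]] == "Darwin") "mclapply" else "parlapply"
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L
  # Leave one core free and never exceed max_cores
  numcores <- max(1L, min(as.integer(max_cores), available - 1L))
  list(parallel.processing = backend, numcores = numcores)
}

settings <- parallel_settings(max_cores = 8)
settings

# Usage (not run here):
# vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
#                       trim.and.align = TRUE,
#                       parallel.processing = settings$parallel.processing,
#                       numcores = settings$numcores)
```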

5. Conclusion

We hope that this vignette enables users to employ the full functionality of the VGM function. If you have any issues, requests or ideas concerning existing or new features, please reach out to us via GitHub or email.

6. Version information

## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] Platypus_3.4.0     SeuratObject_4.0.4 Seurat_4.1.0       forcats_0.5.1     
##  [5] stringr_1.4.0      purrr_0.3.4        readr_2.1.2        tidyr_1.1.3       
##  [9] tibble_3.1.2       ggplot2_3.3.5      tidyverse_1.3.1    dplyr_1.0.7       
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1          backports_1.4.1       systemfonts_1.0.4    
##   [4] plyr_1.8.6            igraph_1.2.6          lazyeval_0.2.2       
##   [7] splines_4.0.5         listenv_0.8.0         scattermore_0.7      
##  [10] digest_0.6.27         useful_1.2.6          htmltools_0.5.2      
##  [13] fansi_0.5.0           magrittr_2.0.1        memoise_2.0.1        
##  [16] tensor_1.5            cluster_2.1.2         ROCR_1.0-11          
##  [19] tzdb_0.2.0            Biostrings_2.58.0     globals_0.14.0       
##  [22] modelr_0.1.8          matrixStats_0.59.0    pkgdown_2.0.2        
##  [25] spatstat.sparse_2.0-0 colorspace_2.0-2      rvest_1.0.2          
##  [28] ggrepel_0.9.1         textshaping_0.3.6     haven_2.4.3          
##  [31] xfun_0.27             crayon_1.5.0          jsonlite_1.7.2       
##  [34] spatstat.data_2.1-2   survival_3.2-11       zoo_1.8-9            
##  [37] glue_1.4.2            polyclip_1.10-0       gtable_0.3.0         
##  [40] zlibbioc_1.36.0       XVector_0.30.0        seqinr_4.2-8         
##  [43] leiden_0.3.9          future.apply_1.8.1    BiocGenerics_0.36.1  
##  [46] abind_1.4-5           scales_1.1.1          DBI_1.1.2            
##  [49] miniUI_0.1.1.1        Rcpp_1.0.7            viridisLite_0.4.0    
##  [52] xtable_1.8-4          reticulate_1.20       spatstat.core_2.2-0  
##  [55] stats4_4.0.5          htmlwidgets_1.5.4     httr_1.4.2           
##  [58] RColorBrewer_1.1-2    ellipsis_0.3.2        ica_1.0-2            
##  [61] farver_2.1.0          pkgconfig_2.0.3       uwot_0.1.10          
##  [64] sass_0.4.0            dbplyr_2.1.1          deldir_0.2-10        
##  [67] utf8_1.2.1            labeling_0.4.2        tidyselect_1.1.1     
##  [70] rlang_0.4.10          reshape2_1.4.4        later_1.2.0          
##  [73] munsell_0.5.0         cellranger_1.1.0      tools_4.0.5          
##  [76] cachem_1.0.6          cli_3.1.1             generics_0.1.2       
##  [79] ade4_1.7-18           broom_0.7.12          ggridges_0.5.3       
##  [82] evaluate_0.14         fastmap_1.1.0         yaml_2.2.1           
##  [85] ragg_1.2.1            goftest_1.2-2         knitr_1.37           
##  [88] fs_1.5.2              fitdistrplus_1.1-6    RANN_2.6.1           
##  [91] pbapply_1.5-0         future_1.24.0         nlme_3.1-152         
##  [94] mime_0.11             xml2_1.3.3            compiler_4.0.5       
##  [97] rstudioapi_0.13       plotly_4.10.0         png_0.1-7            
## [100] spatstat.utils_2.2-0  reprex_2.0.1          bslib_0.3.1          
## [103] stringi_1.7.4         highr_0.9             RSpectra_0.16-0      
## [106] desc_1.4.0            lattice_0.20-44       Matrix_1.3-4         
## [109] vctrs_0.3.8           pillar_1.7.0          lifecycle_1.0.1      
## [112] spatstat.geom_2.2-0   lmtest_0.9-38         jquerylib_0.1.4      
## [115] RcppAnnoy_0.0.18      data.table_1.14.0     cowplot_1.1.1        
## [118] irlba_2.3.3           httpuv_1.6.1          patchwork_1.1.1      
## [121] R6_2.5.1              promises_1.2.0.1      KernSmooth_2.23-20   
## [124] gridExtra_2.3         IRanges_2.24.1        parallelly_1.30.0    
## [127] codetools_0.2-18      MASS_7.3-54           assertthat_0.2.1     
## [130] rprojroot_2.0.2       withr_2.4.3           sctransform_0.3.3    
## [133] S4Vectors_0.28.1      mgcv_1.8-36           parallel_4.0.5       
## [136] hms_1.1.1             grid_4.0.5            rpart_4.1-15         
## [139] rmarkdown_2.11        Rtsne_0.15            shiny_1.7.1          
## [142] lubridate_1.8.0