vignettes/vgm_overview.Rmd
The VGM is the central data object of the current iteration of Platypus and is produced by the VDJ_GEX_matrix function. The main idea is to provide a single object that contains all relevant immune information and can be supplied to all other functions in the package. An additional benefit is that for custom data formats (e.g. pre-existing Seurat objects or non-10x single-cell repertoire information), the user can adapt the necessary column names and still use the downstream functions of Platypus.
For downstream examples of how the VGM interacts with other Platypus functions, please refer to the Platypus Quickstart vignette. This vignette describes how the VGM is created from the output of 10x Genomics cellranger and how this process can be adjusted through the various function arguments.
As the function processes both GEX and VDJ data, this vignette is divided into the following sections:
2. General settings
3.1. Gene expression (GEX)
3.2. Feature Barcodes (FB)
4. Immune receptor repertoire (VDJ)
Most examples use the yermanos2021b dataset from PlatypusDB, which contains B and T cells (GEX + VDJ) and is also featured in the quickstart vignette because of its low number of cells. For more information, please refer to the corresponding publication: https://doi.org/10.1098/rspb.2020.2793
The VGM function accepts three different input formats, which are covered in the respective sections:
1. Local paths to cellranger output files (covered below).
2. Data.in list input from either PlatypusDB_fetch() or PlatypusDB_load_from_disk() (covered in the PlatypusDB vignette). This is for the case where a user would like to download raw PlatypusDB datasets and integrate them with local data.
3. A processed Seurat object as Seurat.in (covered in section 3.1.4). This is for the case where a user would like to use an existing Seurat object, which may be desired when using custom normalization and integration methods for GEX data.
Local paths should be provided to the cellranger directories as follows:
For GEX: the path corresponds to the “outs” folder from the cellranger count function. Under default parameters, the directory supplied as input to the VGM function should contain the filtered_feature_bc_matrix folder from 10x cellranger count.
For VDJ: the path corresponds to the “outs” folder from the cellranger vdj function. Under default parameters, the directory supplied as input to the VGM function should contain files such as clonotypes.csv from 10x cellranger vdj.
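Before calling the function, it can help to confirm that each supplied directory actually holds the expected cellranger output. The following is a minimal sketch in base R. The helper name check_cellranger_dirs and the mock directories are hypothetical and not part of Platypus; the marker file clonotypes.csv is taken from the description above.

```r
# Hypothetical helper (not part of Platypus): returns TRUE for every
# directory that contains the given marker file or folder.
check_cellranger_dirs <- function(dirs, marker) {
  vapply(dirs, function(d) file.exists(file.path(d, marker)), logical(1))
}

# Self-contained demonstration with two mock directories in tempdir()
base <- file.path(tempdir(), "vgm_path_check")
good <- file.path(base, "sample1")
bad  <- file.path(base, "sample2")
dir.create(good, recursive = TRUE, showWarnings = FALSE)
dir.create(bad, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(good, "clonotypes.csv")) # only sample1 gets the file

VDJ.dirs <- list(good, bad)
ok <- check_cellranger_dirs(VDJ.dirs, marker = "clonotypes.csv")
print(ok)
```

With real data, the same call could be applied to the list of local paths before it is passed on, so that a mistyped path is caught early.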
Below is an example of a basic run. The user will need to change the input directory to their own local output files from cellranger.
#Creating a list with local paths to cellranger directories
VDJ.out.directory.list <-
list("C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.3m.Bcell.S1", "C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.12m.Bcell.S2")
GEX.out.directory.list <-
list("C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.3m.Bcell.S1", "C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.12m.Bcell.S2")
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
                      GEX.out.directory.list = GEX.out.directory.list,
                      verbose = TRUE) #For more detailed runtime messages
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
The function can also be run with just GEX or VDJ input. The output object will have the same format as if both GEX and VDJ folders were supplied, which ensures compatibility with all downstream functions. If only one of the two is needed, provide only the corresponding input.
#Only VDJ run
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list)
#Only GEX run
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list)
Regardless of the input, the output format stays the same. This ensures compatibility with downstream Platypus functions as well as custom functions.
names(vgm)
## [1] "VDJ" "GEX" "VDJ.GEX.stats" "Running params"
## [5] "sessionInfo"
The VGM output is a list of 5 elements as seen above. Certain downstream functions may add additional list elements but will always maintain the first 5.
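Because downstream functions may append elements, accessing list elements by name can be more robust than relying on position. The sketch below uses a mock list that only mirrors the five top-level names printed above; its contents are placeholders and not real Platypus output.

```r
# Mock object with the same top-level names as printed by names(vgm);
# the contents are placeholders, not real Platypus output.
vgm_mock <- list(
  "VDJ" = data.frame(barcode = c("s1_AAACGGGGTTTAGGAA", "s1_AAGGCAGTCTCTGCTG")),
  "GEX" = "placeholder for the Seurat object",
  "VDJ.GEX.stats" = data.frame(),
  "Running params" = c(VDJ.combine = "TRUE"),
  "sessionInfo" = "placeholder"
)

# Access by position and by name return the same element
vdj_by_index <- vgm_mock[[1]]
vdj_by_name  <- vgm_mock[["VDJ"]]
print(identical(vdj_by_index, vdj_by_name))

# Names containing a space need [[ ]] (or backticks with $)
params <- vgm_mock[["Running params"]]
print(params)
```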
VDJ is a data.frame with a standard set of columns:
head(vgm[[1]])
## barcode sample_id group_id clonotype_id_10x clonotype_id
## 1 s1_AAACGGGGTTTAGGAA s1 1 clonotype7 clonotype7
## 2 s1_AAGGCAGTCTCTGCTG s1 1 clonotype58 clonotype58
## 3 s1_ACAGCTACAGTCGTGC s1 1 clonotype65 clonotype65
## 4 s1_ACTTGTTGTACTTAGC s1 1 clonotype43 clonotype43
## 5 s1_AGAGCTTGTCATGCAT s1 1 clonotype33 clonotype33
## 6 s1_AGAGTGGAGGAGTCTG s1 1 clonotype49 clonotype49
## clonotype_frequency celltype Nr_of_VDJ_chains Nr_of_VJ_chains
## 1 1 B cell 1 1
## 2 1 B cell 1 1
## 3 1 B cell 1 1
## 4 1 B cell 1 1
## 5 1 B cell 1 1
## 6 1 B cell 1 1
## VDJ_cdr3s_aa VJ_cdr3s_aa
## 1 CARIGYAMDYW CQQGNTLPPTF
## 2 CARDFTTVVARGYFDVW CQQDYSSPWTF
## 3 CARGITTVVAYYYAMDYW CLQYDNLYTF
## 4 CTTGPYDYYAMDYW CQQHYSTPYTF
## 5 CARPHDYDGVDYW CSQSTHVPPWTF
## 6 CARRYYSNYAWFAYW CQQWSSYPYTF
## VDJ_cdr3s_nt
## 1 TGTGCTCGAATAGGATATGCTATGGACTACTGG
## 2 TGTGCAAGAGACTTTACTACGGTAGTAGCCCGGGGGTACTTCGATGTCTGG
## 3 TGTGCAAGAGGGATTACTACGGTAGTAGCTTATTACTATGCTATGGACTACTGG
## 4 TGTACTACGGGCCCTTATGATTACTATGCTATGGACTACTGG
## 5 TGTGCAAGGCCCCATGATTACGACGGAGTTGACTACTGG
## 6 TGTGCAAGACGCTACTATAGTAACTACGCCTGGTTTGCTTACTGG
## VJ_cdr3s_nt VDJ_umis VJ_umis
## 1 TGCCAACAGGGTAATACGCTTCCTCCGACGTTC 2 8
## 2 TGTCAGCAGGATTATAGCTCTCCGTGGACGTTC 1 11
## 3 TGTCTACAGTATGATAATCTGTACACGTTC 51 163
## 4 TGTCAGCAACATTATAGCACTCCGTACACGTTC 2 9
## 5 TGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTC 20 58
## 6 TGCCAGCAGTGGAGTAGTTACCCGTACACGTTC 7 8
## VDJ_chain_contig VJ_chain_contig VDJ_chain VJ_chain
## 1 AAACGGGGTTTAGGAA-1_contig_2 AAACGGGGTTTAGGAA-1_contig_1 IGH IGK
## 2 AAGGCAGTCTCTGCTG-1_contig_2 AAGGCAGTCTCTGCTG-1_contig_1 IGH IGK
## 3 ACAGCTACAGTCGTGC-1_contig_1 ACAGCTACAGTCGTGC-1_contig_2 IGH IGK
## 4 ACTTGTTGTACTTAGC-1_contig_1 ACTTGTTGTACTTAGC-1_contig_2 IGH IGK
## 5 AGAGCTTGTCATGCAT-1_contig_2 AGAGCTTGTCATGCAT-1_contig_1 IGH IGK
## 6 AGAGTGGAGGAGTCTG-1_contig_2 AGAGTGGAGGAGTCTG-1_contig_1 IGH IGK
## VDJ_vgene VJ_vgene VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene
## 1 IGHV8-8 IGKV10-96 IGHJ4 IGKJ1 IGHM IGKC
## 2 IGHV1-9 IGKV6-32 IGHJ1 IGKJ1 IGHD IGKC
## 3 IGHV1-26 IGKV19-93 IGHJ4 IGKJ2 IGHD IGKC
## 4 IGHV14-1 IGKV8-24 IGHJ4 IGKJ2 IGHD IGKC
## 5 IGHV5-17 IGKV1-110 IGHJ2 IGKJ1 IGHM IGKC
## 6 IGHV1-9 IGKV4-55 IGHD2-5 IGHJ3 IGKJ2 IGHM IGKC
## VDJ_sequence_nt_raw
## 1 TGGGAAGTGTGCAGCCATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## 2 TTGGGGAGTGTCCTCTCCAAAGTCCTTGAACATAGACTCTAACCATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGAGACTTTACTACGGTAGTAGCCCGGGGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 3 GGGAACATATGTACAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGGGATTACTACGGTAGTAGCTTATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 4 GGGACTCAACTTCCTTCTTCTCCAGCCAGAATGTCCTTATGTAAGAAAGATCCTGTATGCAAATCATGTGAGACTGTGATGATTAATATAGGGATATCCACACCAAACATCATATGAGCCCTGTCTTCTCTACAGCCACTGAATCTCAAGATCCTTACAATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGATACTGAATATGCCCCGAAGTTCCAGGGCAAGGCCACTATGACTGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTACGGGCCCTTATGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTC
## 5 TGGATTCCCAGGTCCTCACATTCAGTGATCAGCACTGAACACAGACCACTCACCATGGACTCCAGGCTCAATTTAGTTTTCCTTGTCCTTATTTTAAAAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTGAAGCCTGGAGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTGACTATGGAATGCACTGGGTTCGTCAGGCTCCAGAGAAGGGGCTGGAGTGGGTTGCATACATTAGTAGTGGCAGTAGTACCATCTACTATGCAGACACAGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACACCCTGTTCCTGCAAATGACCAGTCTGAGGTCTGAGGACACGGCCATGTATTACTGTGCAAGGCCCCATGATTACGACGGAGTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## 6 TGGGGAGCATATGATCAGTGTCCTCTCCAAAGTCCTTGAACATAGACTCTAACCATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACGCTACTATAGTAACTACGCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## VJ_sequence_nt_raw
## 1 TTGGGGCATTGTAATTGAAGTCAAGACTCAGCCTGGACATGATGTCCTCTGCTCAGTTCCTTGGTCTCCTGTTGCTCTGTTTTCAAGGTACCAGATGTGATATCCAGATGACACAGACTACATCCTCCCTGTCTGCCTCTCTGGGAGACAGAGTCACCATCAGTTGCAGGGCAAGTCAGGACATTAGCAATTATTTAAACTGGTATCAGCAGAAACCAGATGGAACTGTTAAACTCCTGATCTACTACACATCAAGATTACACTCAGGAGTCCCATCAAGGTTCAGTGGCAGTGGGTCTGGAACAGATTATTCTCTCACCATTAGCAACCTGGAGCAAGAAGATATTGCCACTTACTTTTGCCAACAGGGTAATACGCTTCCTCCGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 2 GGCAGGCAAGGGCATCAAGATGAAGTCACAGACCCAGGTCTTCGTATTTCTACTGCTCTGTGTGTCTGGTGCTCATGGGAGTATTGTGATGACCCAGACTCCCAAATTCCTGCTTGTATCAGCAGGAGACAGGGTTACCATAACCTGCAAGGCCAGTCAGAGTGTGAGTAATGATGTAGCTTGGTACCAACAGAAGCCAGGGCAGTCTCCTAAACTGCTGATATACTATGCATCCAATCGCTACACTGGAGTCCCTGATCGCTTCACTGGCAGTGGATATGGGACGGATTTCACTTTCACCATCAGCACTGTGCAGGCTGAAGACCTGGCAGTTTATTTCTGTCAGCAGGATTATAGCTCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 3 TTATTTGGGGAGTCATTCTTGGTCAGGAGACGTTGTAGAAATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 4 TTCCGATCTACTTGTTGTACTTAGCCTGTGTTCACATTCTTATTTGGGGAGTGTTGCTGGTGTCCAGGCATGATGAGCATCAGACAGGCTGGGCAGCAAGATGGAATCACAGACCCAGGTCCTCATGTTTCTTCTGCTCTGGGTATCTGGTGCCTGTGCAGACATTGTGATGACACAGTCTCCATCCTCCCTGGCTATGTCAGTAGGACAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGCCTTTTAAATAGTAGCAATCAAAAGAACTATTTGGCCTGGTACCAGCAGAAACCAGGACAGTCTCCTAAACTTCTGGTATACTTTGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCATAGGCAGTGGATCTGGGACAGATTTCACTCTTACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGATTACTTCTGTCAGCAACATTATAGCACTCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 5 AGAGCTTGTCATGCATTTTGGGCCTATTTCTTTTTTGGGGACTGATCAGTCTCCTCAGGCTGTCTCCTCAGGTTGCCTCCTCAAAATGAAGTTGCCTGTTAGGCTGTTGGTGCTGATGTTCTGGATTCCTGCTTCCAGCAGTGATGTTGTGATGACCCAAACTCCACTCTCCCTGCCTGTCAGTCTTGGAGATCAAGCCTCCATCTCTTGCAGATCTAGTCAGAGCCTTGTACACAGTAATGGAAACACCTATTTACATTGGTACCTGCAGAAGCCAGGCCAGTCTCCAAAGCTCCTGATCTACAAAGTTTCCAACCGATTTTCTGGGGTCCCAGACAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACACTCAAGATCAGCAGAGTGGAGGCTGAGGATCTGGGAGTTTATTTCTGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## 6 TTGGGGACTTATGAGAATAGTAGTAATTAGCTAGGGACCAAAGTTCAAAGACAAAATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATACTGTCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGTACTGGTACCAGCAGAAGCCAGGATCCTCCCCCAGACTCCTGATTTATGACACATCCAACCTGGCTTCTGGAGTCCCTGTTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCCGAATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTTACCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## VDJ_sequence_nt_trimmed VJ_sequence_nt_trimmed VDJ_sequence_aa VJ_sequence_aa
## 1
## 2
## 3
## 4
## 5
## 6
## VDJ_raw_ref
## 1 ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## 2 ATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 3 ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 4 ATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGATACTGAATATGCCCCGAAGTTCCAGGGCAAGGCCACTATGACTGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTACAATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGGTAATGAAAAGGGACCTGACATGTTCCTCCTCTCAGAGTGCAAAGCCCCAGAGGAAAATGAAAAGATAAACCTGGGCTGTTTAGTAATTGGAAGTCAGCCACTGAAAATCAGCTGGGAGCCAAAGAAGTCAAGTATAGTTGAACATGTCTTCCCCTCTGAAATGAGAAATGGCAATTATACAATGGTCCTCCAGGTCACTGTGCTGGCCTCAGAACTGAACCTCAACCACACTTGCACCATAAATAAACCCAAAAGGAAAGAAAAACCTTTCAAGTTTCCTGAGTCATGGGATTCCCAGTCCTCTAAGAGAGTCACTCCAACTCTCCAAGCAAAGAATCACTCCACAGAAGCCACCAAAGCTATTACCACCAAAAAGGACATAGAAGGGGCCATGGCACCCAGCAACCTCACTGTGAACATCCTGACCACATCCACCCATCCTGAGATGTCATCTTGGCTCCTGTGTGAAGTATCTGGCTTCTTCCCCGAAAATATCCACCTCATGTGGCTGAGTGTCCACAGTAAAATGAAGTCTACAAACTTTGTCACTGCAAACCCCACCCCCCAGCCTGGGGGCACATTCCAGACCTGGAGTGTCCTGAGACTACCAGTCGCTCTGAGCTCATCACTTGACACTTACACATGTGTGGTGGAACATGAGGCCTCAAAGACAAAGCTTAATGCCAGCAAGAGCCTAGCAATTAGTGGATGCTACCACCTCCTGCCTGAGTCAGACGGTCCTTCCAGGAGACCTGATGGTCCTGCCCTTGCC
## 5 TGGATTCCCAGGTCCTCACATTCAGTGATCAGCACTGAACACAGACCACTCACCATGGACTCCAGGCTCAATTTAGTTTTCCTTGTCCTTATTTTAAAAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTGAAGCCTGGAGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTGACTATGGAATGCACTGGGTTCGTCAGGCTCCAGAGAAGGGGCTGGAGTGGGTTGCATACATTAGTAGTGGCAGTAGTACCATCTACTATGCAGACACAGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACACCCTGTTCCTGCAAATGACCAGTCTGAGGTCTGAGGACACGGCCATGTATTACTGTGCAAGGACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## 6 ATGGAATGGACCTGGGTCTTTCTCTTCCTCCTGTCAGTAACTGCAGGTGTCCACTCCCAGGTTCAGCTGCAGCAGTCTGGAGCTGAGCTGATGAAGCCTGGGGCCTCAGTGAAGCTTTCCTGCAAGGCTACTGGCTACACATTCACTGGCTACTGGATAGAGTGGGTAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATTGGAGAGATTTTACCTGGAAGTGGTAGTACTAACTACAATGAGAAGTTCAAGGGCAAGGCCACATTCACTGCAGATACATCCTCCAACACAGCCTACATGCAACTCAGCAGCCTGACAACTGAGGACTCTGCCATCTATTACTGTGCAAGACCTACTATAGTAACTACCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## VJ_raw_ref
## 1 ATGATGTCCTCTGCTCAGTTCCTTGGTCTCCTGTTGCTCTGTTTTCAAGGTACCAGATGTGATATCCAGATGACACAGACTACATCCTCCCTGTCTGCCTCTCTGGGAGACAGAGTCACCATCAGTTGCAGGGCAAGTCAGGACATTAGCAATTATTTAAACTGGTATCAGCAGAAACCAGATGGAACTGTTAAACTCCTGATCTACTACACATCAAGATTACACTCAGGAGTCCCATCAAGGTTCAGTGGCAGTGGGTCTGGAACAGATTATTCTCTCACCATTAGCAACCTGGAGCAAGAAGATATTGCCACTTACTTTTGCCAACAGGGTAATACGCTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 2 GGCAGGCAAGGGCATCAAGATGAAGTCACAGACCCAGGTCTTCGTATTTCTACTGCTCTGTGTGTCTGGTGCTCATGGGAGTATTGTGATGACCCAGACTCCCAAATTCCTGCTTGTATCAGCAGGAGACAGGGTTACCATAACCTGCAAGGCCAGTCAGAGTGTGAGTAATGATGTAGCTTGGTACCAACAGAAGCCAGGGCAGTCTCCTAAACTGCTGATATACTATGCATCCAATCGCTACACTGGAGTCCCTGATCGCTTCACTGGCAGTGGATATGGGACGGATTTCACTTTCACCATCAGCACTGTGCAGGCTGAAGACCTGGCAGTTTATTTCTGTCAGCAGGATTATAGCTCTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 3 ATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTTCTACCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 4 ATGGAATCACAGACCCAGGTCCTCATGTTTCTTCTGCTCTGGGTATCTGGTGCCTGTGCAGACATTGTGATGACACAGTCTCCATCCTCCCTGGCTATGTCAGTAGGACAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGCCTTTTAAATAGTAGCAATCAAAAGAACTATTTGGCCTGGTACCAGCAGAAACCAGGACAGTCTCCTAAACTTCTGGTATACTTTGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCATAGGCAGTGGATCTGGGACAGATTTCACTCTTACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGATTACTTCTGTCAGCAACATTATAGCACTCCTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 5 ATGAAGTTGCCTGTTAGGCTGTTGGTGCTGATGTTCTGGATTCCTGCTTCCAGCAGTGATGTTGTGATGACCCAAACTCCACTCTCCCTGCCTGTCAGTCTTGGAGATCAAGCCTCCATCTCTTGCAGATCTAGTCAGAGCCTTGTACACAGTAATGGAAACACCTATTTACATTGGTACCTGCAGAAGCCAGGCCAGTCTCCAAAGCTCCTGATCTACAAAGTTTCCAACCGATTTTCTGGGGTCCCAGACAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACACTCAAGATCAGCAGAGTGGAGGCTGAGGATCTGGGAGTTTATTTCTGCTCTCAAAGTACACATGTTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## 6 ATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATACTGTCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGTACTGGTACCAGCAGAAGCCAGGATCCTCCCCCAGACTCCTGATTTATGACACATCCAACCTGGCTTCTGGAGTCCCTGTTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCCGAATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTTACCCACCCATGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## VDJ_trimmed_ref VJ_trimmed_ref VDJ_raw_consensus_id
## 1 clonotype7_concat_ref_1
## 2 clonotype58_concat_ref_1
## 3 clonotype65_concat_ref_1
## 4 clonotype43_concat_ref_1
## 5 clonotype33_concat_ref_1
## 6 clonotype49_concat_ref_1
## VJ_raw_consensus_id orig_barcode specifity affinity GEX_available
## 1 clonotype7_concat_ref_2 AAACGGGGTTTAGGAA NA NA FALSE
## 2 clonotype58_concat_ref_2 AAGGCAGTCTCTGCTG NA NA TRUE
## 3 clonotype65_concat_ref_2 ACAGCTACAGTCGTGC NA NA TRUE
## 4 clonotype43_concat_ref_2 ACTTGTTGTACTTAGC NA NA TRUE
## 5 clonotype33_concat_ref_2 AGAGCTTGTCATGCAT NA NA TRUE
## 6 clonotype49_concat_ref_2 AGAGTGGAGGAGTCTG NA NA TRUE
## orig.ident seurat_clusters PC_1 PC_2 UMAP_1 UMAP_2
## 1 <NA> <NA> NA NA NA NA
## 2 SeuratProject 5 -6.257892 -2.522445 -4.813108 0.6205567
## 3 SeuratProject 0 -6.375512 -5.164284 -6.385532 1.9866081
## 4 SeuratProject 0 -7.476580 -4.585612 -7.730432 0.4828904
## 5 SeuratProject 0 -6.710098 -6.034031 -6.045865 3.3426221
## 6 SeuratProject 0 -6.667460 -2.615907 -5.995199 0.3335993
## tSNE_1 tSNE_2 batches
## 1 NA NA Unspecified
## 2 2.188638 -3.411929 Unspecified
## 3 6.511564 -7.350143 Unspecified
## 4 2.640636 -11.282818 Unspecified
## 5 9.943012 -6.504101 Unspecified
## 6 1.924303 -6.548645 Unspecified
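As an illustration of working with the VDJ table printed above, the following sketch filters for cells with exactly one VDJ and one VJ chain and then keeps those with matching GEX data. It runs on a small mock data.frame that reuses a few of the column names shown above; the barcodes and values are invented for illustration.

```r
# Mock data.frame using a subset of the VDJ column names shown above;
# the values are illustrative only.
VDJ_mock <- data.frame(
  barcode = c("s1_A", "s1_B", "s1_C", "s1_D"),
  Nr_of_VDJ_chains = c(1, 1, 2, 0),
  Nr_of_VJ_chains = c(1, 1, 1, 1),
  VDJ_cgene = c("IGHM", "IGHD", "IGHM", ""),
  GEX_available = c(FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep cells with exactly one VDJ and one VJ chain
paired <- subset(VDJ_mock, Nr_of_VDJ_chains == 1 & Nr_of_VJ_chains == 1)

# Of those, keep cells that also have gene expression data
paired_with_gex <- subset(paired, GEX_available)

# Count heavy chain constant genes among the paired cells
isotype_counts <- table(paired$VDJ_cgene)
print(isotype_counts)
```

The same subsetting pattern should carry over to vgm[[1]], since it is a regular data.frame.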
The GEX object vgm[[2]] is a Seurat object. Its metadata is stored in the meta.data slot, and the metadata column names are listed below. Depending on integration parameters (Section 2.3), the GEX object can also contain information from the immune receptor VDJ data.
## [1] "orig.ident" "nCount_RNA"
## [3] "nFeature_RNA" "orig_barcode"
## [5] "VDJ_available" "sample_id"
## [7] "group_id" "percent.mt"
## [9] "RNA_snn_res.0.5" "seurat_clusters"
## [11] "clonotype_id_10x" "clonotype_id"
## [13] "clonotype_frequency" "celltype"
## [15] "Nr_of_VDJ_chains" "Nr_of_VJ_chains"
## [17] "VDJ_cdr3s_aa" "VJ_cdr3s_aa"
## [19] "VDJ_cdr3s_nt" "VJ_cdr3s_nt"
## [21] "VDJ_umis" "VJ_umis"
## [23] "VDJ_chain_contig" "VJ_chain_contig"
## [25] "VDJ_chain" "VJ_chain"
## [27] "VDJ_vgene" "VJ_vgene"
## [29] "VDJ_dgene" "VDJ_jgene"
## [31] "VJ_jgene" "VDJ_cgene"
## [33] "VJ_cgene" "VDJ_sequence_nt_raw"
## [35] "VJ_sequence_nt_raw" "VDJ_sequence_nt_trimmed"
## [37] "VJ_sequence_nt_trimmed" "VDJ_sequence_aa"
## [39] "VJ_sequence_aa" "VDJ_raw_ref"
## [41] "VJ_raw_ref" "VDJ_trimmed_ref"
## [43] "VJ_trimmed_ref" "VDJ_raw_consensus_id"
## [45] "VJ_raw_consensus_id" "specifity"
## [47] "affinity" "PC_1"
## [49] "PC_2" "UMAP_1"
## [51] "UMAP_2" "tSNE_1"
## [53] "tSNE_2" "batches"
VDJ.GEX.stats is a table containing statistics about the processed datasets, which is useful for QC. Many of the values in this data.frame are imported from the metrics.csv tables provided by cellranger. If these tables are not available, the output will contain NA values.
The generation of this table can be turned off by setting get.VDJ.stats = FALSE.
names(vgm[[3]])
## [1] "Repertoir path"
## [2] "Sample name"
## [3] "Nr unique barcodes"
## [4] "Nr barcodes is_cell"
## [5] "Nr cells 1VDJ 1VJ"
## [6] "Nr cells 1VDJ 0VJ"
## [7] "Nr cells 0VDJ 1VJ"
## [8] "Nr cells 2 or more VDJ 1VJ"
## [9] "Nr cells 1VDJ 2 or more VJ"
## [10] "Nr cells 2 or more VDJ 2 or more VJ"
## [11] "Nr cells full_length"
## [12] "Nr cells productive"
## [13] "Nr cells high_confidence"
## [14] "Nr cells all true"
## [15] "Nr cells all true and 1VDJ 1VJ"
## [16] "Nr clonotypes"
## [17] "Nr clonotypes 1VDJ 1VJ"
## [18] "Nr clonotypes < 1VDJ 1VJ"
## [19] "Nr clonotypes > 1VDJ 1VJ"
## [20] "% Nr unique barcodes"
## [21] "% Nr barcodes is_cell"
## [22] "% Nr cells 1VDJ 1VJ"
## [23] "% Nr cells 1VDJ 0VJ"
## [24] "% Nr cells 0VDJ 1VJ"
## [25] "% Nr cells 2 or more VDJ 1VJ"
## [26] "% Nr cells 1VDJ 2 or more VJ"
## [27] "% Nr cells 2 or more VDJ 2 or more VJ"
## [28] "% Nr cells full_length"
## [29] "% Nr cells productive"
## [30] "% Nr cells high_confidence"
## [31] "% Nr cells all true"
## [32] "% Nr cells all true and 1VDJ 1VJ"
## [33] "% Nr clonotypes"
## [34] "% Nr clonotypes 1VDJ 1VJ"
## [35] "% Nr clonotypes < 1VDJ 1VJ"
## [36] "% Nr clonotypes > 1VDJ 1VJ"
## [37] "Estimated.Number.of.Cells"
## [38] "Mean.Read.Pairs.per.Cell"
## [39] "Number.of.Cells.With.Productive.V.J.Spanning.Pair"
## [40] "Number.of.Read.Pairs"
## [41] "Valid.Barcodes"
## [42] "Q30.Bases.in.Barcode"
## [43] "Q30.Bases.in.RNA.Read.1"
## [44] "Q30.Bases.in.RNA.Read.2"
## [45] "Q30.Bases.in.UMI"
## [46] "Reads.Mapped.to.Any.V.D.J.Gene"
## [47] "Reads.Mapped.to.IGH"
## [48] "Reads.Mapped.to.IGK"
## [49] "Reads.Mapped.to.IGL"
## [50] "Mean.Used.Read.Pairs.per.Cell"
## [51] "Fraction.Reads.in.Cells"
## [52] "Median.IGH.UMIs.per.Cell"
## [53] "Median.IGK.UMIs.per.Cell"
## [54] "Median.IGL.UMIs.per.Cell"
## [55] "Cells.With.Productive.V.J.Spanning.Pair"
## [56] "Cells.With.Productive.V.J.Spanning..IGK..IGH..Pair"
## [57] "Cells.With.Productive.V.J.Spanning..IGL..IGH..Pair"
## [58] "Paired.Clonotype.Diversity"
## [59] "Cells.With.IGH.Contig"
## [60] "Cells.With.IGK.Contig"
## [61] "Cells.With.IGL.Contig"
## [62] "Cells.With.CDR3.annotated.IGH.Contig"
## [63] "Cells.With.CDR3.annotated.IGK.Contig"
## [64] "Cells.With.CDR3.annotated.IGL.Contig"
## [65] "Cells.With.V.J.Spanning.IGH.Contig"
## [66] "Cells.With.V.J.Spanning.IGK.Contig"
## [67] "Cells.With.V.J.Spanning.IGL.Contig"
## [68] "Cells.With.Productive.IGH.Contig"
## [69] "Cells.With.Productive.IGK.Contig"
## [70] "Cells.With.Productive.IGL.Contig"
## [71] "rep_id"
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
GEX.out.directory.list = GEX.out.directory.list,
get.VDJ.stats = F) #Turn off VDJ stats
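As a small illustration of the QC use case, the sketch below flags columns of a stats table that are entirely NA, which is what one would expect for the Cellranger-derived columns when metrics.csv is missing. The data frame is a hand-made stand-in for vgm[[3]] with invented values; only the column names are taken from the listing above.

```r
# Toy stand-in for vgm[[3]]; all values are invented for illustration only
stats <- data.frame(
  "Sample name" = c("s1", "s2"),
  "Nr cells 1VDJ 1VJ" = c(850, 920),
  "Estimated.Number.of.Cells" = c(NA, NA),
  check.names = FALSE
)

# Columns in which no sample has a value, e.g. because metrics.csv was absent
na_cols <- names(stats)[colSums(!is.na(stats)) == 0]
na_cols
```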
The VGM also stores the parameters used during the function call, in case a user saves a VGM but deletes or overwrites the initial code… although this should not happen… right?
When the VGM function is called with default parameters, the input arguments can be found in the fourth list element.
vgm[[4]]
## sample.path.vdj
## "C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.3m.Bcell.S1 ; C:/Users/PlatypusDB/yermanos2021b__VDJ_RAW/Aged.CNS.pool.12m.Bcell.S2"
## samples.paths.GEX
## "C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.3m.Bcell.S1 ; C:/Users/PlatypusDB/yermanos2021b__GEX_RAW/Aged.CNS.pool.12m.Bcell.S2"
## FB.out.directory.list
## "none"
## GEX.read.h5
## "FALSE"
## VDJ.combine
## "TRUE"
## GEX.integrate
## "TRUE"
## integrate.GEX.to.VDJ
## "TRUE"
## integrate.VDJ.to.GEX
## "TRUE"
## exclude.GEX.not.in.VDJ
## "FALSE"
## filter.overlapping.barcodes.GEX
## "TRUE"
## filter.overlapping.barcodes.VDJ
## "TRUE"
## exclude.on.cell.state.markers
## "none"
## exclude.on.barcodes (TRUE if barcodes provided)
## "TRUE"
## get.VDJ.stats
## "TRUE"
## numcores
## "1"
## trim.and.align
## "FALSE"
## append.raw.reference
## "TRUE"
## select.excess.chains.by.umi.count
## "FALSE"
## excess.chain.confidence.count.threshold
## "1000"
## gap.opening.cost,
## "10"
## gap.extension.cost
## "4"
## parallel.processing
## "none"
## integration.method
## "scale.data"
## VDJ.gene.filter
## "TRUE"
## mito.filter
## "20"
## norm.scale.factor
## "10000"
## n.feature.rna
## "0"
## n.count.rna.min
## "0"
## n.count.rna.max
## "Inf"
## n.variable.features
## "2000"
## cluster.resolution
## "0.5"
## neighbor.dim
## "1;2;3;4;5;6;7;8;9;10"
## mds.dim
## "1;2;3;4;5;6;7;8;9;10"
## subsample.barcodes
## "FALSE"
## group.id
## "1;2"
## FB.count.threshold
## "10"
## FB.ratio.threshold
## "2"
The fifth element of the VGM contains the utils::sessionInfo() output to record the versions of R and accompanying packages used during the VGM creation.
class(vgm[[5]])
## [1] "sessionInfo"
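The stored parameters are printed as strings, with vector arguments collapsed by ";". The sketch below shows one way to convert such entries back into usable R values. It assumes the fourth element behaves like a named character vector; the `params` object here is a hand-made stand-in with values copied from the printout above, not a real VGM.

```r
# Hypothetical stand-in for vgm[[4]], assumed to be a named character vector
params <- c(
  mito.filter = "20",
  neighbor.dim = "1;2;3;4;5;6;7;8;9;10",
  integration.method = "scale.data"
)

# Scalar parameters can be converted directly
mito <- as.numeric(params[["mito.filter"]])

# Vector parameters are split on ";" first
neighbor_dim <- as.integer(strsplit(params[["neighbor.dim"]], ";", fixed = TRUE)[[1]])
```

The recovered values could then be passed to a new VDJ_GEX_matrix call to reproduce an earlier run.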
A key feature of Platypus is the direct pairing of VDJ and GEX data.
This is currently achieved by VGM combining the relevant data and
metadata from VDJ (vgm[[1]]) and GEX (vgm[[2]]) objects using the cell
barcode and sample_id information.
Several parameters control this integration:
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
GEX.out.directory.list = GEX.out.directory.list,
VDJ.combine = T, #Whether to combine all samples into one VDJ dataframe (is highly recommended)
GEX.integrate = T, #Whether to integrate all GEX samples. For integration methods see GEX section.
integrate.GEX.to.VDJ = T, #Whether to copy GEX metadata into the VDJ dataframe
integrate.VDJ.to.GEX = T, #Whether to copy VDJ data into GEX
exclude.GEX.not.in.VDJ = F) #Whether to exclude cells in GEX, for which no VDJ data is available. Set this to TRUE if you only want gene expression information for those cells with immune receptor sequences.
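Conceptually, this pairing is a join on the cell barcode. The base R sketch below mimics the idea on two toy tables; the barcodes and cluster labels are invented, and the real function operates on the VDJ dataframe and the Seurat object rather than on plain data frames.

```r
# Toy VDJ table and toy GEX metadata (all values invented for illustration)
vdj <- data.frame(
  barcode = c("s1_AAAC", "s1_CCCG", "s2_GGGT"),
  VDJ_cdr3s_aa = c("CARRNHPYYFDYW", "CARWFAWFAYW", "CALYGSSYDYW"),
  stringsAsFactors = FALSE
)
gex_meta <- data.frame(
  barcode = c("s1_AAAC", "s2_GGGT", "s2_TTTA"),
  seurat_clusters = c("0", "1", "1"),
  stringsAsFactors = FALSE
)

# Analogous to integrate.GEX.to.VDJ: keep every VDJ cell, add GEX metadata where available
vdj_with_gex <- merge(vdj, gex_meta, by = "barcode", all.x = TRUE)

# Analogous to exclude.GEX.not.in.VDJ = T: keep only GEX cells that have VDJ data
gex_in_vdj <- gex_meta[gex_meta$barcode %in% vdj$barcode, ]
```

Cells present in only one modality end up with NA in the columns copied from the other one.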
In some cases, cell barcodes may not be unique across samples. This may occur by pure chance, through barcode hopping during library construction and sequencing, or due to low diversity of barcodes during capture. The VGM deals with this in two ways. First, a sample-id prefix is prepended to every barcode.
vgm[[1]]$barcode[1]
## [1] "s1_AAACGGGGTTTAGGAA"
colnames(vgm[[2]])[1]
## [1] "s1_AAAGATGAGTCCGGTC"
Second, duplicated barcodes can be filtered out to prevent the emergence of unlikely public clones. This is necessary, for example, if a public clone is discovered in two distinct VDJ samples and its cells share the exact same barcode. Public clones containing identical cell barcodes are highly unlikely given the massive potential barcode space. If this filtering is set to TRUE, the function reports the number of filtered cells.
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
GEX.out.directory.list = GEX.out.directory.list,
filter.overlapping.barcodes.GEX = T,
filter.overlapping.barcodes.VDJ = T)
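The two steps can be sketched in base R as follows. This is only an illustration of the idea, assuming barcodes are unique within a sample and that every copy of a shared barcode is dropped; the internal implementation and order of operations in VDJ_GEX_matrix may differ.

```r
# Raw barcodes per sample (toy input); one barcode occurs in both samples
raw <- list(
  s1 = c("AAACGGGGTTTAGGAA", "CCGTACTGTCCGACGT"),
  s2 = c("CCGTACTGTCCGACGT", "GTAGTCACAGTAAGAT")
)

# Barcodes that occur in more than one sample
all_bc <- unlist(raw, use.names = FALSE)
shared <- unique(all_bc[duplicated(all_bc)])

# Drop every copy of a shared barcode, then prepend the sample id
filtered <- lapply(raw, function(bc) bc[!bc %in% shared])
prefixed <- unlist(
  Map(function(id, bc) paste0(id, "_", bc), names(filtered), filtered),
  use.names = FALSE
)
prefixed
```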
The VGM function attempts to simplify the standard GEX processing and integration functions common to Seurat and Harmony packages, given the majority of immunological studies use this pipeline. Although we are not making the statement that all gene expression datasets should be processed using identical parameters, we find that this function simplifies the standard copy-paste from the Seurat website.
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
VDJ.gene.filter = T, #Remove all VDJ genes from GEX before clustering.
mito.filter = 20, #Remove all cells with a higher % of reads mapped to mitochondrial genes than this. Data gathered via Seurat::PercentageFeatureSet(., pattern = "^MT-")
n.count.rna.min = 0, #Remove all cells with a total RNA count below this
n.count.rna.max = Inf, #Remove all cells with a total RNA count above this
n.feature.rna = 0) #Remove all cells with a gene count lower than this
The default settings are meant to be inclusive, so as not to impose filtering on the dataset that is not directly apparent to the user.
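To make the effect of these thresholds concrete, the sketch below applies them to a toy per-cell QC table. The function `filter_cells` and the table are hypothetical; the column names follow common Seurat conventions (percent.mt, nCount_RNA, nFeature_RNA), and whether the boundaries are inclusive is an assumption.

```r
# Toy per-cell QC metrics (invented values)
cells <- data.frame(
  barcode = paste0("cell", 1:4),
  percent.mt = c(2, 35, 8, 12),
  nCount_RNA = c(5000, 3000, 150, 80000),
  nFeature_RNA = c(1800, 900, 90, 6000)
)

# Hypothetical helper mirroring the VGM filter arguments and their defaults
filter_cells <- function(cells, mito.filter = 20, n.count.rna.min = 0,
                         n.count.rna.max = Inf, n.feature.rna = 0) {
  keep <- cells$percent.mt <= mito.filter &
    cells$nCount_RNA >= n.count.rna.min &
    cells$nCount_RNA <= n.count.rna.max &
    cells$nFeature_RNA >= n.feature.rna
  cells[keep, ]
}

inclusive <- filter_cells(cells)  # defaults: only the mitochondrial filter acts
strict <- filter_cells(cells, n.count.rna.min = 500,
                       n.count.rna.max = 50000, n.feature.rna = 200)
```

With the defaults only the cell with 35% mitochondrial reads is removed, whereas the stricter settings also remove the cells with very low or very high RNA counts.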
The VGM also offers the option to filter cells based on their gene expression profiles. Removing unwanted cells before clustering can result in more accurate conclusions concerning the cell types of interest. For example, it is likely better to filter out non B and T cells if integrating and analyzing repertoire features such as clonal expansion.
The input format for the exclude.on.cell.state.markers argument is the same as for the GEX_phenotype function. In the example below, we filter out all CD14 positive cells as well as all CD3 epsilon and gamma double negative cells.
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
exclude.on.cell.state.markers = c("CD14+", "CD3E-;CD3G-")) #Remove all CD14 positive cells and all CD3E/CD3G double negative cells
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -2.091
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 0.49627
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 2.8445e-015
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 0.090619
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 118
## Number of edges: 3533
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6521
## Number of communities: 3
## Elapsed time: 0 seconds
#Plotting this confirms the filtering
Seurat::FeaturePlot(vgm[[2]], c("CD14","CD3E","CD3G"))
## Warning in Seurat::FeaturePlot(vgm[[2]], c("CD14", "CD3E", "CD3G")): All cells
## have the same value (0) of CD14.
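The marker syntax used above can be read as follows: each string is one exclusion rule, ";" combines several genes within a rule, and a trailing "+" or "-" marks a gene as expressed or not expressed. The sketch below is a hypothetical re-implementation of that logic on a toy count matrix, assuming "positive" simply means a count above zero; it is not the code used inside the package.

```r
# Toy expression matrix: genes in rows, cells in columns (invented counts)
expr <- matrix(
  c(3, 0, 0, 0,
    0, 2, 0, 0,
    0, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD14", "CD3E", "CD3G"), paste0("c", 1:4))
)

# Split a rule such as "CD3E-;CD3G-" into gene names and signs
parse_marker_rule <- function(rule) {
  parts <- strsplit(rule, ";", fixed = TRUE)[[1]]
  data.frame(gene = sub("[+-]$", "", parts),
             positive = grepl("\\+$", parts),
             stringsAsFactors = FALSE)
}

# TRUE for every cell that fulfils all conditions of one rule
matches_rule <- function(expr, rule) {
  p <- parse_marker_rule(rule)
  per_gene <- lapply(seq_len(nrow(p)), function(i) {
    expressed <- expr[p$gene[i], ] > 0
    if (p$positive[i]) expressed else !expressed
  })
  Reduce(`&`, per_gene)
}

# A cell is excluded if it matches any of the rules
rules <- c("CD14+", "CD3E-;CD3G-")
exclude <- Reduce(`|`, lapply(rules, matches_rule, expr = expr))
kept <- colnames(expr)[!exclude]
kept
```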
As a workflow, we first recommend clustering all cells and running a differential gene expression analysis by cluster using the GEX_cluster_genes() function, which reveals the cluster-defining genes of each cluster. If unwanted cells are present (e.g. a contamination of neutrophils in a B cell dataset), the cluster signatures can be used to identify the best genes for filtering via the initial VGM call.
The VGM offers four methods of GEX dataset integration if GEX.integrate is set to TRUE:
1. “scale.data” integration is based on Seurat's LogNormalize normalisation followed by the ScaleData function applied to the identified variable features (their number is set using n.variable.features)
2. “anchors” is an integration method which uses similar cell states to align datasets. Extensive documentation is provided here: https://satijalab.org/seurat/articles/integration_introduction.html
3. “sct” employs the SCTransform function from Seurat
4. “harmony” uses the harmony R package. This may be better for larger datasets given the runtime and required memory of anchors and sct.
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
GEX.integrate = T,
integration.method = "scale.data") #Default
BiocManager::install("harmony")
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
GEX.integrate = T,
integration.method = "harmony")
Independently of the integration method, GEX data is scaled, and variable features and PCA dimensions are used to calculate low-dimensional embeddings. The following parameters control these steps:
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
GEX.integrate = T,
integration.method = "scale.data", #Default
norm.scale.factor = 10000, #passed to Seurat::NormalizeData
n.variable.features = 2000, #Passed to Seurat::FindVariableFeatures
cluster.resolution = 0.5, #Passed to Seurat::FindClusters
neighbor.dim = c(1:10), #Passed to Seurat::FindNeighbors
mds.dim = c(1:10)) #Passed to Seurat::RunTSNE and Seurat::RunUMAP
The VGM offers simple and flexible GEX processing, but remains one of many options for GEX processing. We therefore made it possible to use the VDJ processing and VDJ-GEX integration capabilities of the VGM function with an already processed Seurat object.
For a Seurat object to be compatible as input, it must contain two metadata columns:
1. sample_id with sample ids from s1, s2, s3 to sn, of character or factor class. These must be in the same order as the VDJ.out.directory.list elements.
2. group_id of character class.
This input Seurat object will not be processed with respect to normalisation or dimensional embeddings. Nonetheless, the following filtering operations are still available, as shown below:
#Running the VDJ_GEX_matrix function
vgm <- VDJ_GEX_matrix(Seurat.in = preprocessed_GEX,
exclude.GEX.not.in.VDJ = F,
integrate.GEX.to.VDJ = T,
integrate.VDJ.to.GEX = T,
filter.overlapping.barcodes.GEX = T,
exclude.on.cell.state.markers = c("CD3E+"),
GEX.integrate = T)
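Before passing a custom object as Seurat.in, it can be useful to verify the two metadata requirements. The helper below is a hypothetical sketch, not part of Platypus, that checks a metadata data frame such as the one stored in the meta.data slot of a Seurat object.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_seurat_metadata <- function(meta, n.samples) {
  problems <- character(0)
  if (!"sample_id" %in% names(meta)) {
    problems <- c(problems, "sample_id column is missing")
  } else {
    if (!(is.character(meta$sample_id) || is.factor(meta$sample_id))) {
      problems <- c(problems, "sample_id must be character or factor")
    }
    expected <- paste0("s", seq_len(n.samples))
    if (!all(as.character(meta$sample_id) %in% expected)) {
      problems <- c(problems, "sample_id values must be s1 to sn")
    }
  }
  if (!"group_id" %in% names(meta)) {
    problems <- c(problems, "group_id column is missing")
  } else if (!is.character(meta$group_id)) {
    problems <- c(problems, "group_id must be character")
  }
  problems
}

# Toy metadata tables (invented)
good <- data.frame(sample_id = c("s1", "s1", "s2"),
                   group_id = c("1", "1", "2"), stringsAsFactors = FALSE)
bad <- data.frame(sample_id = c("A", "B"), group_id = c(1, 2),
                  stringsAsFactors = FALSE)

check_seurat_metadata(good, n.samples = 2)
check_seurat_metadata(bad, n.samples = 2)
```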
Platypus supports feature barcode technology (also referred to as hashing barcodes), but at this time does not support CITE-seq data. (This is on the Platypus-Team To-Do list)
Feature Barcode (FB) data may be imported in two different modes, depending on the Cellranger processing procedure.
FB data processed independently of GEX data via Cellranger count will yield an output folder structure which is identical to that of GEX. These output directories can be supplied to the VGM using the FB.out.directory.list argument.
FB data can also be processed in combination with GEX using Cellranger multi and aggr. This yields a single folder structure with both GEX and FB matrices contained within the same output files. In this case, the function will determine the input type (GEX or FB) of each matrix based on the number of features. Any matrix with fewer than 100 features is regarded as FB and every matrix above that as GEX.
FB.out.directory.list <-
list("~path_to_CellrangerCount_outs_directory",
"~path_to_CellrangerCount_outs_directory")
#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
FB.out.directory.list = FB.out.directory.list)
#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list)
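The feature-count rule for combined input can be pictured with a few lines of R. The function name `classify_matrix` is hypothetical, and how a matrix with exactly 100 features is treated is an assumption.

```r
# Hypothetical helper: label a count matrix (features in rows) as FB or GEX
classify_matrix <- function(mat, max.fb.features = 100) {
  if (nrow(mat) < max.fb.features) "FB" else "GEX"
}

# Toy matrices: 6 feature barcodes vs. 2000 genes, 3 cells each
fb_like <- matrix(0, nrow = 6, ncol = 3)
gex_like <- matrix(0, nrow = 2000, ncol = 3)

classify_matrix(fb_like)
classify_matrix(gex_like)
```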
The concept of feature barcodes lies in the attribution of a certain sample or group barcode to a cell given the number of sequenced counts of that barcode. While this works well in most cases, FB data can be noisy and a subset of cells may be difficult to confidently attribute to a single barcode. The VGM assigns FBs to cells by two criteria:
FB.count.threshold determines how many counts for any FB are necessary for it to be considered. This defaults to 10. For example, if a cell has counts < 10 for all FBs, no FB will be assigned (the function returns “Not assignable”).
FB.ratio.threshold determines the minimum ratio between the most frequent and second most frequent FB for the most frequent to be confidently assigned. This defaults to 2. For clarity, we can consider the following example:
barcode | FB-1 | FB-2 | FB-3 |
---|---|---|---|
Cell 1 | 3 | 4 | 9 |
Cell 2 | 1 | 32 | 43 |
Cell 3 | 100 | 1 | 13 |
For Cell 1: 9/4 > 2, so FB-3 meets the FB.ratio.threshold. But: 9 < 10, so FB-3 does not meet the FB.count.threshold. For this cell the function returns “Not assignable”
For Cell 2: 43 and 32 > 10, so both FB-2 and FB-3 meet the FB.count.threshold. But: 43/32 < 2, so FB-3 does not meet the FB.ratio.threshold. Again the function returns “Not assignable”
For Cell 3: 100 > 10 and 100/13 > 2, so FB-1 meets both criteria. The function returns “FB-1”
Tweaking these parameters can help to make barcode assignments more inclusive, but also more susceptible to false assignments.
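The two criteria can be written down as a short function and applied to the example table. This is a minimal sketch of the described logic, not the package code: `assign_fb` is a hypothetical name, and treating both thresholds as inclusive (>=) is an assumption.

```r
# Counts from the example table above
counts <- matrix(
  c(3, 4, 9,
    1, 32, 43,
    100, 1, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Cell 1", "Cell 2", "Cell 3"), c("FB-1", "FB-2", "FB-3"))
)

# Hypothetical helper; assumes at least two FB columns
assign_fb <- function(counts, FB.count.threshold = 10, FB.ratio.threshold = 2) {
  apply(counts, 1, function(x) {
    ord <- order(x, decreasing = TRUE)
    top <- x[ord[1]]
    second <- x[ord[2]]
    # The top FB needs enough counts and must clearly dominate the runner-up
    if (top >= FB.count.threshold && top / second >= FB.ratio.threshold) {
      names(x)[ord[1]]
    } else {
      "Not assignable"
    }
  })
}

assign_fb(counts)
# More permissive settings assign more cells, at a higher risk of false calls
assign_fb(counts, FB.count.threshold = 5, FB.ratio.threshold = 1.3)
```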
As a QC, we recommend verifying that variability in FB coverage across libraries and samples is consistent and that FB assignments match with expected numbers from e.g. pre-sorting by FACS.
#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
FB.out.directory.list = FB.out.directory.list,
FB.count.threshold = 10,
FB.ratio.threshold = 2)
#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list)
In many cases, hashing barcodes are combined with CITE-seq or other surface barcodes. For processing hashing barcodes, all other barcodes need to be filtered out. For this, the VGM allows excluding feature barcodes by name using a regular expression. In the example below, we filter out all FBs that have “CITE” or “TetTCR” in their name.
#Running with separate FB input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.out.directory.list,
FB.out.directory.list = FB.out.directory.list,
FB.exclude.pattern = "(CITE|TetTCR)")
#Running with FB GEX combined input
vgm <- VDJ_GEX_matrix(GEX.out.directory.list = GEX.cellranger.aggr.out.directory.list)
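The effect of such a pattern can be previewed with base R before running the VGM. The feature barcode names below are invented for illustration; the pattern is the one used above.

```r
# Hypothetical feature barcode names
fb_names <- c("Hashtag1", "Hashtag2", "CITE-CD4", "TetTCR-gp33")

# Names matching the pattern would be excluded from FB assignment
excluded <- grepl("(CITE|TetTCR)", fb_names)
kept <- fb_names[!excluded]
kept
```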
The VGM reformats and merges several dataframes from the Cellranger output and can additionally return aligned and trimmed receptor sequences.
#Basic run with VDJ only
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list)
Due to stochastic sampling, inter-cellular mRNA and biological peculiarities, a single cell barcode may be associated with one, two or more TCR or BCR contigs. A classical cell contains 1 VDJ and 1 VJ chain. Cells with only one chain are frequent, while cells with more than 2 chains are rare.
The VGM function and format are fully compatible with any combination of chains, without the need for cell filtering. Fields attributed to a missing chain will contain an empty string (““). In fields which contain information about 2 chains (e.g. VDJ_cdr3s_aa in a cell with 2 VDJ chains), the different chains are separated by “;”.
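Semicolon-separated fields like these can be unpacked with strsplit. The sketch below works on a toy table that only mimics the VDJ dataframe layout; the barcodes are placeholders and the VJ sequence of the first cell is invented.

```r
# Toy table mimicking the layout of the VDJ dataframe
vdj <- data.frame(
  barcode = c("cellA", "cellB", "cellC"),
  VDJ_cdr3s_aa = c("CARDSSGWFAYW", "CTGDYAMDYW;CTLITTVVAKDAMDYW", "CARRNHPYYFDYW"),
  VJ_cdr3s_aa = c("CQQYNSYPLTF", "CALWYSTHYVF;CWQGTHFPQTF", ""),
  stringsAsFactors = FALSE
)

# Split each field on ";"; an empty string yields zero chains
vdj_chains <- strsplit(vdj$VDJ_cdr3s_aa, ";", fixed = TRUE)
vj_chains <- strsplit(vdj$VJ_cdr3s_aa, ";", fixed = TRUE)
n_vdj <- lengths(vdj_chains)
n_vj <- lengths(vj_chains)

# Keep only cells with exactly one VDJ and one VJ chain
paired <- vdj[n_vdj == 1 & n_vj == 1, ]
paired$barcode
```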
For filtering purposes, two numeric columns containing the number of chains are included (Nr_of_VDJ_chains and Nr_of_VJ_chains):
## barcode sample_id group_id clonotype_id_10x
## s1_CCGTACTGTCCGACGT s1_CCGTACTGTCCGACGT s1 1 clonotype18
## s1_CGAGCCACAATGTAAG s1_CGAGCCACAATGTAAG s1 1 clonotype44
## s1_GAAACTCAGGAATCGC s1_GAAACTCAGGAATCGC s1 1 clonotype35
## s1_GTACTTTAGAGGGATA s1_GTACTTTAGAGGGATA s1 1 clonotype11
## s1_GTAGGCCGTCCCGACA s1_GTAGGCCGTCCCGACA s1 1 clonotype17
## s1_GTAGTCACAGTAAGAT s1_GTAGTCACAGTAAGAT s1 1 clonotype20
## clonotype_id clonotype_frequency celltype Nr_of_VDJ_chains
## s1_CCGTACTGTCCGACGT clonotype18 1 B cell 1
## s1_CGAGCCACAATGTAAG clonotype44 1 B cell 1
## s1_GAAACTCAGGAATCGC clonotype35 1 B cell 1
## s1_GTACTTTAGAGGGATA clonotype11 1 B cell 1
## s1_GTAGGCCGTCCCGACA clonotype17 1 B cell 1
## s1_GTAGTCACAGTAAGAT clonotype20 1 B cell 1
## Nr_of_VJ_chains VDJ_cdr3s_aa VJ_cdr3s_aa
## s1_CCGTACTGTCCGACGT 0 CARRNHPYYFDYW
## s1_CGAGCCACAATGTAAG 0 CARETAQVPYYFDYW
## s1_GAAACTCAGGAATCGC 0 CAIGHYYGSSSDVW
## s1_GTACTTTAGAGGGATA 0 CALYGSSYDYW
## s1_GTAGGCCGTCCCGACA 0 CVNGIYYYFDYW
## s1_GTAGTCACAGTAAGAT 0 CARDSSGWFAYW
## VDJ_cdr3s_nt VJ_cdr3s_nt
## s1_CCGTACTGTCCGACGT TGTGCAAGACGGAACCACCCCTACTACTTTGACTACTGG
## s1_CGAGCCACAATGTAAG TGTGCAAGAGAGACAGCTCAGGTTCCGTACTACTTTGACTACTGG
## s1_GAAACTCAGGAATCGC TGTGCAATAGGGCATTACTACGGTAGTAGCTCCGATGTCTGG
## s1_GTACTTTAGAGGGATA TGTGCCCTCTACGGTAGTAGCTACGACTACTGG
## s1_GTAGGCCGTCCCGACA TGTGTAAATGGGATTTATTACTACTTTGACTACTGG
## s1_GTAGTCACAGTAAGAT TGTGCAAGAGACAGCTCAGGCTGGTTTGCTTACTGG
## VDJ_umis VJ_umis VDJ_chain_contig
## s1_CCGTACTGTCCGACGT 2 CCGTACTGTCCGACGT-1_contig_1
## s1_CGAGCCACAATGTAAG 17 CGAGCCACAATGTAAG-1_contig_1
## s1_GAAACTCAGGAATCGC 12 GAAACTCAGGAATCGC-1_contig_1
## s1_GTACTTTAGAGGGATA 17 GTACTTTAGAGGGATA-1_contig_1
## s1_GTAGGCCGTCCCGACA 13 GTAGGCCGTCCCGACA-1_contig_1
## s1_GTAGTCACAGTAAGAT 14 GTAGTCACAGTAAGAT-1_contig_1
## VJ_chain_contig VDJ_chain VJ_chain VDJ_vgene VJ_vgene
## s1_CCGTACTGTCCGACGT IGH IGHV4-1
## s1_CGAGCCACAATGTAAG IGH IGHV1-26
## s1_GAAACTCAGGAATCGC IGH IGHV1-74
## s1_GTACTTTAGAGGGATA IGH IGHV2-3
## s1_GTAGGCCGTCCCGACA IGH IGHV9-1
## s1_GTAGTCACAGTAAGAT IGH IGHV1-39
## VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene
## s1_CCGTACTGTCCGACGT IGHJ2 IGHM
## s1_CGAGCCACAATGTAAG IGHD3-2 IGHJ2 IGHM
## s1_GAAACTCAGGAATCGC IGHD1-1 IGHJ1 IGHM
## s1_GTACTTTAGAGGGATA IGHJ2 IGHM
## s1_GTAGGCCGTCCCGACA IGHJ2 IGHM
## s1_GTAGTCACAGTAAGAT IGHD3-2 IGHJ3 IGHM
## VDJ_sequence_nt_raw
## s1_CCGTACTGTCCGACGT GAAGCAAAGGGGATCAGCCCGAGATTCTCATTCAGTGATCAACACTGAACACACATCCCTTACCATGGATTTTGGGCTGATTTTTTTTATTGTTGCTCTTTTAAAAGGGGTCCAGTGTGAGGTGAAGCTTCTCCAGTCTGGAGGTGGCCTGGTGCAGCCTGGAGGATCCCTGAAACTCTCCTGTGCAGCCTCAGGAATCGATTTTAGTAGATACTGGATGAGTTGGGTTCGGCGGGCTCCAGGGAAAGGACTAGAATGGATTGGAGAAATTAATCCAGATAGCAGTACAATAAACTATGCACCATCTCTAAAGGATAAATTCATCATCTCCAGAGACAACGCCAAAAATACGCTGTACCTGCAAATGAGCAAAGTGAGATCTGAGGACACAGCCCTTTATTACTGTGCAAGACGGAACCACCCCTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CGAGCCACAATGTAAG TTGGGGACCCCTGAAAACAACATATGTACAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGAGACAGCTCAGGTTCCGTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GAAACTCAGGAATCGC GTTACTCTGGAATCGCGTCAGACGTGTTTCTTTTTTGGGGAGAAAAACATGAGATCACTGTTCTCTCTACAGTTACTGAGCACACAGGACCTCACCATGAGATGGAGCTGTATCATCCTCTTCTTGGTAGCAACAGCTACAGGTGTCCACTCCCAGGTCCAACTGCAGCAGCCTGGGGCTGAACTGGTGAAGCCTGGGGCTTCAGTGAAGGTGTCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTGGGTGAAGCAGAGGCCTGGCCAAGGCCTTGAGTGGATTGGAAGGATTCATCCTTCTGATAGTGATACTAACTACAATCAAAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTACTGTGCAATAGGGCATTACTACGGTAGTAGCTCCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTACTTTAGAGGGATA CTAAAGGGGTTCTTATCTGGGGATCCTCTTCTCATAGAGCCTCCATCAGACCATGGCTGTCCTGGCACTGCTCCTCTGCCTGGTGACATTCCCAAGCTGTGTCCTGTCCCAGGTGCAGCTGAAGGAGTCAGGACCTGGCCTGGTGGCGCCCTCACAGAGCCTGTCCATCACATGCACTGTCTCAGGGTTCTCATTAACCAGCTATGGTGTAAGCTGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGGCTGGGAGTAATATGGGGTGACGGGAGCACAAATTATCATTCAGCTCTCATATCCAGACTGAGCATCAGCAAGGATAACTCCAAGAGCCAAGTTTTCTTAAAACTGAACAGTCTGCAAACTGATGACACAGCCACGTACTACTGTGCCCTCTACGGTAGTAGCTACGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTAGGCCGTCCCGACA TTTTGGGGAAGGGAGTGACCAGTTAGTCTTAAGGCACCACTGAGCCCAAGTCTTAGACATCATGGATTGGGTGTGGACCTTGCTATTCCTGATAGCAGCTGCCCAAAGTGCCCAAGCACAGATCCAGTTGGTGCAGTCTGGACCTGAGCTGAAGAAGCCTGGAGAGACAGTCAAGATCTCCTGCAAGGCTTCTGGGTATACCTTCACAGAATATCCAATGCACTGGGTGAAGCAGGCTCCAGGAAAGGGTTTCAAGTGGATGGGCATGATATACACCGACACTGGAGAGCCAACATATGCTGAAGAGTTCAAGGGACGGTTTGCCTTCTCTTTGGAGACCTCTGCCAGCACTGCCTATTTGCAGATCAACAACCTCAAAAATGAGGACACGGCTACATATTTCTGTGTAAATGGGATTTATTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTAGTCACAGTAAGAT ACTTATTTGGGGGAAGACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTCCTCCTCTCAGGAACTGCAGGTGTCCACTCTGAGTTCCAGCTGCAGCAGTCTGGACCTGAGCTGGTGAAGCCTGGCGCTTCAGTGAAGATATCCTGCAAGGCTTCTGGTTACTCATTCACTGACTACAACATGAACTGGGTGAAGCAGAGCAATGGAAAGAGCCTTGAGTGGATTGGAGTAATTAATCCTAACTATGGTACTACTAGCTACAATCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACCAATCTTCCAGCACAGCCTACATGCAGCTCAACAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAGACAGCTCAGGCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## VJ_sequence_nt_raw VDJ_sequence_nt_trimmed
## s1_CCGTACTGTCCGACGT
## s1_CGAGCCACAATGTAAG
## s1_GAAACTCAGGAATCGC
## s1_GTACTTTAGAGGGATA
## s1_GTAGGCCGTCCCGACA
## s1_GTAGTCACAGTAAGAT
## VJ_sequence_nt_trimmed VDJ_sequence_aa VJ_sequence_aa
## s1_CCGTACTGTCCGACGT
## s1_CGAGCCACAATGTAAG
## s1_GAAACTCAGGAATCGC
## s1_GTACTTTAGAGGGATA
## s1_GTAGGCCGTCCCGACA
## s1_GTAGTCACAGTAAGAT
## VDJ_raw_ref
## s1_CCGTACTGTCCGACGT GAAGCAAAGGGGATCAGCCCGAGATTCTCATTCAGTGATCAACACTGAACACACATCCCTTACCATGGATTTTGGGCTGATTTTTTTTATTGTTGCTCTTTTAAAAGGGGTCCAGTGTGAGGTGAAGCTTCTCCAGTCTGGAGGTGGCCTGGTGCAGCCTGGAGGATCCCTGAAACTCTCCTGTGCAGCCTCAGGAATCGATTTTAGTAGATACTGGATGAGTTGGGTTCGGCGGGCTCCAGGGAAAGGACTAGAATGGATTGGAGAAATTAATCCAGATAGCAGTACAATAAACTATGCACCATCTCTAAAGGATAAATTCATCATCTCCAGAGACAACGCCAAAAATACGCTGTACCTGCAAATGAGCAAAGTGAGATCTGAGGACACAGCCCTTTATTACTGTGCAAGACCACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CGAGCCACAATGTAAG ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAAGACAGCTCAGGCTACACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GAAACTCAGGAATCGC ATGAGATGGAGCTGTATCATCCTCTTCTTGGTAGCAACAGCTACAGGTGTCCACTCCCAGGTCCAACTGCAGCAGCCTGGGGCTGAACTGGTGAAGCCTGGGGCTTCAGTGAAGGTGTCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTGGGTGAAGCAGAGGCCTGGCCAAGGCCTTGAGTGGATTGGAAGGATTCATCCTTCTGATAGTGATACTAACTACAATCAAAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTACTGTGCAATATTTATTACTACGGTAGTAGCTACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTACTTTAGAGGGATA ATGGCTGTCCTGGCACTGCTCCTCTGCCTGGTGACATTCCCAAGCTGTGTCCTGTCCCAGGTGCAGCTGAAGGAGTCAGGACCTGGCCTGGTGGCGCCCTCACAGAGCCTGTCCATCACATGCACTGTCTCAGGGTTCTCATTAACCAGCTATGGTGTAAGCTGGGTTCGCCAGCCTCCAGGAAAGGGTCTGGAGTGGCTGGGAGTAATATGGGGTGACGGGAGCACAAATTATCATTCAGCTCTCATATCCAGACTGAGCATCAGCAAGGATAACTCCAAGAGCCAAGTTTTCTTAAAACTGAACAGTCTGCAAACTGATGACACAGCCACGTACTACTGTGCCAAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTAGGCCGTCCCGACA ATGGATTGGGTGTGGACCTTGCTATTCCTGATAGCAGCTGCCCAAAGTGCCCAAGCACAGATCCAGTTGGTGCAGTCTGGACCTGAGCTGAAGAAGCCTGGAGAGACAGTCAAGATCTCCTGCAAGGCTTCTGGGTATACCTTCACAGAATATCCAATGCACTGGGTGAAGCAGGCTCCAGGAAAGGGTTTCAAGTGGATGGGCATGATATACACCGACACTGGAGAGCCAACATATGCTGAAGAGTTCAAGGGACGGTTTGCCTTCTCTTTGGAGACCTCTGCCAGCACTGCCTATTTGCAGATCAACAACCTCAAAAATGAGGACACGGCTACATATTTCTGTGTAAGAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTAGTCACAGTAAGAT ATGGGATGGAGCTGGATCTTTCTCTTCCTCCTCTCAGGAACTGCAGGTGTCCACTCTGAGTTCCAGCTGCAGCAGTCTGGACCTGAGCTGGTGAAGCCTGGCGCTTCAGTGAAGATATCCTGCAAGGCTTCTGGTTACTCATTCACTGACTACAACATGAACTGGGTGAAGCAGAGCAATGGAAAGAGCCTTGAGTGGATTGGAGTAATTAATCCTAACTATGGTACTACTAGCTACAATCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACCAATCTTCCAGCACAGCCTACATGCAGCTCAACAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGAAGACAGCTCAGGCTACCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## VJ_raw_ref VDJ_trimmed_ref VJ_trimmed_ref
## s1_CCGTACTGTCCGACGT
## s1_CGAGCCACAATGTAAG
## s1_GAAACTCAGGAATCGC
## s1_GTACTTTAGAGGGATA
## s1_GTAGGCCGTCCCGACA
## s1_GTAGTCACAGTAAGAT
## VDJ_raw_consensus_id VJ_raw_consensus_id
## s1_CCGTACTGTCCGACGT clonotype18_concat_ref_1
## s1_CGAGCCACAATGTAAG clonotype44_concat_ref_1
## s1_GAAACTCAGGAATCGC clonotype35_concat_ref_1
## s1_GTACTTTAGAGGGATA clonotype11_concat_ref_1
## s1_GTAGGCCGTCCCGACA clonotype17_concat_ref_1
## s1_GTAGTCACAGTAAGAT clonotype20_concat_ref_1
## orig_barcode specifity affinity batches
## s1_CCGTACTGTCCGACGT CCGTACTGTCCGACGT NA NA Unspecified
## s1_CGAGCCACAATGTAAG CGAGCCACAATGTAAG NA NA Unspecified
## s1_GAAACTCAGGAATCGC GAAACTCAGGAATCGC NA NA Unspecified
## s1_GTACTTTAGAGGGATA GTACTTTAGAGGGATA NA NA Unspecified
## s1_GTAGGCCGTCCCGACA GTAGGCCGTCCCGACA NA NA Unspecified
## s1_GTAGTCACAGTAAGAT GTAGTCACAGTAAGAT NA NA Unspecified
## barcode sample_id group_id clonotype_id_10x
## s1_CACCTTGCAAACGCGA s1_CACCTTGCAAACGCGA s1 1 clonotype28
## s1_CGATGGCAGGGTATCG s1_CGATGGCAGGGTATCG s1 1 clonotype2
## s1_CTGATAGGTTCTGTTT s1_CTGATAGGTTCTGTTT s1 1 clonotype57
## s1_GCTGCTTTCTCTGAGA s1_GCTGCTTTCTCTGAGA s1 1 clonotype22
## s1_GTCACGGCACGCATCG s1_GTCACGGCACGCATCG s1 1 clonotype60
## s1_TGACTAGTCGCCTGTT s1_TGACTAGTCGCCTGTT s1 1 clonotype52
## clonotype_id clonotype_frequency celltype Nr_of_VDJ_chains
## s1_CACCTTGCAAACGCGA clonotype28 1 B cell 1
## s1_CGATGGCAGGGTATCG clonotype2 2 B cell 1
## s1_CTGATAGGTTCTGTTT clonotype57 1 B cell 1
## s1_GCTGCTTTCTCTGAGA clonotype22 1 B cell 2
## s1_GTCACGGCACGCATCG clonotype60 1 B cell 1
## s1_TGACTAGTCGCCTGTT clonotype52 1 B cell 1
## Nr_of_VJ_chains VDJ_cdr3s_aa
## s1_CACCTTGCAAACGCGA 2 CARGAPNWYFDVW
## s1_CGATGGCAGGGTATCG 2 CARWFAWFAYW
## s1_CTGATAGGTTCTGTTT 2 CAKNYGSSYSYWYFDVW
## s1_GCTGCTTTCTCTGAGA 2 CTGDYAMDYW;CTLITTVVAKDAMDYW
## s1_GTCACGGCACGCATCG 2 CARRPYYSNSHYAMDYW
## s1_TGACTAGTCGCCTGTT 2 CARDYYGSSLYYFDYW
## VJ_cdr3s_aa
## s1_CACCTTGCAAACGCGA CALWYSTHYVF;CWQGTHFPQTF
## s1_CGATGGCAGGGTATCG CQNDYSYPLTF;CQQYSSYPYTF
## s1_CTGATAGGTTCTGTTT CKQSYNLYTF;CQQSNSWLTF
## s1_GCTGCTTTCTCTGAGA CQQWSSNPLTF;CWQGTHFPPF
## s1_GTCACGGCACGCATCG CALWYSNHLVF;CGVGDTIKEQFVYVF
## s1_TGACTAGTCGCCTGTT CQQYWSTRTF;CLQYDNLLYTF
## VDJ_cdr3s_nt
## s1_CACCTTGCAAACGCGA TGTGCAAGAGGGGCTCCCAACTGGTACTTCGATGTCTGG
## s1_CGATGGCAGGGTATCG TGTGCAAGATGGTTTGCCTGGTTTGCTTACTGG
## s1_CTGATAGGTTCTGTTT TGTGCAAAAAACTATGGTAGTAGCTACAGCTACTGGTACTTCGATGTCTGG
## s1_GCTGCTTTCTCTGAGA TGTACCGGGGATTACGCTATGGACTACTGG;TGTACTCTCATTACTACGGTAGTAGCCAAGGATGCTATGGACTACTGG
## s1_GTCACGGCACGCATCG TGTGCAAGAAGGCCCTACTATAGTAACTCCCACTATGCTATGGACTACTGG
## s1_TGACTAGTCGCCTGTT TGTGCTAGAGATTACTACGGTAGTAGCTTGTACTACTTTGACTACTGG
## VJ_cdr3s_nt
## s1_CACCTTGCAAACGCGA TGTGCTCTATGGTACAGCACCCATTATGTTTTC;TGCTGGCAAGGTACACATTTTCCTCAGACGTTC
## s1_CGATGGCAGGGTATCG TGTCAGAATGATTATAGTTATCCGCTCACGTTC;TGTCAGCAATATAGCAGCTATCCGTACACGTTC
## s1_CTGATAGGTTCTGTTT TGCAAGCAATCTTATAATCTGTACACGTTC;TGTCAACAGAGTAACAGCTGGCTCACGTTC
## s1_GCTGCTTTCTCTGAGA TGCCAGCAGTGGAGTAGTAACCCGCTCACGTTC;TGCTGGCAAGGTACACATTTTCCTCCGTTC
## s1_GTCACGGCACGCATCG TGTGCTCTATGGTACAGCAACCATTTGGTGTTC;TGTGGTGTGGGTGATACAATTAAGGAACAATTTGTGTATGTTTTC
## s1_TGACTAGTCGCCTGTT TGTCAACAGTATTGGAGTACTCGGACGTTC;TGTCTACAGTATGATAATCTTCTGTACACGTTC
## VDJ_umis VJ_umis
## s1_CACCTTGCAAACGCGA 14 17;21
## s1_CGATGGCAGGGTATCG 2 5;5
## s1_CTGATAGGTTCTGTTT 7 23;15
## s1_GCTGCTTTCTCTGAGA 20;14 24;45
## s1_GTCACGGCACGCATCG 10 60;29
## s1_TGACTAGTCGCCTGTT 22 34;23
## VDJ_chain_contig
## s1_CACCTTGCAAACGCGA CACCTTGCAAACGCGA-1_contig_1
## s1_CGATGGCAGGGTATCG CGATGGCAGGGTATCG-1_contig_3
## s1_CTGATAGGTTCTGTTT CTGATAGGTTCTGTTT-1_contig_3
## s1_GCTGCTTTCTCTGAGA GCTGCTTTCTCTGAGA-1_contig_2;GCTGCTTTCTCTGAGA-1_contig_3
## s1_GTCACGGCACGCATCG GTCACGGCACGCATCG-1_contig_1
## s1_TGACTAGTCGCCTGTT TGACTAGTCGCCTGTT-1_contig_3
## VJ_chain_contig
## s1_CACCTTGCAAACGCGA CACCTTGCAAACGCGA-1_contig_2;CACCTTGCAAACGCGA-1_contig_3
## s1_CGATGGCAGGGTATCG CGATGGCAGGGTATCG-1_contig_1;CGATGGCAGGGTATCG-1_contig_2
## s1_CTGATAGGTTCTGTTT CTGATAGGTTCTGTTT-1_contig_1;CTGATAGGTTCTGTTT-1_contig_2
## s1_GCTGCTTTCTCTGAGA GCTGCTTTCTCTGAGA-1_contig_1;GCTGCTTTCTCTGAGA-1_contig_4
## s1_GTCACGGCACGCATCG GTCACGGCACGCATCG-1_contig_2;GTCACGGCACGCATCG-1_contig_3
## s1_TGACTAGTCGCCTGTT TGACTAGTCGCCTGTT-1_contig_1;TGACTAGTCGCCTGTT-1_contig_2
## VDJ_chain VJ_chain VDJ_vgene VJ_vgene
## s1_CACCTTGCAAACGCGA IGH IGL;IGK IGHV1-18 IGLV2;IGKV1-135
## s1_CGATGGCAGGGTATCG IGH IGK;IGK IGHV1-26 IGKV8-19;IGKV6-23
## s1_CTGATAGGTTCTGTTT IGH IGK;IGK IGHV1-80 IGKV8-21;IGKV5-43
## s1_GCTGCTTTCTCTGAGA IGH;IGH IGK;IGK IGHV6-6;IGHV14-4 IGKV4-59;IGKV1-135
## s1_GTCACGGCACGCATCG IGH IGL;IGL IGHV1-76 IGLV1;IGLV3
## s1_TGACTAGTCGCCTGTT IGH IGK;IGK IGHV14-2 IGKV13-84;IGKV19-93
## VDJ_dgene VDJ_jgene VJ_jgene VDJ_cgene VJ_cgene
## s1_CACCTTGCAAACGCGA IGHD4-1 IGHJ1 IGLJ2;IGKJ1 IGHM IGLC2;IGKC
## s1_CGATGGCAGGGTATCG IGHJ3 IGKJ5;IGKJ2 IGHM IGKC;IGKC
## s1_CTGATAGGTTCTGTTT IGHJ1 IGKJ2;IGKJ5 IGHM IGKC;IGKC
## s1_GCTGCTTTCTCTGAGA IGHJ4;IGHJ4 IGKJ5;IGKJ2 IGHM;IGHM IGKC;IGKC
## s1_GTCACGGCACGCATCG IGHD2-5 IGHJ4 IGLJ1;IGLJ2 IGHM IGLC1;IGLC2
## s1_TGACTAGTCGCCTGTT IGHJ2 IGKJ1;IGKJ2 IGHM IGKC;IGKC
## VDJ_sequence_nt_raw
## s1_CACCTTGCAAACGCGA TTTGGGGAACATATGTCCAATGTCCTCTCCACAGGCACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCCTCTTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAGTCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATACCCTGCAAGGCTTCTGGATACACATTCACTGACTACAACATGGACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTATCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACACTGCAGTCTATTACTGTGCAAGAGGGGCTCCCAACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CGATGGCAGGGTATCG TTATGGGGAATGTCCTCACCACAGACACTGAACACACTGACTCTAACCATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGATGGTTTGCCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_CTGATAGGTTCTGTTT TGGGGACAGTCCCTGAACACACTGACTCTAACCATGGAATGGCCTTTGATCTTTCTCTTCCTCCTGTCAGGAACTGCAGGTGTCCAATCCCAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTGGTGAAGCCTGGGGCCTCAGTGAAGATTTCCTGCAAAGCTTCTGGCTACGCATTCAGTAGCTACTGGATGAACTGGGTGAAGCAGAGGCCTGGAAAGGGTCTTGAGTGGATTGGACAGATTTATCCTGGAGATGGTGATACTAACTACAACGGAAAGTTCAAGGGCAAGGCCACACTGACTGCAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACCTCTGAGGACTCTGCGGTCTATTTCTGTGCAAAAAACTATGGTAGTAGCTACAGCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GCTGCTTTCTCTGAGA TGGGGGACAGATGCACAAACCTGGACTCACAAGTTTTTCTCTTCAGTGACAAACACAGACATAGAACATTCACGATGTACTTGGGACTGAACTGTGTATTCATAGTTTTTCTCTTAAAAGGTGTCCAGAGTGAAGTGAAGCTTGAGGAGTCTGGAGGAGGCTTGGTGCAACCTGGAGGATCCATGAAACTCTCTTGTGCTGCCTCTGGATTCACTTTTAGTGACGCCTGGATGGACTGGGTCCGCCAGTCTCCAGAGAAGGGGCTTGAGTGGGTTGCTGAAATTAGAAACAAAGCTAATAATCATGCAACATACTATGCTGAGTCTGTGAAAGGGAGGTTCACCATCTCAAGAGATGATTCCAAAAGTAGTGTCTACCTGCAAATGAACAGCTTAAGAGCTGAAGACACTGGCATTTATTACTGTACCGGGGATTACGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA;TTATGGGGATGAACACTGTTTTCTCTACAGTCACTGAATCTCAATGTCCTTACAATGAAATGCAGCTGGGTCATCTTCTTCCTGATGGCAGTGGTTATAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTTGTGAGGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTTAACATTAAAGACGACTATATGCACTGGGTGAAGCAGAGGCCTGAACAGGGCCTGGAGTGGATTGGATGGATTGATCCTGAGAATGGTGATACTGAATATGCCTCGAAGTTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTACTCTCATTACTACGGTAGTAGCCAAGGATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_GTCACGGCACGCATCG TCTGTCACGGCACGCATCGTTTTATCCGTTTACTTTTATGGGGATGATCAGTGTCCTCTCTACACAGTCCCTGACGACACTGATTCTAACCATGGGATGGAGCTGGATCTTTTTCTTCCTCCTGTCAGGAACTGCAGGTGTCCACTGTCAGGTCCAGCTGAAGCAGTCTGGGGCTGAGCTGGTGAGGCCTGGGGCTTCAGTGAAGCTGTCCTGCAAGGCTTCTGGCTACACTTTCACTGACTACTATATAAACTGGGTGAAGCAGAGGCCTGGACAGGGACTTGAGTGGATTGCAAGGATTTATCCTGGAAGTGGTAATACTTACTACAATGAGAAGTTCAAGGGCAAGGCCACACTGACTGCAGAAAAATCCTCCAGCACTGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCTGTCTATTTCTGTGCAAGAAGGCCCTACTATAGTAACTCCCACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## s1_TGACTAGTCGCCTGTT TATTTGGGGATGAACCCTGTCTTCTCTACAGCCACTGAATCTCAAGGTCCTTACAATGAAATGCAGCTGGATCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAAGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGACTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGAAACTAAATATGCCCCGAAATTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTGCTAGAGATTACTACGGTAGTAGCTTGTACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## VJ_sequence_nt_raw
## s1_CACCTTGCAAACGCGA ACTCACCTTGCAAACGCGATCAACAGTTGTATCTTTTATGGGGGACCAATATTGAAAATAATAGACTTGGTTTGTGAATTATGGCCTGGACTTCACTTATACTCTCTCTCCTGGCTCTCTGCTCAGGAGCCAGTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGGAACAGTCATACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAGCAACCGAGCTCCAGGTGTTCCTGTCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGATGCAATGTATTTCTGTGCTCTATGGTACAGCACCCATTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTG;ACTGATCACTCTCCTATGTTCATTTCCTCAAAATGATGAGTCCTGCCCAGTTCCTGTTTCTGTTAGTGCTCTGGATTCGGGAAACCAACGGTGATGTTGTGATGACCCAGACTCCACTCACTTTGTCGGTTACCATTGGACAACCAGCCTCCATCTCTTGCAAGTCAAGTCAGAGCCTCTTAGATAGTGATGGAAAGACATATTTGAATTGGTTGTTACAGAGGCCAGGCCAGTCTCCAAAGCGCCTAATCTATCTGGTGTCTAAACTGGACTCTGGAGTCCCTGACAGGTTCACTGGCAGTGGATCAGGGACAGATTTCACACTGAAAATCAGCAGAGTGGAGGCTGAGGATTTGGGAGTTTATTATTGCTGGCAAGGTACACATTTTCCTCAGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_CGATGGCAGGGTATCG TTTATGGGGACATCTGAAAGGCAGGTGGAGCAAGATGGAATCACAGACTCAGGTCCTCATGTCCCTGCTGTTCTGGGTATCTGGTACCTGTGGGGACATTGTGATGACACAGTCTCCATCCTCCCTGACTGTGACAGCAGGAGAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGTCTGTTAAACAGTGGAAATCAAAAGAACTACTTGACCTGGTACCAGCAGAAACCAGGGCAGCCTCCTAAACTGTTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGAACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGTCAGAATGATTATAGTTATCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;TTATGGGGAAATACATCAGACCAGCATGGGCATCAAGATGGAGACACATTCTCAGGTCTTTGTATACATGTTGCTGTGGTTGTCTGGTGTTGAAGGAGACATTGTGATGACCCAGTCTCACAAATTCATGTCCACATCAGTAGGAGACAGGGTCAGCATCACCTGCAAGGCCAGTCAGGATGTGGGTACTGCTGTAGCCTGGTATCAACAGAAACCAGGGCAATCTCCTAAACTACTGATTTACTGGGCATCCACCCGGCACACTGGAGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATTAGCAATGTGCAGTCTGAAGACTTGGCAGATTATTTCTGTCAGCAATATAGCAGCTATCCGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_CTGATAGGTTCTGTTT AGACAGGCAGTGGGAGCAAGATGGATTCACAGGCCCAGGTTCTTATATTGCTGCTGCTATGGGTATCTGGTACCTGTGGGGACATTGTGATGTCACAGTCTCCATCCTCCCTGGCTGTGTCAGCAGGAGAGAAGGTCACTATGAGCTGCAAATCCAGTCAGAGTCTGCTCAACAGTAGAACCCGAAAGAACTACTTGGCTTGGTACCAGCAGAAACCAGGGCAGTCTCCTAAACTGCTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGCAAGCAATCTTATAATCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;ATGAGCCACACAAACTCAGGGAAAGCTCGAAGATGGTTTTCACACCTCAGATACTTGGACTTATGCTTTTTTGGATTTCAGCCTCCAGAGGTGATATTGTGCTAACTCAGTCTCCAGCCACCCTGTCTGTGACTCCAGGAGATAGCGTCAGTCTTTCCTGCAGGGCCAGCCAAAGTATTAGCAACAACCTACACTGGTATCAACAAAAATCACATGAGTCTCCAAGGCTTCTCATCAAGTATGCTTCCCAGTCCATCTCTGGGATCCCCTCCAGGTTCAGTGGCAGTGGATCAGGGACAGATTTCACTCTCAGTATCAACAGTGTGGAGACTGAAGATTTTGGAATGTATTTCTGTCAACAGAGTAACAGCTGGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_GCTGCTTTCTCTGAGA TTATGGGGAAAGTACTTATGAGAATAGCAGTAATTAGCTAGGGACCAAAATTCAAAGACAAAATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATAATATCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGCACTGGTACCAGCAGAAGTCAGGCACCTCCCCCAAAAGATGGATTTATGACACATCCAAACTGGCTTCTGGAGTCCCTGCTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCAGCATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTAACCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;ACTGATCACTCTCCTATGTTCATTTCCTCAAAATGATGAGTCCTGCCCAGTTCCTGTTTCTGTTAGTGCTCTGGATTCGGGAAACCAACGGTGATGTTGTGATGACCCAGACTCCACTCACTTTGTCGGTTACCATTGGACAACCAGCCTCCATCTCTTGCAAGTCAAGTCAGAGCCTCTTAGATAGTGATGGAAAGACATATTTGAATTGGTTGTTACAGAGGCCAGGCCAGTCTCCAAAGCGCCTAATCTATCTGGTGTCTAAACTGGACTCTGGAGTCCCTGACAGGTTCACTGGCAGTGGATCAGGGACAGATTTCACACTGAAAATCAGCAGAGTGGAGGCTGAGGATTTGGGAGTTTATTATTGCTGGCAAGGTACACATTTTCCTCCGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## s1_GTCACGGCACGCATCG GGGGACCAATATTGAAAAGAATAGACCTGGTTTGTGAATTATGGCCTGGATTTCACTTATACTCTCTCTCCTGGCTCTCAGCTCAGGGGCCATTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGAAACAGTCACACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAACAACCGAGCTCCAGGTGTTCCTGCCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGAGGCAATATATTTCTGTGCTCTATGGTACAGCAACCATTTGGTGTTCGGTGGAGGAACCAAACTGACTGTCCTAGGCCAGCCCAAGTCTTCGCCATCAGTCACCCTGTTTCCACCTTCCTCTGAAGAGCTCGAGACTAACAAGGCCACACTGGTGTGTA;TCTGTCACGGCACGCATCGGACTATCAAATATCTTCTATGGGAGAGAGAACTACAACCTGTCTGTCTCAGCAGAGATCAGTAGTACCTGCATTATGGCCTGGACTCCTCTCTTCTTCTTCTTTGTTCTTCATTGCTCAGGTTCTTTCTCCCAACTTGTGCTCACTCAGTCATCTTCAGCCTCTTTCTCCCTGGGAGCCTCAGCAAAACTCACGTGCACCTTGAGTAGTCAGCACAGTACGTACACCATTGAATGGTATCAGCAACAGCCACTCAAGCCTCCTAAGTATGTGATGGAGCTTAAGAAAGATGGAAGCCACAGCACAGGTGATGGGATTCCTGATCGCTTCTCTGGATCCAGCTCTGGTGCTGATCGCTACCTTAGCATTTCCAACATCCAGCCTGAAGATGAAGCAATATACATCTGTGGTGTGGGTGATACAATTAAGGAACAATTTGTGTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTG
## s1_TGACTAGTCGCCTGTT TGGGGAATGTCAGGTCACAGCAGAAACATGAAGTTTCCTTCTCAACTTCTGCTCTTACTGCTGTTTGGAATCCCAGGCATGATATGTGACATCCAGATGACACAATCTTCATCCTCCTTTTCTGTATCTCTAGGAGACAGAGTCACCATTACTTGCAAGGCAAGTGAGGACATATATAATCGGTTAGCCTGGTATCAGCAGAAACCAGGAAATGCTCCTAGGCTCTTAATATCTGGTGCAACCAGTTTGGAAACTGGGGTTCCTTCAAGATTCAGTGGCAGTGGATCTGGAAAGGATTACACTCTCAGCATTACCAGTCTTCAGACTGAAGATGTTGCTACTTATTACTGTCAACAGTATTGGAGTACTCGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC;TGACTAGTCGCCTGTTCGTACTTCGTTTTATTATTTGGGAGTCATTCTTGGTCAGGAGACGTTGTAGAAATGAGACCGTCTATTCAGTTCCTGGGGCTCTTGTTGTTCTGGCTTCATGGTGCTCAGTGTGACATCCAGATGACACAGTCTCCATCCTCACTGTCTGCATCTCTGGGAGGCAAAGTCACCATCACTTGCAAGGCAAGCCAAGACATTAACAAGTATATAGCTTGGTACCAACACAAGCCTGGAAAAGGTCCTAGGCTGCTCATACATTACACATCTACATTACAGCCAGGCATCCCATCAAGGTTCAGTGGAAGTGGGTCTGGGAGAGATTATTCCTTCAGCATCAGCAACCTGGAGCCTGAAGATATTGCAACTTATTATTGTCTACAGTATGATAATCTTCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTC
## VDJ_sequence_nt_trimmed VJ_sequence_nt_trimmed
## s1_CACCTTGCAAACGCGA
## s1_CGATGGCAGGGTATCG
## s1_CTGATAGGTTCTGTTT
## s1_GCTGCTTTCTCTGAGA
## s1_GTCACGGCACGCATCG
## s1_TGACTAGTCGCCTGTT
## VDJ_sequence_aa VJ_sequence_aa
## s1_CACCTTGCAAACGCGA
## s1_CGATGGCAGGGTATCG
## s1_CTGATAGGTTCTGTTT
## s1_GCTGCTTTCTCTGAGA
## s1_GTCACGGCACGCATCG
## s1_TGACTAGTCGCCTGTT
## VDJ_raw_ref
## s1_CACCTTGCAAACGCGA ATGGGATGGAGCTGGATCTTTCTCCTCTTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAGTCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATACCCTGCAAGGCTTCTGGATACACATTCACTGACTACAACATGGACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTATCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACACTGCAGTCTATTACTGTGCAAGACTAACTGGGACCTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CGATGGCAGGGTATCG ATGGGATGGAGCTGGATCTTTCTCTTTCTCCTGTCAGGAACTGCAGGTGTCCTCTCTGAGGTCCAGCTGCAACAATCTGGACCTGAGCTGGTGAAGCCTGGGGCTTCAGTGAAGATATCCTGTAAGGCTTCTGGATACACGTTCACTGACTACTACATGAACTGGGTGAAGCAGAGCCATGGAAAGAGCCTTGAGTGGATTGGAGATATTAATCCTAACAATGGTGGTACTAGCTACAACCAGAAGTTCAAGGGCAAGGCCACATTGACTGTAGACAAGTCCTCCAGCACAGCCTACATGGAGCTCCGCAGCCTGACATCTGAGGACTCTGCAGTCTATTACTGTGCAAGACCTGGTTTGCTTACTGGGGCCAAGGGACTCTGGTCACTGTCTCTGCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_CTGATAGGTTCTGTTT ATGGAATGGCCTTTGATCTTTCTCTTCCTCCTGTCAGGAACTGCAGGTGTCCAATCCCAGGTTCAGCTGCAGCAGTCTGGGGCTGAGCTGGTGAAGCCTGGGGCCTCAGTGAAGATTTCCTGCAAAGCTTCTGGCTACGCATTCAGTAGCTACTGGATGAACTGGGTGAAGCAGAGGCCTGGAAAGGGTCTTGAGTGGATTGGACAGATTTATCCTGGAGATGGTGATACTAACTACAACGGAAAGTTCAAGGGCAAGGCCACACTGACTGCAGACAAATCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACCTCTGAGGACTCTGCGGTCTATTTCTGTGCAAGACTACTGGTACTTCGATGTCTGGGGCACAGGGACCACGGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GCTGCTTTCTCTGAGA ATGTACTTGGGACTGAACTGTGTATTCATAGTTTTTCTCTTAAAAGGTGTCCAGAGTGAAGTGAAGCTTGAGGAGTCTGGAGGAGGCTTGGTGCAACCTGGAGGATCCATGAAACTCTCTTGTGCTGCCTCTGGATTCACTTTTAGTGACGCCTGGATGGACTGGGTCCGCCAGTCTCCAGAGAAGGGGCTTGAGTGGGTTGCTGAAATTAGAAACAAAGCTAATAATCATGCAACATACTATGCTGAGTCTGTGAAAGGGAGGTTCACCATCTCAAGAGATGATTCCAAAAGTAGTGTCTACCTGCAAATGAACAGCTTAAGAGCTGAAGACACTGGCATTTATTACTGTACCAGGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_GTCACGGCACGCATCG ATGGGATGGAGCTGGATCTTTTTCTTCCTCCTGTCAGGAACTGCAGGTGTCCACTGTCAGGTCCAGCTGAAGCAGTCTGGGGCTGAGCTGGTGAGGCCTGGGGCTTCAGTGAAGCTGTCCTGCAAGGCTTCTGGCTACACTTTCACTGACTACTATATAAACTGGGTGAAGCAGAGGCCTGGACAGGGACTTGAGTGGATTGCAAGGATTTATCCTGGAAGTGGTAATACTTACTACAATGAGAAGTTCAAGGGCAAGGCCACACTGACTGCAGAAAAATCCTCCAGCACTGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCTGTCTATTTCTGTGCAAGACCTACTATAGTAACTACATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## s1_TGACTAGTCGCCTGTT ATGAAATGCAGCTGGATCATCTTCTTCCTGATGGCAGTGGTTACAGGGGTCAATTCAGAGGTTCAGCTGCAGCAGTCTGGGGCAGAGCTTGTGAAGCCAGGGGCCTCAGTCAAGTTGTCCTGCACAGCTTCTGGCTTCAACATTAAAGACTACTATATGCACTGGGTGAAGCAGAGGACTGAACAGGGCCTGGAGTGGATTGGAAGGATTGATCCTGAGGATGGTGAAACTAAATATGCCCCGAAATTCCAGGGCAAGGCCACTATAACAGCAGACACATCCTCCAACACAGCCTACCTGCAGCTCAGCAGCCTGACATCTGAGGACACTGCCGTCTATTACTGTGCTAGAACTACTTTGACTACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCAGGTGTTGCTGTCTCCCAAGAGCATCCTTGAAGGTTCAGATGAATACCTGGTATGCAAAATCCACTACGGAGGCAAAAACAAAGATCTGCATGTGCCCATTCCAGCTGTCGCAGAGATGAACCCCAATGTAAATGTGTTCGTCCCACCACGGGATGGCTTCTCTGGCCCTGCACCACGCAAGTCTAAACTCATCTGCGAGGCCACGAACTTCACTCCAAAACCGATCACAGTATCCTGGCTAAAGGATGGGAAGCTCGTGGAATCTGGCTTCACCACAGATCCGGTGACCATCGAGAACAAAGGATCCACACCCCAAACCTACAAGGTCATAAGCACACTTACCATCTCTGAAATCGACTGGCTGAACCTGAATGTGTACACCTGCCGTGTGGATCACAGGGGTCTCACCTTCTTGAAGAACGTGTCCTCCACATGTGCTGCCAGTCCCTCCACAGACATCCTAACCTTCACCATCCCCCCCTCCTTTGCCGACATCTTCCTCAGCAAGTCCGCTAACCTGACCTGTCTGGTCTCAAACCTGGCAACCTATGAAACCCTGAATATCTCCTGGGCTTCTCAAAGTGGTGAACCACTGGAAACCAAAATTAAAATCATGGAAAGCCATCCCAATGGCACCTTCAGTGCTAAGGGTGTGGCTAGTGTTTGTGTGGAAGACTGGAATAACAGGAAGGAATTTGTGTGTACTGTGACTCACAGGGATCTGCCTTCACCACAGAAGAAATTCATCTCAAAACCCAATGAGGTGCACAAACATCCACCTGCTGTGTACCTGCTGCCACCAGCTCGTGAGCAACTGAACCTGAGGGAGTCAGCCACAGTCACCTGCCTGGTGAAGGGCTTCTCTCCTGCAGACATCAGTGTGCAGTGGCTTCAGAGAGGGCAACTCTTGCCCCAAGAGAAGTATGTGACCAGTGCCCCGATGCCAGAGCCTGGGGCCCCAGGCTTCTACTTTACCCACAGCATCCTGACTGTGACAGAGGAGGAATGGAACTCCGGAGAGACCTATACCTGTGTTGTAGGCCACGAGGCCCTGCCACACCTGGTGACCGAGAGGACCGTGGACAAGTCCACTGGTAAACCCACACTGTACAATGTCTCCCTGATCATGTCTGACACAGGCGGCACCTGCTAT
## VJ_raw_ref
## s1_CACCTTGCAAACGCGA ATGGCCTGGACTTCACTTATACTCTCTCTCCTGGCTCTCTGCTCAGGAGCCAGTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGGAACAGTCATACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAGCAACCGAGCTCCAGGTGTTCCTGTCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGATGCAATGTATTTCTGTGCTCTATGGTACAGCACCCATTTCTTATGTTTTCGGCGGTGGAACCAAGGTCACTGTCCTAGGTCAGCCCAAGTCCACTCCCACTCTCACCGTGTTTCCACCTTCCTCTGAGGAGCTCAAGGAAAACAAAGCCACACTGGTGTGTCTGATTTCCAACTTTTCCCCGAGTGGTGTGACAGTGGCCTGGAAGGCAAATGGTACACCTATCACCCAGGGTGTGGACACTTCAAATCCCACCAAAGAGGGCAACAAGTTCATGGCCAGCAGCTTCCTACATTTGACATCGGACCAGTGGAGATCTCACAACAGTTTTACCTGTCAAGTTACACATGAAGGGGACACTGTGGAGAAGAGTCTGTCTCCTGCAGAATGTCTC
## s1_CGATGGCAGGGTATCG ATGGAATCACAGACTCAGGTCCTCATGTCCCTGCTGTTCTGGGTATCTGGTACCTGTGGGGACATTGTGATGACACAGTCTCCATCCTCCCTGACTGTGACAGCAGGAGAGAAGGTCACTATGAGCTGCAAGTCCAGTCAGAGTCTGTTAAACAGTGGAAATCAAAAGAACTACTTGACCTGGTACCAGCAGAAACCAGGGCAGCCTCCTAAACTGTTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGAACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGTCAGAATGATTATAGTTATCCTCCGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_CTGATAGGTTCTGTTT AGACAGGCAGTGGGAGCAAGATGGATTCACAGGCCCAGGTTCTTATATTGCTGCTGCTATGGGTATCTGGTACCTGTGGGGACATTGTGATGTCACAGTCTCCATCCTCCCTGGCTGTGTCAGCAGGAGAGAAGGTCACTATGAGCTGCAAATCCAGTCAGAGTCTGCTCAACAGTAGAACCCGAAAGAACTACTTGGCTTGGTACCAGCAGAAACCAGGGCAGTCTCCTAAACTGCTGATCTACTGGGCATCCACTAGGGAATCTGGGGTCCCTGATCGCTTCACAGGCAGTGGATCTGGGACAGATTTCACTCTCACCATCAGCAGTGTGCAGGCTGAAGACCTGGCAGTTTATTACTGCAAGCAATCTTATAATCTTCCTGTACACGTTCGGAGGGGGGACCAAGCTGGAAATAAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_GCTGCTTTCTCTGAGA ATGGATTTTCAAGTGCAGATTTTCAGCTTCCTGCTAATCAGTGCCTCAGTCATAATATCCAGAGGACAAATTGTTCTCACCCAGTCTCCAGCAATCATGTCTGCATCTCCAGGGGAGAAGGTCACCATGACCTGCAGTGCCAGCTCAAGTGTAAGTTACATGCACTGGTACCAGCAGAAGTCAGGCACCTCCCCCAAAAGATGGATTTATGACACATCCAAACTGGCTTCTGGAGTCCCTGCTCGCTTCAGTGGCAGTGGGTCTGGGACCTCTTACTCTCTCACAATCAGCAGCATGGAGGCTGAAGATGCTGCCACTTATTACTGCCAGCAGTGGAGTAGTAACCCACCCAGCTCACGTTCGGTGCTGGGACCAAGCTGGAGCTGAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## s1_GTCACGGCACGCATCG GCTGACCAATATTGAAAAGAATAGACCTGGTTTGTGAATTATGGCCTGGATTTCACTTATACTCTCTCTCCTGGCTCTCAGCTCAGGGGCCATTTCCCAGGCTGTTGTGACTCAGGAATCTGCACTCACCACATCACCTGGTGAAACAGTCACACTCACTTGTCGCTCAAGTACTGGGGCTGTTACAACTAGTAACTATGCCAACTGGGTCCAAGAAAAACCAGATCATTTATTCACTGGTCTAATAGGTGGTACCAACAACCGAGCTCCAGGTGTTCCTGCCAGATTCTCAGGCTCCCTGATTGGAGACAAGGCTGCCCTCACCATCACAGGGGCACAGACTGAGGATGAGGCAATATATTTCTGTGCTCTATGGTACAGCAACCATTTCCTGGGTGTTCGGTGGAGGAACCAAACTGACTGTCCTAGGCCAGCCCAAGTCTTCGCCATCAGTCACCCTGTTTCCACCTTCCTCTGAAGAGCTCGAGACTAACAAGGCCACACTGGTGTGTACGATCACTGATTTCTACCCAGGTGTGGTGACAGTGGACTGGAAGGTAGATGGTACCCCTGTCACTCAGGGTATGGAGACAACCCAGCCTTCCAAACAGAGCAACAACAAGTACATGGCTAGCAGCTACCTGACCCTGACAGCAAGAGCATGGGAAAGGCATAGCAGTTACAGCTGCCAGGTCACTCATGAAGGTCACACTGTGGAGAAGAGTTTGTCCCGTGCTGACTGTTCC
## s1_TGACTAGTCGCCTGTT ATGAAGTTTCCTTCTCAACTTCTGCTCTTACTGCTGTTTGGAATCCCAGGCATGATATGTGACATCCAGATGACACAATCTTCATCCTCCTTTTCTGTATCTCTAGGAGACAGAGTCACCATTACTTGCAAGGCAAGTGAGGACATATATAATCGGTTAGCCTGGTATCAGCAGAAACCAGGAAATGCTCCTAGGCTCTTAATATCTGGTGCAACCAGTTTGGAAACTGGGGTTCCTTCAAGATTCAGTGGCAGTGGATCTGGAAAGGATTACACTCTCAGCATTACCAGTCTTCAGACTGAAGATGTTGCTACTTATTACTGTCAACAGTATTGGAGTACTCCTCCGTGGACGTTCGGTGGAGGCACCAAGCTGGAAATCAAACGGGCTGATGCTGCACCAACTGTATCCATCTTCCCACCATCCAGTGAGCAGTTAACATCTGGAGGTGCCTCAGTCGTGTGCTTCTTGAACAACTTCTACCCCAAAGACATCAATGTCAAGTGGAAGATTGATGGCAGTGAACGACAAAATGGCGTCCTGAACAGTTGGACTGATCAGGACAGCAAAGACAGCACCTACAGCATGAGCAGCACCCTCACGTTGACCAAGGACGAGTATGAACGACATAACAGCTATACCTGTGAGGCCACTCACAAGACATCAACTTCACCCATTGTCAAGAGCTTCAACAGGAATGAGTGT
## VDJ_trimmed_ref VJ_trimmed_ref VDJ_raw_consensus_id
## s1_CACCTTGCAAACGCGA clonotype28_concat_ref_1
## s1_CGATGGCAGGGTATCG clonotype2_concat_ref_1
## s1_CTGATAGGTTCTGTTT clonotype57_concat_ref_1
## s1_GCTGCTTTCTCTGAGA clonotype22_concat_ref_1
## s1_GTCACGGCACGCATCG clonotype60_concat_ref_1
## s1_TGACTAGTCGCCTGTT clonotype52_concat_ref_1
## VJ_raw_consensus_id orig_barcode specifity
## s1_CACCTTGCAAACGCGA clonotype28_concat_ref_3 CACCTTGCAAACGCGA NA
## s1_CGATGGCAGGGTATCG clonotype2_concat_ref_2 CGATGGCAGGGTATCG NA
## s1_CTGATAGGTTCTGTTT clonotype57_concat_ref_2 CTGATAGGTTCTGTTT NA
## s1_GCTGCTTTCTCTGAGA clonotype22_concat_ref_3 GCTGCTTTCTCTGAGA NA
## s1_GTCACGGCACGCATCG clonotype60_concat_ref_2 GTCACGGCACGCATCG NA
## s1_TGACTAGTCGCCTGTT clonotype52_concat_ref_3 TGACTAGTCGCCTGTT NA
## affinity batches
## s1_CACCTTGCAAACGCGA NA Unspecified
## s1_CGATGGCAGGGTATCG NA Unspecified
## s1_CTGATAGGTTCTGTTT NA Unspecified
## s1_GCTGCTTTCTCTGAGA NA Unspecified
## s1_GTCACGGCACGCATCG NA Unspecified
## s1_TGACTAGTCGCCTGTT NA Unspecified
While filtering out all cells with aberrant chain numbers is common practice when processing VDJ data, Platypus offers a format which can accommodate and integrate these cells into the analysis (see VDJ_clonotype()). A third option was proposed by Zhang W et al. (Sci Adv. 2021, 10.1126/sciadv.abf5835): choosing between excess chains by the unique molecular identifier (UMI) count of each contig. The VGM function implements this strategy with two parameters: 1. select.excess.chains.by.umi.count is a boolean. Once set to TRUE, the VGM will filter excess chains based on UMI count.
select.excess.chains.by.umi.count = T
barcode | Nr_of_VJ_chains | VJ_UMIs |
---|---|---|
Cell 1 | 2 | 1;1 |
Cell 2 | 2 | 1;5 |
Cell 3 | 2 | 3;3 |
FOR: excess.chain.confidence.count.threshold = 1000
Cell 1 -> Both chains are below the threshold and are therefore subject to filtering. Because both chains have the same UMI count, one contig is eliminated at random.
Cell 2 -> Both chains are below the threshold and subject to filtering. Chain 2 has the higher UMI count and is therefore kept.
Cell 3 -> Proceeds as for Cell 1.
FOR: excess.chain.confidence.count.threshold = 3
Cell 1 -> Same as above.
Cell 2 -> Chain 1 is below the threshold and subject to filtering. Chain 2 is above the threshold, is therefore considered a high-confidence chain, and is not filtered.
Cell 3 -> Both chains are equal to or above the threshold; both are considered high confidence and no chain is filtered.
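The worked cases above can be reproduced with a small base R sketch. This is a toy re-implementation of the rules as described in this section, not the Platypus source code; the helper name keep_chains and the toy data frame are hypothetical and introduced only for illustration.

```r
# Toy data mirroring the example table; UMI counts are ";"-separated per chain.
cells <- data.frame(
  barcode = c("Cell 1", "Cell 2", "Cell 3"),
  VJ_UMIs = c("1;1", "1;5", "3;3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of Platypus): returns the indices of kept chains.
keep_chains <- function(umi_string, threshold) {
  umis <- as.numeric(strsplit(umi_string, ";", fixed = TRUE)[[1]])
  # Chains at or above the threshold count as high confidence and are all kept.
  confident <- which(umis >= threshold)
  if (length(confident) > 0) return(confident)
  # Otherwise keep only the chain with the highest UMI count.
  best <- which(umis == max(umis))
  # Ties are broken at random.
  if (length(best) > 1) best <- sample(best, 1)
  best
}

# Show which chains survive under the two thresholds discussed above.
for (threshold in c(1000, 3)) {
  kept <- vapply(
    cells$VJ_UMIs,
    function(x) paste(keep_chains(x, threshold), collapse = ";"),
    character(1)
  )
  print(data.frame(barcode = cells$barcode,
                   threshold = threshold,
                   kept_chains = unname(kept)))
}
```

For Cell 1 (and for Cell 3 at a threshold of 1000) the kept chain index differs between runs, because ties are resolved by random sampling.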
UMI distribution within VDJ datasets can vary. To optimize this filtering parameter we recommend investigating UMI frequencies in cells with double chains.
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
select.excess.chains.by.umi.count = T,
excess.chain.confidence.count.threshold = 1000)
It may be useful to obtain full-length sequences (e.g., for phylogenetics or experimental expression and validation, see Pogson M. et al. Nat Comm. 2016, 10.1038/ncomms12535) from single-cell sequencing data. We have therefore added an option to recover both trimmed and untrimmed sequences as determined by the cellranger vdj function. If the user sets the trim.and.align parameter to TRUE, this will return processed VDJ contig strings as detailed below:
Trimming: The raw nt contig sequences are trimmed to the start of the V segment and the end of the J segment (VDJ/VJ_sequence_nt_trimmed). These trimmed sequences contain signal peptide information. If a user does not want this, we currently use MiXCR to realign and extract the sequence that spans FR1 to FR4 for both heavy and light chains. This can be done using the VDJ_call_MiXCR() function within Platypus, although it does require the user to download MiXCR locally.
Translation: The trimmed contigs are translated to yield full amino acid sequences. (VDJ/VJ_sequence_aa)
Realigning to reference: The untrimmed germline sequences (as determined by 10x Genomics) for each clonotype consensus sequence are returned when trim.and.align = T. These can be submitted to IMGT or MiXCR to obtain a shorter reference sequence that covers just FR1 to FR4. Furthermore, the function aligns trimmed contig sequences to reference contigs (using Biostrings::pairwiseAlignment with type = “local”; see optimization parameters in section 4.1.2). This results in a reference sequence from the start of the V segment to the end of the J segment (VDJ/VJ_trimmed_ref). Most of the time this trimmed germline sequence will be out of frame given the CDR3 deletions and insertions. In the case that a user wants an in-frame germline (e.g. for expression), the CDR3 will likely need to be replaced. We are currently developing a pipeline that supplies CDR3s of those recovered sequences that are closest to germline - so stay tuned :)
The trimmed contigs are aligned to the sequences of the concat_ref.fasta file and the reference is trimmed accordingly. (This is done using Biostrings::pairwiseAlignment with type = “local”; see optimization parameters in section 4.1.2.)
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
trim.and.align = T)
vgm[[1]][1,c("VDJ_sequence_nt_raw", "VDJ_sequence_nt_trimmed", "VDJ_sequence_aa", "VDJ_trimmed_ref")]
## VDJ_sequence_nt_raw
## s1_AAACGGGGTTTAGGAA TGGGAAGTGTGCAGCCATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCAGAGAGTCAGTCCTTCCCAAATGTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATAAGAATCTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCACCATTTCCTTCACCTGGAACTACCAGAACAACACTGAAGTCATCCAGGGTATCAGAACCTTCCCAACACTGAGGACAGGGGGCAAGTACCTAGCCACCTCGCA
## VDJ_sequence_nt_trimmed
## s1_AAACGGGGTTTAGGAA ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGGATATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
## VDJ_sequence_aa
## s1_AAACGGGGTTTAGGAA MGRLTSSFLLLIVPAYVLSQVTLKESGPGILQPSQTLSLTCSFSGFSLSTFGMGVGWIRQPSGKGLEWLAHIWWDDDKYYNPALKSRLTISKDTSKNQVFLKIANVDTADTATYYCARIGYAMDYWGQGTSVTVSS
## VDJ_trimmed_ref
## s1_AAACGGGGTTTAGGAA ATGGGCAGGCTTACTTCTTCATTCCTGTTACTGATTGTCCCTGCATATGTCCTGTCCCAGGTTACTCTGAAAGAGTCTGGCCCTGGGATATTGCAGCCCTCCCAGACCCTCAGTCTGACTTGTTCTTTCTCTGGGTTTTCACTGAGCACTTTTGGTATGGGTGTAGGCTGGATTCGTCAGCCTTCAGGGAAGGGTCTGGAGTGGCTGGCACACATTTGGTGGGATGATGATAAGTACTATAACCCAGCCCTGAAGAGTCGGCTCACAATCTCCAAGGATACCTCCAAAAACCAGGTATTCCTCAAGATCGCCAATGTGGACACTGCAGATACTGCCACATACTACTGTGCTCGAATAGATTACTATGCTATGGACTACTGGGGTCAAGGAACCTCAGTCACCGTCTCCTCA
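To make the translation step concrete, the following base R sketch translates the first 15 nucleotides of the trimmed heavy chain shown above with the standard genetic code. It is only an illustration: the helper translate_nt is hypothetical, and the VGM function may rely on different tooling internally.

```r
# Standard genetic code, with codons enumerated in T, C, A, G order.
bases <- c("T", "C", "A", "G")
# expand.grid varies its first column fastest, so reverse the columns to get
# first, second, third codon position with the third position cycling fastest.
codons <- apply(expand.grid(bases, bases, bases)[, 3:1], 1, paste, collapse = "")
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
codon_table <- setNames(amino_acids, codons)

# Hypothetical helper: translate an in-frame nucleotide string.
# Trailing bases that do not fill a codon are ignored; codons containing
# ambiguous bases are not handled in this sketch.
translate_nt <- function(nt) {
  nt <- toupper(nt)
  n_codons <- nchar(nt) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  triplets <- substring(nt, starts, starts + 2)
  paste(codon_table[triplets], collapse = "")
}

# First five codons of VDJ_sequence_nt_trimmed for s1_AAACGGGGTTTAGGAA.
translate_nt("ATGGGCAGGCTTACT")
# returns "MGRLT", the first five residues of VDJ_sequence_aa above
```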
A key difficulty with TCR and BCR sequences is that the CDR3-containing junction region almost never matches known references. This leaves alignment gaps and can cause the highest-scoring alignment to exclude the J segment that follows the junction. The VGM therefore allows the user to tune the alignment parameters gap.opening.cost and gap.extension.cost.
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
trim.and.align = T,
gap.opening.cost = 10,
gap.extension.cost = 4)
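To get a feeling for how these two parameters trade off, the sketch below tabulates the penalty of a single gap under two settings. It assumes the common affine convention in which a gap of length n costs opening + n * extension; the helper gap_cost and the "lenient" values are hypothetical and chosen only for illustration.

```r
# Affine gap penalty under the assumed convention:
# cost = opening + n * extension for a gap of length n (0 if there is no gap).
gap_cost <- function(n, opening, extension) {
  stopifnot(n >= 0)
  ifelse(n == 0, 0, opening + n * extension)
}

gap_lengths <- c(1, 3, 6, 12)

# Compare the values used in the call above with a more lenient setting.
data.frame(
  gap_length   = gap_lengths,
  cost_10_4    = gap_cost(gap_lengths, opening = 10, extension = 4),
  cost_lenient = gap_cost(gap_lengths, opening = 5, extension = 1)
)
```

Lower costs make it cheaper for a local alignment to bridge a junction gap and continue into the J segment, at the price of tolerating more gaps elsewhere.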
To accelerate alignment, the VGM allows for multicore processing via the functions mclapply or parLapply from the parallel package, depending on the operating system. Given the initiation time of multicore processes, we only recommend using it for datasets with >2000 cells.
The parameter numcores sets the number of cores used. It defaults to all available cores, so setting a limit is crucial when running the function on a cluster.
By default, parallel.processing is set to “none” and the function uses standard lapply.
#For LINUX and WINDOWS users
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
trim.and.align = T,
parallel.processing = "parlapply",
numcores = 8)
#For MAC users
vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
trim.and.align = T,
parallel.processing = "mclapply",
numcores = 8)
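The two calls above can be folded into one script that picks the backend and a core limit at run time. This is a minimal sketch that follows the operating system split used in this vignette; the variable names backend and numcores_limit, and the cap of 8 cores, are assumptions made for illustration.

```r
library(parallel)

# Pick the backend following the split used above:
# "mclapply" on macOS (sysname "Darwin"), "parlapply" otherwise.
sysname <- unname(Sys.info()["sysname"])
backend <- if (identical(sysname, "Darwin")) "mclapply" else "parlapply"

# detectCores() may return NA on some systems; fall back to a single core.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free and never request more than 8 (arbitrary cap).
numcores_limit <- max(1L, min(8L, available - 1L))

# The values can then be passed on, e.g.:
# vgm <- VDJ_GEX_matrix(VDJ.out.directory.list = VDJ.out.directory.list,
#                       trim.and.align = T,
#                       parallel.processing = backend,
#                       numcores = numcores_limit)
```

On a shared cluster, the number of cores granted by the scheduler is usually a better limit than the value reported by detectCores().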
We hope that this comprehensive vignette on the VGM function enables any user to employ its full functionality. If you have any issues, requests or ideas concerning existing or new features, please reach out to us via GitHub or email.
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Platypus_3.4.0 SeuratObject_4.0.4 Seurat_4.1.0 forcats_0.5.1
## [5] stringr_1.4.0 purrr_0.3.4 readr_2.1.2 tidyr_1.1.3
## [9] tibble_3.1.2 ggplot2_3.3.5 tidyverse_1.3.1 dplyr_1.0.7
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.4.1 systemfonts_1.0.4
## [4] plyr_1.8.6 igraph_1.2.6 lazyeval_0.2.2
## [7] splines_4.0.5 listenv_0.8.0 scattermore_0.7
## [10] digest_0.6.27 useful_1.2.6 htmltools_0.5.2
## [13] fansi_0.5.0 magrittr_2.0.1 memoise_2.0.1
## [16] tensor_1.5 cluster_2.1.2 ROCR_1.0-11
## [19] tzdb_0.2.0 Biostrings_2.58.0 globals_0.14.0
## [22] modelr_0.1.8 matrixStats_0.59.0 pkgdown_2.0.2
## [25] spatstat.sparse_2.0-0 colorspace_2.0-2 rvest_1.0.2
## [28] ggrepel_0.9.1 textshaping_0.3.6 haven_2.4.3
## [31] xfun_0.27 crayon_1.5.0 jsonlite_1.7.2
## [34] spatstat.data_2.1-2 survival_3.2-11 zoo_1.8-9
## [37] glue_1.4.2 polyclip_1.10-0 gtable_0.3.0
## [40] zlibbioc_1.36.0 XVector_0.30.0 seqinr_4.2-8
## [43] leiden_0.3.9 future.apply_1.8.1 BiocGenerics_0.36.1
## [46] abind_1.4-5 scales_1.1.1 DBI_1.1.2
## [49] miniUI_0.1.1.1 Rcpp_1.0.7 viridisLite_0.4.0
## [52] xtable_1.8-4 reticulate_1.20 spatstat.core_2.2-0
## [55] stats4_4.0.5 htmlwidgets_1.5.4 httr_1.4.2
## [58] RColorBrewer_1.1-2 ellipsis_0.3.2 ica_1.0-2
## [61] farver_2.1.0 pkgconfig_2.0.3 uwot_0.1.10
## [64] sass_0.4.0 dbplyr_2.1.1 deldir_0.2-10
## [67] utf8_1.2.1 labeling_0.4.2 tidyselect_1.1.1
## [70] rlang_0.4.10 reshape2_1.4.4 later_1.2.0
## [73] munsell_0.5.0 cellranger_1.1.0 tools_4.0.5
## [76] cachem_1.0.6 cli_3.1.1 generics_0.1.2
## [79] ade4_1.7-18 broom_0.7.12 ggridges_0.5.3
## [82] evaluate_0.14 fastmap_1.1.0 yaml_2.2.1
## [85] ragg_1.2.1 goftest_1.2-2 knitr_1.37
## [88] fs_1.5.2 fitdistrplus_1.1-6 RANN_2.6.1
## [91] pbapply_1.5-0 future_1.24.0 nlme_3.1-152
## [94] mime_0.11 xml2_1.3.3 compiler_4.0.5
## [97] rstudioapi_0.13 plotly_4.10.0 png_0.1-7
## [100] spatstat.utils_2.2-0 reprex_2.0.1 bslib_0.3.1
## [103] stringi_1.7.4 highr_0.9 RSpectra_0.16-0
## [106] desc_1.4.0 lattice_0.20-44 Matrix_1.3-4
## [109] vctrs_0.3.8 pillar_1.7.0 lifecycle_1.0.1
## [112] spatstat.geom_2.2-0 lmtest_0.9-38 jquerylib_0.1.4
## [115] RcppAnnoy_0.0.18 data.table_1.14.0 cowplot_1.1.1
## [118] irlba_2.3.3 httpuv_1.6.1 patchwork_1.1.1
## [121] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20
## [124] gridExtra_2.3 IRanges_2.24.1 parallelly_1.30.0
## [127] codetools_0.2-18 MASS_7.3-54 assertthat_0.2.1
## [130] rprojroot_2.0.2 withr_2.4.3 sctransform_0.3.3
## [133] S4Vectors_0.28.1 mgcv_1.8-36 parallel_4.0.5
## [136] hms_1.1.1 grid_4.0.5 rpart_4.1-15
## [139] rmarkdown_2.11 Rtsne_0.15 shiny_1.7.1
## [142] lubridate_1.8.0