The VGM_expand_featurebarcodes function can be used to trace back the cell origin of each sample after using cell hashing for single-cell sequencing. It replaces the original sample_id column of a vgm object with a pasted version of the original sample_id and the last digits of the feature barcode.
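The pasting rule described above can be illustrated with a minimal base R sketch. The helper name expand_sample_id is hypothetical and not part of Platypus; it only shows how a sample_id and the trailing digits of a feature barcode could be combined, and it assumes every cell has an assigned barcode that ends in digits.

```r
# Illustrative helper (hypothetical name; not a Platypus function).
# Assumes each feature barcode ends in a run of digits, e.g. "i1-TotalSeq-C0953".
expand_sample_id <- function(sample_id, feature_barcode) {
  # Drop everything up to and including the last non-digit character,
  # which leaves only the trailing digits of the barcode (here "0953").
  fb_digits <- sub(".*[^0-9]", "", feature_barcode)
  # Paste the original sample_id and the barcode digits together.
  paste(sample_id, fb_digits, sep = "_")
}

expand_sample_id("s1", "i1-TotalSeq-C0953")
# "s1_0953"
```

How the actual function treats cells without an assigned barcode is not covered by this sketch.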

The original sample_id is stored in a new column called original_sample_id. Additionally, a second new column is created containing the final barcode assignment information. Those barcodes match the original FB_assignment if by.majority.barcodes is set to FALSE (default). However, if this input parameter is set to TRUE, the majority barcode assignment is stored in this column.

Note: The majority barcode of a cell is the feature barcode that is most frequently assigned to the cell's clonotype (10x default clonotype). The majority barcode assignment can be used under the assumption that all cells assigned to the same clonotype (within one sample) originate from the same donor organ, or at least from the same donor, depending on the experimental setup.

For example: The original sample_id of a cell is "s1", the cell belongs to "clonotype1", and the feature barcode assigned to it is "i1-TotalSeq-C0953". With the default by.majority.barcodes = FALSE, the new sample_id is "s1_0953". However, if majority barcode assignment is used and the most frequently occurring barcode in "clonotype1" is not "i1-TotalSeq-C0953" but "i1-TotalSeq-C0951", the new sample_id is "s1_0951". E.g., if 15 cells belong to clonotype1, of which 3 have no assigned barcode, 2 are assigned to "i1-TotalSeq-C0953", and 10 are assigned to "i1-TotalSeq-C0951", all 15 cells receive the new sample_id "s1_0951".
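The 15-cell example can be reproduced with a small base R sketch. This is not the Platypus implementation: the toy data frame, the column name clonotype_id, the helper majority_barcode, and the tie-breaking behavior (first barcode in alphabetical order wins) are all assumptions made for illustration.

```r
# Toy data for the 15-cell example; the column names are assumptions.
cells <- data.frame(
  sample_id = "s1",
  clonotype_id = "clonotype1",
  FB_assignment = c(rep(NA, 3),
                    rep("i1-TotalSeq-C0953", 2),
                    rep("i1-TotalSeq-C0951", 10)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing barcode in a vector.
majority_barcode <- function(fb) {
  counts <- table(fb)  # table() ignores NA values by default
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]  # ties: first barcode alphabetically
}

# Compute the majority barcode per sample and clonotype, then assign it
# to every cell of that group, including cells without a barcode.
cells$majority_fb <- ave(cells$FB_assignment,
                         cells$sample_id, cells$clonotype_id,
                         FUN = majority_barcode)

# Paste the sample_id and the trailing digits of the majority barcode.
cells$new_sample_id <- paste(cells$sample_id,
                             sub(".*[^0-9]", "", cells$majority_fb),
                             sep = "_")

unique(cells$new_sample_id)
# "s1_0951"
```

Grouping by sample_id as well as clonotype mirrors the note that the assumption applies to clonotypes within one sample.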

VGM_expand_featurebarcodes(
  vgm,
  by.majority.barcodes,
  integrate.in.gex,
  vdj.only,
  platypus.version
)

Arguments

vgm

VGM output of VDJ_GEX_matrix function (Platypus V3)

by.majority.barcodes

Logical. Default is FALSE. Indicates whether strict barcode assignment or majority barcode assignment should be used to create the new sample_id. If TRUE, the most frequent feature barcode of each clonotype is chosen and assigned to every cell of that clonotype, even if the cell itself does not have this particular barcode assigned.

integrate.in.gex

Logical. Default is FALSE. If TRUE, the newly created sample_ids are integrated into the gex component as well. Not recommended unless further gex analysis is planned, because it considerably increases computation time.

vdj.only

Logical. Defines whether only vdj information is provided as input. Default is FALSE. If set to TRUE, a vdj dataframe has to be provided as input (vgm = vdj). In that case, integrate.in.gex is automatically set to FALSE, since no gex information (vgm[[2]]) is provided.

platypus.version

This function works with "v3" only, so there is no need to set this parameter.

Value

This function returns a vgm with new sample_ids if vdj.only is set to FALSE (default). If vdj.only is set to TRUE, only the vdj dataframe with new sample_ids is returned. Note: If vdj.only is left at its default (FALSE), VDJ information in the metadata of the GEX object is necessary. To obtain it, set integrate.VDJ.to.GEX to TRUE in the VDJ_GEX_matrix function.

Examples

# For Platypus version 3

# 1. If only vdj data (vgm[[1]]) and
# strict feature barcode assignment is used:
vgm_expanded_fb <- VGM_expand_featurebarcodes(
  vgm = small_vgm[[1]],
  by.majority.barcodes = FALSE,
  integrate.in.gex = FALSE,
  vdj.only = TRUE
)

# 2. If whole vgm and strict fb assignment is used
# (gex and vdj; necessary if gene expression analysis
# of sub-samples is desired):
vgm_expanded_fb <- VGM_expand_featurebarcodes(
  vgm = small_vgm,
  by.majority.barcodes = FALSE,
  integrate.in.gex = TRUE,
  vdj.only = FALSE
)

# 3. If whole vgm and majority barcode assignment is used
# (gex and vdj; necessary if gene expression analysis
# of sub-samples is desired):
vgm_expanded_fb <- VGM_expand_featurebarcodes(
  vgm = small_vgm,
  by.majority.barcodes = TRUE,
  integrate.in.gex = TRUE,
  vdj.only = FALSE
)

# Note: Majority barcode assignment is recommended
# if it is reasonable to assume that all cells within one
# clonotype originate from the same sample sub-group.