The PlatypusML_feature_extraction_GEX function takes the specified features from the second output of the VDJ_GEX_matrix function (VGM[[2]]) and encodes them according to the specified strategy. It returns a matrix with the encoded features as columns and the cells as rows. Call this function as the first step when modeling VGM data with machine learning.

PlatypusML_feature_extraction_GEX(
  VGM,
  encoding.level,
  unique.sequence,
  which.features,
  n.PCs,
  which.label,
  problem,
  verbose.classes,
  platypus.version
)

Arguments

VGM

Output of the VDJ_GEX_matrix function, containing both VDJ and GEX objects.

encoding.level

String. Specifies the level at which features are extracted. There are three options: "clone" (one random sample per clone), "clone.avg" (average expression per clone), and "unique.sequence" (only unique sequences are kept, based on the sequence named in the unique.sequence argument). Defaults to "clone.avg".
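To make the three encoding levels concrete, the following base R sketch applies each of them to a toy per-cell table. It is only an illustration under stated assumptions, not the function's internal code: the table, the clonotype_id column name, and the choice to keep the first cell per unique sequence are all hypothetical.

```r
# Toy per-cell table. `clonotype_id`, `PC_1` and `PC_2` are hypothetical
# column names; `VDJ_cdr3s_aa` mirrors the default of unique.sequence.
set.seed(1)
cells <- data.frame(
  clonotype_id = c("c1", "c1", "c2", "c2", "c2", "c3"),
  VDJ_cdr3s_aa = c("CARA", "CARA", "CARB", "CARC", "CARB", "CARD"),
  PC_1 = c(1, 3, 2, 4, 6, 5),
  PC_2 = c(0, 2, 1, 1, 4, 3),
  stringsAsFactors = FALSE
)

# "clone": keep one randomly chosen cell per clone.
# sample.int() on the index length avoids the sample(x) pitfall for length-1 x.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]
row_groups <- split(seq_len(nrow(cells)), cells$clonotype_id)
by_clone <- cells[unlist(lapply(row_groups, pick_one)), ]

# "clone.avg": average every feature over the cells of each clone.
by_clone_avg <- aggregate(cbind(PC_1, PC_2) ~ clonotype_id,
                          data = cells, FUN = mean)

# "unique.sequence": keep one cell per distinct sequence
# (assumption: the first occurrence is the one retained).
by_unique_seq <- cells[!duplicated(cells$VDJ_cdr3s_aa), ]
```

With "clone" and "clone.avg" the result has one row per clone, whereas "unique.sequence" yields one row per distinct sequence.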

unique.sequence

String. Only needs to be specified when encoding.level is set to "unique.sequence". The name of the sequence on which the unique selection is based. Defaults to "VDJ_cdr3s_aa".

which.features

String. Specifies which GEX features are encoded. Options are "varFeatures" (the 1000 most variable features obtained by Seurat::FindVariableFeatures) or "PCs" (the top n PCs, with n defined in n.PCs). Defaults to "PCs".

n.PCs

Integer. Number of PCs to use when which.features is set to "PCs". Maximum 50. Defaults to 20.
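As an illustration of what "the top n PCs" means as a feature set, the sketch below computes principal components on a random toy matrix and keeps the first n.PCs columns. It assumes base R's prcomp as a stand-in for the PCA stored in the GEX object, so it shows the idea rather than the function's actual implementation.

```r
set.seed(42)

# Toy expression matrix: 30 cells (rows) x 60 genes (columns), random values.
expr <- matrix(
  rnorm(30 * 60), nrow = 30,
  dimnames = list(paste0("cell_", 1:30), paste0("gene_", 1:60))
)

n.PCs <- 20
stopifnot(n.PCs >= 1, n.PCs <= 50)  # the documented maximum is 50

# prcomp() stands in for the PCA of the GEX object (assumption).
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Never request more components than the PCA actually produced.
n_use <- min(n.PCs, ncol(pca$x))

# One row per cell, one column per retained PC.
pc_features <- as.data.frame(pca$x[, seq_len(n_use), drop = FALSE])
```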

which.label

String. The name of the column in VGM[[2]] that is appended to the encodings and later used as the label in the chosen ML model. The label must be binary. If missing, no label is appended to the encoded features.

problem

String ("classification" or "regression"). Whether the returned matrix will be used in a classification or a regression problem. Defaults to "classification".

verbose.classes

Boolean. Whether to display information on the distribution of samples between classes. Defaults to TRUE. This parameter only has an effect when problem is set to "classification" (default).

platypus.version

This function works with "v3" only, so there is no need to set this parameter.

Value

A dataframe containing the encoded features and their label, with each row corresponding to a different cell. The label is in the last column of the returned dataframe. If which.label = "NA", only the encoded features are returned.
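Because the label sits in the last column, downstream code typically splits the returned object into a feature table and a label vector. The following sketch shows that layout on a hand-made stand-in; the column names and values are hypothetical and do not come from a real VGM.

```r
# Hand-made stand-in for the returned dataframe (values are hypothetical).
features <- data.frame(
  PC_1 = c(0.5, -1.2, 0.3, 2.1),
  PC_2 = c(1.0, 0.4, -0.7, 0.2)
)
label <- c("s1", "s2", "s1", "s1")

# The label has to be binary, i.e. take exactly two distinct values.
stopifnot(length(unique(label)) == 2)

# Append the label as the last column, mirroring the described layout.
encoded <- cbind(features, label = label)

# Class distribution, similar in spirit to what verbose.classes reports.
class_counts <- table(encoded[[ncol(encoded)]])

# Split into features (all but the last column) and label (last column).
X <- encoded[, -ncol(encoded), drop = FALSE]
y <- encoded[[ncol(encoded)]]
```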

Examples
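A minimal call might look like the sketch below. It assumes that a VGM object was created earlier with VDJ_GEX_matrix and that VGM[[2]] contains a binary column named "sample_id"; that column name is hypothetical and should be replaced with a label column from your own data. The call is guarded so that the snippet does nothing when these prerequisites are missing.

```r
# Runs only if a `VGM` object exists and the Platypus package is installed.
# "sample_id" is a hypothetical binary label column in VGM[[2]].
if (exists("VGM") && requireNamespace("Platypus", quietly = TRUE)) {
  features <- Platypus::PlatypusML_feature_extraction_GEX(
    VGM = VGM,
    encoding.level = "clone.avg",   # average expression per clone (default)
    which.features = "PCs",         # encode the top principal components
    n.PCs = 20,                     # default number of PCs
    which.label = "sample_id",      # appended as the last column
    problem = "classification",
    verbose.classes = TRUE
  )
  head(features)
}
```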