This function takes the output of the VDJ_call_MIXCR function as input in the VDJ.mixcr.out argument and predicts antibody structures with AlphaFold. The full-length VDJ and VJ sequences in the VDJ.mixcr.out object, which contain all frameworks and CDRs, are used to predict the structure of the variable region with AlphaFold-Multimer. If the user has no access to the Euler cluster, the function only returns a FASTA file with the VDJ and VJ sequences, which can be used to run AlphaFold on another cluster. For users with a login to the Euler cluster, the function automatically connects to Euler and starts AlphaFold for all indicated sequences. After the prediction has finished, the same function can be used to import the predicted structures as pdb files and add them to the input as a list object.

AlphaFold_prediction(
  VDJ.mixcr.out,
  cells.to.predict,
  max.template.date,
  dir.name,
  fasta.storage.path,
  euler.user.name,
  rm.local.fasta,
  import,
  import.local.path,
  import.local.dirnames,
  euler.dirname,
  euler.dirpath,
  n.ranked,
  rm.euler.files,
  rm.local.output,
  output.path,
  antigen.fasta.path,
  fasta.directory.path,
  platypus.version
)

Arguments

VDJ.mixcr.out

The output of the VDJ_call_MIXCR function. It must contain the VJ_aa_mixcr and VDJ_aa_mixcr columns, which hold the full-length amino acid sequences spanning Framework 1 to Framework 4.

cells.to.predict

Here you can specify the 10x barcodes of the cells in VDJ.mixcr.out that should be used for structure prediction. Set it to "ALL" to predict the antibody structures of all cells.
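The selection described above can be sketched in base R as follows. This is a minimal illustration and not the function's actual implementation: the toy data frame, the shortened placeholder sequences, the barcode column name, and the helpers select_cells() and write_cell_fasta() are assumptions made for this example. Only the VDJ_aa_mixcr and VJ_aa_mixcr column names come from the documentation above.

```r
# Toy stand-in for VDJ.mixcr.out; the sequences are shortened placeholders.
# The `barcode` column name is an assumption for this sketch.
vdj <- data.frame(
  barcode      = c("s4_AGCCTAATCCCTTGCA-1", "s4_CCCATACCACGTTGGC-1"),
  VDJ_aa_mixcr = c("EVQLVESGGG", "QVQLQQSGAE"),
  VJ_aa_mixcr  = c("DIQMTQSPSS", "EIVLTQSPAT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every row for "ALL", otherwise only the listed barcodes.
select_cells <- function(vdj, cells.to.predict = "ALL") {
  if (identical(cells.to.predict, "ALL")) return(vdj)
  vdj[vdj$barcode %in% cells.to.predict, , drop = FALSE]
}

# Hypothetical helper: write one two-chain FASTA file (VDJ + VJ) per cell.
write_cell_fasta <- function(vdj, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(nrow(vdj))
  for (i in seq_len(nrow(vdj))) {
    paths[i] <- file.path(dir, paste0(vdj$barcode[i], ".fasta"))
    writeLines(c(
      paste0(">", vdj$barcode[i], "_VDJ"), vdj$VDJ_aa_mixcr[i],
      paste0(">", vdj$barcode[i], "_VJ"),  vdj$VJ_aa_mixcr[i]
    ), paths[i])
  }
  paths
}

selected    <- select_cells(vdj, "s4_AGCCTAATCCCTTGCA-1")
fasta_files <- write_cell_fasta(selected, file.path(tempdir(), "AlphaFold_Fasta"))
```

Writing both chains into one file per cell mirrors the idea of a multi-chain prediction input; the record names used here are illustrative only.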

max.template.date

This is a parameter for running AlphaFold. A date can be specified in the following format: "yyyy-mm-dd". It tells AlphaFold which state of the databases it shall use.
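Before starting a long prediction run, it can help to check that the date string really has the "yyyy-mm-dd" format. The helper below is a hypothetical sketch in base R and is not part of the package; whether the function itself performs such a check is not documented here.

```r
# Hypothetical helper: TRUE only for a single string that is a real
# calendar date written as yyyy-mm-dd.
is_valid_template_date <- function(x) {
  is.character(x) && length(x) == 1 &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_valid_template_date("2022-01-01")  # TRUE
is_valid_template_date("01.01.2022")  # FALSE: wrong layout
is_valid_template_date("2022-13-01")  # FALSE: month 13 does not exist
```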

dir.name

By default the function creates a directory named AlphaFold_Fasta, where the FASTA files created for prediction are stored. The name of this directory can be changed by specifying the dir.name argument.

fasta.storage.path

Here you can specify where the function saves the FASTA files needed as AlphaFold input. By default, an 'AlphaFold_Fasta' directory containing all the FASTA files is created in the directory in which the R script runs.

euler.user.name

If running AlphaFold on Euler is requested, the user name needs to be specified in this parameter. Make sure that you have access to GPU usage on the cluster. You will be prompted to enter your password by the "ssh" package, which handles your credentials in a safe manner.

rm.local.fasta

Here you can specify if the local AlphaFold_Fasta directory shall be deleted from your local computer after it has been uploaded to the scratch on Euler. By default it is set to TRUE to keep your environment clean. If the function is not used in Euler mode, it is set to FALSE, so you will have the FASTA files as output.

import

This argument tells the function to import predicted structures. It is set to FALSE by default, which initiates prediction rather than import. There are two options for importing predicted structures: import = "euler" will open a connection to Euler and import the pdb files from the "AlphaFold_Fasta/output" directory. import = "local" will import the pdb files from a local directory.

import.local.path

If import = "local" is used, you can specify the path to the AlphaFold_Output directory here. By default it is expected in the same directory as the R script runs.

import.local.dirnames

If import = "local" is used, the function expects a directory named 'Output_AlphaFold' in the same directory as the script runs. If you do not want to import the pdb files of all samples in the 'Output_AlphaFold' directory, you can specify the sub directories to import in the import.local.dirnames parameter. (import.local.dirnames = c("s4_AGCCTAATCCCTTGCA-1_ranked", "s4_CCCATACCACGTTGGC-1_ranked", ...))

euler.dirname

If import = "euler" is used, the name of the directory containing the AlphaFold output directory can be specified in euler.dirname. It is set to "AlphaFold_Fasta" by default and is expected to be on your scratch.

euler.dirpath

If import = "euler" is used, the path to the directory containing the AlphaFold output folder can be specified in euler.dirpath. By default the function expects the output in the AlphaFold_Fasta directory on your scratch. If you want to import the data from a different location, you can specify the path here. The function expects a sub directory named output, which contains sub directories named after the specific barcodes. (../scratch/AlphaFold_Fasta/output/s4_AGCCTAATCCCTTGCA-1/)

n.ranked

AlphaFold returns 21 predictions for each sequence, which are ranked from 0 to 20. ranked_0 is the most accurate according to the model. Here you can specify how many of the top-ranked structures are added to the output object. By default only the most accurate structure, 'ranked_0.pdb', is integrated.
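The relationship between n.ranked and the file names can be sketched as follows. The helper ranked_pdb_names() and the mock output directory are assumptions made for illustration; only the 'ranked_0.pdb' naming scheme and the barcode-named sub directory are taken from the descriptions above.

```r
# Hypothetical helper: file names of the top n.ranked models (ranks start at 0).
ranked_pdb_names <- function(n.ranked = 1) {
  stopifnot(n.ranked >= 1)
  sprintf("ranked_%d.pdb", seq_len(n.ranked) - 1L)
}

ranked_pdb_names(3)  # "ranked_0.pdb" "ranked_1.pdb" "ranked_2.pdb"

# Mock output directory for one barcode, filled with empty placeholder files.
out_dir <- file.path(tempdir(), "output", "s4_AGCCTAATCCCTTGCA-1")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(out_dir, ranked_pdb_names(3))))

# Keep only the two best-ranked files that actually exist on disk.
wanted <- file.path(out_dir, ranked_pdb_names(2))
top    <- wanted[file.exists(wanted)]
```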

rm.euler.files

Here you can specify if the files on Euler shall be deleted after importing them. It is set to FALSE by default to reduce the risk of unintentionally deleting the predictions. However, make sure to keep your scratch environment clean.

rm.local.output

Here you can specify if the output folder downloaded from Euler shall be deleted after the import. It is set to TRUE by default to keep your environment clean.

output.path

If the data is downloaded from the cluster it is by default stored in a sub folder in the current directory. If the data should be downloaded at a different location this can be specified in the output.path.

antigen.fasta.path

It can be of interest to predict the antibody structure together with the antigen to examine their interaction. For this purpose, a path to a FASTA file containing the amino acid sequence of the antigen can be specified in the antigen.fasta.path argument. This will add the antigen sequence to every antibody prediction.
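One way to picture "adding the antigen sequence to every antibody prediction" is appending the antigen record to each antibody FASTA file. The sketch below is an assumption about the general idea, not the function's documented behaviour; append_antigen(), the file names, and the placeholder sequences are hypothetical.

```r
# Hypothetical helper: append the antigen record(s) to an antibody FASTA file.
append_antigen <- function(antibody.fasta, antigen.fasta.path) {
  antigen <- readLines(antigen.fasta.path)
  antigen <- antigen[nzchar(antigen)]   # drop empty lines
  con <- file(antibody.fasta, open = "a")
  on.exit(close(con))
  writeLines(antigen, con)
  invisible(antibody.fasta)
}

# Placeholder input files in a temporary directory.
ab_file <- file.path(tempdir(), "antibody_example.fasta")
ag_file <- file.path(tempdir(), "antigen_example.fasta")
writeLines(c(">cell1_VDJ", "EVQLVESGGG", ">cell1_VJ", "DIQMTQSPSS"), ab_file)
writeLines(c(">antigen", "MKTAYIAKQR"), ag_file)

append_antigen(ab_file, ag_file)
combined <- readLines(ab_file)   # antibody chains followed by the antigen
```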

fasta.directory.path

The prediction function can also be used to predict structures directly from amino acid FASTA files without specifying the VDJ.mixcr.out argument. For this, the path to a directory containing all the FASTA files of interest can be specified in the fasta.directory.path argument. The files just need to have the .fasta extension. If multiple FASTA files are in the directory, the function will predict all of them separately.
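To check which files in a directory would qualify under the .fasta extension rule, a listing like the following can be used. It is a minimal base R sketch with a mock directory and placeholder sequences; how the function itself discovers the files is not documented here.

```r
# Mock directory with two FASTA files and one unrelated file.
fasta_dir <- file.path(tempdir(), "my_fastas")
dir.create(fasta_dir, showWarnings = FALSE)
writeLines(c(">seq1", "EVQLVESGGG"), file.path(fasta_dir, "a.fasta"))
writeLines(c(">seq2", "DIQMTQSPSS"), file.path(fasta_dir, "b.fasta"))
writeLines("not a sequence", file.path(fasta_dir, "notes.txt"))

# Only files whose name ends in .fasta are picked up.
fasta_inputs <- list.files(fasta_dir, pattern = "\\.fasta$", full.names = TRUE)
basename(fasta_inputs)
```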

platypus.version

This function does not directly depend on other Platypus functions but was developed to be compatible with v3.

Value

This function returns a list with the VDJ.mixcr.out as the first element and a list of pdb files as the second element.

Note

For running AlphaFold on Euler, the user needs to have access to GPU usage. This is automatically activated for members of the Reddy Euler Group.

If running prediction on Euler, the function will create an "AlphaFold_Fasta" directory in your scratch on the cluster, to which all the FASTA files are uploaded. The output files will be saved in this directory as well.

Examples