In a nutshell, what does ReKINect do?
ReKINect is a computational framework that predicts the likely functionality of mutations.
What do you mean by "likely functionality"? Can you be more specific?
ReKINect maps mutations to different functional residues and, by doing so, predicts not only how these residues might be affected but also the likely functional impact of each mutation on phosphorylation-based signaling networks (kinases, SH2 proteins and substrates). As with any computational prediction, some predictions may turn out to be false or to have a different effect than expected (e.g. if a mutation is heterozygous, which is especially relevant for inactivations unless there is haploinsufficiency, or if cells have adapted to the mutation, the predicted effect might be masked). Even so, we highly recommend ReKINect as a valuable resource for exploring potentially functional mutations.
Ok, so what types of functional residues are currently being used?
We are currently mapping onto 3 different types of functional residues:
A - Essential residues in protein domains (Kinase and SH2 domains).
B - Determinants of specificity in protein domains (Kinase and SH2 domains).
C - Phosphorylation sites in any protein.
And what kind of predictions can ReKINect produce based on this mapping?
Based on this information, ReKINect can predict five mutational functionalities:
2 - Kinase constitutive activation.
3 - Downstream rewiring (i.e. changes in domain specificity).
4 - Upstream rewiring (i.e. changes in the kinase or SH2 domain phosphorylating or binding the mutated peptide).*
5 - Destruction of phosphorylation sites.
*Upstream rewiring requires NetworKIN and/or NetPhorest predictions. If you are interested in this, you can either try to run them online or get in touch with us.
For further information on ReKINect and the methodology behind our predictions, please refer to our publication (Creixell et al. submitted).
Ok, all this sounds interesting, how can I submit my sequencing results?
As illustrated below, any input file submitted to ReKINect has to follow what we call the mutant fasta file format, where...
- every single amino-acid residue missense mutation is reported under a separate fasta header.
- every single amino-acid residue missense mutation has a paired reference wild-type sequence.
- there are special separators in the header that describe whether the sequence is wild type (e.g. "_reference") or mutant (e.g. ":xxx"). The protein identifier before these separators must be identical for each wild type sequence and all of its mutants, so that the wild type sequence can be recognized for each mutant.
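As a rough illustration of the three rules above, here is a minimal Python sketch that writes one wild-type entry and one entry per missense mutation. The function names, the "PROT1" identifier, the toy sequence and the use of a label such as "R3H" after the ":" separator are assumptions made for the example; only the "_reference" and ":" separators come from the description above.

```python
def apply_missense(wild_type: str, mutation: str) -> str:
    """Apply a missense mutation such as 'R3H' (1-based position) to a sequence."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wild_type[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {wild_type[pos - 1]}"
        )
    return wild_type[:pos - 1] + alt + wild_type[pos:]


def mutant_fasta(protein_id: str, wild_type: str, mutations: list) -> str:
    """Build fasta text: one reference entry, then one entry per mutation.

    The same protein_id is used before both separators so that each mutant
    can be paired with its wild-type sequence.
    """
    records = [f">{protein_id}_reference", wild_type]
    for mutation in mutations:
        # Hypothetical label after ':'; the source only shows ':xxx'.
        records.append(f">{protein_id}:{mutation}")
        records.append(apply_missense(wild_type, mutation))
    return "\n".join(records) + "\n"


# Toy example with a made-up identifier and sequence.
print(mutant_fasta("PROT1", "MKRLT", ["R3H", "T5A"]))
```

The sketch checks that the reference residue named in each mutation is actually present at that position, which catches coordinate mismatches before the file is submitted.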
But my sequencing results are in vcf format. How can I generate a mutant fasta file from a vcf file?
Ideally, you (or somebody else with some degree of bioinformatics expertise) should be able to map the coordinates of every mutation in your vcf file to a unique canonical protein sequence. This can be done using Ensembl's VEP (Variant Effect Predictor) service. The ProteinSeqs plugin would be particularly helpful in this case.
Why is it critical that every mutation only maps to a unique canonical protein?
In order to avoid the incorrect mapping of phosphorylation sites, ReKINect matches its large set of known phosphorylation peptides to all the reference (i.e. wild type) sequences given. Being as conservative as possible, any peptide that doesn't match any reference protein or that matches several reference proteins at the same time will be discarded. Thus, introducing several isoforms of the same protein as separate entries would lead to many phosphorylation sites being discarded from the analysis, due to them matching more than one reference sequence at the same time.
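The conservative matching rule described above can be sketched in a few lines of Python. This is not ReKINect's actual implementation, only an illustration of the rule; the function name, the sequence names and the toy sequences are all hypothetical.

```python
def assign_peptides(peptides, references):
    """Map each peptide to the single reference sequence that contains it.

    Peptides found in no reference, or in more than one reference,
    are discarded rather than assigned.
    """
    assigned = {}
    discarded = []
    for peptide in peptides:
        hits = [name for name, seq in references.items() if peptide in seq]
        if len(hits) == 1:
            assigned[peptide] = hits[0]
        else:
            discarded.append(peptide)
    return assigned, discarded


# Two made-up isoforms of the same protein share the peptide "AASPEK".
references = {
    "KIN1_iso1": "MAAASPEKLT",
    "KIN1_iso2": "MAAASPEKQQ",
    "SUB1": "GGRSYDLLK",
}
assigned, discarded = assign_peptides(["AASPEK", "RSYDL", "WWWWW"], references)
print(assigned)
print(discarded)
```

In this toy case the peptide shared by both isoforms is discarded along with the peptide that matches nothing, which is why submitting a single canonical sequence per protein keeps more phosphorylation sites in the analysis.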
And in what format will I get my results back?
ReKINect output files contain a large amount of information for any mutation that maps onto kinases, SH2 proteins or around a phosphorylation site (mutations not mapping onto any of these will not appear in the output file). This information is reported as one line per mutation, using the following structure:
- The first column contains the original fasta header for the mutant protein (so that one can back-track to the protein sequence).
- The second column contains a summary of the interpretation and the final prediction given by ReKINect (this is probably the most important column and some users may not be interested in the more detailed information given in subsequent columns).
- The third, fourth and fifth columns contain phosphorylation site information: whether the mutation hits a residue near a phosphorylation site (from -5 to +5), whether it destroys the phosphorylation site (hits the phosphorylation site itself) or whether it is predicted to mimic the phosphorylation state of the protein (acidic mutation in positions -1, 0 or +1), respectively.
- Columns six to twelve contain kinase information about whether the mutation hits a kinase protein (column 6), and if so, its name (column 7); about whether it hits the kinase domain of the kinase protein (column 8), and if so, what position of the kinase domain alignment it hits (column 9); and finally, about whether it hits an essential residue of the kinase domain (column 10), a determinant of specificity (column 11) or the activation segment (column 12) of the kinase domain.
- And last, columns thirteen to eighteen contain SH2 information about whether the mutation hits an SH2 protein (column 13), and if so, its name (column 14); about whether it hits the SH2 domain of the SH2 protein (column 15), and if so, what position of the SH2 domain alignment it hits (column 16); and finally, about whether it hits an essential residue of the SH2 domain (column 17) or a determinant of specificity (column 18).
*Please note when trying to refer to columns that the numbering in the header of the output file starts from 0.
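To keep the two numbering schemes apart, the column layout described above can be written down as a small Python lookup. This is a sketch under assumptions: the descriptive column names are invented for the example (they are not the names used in the real header), and the tab delimiter is assumed rather than documented here.

```python
# Hypothetical descriptive names, in the order given in this FAQ (columns 1-18).
COLUMNS = [
    "fasta_header",
    "prediction_summary",
    "near_phosphosite",
    "destroys_phosphosite",
    "phosphomimic",
    "hits_kinase",
    "kinase_name",
    "hits_kinase_domain",
    "kinase_alignment_position",
    "hits_kinase_essential_residue",
    "hits_kinase_specificity_determinant",
    "hits_kinase_activation_segment",
    "hits_sh2_protein",
    "sh2_name",
    "hits_sh2_domain",
    "sh2_alignment_position",
    "hits_sh2_essential_residue",
    "hits_sh2_specificity_determinant",
]


def header_index(faq_column: int) -> int:
    """Convert a 1-based column number from this FAQ to the 0-based header number."""
    if not 1 <= faq_column <= len(COLUMNS):
        raise ValueError(f"column must be between 1 and {len(COLUMNS)}")
    return faq_column - 1


def parse_line(line: str, delimiter: str = "\t") -> dict:
    """Split one output line into a dict keyed by the names above.

    The tab delimiter is an assumption; adjust it to match the actual file.
    """
    fields = line.rstrip("\n").split(delimiter)
    return dict(zip(COLUMNS, fields))


# Column 12 in the text above is numbered 11 in the output file header.
print(header_index(12), COLUMNS[header_index(12)])
```

Most users will only need the first two entries, the fasta header and the prediction summary, with the remaining fields available for a closer look at individual mutations.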
(C) 2013-2015 Pau Creixell and Rune Linding / Web-development: Pau Creixell and Xavier Robin