PROSITE Pattern syntax for PHI-BLAST searches:

Examples of repetition counts:

x(3) corresponds to x-x-x
x(2,4) corresponds to x-x or x-x-x or x-x-x-x
A(3) corresponds to A-A-A
Note: A range can only be used with 'x'; for example, A(2,4) is not a valid pattern element.
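The repetition rules above can be sketched in Python. The helper name expand_repeat is hypothetical and is not part of PHI-BLAST or PROSITE; it only illustrates how a count or range unrolls into hyphenated elements, assuming an element is 'x', a single uppercase letter, or a bracketed or braced residue set.

```python
import re

# Hypothetical helper: unroll one pattern element into its hyphenated forms.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def expand_repeat(element: str) -> list[str]:
    m = _ELEMENT.fullmatch(element)
    if m is None:
        raise ValueError(f"not a valid pattern element: {element!r}")
    unit, low, high = m.group(1), m.group(2), m.group(3)
    if low is None:
        # No repetition suffix: the element stands for itself.
        return [unit]
    if high is not None and unit != "x":
        # Ranges such as A(2,4) are rejected, as stated in the note above.
        raise ValueError("a range can only be used with 'x'")
    lo = int(low)
    hi = lo if high is None else int(high)
    if hi < lo:
        raise ValueError("range upper bound is smaller than lower bound")
    # One hyphenated expansion per allowed repeat count.
    return ["-".join([unit] * n) for n in range(lo, hi + 1)]

print(expand_repeat("x(3)"))    # ['x-x-x']
print(expand_repeat("x(2,4)"))  # ['x-x', 'x-x-x', 'x-x-x-x']
print(expand_repeat("A(3)"))    # ['A-A-A']
```

Calling expand_repeat("A(2,4)") raises ValueError in this sketch, which mirrors the restriction on ranges.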

When a pattern is restricted to the N- or C-terminus of a sequence, it starts with a '<' symbol or ends with a '>' symbol, respectively. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element: 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' is accepted.
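As a rough illustration of the '[G>]' case, the pattern 'F-[GSTV]-P-R-L-[G>]' can be written by hand as a Python regular expression in which the last element is either a G or the end of the sequence. The function name and the test sequences are made up for this sketch, and Python's re module is only an approximation of how PHI-BLAST itself evaluates patterns.

```python
import re

# Hand-written equivalent of 'F-[GSTV]-P-R-L-[G>]':
# the final element matches a G or the end of the sequence.
CTERM_ALTERNATIVE = re.compile(r"F[GSTV]PRL(?:G|$)")

def matches_example(sequence: str) -> bool:
    """Return True if the (hypothetical) sequence contains the pattern."""
    return CTERM_ALTERNATIVE.search(sequence) is not None

print(matches_example("MKFSPRLGAA"))  # True: G follows F-S-P-R-L
print(matches_example("MKFSPRL"))     # True: F-S-P-R-L ends the sequence
print(matches_example("MKFSPRLA"))    # False: neither alternative applies
```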

Extended syntax allowed:

If your pattern consists of one-letter amino acid codes only, without any ambiguous residues, you need not specify the '-', i.e. you can directly copy/paste peptide sequences into the text field.
Example: M-A-S-K-E can be written as MASKE.
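The shorthand is a mechanical rewrite, which the following Python sketch makes explicit. The function name hyphenate is hypothetical, and the check against the 20 standard one-letter codes is an assumption about what "without any ambiguous residues" means here.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed unambiguous codes

def hyphenate(peptide: str) -> str:
    """Rewrite a plain peptide string in explicit hyphenated pattern form."""
    if not peptide or not set(peptide) <= STANDARD_RESIDUES:
        raise ValueError("expected only standard one-letter amino acid codes")
    return "-".join(peptide)

print(hyphenate("MASKE"))  # M-A-S-K-E
```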

To search for all sequences that do not contain a certain amino acid, e.g. Cys, you can use <{C}*>.

Examples:

[AC]-x-V-x(4)-{ED}
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

< A-x-[ST](2)-x(0,1)-V
This pattern, which must be at the N-terminus of the sequence ('<'), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val

<{C}*>
This pattern describes all sequences that do not contain any cysteine residues.

IIRIFHLRNI
This pattern describes all sequences which contain the subsequence 'IIRIFHLRNI'.
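
The four examples above can be checked with a small translator from pattern syntax to Python regular expressions. This is a minimal sketch, not the PHI-BLAST implementation: the function name prosite_to_regex is hypothetical, it covers only the elements described on this page ('x', single residues, [..], {..}, counts, ranges, '<', '>', '[..>]' and the '*' extension), and it treats an uppercase 'X' as a literal letter.

```python
import re

# One element: x, a residue, [allowed], {forbidden}, then an optional
# (n), (n,m) or '*' suffix, then an optional '-' separator.
_TOKEN = re.compile(r"(x|[A-Z]|\[[A-Z]+>?\]|\{[A-Z]+\})(\(\d+(?:,\d+)?\)|\*)?-?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern into a Python regular expression (sketch)."""
    p = pattern.replace(" ", "")
    prefix = suffix = ""
    if p.startswith("<"):          # N-terminal anchor
        prefix, p = "^", p[1:]
    if p.endswith(">"):            # C-terminal anchor
        suffix, p = "$", p[:-1]
    parts, pos = [], 0
    while pos < len(p):
        m = _TOKEN.match(p, pos)
        if m is None:
            raise ValueError(f"cannot parse pattern at position {pos}: {p[pos:]!r}")
        unit, rep = m.group(1), m.group(2) or ""
        if "," in rep and unit != "x":
            raise ValueError("a range can only be used with 'x'")
        if unit == "x":
            core = "."                                # any residue
        elif unit.startswith("{"):
            core = "[^" + unit[1:-1] + "]"            # any but these
        elif unit.endswith(">]"):
            core = "(?:[" + unit[1:-2] + "]|$)"       # residue or sequence end
        else:
            core = unit                               # residue or [..] class
        if rep.startswith("("):
            rep = "{" + rep[1:-1] + "}"               # (n,m) becomes {n,m}
        parts.append(core + rep)
        pos = m.end()
    return prefix + "".join(parts) + suffix

for example in ["[AC]-x-V-x(4)-{ED}", "< A-x-[ST](2)-x(0,1)-V", "<{C}*>", "IIRIFHLRNI"]:
    print(example, "->", prosite_to_regex(example))
```

With this sketch, '<{C}*>' becomes '^[^C]*$', so re.search(prosite_to_regex("<{C}*>"), "MASKE") finds a match while the same call on "MACKE" returns None.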

 

Additional notes can be found in the Motifs and MotifSearch sections of the GCG manual.

N.B. Standard symbols for amino acids and nucleotides can be found in Appendix III of the GCG manual.