"Active site" Record
The Active site record is applied to residues of enzymes known or
thought to
function in the actual catalytic reaction of the enzyme. It should be
applied
to a single residue or a short list of residues; it should not be
applied to a
range (a hyphenated pair). If the active site residues are not
specifically
known but have been localized to a segment of the sequence, the
"Region" record
rather than the "Active site" record should be used. "Active
site" features in
entries without an Enzyme Commission notation in either their title or
"Contains" records are suspect and will be flagged as possible errors.
The format for the "Active site" record is
"Active site: " res ["," res...]
["(" description ")"] ["#link " link] "#status " status
The status is required for this feature. All the residues participating in an active site that do not
require different modifiers should be combined in the same feature. Do
not combine residues from different active sites or residues that need different
modifiers. The use of description fields, discussed below, should be avoided if
possible.
Examples:
Active site: Arg #status experimental
Active site: Asp, His, Ser #status predicted
Active site: His, His, Asp #status experimental
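As a rough illustration of the record format given above, the following Python sketch parses one record. The function name parse_active_site and the regular expression are assumptions made for this example and are not taken from any PIR software; the sketch assumes three-letter residue codes and a description without nested parentheses.

```python
import re

# Illustrative pattern only; it is not taken from any PIR software.
_RES = r"[A-Z][a-z]{2}"
_ACTIVE_SITE = re.compile(
    r"^Active site:\s*"
    r"(?P<residues>" + _RES + r"(?:\s*,\s*" + _RES + r")*)"
    r"(?:\s*\((?P<description>[^()]+)\))?"   # optional description
    r"(?:\s*#link\s+(?P<link>\S+))?"         # optional link tag
    r"\s*#status\s+(?P<status>\S+)\s*$"      # status is required
)


def parse_active_site(record):
    """Split an "Active site" record into its parts, or raise ValueError."""
    match = _ACTIVE_SITE.match(record.strip())
    if match is None:
        raise ValueError("not a well-formed Active site record: %r" % record)
    return {
        "residues": re.split(r"\s*,\s*", match.group("residues")),
        "description": match.group("description"),
        "link": match.group("link"),
        "status": match.group("status"),
    }
```

A record without a status, such as "Active site: Arg" on its own, is rejected by this sketch because the status is required.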
A residue list may be used only for those residues which participate in
the same concerted catalytic reaction. If all the residues participating in
one active site are the same type, then only one residue need be shown.
Enzymes recognized to have several distinct catalytic reactions should have an
"Active site" record for each active site. Multiple "Active site"
records for what is, in fact, a single active site should be combined into one record
using a list of residues, unless different status conditions apply.
Formerly, mechanisms were presented, but this should no
longer be done except when the mechanism is used as a description. Generally such a
description should be applied only when multiple active sites occur in
the same
entry. In particular, the description "charge relay
system" should not be used except in enzymes with multiple
activities.
Examples currently used are:
Active site: Cys (amide transfer)
Active site: Cys (of 3-oxoacyl-[acyl-carrier-protein] synthase)
Active site: Lys (of 3-oxoacyl-[acyl-carrier-protein] reductase)
Active site: Lys (of enoyl-[acyl-carrier-protein] reductase)
Active site: Ser (of enoyl-[acyl-carrier-protein] reductase)
Active site: Ser (of oleoyl-[acyl-carrier-protein] hydrolase)
Active site: Ser (of [acyl-carrier-protein] acetyl/malonyltransferase)
Active site: Glu (alpha-reaction)
Active site: His, Lys, Cys (beta-reaction)
Descriptors like these may be replaced with "#link"
modifiers which point to tags in appropriate Function records, or Domain or Product
features.
Active site: Cys #link ARD #status predicted
Here the link "ARD" points to a Function record with the tag
"<ARD>". This
mechanism will also be used to link active site records with different
status conditions but which belong to the same active site object.
When a residue has a stable, covalently-bound, catalytically-active
prosthetic group, only the "Binding site: ... (covalent)" feature should be
used. An "Active site" record should not also be used because it is the
prosthetic group which is active and not the amino acid as such. In particular, for an
active site phosphoserine only the annotation:
Binding site: phosphate (Ser) (covalent) #status experimental
should appear. When a residue forms a transient, covalent bond in its
role as an active site then the "Active site" record should be used and
the description field may be used. The nature of the intermediate should be made as
clear as practical. Annotators should consider carefully whether a
covalently-bound group is stable or transient in determining whether an annotation should
be for a modified or an active site. The following possible features show
active sites with transient groups that could easily be confused with a binding
site.
Active site: Ser (phosphoserine intermediate)
Active site: Tyr (phosphotyrosine intermediate)
No examples yet exist of the second feature. Other currently acceptable
examples are:
Active site: Asp (aspartylphosphate intermediate)
Active site: Cys (phosphocysteine intermediate)
Active site: Cys (S-acetylcysteine intermediate)
Active site: Cys (sulfocysteine intermediate)
Active site: His (phosphohistidine intermediate)
Active site: Lys (ribulose-bisphosphate-binding)
Most of these features are documented in the RESID database. Avoid
records that are unnecessarily detailed or are synonymous with existing
features, like:
Active site: His (covalent intermediate)
Active site: Asp (phosphate-binding)
Be particularly suspicious of claims that Gly, Val, Leu, Ile, Pro, Asn,
Gln, Met or Phe residues are active site residues. It is chemically
dubious that such residues function in the actual catalytic reaction of an
enzyme. Glycine and a few other residues can form free radicals that participate
in free radical reactions, but for physical reasons such reactions are
extremely rare in biochemistry.
Current examples are:
Active site: Cys (cysteine thiyl radical intermediate)
Active site: Gly (stable glycyl radical)
Active site: Trp (tryptophyl radical intermediate)
Active site: Tyr (stable tyrosyl radical)
These features are documented in the RESID database.
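The residue screen described above can be sketched as a small check. This is a hypothetical helper, not an existing PIR tool: the name flag_suspect_residues is invented here, and skipping the check when the description mentions a radical is an assumption based on the radical examples listed above.

```python
# Residues the guideline calls chemically dubious as active site residues.
SUSPECT_RESIDUES = frozenset(
    {"Gly", "Val", "Leu", "Ile", "Pro", "Asn", "Gln", "Met", "Phe"}
)


def flag_suspect_residues(residues, description=None):
    """Return the residues that deserve a second look by an annotator.

    Assumption for this sketch: a description mentioning a radical
    (for example "stable glycyl radical") marks a documented exception.
    """
    if description is not None and "radical" in description:
        return []
    return [res for res in residues if res in SUSPECT_RESIDUES]
```

A flagged residue is only a prompt for review; the guideline asks for suspicion, not automatic rejection.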
Residues that are structurally located near an active site but do not
participate directly in the catalytic reaction of that active site
should not be annotated in the PIR databases. Annotations for such residues will
only be carried from PDB entries in the NRL_3D database. Not all reactive
compounds that block an enzymatic reaction wind up reacting with an active site
residue; they may react with a residue near the active site and block the
substrate's access to the active site. Such a residue is more of a "reactive
site" than an "active site", so be cautious about accepting such a reaction as experimental evidence
for active site residues.
For cysteine residues that form catalytically active disulfide bonds
only the annotation
Disulfide bonds: redox-active
should appear.
Even though selenocysteine may function as an active site, only the
feature
Modified site: selenocysteine
should be used.
Residues that participate in allosteric control of enzyme activity but
are not catalytically active should not be annotated as active sites but as
binding sites or as regions. Residues that participate in different, symmetry-related active
sites of complexes should not be combined in the same feature, but an
appropriate description should be used to indicate the relationship.
Active site: Asp (shared with dimeric partner)
Active site: Cys (shared with dimeric partner)
These features imply that there are two symmetry-related active sites.
Each site consists of an aspartate and a cysteine contributed by different
chains of the homodimer.
The annotation
Active site: ... inhibitory ...
should not be used. Instead, use the annotation
"Inhibitory site: ".
Do not use the term "active site" in either
"Domain" or "Region" features. Instead,
use the term "catalytic".
General Definitions for Binding Sites and Modified Sites
In binding sites and modified sites, the following definitions are very
important. Because they include historical accidents and
grammatical exigencies, these are operational definitions and do not
necessarily extend beyond the purposes of this document.
Generally, an attachment site is an amino acid residue which has its
side chain chemically changed post-translationally in such a way that it could be
restored by physiological processes of hydrolysis, ammonolysis or simple (2H)
reduction. Such chemical changes may occur transiently, or more or less
permanently, but they must be covalent. The idea is that attachment site residues
could in principle be recovered and detected by typical methods of sequence
analysis, whereas modified sites could not be.
The "Binding site" feature includes two classes, attachment sites
and binding sites. A "binding site" is an amino acid residue, or
a group of them, that forms biochemically important, non-covalent bonds
with ions or molecules (other than the protein constituting the entry).
These bonds may be ionic, ligand (dative), van der Waals, or donative or
receptive hydrogen bonds. One borderline case is the sulfur-metal bond, which will be regarded as
covalent for cysteine when a cluster of atoms is bound, and non-covalent (dative ligand) otherwise.
Methionine sulfur-metal bonds will be regarded as non-covalent (dative ligand). Attachment sites
will be distinguished by using "(covalent)" in "Binding site"
records. All new "Binding sites" without "(covalent)" are
reviewed and subject to conversion. Consequently it is very important for annotators
to provide the "(covalent)" designation in every case where it
should be applied.
A "modified site" is an amino acid residue which is any of the following:
- chemically changed post-translationally in such a way that it could
not be restored by physiological processes of hydrolysis, ammonolysis or simple
(2H) reduction (that is, it is not a side-chain attachment site),
- chemically changed in any way involving the alpha amino group,
including N-formylmethionine (this applies to both the amino terminus and internal
residues),
- a carboxyl terminal residue with any chemical change involving the
alpha-carboxyl group,
- a selenocysteine residue (these are translationally incorporated but
for historical reasons are regarded as modified cysteine residues), or
- aspartate or glutamate esters that can arise from either the acid or
the amide forms.
"Binding Site" Record
Using the foregoing definitions, "Binding site" records are
applied in two cases:
- when an amino acid residue, or a group of them, forms biochemically
important, non-covalent bonds with ions or molecules (other than the
protein constituting the entry); or
- when an amino acid residue forms an attachment site in which its
side chain is chemically changed post-translationally in such a way that it
could in principle be restored by physiological processes. Such cases
must have a "(covalent)" bond description.
The format for the "Binding site" record is
"Binding site:" ["(or" position ")"]
bound-group name "(" res ["," res...] ")"
["(covalent)" | "(" bonding description ")"]
["(" form ")"]
["(partial)"] ["#link " link] "#status " status
The status is required for this feature.
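To make the "Binding site" format above concrete, here is a minimal Python sketch of a parser. The name parse_binding_site, the pattern, and the returned dictionary keys are all assumptions made for illustration; the sketch also assumes that the alternate position is a plain number and that bound-group names contain no parentheses.

```python
import re

# Illustrative pattern only; it is not taken from any PIR software.
_RES = r"[A-Z][a-z]{2}"
_BINDING_SITE = re.compile(
    r"^Binding site:\s*"
    r"(?:\(or\s+(?P<alt>\d+)\)\s*)?"               # optional alternate locant
    r"(?P<group>[^()#]+?)\s*"                      # bound-group name
    r"\((?P<residues>" + _RES + r"(?:\s*,\s*" + _RES + r")*)\)"
    r"(?P<modifiers>(?:\s*\([^()]*\))*)"           # (covalent), (partial), ...
    r"(?:\s*#link\s+(?P<link>\S+))?"
    r"\s*#status\s+(?P<status>\S+)\s*$"            # status is required
)


def parse_binding_site(record):
    """Split a "Binding site" record into its parts, or raise ValueError."""
    match = _BINDING_SITE.match(record.strip())
    if match is None:
        raise ValueError("not a well-formed Binding site record: %r" % record)
    modifiers = re.findall(r"\(([^()]*)\)", match.group("modifiers"))
    return {
        "alternate_position": match.group("alt"),
        "bound_group": match.group("group"),
        "residues": re.split(r"\s*,\s*", match.group("residues")),
        "covalent": "covalent" in modifiers,
        "partial": "partial" in modifiers,
        "modifiers": modifiers,
        "link": match.group("link"),
        "status": match.group("status"),
    }
```

The sketch treats every parenthesized group after the residue list as a modifier and only singles out "covalent" and "partial"; telling a bonding description from a form descriptor is left to the caller.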
Currently acceptable covalent examples are listed below. The status, link
and partial descriptors have been removed, and a few minor variants have
been eliminated. Most of these features are documented in the Residues
database.
Binding site: 2Fe-2S cluster (Cys) (covalent)
Binding site: 2Fe-2S cluster (Cys, His, Cys, His) (covalent)
Binding site: 3Fe-4S cluster (Cys) (covalent)
Binding site: 4-hydroxycinnamyl (Cys) (covalent)
Binding site: 4Fe-4S cluster (Cys) (covalent)
Binding site: 4Fe-4S cluster (Cys) (covalent) (shared with dimeric partner)
Binding site: 4Fe-4S cluster 1 (Cys) (covalent)
Binding site: 4Fe-4S cluster 2 (Cys) (covalent)
Binding site: AMP (Tyr) (covalent)
Binding site: UMP (Tyr) (covalent)
Binding site: acetyl (Lys) (covalent)
Binding site: biotin (Lys) (covalent)
Binding site: carbohydrate (Asn) (covalent)
Binding site: carbohydrate (Asn) (covalent) (in ...)
Binding site: carbohydrate (Cys) (covalent)
Binding site: carbohydrate (Lys) (covalent)
Binding site: carbohydrate (Ser) (covalent)
Binding site: carbohydrate (Thr) (covalent)
Binding site: carbohydrate (Trp) (covalent)
Binding site: carbohydrate (Tyr) (covalent)
Binding site: carbon dioxide (Lys) (covalent) (by ...)
Binding site: chondroitin sulfate (Ser) (covalent)
Binding site: cysteine (Cys) (covalent)
Binding site: cysteine (Cys) (covalent) (in ...)
Binding site: dermatan sulfate (Ser) (covalent)
Binding site: farnesyl (Cys) (covalent)
Binding site: fatty acid (Ser) (covalent)
Binding site: fatty acid (Thr) (covalent)
Binding site: formyl (Lys) (covalent)
Binding site: geranyl-geranyl (Cys) (covalent)
Binding site: glutathione (Cys) (covalent)
Binding site: glycerylphosphorylethanolamine (Glu) (covalent)
Binding site: heme (Cys) (covalent)
Binding site: heme (Glu) (covalent)
Binding site: heme, high potential (Cys) (covalent)
Binding site: heme, low potential (Cys) (covalent)
Binding site: heparan sulfate (Ser) (covalent)
Binding site: homocitryl Mo-7Fe-8S cluster (Cys) (covalent)
Binding site: keratan sulfate (Thr) (covalent)
Binding site: lipoamide (Lys) (covalent)
Binding site: methyl (Cys) (covalent)
Binding site: molybdopterin (Cys) (covalent)
Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)
Binding site: murein (Lys) (covalent)
Binding site: myristate (Lys) (covalent)
Binding site: nitrosonium (Cys) (covalent)
Binding site: palmitate (Cys) (covalent)
Binding site: palmitate (Lys) (covalent)
Binding site: phosphate (Arg) (covalent)
Binding site: phosphate (Asp) (covalent)
Binding site: phosphate (His) (covalent)
Binding site: phosphate (His) (covalent) (by ...)
Binding site: phosphate (Ser) (covalent)
Binding site: phosphate (Ser) (covalent) (by ...)
Binding site: phosphate (Ser) (covalent) (in ...)
Binding site: phosphate (Thr) (covalent)
Binding site: phosphate (Thr) (covalent) (by ...)
Binding site: phosphate (Tyr) (covalent)
Binding site: phosphate (Tyr) (covalent) (by ...)
Binding site: phosphopantetheine (Ser) (covalent)
Binding site: phosphoribosyl dephospho-coenzyme A (Ser) (covalent)
Binding site: phosphoryl-DNA (Ser) (covalent)
Binding site: phosphoryl-DNA (Thr) (covalent)
Binding site: phosphoryl-DNA (Tyr) (covalent)
Binding site: phosphoryl-RNA (Ser) (covalent)
Binding site: phosphoryl-RNA (Tyr) (covalent)
Binding site: phycocyanobilin (Cys) (covalent)
Binding site: phycoerythrobilin (Cys) (covalent)
Binding site: phytochromobilin (Cys) (covalent)
Binding site: polyglutamate (Glu) (covalent)
Binding site: polyglycine (Glu) (covalent)
Binding site: pyridoxal phosphate (Lys) (covalent)
Binding site: retinal (Lys) (covalent)
Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
Binding site: sn-2,3-diphytanylglycerol diether (Cys) (covalent)
Binding site: sulfate (Tyr) (covalent)
Binding site: vanadium cofactor (Cys) (covalent)
Binding site: iron-sulfur clusters (Cys) (covalent)
[use this only when the cluster form has not been determined
and cannot be predicted]
The "(by ...)" descriptor takes a wide variety of forms.
Please consult the database to determine currently used forms.
Examples of currently acceptable "Binding site" features not
labeled "covalent" are listed below. The residue lists (in all but a few cases), status, link
and partial descriptors have been removed, and a few minor variants have
been eliminated.
[the following with one locant]
Binding site: heme iron (His) (axial ligand)
Binding site: heme iron (His) (axial ligand) (shared with alpha chain)
Binding site: heme iron (His) (axial ligand) (shared with beta chain)
[the following with two locants]
Binding site: heme iron (His) (axial ligands)
Binding site: heme iron (His) (proximal axial ligand)
Binding site: heme iron (His, Met) (axial ligands)
Binding site: heme iron (Met, His) (axial ligands)
Binding site: heme iron (Tyr) (axial ligand)
Binding site: heme iron, high potential (His) (axial ligand)
Binding site: heme iron, high potential (His) (axial ligands)
Binding site: heme iron, high potential (His, Met) (axial ligands)
Binding site: heme iron, high potential (His, Tyr) (axial ligands)
Binding site: heme iron, low potential (His) (axial ligand)
Binding site: heme iron, low potential (His) (axial ligands)
Binding site: heme iron, low potential (His, Tyr) (axial ligands)
Binding site: heparin
Binding site: histamine
Binding site: homocitryl Mo-7Fe-8S cluster molybdenum (His) (ligand)
Binding site: iron
Binding site: iron (Asp) (shared with tetrameric partners)
Binding site: iron (His) (shared with chain M)
Binding site: iron (His, Glu, His) (shared with chain L)
Binding site: iron (Lys) (shared with tetrameric partners)
Binding site: magnesium
Binding site: magnesium (Glu) (shared with chain I)
Binding site: magnesium (His) (shared with chain II)
Binding site: manganese
Binding site: mercury
Binding site: metal
Binding site: methylcobalamin cobalt
Binding site: micellar substrate
Binding site: molybdopterin (Arg)
Binding site: molybdopterin cytosine dinucleotide (Arg)
Binding site: nickel
Binding site: nickel 1
Binding site: nickel 2
Binding site: omega-aminocarboxylic acids
Binding site: oxygen (His) (distal axial ligand)
Binding site: oxygen (Tyr) (distal axial ligand)
Binding site: phospholipid
Binding site: plastoquinone
Binding site: potassium
Binding site: pyrophosphate
Binding site: retinoic acid
Binding site: siroheme iron (Cys) (axial ligand)
Binding site: substrate
Binding site: substrate phosphate
Binding site: thyroxine
Binding site: transition metal ions
Binding site: ubiquinone
Binding site: zinc
Binding site: zinc, catalytic
[see note below on the next two]
Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
Binding site: zinc, catalytic (His) (active)
Binding site: zinc, high affinity
Binding site: zinc, noncatalytic
All these have been reviewed. If a reference is encountered that
discusses the covalent nature of one of these binding sites, please bring it to our
attention.
Be careful when you encounter a binding site established by a reactive analogue; these are designed to form covalent bonds
even though the actual compound may be bound noncovalently. None of the former features
Binding site: ATP (Lys) (covalent)
were ever actually covalent!
An alternate locant may be placed after the "Binding site" and
before the bound group name:
Binding site: (or 150) phosphate (Ser) (covalent) #status experimental
but this form should be avoided if at all possible.
The bound-group name must always be followed by a set of parentheses
enclosing a residue or a list of residues that matches the sequence residues
corresponding to the preceding numbers. Strict parsing is enforced for
this rule. If all the residues participating in one binding site are the
same type, then only one residue need be shown, for example:
Binding site: calcium (Asp)
The only bonding descriptions presently used are "covalent",
"axial ligand",
"axial ligands", "proximal axial ligand" and "distal axial
ligand". For these
ligand cases, care must be taken in specifying the bound entity:
"heme iron" rather than simply "heme".
Binding site: heme iron (His, Met) (axial ligands)
Covalent bonds to heme and similar prosthetic groups are to the group
and not to the metal.
Binding site: heme (Cys) (covalent)
Also, use "ligand" if there is only one locant in the feature, and
"ligands" if there are two or more locants, even when they are all the same type
of residue and only one residue is shown. Thus,
44/Binding site: heme iron (His) (axial ligand)
44,68/Binding site: heme iron (His) (axial ligands)
The second feature has two locants, "44,68", but only one
residue, "His", and "ligands" is used.
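The singular/plural rule above depends only on the number of locants, which a short check can express. This is a sketch under stated assumptions: the function name ligand_number_ok is invented here, and the input is assumed to be a feature line in the "locants/record" layout used in the two examples above.

```python
import re


def ligand_number_ok(feature_line):
    """Check that "ligand"/"ligands" agrees with the number of locants.

    Returns True when the bonding description is consistent, or when the
    record carries no ligand description at all.
    """
    locants, sep, record = feature_line.partition("/")
    if not sep:
        raise ValueError("expected 'locants/record': %r" % feature_line)
    count = len([part for part in locants.split(",") if part.strip()])
    match = re.search(r"\(([^()]*\bligands?)\)", record)
    if match is None:
        return True  # nothing to check in this record
    word = match.group(1).split()[-1]
    return word == ("ligand" if count == 1 else "ligands")
```

Note that the residue list plays no part in the check: a feature with locants "44,68" and the single residue "His" still needs "ligands".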
When a particular binding site occurs in both an active and an inhibited
form, binding site records should appear for both forms:
Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
Binding site: zinc, catalytic (His) (active)
In this pair of records, the first denotes the inhibited binding site
with a Cys ligand from a propeptide, and the second denotes the active
binding site with only the three His ligands of the enzyme.
A single substrate may be listed simply as "substrate". For
multiple substrates, other than water, in the same entry the substrate may be
named.
Binding site: substrate (Arg)
Binding site: fructose-1,6-bisphosphate (Lys) (covalent)
When it is experimentally observed that a group is covalently bound
at less than 95 mole percent, the "(partial)" annotation should be
used. A numeric percentage or some other fractional indication should not be
used. If the covalent binding is 95 mole percent or greater, do not use the
"(partial)" annotation. If the "(partial)" annotation is used, it will
almost always be based on an experimental observation, so the "#status
experimental" status should also appear; do not use "(partial) #status
predicted".
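The 95 mole percent threshold and the status restriction can be written as two small checks. Both function names below are hypothetical and exist only for this sketch; the second check works on the raw record text and simply looks for the forbidden combination.

```python
def partial_modifier(mole_percent):
    """Return "(partial)" when observed covalent binding is below 95 mole percent."""
    if not 0 <= mole_percent <= 100:
        raise ValueError("mole percent must lie between 0 and 100")
    return "(partial)" if mole_percent < 95 else ""


def partial_status_ok(record):
    """Reject the combination "(partial)" with "#status predicted"."""
    return not ("(partial)" in record and "#status predicted" in record)
```

The observed percentage is used only to decide whether the modifier appears; in line with the rule above, the number itself never goes into the record.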
The "in" form should be used very sparingly when the
covalent bond is known to occur only in the mature form or in one of
several alternative polypeptide products and the entry presents an
immature sequence.
Binding site: carbohydrate (Asp) (covalent) (in mature form)
Binding site: phosphopantetheine (Ser) (covalent) (in acyl carrier protein)
These may be replaced by appropriate "#link" descriptors.
The "by" form is used to distinguish among different binding sites
of the same group, for example:
Binding site: phosphate (Ser) (covalent) (by autophosphorylation)
Binding site: phosphate (Ser) (covalent) (by Ca/calmodulin-dependent kinase)
Binding site: phosphate (Ser) (covalent) (by cAMP-dependent protein kinase)
Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vivo)
Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vitro)
The use of the terms in vivo and in vitro
is questionable. If a feature is known to occur in vivo, it is what would otherwise be
regarded as an experimentally determined feature and so the term is superfluous. If
a feature is known to occur in vitro, then even if it is experimentally
determined it only amounts to a prediction that the natural modification
might occur at that location and just the "#status predicted" status
is warranted. Alternatively, if an in vitro feature marks something that occurs
under unnatural conditions and the descriptor would only distinguish it from
the natural occurrences, then a comment is warranted and not a feature (as
with the former "Binding site: carbohydrate (Gln)" features determined to
be unnatural). A feature marked both in vitro and "#status
predicted" would seem to have very little value under any circumstance.
Some covalent binding sites can occur only as a consequence of a prior
modification. These are nonetheless biochemically separate and distinct
features. For such cases we use two features, one to indicate the nature
of the modification and the other to indicate the secondary change.
For example:
42/Modified site: 5-hydroxylysine (Lys)
42/Binding site: carbohydrate (Lys) (covalent)
In the first step, a lysine is hydroxylated. It may or may
not be subsequently glycosylated. If they were combined in a single feature,
there would be a problem using the partial modifier. Would it mean the
lysines at that position were partially hydroxylated but all the hydroxylysines
were glycosylated, or would it mean that the lysines were all hydroxylated
but that hydroxylysines were partially glycosylated? In the RESID database such
cases are indicated by the records:
Conditions: secondary to ...
if a prior modification is required, or
Conditions: incidental to ...
if it is not.
N6-acetylated lysine will be annotated as
Binding site: acetyl (Lys) (covalent)
Do not annotate it as
Modified site: N6-acetyllysine (Lys)
When there are biochemically significantly different binding sites for
the same compound in the same entry (rare), the bound-group name may include
modifiers that distinguish between the functional differences of the bound-group
or of the binding sites. These modifiers should be placed after the bound-group,
without parentheses and separated from it by a comma.
For example:
Binding site: calcium, high affinity
Binding site: calcium, low affinity
Binding site: heme, high-potential (Cys) (covalent)
Binding site: heme, low-potential (Cys) (covalent)
Binding site: heme iron, high-potential (His)
Binding site: heme iron, low-potential (His)
Binding site: zinc, catalytic
Binding site: zinc, noncatalytic
Otherwise, different binding sites are only distinguished by being
grouped in separate "Binding site" records and those binding sites should
not be labeled.
Do not use such features as:
Binding site: calcium 1
Binding site: calcium 2
except to distinguish structurally distinct features; do not use them to
separate sites that are otherwise chemically indistinguishable.
Where the sequence was determined by protein sequencing and the nature
of the covalently attached group precludes assignment of a residue as either an
acid or an amide, and unless there is unequivocal evidence to the contrary
(for example, the nucleotide sequence), there is a reasonable biochemical
presumption that the residue should be the amide. The reported sequence
should be presented with the ambiguity explicit in the "Residues"
record, the amide presented in the sequence and feature records and an appropriate note
like
Note: we have shown the unidentified residue(s) as ... forming ...
(or bound to ...) based on ....
Concerted non-covalent binding of macromolecules by a set
of residues would probably best be annotated through a "Region" record
rather than through a "Binding site" record. Something like:
42-60/Region: DNA-binding
should be used instead of:
42,45,48,50,53,56,60/Binding site: DNA (Leu)
"Inhibitory site" Record
The format for the Inhibitory Site record is
"Inhibitory site:" res ["," res...]
"(" activity ["," activity ...] ")"
"#status " status
An inhibitory site is to an inhibitor what an active site is to an
enzyme. It is the residue, or small set of residues, that is responsible for
blocking the activity of an enzyme or set of enzymes. It should be applied to single
residues, and to a small list of residues only sparingly. The status is
required for this feature. Without a crystallographic structure it is
very
difficult to obtain experimental evidence that a particular residue is
an inhibitory site, so most will have predicted status.
Some examples, with status omitted:
Inhibitory site: Arg (acrosin)
Inhibitory site: Arg (thrombin, coagulation factor Xa)
Inhibitory site: Arg (trypsin)
Inhibitory site: Arg (unknown proteinase)
Inhibitory site: Cys (thermolysin)
Inhibitory site: Leu (chymotrypsin)
Inhibitory site: Leu (chymotrypsin, elastase)
Inhibitory site: Lys (trypsin)
Inhibitory site: Met (chymotrypsin, subtilisin)
Inhibitory site: Tyr (chymotrypsin)
When one of two residues is thought to be
responsible for the inhibitory action, the record may be applied to a list, using this
format:
"Inhibitory site:" res "or" res
"(" activity ["," activity ...] ")"
"#status " status
For example,
Inhibitory site: Leu or Met (elastase, chymotrypsin) #status predicted
The "or" form should be avoided whenever possible.
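Both "Inhibitory site" formats above (the comma-separated list and the "or" form) can be handled by one pattern. The following Python sketch is illustrative only: parse_inhibitory_site and the "alternatives" key are names invented for this example, and three-letter residue codes are assumed.

```python
import re

# Illustrative pattern only; it is not taken from any PIR software.
_RES = r"[A-Z][a-z]{2}"
_INHIBITORY_SITE = re.compile(
    r"^Inhibitory site:\s*"
    r"(?P<residues>"
    + _RES + r"(?:\s*,\s*" + _RES + r")*"    # res ["," res...]
    + r"|" + _RES + r"\s+or\s+" + _RES       # res "or" res
    + r")"
    r"\s*\((?P<activities>[^()]+)\)"
    r"\s*#status\s+(?P<status>\S+)\s*$"      # status is required
)


def parse_inhibitory_site(record):
    """Split an "Inhibitory site" record into its parts, or raise ValueError."""
    match = _INHIBITORY_SITE.match(record.strip())
    if match is None:
        raise ValueError("not a well-formed Inhibitory site record: %r" % record)
    residues = match.group("residues")
    return {
        "residues": re.split(r"\s*,\s*|\s+or\s+", residues),
        "alternatives": " or " in residues,  # True for the "or" form
        "activities": [a.strip() for a in match.group("activities").split(",")],
        "status": match.group("status"),
    }
```

Keeping the "or" form visible as a separate flag makes it easy to report how often it is used, since the guideline asks annotators to avoid it.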
The "Inhibitory site" record is not used for
allosteric inhibitor sites; those may be annotated as binding sites.
"Modified site" Record
The format for the "Modified site" record is
"Modified site: " ["(or" position ")"] name "(" res ")"
["(" form ")"] ["(" extent ")"]
"#status " status
"res" is the three-letter code for the original
encoded residue (with the exception of selenocysteine and N-formylmethionine where no three-letter
code is used). The "or" form should be avoided whenever
possible. Different residues with the same feature can be combined. When an
annotator wishes to distinguish more clearly the features belonging to different Domain or
Product features, the separate modified sites for the
different domains need not be combined, as with blocked amino- or
carboxyl-terminals. The status is required for this feature.
THESE FEATURES MUST BE DOCUMENTED IN THE RESID DATABASE.
Bring any new examples to our attention.
Modified Side Chains
In the most general case the side chain is chemically modified in such a
way that the original residue could not (in principle) be detected by normal
sequencing methods. The following is a list of such modified residues.
Modified site: (Z)-dehydrobutyrine (Thr)
Modified site: 2'-bromophenylalanine (Phe)
Modified site: 2'-glucosyl-tryptophan (Trp)
Modified site: 2'-[3-carboxamido-3-(trimethylammonio)propyl]histidine (His)
Modified site: 3',4'-dihydroxyphenylalanine (Tyr)
Modified site: 3'-bromophenylalanine (Phe)
Modified site: 3'-FAD-histidine (His)
Modified site: 3'-methylhistidine (His)
Modified site: 3-hydroxyphenylalanine (Phe)
Modified site: 3-hydroxyproline (Pro)
Modified site: 3-oxoalanine (Cys)
Modified site: 4'-bromophenylalanine (Phe)
Modified site: 4-hydroxyarginine (Arg)
Modified site: 4-hydroxylysine (Lys)
Modified site: 4-hydroxyproline (Pro)
Modified site: 5-hydroxylysine (Lys)
Modified site: 6-bromotryptophan (Trp)
Modified site: ADP-ribosylarginine (Arg) (by ...)
Modified site: ADP-ribosylasparagine (Asn) (by ...)
Modified site: ADP-ribosylcysteine (Cys) (by ...)
Modified site: ADP-ribosylserine (Ser) (by ...)
Modified site: allysine (Lys)
Modified site: arginine derivative (Arg)
Modified site: asparagine derivative (Asn)
Modified site: beta-methylthioaspartic acid (Asp)
Modified site: bromohistidine (His)
Modified site: citrulline (Arg)
Modified site: cysteine derivative (Cys)
Modified site: cysteine sulfenic acid (Cys)
Modified site: D-alanine (Ala)
Modified site: D-alanine (Ser)
Modified site: D-allo-isoleucine (Ile)
Modified site: D-asparagine (Asn)
Modified site: D-leucine (Leu)
Modified site: D-methionine (Met)
Modified site: D-phenylalanine (Phe)
Modified site: D-serine (Ser)
Modified site: D-tryptophan (Trp)
Modified site: dehydroalanine (Ser)
Modified site: dehydroalanine (Tyr)
Modified site: dehydrobutyrine (Thr)
Modified site: dehydrotyrosine (Tyr)
Modified site: erythro-beta-hydroxyasparagine (Asn)
Modified site: erythro-beta-hydroxyaspartic acid (Asp)
Modified site: gamma-carboxyglutamic acid (Glu)
Modified site: glutamate methyl ester (Gln)
Modified site: glutamate methyl ester (Glu)
Modified site: glutamine derivative (Gln)
[the following two ambiguous features should be avoided if possible]
Modified site: hydroxylysine (Lys)
Modified site: hydroxyproline (Pro)
Modified site: isoleucine derivative (Ile)
Modified site: lysine derivative (Lys)
Modified site: N4-methylasparagine (Asn)
Modified site: N5-methylglutamine (Gln)
Modified site: N6,N6,N6-trimethyllysine (Lys)
Modified site: N6,N6-dimethyllysine (Lys)
Modified site: N6-(4-amino-2-hydroxybutyl)lysine (Lys)
Modified site: N6-methyllysine (Lys)
Modified site: omega-N,omega-N-dimethylarginine (Arg)
Modified site: omega-N,omega-N'-dimethylarginine (Arg)
Modified site: omega-N-methylarginine (Arg)
Modified site: S-(6-FMN)-cysteine (Cys)
Modified site: S-(8alpha-FAD)-cysteine (Cys)
Modified site: selenocysteine
Modified site: thyroxine (Tyr)
Modified site: topaquinone (Tyr)
Modified site: triiodothyronine (Tyr)
Modified site: tryptophyl quinone (Trp)
Whenever possible, new modified residues should be added with
substitution positions and stereo-isomer indicators provided in accordance with
appropriate IUPAC and IUB rules. Please bring any additional or new modified residues to our
attention.
Ambiguous notations such as
Modified site: methylation #status predicted
should not be used.
We have chosen to use the unambiguous IUPAC numbered position forms, in
preference to the IUB Greek letter designations, when such usage allows
us to
avoid inconsistencies between common usage
("epsilon-aminomethyl") and IUB
recommended usage ("zeta-amino-methyl").
Note that standard abbreviations for the modified residues are not used,
so the correct feature is
Modified site: gamma-carboxyglutamic acid (Glu)
and not
Modified site: gamma-carboxyglutamic acid (Gla)
Modified Amino Terminus
The format for this form of the "Modified site" record is
"Modified site: " name "(" res ")"
["(" form ")"] ["(" extent ")"]
"#status " status
The chemical name should be as specific as possible and should usually
include the term "amino end" at the end. When an unblocked or longer
precursor form is presented in the entry and the modified site is not position 1, the
"in mature form" modifier should be used, for example:
Modified site: acetylated amino end (Ala) (in mature form) #status experimental
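To make the field layout concrete, here is a minimal parsing sketch. The pattern `RECORD` and the function `parse_modified_site` are hypothetical names; the sketch assumes a record that has been joined onto a single line, and it does not handle the "#link" field.

```python
import re

# Hypothetical pattern: name, optional residue, optional parenthesized
# modifiers, and an optional "#status" field ("#link" is not handled here).
RECORD = re.compile(
    r"^Modified site:\s*(?P<name>.+?)"
    r"(?:\s+\((?P<res>[A-Za-z]{3})\))?"
    r"(?P<modifiers>(?:\s+\([^()]*\))*)"
    r"(?:\s+#status\s+(?P<status>\w+))?\s*$"
)

def parse_modified_site(record: str) -> dict:
    """Split a single-line "Modified site" record into its fields."""
    match = RECORD.match(record)
    if match is None:
        raise ValueError(f"not a Modified site record: {record!r}")
    return {
        "name": match.group("name"),
        "res": match.group("res"),
        # Each modifier is returned without its surrounding parentheses.
        "modifiers": re.findall(r"\(([^()]*)\)", match.group("modifiers")),
        "status": match.group("status"),
    }
```

Applied to the example above, the sketch yields the name "acetylated amino end", the residue "Ala", the modifier "in mature form", and the status "experimental".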
[GRAY] Because not all processed forms requiring this modifier
are the final "mature" form, it may become necessary to replace
this modifier with something like "(in processed form) #link
...". Annotators are invited to comment on this proposal.
Current acceptable examples are:
Modified site: 2-oxobutanoic acid (Thr)
Modified site: L-3-phenyllactic acid (Phe)
Modified site: N-formylmethionine
Modified site: acetylated amino end (xxx)
[the following form is used only when the presented
sequence is completely ambiguous at the amino terminus]
Modified site: blocked amino end
Modified site: blocked amino end (xxx)
Modified site: dimethylated amino end (Pro)
Modified site: fatty acylated amino end (Cys)
Modified site: formylated amino end (Gly)
Modified site: glucuronylated amino end (Gly)
Modified site: methylated amino end (Ala)
Modified site: myristylated amino end (Gly)
Modified site: succinylated amino end (Trp)
Modified site: pyrrolidone carboxylic acid (Gln)
Modified site: pyruvic acid (Ser)
Modified site: trimethylated amino end (Ala)
The form descriptor "(probably ...)" should be used with
"blocked amino end" whenever an appropriate prediction can be made for a feature that is
experimentally determined but otherwise ambiguous:
Modified site: blocked amino end (Ala) (probably acetylated) #status experimental
The "blocked amino end" is usually only appropriate with
experimental status, because otherwise the specific modification would be used with a
predicted status. With increasing degrees of certainty:
Modified site: acetylated amino end (Ala) #status predicted
says you are guessing both whether and by what,
Modified site: blocked amino end (Ala) (probably acetylated) #status experimental
says you know whether but are guessing by what,
Modified site: acetylated amino end (Ala) #status experimental
says you know both whether and by what.
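The three certainty levels can be written as a small decision function. This is only a sketch under stated assumptions: the function `amino_end_record` and its parameter names are hypothetical, and the modification is assumed to be supplied as an adjective such as "acetylated".

```python
def amino_end_record(res, modification, blocked_observed, modification_observed=False):
    """Build the amino-end record for the three certainty levels.

    res                   -- three-letter residue code, e.g. "Ala"
    modification          -- adjective naming the group, e.g. "acetylated"
    blocked_observed      -- True if blocking was shown experimentally
    modification_observed -- True if the blocking group itself was identified
    """
    if modification_observed and not blocked_observed:
        raise ValueError("an identified group implies that blocking was observed")
    if not blocked_observed:
        # Guessing both whether and by what.
        return f"Modified site: {modification} amino end ({res}) #status predicted"
    if not modification_observed:
        # Known to be blocked; the group is a guess.
        return (f"Modified site: blocked amino end ({res}) "
                f"(probably {modification}) #status experimental")
    # Both the blocking and the group are known.
    return f"Modified site: {modification} amino end ({res}) #status experimental"
```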
Formylated amino terminal methionine is coded for and, like
selenocysteine, is not really a modified site. However, it should be annotated as a modified
site when it is experimentally observed in a protein. Making the residue
explicit is not required in this case. No occurrence of this modified residue has yet been
noted in other than the first position.
For amino terminal glutamine undergoing cyclization the format is
"Modified site: pyrrolidone carboxylic acid (Gln)" ["(in mature form)"]["#link " link] "#status " status
When the amino terminus is known to be glutamine and blocked,
pyrrolidone carboxylic acid can be assumed unless a reason to believe otherwise is
explicitly provided, in which case
Modified site: blocked amino end (Gln) (in mature form) #status experimental
should be used. The form
Modified site: pyrrolidone carboxylic acid (Glx)
should be avoided.
The ambiguity should be explicitly noted in the
"Residues" record, an appropriate comment made, and the sequence
and feature presented as Gln. People entering sequences should be
explicitly warned about the notation "E" appearing in some
articles; such sequences should be entered with a "Q" and an
appropriate feature prepared.
[BLACK] Combined annotated forms like
Modified site: acetylated and phosphorylated amino end (Ser)
should not be used. These should appear in two records:
Modified site: acetylated amino end (Ser)
Binding site: phosphate (Ser) (covalent)
See also the discussion of incidental and secondary modifications
under the covalent type "Binding site" section above.
In the case where a residue is enzymatically cleaved at the bond between
the alpha carbon and the alpha amino-nitrogen to produce a new amino
terminus blocked with a 2-oxo or a 2-hydroxy acid, the residue giving rise to the
blocking group is entered in the sequence and one of these annotations
is used:
Modified site: 2-oxobutanoic acid (Thr)
Modified site: L-3-phenyllactic acid (Phe)
Modified site: pyruvic acid (Ser)
These features do not have "amino end" in the chemical name.
However, if the preceding sequence is shown, these features should have the
"(in mature form)" modifier.
Modified Carboxyl Terminus
This form of the "Modified site" record has the same
format as for the modified amino terminus:
"Modified site:" name "(" res ")" ["(" extent ")"] ["(" form ")"] "#status " status
Current examples are:
Modified site: amidated carboxyl end (xxx)
Modified site: amidated carboxyl end (xxx) (in mature form)
Modified site: amidated carboxyl end (xxx) (amide in mature form
...
from following glycine)
Modified site: amidated carboxyl end (Ala) (amide in mature form
...
from following serine)
Modified site: amidated carboxyl end (Tyr) (amide in mature form
...
from following leucine)
Modified site: blocked carboxyl end (xxx)
Modified site: chondroitin sulfate ester carboxyl end (Asp) (in
mature form)
Modified site: GPI-anchor ethanolamine amidated carboxyl end (xxx)
(in mature form)
Modified site: GSI-anchor ethanolamine amidated carboxyl end (Ser)
(in mature form)
Modified site: methyl ester carboxyl end (Cys) (in mature form)
The chemical name should be as specific as possible and should include
the term "carboxyl end" at the end. The "in mature form" modifier should be used when a
longer immature sequence is presented in the entry and the modified site is not at the
final position.
In the case where the carboxyl amide arises from enzymatic cleavage of
the bond between the alpha-carbon and amino nitrogen of the following
glycine residue, a special form of the "in mature form" annotation is
used:
Modified site: amidated carboxyl end (Ile) (amide in mature form from following glycine)
All but a very small number of amidations arise from this mechanism. The
cases where leucine and serine are used are documented but not
well understood.
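The choice among the plain, "in mature form", and "amide in mature form from following ..." variants can be sketched as follows. The function `amidated_carboxyl_end` and its parameters are hypothetical, and the status field is left off, as it is in the examples of this section.

```python
def amidated_carboxyl_end(res, following=None, precursor_shown=False):
    """Build an "amidated carboxyl end" record without its status field.

    res             -- three-letter code of the amidated residue, or "xxx"
    following       -- lowercase name of the following residue that donates
                       the amide (e.g. "glycine"), if that mechanism applies
    precursor_shown -- True if a longer immature sequence is presented and
                       the modified site is not at the final position
    """
    record = f"Modified site: amidated carboxyl end ({res})"
    if following is not None:
        # Special form: the amide arises from cleavage of the next residue.
        record += f" (amide in mature form from following {following})"
    elif precursor_shown:
        record += " (in mature form)"
    return record
```

For instance, `amidated_carboxyl_end("Ile", following="glycine")` reproduces the isoleucine example given above.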
The GSI-anchor is a chemically distinct modification that must be
carefully distinguished from the better-known GPI-anchor.
Connections through the amino- or carboxyl-ends to other encoded peptide
chains are now all treated uniformly as Cross-link features.
Selenocysteine
The format for this form of the "Modified site:" record is
"Modified site: selenocysteine "#status " status
It had formerly been thought that selenocysteine arose from
post-translational modification of cysteine residues and no single-letter code was
assigned. When it was discovered to be encoded, the assignment of a special
single-letter code presented an insurmountable software implementation problem. Instead
this feature record is applied to those residues, or list of residues.
Although it usually serves as an active site, a second feature for that annotation
is superfluous. However, when it also serves as a covalent binding site for
a prosthetic group, it is considered a secondary modification and two
feature records are used.
Modified site: selenocysteine
Binding site: molybdopterin guanine dinucleotide (Cys)
(covalent)
Two different things are going on here. The first feature indicates the
true coding identity of the residue. The second indicates the true prosthetic
group covalently bound to the sequence-presented residue. [This all
arises because of the terrible historical accident that no one knew
selenocysteine was encoded until it was too late. Every computer database uses "C"
and everyone's computer program will break if a new letter is introduced for it.]
Do not use the 1-letter code "X" in the canonical sequence or the
3-letter code "Sec" in a feature for selenocysteine. "X" may, of course, be used
in "Residues" records for encoded selenocysteine.
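The one-record versus two-record rule for selenocysteine can be sketched as below. The function `selenocysteine_features` is a hypothetical helper; status fields are omitted, as in the example records of this section.

```python
def selenocysteine_features(prosthetic_group=None):
    """Return the feature records for an encoded selenocysteine.

    prosthetic_group -- name of a covalently bound prosthetic group, or
                        None when the residue carries no such group
    """
    # The first record states the true coding identity of the residue.
    features = ["Modified site: selenocysteine"]
    if prosthetic_group is not None:
        # Secondary modification: the residue is named as presented in
        # the sequence ("Cys"), never as "Sec".
        features.append(f"Binding site: {prosthetic_group} (Cys) (covalent)")
    return features
```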
Acetyllysine, Carbamyllysine, and Acylcysteine
Amino terminal lysine acetylated on the alpha-amino group should be
annotated
Modified site: acetylated amino end (Lys)
When a lysine in any position is acetylated or carbamylated on the
N6-amino group, it should be annotated like:
Binding site: acetyl (Lys) (covalent)
Binding site: carbon dioxide (Lys) (covalent)
Likewise, be careful to distinguish amino terminal cysteine acylated on
the alpha-amino group from S-acylated cysteine. The amino-acylated form is
like:
Modified site: acetylated amino end (Cys)
Modified site: fatty acylated amino end (Cys)
while the S-acylated form is like:
Binding site: palmitate (Cys) (covalent)
Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
Other protein sequence databases are not careful in making this
important distinction and contain errors on this point.
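The distinction between alpha-amino and side-chain acylation can be captured in a short sketch. The function `acyl_annotation` is hypothetical, and the caller is assumed to supply the name in the form each record type expects (an adjective such as "acetylated" for the amino end, a group name such as "palmitate" for the side chain).

```python
def acyl_annotation(res, name, on_alpha_amino):
    """Choose the record type for an acylated lysine or cysteine.

    res            -- three-letter residue code, e.g. "Lys" or "Cys"
    name           -- "acetylated", "fatty acylated", ... for the amino
                      end; "acetyl", "palmitate", ... for the side chain
    on_alpha_amino -- True if the acyl group is on the alpha-amino group
                      of the amino terminal residue
    """
    if on_alpha_amino:
        return f"Modified site: {name} amino end ({res})"
    # N6-acylated lysine and S-acylated cysteine are covalent binding sites.
    return f"Binding site: {name} ({res}) (covalent)"
```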
Aspartate and Glutamate esters
Because it has been experimentally observed that both glutamic acid
and glutamine give rise to glutamate methyl ester in the same protein and
these rules would otherwise require that they be annotated differently, esters
of the acids will be annotated with Modified site records.
Current acceptable examples are:
Modified site: glutamate methyl ester (Gln) (by
cheB-dependent deamidation and methylation)
Modified site: glutamate methyl ester (Glu)
Revised 10/22/01