Introduction to Features [PIR - Protein Information Resource]




	Active Site Records
	General Definition of Binding and Modification Sites
	Binding Site Records
	Inhibitory Site Records
	Modified Site Records
	Modified Side Chains
	Modified Amino Terminus
	Modified Carboxyl Terminus
	Selenocysteine
	Acetyllysine, Carbamyllysine, Acylcysteine
	Aspartate & Glutamate Esters

"Active site" Record

The Active site record is applied to residues of enzymes known or thought to function in the actual catalytic reaction of the enzyme. It should be applied to a single residue or a short list of residues; it should not be applied to a range (a hyphenated pair). If the active site residues are not specifically known but have been localized to a segment of the sequence, the "Region" record rather than the "Active site" record should be used. "Active site" features in entries without an Enzyme Commission notation in either their title or "Contains" records are suspect and will be flagged as possible errors. The format for the "Active site" record is
"Active site: "res ["," res...] ["("description")"] ["#link" link]"#status " status
The status is required for this feature. All the residues participating in each active site that do not require different modifiers, should be combined in the same feature. Do not combine residues from different active sites or that need different modifiers. The use of description fields, discussed below, should be avoided if possible.

Examples:

Active site: Arg #status experimental
Active site: Asp, His, Ser #status predicted
Active site: His, His, Asp #status experimental
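The record format above is regular enough to check mechanically. The following is a minimal Python sketch, not an official PIR tool: the regular expression, the `ActiveSite` class and `parse_active_site` are hypothetical names, and residues are assumed to be standard three-letter codes separated by ", ".

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical regex for the "Active site" grammar:
#   "Active site: " res ["," res...] ["(" description ")"] ["#link " link] "#status " status
ACTIVE_SITE_RE = re.compile(
    r"^Active site: "
    r"(?P<residues>[A-Z][a-z]{2}(?:, [A-Z][a-z]{2})*)"  # one or more residues
    r"(?: \((?P<description>.+)\))?"                    # optional description
    r"(?: #link (?P<link>\S+))?"                        # optional link tag
    r" #status (?P<status>\S+)$"                        # status is required
)


@dataclass
class ActiveSite:
    residues: List[str]
    status: str
    description: Optional[str] = None
    link: Optional[str] = None

    def format(self) -> str:
        """Render the feature back into record text."""
        text = "Active site: " + ", ".join(self.residues)
        if self.description:
            text += f" ({self.description})"
        if self.link:
            text += f" #link {self.link}"
        return f"{text} #status {self.status}"


def parse_active_site(line: str) -> ActiveSite:
    """Parse one record; raise ValueError if it does not fit the grammar."""
    match = ACTIVE_SITE_RE.match(line)
    if match is None:
        raise ValueError(f"not a well-formed Active site record: {line!r}")
    return ActiveSite(
        residues=match.group("residues").split(", "),
        status=match.group("status"),
        description=match.group("description"),
        link=match.group("link"),
    )
```

Parsing `Active site: Cys #link ARD #status predicted` yields a `link` of `"ARD"`, while a record with no status raises `ValueError`, mirroring the rule that the status is required.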

A residue list may be used only for those residues which participate in the same concerted catalytic reaction. If all the residues participating in one active site are the same type, then only one residue need be shown. Enzymes recognized to have several distinct catalytic reactions should have an "Active site" record for each active site. Multiple "Active site" records for what is, in fact, a single active site should be combined into one record using a list of residues, unless different status conditions apply.

[GRAY] Formerly, mechanisms were presented, but this should no longer be done except when the mechanism is used as a description. Generally, such a description should be applied only when multiple active sites occur in the same entry.

[BLACK] In particular, the description "charge relay system" should not be used except in enzymes with multiple activities.

Examples currently used are:

Active site: Cys (amide transfer)
Active site: Cys (of 3-oxoacyl-[acyl-carrier-protein] synthase)
Active site: Lys (of 3-oxoacyl-[acyl-carrier-protein] reductase)
Active site: Lys (of enoyl-[acyl-carrier-protein] reductase)
Active site: Ser (of enoyl-[acyl-carrier-protein] reductase)
Active site: Ser (of oleoyl-[acyl-carrier-protein] hydrolase)
Active site: Ser (of [acyl-carrier-protein] acetyl/malonyltransferase)
Active site: Glu (alpha-reaction)
Active site: His, Lys, Cys (beta-reaction)

Descriptors like these may be replaced with "#link" modifiers which point to tags in appropriate Function records, or Domain or Product features.

Active site: Cys #link ARD #status predicted

Here the link "ARD" points to a Function record with the tag "<ARD>". This mechanism will also be used to link active site records with different status conditions but which belong to the same active site object.

When a residue has a stable, covalently-bound, catalytically-active prosthetic group, only the "Binding site: ... (covalent)" feature should be used. An "Active site" record should not also be used because it is the prosthetic group which is active and not the amino acid as such. In particular, for an active site phosphoserine only the annotation:

Binding site: phosphate (Ser) (covalent) #status experimental

should appear. When a residue forms a transient covalent bond in its role as an active site, the "Active site" record should be used, and the description field may be used. The nature of the intermediate should be made as clear as practical. Annotators should consider carefully whether a covalently bound group is stable or transient when deciding whether to annotate a covalent binding site or an active site. The following possible features show active sites with transient groups that could easily be confused with binding sites.

Active site: Ser (phosphoserine intermediate)
Active site: Tyr (phosphotyrosine intermediate)

No examples of the second feature exist yet. Other currently acceptable examples are:

Active site: Asp (aspartylphosphate intermediate)
Active site: Cys (phosphocysteine intermediate)
Active site: Cys (S-acetylcysteine intermediate)
Active site: Cys (sulfocysteine intermediate)
Active site: His (phosphohistidine intermediate)
Active site: Lys (ribulose-bisphosphate-binding)

Most of these features are documented in the RESID database. Avoid records that are unnecessarily detailed or are synonymous with existing features, like:

Active site: His (covalent intermediate)
Active site: Asp (phosphate-binding)

Be particularly suspicious of claims that Gly, Val, Leu, Ile, Pro, Asn, Gln, Met or Phe residues are active site residues. It is chemically dubious that such residues function in the actual catalytic reaction of an enzyme. Glycine and a few other residues can form free radicals that participate in free radical reactions, but for physical reasons such reactions are extremely rare in biochemistry.
Current examples are:

Active site: Cys (cysteine thiyl radical intermediate)
Active site: Gly (stable glycyl radical)
Active site: Trp (tryptophyl radical intermediate)
Active site: Tyr (stable tyrosyl radical)

These features are documented in the RESID database.
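The two screening rules in this section (an "Active site" feature in an entry without Enzyme Commission notation, and chemically dubious residue types) can be expressed as a small lint. In this sketch `lint_active_site` is hypothetical, and treating a description that mentions "radical" as an exemption is an assumption modeled on the radical examples above, not a PIR rule.

```python
from typing import List, Optional

# Residue types singled out above as chemically dubious catalytic residues.
SUSPICIOUS_RESIDUES = {"Gly", "Val", "Leu", "Ile", "Pro", "Asn", "Gln", "Met", "Phe"}


def lint_active_site(
    residues: List[str],
    description: Optional[str],
    entry_has_ec_number: bool,
) -> List[str]:
    """Return warnings for an "Active site" feature; an empty list means no flags."""
    warnings = []
    if not entry_has_ec_number:
        warnings.append("no Enzyme Commission notation in title or Contains records")
    # Assumption: a description mentioning "radical" marks a documented
    # radical-intermediate case and suppresses the residue-type warning.
    radical = description is not None and "radical" in description
    for residue in dict.fromkeys(residues):  # de-duplicate, keep order
        if residue in SUSPICIOUS_RESIDUES and not radical:
            warnings.append(f"{residue} rarely functions in catalysis; check the evidence")
    return warnings
```

A flag is only a prompt for review; the annotator still decides whether the evidence supports the feature.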

Residues that are structurally located near an active site but do not participate directly in the catalytic reaction of that active site should not be annotated in the PIR databases. Annotations for such residues will only be carried from PDB entries in the NRL_3D database. Not all reactive compounds that block an enzymatic reaction end up reacting with an active site residue; they may react with a residue near the active site and block the substrate's access to it. The labeled residue may then be more of a "reactive site" than an "active site", so be cautious about accepting such labeling as experimental evidence for active site residues.

For cysteine residues that form catalytically active disulfide bonds only the annotation

Disulfide bonds: redox-active

should appear.

Even though selenocysteine may function as an active site, only the feature

Modified site: selenocysteine

should be used.

Residues that participate in allosteric control of enzyme activity but are not catalytically active should not be annotated as active sites but as binding sites or as regions. Residues that participate in different, symmetry-related active sites of complexes should not be combined in the same feature, but an appropriate description should be used to indicate the relationship.

Active site: Asp (shared with dimeric partner)
Active site: Cys (shared with dimeric partner)

These features imply that there are two symmetry-related active sites. Each site consists of an aspartate and a cysteine contributed by different chains of the homodimer.

[BLACK] The annotation

Active site: ... inhibitory ...

should not be used. Instead, use the annotation

"Inhibitory site: "

[BLACK] Do not use the term "active site" in either "Domain" or "Region" features. Instead, use the term "catalytic".


General Definitions for Binding Sites and Modified Sites

For binding sites and modified sites, the following definitions are very important. Because they include historical accidents and grammatical exigencies, these are operational definitions and do not necessarily apply beyond the purposes of this document.

Generally, an attachment site is an amino acid residue whose side chain is chemically changed post-translationally in such a way that it could be restored by the physiological processes of hydrolysis, ammonolysis or simple (2H) reduction. Such chemical changes may be transient or more or less permanent, but they must be covalent. The underlying idea is that attachment site residues could, in principle, be recovered and detected by typical methods of sequence analysis, whereas modified sites could not be.

The "Binding site" feature includes two classes, attachment sites and binding sites. A "binding site" is an amino acid residue, or a group of them, that forms biochemically important, non-covalent bonds with ions or molecules (other than the protein constituting the entry). These bonds may be ionic, ligand (dative), van der Waals, or donative or receptive hydrogen bonds. One borderline case is the sulfur-metal bond, which will be regarded as covalent for cysteine when a cluster of atoms is bound, and otherwise as non-covalent (dative ligand).

Methionine sulfur-metal bonds will be regarded as non-covalent (dative ligand). Attachment sites are distinguished by using "(covalent)" in "Binding site" records. All new "Binding site" records without "(covalent)" are reviewed and subject to conversion. Consequently, it is very important for annotators to provide the "(covalent)" designation in every case where it applies.

A "modified site" is an amino acid residue which is either

chemically changed post-translationally in such a way that it could not be restored by physiological processes of hydrolysis, ammonolysis or simple (2H) reduction (that is, it is not a side-chain attachment site),
chemically changed in any way involving the alpha amino group, including N-formylmethionine (this applies to both the amino terminus and internal residues),
a carboxyl terminal residue with any chemical change involving the alpha-carboxyl group,
a selenocysteine residue (these are translationally incorporated but for historical reasons are regarded as modified cysteine residues), or
an aspartate or glutamate ester, which can arise from either the acid or the amide form.
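For covalent changes, the operational definitions above amount to a small decision procedure. The sketch below is illustrative only: `ChemicalChange` and `feature_kind` are hypothetical names, the caller is assumed to already know whether the change is restorable by hydrolysis, ammonolysis or simple (2H) reduction, and non-covalent binding sites fall outside it.

```python
from dataclasses import dataclass


@dataclass
class ChemicalChange:
    """Hypothetical description of one covalent post-translational change."""
    residue: str
    restorable: bool  # by hydrolysis, ammonolysis or simple (2H) reduction
    involves_alpha_amino: bool = False
    involves_alpha_carboxyl: bool = False
    selenocysteine: bool = False
    asp_glu_ester: bool = False


def feature_kind(change: ChemicalChange) -> str:
    """Apply the operational definitions to choose a feature type."""
    if (
        change.involves_alpha_amino
        or change.involves_alpha_carboxyl
        or change.selenocysteine
        or change.asp_glu_ester
        or not change.restorable
    ):
        return "Modified site"
    # A restorable side-chain change is an attachment site.
    return "Binding site (covalent)"
```

For example, phosphoserine (restorable by hydrolysis) comes out as a covalent binding site, while 4-hydroxyproline, which cannot be restored, comes out as a modified site.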


"Binding Site" Record

Using the foregoing definitions, "Binding site" records are applied in two cases:

when an amino acid residue, or a group of them, forms biochemically important, non-covalent bonds with ions or molecules (other than the protein constituting the entry); or
when an amino acid residue forms an attachment site in which its side chain is chemically changed post-translationally in such a way that it could in principle be restored by physiological processes. Such cases must have a "(covalent)" bond description.

The format for the "Binding site" record is

("Binding site:" ["(or" position ")"] bound-group name "(" res ["," res...] ")"
["(covalent)" | "(" bonding description ")"] ["(" form ")"] ["(partial)"] ["#link " link] "#status " status

The status is required for this feature.
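A parser for this grammar shows what each optional part contributes. This is a minimal Python sketch under stated assumptions: `parse_binding_site` and `BindingSite` are hypothetical, the bound-group name is assumed to contain no parentheses, and the parenthesized qualifiers (bonding description, form, "partial") are returned as an ordered list rather than classified individually. Because it requires a residue list, it rejects the abbreviated display forms in the example lists of this section, which omit residues.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical regex for the "Binding site" grammar. The bound-group name may
# contain commas and hyphens but is assumed not to contain parentheses.
BINDING_SITE_RE = re.compile(
    r"^Binding site: "
    r"(?:\(or (?P<alternate>\d+)\) )?"                      # optional alternate locant
    r"(?P<group>[^()]+?) "                                  # bound-group name
    r"\((?P<residues>[A-Z][a-z]{2}(?:, [A-Z][a-z]{2})*)\)"  # required residue list
    r"(?P<qualifiers>(?: \([^()]+\))*)"                     # bonding, form, partial
    r"(?: #link (?P<link>\S+))?"
    r" #status (?P<status>\S+)$"
)


@dataclass
class BindingSite:
    group: str
    residues: List[str]
    qualifiers: List[str]
    status: str
    link: Optional[str] = None
    alternate_locant: Optional[int] = None

    @property
    def covalent(self) -> bool:
        return "covalent" in self.qualifiers

    @property
    def partial(self) -> bool:
        return "partial" in self.qualifiers


def parse_binding_site(line: str) -> BindingSite:
    """Parse one record; raise ValueError if it does not fit the grammar."""
    match = BINDING_SITE_RE.match(line)
    if match is None:
        raise ValueError(f"not a well-formed Binding site record: {line!r}")
    alternate = match.group("alternate")
    return BindingSite(
        group=match.group("group"),
        residues=match.group("residues").split(", "),
        qualifiers=re.findall(r"\(([^()]+)\)", match.group("qualifiers")),
        status=match.group("status"),
        link=match.group("link"),
        alternate_locant=int(alternate) if alternate else None,
    )
```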

Currently acceptable covalent examples are listed below. The status, link and partial descriptors have been removed, and a few minor variants have been eliminated. Most of these features are documented in the RESID database.

Binding site: 2Fe-2S cluster (Cys) (covalent)
Binding site: 2Fe-2S cluster (Cys, His, Cys, His) (covalent)
Binding site: 3Fe-4S cluster (Cys) (covalent)
Binding site: 4-hydroxycinnamyl (Cys) (covalent)
Binding site: 4Fe-4S cluster (Cys) (covalent)
Binding site: 4Fe-4S cluster (Cys) (covalent) (shared with dimeric partner)
Binding site: 4Fe-4S cluster 1 (Cys) (covalent)
Binding site: 4Fe-4S cluster 2 (Cys) (covalent)
Binding site: AMP (Tyr) (covalent)
Binding site: UMP (Tyr) (covalent)
Binding site: acetyl (Lys) (covalent)
Binding site: biotin (Lys) (covalent)
Binding site: carbohydrate (Asn) (covalent)
Binding site: carbohydrate (Asn) (covalent) (in ...)
Binding site: carbohydrate (Cys) (covalent)
Binding site: carbohydrate (Lys) (covalent)
Binding site: carbohydrate (Ser) (covalent)
Binding site: carbohydrate (Thr) (covalent)
Binding site: carbohydrate (Trp) (covalent)
Binding site: carbohydrate (Tyr) (covalent)
Binding site: carbon dioxide (Lys) (covalent) (by ...)
Binding site: chondroitin sulfate (Ser) (covalent)
Binding site: cysteine (Cys) (covalent)
Binding site: cysteine (Cys) (covalent) (in ...)
Binding site: dermatan sulfate (Ser) (covalent)
Binding site: farnesyl (Cys) (covalent)
Binding site: fatty acid (Ser) (covalent)
Binding site: fatty acid (Thr) (covalent)
Binding site: formyl (Lys) (covalent)
Binding site: geranyl-geranyl (Cys) (covalent)
Binding site: glutathione (Cys) (covalent)
Binding site: glycerylphosphorylethanolamine (Glu) (covalent)
Binding site: heme (Cys) (covalent)
Binding site: heme (Glu) (covalent)
Binding site: heme, high potential (Cys) (covalent)
Binding site: heme, low potential (Cys) (covalent)
Binding site: heparan sulfate (Ser) (covalent)
Binding site: homocitryl Mo-7Fe-8S cluster (Cys) (covalent)
Binding site: keratan sulfate (Thr) (covalent)
Binding site: lipoamide (Lys) (covalent)
Binding site: methyl (Cys) (covalent)
Binding site: molybdopterin (Cys) (covalent)
Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)
Binding site: murein (Lys) (covalent)
Binding site: myristate (Lys) (covalent)
Binding site: nitrosonium (Cys) (covalent)
Binding site: palmitate (Cys) (covalent)
Binding site: palmitate (Lys) (covalent)
Binding site: phosphate (Arg) (covalent)
Binding site: phosphate (Asp) (covalent)
Binding site: phosphate (His) (covalent)
Binding site: phosphate (His) (covalent) (by ...)
Binding site: phosphate (Ser) (covalent)
Binding site: phosphate (Ser) (covalent) (by ...)
Binding site: phosphate (Ser) (covalent) (in ...)
Binding site: phosphate (Thr) (covalent)
Binding site: phosphate (Thr) (covalent) (by ...)
Binding site: phosphate (Tyr) (covalent)
Binding site: phosphate (Tyr) (covalent) (by ...)
Binding site: phosphopantetheine (Ser) (covalent)
Binding site: phosphoribosyl dephospho-coenzyme A (Ser) (covalent)
Binding site: phosphoryl-DNA (Ser) (covalent)
Binding site: phosphoryl-DNA (Thr) (covalent)
Binding site: phosphoryl-DNA (Tyr) (covalent)
Binding site: phosphoryl-RNA (Ser) (covalent)
Binding site: phosphoryl-RNA (Tyr) (covalent)
Binding site: phycocyanobilin (Cys) (covalent)
Binding site: phycoerythrobilin (Cys) (covalent)
Binding site: phytochromobilin (Cys) (covalent)
Binding site: polyglutamate (Glu) (covalent)
Binding site: polyglycine (Glu) (covalent)
Binding site: pyridoxal phosphate (Lys) (covalent)
Binding site: retinal (Lys) (covalent)
Binding site: sn-2,3-diacylglycerol (Cys) (covalent)
Binding site: sn-2,3-diphytanylglycerol diether (Cys) (covalent)
Binding site: sulfate (Tyr) (covalent)
Binding site: vanadium cofactor (Cys) (covalent)
Binding site: iron-sulfur clusters (Cys) (covalent)
[use this only when the cluster form has not been determined and cannot be predicted]
The "(by ...)" descriptor takes many forms; please consult the database to determine the forms currently in use.

Examples of currently acceptable "Binding site" features not labeled "covalent" are listed below. The residue lists (in all but a few cases), status, link and partial descriptors have been removed, and a few minor variants have been eliminated.

[the following with one locant]

Binding site: heme iron (His) (axial ligand)
Binding site: heme iron (His) (axial ligand) (shared with alpha chain)
Binding site: heme iron (His) (axial ligand) (shared with beta chain)

[the following with two locants]

Binding site: heme iron (His) (axial ligands)
Binding site: heme iron (His) (proximal axial ligand)
Binding site: heme iron (His, Met) (axial ligands)
Binding site: heme iron (Met, His) (axial ligands)
Binding site: heme iron (Tyr) (axial ligand)
Binding site: heme iron, high potential (His) (axial ligand)
Binding site: heme iron, high potential (His) (axial ligands)
Binding site: heme iron, high potential (His, Met) (axial ligands)
Binding site: heme iron, high potential (His, Tyr) (axial ligands)
Binding site: heme iron, low potential (His) (axial ligand)
Binding site: heme iron, low potential (His) (axial ligands)
Binding site: heme iron, low potential (His, Tyr) (axial ligands)
Binding site: heparin
Binding site: histamine
Binding site: homocitryl Mo-7Fe-8S cluster molybdenum (His) (ligand)
Binding site: iron
Binding site: iron (Asp) (shared with tetrameric partners)
Binding site: iron (His) (shared with chain M)
Binding site: iron (His, Glu, His) (shared with chain L)
Binding site: iron (Lys) (shared with tetrameric partners)
Binding site: magnesium
Binding site: magnesium (Glu) (shared with chain I)
Binding site: magnesium (His) (shared with chain II)
Binding site: manganese
Binding site: mercury
Binding site: metal
Binding site: methylcobalamin cobalt
Binding site: micellar substrate
Binding site: molybdopterin (Arg)
Binding site: molybdopterin cytosine dinucleotide (Arg)
Binding site: nickel
Binding site: nickel 1
Binding site: nickel 2
Binding site: omega-aminocarboxylic acids
Binding site: oxygen (His) (distal axial ligand)
Binding site: oxygen (Tyr) (distal axial ligand)
Binding site: phospholipid
Binding site: plastoquinone
Binding site: potassium
Binding site: pyrophosphate
Binding site: retinoic acid
Binding site: siroheme iron (Cys) (axial ligand)
Binding site: substrate
Binding site: substrate phosphate
Binding site: thyroxine
Binding site: transition metal ions
Binding site: ubiquinone
Binding site: zinc
Binding site: zinc, catalytic [see note below on the next two]
Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
Binding site: zinc, catalytic (His) (active)
Binding site: zinc, high affinity
Binding site: zinc, noncatalytic

All these have been reviewed. If a reference is encountered that discusses the covalent nature of one of these binding sites, please bring it to our attention. Be careful when you encounter a binding site established by a reactive analogue: such analogues are designed to form covalent bonds, whereas the actual compound may be bound noncovalently. For example, the former features

Binding site: ATP (Lys) (covalent)

were never actually covalent!

An alternate locant may be placed after "Binding site:" and before the bound-group name, as in

Binding site: (or 150) phosphate (Ser) (covalent) #status experimental

but this form should be avoided if at all possible.

The bound-group name must always be followed by a set of parentheses enclosing a residue or a list of residues that matches the sequence residues at the positions given by the preceding locants. Strict parsing is enforced for this rule. If all the residues participating in one binding site are the same type, then only one residue need be shown, for example:

Binding site: calcium (Asp)
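The residue-list rule can be checked directly against the sequence. A minimal sketch, assuming the sequence is a string of standard one-letter codes and locants are 1-based positions; `residues_match` and the test sequence are hypothetical.

```python
from typing import List

# Standard one-letter to three-letter residue codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def residues_match(sequence: str, locants: List[int], residues: List[str]) -> bool:
    """Check a feature's residue list against the sequence at 1-based locants."""
    actual = []
    for pos in locants:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"locant {pos} is outside the sequence")
        actual.append(THREE_LETTER[sequence[pos - 1]])
    if len(residues) == 1:
        # One residue stands for every locant when all are the same type.
        return all(code == residues[0] for code in actual)
    # Otherwise the list must match the locants one for one, in order.
    return actual == residues
```

With the toy sequence `"MDDKH"`, locants `[2, 3]` with `["Asp"]` pass, but locants `[2, 5]` with `["Asp"]` fail because position 5 is His.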

The only bonding descriptions presently used are "covalent", "axial ligand", "axial ligands", "proximal axial ligand" and "distal axial ligand". For these ligand cases, care must be taken in specifying the bound entity: "heme iron" rather than simply "heme".

Binding site: heme iron (His, Met) (axial ligands)

Covalent bonds to heme and similar prosthetic groups are to the group and not to the metal.

Binding site: heme (Cys) (covalent)

Also, use "ligand" if there is only one locant in the feature, and "ligands" if there are two or more locants, even if they are all the same type of residue and only one residue is shown. Thus,

44/Binding site: heme iron (His) (axial ligand)
44,68/Binding site: heme iron (His) (axial ligands)

The second feature has two locants, "44,68", but only one residue, "His", and "ligands" is used.
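The "ligand"/"ligands" choice depends only on the number of locants, while the residue list may still collapse to a single residue. A hypothetical formatter sketch (`format_axial_ligand_feature` is not a PIR function; the locant prefix follows the `44,68/...` notation above):

```python
from typing import List


def format_axial_ligand_feature(
    locants: List[int], residues: List[str], bound_group: str = "heme iron"
) -> str:
    """Render a locant-prefixed axial-ligand feature; one residue per locant in."""
    if not locants or len(locants) != len(residues):
        raise ValueError("expected one residue per locant")
    # Show a single residue when every locant is the same residue type.
    shown = residues[:1] if len(set(residues)) == 1 else residues
    # "ligand" for one locant, "ligands" for two or more, regardless of residues shown.
    descriptor = "axial ligand" if len(locants) == 1 else "axial ligands"
    locant_text = ",".join(str(pos) for pos in locants)
    return f"{locant_text}/Binding site: {bound_group} ({', '.join(shown)}) ({descriptor})"
```

Calling it with `[44, 68]` and `["His", "His"]` reproduces the second feature above.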

When a particular binding site occurs in both an active and an inhibited form, binding site records should appear for both forms:

Binding site: zinc, catalytic (Cys, His, His, His) (inhibited)
Binding site: zinc, catalytic (His) (active)

In this pair of records, the first denotes the inhibited binding site with a Cys ligand from a propeptide, and the second denotes the active binding site with only the three His ligands of the enzyme.

A single substrate may be listed simply as "substrate". When an entry has multiple substrates (other than water), each substrate may be named.

Binding site: substrate (Arg)
Binding site: fructose-1,6-bisphosphate (Lys) (covalent)

When a group is experimentally observed to be covalently bound at less than 95 mole percent, the "(partial)" annotation should be used.

[BLACK] A numeric percentage or some other fractional indication should not be used. If the covalent binding is 95 mole percent or greater, do not use the "(partial)" annotation. If the "(partial)" annotation is used, it will almost always be based on an experimental observation, so the "#status experimental" status should also appear;

[BLACK] do not use "(partial) #status predicted".
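The 95 mole percent threshold and the two prohibitions can be written as a small helper. This is an illustrative sketch only: `experimental_suffix` and `partial_problems` are hypothetical, the occupancy is assumed to be an experimentally measured mole percent, and the percentage check is a simple pattern match rather than a full parse.

```python
import re
from typing import List

PARTIAL_THRESHOLD = 95.0  # mole percent; at or above this, no "(partial)"


def experimental_suffix(observed_mole_percent: float) -> str:
    """Choose trailing qualifiers for a measured covalent binding site."""
    if not 0.0 <= observed_mole_percent <= 100.0:
        raise ValueError("occupancy must be between 0 and 100 mole percent")
    if observed_mole_percent < PARTIAL_THRESHOLD:
        return "(partial) #status experimental"
    return "#status experimental"


def partial_problems(feature: str) -> List[str]:
    """Flag the two misuses of "(partial)" described above."""
    problems = []
    if "(partial)" in feature and "#status predicted" in feature:
        problems.append('"(partial)" combined with "#status predicted"')
    if re.search(r"\d+(?:\.\d+)?\s*%", feature):
        problems.append('numeric percentage instead of "(partial)"')
    return problems
```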

The "in" form should be used very sparingly, and only when the covalent bond is known to occur only in the mature form or in one of several alternative polypeptide products and the entry presents an immature sequence.

Binding site: carbohydrate (Asn) (covalent) (in mature form)
Binding site: phosphopantetheine (Ser) (covalent) (in acyl carrier protein)

These may be replaced by appropriate "#link" descriptors.

The "by" form is used to distinguish among different binding sites of the same group, for example:

Binding site: phosphate (Ser) (covalent) (by autophosphorylation)
Binding site: phosphate (Ser) (covalent) (by Ca/calmodulin-dependent kinase)
Binding site: phosphate (Ser) (covalent) (by cAMP-dependent protein kinase)
Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vivo)
Binding site: phosphate (Tyr) (covalent) (by autophosphorylation in vitro)

[GRAY] The use of the terms in vivo and in vitro is questionable. If a feature is known to occur in vivo, it is what would otherwise be regarded as an experimentally determined feature, so the term is superfluous. If a feature is known to occur in vitro, then even if it is experimentally determined, it only amounts to a prediction that the natural modification might occur at that location, and only the "#status predicted" status is warranted. Alternatively, if an in vitro feature marks something that occurs under unnatural conditions and the descriptor would only distinguish it from the natural occurrences, then a comment is warranted and not a feature (as with the former "Binding site: carbohydrate (Gln)" features determined to be unnatural). A feature marked both in vitro and "#status predicted" would seem to have very little value under any circumstance.

Some covalent binding sites can occur only as a consequence of a prior modification. These are nonetheless biochemically separate and distinct features. For such cases we use two features, one to indicate the nature of the modification and the other to indicate the secondary change. For example:

42/Modified site: 5-hydroxylysine (Lys)
42/Binding site: carbohydrate (Lys) (covalent)

In the first step, a lysine is hydroxylated. It may or may not subsequently be glycosylated. If the two changes were combined in a single feature, there would be a problem using the partial modifier: would it mean that the lysines at that position were partially hydroxylated but all the hydroxylysines were glycosylated, or that the lysines were all hydroxylated but the hydroxylysines were partially glycosylated? In the RESID database such cases are indicated by the records:

Conditions: secondary to ... (if a prior modification is required)
or
Conditions: incidental to ... (if it is not)

N6-acetylated lysine will be annotated as

Binding site: acetyl (Lys) (covalent)
[BLACK]

Do not annotate it as

Modified site: N6-acetyllysine (Lys)

When there are biochemically significantly different binding sites for the same compound in the same entry (rare), the bound-group name may include modifiers that distinguish between the functional differences of the bound-group or of the binding sites. These modifiers should be placed after the bound-group, without parentheses and separated from it by a comma.

For example:

Binding site: calcium, high affinity
Binding site: calcium, low affinity
Binding site: heme, high-potential (Cys) (covalent)
Binding site: heme, low-potential (Cys) (covalent)
Binding site: heme iron, high-potential (His)
Binding site: heme iron, low-potential (His)
Binding site: zinc, catalytic
Binding site: zinc, noncatalytic

Otherwise, different binding sites are distinguished only by being grouped in separate "Binding site" records, and those binding sites should not be labeled.

[GRAY] Do not use such features as:

Binding site: calcium 1
Binding site: calcium 2
except to distinguish structurally distinct sites; do not use them to label sites that are otherwise chemically indistinguishable.

Where the sequence was determined by protein sequencing and the nature of the covalently attached group precludes assigning a residue as either an acid or an amide, and unless there is unequivocal evidence to the contrary (for example, from the nucleotide sequence), there is a reasonable biochemical presumption that the residue is the amide. The reported sequence should be presented with the ambiguity explicit in the "Residues" record, the amide shown in the sequence and feature records, and an appropriate note added, such as

Note: we have shown the unidentified residue(s) as ... forming ... (or bound to ...) based on ....

[GRAY] Concerted non-covalent binding of macromolecules by a set of residues would probably best be annotated through a "Region" record rather than through a "Binding site" record. Something like:

42-60/Region: DNA-binding

should be used instead of:

42,45,48,50,53,56,60/Binding site: DNA (Leu)


"Inhibitory site" Record

The format for the Inhibitory Site record is

"Inhibitory site:" res ["," res...] "(" activity ["," activity ...] ")" "#status " status

An inhibitory site is to an inhibitor what an active site is to an enzyme: it is the residue, or small set of residues, responsible for blocking the activity of an enzyme or set of enzymes. It should normally be applied to a single residue, and only sparingly to a short list of residues. The status is required for this feature. Without a crystallographic structure it is very difficult to obtain experimental evidence that a particular residue is an inhibitory site, so most will have predicted status.

Some examples, with status omitted:

Inhibitory site: Arg (acrosin)
Inhibitory site: Arg (thrombin, coagulation factor Xa)
Inhibitory site: Arg (trypsin)
Inhibitory site: Arg (unknown proteinase)
Inhibitory site: Cys (thermolysin)
Inhibitory site: Leu (chymotrypsin)
Inhibitory site: Leu (chymotrypsin, elastase)
Inhibitory site: Lys (trypsin)
Inhibitory site: Met (chymotrypsin, subtilisin)
Inhibitory site: Tyr (chymotrypsin)

[GRAY] When either of two residues is thought to be responsible for the inhibitory action, the record may be applied to both, using this format:

"Inhibitory site:" res "or" res "(" activity ["," activity ...] ")" "#status " status

For example,

Inhibitory site: Leu or Met (elastase, chymotrypsin) #status predicted

The "or" form should be avoided whenever possible.
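Both inhibitory-site formats can be handled by one parser. This is a minimal Python sketch: `parse_inhibitory_site` and `InhibitorySite` are hypothetical names, and activities are assumed to be separated by ", ". The `either_or` flag marks the discouraged "or" form so that such records can be found for review.

```python
import re
from dataclasses import dataclass
from typing import List

INHIBITORY_SITE_RE = re.compile(
    r"^Inhibitory site: "
    r"(?P<residues>[A-Z][a-z]{2}(?:(?:, | or )[A-Z][a-z]{2})*)"  # residues
    r" \((?P<activities>[^()]+)\)"                                # inhibited activities
    r" #status (?P<status>\S+)$"                                  # status is required
)


@dataclass
class InhibitorySite:
    residues: List[str]
    either_or: bool  # True for the discouraged "res or res" form
    activities: List[str]
    status: str


def parse_inhibitory_site(line: str) -> InhibitorySite:
    """Parse one record; raise ValueError if it does not fit either format."""
    match = INHIBITORY_SITE_RE.match(line)
    if match is None:
        raise ValueError(f"not a well-formed Inhibitory site record: {line!r}")
    residue_text = match.group("residues")
    either_or = " or " in residue_text
    if either_or:
        residues = residue_text.split(" or ")
        # The "or" form names exactly two alternative residues and no comma list.
        if len(residues) != 2 or "," in residue_text:
            raise ValueError(f"'or' form must name exactly two residues: {line!r}")
    else:
        residues = residue_text.split(", ")
    return InhibitorySite(
        residues=residues,
        either_or=either_or,
        activities=match.group("activities").split(", "),
        status=match.group("status"),
    )
```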

[BLACK] The "Inhibitory site" record is not used for allosteric inhibitor sites; those may be annotated as binding sites.


"Modified site" Record

The format for the Modified site record is

"Modified site: " ["(or" position")"] name "("res")" ["("form")"] ["("extent")"] "#status" status

"res" is the three-letter code for the original encoded residue (with the exception of selenocysteine and N-formylmethionine where no three-letter code is used). The "or" form should be avoided whenever possible. Different residues with the same feature can be combined. In cases when an annotator wishes to distinguish the features belonging to different domain, or product features more clearly, then the separate modified sites for the different domains need not be combined, as with blocked amino- or carboxyl-terminals. The status is required for this feature. THESE FEATURES MUST BE DOCUMENTED IN THE RESID DATABASE. Bring any new examples to the attention of us.


Modified Side Chains

In the most general case the side chain is chemically modified in such a way that the original residue could not (in principle) be detected by normal sequencing methods. The following is a list of such modified residues.

Modified site: (Z)-dehydrobutyrine (Thr)
Modified site: 2'-bromophenylalanine (Phe)
Modified site: 2'-glucosyl-tryptophan (Trp)
Modified site: 2'-[3-carboxamido-3-(trimethylammonio)propyl]histidine (His)
Modified site: 3',4'-dihydroxyphenylalanine (Tyr)
Modified site: 3'-bromophenylalanine (Phe)
Modified site: 3'-FAD-histidine (His)
Modified site: 3'-methylhistidine (His)
Modified site: 3-hydroxyphenylalanine (Phe)
Modified site: 3-hydroxyproline (Pro)
Modified site: 3-oxoalanine (Cys)
Modified site: 4'-bromophenylalanine (Phe)
Modified site: 4-hydroxyarginine (Arg)
Modified site: 4-hydroxylysine (Lys)
Modified site: 4-hydroxyproline (Pro)
Modified site: 5-hydroxylysine (Lys)
Modified site: 6-bromotryptophan (Trp)
Modified site: ADP-ribosylarginine (Arg) (by ...)
Modified site: ADP-ribosylasparagine (Asn) (by ...)
Modified site: ADP-ribosylcysteine (Cys) (by ...)
Modified site: ADP-ribosylserine (Ser) (by ...)
Modified site: allysine (Lys)
Modified site: arginine derivative (Arg)
Modified site: asparagine derivative (Asn)
Modified site: beta-methylthioaspartic acid (Asp)
Modified site: bromohistidine (His)
Modified site: citrulline (Arg)
Modified site: cysteine derivative (Cys)
Modified site: cysteine sulfenic acid (Cys)
Modified site: D-alanine (Ala)
Modified site: D-alanine (Ser)
Modified site: D-allo-isoleucine (Ile)
Modified site: D-asparagine (Asn)
Modified site: D-leucine (Leu)
Modified site: D-methionine (Met)
Modified site: D-phenylalanine (Phe)
Modified site: D-serine (Ser)
Modified site: D-tryptophan (Trp)
Modified site: dehydroalanine (Ser)
Modified site: dehydroalanine (Tyr)
Modified site: dehydrobutyrine (Thr)
Modified site: dehydrotyrosine (Tyr)
Modified site: erythro-beta-hydroxyasparagine (Asn)
Modified site: erythro-beta-hydroxyaspartic acid (Asp)
Modified site: gamma-carboxyglutamic acid (Glu)
Modified site: glutamate methyl ester (Gln)
Modified site: glutamate methyl ester (Glu)
Modified site: glutamine derivative (Gln)
[the following two ambiguous features should be avoided if possible]
Modified site: hydroxylysine (Lys)
Modified site: hydroxyproline (Pro)
Modified site: isoleucine derivative (Ile)
Modified site: lysine derivative (Lys)
Modified site: N4-methylasparagine (Asn)
Modified site: N5-methylglutamine (Gln)
Modified site: N6,N6,N6-trimethyllysine (Lys)
Modified site: N6,N6-dimethyllysine (Lys)
Modified site: N6-(4-amino-2-hydroxybutyl)lysine (Lys)
Modified site: N6-methyllysine (Lys)
Modified site: omega-N,omega-N-dimethylarginine (Arg)
Modified site: omega-N,omega-N'-dimethylarginine (Arg)
Modified site: omega-N-methylarginine (Arg)
Modified site: S-(6-FMN)-cysteine (Cys)
Modified site: S-(8alpha-FAD)-cysteine (Cys)
Modified site: selenocysteine
Modified site: thyroxine (Tyr)
Modified site: topaquinone (Tyr)
Modified site: triiodothyronine (Tyr)
Modified site: tryptophyl quinone (Trp)

Whenever possible, new modified residues should be added with substitution positions and stereo-isomer indicators provided in accordance with appropriate IUPAC and IUB rules. Please bring any additional or new modified residues to our attention.

[BLACK] Ambiguous notations such as

Modified site: methylation #status predicted

should not be used.

We have chosen to use the unambiguous IUPAC numbered position forms, in preference to the IUB Greek letter designations, when such usage allows us to avoid inconsistencies between common usage ("epsilon-aminomethyl") and IUB recommended usage ("zeta-amino-methyl").

Note that standard abbreviations for the modified residues are not used, so the correct feature is

Modified site: gamma-carboxyglutamic acid (Glu)

and not

Modified site: gamma-carboxyglutamic acid (Gla)
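The residue rule above can be checked mechanically. The following Python sketch is a hypothetical checker, not part of any PIR software: `residue_is_standard` and `STANDARD_RESIDUES` are illustrative names, and it assumes the residue is the first parenthesized code that follows the chemical name after a space.

```python
import re

# Assumption: only the 20 standard three-letter codes are acceptable here,
# so modified-residue abbreviations such as "Gla" (and ambiguous "Glx") fail.
STANDARD_RESIDUES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Lazily match the chemical name, then capture the first " (Xxx)" group.
# Names containing parentheses, e.g. "S-(6-FMN)-cysteine", still work because
# the residue group must be preceded by a space.
RESIDUE_RE = re.compile(r"^Modified site: .+? \((\w+)\)")


def residue_is_standard(record: str) -> bool:
    """Return True if the record names its residue with a standard code."""
    match = RESIDUE_RE.match(record)
    return match is not None and match.group(1) in STANDARD_RESIDUES
```

With this sketch, "gamma-carboxyglutamic acid (Glu)" passes, "(Gla)" fails, and an ambiguous record with no residue at all, such as "Modified site: methylation", also fails.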

Modified Amino Terminus

The format for this form of the "Modified site" record is

"Modified site: "name "(" res ") "["(" form ")"] ["(" extent ")"] "#status" status

The chemical name should be as specific as possible and should usually include the term "amino end" at the end. When an unblocked or longer precursor form is presented in the entry and the modified site is not at position 1, the "in mature form" modifier should be used, for example:

Modified site: acetylated amino end (Ala) (in mature form) #status experimental
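A minimal parsing sketch for records in this format, under stated assumptions: the residue is a three-letter code in parentheses, any further parenthesized groups are collected as modifiers without distinguishing form from extent, and modifiers contain no nested parentheses. `parse_modified_site` and `ModifiedSite` are hypothetical names. Residue-less forms such as "N-formylmethionine" return `None` in this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical grammar: Modified site: <name> (<res>) [(<modifier>)...] [#status <status>]
RECORD_RE = re.compile(
    r"^Modified site: (?P<name>.+?) \((?P<res>[A-Za-z]{3})\)"
    r"(?P<mods>(?: \([^()]*\))*)"          # zero or more " (...)" groups
    r"(?: #status (?P<status>\w+))?$"      # optional status at the end
)


@dataclass
class ModifiedSite:
    name: str
    residue: str
    modifiers: List[str] = field(default_factory=list)
    status: Optional[str] = None


def parse_modified_site(line: str) -> Optional[ModifiedSite]:
    """Split one record into name, residue, modifiers and status, or return None."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    # Pull the text out of each parenthesized modifier group.
    mods = re.findall(r"\(([^()]*)\)", m.group("mods"))
    return ModifiedSite(m.group("name"), m.group("res"), mods, m.group("status"))
```

Applied to the example above, the sketch yields the name "acetylated amino end", the residue "Ala", the single modifier "in mature form", and the status "experimental".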

[GRAY] Because not all processed forms requiring this modifier are the final "mature" form, it may become necessary to replace this modifier with something like "(in processed form) #link ...". Annotators are invited to comment on this proposal.

Current acceptable examples are:

Modified site: 2-oxobutanoic acid (Thr)
Modified site: L-3-phenyllactic acid (Phe)
Modified site: N-formylmethionine
Modified site: acetylated amino end (xxx)
[the following form is used only when the presented sequence is completely ambiguous at the amino terminus]
Modified site: blocked amino end
Modified site: blocked amino end (xxx)
Modified site: dimethylated amino end (Pro)
Modified site: fatty acylated amino end (Cys)
Modified site: formylated amino end (Gly)
Modified site: glucuronylated amino end (Gly)
Modified site: methylated amino end (Ala)
Modified site: myristylated amino end (Gly)
Modified site: succinylated amino end (Trp)
Modified site: pyrrolidone carboxylic acid (Gln)
Modified site: pyruvic acid (Ser)
Modified site: trimethylated amino end (Ala)

The form descriptor "(probably ...)" should be used with "blocked amino end" whenever an appropriate prediction can be made for an otherwise experimentally determined ambiguous feature.

Modified site: blocked amino end (Ala) (probably acetylated) #status experimental

The "blocked amino end" form is usually appropriate only with experimental status, because otherwise the specific modification would be used with a predicted status. In order of increasing certainty,

Modified site: acetylated amino end (Ala) #status predicted

says you are guessing both whether and by what,

Modified site: blocked amino end (Ala) (probably acetylated) #status experimental

says you know whether but are guessing by what,

Modified site: acetylated amino end (Ala) #status experimental

says you know both whether and by what.
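The three levels of certainty map onto a simple decision. A hypothetical helper, `amino_end_record`, sketches it; the modification is passed as the adjective used in the feature name (for example "acetylated"), and the argument names are illustrative assumptions.

```python
from typing import Optional


def amino_end_record(residue: str, group: Optional[str],
                     blocking_observed: bool, group_observed: bool) -> str:
    """Choose the amino-end annotation that matches what was shown experimentally."""
    if group_observed:
        # Known both whether the end is blocked and by what.
        return f"Modified site: {group} amino end ({residue}) #status experimental"
    if blocking_observed:
        # Blocking is known; the blocking group, if any, is only a prediction.
        probably = f" (probably {group})" if group else ""
        return (f"Modified site: blocked amino end ({residue}){probably}"
                " #status experimental")
    if group is None:
        raise ValueError("a predicted annotation needs a specific modification")
    # Both whether and by what are guesses.
    return f"Modified site: {group} amino end ({residue}) #status predicted"
```

When blocking is observed but no appropriate prediction can be made, the sketch falls back to plain "blocked amino end" with experimental status.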

Formylated amino-terminal methionine is encoded and, like selenocysteine, is not really a modified site. However, it should be annotated as a modified site when it is experimentally observed in a protein. Making the residue explicit is not required in this case. No occurrence of this modified residue has yet been noted in any position other than the first.

For amino terminal glutamine undergoing cyclization the format is

"Modified site: pyrrolidone carboxylic acid (Gln)" ["(in mature form)"]["#link " link] "#status " status

When the amino terminus is known to be glutamine and blocked, pyrrolidone carboxylic acid can be assumed unless a reason to believe otherwise is explicitly provided, in which case

Modified site: blocked amino end (Gln) (in mature form) #status experimental

should be used. The form

Modified site: pyrrolidone carboxylic acid (Glx)

should be avoided.

Instead, the ambiguity should be noted explicitly in the "Residues" record, an appropriate comment added, and the sequence and feature presented as Gln. People entering sequences should be explicitly warned about the notation "E" that appears in some articles; such sequences should be entered with a "Q" and an appropriate feature prepared.
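The glutamine rules above can be sketched as a small function. `gln_amino_end_feature` is a hypothetical name; it assumes the caller already knows the amino end is blocked, and it returns the feature together with a flag saying whether the Glx ambiguity must be recorded in the "Residues" record and a comment.

```python
from typing import Tuple


def gln_amino_end_feature(reported: str, mature_form: bool,
                          other_block_reason: bool = False,
                          status: str = "experimental") -> Tuple[str, bool]:
    """Return (feature line, needs_residues_note) for a blocked Gln/Glx amino end."""
    if reported not in {"Gln", "Glx"}:
        raise ValueError("rule applies only to a Gln or ambiguous Glx amino end")
    # Glx is never written in the feature; it is presented as Gln and the
    # ambiguity is documented elsewhere.
    needs_note = reported == "Glx"
    # Cyclization is assumed unless there is explicit reason to think otherwise.
    name = "blocked amino end" if other_block_reason else "pyrrolidone carboxylic acid"
    form = " (in mature form)" if mature_form else ""
    return f"Modified site: {name} (Gln){form} #status {status}", needs_note
```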

[BLACK] Combined annotated forms like

Modified site: acetylated and phosphorylated amino end (Ser)

should not be used. These should appear in two records:

Modified site: acetylated amino end (Ser)
Binding site: phosphate (Ser) (covalent)

See also the discussion of incidental and secondary modifications in the section on covalent "Binding site" records above.
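Splitting a combined form into two records could be sketched as follows. This assumes the first adjective describes the amino-end modification and the second a side-chain modification; `split_combined` is a hypothetical name, and the adjective-to-group table `SIDE_CHAIN_GROUPS` holds only the phosphate pair from the example above.

```python
import re
from typing import List

# Hypothetical lookup from side-chain adjective to "Binding site" group name.
SIDE_CHAIN_GROUPS = {"phosphorylated": "phosphate"}

COMBINED_RE = re.compile(
    r"^Modified site: (?P<end>\w+) and (?P<side>\w+) amino end \((?P<res>\w{3})\)$"
)


def split_combined(record: str) -> List[str]:
    """Rewrite a combined amino-end record as one Modified site plus one Binding site."""
    m = COMBINED_RE.match(record)
    if m is None:
        return [record]  # not a combined form: leave it unchanged
    res = m.group("res")
    group = SIDE_CHAIN_GROUPS[m.group("side")]  # KeyError for unknown adjectives
    return [
        f"Modified site: {m.group('end')} amino end ({res})",
        f"Binding site: {group} ({res}) (covalent)",
    ]
```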

When a residue is enzymatically cleaved at the bond between the alpha carbon and the alpha-amino nitrogen to produce a new amino terminus blocked with a 2-oxo or 2-hydroxy acid, the residue giving rise to the blocking group is entered in the sequence and one of these annotations is used:

Modified site: 2-oxobutanoic acid (Thr)
Modified site: L-3-phenyllactic acid (Phe)
Modified site: pyruvic acid (Ser)

These features do not have "amino end" in the chemical name. However, if the preceding sequence is shown, they should have the "(in mature form)" modifier.
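Because the blocking group follows from the residue entered in the sequence, the choice reduces to a lookup. A sketch with a hypothetical function name, `cleavage_block_feature`; the table contains only the three pairs listed above, so any other residue raises `KeyError`.

```python
# Blocking group keyed by the residue entered in the sequence (pairs from the text).
CLEAVAGE_BLOCKS = {
    "Thr": "2-oxobutanoic acid",
    "Phe": "L-3-phenyllactic acid",
    "Ser": "pyruvic acid",
}


def cleavage_block_feature(residue: str, preceding_shown: bool, status: str) -> str:
    """Build the feature for a 2-oxo or 2-hydroxy acid blocked amino terminus."""
    name = CLEAVAGE_BLOCKS[residue]
    # No "amino end" in the name; "(in mature form)" only if the preceding
    # sequence is presented in the entry.
    form = " (in mature form)" if preceding_shown else ""
    return f"Modified site: {name} ({residue}){form} #status {status}"
```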

Modified Carboxyl Terminus

This form of the "Modified site" record uses the same format as the modified amino terminus:

"Modified site:" name "(" res ")" ["(" extent ")"] ["(" form ")"] "#status " status

Current examples are:

Modified site: amidated carboxyl end (xxx)
Modified site: amidated carboxyl end (xxx) (in mature form)
Modified site: amidated carboxyl end (xxx) (amide in mature form ... from following glycine)
Modified site: amidated carboxyl end (Ala) (amide in mature form ... from following serine)
Modified site: amidated carboxyl end (Tyr) (amide in mature form ... from following leucine)
Modified site: blocked carboxyl end (xxx)
Modified site: chondroitin sulfate ester carboxyl end (Asp) (in mature form)
Modified site: GPI-anchor ethanolamine amidated carboxyl end (xxx) (in mature form)
Modified site: GSI-anchor ethanolamine amidated carboxyl end (Ser) (in mature form)
Modified site: methyl ester carboxyl end (Cys) (in mature form)

The chemical name should be as specific as possible and should include the term "carboxyl end" at the end. The "in mature form" modifier should be used when a longer immature sequence is presented in the entry and the modified site is not at the final position.

When the carboxyl amide arises from enzymatic cleavage of the bond between the alpha carbon and the amino nitrogen of the following glycine residue, a special form of the "in mature form" annotation is used:

Modified site: amidated carboxyl end (Ile) (amide in mature form from following glycine)

All but a very small number of amidations arise by this mechanism. The few cases in which the amide derives from a following serine or leucine are documented but not well understood.
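A sketch of the glycine-donor case, with hypothetical names (`amidation_feature`, `THREE_LETTER`). It assumes the presented precursor is given in one-letter codes and that a following glycine is the amide donor; because serine and leucine donors are rare and cannot be inferred from sequence alone, they are not guessed here.

```python
# One-letter to three-letter codes for the 20 standard residues.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def amidation_feature(precursor: str, pos: int, status: str = "experimental") -> str:
    """Annotate an amidated carboxyl end at 1-based position `pos` of the precursor."""
    res = THREE_LETTER[precursor[pos - 1]]
    base = f"Modified site: amidated carboxyl end ({res})"
    if pos == len(precursor):
        # Already the final residue shown: no "in mature form" modifier.
        return f"{base} #status {status}"
    if precursor[pos] == "G":
        # Assumed glycine donor, the common mechanism described above.
        return f"{base} (amide in mature form from following glycine) #status {status}"
    # Longer sequence shown, but no donor assumed.
    return f"{base} (in mature form) #status {status}"
```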

The GSI-anchor is a chemically distinct modification that must be carefully distinguished from the better-known GPI-anchor.

Connections through the amino- or carboxyl-ends to other encoded peptide chains are now all treated uniformly as Cross-link features.

Selenocysteine

The format for this form of the "Modified site" record is

"Modified site: selenocysteine #status " status

Selenocysteine was formerly thought to arise from post-translational modification of cysteine residues, so no single-letter code was assigned to it. When it was discovered to be encoded, assigning a special single-letter code presented an insurmountable software implementation problem. Instead, the residue is presented as cysteine and this feature record is applied to that residue or list of residues. Although selenocysteine usually serves as an active site residue, a second feature for that annotation is superfluous. However, when it also serves as a covalent binding site for a prosthetic group, that binding is considered a secondary modification and two feature records are used:

Modified site: selenocysteine
Binding site: molybdopterin guanine dinucleotide (Cys) (covalent)

Two different things are going on here. The first feature indicates the true coding identity of the residue. The second indicates the true prosthetic group covalently bound to the sequence-presented residue. [This all arise because of the terrible historical accident that no one knew selenocysteine was encoded until it was too late. Ever computer database uses "C" and everyone's computer program will break if a new letter is introduced for it.]

Do not use the 1-letter code "X" in the canonical sequence or the 3-letter code "Sec" in a feature for selenocysteine. "X" may, of course, be used in "Residues" records for encoded selenocysteine.
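A sketch of how source data might be converted under these rules. It assumes, hypothetically, that the input marks selenocysteine with a separate letter (`U` by default); the output sequence uses "C" at those positions and each one receives a "Modified site: selenocysteine" feature. `present_selenocysteine` is an illustrative name, and one feature per residue is emitted for simplicity.

```python
from typing import List, Tuple

SEC_FEATURE = "Modified site: selenocysteine #status {status}"


def present_selenocysteine(seq: str, status: str = "experimental",
                           marker: str = "U") -> Tuple[str, List[Tuple[int, str]]]:
    """Return the sequence with selenocysteine shown as "C", plus
    (1-based position, feature text) pairs recording its true identity."""
    features = [(i + 1, SEC_FEATURE.format(status=status))
                for i, aa in enumerate(seq) if aa == marker]
    # The canonical sequence keeps "C"; "X" is never substituted here.
    return seq.replace(marker, "C"), features
```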

Acetyllysine, Carbamyllysine, and Acylcysteine

Amino-terminal lysine acetylated on the alpha-amino group should be annotated as:

Modified site: acetylated amino end (Lys)

When a lysine in any position is acetylated or carbamylated on the N6-amino group, it should be annotated like:

Binding site: acetyl (Lys) (covalent)
Binding site: carbon dioxide (Lys) (covalent)

Likewise, be careful to distinguish amino-terminal cysteine acylated on the alpha-amino group from S-acylated cysteine. The amino-acylated form is annotated like:

Modified site: acetylated amino end (Cys)
Modified site: fatty acylated amino end (Cys)

while the S-acylated form is like:

Binding site: palmitate (Cys) (covalent)
Binding site: sn-2,3-diacylglycerol (Cys) (covalent)

Other protein sequence databases are not careful in making this important distinction and contain errors on this point.
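The distinction can be made explicit by dispatching on the attachment site. `acyl_record` and its site labels ("alpha-amino", "N6", "S") are hypothetical; only the combinations discussed in this section are handled, and anything else raises an error rather than guessing.

```python
def acyl_record(residue: str, site: str, group: str) -> str:
    """Choose Modified site vs. covalent Binding site by where the acyl group sits.

    For site "alpha-amino", `group` is the adjective used in the name
    (e.g. "acetylated"); for side-chain sites it is the group name
    (e.g. "acetyl", "carbon dioxide", "palmitate").
    """
    if site == "alpha-amino":
        # Acylation of the amino terminus itself.
        return f"Modified site: {group} amino end ({residue})"
    if (residue, site) in {("Lys", "N6"), ("Cys", "S")}:
        # Side-chain acylation or carbamylation at any position.
        return f"Binding site: {group} ({residue}) (covalent)"
    raise ValueError(f"unhandled attachment {site!r} on {residue}")
```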

Aspartate and Glutamate Esters

Because it has been experimentally observed that both glutamic acid and glutamine give rise to glutamate methyl ester in the same protein, and these rules would otherwise require them to be annotated differently, esters of the acids are annotated with "Modified site" records. Current acceptable examples are:

Modified site: glutamate methyl ester (Gln) (by cheB-dependent deamidation and methylation)
Modified site: glutamate methyl ester (Glu)

Revised 10/22/01

Protein Information Resource