Guide for Features Annotations [PIR - Protein Information Resource]

Guide for Features Annotations

	Preliminary Comments
	Feature Records
	Feature Annotation Guidelines
	Status Indicators
	Combinability
	Tags
	Order of Features

Preliminary Comments

This document describes the standardization of feature records in NBRF format achieved through Release 56.

We have received comments on the improved appearance and consistency of our database, and a recent research paper specifically mentioned using the PIR feature annotations in its sequence analysis. The following projects have been very successful:

Programs written in C to check new or updated entries and existing database entries have been considerably improved and extended. These are the programs that perform the rules "enforcement" discussed below.
All ambiguous Binding site: features have been resolved to appropriate covalent or noncovalent features.
All explicit disulfide bond and most other site information in free-text comments has been converted to features.
The experimental status of all site, bond and product features has been assigned. A status is now required on these features in all PIR1 and PIR2 entries.
Four status types are used in features:
- experimental
- absent
- atypical
- predicted
The comment "in mature form" is required for amino-terminal features that are not at the first position of the entry and for carboxyl-terminal features that are not at the last position of the entry; the comment is not used in any other context.
Many new features have been added and the covalent binding sites, modified sites, cross-links and active sites have been documented in the RESID database.
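
The "in mature form" rule above is the kind of constraint the checking programs are meant to enforce. As a rough illustration only, here is a minimal Python sketch of that single rule. The function name, the "amino"/"carboxyl" labels and the 1-based position convention are assumptions, not part of the PIR software.

```python
def mature_form_comment_ok(terminal: str, position: int, seq_len: int,
                           has_comment: bool) -> bool:
    """Check the "in mature form" rule for one feature (hypothetical helper).

    terminal: "amino" or "carboxyl" for terminal features, anything else otherwise.
    position: 1-based position of the feature in the entry's sequence.
    seq_len:  length of the entry's sequence.
    """
    if terminal == "amino":
        # Required only when the feature is not at the first position.
        required = position != 1
    elif terminal == "carboxyl":
        # Required only when the feature is not at the last position.
        required = position != seq_len
    else:
        # The comment is not used outside these two contexts.
        required = False
    # The comment must be present exactly when it is required.
    return has_comment == required
```

For example, an amino-terminal feature at position 20 of a 100-residue entry passes only if it carries the comment, while one at position 1 passes only if it does not.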

Feature Records

The following feature records appear in PIR1 and PIR2 annotations and are described in separate sections of this document:

Active Site:	Binding Site:
Cleavage Site:	Cross-Link:
Disulfide Bonds:	Domain:
Inhibitory Site:	Modified Site:
Product:	Region:

The following notation is used in this document:

Notation	Stands for
" "	encloses explicit typographic characters
[ ]	encloses optional elements
|	separates alternative elements
. . .	indefinite repetition of the preceding optional elements
res	a 3-letter amino acid residue code
form	"by" mechanism | "in" protein name | "in mature form"
extent	"partial"
status	"experimental" | "predicted" | "absent" | "atypical"
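
To make the notation concrete, the productions for res, form, extent and status can be written as plain Python data plus one small classifier. This is a sketch under assumptions: all names are hypothetical, and res is restricted here to the 20 standard three-letter residue codes, which the guide does not state.

```python
from typing import Optional

# Assumption: "res" is limited to the 20 standard three-letter codes here;
# the guide does not list the codes it accepts.
RESIDUE_CODES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}
EXTENT_VALUES = {"partial"}
STATUS_VALUES = {"experimental", "predicted", "absent", "atypical"}


def is_res(token: str) -> bool:
    """True if token is one of the (assumed) residue codes."""
    return token in RESIDUE_CODES


def classify_form(text: str) -> Optional[str]:
    """Map a form string onto one alternative of the form production."""
    if text == "in mature form":
        return "mature"            # "in mature form"
    if text.startswith("by ") and text[3:].strip():
        return "mechanism"         # "by" mechanism
    if text.startswith("in ") and text[3:].strip():
        return "protein"           # "in" protein name
    return None                    # not a valid form
```

Note that the literal "in mature form" is tested before the general "in" protein name alternative, since it would otherwise be read as a protein name.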

Feature Annotation Guidelines

When you are preparing new feature annotations, you should try to conform to these guidelines as closely as possible. These guidelines have three degrees of applicability:

features that are currently accepted and being used,
features that may have been used in the past but are now undesirable, that have been removed from the entries that contained them, and that should not be used in new entries ("black rules"); these are marked [BLACK].
features that occur in some entries but are of uncertain value, that have been proposed but not yet accepted, or that are otherwise under review ("gray rules"); these are marked [GRAY].

If an annotator thinks that a gray-rule feature or some new feature is required for an entry, the annotator should first check with pirmail@georgetown.edu.

Discussion of Status Indicators

A status indicator, one of "experimental", "absent", "atypical" or "predicted", is required for all features except Domain and Region. It should generally not be used with "Region:" features. In "Domain:" features it should be used, except for homology domains, for self-evident features such as "amino-terminal" or "serine-rich", and for features with arbitrary designations such as "first", "1" or "A".
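
As a rough illustration of how a checking program might encode these rules, the sketch below decides whether a #status descriptor is expected on a feature. The function name and the domain_kind labels are hypothetical, and because the guide says only that status is "generally" not used with Region, the sketch simplifies that to never.

```python
def status_expected(feature_type: str, domain_kind: str = "") -> bool:
    """Return True if a #status descriptor is expected on a feature.

    feature_type: e.g. "Binding site:", "Domain:", "Region:".
    domain_kind:  hypothetical label for Domain features; "homology",
                  "self-evident" and "arbitrary" stand for the three
                  exceptions named in the guide.
    """
    ftype = feature_type.rstrip(":").strip().lower()
    if ftype == "region":
        return False  # generally not used with Region: features
    if ftype == "domain":
        return domain_kind not in {"homology", "self-evident", "arbitrary"}
    return True       # required on every other feature type
```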

The "experimental" status means that the feature has been experimentally observed in the indicated way at the indicated location. Any indication of alternative forms means that all the alternatives have been observed. For example:

Modified site: N6-methyllysine or N6,N6-dimethyllysine (Lys) #status experimental

means that both forms have been observed at the indicated location.

On the other hand, an indication of an alternate location means that the feature is known to occur at one position or the other, but which one could not be resolved experimentally. For example:

Modified site: (or 81) N6,N6,N6-trimethyllysine (Lys) #status experimental
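
The difference between alternative forms ("or" inside the description: all observed) and alternative locations ("(or 81)": exactly one is correct) can be made concrete with a small parser. This is a hypothetical sketch that handles only the layout of the two examples above; the residue positions that precede "/" in a full record are omitted, as in the examples, and the regular expression and field names are assumptions rather than the NBRF grammar.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedSite:
    feature_type: str
    forms: List[str]          # alternative forms; if experimental, all were observed
    alt_positions: List[int]  # alternative locations; only one is correct
    residue: str
    status: Optional[str]


# Assumed layout: "Type: [(or N[,N...])] description (Res) [#status value]"
LINE_RE = re.compile(
    r"^(?P<type>[A-Za-z -]+):\s*"
    r"(?:\(or (?P<alt>[\d, ]+)\)\s*)?"
    r"(?P<desc>.*?)\s*"
    r"\((?P<res>[A-Z][a-z]{2})\)\s*"
    r"(?:#status (?P<status>\w+))?\s*$"
)


def parse_site(line: str) -> ParsedSite:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized feature line: {line!r}")
    alt = m.group("alt")
    return ParsedSite(
        feature_type=m.group("type"),
        forms=[f.strip() for f in m.group("desc").split(" or ")],
        alt_positions=[int(p) for p in alt.split(",")] if alt else [],
        residue=m.group("res"),
        status=m.group("status"),
    )
```

Parsing the first example yields two forms and no alternative positions; parsing the second yields one form and the alternative position 81.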

The "predicted" status means that the nature of the feature, its location, or both have been predicted by some means. The experimental observation of a feature under unnatural conditions should be considered carefully; if the conditions seem sufficiently different from the natural case, the feature should be marked as predicted. With the present system, a distinct problem occurs when either the nature or the location of a feature, but not both, has been determined experimentally. Generally, the most definite form of evidence should be presented, and appropriate comments should be provided, either in the feature or in the comments of the entry. For example, if a protein is known to be blocked and a translated sequence is presented, but the nature and location of the blocking group are unknown, then only a note or comment is appropriate. We would welcome any suggestions on how a feature with both experimental and predicted aspects can best be presented.

The status "absent" is used to indicate a feature that, although it would otherwise be predicted by some means, has been experimentally determined not to occur at the indicated position. It is intended for the very limited cases in which an investigation of the specific feature produced this experimental result. Currently, this status is used mainly for the

Binding site: carbohydrate (Asn) (covalent)

feature. The PIR was the first, and so far the only, database that makes it possible to distinguish the cases where there is negative experimental evidence from the cases where there is merely insufficient annotation.

The status "atypical" is used to indicate a feature that does not follow the "normal" pattern, that would otherwise be predicted not to occur, but that has been experimentally determined to occur at the indicated location. Again, it is intended to be used in the very limited cases when an investigation of the specific feature produced this result. Examples of its use are:

homology domains that have unusually large insertions or deletions,
carbohydrate binding sites with the pattern N-X-C,
metal binding sites with an apparently missing ligand residue.
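
The interplay between prediction and the "experimental", "absent" and "atypical" statuses can be sketched for carbohydrate (Asn) sites. The sketch assumes that the prediction is the standard N-glycosylation sequon (Asn, any residue except Pro, then Ser or Thr), which this guide does not itself spell out; that sequences use one-letter codes; and that the site was specifically investigated. The function names are hypothetical.

```python
import re
from typing import Set

# Assumed prediction rule: N, then any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def predicted_sites(seq: str) -> Set[int]:
    """1-based positions of Asn residues that start a sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(seq)}


def carbohydrate_status(seq: str, pos: int, observed: bool) -> str:
    """Status for a carbohydrate (Asn) site that was specifically investigated."""
    predicted = pos in predicted_sites(seq)
    if observed:
        # Observed where predicted: ordinary experimental feature.
        # Observed where not predicted (e.g. N-X-C): atypical.
        return "experimental" if predicted else "atypical"
    if predicted:
        # Predicted, but shown experimentally not to occur.
        return "absent"
    raise ValueError("neither predicted nor observed: no feature to annotate")
```

In the hypothetical sequence "MANKTANACPNPS", only Asn 3 matches the sequon; Asn 7 has the N-X-C pattern and Asn 11 is followed by Pro.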

Discussion of "Combinability"

With the adoption of an "object-oriented" approach, it became important to distinguish groups of residues that in the aggregate constitute one feature from residues in individual features that share the same description and are grouped for convenience. An example of a group of residues constituting in the aggregate one feature is:

192,226,231/Binding site: copper (His, Cys, His)

Together this particular group of residues forms one unique binding site for copper. In this case the group forms a single "object" and other groups of the same kind could not be combined in the same record without creating ambiguity in the identity of the objects being represented. Such features are not "combinable".

An example of residues in individual features sharing the same description and grouped for convenience is:

192,226,231/Binding site: carbohydrate (Asn) (covalent)

In this case each residue individually is an "object" and they can be combined without introducing ambiguity. Such features are "combinable".

Features should not be combined unless the discussion of that particular feature type explicitly states that it is combinable. Features with different "#status" or "#link" descriptors should not be combined.
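
The combining rule can be sketched as a grouping step. In this minimal, hypothetical sketch, each Feature carries its own positions and a combinable flag (which in practice would come from the feature type's documentation); combinable features are merged only when their description, status and link all match, while a non-combinable group such as the copper site always stays a record of its own. The record rendering and the " #link " layout are assumptions.

```python
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    positions: Tuple[int, ...]   # residue positions written before "/"
    description: str             # e.g. "Binding site: carbohydrate (Asn) (covalent)"
    status: str = ""             # value of the #status descriptor
    link: str = ""               # value of the #link descriptor
    combinable: bool = False     # True only if the feature type allows it


def render(positions: List[int], f: Feature) -> str:
    text = ",".join(str(p) for p in positions) + "/" + f.description
    if f.status:
        text += " #status " + f.status
    if f.link:
        text += " #link " + f.link   # descriptor layout assumed
    return text


def combine(features: List[Feature]) -> List[str]:
    records: List[Tuple[List[int], Feature]] = []
    index: Dict[Tuple[str, str, str], int] = {}  # merge key -> slot in records
    for f in features:
        key = (f.description, f.status, f.link)
        if f.combinable and key in index:
            # Same description, status and link: merge into the earlier record.
            records[index[key]][0].extend(f.positions)
            continue
        if f.combinable:
            index[key] = len(records)
        # Non-combinable features (one object each) always get their own record.
        records.append((list(f.positions), f))
    return [render(pos, f) for pos, f in records]
```

Positions are kept in input order rather than sorted, so that the residue list in a non-combinable description such as "(His, Cys, His)" stays aligned with its positions.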

Discussion of Tags

Tags are short labels attached to certain features and other records in the database. A particular tagged feature may occur repeatedly within a single entry, and the repeated examples must be uniquely distinguished by different tags. A tag is the very last element of the record, separated from the preceding elements by a single space. Only one tag should occur in each record. A tag consists of (in order):

the character "<",
three or four uppercase letters or digits,
the character ">"

The same tag must not be applied to more than one record of any type in each entry. There is, as yet, no way to check for or impose standardization on the use of tags beyond these format rules. Some suggestions about tags will be discussed with particular features. Tags must be used with Domain and Product feature records. Their use with other feature types is presently problematic. [GRAY]
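
The format rules for tags are simple enough to check mechanically, even though their standardized use cannot be. The sketch below is a hypothetical checker: the record layout "positions/Type: ... <TAG>" and all names are assumptions. It flags malformed tags, records with more than one tag, tags reused within an entry, and Domain or Product records without a tag.

```python
import re
from typing import Dict, List, Optional

# Tag at the very end of a record: one space (preceded by a non-space),
# "<", three or four uppercase letters or digits, ">".
TAG_RE = re.compile(r"(?<=\S) <([A-Z0-9]{3,4})>$")
# Anything that looks like a tag, well-formed or not.
TAG_LIKE_RE = re.compile(r"<[^<>]*>")
# Assumed record layout "positions/Type: ..."; Domain and Product need a tag.
NEEDS_TAG_RE = re.compile(r"(?:^|/)(?:Domain|Product):")


def record_tag(record: str) -> Optional[str]:
    """Return the well-formed trailing tag of a record, or None."""
    m = TAG_RE.search(record)
    return m.group(1) if m else None


def check_tags(records: List[str]) -> List[str]:
    """Check the tag rules for all records of one entry."""
    problems: List[str] = []
    seen: Dict[str, int] = {}  # tag -> record number where first used
    for i, rec in enumerate(records, start=1):
        tag = record_tag(rec)
        tag_like = TAG_LIKE_RE.findall(rec)
        if len(tag_like) > 1:
            problems.append(f"record {i}: more than one tag")
        elif tag_like and tag is None:
            problems.append(f"record {i}: malformed tag {tag_like[0]}")
        if tag is None:
            if NEEDS_TAG_RE.search(rec):
                problems.append(f"record {i}: Domain/Product record without a tag")
        elif tag in seen:
            problems.append(f"record {i}: tag <{tag}> already used in record {seen[tag]}")
        else:
            seen[tag] = i
    return problems
```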

Discussion of Order of Features

The feature records in each entry are essentially independent entities. A computer program reading a feature table could derive no additional information from the order of the records in it. However, the feature table is also read by humans from time to time, and imposing some regularity on the arrangement of features can be very helpful. The preferred order of features is as follows:

First, Product, Domain and Region records are arranged as a group in increasing order by the first element of their range, then in decreasing order by the second element of their range.
Second, site and bond records are arranged as a group in increasing order by their first element.
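
The ordering rules above translate directly into sort keys. In this minimal sketch, Rec and its fields are hypothetical; first and last are the first and second elements of a record's range (equal for single-position sites), and, as an interpretation of the rules, the Product/Domain/Region group is placed before the site and bond group.

```python
from typing import List, NamedTuple

RANGE_TYPES = {"Product", "Domain", "Region"}


class Rec(NamedTuple):
    ftype: str   # feature type, e.g. "Domain", "Binding site", "Disulfide bonds"
    first: int   # first element of the range or position list
    last: int    # second element of the range (== first for single sites)
    text: str    # the record as written


def order_features(records: List[Rec]) -> List[Rec]:
    ranges = [r for r in records if r.ftype in RANGE_TYPES]
    others = [r for r in records if r.ftype not in RANGE_TYPES]
    # Increasing by first element, then decreasing by second element.
    ranges.sort(key=lambda r: (r.first, -r.last))
    # Increasing by first element only.
    others.sort(key=lambda r: r.first)
    return ranges + others
```

Python's sort is stable, so records that tie keep their input order; the guide does not specify a tie-breaker.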

Formerly, a certain amount of "artistic license" could be used in arranging a feature table to emphasize certain structural aspects of the protein or simply to give it greater coherence. The updating mechanism does not preserve an annotator's idiosyncratic order, and feature tables will be rearranged according to the rules above.
