This document describes the standardization of features records in NBRF format
achieved through Release 56.
We have received comments on the improvement in the appearance and
consistency of our database, and a recent research paper specifically mentioned
employing the PIR features annotations in its sequence analysis. The following
projects have been very successful.
- The programs, written in C, that check new or updated entries and existing database entries
have been considerably improved and extended. These are the programs that do the rules "enforcement" that will be discussed.
- All ambiguous Binding site: features have been resolved to appropriate
covalent or noncovalent features.
- All explicit disulfide bond and most other site information in free-text comments has been converted to features.
- The experimental status of all site, bond and product features has been
assigned. A status is now required on these features in all PIR1 and PIR2
entries. Four status types are used in features: "experimental", "predicted",
"absent" and "atypical".
- The comment "in mature form" for amino-terminal features that are not at
the first position of the entry and for carboxyl-terminal features that are
not at the last position of the entry is required, and the comment is not
used except in that context.
- Many new features have been added and the covalent binding sites, modified
sites, cross-links and active sites have been documented in the RESID database.
The following feature records appear in PIR1 and PIR2 annotations and are described in separate sections of this document:
The following notation is used in this document:
|" "||enclose explicit typographic characters|
|[ ]||enclose optional elements|
| | ||separates alternative elements|
|. . .||means indefinite repetition of the preceding optional elements|
|res||means a 3-letter amino acid residue code|
|form||"by" mechanism | "in" protein name | "in mature form"|
|status||"experimental" | "predicted" | "absent" | "atypical"|
Features Annotation Guidelines
When you are preparing new features annotations, you should try to conform to
these guidelines as closely as possible. These guidelines have three degrees:
- features that are currently accepted and being used,
- features that may have been used in the past but are now undesirable,
that have been removed from entries that contained them,
and that should not be used in new entries ("black rules"); these are marked [BLACK].
- features that occur in some entries but are of uncertain value, that have
been proposed but not yet accepted, or that are otherwise under review
("gray rules"); these are marked [GRAY].
If an annotator thinks that a gray rule feature or some new feature is required
for an entry, the annotator should check first with email@example.com.
Discussion of Status Indicators
A status indicator, either "experimental", "absent", "atypical" or "predicted",
is required for all features except Domain and Region. Generally, it should
not be used with "Region:" features. In the "Domain:" feature it should be
used except for homology domains, for self-evident features like
"amino-terminal" or "serine-rich", and for features with arbitrary designations
like "first", "1" or "A".
The "experimental" status means that the feature has been experimentally
observed in the indicated way at the indicated location. Any indication of
alternative forms means that all the alternatives have been observed. For example:
Modified site: N6-methyllysine or N6,N6-dimethyllysine (Lys) #status experimental
means that both forms have been observed at the indicated location. On the
other hand, an indication of an alternate location means that the feature is
known to occur in one or the other position, but which one could not be resolved
experimentally. For example:
Modified site: (or 81) N6,N6,N6-trimethyllysine (Lys) #status experimental
The "predicted" status means that either the nature, the location, or both, of
the feature has been predicted by some means. The experimental observation of
a feature under unnatural conditions should be carefully considered and if the
conditions seem sufficiently different from the natural case, the feature
should be marked as a prediction. With the present system, a distinct problem
occurs when either the nature or the location of a feature, but not both, has
been experimentally determined. Generally, the most definite form of evidence
should be presented and appropriate comments should be provided, either in the
feature or in comments with the entry. For example, if a protein is known to
be blocked and a translated sequence is presented, but the nature and location
of the blocking group are unknown, then only a note or comment is appropriate.
We would welcome any suggestions on how a feature with both experimental and
predicted aspects can best be presented.
The status "absent" is used to indicate a feature that, although it would be
otherwise predicted by some means, has been experimentally determined not to
occur at the indicated position. It is intended to be used in the very limited
cases when an investigation of the specific feature produced the experimental
result. Currently, this status is mainly used for the
Binding site: carbohydrate (Asn) (covalent)
feature. The PIR was the first, and is so far the only, database that makes it possible to distinguish the cases where there is negative experimental evidence
from the cases where there is merely insufficient annotation.
The status "atypical" is used to indicate a feature that does not follow the
"normal" pattern, that would otherwise be predicted not to occur, but that has
been experimentally determined to occur at the indicated location. Again, it
is intended to be used in the very limited cases when an investigation of the
specific feature produced this result. Examples of its use are:
- homology domains that have unusually large insertions or deletions,
- carbohydrate binding sites with the pattern N-X-C,
- metal binding sites with an apparently missing ligand residue.
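The status rules discussed in this section lend themselves to a mechanical check. The following is a minimal sketch in Python, not the PIR checking programs themselves (which are written in C). The function names are hypothetical, and treating the status as simply optional for Domain and Region records is a simplifying assumption; the finer Domain rules above are not modelled.

```python
import re

# The four status values named in this document.
ALLOWED_STATUSES = {"experimental", "predicted", "absent", "atypical"}

# Assumption: status is treated as optional for these feature types.
STATUS_OPTIONAL = {"Domain", "Region"}

STATUS_FIELD = re.compile(r"#status\s+([a-z]+)")


def feature_type(record):
    """Return the feature name before the first ':', e.g. 'Binding site'."""
    head = record.split(":", 1)[0]
    # Drop an optional leading 'positions/' part.
    return head.rsplit("/", 1)[-1].strip()


def check_status(record):
    """Return None if the status is acceptable, otherwise a short message."""
    ftype = feature_type(record)
    match = STATUS_FIELD.search(record)
    if match is None:
        if ftype in STATUS_OPTIONAL:
            return None
        return f"{ftype}: missing #status"
    if match.group(1) not in ALLOWED_STATUSES:
        return f"{ftype}: unknown status {match.group(1)!r}"
    return None
```

A record ending in "#status experiment" would be reported as carrying an unknown status, while a site record with no "#status" at all would be reported as missing one.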
Discussion of "Combinability"
With the adoption of an "object-oriented" approach, it became important to
distinguish the case of groups of residues constituting in the aggregate one
feature from the case of residues in individual features sharing the same
description and grouped for convenience. An example of a group of residues
constituting in the aggregate one feature is:
192,226,231/Binding site: copper (His, Cys, His)
Together this particular group of residues forms one unique binding site for
copper. In this case the group forms a single "object" and other groups of the
same kind could not be combined in the same record without creating ambiguity
in the identity of the objects being represented. Such features are not
"combinable".
An example of residues in individual features sharing the same description and
grouped for convenience is:
192,226,231/Binding site: carbohydrate (Asn) (covalent)
In this case each residue individually is an "object" and they can be combined
without introducing ambiguity. Such features are "combinable".
Features should not be combined unless in the discussion of a particular type
of feature it is explicitly stated that it is combinable. Features with
different "#status" or "#link" descriptors should not be combined.
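The combining rule can be illustrated with a short Python sketch. It assumes that positions are a plain comma-separated list of integers before the "/" (alternate locations such as "(or 81)" and ranges are not handled), and that the caller supplies the set of descriptions stated to be combinable. The function names and the residue positions in the usage note are hypothetical.

```python
from collections import defaultdict


def split_record(record):
    """Split 'positions/description' into (list of int positions, description)."""
    positions, description = record.split("/", 1)
    return [int(p) for p in positions.split(",")], description.strip()


def combine(records, combinable):
    """Merge combinable records that share an identical full description.

    `combinable` holds descriptions without any '#' descriptors. Grouping is
    done on the full description, so records that differ in "#status" or
    "#link" are never merged.
    """
    grouped = defaultdict(list)
    output = []
    for record in records:
        positions, description = split_record(record)
        base = description.split("#", 1)[0].strip()
        if base in combinable:
            grouped[description].extend(positions)
        else:
            # Not stated to be combinable: keep the record untouched.
            output.append(record)
    for description, positions in grouped.items():
        joined = ",".join(str(p) for p in sorted(set(positions)))
        output.append(f"{joined}/{description}")
    return output
```

For instance, two carbohydrate records at positions 40 and 75 with the same "#status predicted" would be merged into one record, a third at position 110 with "#status absent" would stay separate, and a copper binding site record would be passed through unchanged.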
Discussion of Tags
Tags are short labels that are attached to certain features and other records
in the database. A particular tagged feature may have repeated examples
within a single entry that must be uniquely distinguished by different tags.
A tag is the very last element of the record separated from the preceding
elements by a single space. Only one tag should occur in each record. A
tag consists of (in order):
- the character "<",
- three or four uppercase alphabetic characters or numbers,
- the character ">"
The same tag must not be applied to more than one record of any type in each
entry. There is, as yet, no way to check for or impose standardization on the
use of tags beyond these format rules. Some suggestions about tags will be
discussed with particular features. Tags must be used with Domain and Product
feature records. Their use with other feature types is presently problematic. [GRAY]
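The format rules for tags can be expressed as a regular expression. The Python sketch below is illustrative only: the function names are hypothetical, and the record strings in the usage note are made-up examples rather than real entries.

```python
import re

# "<", three or four uppercase letters or digits, ">", as the very last
# element of the record, preceded by exactly one space.
TAG_AT_END = re.compile(r"(?<! ) <([A-Z0-9]{3,4})>\Z")


def extract_tag(record):
    """Return the tag text without angle brackets, or None if no valid tag ends the record."""
    match = TAG_AT_END.search(record)
    return match.group(1) if match else None


def duplicate_tags(records):
    """Return the set of tags applied to more than one record of an entry."""
    seen, duplicates = set(), set()
    for record in records:
        tag = extract_tag(record)
        if tag is None:
            continue
        if tag in seen:
            duplicates.add(tag)
        seen.add(tag)
    return duplicates
```

With a hypothetical record such as "1-120/Domain: kinase <KIN>", `extract_tag` would return "KIN"; a lowercase tag, a five-character tag, or a tag preceded by two spaces would be rejected. `duplicate_tags` reports any tag that is reused within one entry.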
Discussion of Order of Features
The features records in each entry are essentially independent entities. A
computer program reading a feature table could derive no additional information
from the order of the records in it. However, the feature table is also looked
at by humans from time to time, and the imposition of some regularity in the
arrangement of features can be very helpful. The preferred order of features
is as follows.
- First, Product, Domain and Region records are arranged as a group in increasing
order by the first element of their range, then in decreasing order by the
second element of their range.
- Second, site and bond records are arranged as a group in increasing order by
their first element.
Formerly, a certain amount of "artistic license" could be employed in arranging
a feature table to emphasize certain structural aspects of the protein or
simply to give it a greater degree of coherence. The updating mechanism does not
follow an annotator's idiosyncratic order, and feature tables will be rearranged
according to the rules above.
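The ordering rules in this section amount to a two-level sort key. The sketch below, in Python, assumes each feature has already been parsed into a kind, a first position and a last position; the `Feature` type and the function names are hypothetical and do not describe the actual updating mechanism.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str   # e.g. "Product", "Domain", "Region", "Binding site"
    first: int  # first element of the range or position list
    last: int   # second element of the range (equal to first for a single site)


# Kinds that form the first group in the feature table.
RANGE_KINDS = {"Product", "Domain", "Region"}


def sort_key(feature):
    if feature.kind in RANGE_KINDS:
        # Group 0: increasing first element, then decreasing second element.
        return (0, feature.first, -feature.last)
    # Group 1: site and bond records, increasing first element only.
    return (1, feature.first, 0)


def ordered(features):
    """Return the features in the preferred order (ties keep their input order)."""
    return sorted(features, key=sort_key)
```

Because the second range element is sorted in decreasing order, a Product spanning 1 to 350 would precede a Domain spanning 1 to 100, so enclosing features appear before the features they contain.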