Use of the information contained in this unapproved document is at your own risk.
Last update: 20 April, 2001
1003.2-92 #24
Class: Defect situation
The standard states what it states, and conforming implementations
must conform to this. However, concerns have been raised about this
which are being referred to the Sponsors of the standard for consideration as
a future amendment.
This response will be incorporated in an IEEE interpretations
publication, and will also be made available on-line on the IEEE
SPAsystem.
_____________________________________________________________________________
Interpretation Number: (to be assigned by the IEEE)
Topic: tr
Relevant Sections: 4.64.5.1
Interpretation Request: (Defect Report)
-----------------------
Component: tr - Sect 4.64.5.1
Submitted by: Alex White
Ref. No.: tr.1
Proposed Resolution:
The interpretation request correctly describes what is in
the standard but this was not what was intended. The
working group will draft and propose a change to .2b to
describe what was originally intended.
_____________________________________________________________________________
In Section 4.64.5.1 - Standard Input {of tr}, the standard
states that the standard input to tr ``can be any file
type.'' [Draft 12 of IEEE Std 1003.2-1992 (July 1992), p.
483, line 10456]
However, in Section 4.64.5.3 - Environment Variables {of
tr}, the standard states that the LC_COLLATE variable
``shall determine the behaviour of range expressions and
equivalence classes.'' [Ibid., p. 483, lines 10499-10500]
and in Section 4.64.7 - Extended Description {of tr}, the
standard states that the \octal construct
    [...] can be used to represent characters with
    specific coded values. An octal sequence shall
    consist of a backslash followed by the longest
    sequence of one-, two-, or three-octal-digit
    characters (01234567). The sequence shall cause
    the character whose encoding is represented by the
    one-, two-, or three-digit octal integer to be
    placed into the array.
[Ibid., p. 484, lines 10525-10530]
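The octal-sequence rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the standard's or any implementation's parser: the function name `parse_octal_escape` and its return convention are hypothetical, and values that do not fit in one byte are not handled.

```python
from typing import Tuple

OCTAL_DIGITS = "01234567"

def parse_octal_escape(s: str, pos: int = 0) -> Tuple[int, int]:
    """Parse a backslash-octal sequence starting at s[pos].

    Hypothetical helper: returns (coded value, index of the next
    unread character). It takes the longest run of one to three
    octal digits after the backslash, as the quoted text describes.
    """
    if pos >= len(s) or s[pos] != "\\":
        raise ValueError("expected a backslash")
    end = pos + 1
    # Consume at most three octal digits.
    while end < len(s) and end - pos <= 3 and s[end] in OCTAL_DIGITS:
        end += 1
    if end == pos + 1:
        raise ValueError("no octal digit after the backslash")
    return int(s[pos + 1:end], 8), end

print(parse_octal_escape(r"\200"))   # (128, 4)
print(parse_octal_escape(r"\2ff"))   # (2, 2): 'f' is not an octal digit
```

Read literally, this rule would stop at the first non-octal character, so a sequence such as `\2ff` would yield the value 2 followed by two ordinary `f` characters.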
These two statements cause tr to be unusable on any files of
type other than text. Historically, tr has been used to
manipulate files containing binary data. For example, the
perfectly valid and useful construct
    tr -d '\200-\2ff'
to delete all characters with the top bit on, or even
    tr '\200-\2ff' '\0-\1ff'
to strip the top bit (both of which are useful operations on
binary files), no longer work.
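The byte-order behaviour that the request describes as historical practice can be sketched as follows. This is an illustration only, not a description of any tr implementation. It assumes that the request's `\2ff` and `\1ff` are meant as the highest values of each range (octal 377 and 177); the function names are hypothetical.

```python
# Assumed meaning of the two ranges, taken in byte order.
HIGH = bytes(range(0x80, 0x100))   # '\200' through octal 377
LOW = bytes(range(0x00, 0x80))     # '\0' through octal 177

def delete_top_bit_bytes(data: bytes) -> bytes:
    """Byte-order analogue of: tr -d '\\200-...' (delete bytes with the top bit on)."""
    return data.translate(None, HIGH)

def strip_top_bit(data: bytes) -> bytes:
    """Byte-order analogue of mapping the high range onto the low range."""
    table = bytes.maketrans(HIGH, LOW)
    return data.translate(table)

print(delete_top_bit_bytes(b"a\x80b\xffc"))   # b'abc'
print(strip_top_bit(b"\x80\xc1\xff"))         # b'\x00A\x7f'
```

Both operations depend only on the numeric value of each byte, which is why they remain meaningful on binary data.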
For example, in the PC character set, \200 is a C-cedilla,
and \2ff is not defined as a glyph. Therefore, according to
Section 4.64.5.3, the most likely interpretation is that
characters which collate from C-cedilla (probably the letter
D) through the end will all match here. This is clearly
wrong, not historical practice, and of no use whatsoever.
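The difference between the two readings can be illustrated with a toy model. The collating order below is HYPOTHETICAL (base letter first, ignoring case, then accents); it is not the collation of any real locale, and the function names are illustrative. Code page 437 is assumed as the "PC character set".

```python
import unicodedata

# All 256 characters of code page 437 (assumed "PC character set").
PC_CHARS = bytes(range(256)).decode("cp437")

def toy_collation_key(ch: str):
    # HYPOTHETICAL collating order: base letter (case-insensitive),
    # then any accents, then the character itself as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", ch)
    return (decomposed[0].lower(), decomposed[1:], ch)

def collating_range_from(start: str) -> set:
    """Characters that collate at or after start, through the end."""
    start_key = toy_collation_key(start)
    return {ch for ch in PC_CHARS if toy_collation_key(ch) >= start_key}

def byte_range_from(start: str) -> set:
    """Characters whose coded value is at or above that of start."""
    first = start.encode("cp437")[0]
    return set(bytes(range(first, 256)).decode("cp437"))

c_cedilla = b"\x80".decode("cp437")            # coded value octal 200
print("D" in collating_range_from(c_cedilla))  # True
print("D" in byte_range_from(c_cedilla))       # False
```

Under this toy ordering the collating range starting at C-cedilla takes in ordinary letters from D onward, whereas the byte-order range contains only the 128 characters with the top bit set.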
May we interpret the standard as permitting octal escape
sequences as endpoints of a range to not use the collating
order, but rather byte ordering?
IEEE Interpretation for 1003.2-1992
-----------------------------------
The standard is clear in its requirement that octal sequences used as
endpoints in a range be treated as collating elements. The
implementation must follow this requirement. Concern over the wording of
this area of this standard has been forwarded to the sponsors.
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________