Use of the information contained in this unapproved document is at your own risk.
Last update: 20 April 2001
1003.2-92 #211
_____________________________________________________________________________
Interpretation Number: XXX
Topic: awk sub command
Relevant Sections: pages 178 and 179, lines 642-655
PASC Interpretation Request: (Defect Report)
----------------------------
Date: 2001 March 26
------------------------------------------------------------------------
7 Defect Report concerning (number and title of International Standard
or DIS final text, if applicable):
Shell & Utilities: IEEE Std 1003.2 - 1992 (ISO 9945-2:1993)
------------------------------------------------------------------------
8 Qualifier (e.g. error, omission, clarification required):
Error=1
------------------------------------------------------------------------
9 References in document (e.g. page, clause, figure, and/or table
numbers):
Awk utility, pages 178 and 179, lines 642-655
------------------------------------------------------------------------
10 Nature of defect (complete, concise explanation of the perceived
problem):
The description of the sub command in the original standard made
the use of \ in the replacement text ambiguous if not in front of
another \ or &. The wording was changed in 1003.2b to:
sub(ERE, repl[, in])
Substitute the string repl in place of the
first instance of the extended regular
expression ERE in string in and return the
number of substitutions. An ampersand (&)
appearing in the string repl shall be replaced
by the string from in that matches the ERE. An
ampersand preceded with a backslash (\) shall
be interpreted as the literal ampersand
character. Any other occurrence of a backslash
(e.g., preceding any other character) shall be
treated as a literal backslash character.
[Note that if repl is a string literal (the
lexical token STRING, see 4.1.7.8), the
handling of the ampersand character occurs
after any lexical processing, including any
lexical backslash escape sequence processing.]
If in is specified and it is not an lvalue (see
4.1.7.2), the behavior is undefined. If in is
omitted, awk shall use the current record ($0)
in its place.
The problem with this wording is that there is no way to get sub/gsub
to generate a backslash followed by the matched text. If I have
a = "q"
and I want to use sub to make a have the value "\q" (two characters, \
and q), how do I do that? I can't.
sub("q", "\\&", a) --> a == &
sub("q", "\\\\&", a) --> a == \&
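
The two results above can be checked against a small model of the quoted
wording. The Python sketch below is not awk and is not part of the standard;
the function name strict_repl is hypothetical. It takes the replacement
string as sub() sees it after lexical processing, plus the matched text, and
treats a backslash as an escape only when it directly precedes an ampersand.

```python
def strict_repl(repl, matched):
    # Model of the 1003.2b wording: only backslash-ampersand is an escape.
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl) and repl[i + 1] == "&":
            out.append("&")        # backslash-ampersand: literal ampersand
            i += 2
        elif ch == "&":
            out.append(matched)    # bare ampersand: the matched text
            i += 1
        else:
            out.append(ch)         # anything else, including a backslash, is literal
            i += 1
    return "".join(out)


# The raw strings are what sub() sees after awk's lexical processing.
print(strict_repl(r"\&", "q"))     # awk source "\\&"    prints &
print(strict_repl(r"\\&", "q"))    # awk source "\\\\&"  prints \&
```

Under this model no replacement string yields a backslash followed by the
matched text: a backslash placed directly before the ampersand always
consumes it.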
------------------------------------------------------------------------
11 Solution proposed by the submitter (optional):
In Austin Group Revision XCU draft 5, P2381, L6352-6356, replace
An ampersand preceded with a backslash (\) shall be interpreted as
the literal ampersand character. Any other occurrence of a backslash
(e.g., preceding any other character) shall be treated as a literal
backslash character.
with the following:
An ampersand preceded with a backslash (\) shall be interpreted as
the literal ampersand character. An occurrence of two consecutive
backslashes shall be interpreted as just a single literal backslash
character. Any other occurrence of a backslash (e.g., preceding any
other character) shall be treated as a literal backslash character.
I.e., add the sentence "An occurrence of two consecutive backslashes shall
be interpreted as just a single literal backslash character." into the
middle.
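
The effect of the added sentence can be sketched with a small Python model.
This is an illustration only, not awk and not normative text. The function
name proposed_repl is hypothetical, and the left-to-right scan that consumes
backslash pairs as it goes is an assumption, since the proposed wording does
not spell out a scanning order.

```python
def proposed_repl(repl, matched):
    # Model of the proposed wording; repl is the string sub() sees after
    # lexical processing. Left-to-right pair consumption is an assumption.
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        nxt = repl[i + 1] if i + 1 < len(repl) else ""
        if ch == "\\" and nxt == "&":
            out.append("&")        # backslash-ampersand: literal ampersand
            i += 2
        elif ch == "\\" and nxt == "\\":
            out.append("\\")       # two backslashes: one literal backslash
            i += 2
        elif ch == "&":
            out.append(matched)    # bare ampersand: the matched text
            i += 1
        else:
            out.append(ch)         # any other backslash or character is literal
            i += 1
    return "".join(out)


# The raw strings are what sub() sees after awk's lexical processing.
print(proposed_repl(r"\\&", "q"))    # awk source "\\\\&"    prints \q
print(proposed_repl(r"\\\&", "q"))   # awk source "\\\\\\&"  prints \&
```

With the extra rule, the awk source string "\\\\&" produces a backslash
followed by the matched text, which is the case the submitter could not
express.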
------------------------------------------------------------------------
Interpretation:
--------------
As noted by the submitter, the standard clearly states that only the
ampersand character can be escaped by the backslash character in the
awk utility's sub() string function. It requires that any other use of
the backslash character in this context (after lexical processing) be
treated as a literal backslash character. This does not match historic
practice and no rationale was provided indicating that this change in
behavior was intentional. Conforming implementations must nevertheless
conform to the standard as written. However, concerns have been raised
about this requirement and are being referred to the sponsor.
Rationale:
----------
None
Notes to project editor (not part of this interpretation):
==========================================================
Make the change proposed by the submitter to the XCU volume of Austin
Group Revision draft 5.
Forwarded to Interpretations group: 27 Mar 2001
Recirculated: 28 Mar 2001
Finalized: 10 Apr 2001