Defect Report concerning: IEEE Std. 1003.2-1992, ISO/IEC 9945-2:1993 - Shell & Utilities
Clause: pages 178 and 179, lines 642-655
PASC Interpretation Ref: pasc-1003.2-211
Topic: awk sub command


This is an unapproved interpretation of PASC 1003.2-1992, ISO/IEC 9945-2:1993 - Shell & Utilities.

Use of the information contained in this unapproved document is at your own risk.

Last update: 20 April 2001


								1003.2-92  #211

 _____________________________________________________________________________


	Interpretation Number:XXX	
	Topic:  awk sub command
	Relevant Sections: pages 178 and 179, lines 642-655




PASC Interpretation Request: (Defect Report)
----------------------------  

	Date: 2001 March 26


------------------------------------------------------------------------ 

 7  Defect Report concerning (number and title of International Standard
    or DIS final text, if applicable): 

	Shell & Utilities: IEEE Std 1003.2 - 1992 (ISO 9945-2:1993)

------------------------------------------------------------------------ 

 8  Qualifier (e.g. error, omission, clarification required):

	Error

------------------------------------------------------------------------ 

 9  References in document (e.g. page, clause, figure, and/or table
    numbers):

	Awk utility, pages 178 and 179, lines 642-655

------------------------------------------------------------------------ 

10  Nature of defect (complete, concise explanation of the perceived
    problem):

    The description of the sub command in the original standard made
    the use of \ in the replacement text ambiguous if not in front of
    another \ or &.  The wording was changed in 1003.2b to:
    
 		sub(ERE, repl[, in])
 			Substitute the string repl in place of the
 			first instance of the extended regular
 			expression ERE in string in and return the
 			number of substitutions.  An ampersand (&)
 			appearing in the string repl shall be replaced
 			by the string from in that matches the ERE.  An
 			ampersand preceded with a backslash (\) shall
 			be interpreted as the literal ampersand
 			character.  Any other occurrence of a backslash
 			(e.g., preceding any other character) shall be
 			treated as a literal backslash character.
 			[Note that if repl is a string literal (the
 			lexical token STRING, see 4.1.7.8), the
 			handling of the ampersand character occurs
 			after any lexical processing, including any
 			lexical backslash escape sequence processing.]
 			If in is specified and it is not an lvalue (see
 			4.1.7.2), the behavior is undefined.  If in is
 			omitted, awk shall use the current record ($0)
 			in its place.

    The problem with this wording is that there is no way to get sub/gsub
    to generate a backslash followed by the matched text.  If I have

	a = "q"

    and I want to use sub to make a have the value  "\q" (two characters, \
    and q), how do I do that?  I can't.

	sub("q", "\\&", a)	-->	a == &
	sub("q", "\\\\&", a)	-->	a == \&
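
    The two results above can be reproduced with a small model of the
    1003.2b wording.  This is only a sketch in Python, not awk: the
    function name expand_repl_1003_2b is hypothetical, and repl is
    assumed to be the string as it stands after awk's lexical
    processing of the string literal.

```python
def expand_repl_1003_2b(repl: str, matched: str) -> str:
    """Expand repl as the 1003.2b wording describes (illustrative model only).

    repl is assumed to be the string AFTER awk's lexical escape processing.
    """
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl) and repl[i + 1] == "&":
            out.append("&")       # backslash-ampersand: literal ampersand
            i += 2
        elif ch == "&":
            out.append(matched)   # bare ampersand: the text matched by the ERE
            i += 1
        else:
            out.append(ch)        # anything else, including a backslash, is literal
            i += 1
    return "".join(out)


# repl values are written as they look after awk's lexical processing.
print(expand_repl_1003_2b(r"\&", "q"))    # awk source "\\&"    -> &
print(expand_repl_1003_2b(r"\\&", "q"))   # awk source "\\\\&"  -> \&
```

    In this model, adding more backslashes in front of the ampersand
    only adds more literal backslashes in front of a literal ampersand,
    so no repl of that form produces \q.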


------------------------------------------------------------------------ 

11  Solution proposed by the submitter (optional):

   In Austin Group Revision XCU draft 5, P2381, L6352-6356, replace

	An ampersand preceded with a backslash (\) shall be interpreted as
	the literal ampersand character.  Any other occurrence of a backslash
	(e.g., preceding any other character) shall be treated as a literal
	backslash character.

   with the following:

	An ampersand preceded with a backslash (\) shall be interpreted as
	the literal ampersand character.  An occurrence of two consecutive
	backslashes shall be interpreted as just a single literal backslash
	character.  Any other occurrence of a backslash (e.g., preceding any
	other character) shall be treated as a literal backslash character.

  I.e., add the sentence "An occurrence of two consecutive backslashes shall
  be interpreted as just a single literal backslash character." into the
  middle.
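
  The effect of the added sentence can be sketched with a small model,
  written in Python rather than awk.  The function name
  expand_repl_proposed is hypothetical, repl is assumed to be the string
  after awk's lexical processing, and the left-to-right scan order is an
  assumption, since the proposed wording does not state one.

```python
def expand_repl_proposed(repl: str, matched: str) -> str:
    """Expand repl under the proposed wording (illustrative model only).

    repl is assumed to be the string AFTER awk's lexical escape processing.
    Scanning left to right is an assumption; the wording gives no order.
    """
    out = []
    i = 0
    while i < len(repl):
        pair = repl[i:i + 2]
        if pair == "\\&":
            out.append("&")       # backslash-ampersand: literal ampersand
            i += 2
        elif pair == "\\\\":
            out.append("\\")      # two backslashes: one literal backslash
            i += 2
        elif repl[i] == "&":
            out.append(matched)   # bare ampersand: the text matched by the ERE
            i += 1
        else:
            out.append(repl[i])   # any other character, or a lone backslash, is literal
            i += 1
    return "".join(out)


# repl values are written as they look after awk's lexical processing.
print(expand_repl_proposed(r"\\&", "q"))  # awk source "\\\\&"  -> \q
print(expand_repl_proposed(r"\&", "q"))   # awk source "\\&"    -> &
```

  Under this reading, a repl of two backslashes followed by an ampersand
  yields a backslash followed by the matched text, which is the case the
  current wording cannot express.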

------------------------------------------------------------------------ 




Interpretation:
---------------
As noted by the submitter, the standard clearly states that only the
ampersand character can be escaped by the backslash character in the
awk utility's sub() string function.  It requires that any other use of
the backslash character in this context (after lexical processing) be
treated as a literal backslash character.  This does not match historic
practice and no rationale was provided indicating that this change in
behavior was intentional.  Conforming implementations must conform to
the standard as written.  However, concerns have been raised about this
requirement, and they are being referred to the sponsor.

Rationale:
----------
None

Notes to project editor (not part of this interpretation):
==========================================================
Make the change proposed by the submitter to the XCU volume of Austin
Group Revision draft 5.

Forwarded to Interpretations group: 27 Mar 2001
Recirculated: 28 Mar 2001
Finalized: 10 Apr 2001