Defect Report concerning: IEEE Std. 1003.2-1992, ISO/IEC 9945-2:1993 - Shell & Utilities
Clause: 2.8
PASC Interpretation Ref: pasc-1003.2-29
Topic: regular expressions


This is an unapproved interpretation of PASC 1003.2-1992, ISO/IEC 9945-2:1993 - Shell & Utilities.

Use of the information contained in this unapproved document is at your own risk.

Last update: 20 April, 2001


								1003.2-92  #29

	Class: No change


This response will be incorporated in an IEEE interpretations
publication, and will also be made available on-line on the IEEE
SPAsystem.

 _____________________________________________________________________________


	Interpretation Number:	(to be assigned by the IEEE)
	Topic:			regular expressions
	Relevant Sections:	2.8


Interpretation Request: (Defect Report)
-----------------------
       Please provide an interpretation of the following taken from
       Section 2.8 of IEEE Std 1003.2-1992.

       I think I know what the specified behavior is for the
       following cases, but maybe I've opened an interesting
       question or two.

       Given a locale in which "ch" is a multiple character
       collating element that collates between "c" and "d", then
       certainly
                 [[.ch.]]   matches "ch".

       This makes it pretty clear that
                 [^[.ch.]]  doesn't match "ch" (and not even
                 just the "c").

       Therefore, consistency argues that
                 [^c]       matches "ch"
       And, of course,
                 [c]        doesn't match "ch" (and not even just the
                 "c").

       If we're in agreement so far, then the simple rule is that
       if the string to check against a bracket expression can be
       taken as a multiple character collating element, then the
       matching process must do so.

       I'm pretty sure about the above.  What I'm not so sure about
       is the behavior for character classes.  Take, for example,
                 [[:alpha:]]
       when presented with "ch".  The rationale for POSIX.2
       confirms that ``character classes are not intended to
       include collating elements''.  However, there are still two
       possible answers: "ch" doesn't match, and the "c" of "ch"
       matches.  I like neither of these answers; neither fits my
       intuitive belief that "ch" should match as a unit.  Even
       worse, the nonportable
                 [a-z]      *does* match the unit "ch"!

       What is actually specified for [[:alpha:]] here?



IEEE Interpretation for 1003.2-1992 
-----------------------------------


Section 2.8.3.2 of the standard defines a character class expression
as the set of characters belonging to a character class, as defined in
the LC_CTYPE category of the current locale.  The same section defines
a range expression as the set of collating elements that fall between
two elements in the current collation sequence, inclusive.

Thus, the collating element "ch", which is not a character, would be
matched by the range expression [a-z], but not by the character class
[:alpha:] (the set of specific characters specified in the locale
file).  [:alpha:] would match the 'c' and the 'h' individually, for the
same reason that the expression [c] matches the 'c' in "ch", but not
the collating element "ch".
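
The distinction drawn above can be illustrated with a small sketch.
It is not part of the standard or of this interpretation, and it is not
how any particular regular expression library is implemented.  It
assumes a toy locale (the names COLLATION, CLASSES, elements_at and
bracket_match are all hypothetical) in which "ch" is a collating element
that collates between "c" and "d", and it models matching lists only;
nonmatching lists such as [^c] are deliberately left out.

```python
import string

# Toy locale, assumed for illustration only: the lowercase ASCII letters
# in order, with the multi-character collating element "ch" placed
# between "c" and "d" in the collation sequence.
COLLATION = list(string.ascii_lowercase)
COLLATION.insert(COLLATION.index("c") + 1, "ch")

# LC_CTYPE-style character classes hold single characters only.
CLASSES = {"alpha": set(string.ascii_lowercase)}


def elements_at(text, pos):
    """Return the collating elements starting at text[pos], longest first."""
    found = [e for e in COLLATION if text.startswith(e, pos)]
    return sorted(found, key=len, reverse=True)


def bracket_match(item, text, pos=0):
    """Return the substring one bracket item matches at pos, or None.

    item is one of:
      ("range", lo, hi)  e.g. [a-z]
      ("class", name)    e.g. [[:alpha:]]
      ("char", c)        e.g. [c]
      ("coll", elem)     e.g. [[.ch.]]
    """
    kind = item[0]
    for elem in elements_at(text, pos):
        if kind == "range":
            # A range is a set of collating elements, so "ch" can belong.
            lo = COLLATION.index(item[1])
            hi = COLLATION.index(item[2])
            if lo <= COLLATION.index(elem) <= hi:
                return elem
        elif kind == "class":
            # A class is a set of characters, so only length-1 elements fit.
            if len(elem) == 1 and elem in CLASSES[item[1]]:
                return elem
        elif kind in ("char", "coll"):
            if elem == item[1]:
                return elem
    return None


print(bracket_match(("range", "a", "z"), "ch"))    # the unit "ch"
print(bracket_match(("class", "alpha"), "ch"))     # only the "c"
print(bracket_match(("class", "alpha"), "ch", 1))  # then the "h"
print(bracket_match(("char", "c"), "ch"))          # only the "c"
```

Under these assumptions the range item returns the two-character unit,
while the character class and the single-character item each return one
character at a time, which is the behavior described above.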


Rationale for Interpretation:
-----------------------------
None.
 _____________________________________________________________________________