IT Questions and Answers :)

Thursday, January 23, 2020

This question is categorized "General Linux" because it has been tested on Linux with "grep". Which of the following strings is NOT matched by the regular expression 'ca*t'

  • cart
  • cat
  • caat
  • ct 

EXPLANATION

"cart" does NOT match the regular expression ca*t. After the "c", the pattern allows only zero or more "a" characters before the "t", so the "r" in "cart" breaks the match.

"caat" matches the "c", the "a" (two times), and the "t" in the regular expression.
"cat" matches the "c", the "a" (one time), and the "t" in the regular expression.
"ct" matches the "c", the "a" (zero times), and the "t" in the regular expression.
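
To check the four answer choices yourself, here is a small sketch that feeds each string to grep and sorts it by exit status. The variable names (matched, unmatched) are made up for this example.

```shell
# Test each candidate string against the regex. -q suppresses output;
# grep's exit status tells us whether the line matched.
matched=""
unmatched=""
for s in cart cat caat ct; do
  if printf '%s\n' "$s" | grep -q 'ca*t'; then
    matched="$matched $s"
  else
    unmatched="$unmatched $s"
  fi
done
echo "matched:$matched"         # matched: cat caat ct
echo "not matched:$unmatched"   # not matched: cart
```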

From:  https://linux.die.net/man/1/grep
"Repetition
A regular expression may be followed by one of several repetition operators:

?
The preceding item is optional and matched at most once.

*
The preceding item will be matched zero or more times.

+
The preceding item will be matched one or more times. "
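
As a quick illustration of the three operators quoted above, the sketch below runs each one over the same three test lines. It assumes grep -E (extended regular expressions), because in basic grep syntax an unescaped ? or + is an ordinary character. The variable names are only for this example.

```shell
# Three test lines: zero, one, and two "a" characters between c and t.
words='ct
cat
caat'

# ?  preceding item at most once
optional=$(printf '%s\n' "$words" | grep -E 'ca?t')
# *  preceding item zero or more times
star=$(printf '%s\n' "$words" | grep -E 'ca*t')
# +  preceding item one or more times
plus=$(printf '%s\n' "$words" | grep -E 'ca+t')

echo "$optional"   # ct, cat
echo "$star"       # ct, cat, caat
echo "$plus"       # cat, caat
```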
Note that "bash" will use  *  on the command line to match any string of characters in filenames.  

To match any string of characters with a regex use:     .*    
The period is a regex metacharacter matching any character except newline, and the asterisk metacharacter will expand matches to any length.
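
For comparison with ca*t, here is a sketch of what c.*t matches. The test lines reuse the contents of regex_test shown further down, plus a made-up line "dog" that should not match.

```shell
lines='ct
cat
cart
chat
cannot
dog'

# .* allows any run of characters (including none) between c and t.
any=$(printf '%s\n' "$lines" | grep 'c.*t')
echo "$any"   # ct, cat, cart, chat, cannot (one per line); dog is not printed
```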

Since the asterisk is a globbing character to "bash", it should be quoted or escaped when it is part of a pattern entered on the bash command line with "grep".

Note that the asterisk is quoted in the command below so that bash does not expand the '*':

$ grep  ca'*'t regex_test
ct
cat
caat
The asterisk could also be escaped to bash via 'ca*t' or ca\*t. Escaping passes the asterisk to grep, instead of bash using it to glob filenames in the current directory.
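
The three forms can be checked without grep at all. The sketch below is only an illustration: it creates a scratch directory (via mktemp, an assumption about your system) containing a file named "cannot", then prints the argument the shell would hand to a command for each form.

```shell
# Scratch directory with a file that the glob ca*t would match.
dir=$(mktemp -d)
touch "$dir/cannot" "$dir/regex_test"

# Each form keeps the shell from globbing, so the command receives ca*t.
a=$(cd "$dir" && printf '%s' ca'*'t)
b=$(cd "$dir" && printf '%s' 'ca*t')
c=$(cd "$dir" && printf '%s' ca\*t)
rm -r "$dir"

echo "$a $b $c"   # ca*t ca*t ca*t
```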

Absent escaping, you could get unexpected results if the asterisk globs a filename:

$ ls
cannot  regex_test  test4
 
$ cat regex_test
ct
cat
caat
cannot
cart
chat

Below, the asterisk is NOT escaped, so bash expanded it to a filename:
$ grep ca*t regex_test
cannot

In the unescaped command above, filename "cannot" was globbed from  ca*t  and passed to grep, making the grep command expand to   "grep cannot regex_test"
Since the file "regex_test" contains the string "cannot" in its data, the results of "grep" are correct for the given command, but that command may have been unintended.

Compare the output of the unescaped  *  above, to the escaped  '*'  in the command below:
$ grep 'ca*t' regex_test
ct
cat
caat

If an unescaped asterisk is parsed by bash but the pattern does not match any filename, bash by default passes the pattern unchanged to grep (or whatever other command was entered on the command line). The bash options nullglob and failglob change that behavior. Either way, don't count on bash not globbing a filename when that's not what you want: always quote or escape characters that are special to bash if you need them passed to your command.
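
Here is a sketch of both cases using echo in a scratch directory, assuming the shell's default globbing settings. The file names mirror the "ls" listing above; the pattern zz*t is a made-up one that matches nothing.

```shell
dir=$(mktemp -d)
touch "$dir/cannot" "$dir/regex_test" "$dir/test4"

# A file matches the glob, so the shell replaces the pattern with its name.
expanded=$(cd "$dir" && echo ca*t)
# No file matches, so the pattern is passed through as typed.
unchanged=$(cd "$dir" && echo zz*t)
rm -r "$dir"

echo "$expanded"    # cannot
echo "$unchanged"   # zz*t
```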

From https://www.tldp.org/LDP/abs/html/globbingref.html
"Bash itself cannot recognize Regular Expressions. Inside scripts, it is commands and utilities -- such as sed and awk -- that interpret RE's.
Bash does carry out filename expansion [1] -- a process known as globbing -- but this does not use the standard RE set. Instead, globbing recognizes and expands wild cards. Globbing interprets the standard wild card characters [2] -- * and ?, character lists in square brackets, and certain other special characters (such as ^ for negating the sense of a match). There are important limitations on wild card characters in globbing, however. Strings containing * will not match filenames that start with a dot, as, for example, .bashrc. [3] Likewise, the ? has a different meaning in globbing than as part of an RE."
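
To make the difference concrete, the sketch below compares ? and * as glob characters with ? as a regex operator. The scratch directory and its file names (ct, cat, caat, .cat) are invented for the example, and grep -E is assumed so that ? acts as a repetition operator.

```shell
dir=$(mktemp -d)
touch "$dir/ct" "$dir/cat" "$dir/caat" "$dir/.cat"

# Glob: ? stands for exactly one character, so only "cat" fits c?t.
glob_q=$(cd "$dir" && echo c?t)
# Regex: ? makes the preceding "a" optional, so "ct" and "cat" match.
regex_q=$(printf 'ct\ncat\ncaat\n' | grep -E 'ca?t')
# Glob: * does not match the leading dot of .cat.
glob_star=$(cd "$dir" && echo *t)
rm -r "$dir"

echo "$glob_q"      # cat
echo "$regex_q"     # ct, cat
echo "$glob_star"   # caat cat ct
```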

Lastly, this writeup on regular expressions is old (but maintained) and very good:
http://www.grymoire.com/Unix/Regular.html#TOC


SOURCE

https://linux.die.net/man/1/grep