EXPLANATION
cart does NOT match the regular expression ca*t because the "r" in "cart" is not matched.
caat matches the "c", the "a" (two times), and the "t" in the regular expression.
cat matches the "c", the "a" (one time), and the "t" in the regular expression.
ct matches the "c", the "a" (zero times), and the "t" in the regular expression.
From:
https://linux.die.net/man/1/grep
"Repetition
A regular expression may be followed by one of several repetition operators:
?  The preceding item is optional and matched at most once.
*  The preceding item will be matched zero or more times.
+  The preceding item will be matched one or more times."
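The three operators can be compared side by side on the same words. This is a minimal sketch, assuming grep -E (extended regular expressions), where ?, *, and + are all written without backslashes. The -E flag, the piped-in word list, and the variable names are additions for illustration; the commands elsewhere in this explanation use plain grep.

```shell
# Sample words with zero, one, and two "a" characters, plus "cart".
words='ct
cat
caat
cart'

# Run the same input through each repetition operator.
# Each pattern is quoted so the shell passes it to grep unchanged.
optional=$(printf '%s\n' "$words" | grep -E 'ca?t')   # zero or one "a"
star=$(printf '%s\n' "$words" | grep -E 'ca*t')       # zero or more "a"
plus=$(printf '%s\n' "$words" | grep -E 'ca+t')       # one or more "a"

printf 'ca?t:\n%s\n' "$optional"
printf 'ca*t:\n%s\n' "$star"
printf 'ca+t:\n%s\n' "$plus"
```

Here ca?t should select ct and cat, ca*t should select ct, cat, and caat, and ca+t should select cat and caat. None of the three selects cart.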
Note that "bash" uses * on the command line to match any string of characters in filenames.
To match any string of characters with a regex, use:
.*
The period is a regex metacharacter matching any single character except newline, and the asterisk metacharacter repeats it zero or more times, so the match can be of any length.
Since the asterisk is a globbing character to "bash", it should be escaped when entered on the bash command line with "grep".
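The regex .* can be tried on a small word list. A sketch, assuming a POSIX shell and grep; the words are piped in with printf rather than read from a file, and the variable names are only illustrative.

```shell
# Sample words; every one starts with "c" and ends with "t".
words='ct
cat
caat
cannot
cart
chat'

# "c", then any run of characters (possibly empty), then "t".
# The single quotes keep the shell from treating * as a glob.
any=$(printf '%s\n' "$words" | grep 'c.*t')
printf '%s\n' "$any"
```

All six words should be printed, because .* accepts whatever sits between the "c" and the "t", including nothing at all.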
Note that the asterisk is escaped in the command below so that bash does not expand the '*':
$ grep ca'*'t regex_test
ct
cat
caat
The asterisk could also be escaped to bash via 'ca*t' or ca\*t. Escaping passes the * to grep, instead of bash using it to glob filenames in the current directory.
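The three ways of escaping can be checked against each other. A minimal sketch, assuming a POSIX shell and grep; the word list and the variable names a, b, and c are hypothetical.

```shell
words=$(printf '%s\n' ct cat caat cart)

# Three spellings that all hand the literal pattern ca*t to grep.
a=$(printf '%s\n' "$words" | grep ca'*'t)   # only the asterisk is quoted
b=$(printf '%s\n' "$words" | grep 'ca*t')   # the whole pattern is quoted
c=$(printf '%s\n' "$words" | grep ca\*t)    # the asterisk is backslash-escaped

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    echo "all three forms give the same result"
fi
```

Quoting the whole pattern is usually the easiest form to read, and it keeps working if more special characters are added to the pattern later.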
Absent escaping, you could get unexpected results if the asterisk globs a filename:
$ ls
cannot regex_test test4
$ cat regex_test
ct
cat
caat
cannot
cart
chat
Below, the asterisk is NOT escaped, so bash expanded it to a filename:
$ grep ca*t regex_test
cannot
In the unescaped command above, filename "cannot" was globbed from ca*t and passed to grep, making the grep command expand to "grep cannot regex_test".
Since the file "regex_test" contains the string "cannot" in its data, the results of "grep" are correct for the given command, but that command may have been unintended.
Compare the output of the unescaped * above, to the escaped '*' in the command below:
$ grep 'ca*t' regex_test
ct
cat
caat
If an unescaped asterisk is parsed by bash, but does not
expand to a filename, bash will pass the asterisk to grep (or whatever
other command was entered on the command line). But don't count on bash
not globbing a filename when that's not what you want--always
escape characters that are special to bash if you need them passed to
your command.
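Both situations can be reproduced safely in a scratch directory. This is a sketch, assuming bash with its default globbing options and that mktemp -d is available; the directory and variable names are hypothetical. Running echo with the unquoted word is a quick way to see what bash would hand to a command.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' ct cat caat cannot cart chat > regex_test

# No filename matches ca*t yet, so bash passes the word through unchanged.
before=$(echo ca*t)
no_file=$(grep ca*t regex_test)

# Once a file named "cannot" exists, the same unquoted word expands to it.
touch cannot
after=$(echo ca*t)
with_file=$(grep ca*t regex_test)

# The quoted pattern is unaffected by what is in the directory.
quoted=$(grep 'ca*t' regex_test)

printf 'before: %s\nafter: %s\n' "$before" "$after"

# Clean up the scratch directory.
cd - > /dev/null
rm -r "$dir"
```

The unquoted command changes meaning as soon as a matching filename appears, while the quoted command keeps returning ct, cat, and caat.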
From:
https://www.tldp.org/LDP/abs/html/globbingref.html
"Bash itself cannot recognize Regular Expressions. Inside scripts, it is commands and utilities -- such as sed and awk -- that interpret RE's.
Bash does carry out filename expansion -- a process known as globbing -- but this does not use the standard RE set. Instead, globbing recognizes and expands wild cards. Globbing interprets the standard wild card characters -- * and ?, character lists in square brackets, and certain other special characters (such as ^ for negating the sense of a match). There are important limitations on wild card characters in globbing, however. Strings containing * will not match filenames that start with a dot, as, for example, .bashrc. Likewise, the ? has a different meaning in globbing than as part of an RE."
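The two differences mentioned in the quote can be observed directly. A sketch, assuming bash with default options (dotglob off), mktemp -d, and grep -E for the regex side; the file and variable names are hypothetical.

```shell
# Scratch directory with one hidden file and three ordinary ones.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch .bashrc bashrc cat ct

# As a glob, a bare * skips names that begin with a dot.
star=$(echo *)

# As a glob, ? stands for exactly one character: c?t matches "cat" but not "ct".
glob_q=$(echo c?t)

# As a regex, ? makes the preceding item optional: ca?t matches both words.
regex_q=$(printf '%s\n' cat ct | grep -E 'ca?t')

printf 'glob *:     %s\n' "$star"
printf 'glob c?t:   %s\n' "$glob_q"
printf 'regex ca?t:\n%s\n' "$regex_q"

# Clean up the scratch directory.
cd - > /dev/null
rm -r "$dir"
```

The glob * lists bashrc, cat, and ct but leaves out .bashrc, and the glob c?t expands only to cat, whereas the regex ca?t accepts both cat and ct.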
Lastly, this is an old (but maintained) and very good writeup on regular expressions:
http://www.grymoire.com/Unix/Regular.html#TOC
SOURCE
https://linux.die.net/man/1/grep