IT Questions and Answers :)

Saturday, December 7, 2019

This question is categorized "General Linux" because it has been tested on Linux with "grep". Which of the following strings is NOT matched by the regular expression 'ca*t'

This question is categorized "General Linux" because it has been tested on Linux with "grep". Which of the following strings is NOT matched by the regular expression 'ca*t'

  • cart
  • cat
  • caat
  • ct
This question is categorized "General Linux" because it has been tested on Linux with "grep". Which of the following strings is NOT matched by the regular expression 'ca*t'

EXPLANATION

My Terminal output.

[root@sanbox ~]# cat file | grep 'ca*t'
cat
caat
ctx`

Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times.
The first metacharacter for repeating things that we’ll look at is *. * doesn’t match the literal character *; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.
For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a characters), and so forth. The RE engine has various internal limitations stemming from the size of C’s int type that will prevent it from matching over 2 billion a characters; you probably don’t have enough memory to construct a string that large, so you shouldn’t run into that limit.
Repetitions such as * are greedy; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions.
A step-by-step example will make this more obvious. Let’s consider the expression a[bcd]*b. This matches the letter 'a', zero or more letters from the class [bcd], and finally ends with a 'b'. Now imagine matching this RE against the string abcbd.
Step Matched Explanation
1 a The a in the RE matches.
2 abcbd The engine matches [bcd]*, going as far as it can, which is to the end of the string.
3 Failure The engine tries to match b, but the current position is at the end of the string, so it fails.
4 abcb Back up, so that [bcd]* matches one less character.
5 Failure Try b again, but the current position is at the last character, which is a 'd'.
6 abc Back up again, so that [bcd]* is only matching bc.
6 abcb Try b again. This time the character at the current position is 'b', so it succeeds.
The end of the RE has now been reached, and it has matched abcb. This demonstrates how the matching engine goes as far as it can at first, and if no match is found it will then progressively back up and retry the rest of the RE again and again. It will back up until it has tried zero matches for [bcd]*, and if that subsequently fails, the engine will conclude that the string doesn’t match the RE at all.
Another repeating metacharacter is +, which matches one or more times. Pay careful attention to the difference between * and +; * matches zero or more times, so whatever’s being repeated may not be present at all, while + requires at least one occurrence. To use a similar example, ca+t will match cat (1 a), caaat (3 a’s), but won’t match ct.
There are two more repeating qualifiers. The question mark character, ?, matches either once or zero times; you can think of it as marking something as being optional. For example, home-?brew matches either homebrew or home-brew.
The most complicated repeated qualifier is {m,n}, where m and n are decimal integers. This qualifier means there must be at least m repetitions, and at most n. For example, a/{1,3}b will match a/b, a//b, and a///b. It won’t match ab, which has no slashes, or a////b, which has four.
You can omit either m or n; in that case, a reasonable value is assumed for the missing value. Omitting m is interpreted as a lower limit of 0, while omitting n results in an upper bound of infinity — actually, the upper bound is the 2-billion limit mentioned earlier, but that might as well be infinity.
Readers of a reductionist bent may notice that the three other qualifiers can all be expressed using this notation. {0,} is the same as *, {1,} is equivalent to +, and {0,1} is the same as ?. It’s better to use *, +, or ? when you can, simply because they’re shorter and easier to read.
Share:

0 comments:

Post a Comment

Popular Posts