regex


Remove all non digit characters from String

The following Java snippet removes all non-digit characters from a String.
Non-digit characters are any characters that are not in the following set [0, 1, 2, 3, 4 ,5 ,6 ,7 ,8, 9].


myString.replaceAll("\\D", "");

For a summary of regular-expression constructs and information on the character classes supported by Java pattern visit the following link.

The \\D pattern that we used in our code is a predefined character class for non-digit characters. It is equivalent to [^0-9]that negates the predefined character class for digit characters [0-9].


JavaSript: Remove all non printable and all non ASCII characters from text 1

According to the ASCII character encoding, there are 95 printable characters in total.
Those characters are in the range [0x20 to 0x7E] ([32 to 126] in decimal) and they represent letters, digits, punctuation marks, and a few miscellaneous symbols.
Character 0x20 (or 32 in decimal) is the space character ' ' and
character 0x7E (or 126 in decimal) is the tilde character '~'.
Source: https://en.wikipedia.org/wiki/ASCII#Printable_characters

Since all the printable characters of ASCII are conveniently in one continuous range, we used the following to filter all other characters out of our string in JavaScript.


printable_ASCII_only_string = input_string.replace(/[^ -~]+/g, "");

What the above code does is that it passes the input string through a regular expression which will match all characters out of the printable range and replace them with nothing (hence, delete them).
In case you do not like writing your regular expression with the space character to it, you can re-write the above regular expression using the hex values of the two characters as follows:


printable_ASCII_only_string = input_string.replace(/[^\x20-\x7E]+/g, "");


How to find lines that contain only lowercase characters

To print all lines that contain only lower case characters, we used the following regular expression in grep:


egrep '^[[:lower:]]+$' <file>;
#If you do not have egrep, use
grep -e '^[[:lower:]]+$' <file>;

Breakdown of the above regular expression:

  • ^ instructs the regular expression parser that the pattern should always start with the beginning of the line
  • [[:lower:]] this special instruction informs us that only lower case characters can match it
  • + the plus sign causes the preceding token to be matched one or more times
  • $ signifies the end of the line

Regular expression to match any ASCII character

The following regular expression will match any ASCII character (character values [0-127]).

[\x00-\x7F]

The next regular expression makes the exact opposite match, it will match any character that is NOT ASCII (character values greater than 127).

[^\x00-\x7F]

 

gEdit - regular expression to match any ASCII character

gEdit – regular expression to match any ASCII character

gEdit - regular expression to match any Non-ASCII character

gEdit – regular expression to match any Non-ASCII character