JavaScript: Remove all non-printable and all non-ASCII characters from text

According to the ASCII character encoding, there are 95 printable characters in total.
Those characters are in the range [0x20 to 0x7E] ([32 to 126] in decimal) and they represent letters, digits, punctuation marks, and a few miscellaneous symbols.
Character 0x20 (or 32 in decimal) is the space character ' ' and
character 0x7E (or 126 in decimal) is the tilde character '~'.
Source: https://en.wikipedia.org/wiki/ASCII#Printable_characters

Since all the printable characters of ASCII are conveniently in one continuous range, we used the following to filter all other characters out of our string in JavaScript.

printable_ASCII_only_string = input_string.replace(/[^ -~]+/g, "");

The above code passes the input string through a regular expression that matches every run of characters outside the printable range and replaces it with nothing (hence, deletes it).
If you do not like writing your regular expression with a literal space character in it, you can rewrite it using the hex values of the two characters as follows:

printable_ASCII_only_string = input_string.replace(/[^\x20-\x7E]+/g, "");
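To check the behaviour quickly, the expression can be wrapped in a small function. This is only a minimal sketch: the name stripNonPrintableASCII is hypothetical and not part of the original snippet.

```javascript
// Hypothetical wrapper around the expression shown above.
function stripNonPrintableASCII(input_string) {
	// Matches every run of characters outside the range space (0x20) to tilde (0x7E).
	return input_string.replace(/[^\x20-\x7E]+/g, "");
}

console.log(stripNonPrintableASCII("Tab\there, caf\u00E9, line\nbreak"));
// prints "Tabhere, caf, linebreak"
```

Note that tab and newline are control characters, so they fall outside the printable range and are removed together with the non-ASCII é.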


PHP: Convert JavaScript-escaped Unicode characters to HTML hex references

There are cases where PHP receives escaped Unicode characters from client-side JavaScript. According to the JSON RFC, any character may be sent in the escaped format \uXXXX, so PHP can end up receiving characters in that form:

Any character may be escaped.
If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF),
then it may be represented as a six-character sequence:
a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point.
The hexadecimal letters A though F can be upper or lowercase.

A sample input you might receive could look like this: George\u2019s treasure box instead of George’s treasure box.

This kind of input should not be stored as is, because the \uXXXX notation has no meaning in HTML. Instead, we can fix it up using preg_replace.

$decoded = preg_replace('/\\\\u([a-fA-F0-9]{4})/', '&#x\\1;', $input);

The above command looks for all instances of \uXXXX in $input and replaces each one with the HTML hexadecimal character reference &#xXXXX;, reusing the XXXX value that was matched.

The pattern '/\\\\u([a-fA-F0-9]{4})/' does the following:

  • \\\\ – Matches a single \ character in the input. The backslash has a special meaning in regular expressions, so it has to be escaped with a second one, giving \\. Each of those two then has to be escaped again because the backslash also has a special meaning in PHP strings, and we end up with four of them.
  • u – The backslash must be followed by a u character.
  • ([a-fA-F0-9]{4}) – The u must be followed by exactly 4 hexadecimal digits, that is, 4 characters that are each in A-F, a-f or 0-9. The parentheses capture these digits so that the replacement can reuse them.

The replacement '&#x\\1;' is made up of:

  • &#x – A constant string that is output as is. In HTML, these characters start a hexadecimal character reference; the hexadecimal code point of the character follows them.
  • \\1 – A back-reference to the 1st parenthesized group of the pattern. In this case the only parentheses are around the XXXX part of \uXXXX, so \\1 is replaced with the XXXX value.
  • ; – A constant character that terminates the character reference.
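For comparison, the same pattern can be sketched in JavaScript. This is only an illustration of the escaping levels and not part of the PHP solution; the function name escapedUnicodeToHtmlHex is hypothetical. A JavaScript regular expression literal is not wrapped in a string, so only the regular-expression level of escaping is needed: the backslash is written twice instead of four times, and the back-reference is written $1.

```javascript
// Hypothetical JavaScript counterpart of the preg_replace call above.
function escapedUnicodeToHtmlHex(input) {
	// \\ matches one literal backslash; the group captures the four hex digits.
	return input.replace(/\\u([a-fA-F0-9]{4})/g, "&#x$1;");
}

// In a JavaScript string literal, "\\u2019" is a backslash followed by u2019.
console.log(escapedUnicodeToHtmlHex("George\\u2019s treasure box"));
// prints "George&#x2019;s treasure box"
```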

JavaScript: Get values from URL parameters

The following code will get the query parameters of a URL and assign them to an object called QueryMap.

// This function is anonymous, is executed immediately and the return value is assigned to QueryMap.
var QueryMap = function () {
	var query_map = {};
	// The search property sets or returns the query string part of a URL, including the question mark (?).
	// So we remove the question mark by removing the first character of the string.
	var query = window.location.search.substring(1);
	var variables = query.split('&');
	for (var i = 0; i < variables.length; i++) {
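		// Skip empty segments (for example an empty query string or a stray '&').
		if (variables[i] === '') {
			continue;
		}
		var position = variables[i].indexOf('=');
		// A parameter without an equals sign is treated as a key with an empty value.
		var key = position === -1 ? variables[i] : variables[i].substring(0, position);
		var value = position === -1 ? '' : variables[i].substring(position + 1);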
		// If it is the first entry with this name
		if (typeof query_map[key] === "undefined") {
			query_map[key] = decodeURIComponent(value);
		}
		// If it is the second entry with this name we change the value to an array of values
		else if (typeof query_map[key] === "string") {
			var array = [query_map[key], decodeURIComponent(value)];
			query_map[key] = array;
		}
		// If it is the third or later entry with this name we just add to the array
		else {
			query_map[key].push(decodeURIComponent(value));
		}
	}
	return query_map;
}();

This code will handle cases where the equals sign (=) is part of the value of a variable.

Examples:

If the URL is http://example.com?a=1
Then QueryMap.a will contain the value “1”.

If the URL is http://example.com?a=1&b=2
Then QueryMap.a will contain the value “1” and QueryMap.b will contain the value “2”.

If the URL is http://example.com?a=1=0001&b=2
Then QueryMap.a will contain the value “1=0001” and QueryMap.b will contain the value “2”.

If the URL is http://example.com?a=1=0001&b=2&b=0010
Then QueryMap.a will contain the value “1=0001” and QueryMap.b will contain an array with the values “2” and “0010”.