string

GNU Linux/Bash: A function that splits a word in half

11 August 2022 in Bash / GNU/Linux tagged bash / fedora / function / GNU/Linux / howtos / line / split / string / text / ubuntu by Tux

The following function takes one argument – a text file.
The text file should contain one word on each line.
The function reads the text file (argument) line by line.
Then it checks if the line has one word; if this is true, it splits the word in half.
Finally, it prints the two new words with a space between them.

#!/bin/bash

splitWordsInHalf () {
  # This function takes one argument - a text file.
  # The text file contains one word on each line.
  # It reads the text file (argument) line by line.
  # Then it checks if the line contains one word, if this is true, it splits the word in half.
  # Finally, it prints the two new words with a space between them
  while read line
  do
    words=( $line )
    if [ ${#words[@]} == 1 ]
    then
      echo ${line:0:${#line}/2} ${line:${#line}/2}
    fi
  done < $1
}

splitWordsInHalf input.txt

Example

Using the following input file:

banana
apple
ball
car
door

We will get the following output when we execute splitWordsInHalf input.txt:

ban ana
ap ple
ba ll
c ar
do or

Notes

The following parts of the code are in charge of looping on the data of the incoming file. The parameter (the input file) given to the function is translated into the variable $1. The while loop gets one line of text on each iteration and assigns the text to the variable that is named line. You could have chosen any other name that suits you instead of the word line.

splitWordsInHalf () {
  while read line
  do
    ...
  done < $1
}

The next part of the code (words=( $line )) converts the string value that is contained in the line variable into an array of words, and it assigns that array to the variable named words. Then, it counts the number of elements in the array (the number of words in the line) using the following ${#words[@]} and it checks that there is only one item.

words=( $line )
if [ ${#words[@]} == 1 ]
then
  ...
fi

The following line will print two strings. The first string is a sub-string of variable line that is composed by the first half of the value. The second sub-string is the second half of the value contained in the variable named line.

echo ${line:0:${#line}/2} ${line:${#line}/2}

The ${#line} will return the length of the string contained in the variable.

The structure ${VARIABLE:START:END} defines the slice of the string that we want returned.

Remove all non digit characters from String

21 December 2017 in Java tagged digit / filter / java / numbers / numeric / pattern / regex / regular expression / replaceAll / string by Tux

The following Java snippet removes all non-digit characters from a String.
Non-digit characters are any characters that are not in the following set [0, 1, 2, 3, 4 ,5 ,6 ,7 ,8, 9].


myString.replaceAll("\\D", "");

For a summary of regular-expression constructs and information on the character classes supported by Java pattern visit the following link.

The \\D pattern that we used in our code is a predefined character class for non-digit characters. It is equivalent to [^0-9]that negates the predefined character class for digit characters [0-9].

Java: remove leading character (or any prefix) from String only if it matches 1

19 December 2017 in Java tagged ?: ternary operator / haystack / java / length / needle / startsWith / string / substring by Tux

The following snippet allows you to check if a String in Java starts with a specific character (or a specific prefix) that you are looking for and remove it.
To keep the number of lines small, we used the Java ?: ternary operator, which defines a conditional expression in a compact way.
Then, to quickly check if the first character(s) is the one we are looking for before removing it we used the String.startsWith().
To compute the number of characters to remove we used the String.length() on the needle.
Finally, to strip the leading character(s) we used String.substring() whenever the prefix matched.


final String needle = "//";
final int needleSize = needle.length();
String haystack = "";
haystack = haystack.startsWith(needle) ? haystack.substring(needleSize) : haystack;
System.out.print(haystack);

haystack = "apple";
haystack = haystack.startsWith(needle) ? haystack.substring(needleSize) : haystack;
System.out.println(haystack);

haystack = "//banana";
haystack = haystack.startsWith(needle) ? haystack.substring(needleSize) : haystack;
System.out.println(haystack);

The above code will result in the following:


apple
banana

Bonus: to remove all the instances of a character anywhere in a string:


final String needle = "a";
final int needleSize = needle.length();
String haystack = "banana";
haystack = haystack.replace(needle,"");
System.out.println(haystack);

The above code will result in the following:

bnn

Bonus: to remove the first instance of a character anywhere in a string:


final String needle = "a";
final int needleSize = needle.length();
String haystack = "banana";
haystack = haystack.replaceFirst(needle,"");
System.out.println(haystack);

The above code will result in the following:

bnana

JavaSript: Remove all non printable and all non ASCII characters from text 1

18 November 2017 in JavaScript tagged ascii / characters / delete / javascript / printable / regex / replace / string by Tux

According to the ASCII character encoding, there are 95 printable characters in total.
Those characters are in the range [0x20 to 0x7E] ([32 to 126] in decimal) and they represent letters, digits, punctuation marks, and a few miscellaneous symbols.
Character 0x20 (or 32 in decimal) is the space character ' ' and
character 0x7E (or 126 in decimal) is the tilde character '~'.
Source: https://en.wikipedia.org/wiki/ASCII#Printable_characters

Since all the printable characters of ASCII are conveniently in one continuous range, we used the following to filter all other characters out of our string in JavaScript.


printable_ASCII_only_string = input_string.replace(/[^ -~]+/g, "");

What the above code does is that it passes the input string through a regular expression which will match all characters out of the printable range and replace them with nothing (hence, delete them).
In case you do not like writing your regular expression with the space character to it, you can re-write the above regular expression using the hex values of the two characters as follows:


printable_ASCII_only_string = input_string.replace(/[^\x20-\x7E]+/g, "");

Bytefreaks.net – a place for hacks

Bytefreaks.net – a place for hacks

string

GNU Linux/Bash: A function that splits a word in half

Example

Notes

Remove all non digit characters from String

Java: remove leading character (or any prefix) from String only if it matches 1

Bonus: to remove all the instances of a character anywhere in a string:

Bonus: to remove the first instance of a character anywhere in a string:

JavaSript: Remove all non printable and all non ASCII characters from text 1