ubuntu


Testing free Text to Speech engines on Ubuntu GNU/Linux

Recently, we decided to test a few free text to speech engines (TtS) on GNU/Linux. We were curious on what the current capabilities that are available as we wanted to create a few videos. To play a bit, we tested espeak, festival and pico. For this reason, we created a text file (called text.txt) and added the following content to it:

I triple E has a lot of scholarships, awards, and opportunities, but it doesn't have a centralized site where members can quickly identify the right ones.
Many problems arise as a result of the lack of this platform.
One crucial issue is that many people are unaware of specific opportunities until it is too late. Many projects are squandered each year because there is insufficient knowledge about these opportunities, resulting in low participation.
Another critical difficulty is having to start over with each application. Many people find it frustrating, and it prevents them from doing so.
The lack of real-time support to answer issues while an applicant is applying is critical, leading to discouragement and abandonment of the application process.
Providing references is a topic that many individuals are uncomfortable with. They are embarrassed to seek references that need to learn new systems and maybe answer the same questions posed in other ways.

Our solution is utilizing the Collabratec platform and storing all of these opportunities there:
Collabratec already has numerous key capabilities in place, lowering development costs.
Each application may have its own community or working group where an applicant can seek special clarifications or support. Collabratec will save money on development by repurposing existing technology. It will also give such features a new purpose.
Through those working groups, experienced members can share their knowledge and potentially coach applicants during their application process. Many members would be willing to help others attain their objectives, especially after they've gone through the process and understand the frustrations others are experiencing. We could utilize badges to reward individuals who aid others and those who apply for these possibilities, which is a frequent practice in Collabratec to make certain members stand out. This approach will assist members in getting to know one another and expanding their network outside their geographic zones, resulting in a genuinely global I triple E experience.
People who create opportunities can utilize the I triple E profile of a user to pre-populate elements of their application. As a result, the applicants' effort will be reduced because they will only fill in the questions related to that particular opportunity.
Without any additional work, the system may reuse earlier references. Assume that a reference has to be updated or validated to ensure that it is still valid. In that situation, the system may send an automatic notification to that person, asking them to approve, alter, or delete their earlier contribution.
Because users can readily share each application form and the corresponding working group information, Collabratec's capabilities as a social network will significantly enhance each opportunity's reach and all related documents, public comments, and discussions.

espeak

We started off with espeak and we used the following commands to test it:

# Command to install espeak;
sudo apt install espeak;
# Command that reads the text.txt file creates an audio file from its content.
espeak -f text.txt -w espeak.wav;

The result from espeak is below:

espeak definitely does not sound human-like. It is a fun tool if you need to create an audio file that sounds robotic! In our case, it was not a solution as we wanted to use a long text, listening to a robotic voice for a lot of time can be tiring.

festival

After that, we tested the text2wave tool of festival as follows:

sudo apt install festival;
cat text.txt | text2wave -o festival.wav;

The results from festival/text2wave are the following:

festival does sound a bit better than espeak, it’s almost smooth but not quite. You can easily tell that this is a computer-generated voice.

pico

Finally, we tested pico utilities. We set it up and used it as follows:

sudo apt install libttspico-utils;
pico2wave -l en-US -w test.wav "$(cat text.txt)";

The results of pico2wave were pretty good! Not perfect but still good! The voice is nearly human-like and fairly smooth. Below is the result of our test:

From the three utilities, pico was the most human-like and it fit our needs more. With this tool, we will be able to create certain videos with narration without being too annoying.

Other information

To create the videos, we used ffmpeg. As in the following commands, we combined the audio wave files with static images that were looped forever.

ffmpeg -loop 1 -i TtS-pico2wave.png -i test.wav -c:a aac -c:v libx264 -pix_fmt yuv420p -shortest TtS-pico2wave.mp4;
ffmpeg -loop 1 -i TtS-espeak.png -i espeak.wav -c:a aac -c:v libx264 -pix_fmt yuv420p -shortest TtS-espeak.mp4;
ffmpeg -loop 1 -i TtS-festival.png -i festival.wav -c:a aac -c:v libx264 -pix_fmt yuv420p -shortest TtS-festival.mp4;

Playing with MASK RCNN on videos .. again

Source code for the implementation that created this video will be uploaded soon.

A first attempt at using a pre-trained implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in each frame. It’s based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

Setup

Conda / Anaconda

First of all, we installed and activated anaconda on an Ubuntu 20.04LTS desktop. To do so, we installed the following dependencies from the repositories:

sudo apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6;

Then, we downloaded the 64-Bit (x86) Installer from (https://www.anaconda.com/products/individual#linux).

Using a terminal, we followed the instructions here (https://docs.anaconda.com/anaconda/install/linux/) and performed the installation.

Python environment and OpenCV for Python

Following the previous step, we used the commands below to create a virtual environment for our code. We needed python version 3.9 (as highlighted here https://www.anaconda.com/products/individual#linux) and OpenCV for python.

source ~/anaconda3/bin/activate;
conda create --name MaskRNN python=3.9;
conda activate MaskRNN;
pip install numpy opencv-python;

Problems that we did not anticipate

When we tried to execute our code in the virtual environment:

python3 main.py --video="/home/bob/Videos/Live @ Santa Claus Village 2021-11-13 12_12.mp4";

We got the following error:

Traceback (most recent call last):
  File "/home/bob/MaskRCNN/main.py", line 6, in <module>
    from cv2 import cv2
  File "/home/bob/anaconda3/envs/MaskRNN/lib/python3.9/site-packages/cv2/__init__.py", line 180, in <module>
    bootstrap()
  File "/home/bob/anaconda3/envs/MaskRNN/lib/python3.9/site-packages/cv2/__init__.py", line 152, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/home/bob/anaconda3/envs/MaskRNN/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

We realized that we were missing some additional dependencies for OpenCV as our Ubuntu installation was minimal. To fix this issue, we installed the following package from the repositories:

sudo apt-get update;
sudo apt-get install -y python3-opencv;

Some notes on how to record audio from a terminal in Ubuntu 20.04LTS

Recently, we were trying to record the audio that was played on the system speakers using an Ubuntu 20.04LTS desktop. In the installation, there was no dedicated audio recorder installed and we did not want to install any. To record, we used the following command to get the list of all audio sources available to the system:

pactl list short sources;

The pactl command produced results like so:

3	alsa_output.pci-0000_00_1f.3.analog-stereo.monitor	module-alsa-card.c	s16le 2ch 44100Hz	SUSPENDED
4	alsa_input.pci-0000_00_1f.3.analog-stereo	module-alsa-card.c	s16le 2ch 44100Hz	SUSPENDED
10	alsa_input.usb-Dell_DELL_PROFESSIONAL_SOUND_BAR_AE515-00.iec958-stereo	module-alsa-card.c	s16le 2ch 44100Hz	SUSPENDED
12	alsa_output.usb-Dell_DELL_PROFESSIONAL_SOUND_BAR_AE515-00.analog-stereo.monitor	module-alsa-card.c	s16le 2ch 44100Hz	IDLE

We knew that the system was using the Dell soundbar in analog mode to play the music (as we could see in the Settings under the Sound category, which is depicted below), so we copied the following name from the line that starts with the number 12:

alsa_output.usb-Dell_DELL_PROFESSIONAL_SOUND_BAR_AE515-00.analog-stereo.monitor

Then we used that monitor name as an input device for FFmpeg like so:

ffmpeg -f pulse -i alsa_output.usb-Dell_DELL_PROFESSIONAL_SOUND_BAR_AE515-00.analog-stereo.monitor test.mp3;

When we were done recording, we pressed CTRL+C to stop the recording.


Ubuntu: The file content is encrypted, but currently not supported

While trying to extract a password protected/encrypted 7-Zip archive on a fresh Ubuntu 20.04 LTS, we got the following error:

An error occurred while extracting files. The file content is encrypted, but currently not supported.

To fix the problem, we just installed the p7zip-full package using the apt command.

sudo apt-get install p7zip-full;

After that, we were prompted as expected to input the decryption password and we were able to extract the archive.

Below is the information of the package that was retrieved by apt info.

sudo apt info p7zip-full
[sudo] password for bob: 
Package: p7zip-full
Version: 16.02+dfsg-7build1
Priority: optional
Section: universe/utils
Source: p7zip
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Robert Luberda <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 4887 kB
Depends: p7zip (= 16.02+dfsg-7build1), libc6 (>= 2.14), libgcc-s1 (>= 3.0), libstdc++6 (>= 5)
Suggests: p7zip-rar
Breaks: p7zip (<< 15.09+dfsg-3~)
Replaces: p7zip (<< 15.09+dfsg-3~)
Homepage: http://p7zip.sourceforge.net/
Task: kubuntu-desktop, kubuntu-full, xubuntu-desktop, lubuntu-desktop, ubuntustudio-desktop, ubuntukylin-desktop, ubuntu-mate-core, ubuntu-mate-desktop
Download-Size: 1187 kB
APT-Manual-Installed: yes
APT-Sources: http://cy.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
Description: 7z and 7za file archivers with high compression ratio
 p7zip is the Unix command-line port of 7-Zip, a file archiver that
 handles the 7z format which features very high compression ratios.
 .
 p7zip-full provides utilities to pack and unpack 7z archives within
 a shell or using a GUI (such as Ark, File Roller or Nautilus).
 .
 Installing p7zip-full allows File Roller to use the very efficient 7z
 compression format for packing and unpacking files and directories.
 Additionally, it provides the 7z and 7za commands.
 .
 List of supported formats:
   - Packing / unpacking: 7z, ZIP, GZIP, BZIP2, XZ and TAR
   - Unpacking only: APM, ARJ, CAB, CHM, CPIO, CramFS, DEB, DMG, FAT,
     HFS, ISO, LZH, LZMA, LZMA2, MBR, MSI, MSLZ, NSIS, NTFS, RAR (only
     if non-free p7zip-rar package is installed), RPM, SquashFS, UDF,
     VHD, WIM, XAR and Z.
 .
 The dependent package, p7zip, provides 7zr, a light version of 7za,
 and p7zip, a gzip-like wrapper around 7zr.