Yearly Archives: 2021


AttributeError: module ‘html5lib.treebuilders’ has no attribute ‘_base’

Recently, we were receiving the above error on a GNU/Linux Ubuntu 20.04 LTS.

AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

We had installed beautifulsoup4 and html5lib using pip. To solve the issue, we had to uninstall the html5lib that was installed by pip and install it through apt.

# Remove html5lib if you have installed it with pip3:
pip3 uninstall html5lib;

pip3 install --upgrade beautifulsoup4
sudo apt-get install python3-html5lib

How to check if PyTorch is using the GPU?

The following basic code, will import PyTorch into a project and test if GPU capabilities are enabled and available.

import torch

# Should produce "True"
torch.cuda.is_available()

# Should produce the number of available devices, if you have one device it should produce the value "1"
torch.cuda.device_count()

# If there is a device and it is the first one, it should produce the "0"
torch.cuda.current_device()

# Assuming that there is at least one device, with the following two commands we will get some information on the first available device

# Should produce something similar to "<torch.cuda.device object at 0x7f12b1a298d0>"
torch.cuda.device(0)

# Should produce something similar to "'GeForce GTX 1050 Ti'"
torch.cuda.get_device_name(0)

We are using Ubuntu 20.04 LTS and the NVidia drivers were installed automatically during installation.


OpenCV error: the function is not implemented

Recently, we were working on a python 3 project that was using opencv. When we used conda defaults repository to install opencv it installed a 3.X.Y version which would produce the following error on PyCharm on Ubuntu 20.04 LTS:

pycharm cv2.error: OpenCV(3.4.2) highgui/src/window.cpp:710: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Carbon support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvStartWindowThread'

Following the instructions above, we installed the missing packages using sudo apt-get install libgtk2.0-dev pkg-config; with no change on the results. To fix the issue, we completely removed opencv from the stables repository and we installed version 4.5.1 from the conda-forge repository as follows:

conda remove opencv;
conda install -c conda-forge opencv=4.5.1;

How to optimize apache web server for maximum concurrent connections or increase max clients in apache

Apache Performance Tuning

Apache 2.x is a general-purpose webserver designed to balance flexibility, portability, and performance. Although it has not been designed specifically to set benchmark records, Apache 2.x can perform high performance in many real-world situations.

Compared to Apache 1.3, release 2.x contains many additional optimizations to increase throughput and scalability. Most of these improvements are enabled by default. However, there are compile-time and run-time configuration choices that can significantly affect performance. MPM tuning is one of the upgrades that has been made in apache 2.x.

What are MPM’s

  • It modifies the basic functionality of the apache server related to multi-thread & multi-processes style of working.
  • It must be built into apache at compilation with http_core and mod_so modules.
  • Only one MPM can be loaded into the server at any time.

Choosing an MPM

Apache 2.x supports pluggable concurrency models, called Multi-Processing Modules (MPMs). When building Apache, you must choose an MPM to use. There are platform-specific MPMs for some platforms: beos, mpm_netware, mpmt_os2, and mpm_winnt. For general Unix-type systems, there are several MPMs from which to choose. The choice of MPM can affect the speed and scalability of the httpd. There are usually 2 types of MPM’s. Nowadays people tend to install apache 2.x that users worker MPM.

Different types of MPM

There are 2 types of MPM’s

mpm_prefork_module

  • Apache 1.3 based.
  • The prefork MPM uses multiple child processes, and each child process has only one thread, and ultimately one process/thread is handling one connection at a time.
  • Used for security and stability.
  • Has higher memory consumption and lower performance over the newer Apache 2.0-based threaded MPMs.

mpm_worker_module

  • Apache 2.0 based.
  • The worker MPM uses multiple child processes, and each child process can have many threads, and each thread handles one connection at a time.
  • Worker generally is a good choice for high-traffic servers because it has a smaller memory footprint than the prefork MPM and has higher performance.
  • It does not provide the same level of isolation, i.e., request-to-request, as a process-based MPM does.

This newer Multi-Processing Module (MPM) implements a hybrid multi-process multi-threaded server. Using threads to serve requests can serve a large number of requests with fewer system resources than a process-based server. However, it retains much of a process-based server’s stability by keeping multiple processes available, each with many threads.

The most important directives used to control this MPM are ThreadsPerChild, which controls the number of threads deployed by each child process, and MaxClients, which controls the maximum total number of threads that may be launched.

How it works

Usually, I require that I want my apache to serve many concurrent users. Each user can fire a single request to an apache server, or a single user can fire many requests example, where a web page is requesting many image/javascript/css files. So now I want to increase this number.

This is the default worker-mpm configuration and i will tell, how to change parameters with increasing number of concurrent connections/requests/users

ServerLimit 16
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25

First of all, whenever an apache is started, it will start 2 child processes determined by the StartServers parameter. Then each process will start with 25 threads determined by the ThreadsPerChild parameter, so this means 2 processes can service only 50 concurrent connections/clients, i.e., 25×2=50. If more concurrent users come, then another child process will start to serve another 25 users. But how many child processes can be started is controlled by the ServerLimit parameter. This means that in the configuration above, I can have 16 child processes in total, with each child process can handle 25 thread, in total handling 16×25=400 concurrent users. But if the number defined in MaxClients is less, which is 200 here, then this means that after 8 child processes, no extra process will start since we have defined an upper cap of MaxClients. This also means that if I set MaxClients to 1000, after 16 child processes and 400 connections, no extra process will start, and we cannot service more than 400 concurrent clients even if we have increased the MaxClient parameter. In this case, we need to also increase ServerLimit to 1000/25, i.e., MaxClients/ThreadsPerChild=40

So this is the optmized configuration to server 1000 clients

# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_worker_module>
  ServerLimit         40
  StartServers        2
  MaxClients          1000
  MinSpareThreads     25
  MaxSpareThreads     75
  ThreadsPerChild     25
  MaxRequestsPerChild   0
</IfModule>

Hardware and Operating System Issues with apache

The single biggest hardware issue affecting webserver performance is RAM. A webserver should never have to swap, as swapping increases the latency of each request beyond a point that users consider “fast enough.” This causes users to hit stop and reload, further increasing the load. You can, and should, control the MaxClients setting so that your server does not spawn so many children it starts swapping. This procedure for doing this is simple: determine the size of your average Apache process by looking at your process list via a tool such as top, and divide this into your total available memory, leaving some room for other processes.

http://web.archive.org/web/20160415001028/http://www.genericarticles.com/mediawiki/index.php?title=How_to_optimize_apache_web_server_for_maximum_concurrent_connections_or_increase_max_clients_in_apache

Source