
Technical Deep Dive: Multi-Method Face and Person Detection in Python

In this technical post, we’ll dissect a Python script that uses several libraries and techniques to detect faces and people in video footage. It runs five detectors from OpenCV, face_recognition, and dlib side by side on every frame and saves what each one finds, which makes it a handy harness for comparing these computer vision tools on the same input.

# import the necessary packages
import os
import sys
from datetime import datetime

import cv2
import dlib
import face_recognition
import numpy as np

# usage: python <this script> <input video> <output directory>
inputVideo = sys.argv[1]
basenameVideo = os.path.basename(inputVideo)
outputDirectory = sys.argv[2]
datetimeNow = datetime.now().strftime("%m-%d-%Y %H:%M:%S")

# Creating the folder to save the output (cv2.imwrite does not create folders)
videoOutputDirectory = outputDirectory + '/' + datetimeNow + '/' + basenameVideo + '/'
os.makedirs(videoOutputDirectory, exist_ok=True)

# initialize the HOG descriptor/person detector and load OpenCV's pre-trained people SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Initialize dlib's frontal face detector
faceDetector = dlib.get_frontal_face_detector()

## open webcam video stream
#cap = cv2.VideoCapture(0)
# create a VideoCapture object
cap = cv2.VideoCapture(inputVideo)

frameIndex = 0

while True:
    # Capture frame-by-frame; ret is False once the video has no more frames
    ret, frame = cap.read()
    if not ret:
        break

    # OpenCV decodes frames as BGR. Use a greyscale copy for the Haar cascade
    # (faster) and an RGB copy for face_recognition and dlib.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Each "if True:" block is one method; change it to False to skip that method.
    if True:  # Method 1: HOG person detector
        persons, weights = hog.detectMultiScale(frame, winStride=(8, 8))

        persons = np.array([[x, y, x + w, y + h] for (x, y, w, h) in persons])
        print("[INFO][1][{0}] Found {1} Persons.".format(frameIndex, len(persons)))

        for (left, top, right, bottom) in persons:
            print("A person is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_persons_M1.jpg', match_image)

    if True:  # Method 2: Haar cascade face detector on the greyscale frame
        faces = faceCascade.detectMultiScale(
            gray,
            minSize=(50, 50)
        )

        faces = np.array([[x, y, x + w, y + h] for (x, y, w, h) in faces])
        print("[INFO][2][{0}] Found {1} Faces.".format(frameIndex, len(faces)))

        for (left, top, right, bottom) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M2.jpg', match_image)

    if True:  # Method 3: face_recognition with its default "hog" model
        faces = face_recognition.face_locations(rgb)
        print("[INFO][3][{0}] Found {1} Faces.".format(frameIndex, len(faces)))

        for (top, right, bottom, left) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M3.jpg', match_image)

    if True:  # Method 4: face_recognition with the CNN model (slow without a GPU)
        faces = face_recognition.face_locations(rgb, model="cnn")
        print("[INFO][4][{0}] Found {1} Faces.".format(frameIndex, len(faces)))

        for (top, right, bottom, left) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M4.jpg', match_image)

    if True:  # Method 5: dlib's frontal face detector
        faces = faceDetector(rgb)
        print("[INFO][5][{0}] Found {1} Faces.".format(frameIndex, len(faces)))

        # Now process each face we found
        for face in faces:
            # dlib boxes may extend past the image border; clip them so the
            # slice below does not wrap around or come back empty
            top = max(0, face.top())
            bottom = min(frame.shape[0], face.bottom())
            left = max(0, face.left())
            right = min(frame.shape[1], face.right())
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M5.jpg', match_image)

    frameIndex += 1

# When everything done, release the capture
cap.release()

Core Libraries and Initial Setup

The script begins by importing several libraries:

  • numpy: Used to turn the detector outputs into arrays of box coordinates.
  • cv2 (OpenCV): Reads the video, converts color spaces, runs the HOG person detector and the Haar cascade face detector, and writes the cropped images.
  • sys and os: For reading the command-line arguments and handling file paths.
  • datetime: To timestamp the output directory of each run.
  • face_recognition: A high-level face recognition library built on dlib, used here for its HOG-based and CNN-based face locators.
  • dlib: A C++ machine learning and image processing toolkit with Python bindings, used here for its frontal face detector.

Video File Handling

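The script takes two command-line arguments: the path of the input video and a base output directory. It extracts the video’s file name and builds an output directory of the form <output directory>/<date and time>/<video file name>/. Because the timestamp goes down to the second, outputs from different runs are stored separately instead of overwriting each other. The directory has to exist before any crop is saved, since cv2.imwrite() does not create missing folders.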
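Below is a minimal sketch of the same directory logic, using a hypothetical make_output_dir helper that also creates the directory. It swaps the colons in the time for dashes; that format is an assumption of this sketch, chosen because colons are not allowed in Windows file names.

```python
import os
from datetime import datetime


def make_output_dir(output_root: str, input_video: str, now: datetime) -> str:
    """Create <output_root>/<timestamp>/<video file name>/ and return its path."""
    # Hypothetical timestamp format: the script's fields, but with dashes
    # instead of colons so the path is also valid on Windows.
    stamp = now.strftime("%m-%d-%Y %H-%M-%S")
    path = os.path.join(output_root, stamp, os.path.basename(input_video))
    # exist_ok=True makes a repeated call with the same arguments harmless.
    os.makedirs(path, exist_ok=True)
    return path


# Hypothetical usage mirroring the script's arguments:
#   videoOutputDirectory = make_output_dir(sys.argv[2], sys.argv[1], datetime.now())
```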

Methodological Overview

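The script showcases five distinct methods for detecting faces and people:

  1. HOG Person Detector with OpenCV: Uses the Histogram of Oriented Gradients (HOG) descriptor combined with a linear Support Vector Machine (SVM) to detect people. The detector only works once OpenCV’s pre-trained people model has been loaded with setSVMDetector().
  2. Haar Cascade for Face Detection: Employs OpenCV’s Haar cascade classifier (the Viola-Jones approach), a fast, long-established method for frontal face detection that works on greyscale images.
  3. Face Detection Using face_recognition (HOG model): Calls face_recognition.face_locations() with its default model, "hog", which is fast on a CPU.
  4. CNN-Based Face Detection Using face_recognition: Calls face_locations() with model="cnn", a Convolutional Neural Network detector that is more accurate but much slower unless dlib was built with GPU (CUDA) support.
  5. Dlib’s Frontal Face Detector: Applies dlib’s HOG-based frontal face detector, which is effective for faces oriented towards the camera. face_recognition’s "hog" model is built on this same detector, so methods 3 and 5 differ mainly in preprocessing: face_recognition upsamples the image once by default, which helps it find smaller faces.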
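The five methods report their boxes in three different formats: the HOG and Haar detectMultiScale() calls return (x, y, w, h) rectangles, face_recognition.face_locations() returns (top, right, bottom, left) tuples, and dlib returns rectangle objects with top(), right(), bottom() and left() methods. The script converts each format inline; the following is a minimal sketch, with hypothetical helper names, that normalizes all of them to the face_recognition order so downstream code only handles one format.

```python
from typing import Iterable, List, Sequence, Tuple

# (top, right, bottom, left), the order used by face_recognition.face_locations()
Box = Tuple[int, int, int, int]


def from_xywh(rects: Iterable[Sequence[int]]) -> List[Box]:
    """Convert OpenCV (x, y, w, h) rectangles, as returned by the HOG and
    Haar cascade detectMultiScale() calls, to (top, right, bottom, left)."""
    return [(int(y), int(x + w), int(y + h), int(x)) for (x, y, w, h) in rects]


def from_dlib(rects) -> List[Box]:
    """Convert dlib rectangles, which expose top()/right()/bottom()/left()."""
    return [(r.top(), r.right(), r.bottom(), r.left()) for r in rects]


# Hypothetical usage, assuming gray and rgb hold greyscale and RGB copies of a frame:
#   boxes = from_xywh(faceCascade.detectMultiScale(gray, minSize=(50, 50)))
#   boxes = face_recognition.face_locations(rgb)   # already in this order
#   boxes = from_dlib(faceDetector(rgb))
```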

Processing Workflow

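The script processes the video frame by frame. For each frame, it:

  • Converts the frame to grayscale for the Haar cascade, which does not need color information and runs faster on a single channel. Because OpenCV decodes frames in BGR order, the right conversion constant is COLOR_BGR2GRAY; face_recognition and dlib, on the other hand, expect RGB input.
  • Applies each of the five detection methods in turn.
  • For each detected face or person, prints the coordinates and saves a cropped image of the detection to the output directory. The file name records the frame index, the box corners, and the method (M1 to M5).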
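Saving a crop is a plain NumPy slice, frame[top:bottom, left:right]. The sketch below, a hypothetical crop_box helper, adds a safeguard worth having for any detector: it clips the box to the image first. This matters because dlib can return boxes that extend past the image border, and a negative start index makes NumPy count from the end of the axis, which typically yields an empty crop that cv2.imwrite() rejects with an error.

```python
import numpy as np


def crop_box(frame: np.ndarray, top: int, right: int, bottom: int, left: int) -> np.ndarray:
    """Return frame[top:bottom, left:right] with the box clipped to the image."""
    height, width = frame.shape[:2]
    # Clip the start indices at 0 so NumPy never treats them as "from the end".
    top, left = max(0, top), max(0, left)
    # Clip the end indices at the image size (slicing would tolerate overflow,
    # but clipped values also make the logged coordinates accurate).
    bottom, right = min(height, bottom), min(width, right)
    return frame[top:bottom, left:right]


frame = np.zeros((100, 200, 3), dtype=np.uint8)
print(frame[-10:40, 0:50].shape)             # unclipped: (0, 50, 3), an empty crop
print(crop_box(frame, -10, 50, 40, 0).shape)  # clipped: (40, 50, 3)
```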

Iterative Frame Analysis

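The script reads frames in a loop with cap.read() until it returns no frame, which marks the end of the video. A frame index counts the frames processed; it appears in every log line and in the name of every saved crop, so each detection can be traced back to the frame it came from during debugging and analysis.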
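The read loop can also be written as a Python generator that hands out each frame together with its index, so the counter cannot get out of step with the frames. This is a sketch with a hypothetical iter_frames helper; it works with any object whose read() method returns an (ok, frame) pair, as cv2.VideoCapture.read() does.

```python
from typing import Any, Iterator, Tuple


def iter_frames(cap: Any) -> Iterator[Tuple[int, Any]]:
    """Yield (frame_index, frame) pairs until cap.read() reports no frame."""
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video, or a read error
            return
        yield frame_index, frame
        frame_index += 1


# Hypothetical usage:
#   for frameIndex, frame in iter_frames(cv2.VideoCapture(inputVideo)):
#       ...run the five detectors on frame...
```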

Resource Management

Once the last frame has been read, the video capture object should be released with cap.release(), which closes the video file and frees the decoder’s resources.
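To make sure release() runs even when a detector raises an exception halfway through the video, the capture can be wrapped in a small context manager. This is a sketch with a hypothetical released helper; contextlib.closing() is not a drop-in option because it calls close(), while cv2.VideoCapture exposes release().

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def released(cap: Any) -> Iterator[Any]:
    """Yield cap, then call cap.release() on exit, even after an exception."""
    try:
        yield cap
    finally:
        cap.release()


# Hypothetical usage:
#   with released(cv2.VideoCapture(inputVideo)) as cap:
#       ...read and process frames...
```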

Key Takeaways

This script demonstrates how several face and person detection techniques can be integrated in a single Python application, and because every method saves its own crops, it makes it easy to compare their results on the same footage. This analysis serves as a guide for developers and enthusiasts looking to understand or get started with image processing in Python.

Understanding the cURL Command for Performance Metrics

Breaking Down the Command

In the world of web development and network administration, the cURL command is a versatile tool used for transferring data using various protocols. One interesting application of this command is to measure the performance of a web server. Let’s dissect a specific cURL command to understand how it works:

curl -svo /dev/null -w "Connect: %{time_connect} \n TTFB: %{time_starttransfer} \n Total time: %{time_total} \n" https://bytefreaks.net/;

Components of the Command

  1. curl: The command-line tool itself, which initiates the data transfer.
  2. -svo: Three short options combined. The -s flag stands for ‘silent’: it hides the progress meter and error messages. The -v flag stands for ‘verbose’: it prints connection details and the request and response headers to standard error. The two look contradictory, but together they drop the progress meter while keeping the verbose transfer details visible. The -o flag takes the next argument as the file to write the response body to.
  3. /dev/null: The argument to -o. It is a special file that discards all data written to it, so the body of the response is thrown away instead of being printed.
  4. -w: Short for ‘write-out’. It tells curl what to print to standard output after the transfer has completed.
  5. "Connect: %{time_connect} \n TTFB: %{time_starttransfer} \n Total time: %{time_total} \n": This is the format string curl uses to display the timing statistics. curl replaces each %{...} variable with a value in seconds and each \n with a newline:
    • %{time_connect}: The time from the start until the TCP connection to the server was established. It does not include the TLS handshake, which curl reports separately as %{time_appconnect}.
    • %{time_starttransfer}: The time from the start until the first byte of the response was received, commonly called Time to First Byte (TTFB). It includes connection setup, the TLS handshake, and the time the server needed to produce its response.
    • %{time_total}: The total time taken for the whole operation.
  6. https://bytefreaks.net/: The URL that the request is sent to.

Practical Use

This command is particularly useful for testing the performance of web servers. By analyzing the connect time, TTFB, and total time, administrators and developers can get insights into potential bottlenecks or performance issues. For instance, a long TTFB might suggest server-side delays in processing requests.
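Because -o /dev/null discards the body and -v writes to standard error, the only thing this command prints to standard output is the -w summary, which makes it easy to post-process. The sketch below, a hypothetical parse_curl_timings function, reads that summary and derives two intervals from the three timings: connect to first byte (TLS handshake, sending the request, server processing, and the network round trip) and first byte to end (downloading the rest of the body).

```python
import re
from typing import Dict

# Matches entries such as "Connect: 0.051234" produced by the -w format string.
_TIMING = re.compile(r"(Connect|TTFB|Total time):\s*([0-9.]+)")


def parse_curl_timings(output: str) -> Dict[str, float]:
    """Turn the -w summary into seconds plus two derived intervals."""
    values = {label: float(number) for label, number in _TIMING.findall(output)}
    connect = values["Connect"]
    ttfb = values["TTFB"]
    total = values["Total time"]
    return {
        "connect": connect,
        # TLS handshake + request + server processing + network round trip
        "connect_to_first_byte": ttfb - connect,
        # receiving the remainder of the response body
        "first_byte_to_end": total - ttfb,
        "total": total,
    }
```

The function expects only curl’s standard output, for example the stdout attribute of subprocess.run() called with capture_output=True and text=True; merging in standard error would mix the verbose headers into the text.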


The cURL command demonstrated here is a quick tool for checking how a web server responds. It’s concise yet reports the key timings of a single request; since individual measurements vary with network conditions, running it several times gives a more reliable picture. By mastering such commands, one can effectively monitor and optimize web server performance, ensuring better user experiences and efficient server management.

Bug Fix: Resolving TypeError in Face Descriptor Computation

In the realm of facial recognition software development, it’s not uncommon to encounter a TypeError when integrating different libraries. We recently addressed an issue with the following error message:

TypeError: compute_face_descriptor(): incompatible function arguments.

The error came from the way we converted image data from OpenCV’s channel order to the one face_recognition expects, and specifically from the kind of array that conversion produced. Here’s how we resolved it.

The Problem

OpenCV represents images in BGR (Blue, Green, Red) channel order by default, while the face_recognition library expects images in RGB (Red, Green, Blue) order, so frames must be converted before they are passed to it. A wrong channel order alone does not raise an exception, though; it only degrades the results. The TypeError appears when the array handed to dlib’s compute_face_descriptor(), which face_recognition calls internally, is not stored as one contiguous block of memory, because dlib’s Python binding does not accept such arrays.

Our Solution

To fix this issue, we modified how we convert the image from OpenCV’s BGR format to the RGB format required by the face_recognition library. Previously, we used a slicing technique to reverse the color channels:

# Incorrect color conversion
rgb_frame = frame[:, :, ::-1]

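However, this approach led to the aforementioned TypeError: the slice reverses the channels by creating a view with a negative stride instead of copying the pixels, so the result is not contiguous in memory. Using OpenCV’s cvtColor function instead fixed the problem: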

# Correct color conversion
rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

The cvtColor function is explicitly designed for color space conversions and returns a new, contiguous array, which is the kind of input face_recognition and dlib accept.
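The difference between the two conversions can be observed with NumPy alone, without dlib or a video. In the sketch below, a small array stands in for an OpenCV frame: the slice produces a non-contiguous view, while numpy.ascontiguousarray() (an alternative fix if you prefer to keep the slice) produces a contiguous copy, like the array cv2.cvtColor() returns.

```python
import numpy as np

# A stand-in for a 4x6 OpenCV frame: height x width x 3 channels, uint8.
frame = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Reversing the channel axis with a slice returns a view with a negative
# stride on that axis: no pixels are copied, so the memory is not contiguous.
sliced = frame[:, :, ::-1]
print(sliced.flags["C_CONTIGUOUS"])  # False

# A contiguous copy holds the same RGB values in a single row-major block.
fixed = np.ascontiguousarray(frame[:, :, ::-1])
print(fixed.flags["C_CONTIGUOUS"])  # True
print(fixed[0, 0])  # [2 1 0], the reversed channels of frame[0, 0]
```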


With this small but crucial change, we eliminated the TypeError and improved the robustness of our facial recognition pipeline. This example serves as a reminder that understanding the intricacies of our libraries is essential for creating seamless and error-free integrations. For developers facing similar issues, consider the specific requirements of each function and library you’re using. It’s these details that can make or break your application.

Solving the ‘numpy._DTypeMeta’ Subscriptable Error in Python

Are you struggling with an unexpected TypeError when working with the OpenCV library in Python? If you’ve encountered the error message TypeError: ‘numpy._DTypeMeta’ object is not subscriptable and wonder what went wrong, you’re not alone. This error can be a significant roadblock when you’re trying to import the cv2 module, which is essential for computer vision tasks.

The root of this problem is a version mismatch between numpy and packages that build on it, such as OpenCV. numpy._DTypeMeta is the metaclass of numpy.dtype, so the message means that some code, here run while cv2 is being imported, tried to subscript numpy.dtype (as in numpy.dtype[...], a form used in type annotations). Older numpy releases do not support that at runtime, so an OpenCV build that relies on it fails to import.

Fortunately, the fix for this is often straightforward: update numpy to a recent version, which supports subscripting numpy.dtype. Here’s the command that usually does the trick:

pip install -U numpy;

This command asks pip, Python’s package installer, to upgrade numpy to the latest version available. The -U flag is short for --upgrade, which tells pip to replace the installed version with the newest one.

Why does updating numpy work? Newer releases of libraries like OpenCV are built and tested against newer versions of their dependencies and may use numpy features, such as runtime subscripting of numpy.dtype, that an outdated numpy lacks. That missing functionality is what surfaces as errors like the one above.
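To check which side of the problem an environment is on, you can evaluate the kind of expression the error complains about directly. The sketch below, a hypothetical dtype_is_subscriptable helper, assumes the failing code subscripts numpy.dtype, which is what the error message indicates; on an affected installation it returns False, and after a successful upgrade it should return True.

```python
import numpy as np


def dtype_is_subscriptable() -> bool:
    """Return True if this numpy accepts numpy.dtype[...] at runtime."""
    try:
        np.dtype[np.uint8]  # old numpy: TypeError, '_DTypeMeta' not subscriptable
    except TypeError:
        return False
    return True


print("numpy", np.__version__, "- dtype subscriptable:", dtype_is_subscriptable())
```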

So next time you face this TypeError, remember that a quick update of numpy could save the day. Keep your packages up to date, and you’ll minimize these sorts of issues significantly.

Remember, it’s good practice to keep your development environment updated. But always back up your work before doing so, as sometimes new versions can introduce their own issues. Happy coding!