Technical Deep Dive: Multi-Method Face and Person Detection in Python

Tux

2 years ago

In this technical post, we’ll dissect a Python script that combines several libraries and techniques for detecting faces and people in video footage. The script runs five detectors on every frame and saves each detection as a cropped image, which makes it a convenient way to compare the methods on the same footage.

# import the necessary packages
import numpy as np
import cv2
import sys
import os
from datetime import datetime
import face_recognition
import dlib

inputVideo = sys.argv[1];
basenameVideo = os.path.basename(inputVideo);
outputDirectory = sys.argv[2];
datetimeNow = datetime.now().strftime("%m-%d-%Y %H:%M:%S");

#Creating the folder to save the output
videoOutputDirectory = outputDirectory + '/' + datetimeNow + '/' + basenameVideo + '/';
os.makedirs(videoOutputDirectory);

##METHOD 1 -- START
# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
##METHOD 1 -- STOP

##METHOD 2 -- START
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml");
##METHOD 2 -- STOP

##METHOD 5 -- START
# Initialize dlib's HOG-based frontal face detector
faceDetector = dlib.get_frontal_face_detector()
##METHOD 5 -- STOP

cv2.startWindowThread()

## open webcam video stream
#cap = cv2.VideoCapture(0)
# create a VideoCapture object
cap = cv2.VideoCapture(inputVideo)

frameIndex = 0;

while True:
	# Capture frame-by-frame
	ret, frame = cap.read()

	# stop when the video ends or a frame cannot be read
	if not ret:
		break

	# greyscale copy for the Haar cascade; OpenCV frames are in BGR order
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

##METHOD 1 -- START
	if True:
		# detect people in the image
		persons, weights = hog.detectMultiScale(frame, winStride=(8,8) )

		persons = np.array([[x, y, x + w, y + h] for (x, y, w, h) in persons])
		print("[INFO][1][{0}] Found {1} Persons.".format(frameIndex, len(persons)));

		for (left, top, right, bottom) in persons:
			print("A person is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
			match_image = frame[top:bottom, left:right];
			cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_persons_M1.jpg', match_image);
##METHOD 1 -- STOP

##METHOD 2 -- START
	if True:
		faces = faceCascade.detectMultiScale(
			gray,
			scaleFactor=1.05,
			minNeighbors=7,
			minSize=(50, 50)
		);

		faces = np.array([[x, y, x + w, y + h] for (x, y, w, h) in faces])
		print("[INFO][2][{0}] Found {1} Faces.".format(frameIndex, len(faces)));

		for (left, top, right, bottom) in faces:
			print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
			match_image = frame[top:bottom, left:right];
			cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M2.jpg', match_image);
##METHOD 2 -- STOP

##METHOD 3 -- START
	if True:
		# face_recognition expects RGB input, while OpenCV frames are BGR
		faces = face_recognition.face_locations(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB));
		print("[INFO][3][{0}] Found {1} Faces.".format(frameIndex, len(faces)));

		for (top, right, bottom, left) in faces:
			#print("[INFO] Object found. Saving locally.");
			print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
			match_image = frame[top:bottom, left:right];
			cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M3.jpg', match_image);
##METHOD 3 -- STOP

##METHOD 4 -- START
	if True:
		# face_recognition expects RGB input, while OpenCV frames are BGR
		faces = face_recognition.face_locations(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), model="cnn");
		print("[INFO][4][{0}] Found {1} Faces.".format(frameIndex, len(faces)));

		for (top, right, bottom, left) in faces:
			#print("[INFO] Object found. Saving locally.");
			print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
			match_image = frame[top:bottom, left:right];
			cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M4.jpg', match_image);
##METHOD 4 -- STOP

##METHOD 5 -- START
	if True:
		# detect faces in image
		faces = faceDetector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

		print("[INFO][5][{0}] Found {1} Faces.".format(frameIndex, len(faces)));
		# Now process each face we found
		for k, face in enumerate(faces):
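			# dlib boxes can extend past the frame; clamp so slicing never sees a negative index
			top = max(face.top(), 0)
			bottom = face.bottom()
			left = max(face.left(), 0)
			right = face.right()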
			print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
			match_image = frame[top:bottom, left:right];
			cv2.imwrite(videoOutputDirectory + str(frameIndex) + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces_M5.jpg', match_image);
##METHOD 5 -- STOP
	
	frameIndex += 1

# When everything done, release the capture
cap.release()

Core Libraries and Initial Setup

The script begins by importing several critical libraries:

numpy: Numerical arrays; used here to convert detection boxes from (x, y, width, height) to corner coordinates.
cv2 (OpenCV): Reads the video, converts color spaces, provides the HOG person detector and the Haar cascade, and writes the cropped images.
sys and os: Read the command-line arguments and build the output directory.
datetime: Timestamps the output directory.
face_recognition: A high-level face recognition library built on top of dlib.
dlib: A machine learning and image processing toolkit; its frontal face detector is used directly as the fifth method.

Video File Handling

The script takes two command-line arguments: the path of the video file to process and an output directory. It extracts the video’s file name and creates a subdirectory named after the current date and time (to the second), with a folder for the video inside it. Outputs from different runs are therefore stored separately, which avoids overwrites and confusion.
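
The directory setup described above can be sketched with the standard library alone. The helper name `make_output_dir` and the timestamp format are assumptions made for illustration and are not part of the original script; the format leaves out colons and spaces so the resulting path should also be valid on Windows.

```python
import os
from datetime import datetime


def make_output_dir(output_root, video_path, now=None):
    """Create <output_root>/<timestamp>/<video file name>/ and return it.

    Hypothetical helper: the name and the timestamp format are
    illustrative choices, not taken from the original script.
    """
    now = now or datetime.now()
    # No colons or spaces, so the name is valid on common filesystems.
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    target = os.path.join(output_root, stamp, os.path.basename(video_path))
    # exist_ok=True keeps a second run within the same second from failing.
    os.makedirs(target, exist_ok=True)
    return target
```

Using `os.path.join` instead of string concatenation lets Python pick the right path separator for the platform.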

Methodological Overview

The script showcases five distinct methodologies for detecting faces and people:

HOG Person Detector with OpenCV: Uses the Histogram of Oriented Gradients (HOG) descriptor combined with a Support Vector Machine (SVM) for detecting people.
Haar Cascade for Face Detection: Employs OpenCV’s Haar Cascade classifier, a widely-used method for face detection.
Face Detection Using face_recognition (default model): Calls face_recognition.face_locations with its default HOG-based model.
CNN-Based Face Detection Using face_recognition: Calls the same function with model="cnn", which switches to a Convolutional Neural Network (CNN) detector. It is generally more accurate than the default model but considerably slower without a GPU.
Dlib’s Frontal Face Detector: Applies Dlib’s frontal face detector, effective for detecting faces oriented towards the camera.
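
One practical detail when mixing these methods is that each library reports boxes in a different layout: OpenCV's detectMultiScale returns (x, y, width, height), face_recognition returns (top, right, bottom, left), and dlib rectangles expose left(), top(), right() and bottom(). The following is a minimal sketch of normalizing all three to one layout. The `Box` type and the `from_*` helper names are assumptions introduced here for illustration; the script itself does the conversion inline.

```python
from collections import namedtuple

# One common layout for every detector: pixel edges of the box.
# (Hypothetical type, introduced for this sketch only.)
Box = namedtuple("Box", ["left", "top", "right", "bottom"])


def from_xywh(x, y, w, h):
    """Convert an OpenCV-style (x, y, width, height) box."""
    return Box(int(x), int(y), int(x + w), int(y + h))


def from_css(top, right, bottom, left):
    """Convert a face_recognition-style (top, right, bottom, left) box."""
    return Box(int(left), int(top), int(right), int(bottom))


def from_edges(left, top, right, bottom):
    """Convert edge values such as those read from a dlib rectangle."""
    return Box(int(left), int(top), int(right), int(bottom))
```

With a dlib rectangle `r`, the call would look like `from_edges(r.left(), r.top(), r.right(), r.bottom())`.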

Processing Workflow

The script processes the video on a frame-by-frame basis. For each frame, it:

Converts the frame to grayscale. Only the Haar cascade uses the grayscale copy, which speeds up its detection; the other methods receive the color frame.
Sequentially applies each of the five detection methods.
For each detected face or person, it outputs the coordinates and saves a cropped image of the detection to the output directory.
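
The cropping step deserves some care, because detectors can return boxes that reach past the frame edges, and a negative index in a NumPy slice counts from the end of the array rather than raising an error. Below is a minimal sketch of clipping a box to the frame and building the output file name in the same pattern the script uses. Both function names are hypothetical and exist only for this illustration.

```python
def clamp_box(left, top, right, bottom, width, height):
    """Clip a box to the frame; return None if nothing is left.

    Hypothetical helper, not part of the original script.
    """
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


def crop_name(frame_index, box, kind, method):
    """Build a name like '<frame>_(top,right)(bottom,left)_<kind>_M<n>.jpg'."""
    left, top, right, bottom = box
    return "{}_({},{})({},{})_{}_M{}.jpg".format(
        frame_index, top, right, bottom, left, kind, method)
```

A caller would skip the detection when `clamp_box` returns None and otherwise slice the frame as `frame[top:bottom, left:right]` before writing it out.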

Iterative Frame Analysis

The script employs a loop to process each frame of the video. A frame index counts the frames processed; it appears in every log line and prefixes the file name of each saved crop, so any detection can be traced back to the frame it came from.
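
cv2.VideoCapture.read() returns a success flag alongside the frame, and the flag turns false once the video is exhausted, so a frame loop needs to check it before touching the frame. One way to package that check is a small generator, sketched below. The name `iter_frames` is an assumption for illustration; it accepts any object whose read() method follows the same (ok, frame) contract.

```python
def iter_frames(capture):
    """Yield (index, frame) until the capture reports no more frames.

    `capture` is anything with a read() method returning (ok, frame),
    for example a cv2.VideoCapture. Hypothetical helper for this sketch.
    """
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video or a read error
            break
        yield index, frame
        index += 1
```

The detection code then becomes the body of `for frame_index, frame in iter_frames(cap):`, and the manual counter is no longer needed.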

Resource Management

After processing the entire video, the script releases the video capture object, ensuring that system resources are appropriately freed.
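
If a detector raises an exception partway through the video, a release call placed after the loop is never reached. A possible way to guarantee the release is a small context manager, sketched here under the assumption that the capture object has a release() method, as cv2.VideoCapture does. The name `opened` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def opened(capture):
    """Yield the capture and always call release() afterwards.

    Hypothetical helper, not part of the original script.
    """
    try:
        yield capture
    finally:
        capture.release()
```

Usage would look like `with opened(cv2.VideoCapture(inputVideo)) as cap:` wrapped around the frame loop.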

Key Takeaways

This script is a rich demonstration of integrating various face and person detection techniques in a single Python application. It highlights the versatility and power of Python in handling complex tasks like video processing and computer vision. This analysis serves as a guide for developers and enthusiasts looking to understand or venture into the realm of image processing with Python.
