In this technical post, we’ll dissect a Python script integrating several libraries and techniques for detecting faces and people in video footage. This script is an excellent example of how diverse computer vision tools can be merged to produce a robust solution for image analysis.
# import the necessary packages
import numpy as np
import cv2
import sys
import os
from datetime import datetime
import face_recognition
import dlib
# read the input video path and the base output directory from the command line
inputVideo = sys.argv[1]
basenameVideo = os.path.basename(inputVideo)
outputDirectory = sys.argv[2]
datetimeNow = datetime.now().strftime("%m-%d-%Y %H:%M:%S")

# Create a unique folder for this run's output
videoOutputDirectory = os.path.join(outputDirectory, datetimeNow, basenameVideo)
os.makedirs(videoOutputDirectory)
##METHOD 1 -- START
# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
##METHOD 1 -- STOP
##METHOD 2 -- START
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
##METHOD 2 -- STOP
##METHOD 5 -- START
# Initialize face detector, facial landmarks detector and face recognizer
faceDetector = dlib.get_frontal_face_detector()
##METHOD 5 -- STOP
cv2.startWindowThread()
## open webcam video stream
#cap = cv2.VideoCapture(0)
# create a VideoCapture object
cap = cv2.VideoCapture(inputVideo)
frameIndex = 0
while True:
    # Capture frame-by-frame; stop once the video is exhausted
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV decodes frames as BGR; convert to grayscale for faster detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # dlib and face_recognition expect RGB input, so keep an RGB copy too
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    ##METHOD 1 -- START
    if True:
        # detect people in the image
        persons, weights = hog.detectMultiScale(frame, winStride=(8, 8))
        persons = np.array([[x, y, x + w, y + h] for (x, y, w, h) in persons])
        print("[INFO][1][{0}] Found {1} Persons.".format(frameIndex, len(persons)))
        for (left, top, right, bottom) in persons:
            print("A person is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(os.path.join(videoOutputDirectory, '{}_({},{})({},{})_persons_M1.jpg'.format(frameIndex, top, right, bottom, left)), match_image)
    ##METHOD 1 -- STOP
    ##METHOD 2 -- START
    if True:
        faces = faceCascade.detectMultiScale(
            gray,
            scaleFactor=1.05,
            minNeighbors=7,
            minSize=(50, 50)
        )
        faces = np.array([[x, y, x + w, y + h] for (x, y, w, h) in faces])
        print("[INFO][2][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
        for (left, top, right, bottom) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(os.path.join(videoOutputDirectory, '{}_({},{})({},{})_faces_M2.jpg'.format(frameIndex, top, right, bottom, left)), match_image)
    ##METHOD 2 -- STOP
    ##METHOD 3 -- START
    if True:
        # HOG-based detector from the face_recognition library
        faces = face_recognition.face_locations(rgb)
        print("[INFO][3][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
        for (top, right, bottom, left) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(os.path.join(videoOutputDirectory, '{}_({},{})({},{})_faces_M3.jpg'.format(frameIndex, top, right, bottom, left)), match_image)
    ##METHOD 3 -- STOP
    ##METHOD 4 -- START
    if True:
        # CNN-based detector from the face_recognition library
        faces = face_recognition.face_locations(rgb, model="cnn")
        print("[INFO][4][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
        for (top, right, bottom, left) in faces:
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(os.path.join(videoOutputDirectory, '{}_({},{})({},{})_faces_M4.jpg'.format(frameIndex, top, right, bottom, left)), match_image)
    ##METHOD 4 -- STOP
    ##METHOD 5 -- START
    if True:
        # detect faces in the RGB image
        faces = faceDetector(rgb)
        print("[INFO][5][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
        # Process each face we found; dlib boxes can extend past the frame
        # borders, so clamp them before slicing
        for k, face in enumerate(faces):
            top = max(face.top(), 0)
            bottom = min(face.bottom(), frame.shape[0])
            left = max(face.left(), 0)
            right = min(face.right(), frame.shape[1])
            print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
            match_image = frame[top:bottom, left:right]
            cv2.imwrite(os.path.join(videoOutputDirectory, '{}_({},{})({},{})_faces_M5.jpg'.format(frameIndex, top, right, bottom, left)), match_image)
    ##METHOD 5 -- STOP
    frameIndex += 1
# When everything is done, release the capture
cap.release()
Core Libraries and Initial Setup
The script begins by importing several critical libraries:
- numpy: Essential for numerical computations in Python.
- cv2 (OpenCV): A cornerstone in computer vision projects.
- sys and os: For system-level operations and file management.
- datetime: To handle date and time operations, crucial for timestamping.
- face_recognition: A high-level facial recognition library.
- dlib: A toolkit renowned for its machine learning and image processing capabilities.
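Before running the script, it is worth confirming that all of these dependencies resolve. The sketch below is not part of the original script; it simply imports the four third-party packages and prints their versions (the getattr fallback covers face_recognition builds that may not expose __version__).

import numpy as np
import cv2
import dlib
import face_recognition

print("numpy:", np.__version__)
print("OpenCV:", cv2.__version__)
print("dlib:", dlib.__version__)
print("face_recognition:", getattr(face_recognition, "__version__", "unknown"))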
Video File Handling
The script processes a video file whose path is passed as the first command-line argument; the second argument names the base output directory. It extracts the video's file name and builds a unique output folder from the current date and time. This approach ensures that outputs from different runs are stored separately, avoiding overwrites and confusion.
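A typical invocation would be python detect.py input.mp4 ./output (the script filename here is hypothetical). As a variation on the script's string concatenation, the same setup can be written with pathlib; this is a sketch of an alternative, not the author's code:

import sys
from datetime import datetime
from pathlib import Path

inputVideo = Path(sys.argv[1])
outputDirectory = Path(sys.argv[2])
runStamp = datetime.now().strftime("%m-%d-%Y %H:%M:%S")

# <output>/<timestamp>/<video filename>/ -- one folder per run, per video
videoOutputDirectory = outputDirectory / runStamp / inputVideo.name
videoOutputDirectory.mkdir(parents=True, exist_ok=True)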
Methodological Overview
The script showcases five distinct methodologies for detecting faces and people:
- HOG Person Detector with OpenCV: Uses the Histogram of Oriented Gradients (HOG) descriptor combined with a Support Vector Machine (SVM) for detecting people.
- Haar Cascade for Face Detection: Employs OpenCV’s Haar Cascade classifier, a widely-used method for face detection.
- Face Detection Using face_recognition (Method 3 in the script): Implements the face_recognition library's default, HOG-based face detection.
- CNN-Based Face Detection Using face_recognition (Method 4): Utilizes the Convolutional Neural Network (CNN) model in the face_recognition library, which is slower but generally more accurate than the HOG default.
- Dlib’s Frontal Face Detector: Applies Dlib’s frontal face detector, effective for detecting faces oriented towards the camera.
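These five detectors report boxes in three different conventions: OpenCV's detectMultiScale returns (x, y, w, h) tuples, face_recognition returns (top, right, bottom, left) tuples, and dlib returns rectangle objects with accessor methods. A small helper pair (the function names below are ours, not from any of these libraries) can normalize everything to face_recognition's ordering:

def from_opencv(box):
    # OpenCV's detectMultiScale yields (x, y, w, h)
    x, y, w, h = box
    return (y, x + w, y + h, x)

def from_dlib(rect):
    # dlib's detector yields dlib.rectangle objects
    return (rect.top(), rect.right(), rect.bottom(), rect.left())

# face_recognition.face_locations already returns (top, right, bottom, left),
# so its results need no conversion.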
Processing Workflow
The script processes the video on a frame-by-frame basis. For each frame, it:
- Converts the frame to grayscale for the Haar cascade; dropping color information speeds up detection for methods that don't require it. (The dlib and face_recognition detectors instead take an RGB copy of the frame, since OpenCV decodes frames as BGR.)
- Sequentially applies each of the five detection methods.
- For each detected face or person, it outputs the coordinates and saves a cropped image of the detection to the output directory.
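Instead of (or in addition to) saving crops, a common variant is to draw the detections onto the frame and write out an annotated copy. A hedged sketch, assuming boxes have already been normalized to the (top, right, bottom, left) ordering used above:

import cv2

def annotate(frame, boxes, color=(0, 255, 0)):
    # Draw one rectangle per detection; boxes use (top, right, bottom, left)
    for (top, right, bottom, left) in boxes:
        cv2.rectangle(frame, (left, top), (right, bottom), color, 2)
    return frame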
Iterative Frame Analysis
The script employs a loop to process each frame of the video. It includes a frame index to keep track of the number of frames processed, which is particularly useful for debugging and analysis purposes.
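Running all five detectors on every frame is expensive, especially the CNN model. A common speed-up, not used in the original script, is to sample every Nth frame using the same counter; the sampling interval below is an assumption to tune per video:

import sys
import cv2

cap = cv2.VideoCapture(sys.argv[1])
PROCESS_EVERY = 5  # hypothetical interval; adjust for your footage

frameIndex = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    if frameIndex % PROCESS_EVERY == 0:
        ...  # run the detection methods on this frame only
    frameIndex += 1
cap.release()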
Resource Management
After processing the entire video, the script releases the video capture object, ensuring that system resources are appropriately freed.
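If a detector raises mid-run (for example, the CNN model exhausting GPU memory), a bare release call at the end of the script would never execute. Wrapping the loop in try/finally guarantees cleanup; a minimal sketch:

import sys
import cv2

cap = cv2.VideoCapture(sys.argv[1])
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # ... per-frame detection work goes here ...
finally:
    # release the capture even when processing raised an exception
    cap.release()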
Key Takeaways
This script is a rich demonstration of integrating various face and person detection techniques in a single Python application. It highlights the versatility and power of Python in handling complex tasks like video processing and computer vision. This analysis serves as a guide for developers and enthusiasts looking to understand or venture into the realm of image processing with Python.