Technical Deep Dive: Multi-Method Face and Person Detection in Python
In this technical post, we’ll dissect a Python script that combines several libraries and techniques to detect faces and people in video footage. Because it runs five detectors over the same frames and saves every detection, it is a handy way to compare how different computer vision tools behave on identical input.
```python
# import the necessary packages
import os
import sys
from datetime import datetime

import cv2
import dlib
import face_recognition
import numpy as np

inputVideo = sys.argv[1]
basenameVideo = os.path.basename(inputVideo)
outputDirectory = sys.argv[2]
# Dashes instead of colons: ':' is not allowed in Windows directory names
datetimeNow = datetime.now().strftime("%m-%d-%Y_%H-%M-%S")

# Create the folder to save the output
videoOutputDirectory = os.path.join(outputDirectory, datetimeNow, basenameVideo)
os.makedirs(videoOutputDirectory, exist_ok=True)


def saveDetection(frame, frameIndex, label, top, right, bottom, left, suffix):
    """Print a detection's location and save the cropped region as a JPEG."""
    print("A {} is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}"
          .format(label, top, left, bottom, right))
    # dlib boxes can extend past the frame; clamp so negative values don't wrap
    height, width = frame.shape[:2]
    top, left = max(0, top), max(0, left)
    bottom, right = min(height, bottom), min(width, right)
    match_image = frame[top:bottom, left:right]
    if match_image.size == 0:
        return  # cv2.imwrite raises on an empty image
    fileName = "{}_({},{})({},{})_{}.jpg".format(frameIndex, top, right, bottom, left, suffix)
    cv2.imwrite(os.path.join(videoOutputDirectory, fileName), match_image)


##METHOD 1 -- START
# Initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
##METHOD 1 -- STOP

##METHOD 2 -- START
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
##METHOD 2 -- STOP

##METHOD 5 -- START
# Initialize dlib's frontal face detector
faceDetector = dlib.get_frontal_face_detector()
##METHOD 5 -- STOP

# Open the input video (use cv2.VideoCapture(0) for a webcam stream instead)
cap = cv2.VideoCapture(inputVideo)
frameIndex = 0

while True:
    # Capture frame-by-frame; ret is False once the video ends
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV decodes frames as BGR. The Haar cascade uses grayscale;
    # face_recognition and dlib expect RGB.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    ##METHOD 1 -- START
    # Detect people; boxes come back as (x, y, w, h)
    persons, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    persons = np.array([[x, y, x + w, y + h] for (x, y, w, h) in persons])
    print("[INFO][1][{0}] Found {1} Persons.".format(frameIndex, len(persons)))
    for (left, top, right, bottom) in persons:
        saveDetection(frame, frameIndex, "person", top, right, bottom, left, "persons_M1")
    ##METHOD 1 -- STOP

    ##METHOD 2 -- START
    faces = faceCascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=7, minSize=(50, 50))
    faces = np.array([[x, y, x + w, y + h] for (x, y, w, h) in faces])
    print("[INFO][2][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
    for (left, top, right, bottom) in faces:
        saveDetection(frame, frameIndex, "face", top, right, bottom, left, "faces_M2")
    ##METHOD 2 -- STOP

    ##METHOD 3 -- START
    # Default "hog" model; locations are (top, right, bottom, left) tuples
    faces = face_recognition.face_locations(rgb)
    print("[INFO][3][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
    for (top, right, bottom, left) in faces:
        saveDetection(frame, frameIndex, "face", top, right, bottom, left, "faces_M3")
    ##METHOD 3 -- STOP

    ##METHOD 4 -- START
    # CNN model: more accurate, GPU/CUDA-accelerated only if dlib was built with CUDA
    faces = face_recognition.face_locations(rgb, model="cnn")
    print("[INFO][4][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
    for (top, right, bottom, left) in faces:
        saveDetection(frame, frameIndex, "face", top, right, bottom, left, "faces_M4")
    ##METHOD 4 -- STOP

    ##METHOD 5 -- START
    # dlib returns dlib.rectangle objects
    faces = faceDetector(rgb)
    print("[INFO][5][{0}] Found {1} Faces.".format(frameIndex, len(faces)))
    for face in faces:
        saveDetection(frame, frameIndex, "face",
                      face.top(), face.right(), face.bottom(), face.left(), "faces_M5")
    ##METHOD 5 -- STOP

    frameIndex += 1

# When everything is done, release the capture
cap.release()
```
Core Libraries and Initial Setup
The script begins by importing several critical libraries:
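- numpy: Converts the OpenCV bounding boxes from (x, y, width, height) to corner coordinates.
- cv2 (OpenCV): Reads the video, converts color spaces, runs the HOG person detector and the Haar cascade, and writes the cropped detections to disk.
- sys and os: Read the command-line arguments, build file paths, and create the output directory.
- datetime: Timestamps the output directory name.
- face_recognition: A high-level face recognition library built on dlib; the script uses only its face_locations detection function.
- dlib: A machine learning and image processing toolkit whose frontal face detector the script also calls directly.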
Video File Handling
The script takes two command-line arguments: the path of the input video (sys.argv[1]) and a base output directory (sys.argv[2]). It creates a subdirectory named after the current date and time and the video's file name, so outputs from separate runs are stored separately instead of overwriting each other, provided the runs do not start within the same second.
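The directory setup can be isolated into a small function that is easier to test. This is a sketch, not the script's own code: make_run_directory is a hypothetical name, the optional now parameter exists only so a test can pin the timestamp, and the format uses dashes instead of the colons in "%m-%d-%Y %H:%M:%S" because ':' is not allowed in Windows directory names.

```python
import os
from datetime import datetime


def make_run_directory(output_directory, input_video, now=None):
    """Create and return <output_directory>/<timestamp>/<video file name>.

    Hypothetical helper; `now` is injectable so tests can fix the timestamp.
    """
    now = now or datetime.now()
    # Dashes instead of colons keep the name valid on Windows as well as POSIX
    stamp = now.strftime("%m-%d-%Y_%H-%M-%S")
    path = os.path.join(output_directory, stamp, os.path.basename(input_video))
    # exist_ok=True: a second run in the same second reuses the directory
    # instead of raising FileExistsError
    os.makedirs(path, exist_ok=True)
    return path
```

Passing exist_ok=True trades a crash for shared output when two runs collide; if strict separation matters, appending a process ID or a counter to the timestamp would be one way to keep them apart.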
Methodological Overview
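The script runs five detection methods, numbered in the code by its ##METHOD comments:
- Method 1, HOG Person Detector with OpenCV: Uses the Histogram of Oriented Gradients (HOG) descriptor with OpenCV's pretrained linear Support Vector Machine (SVM) people detector to find people in the frame.
- Method 2, Haar Cascade for Face Detection: Employs OpenCV’s pretrained frontal-face Haar cascade (haarcascade_frontalface_default.xml) on the grayscale frame. Its minNeighbors=7 and minSize=(50, 50) settings suppress weak and tiny candidates at the cost of missing some small faces.
- Method 3, Face Detection Using face_recognition: Calls face_recognition.face_locations with its default "hog" model, which wraps dlib’s HOG-based frontal face detector and upsamples the image once by default so that smaller faces can be found.
- Method 4, CNN-Based Face Detection Using face_recognition: Passes model="cnn", which uses dlib’s Convolutional Neural Network (CNN) face detector. It is generally more accurate but much slower, especially without GPU acceleration.
- Method 5, Dlib’s Frontal Face Detector: Calls dlib.get_frontal_face_detector() directly. This is the same HOG-based detector that Method 3 wraps, here run without upsampling; it works best on faces oriented towards the camera.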
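The detectors disagree on box formats: OpenCV's HOG and Haar detectors return (x, y, w, h), face_recognition returns (top, right, bottom, left), and dlib returns dlib.rectangle objects with top(), right(), bottom() and left() methods. A minimal sketch of converters, assuming you want everything in face_recognition's (top, right, bottom, left) order; the function names are hypothetical helpers, not part of any of these libraries.

```python
def xywh_to_trbl(box):
    """OpenCV (x, y, w, h) -> face_recognition-style (top, right, bottom, left)."""
    x, y, w, h = box
    return (y, x + w, y + h, x)


def trbl_to_xywh(box):
    """(top, right, bottom, left) -> OpenCV-style (x, y, w, h)."""
    top, right, bottom, left = box
    return (left, top, right - left, bottom - top)


def dlib_rect_to_trbl(rect):
    """Any object with dlib.rectangle's accessor methods -> (top, right, bottom, left)."""
    return (rect.top(), rect.right(), rect.bottom(), rect.left())
```

For example, xywh_to_trbl((10, 20, 30, 40)) gives (20, 40, 60, 10). Normalizing to one order up front means the cropping and file-naming code only has to handle a single format.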
Processing Workflow
The script processes the video on a frame-by-frame basis. For each frame, it:
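- Converts the frame to grayscale for the Haar cascade (Method 2), which works on single-channel images. Because OpenCV decodes frames in BGR order while face_recognition and dlib expect RGB, the frame should also be converted with cv2.COLOR_BGR2RGB before it is passed to Methods 3 to 5.
- Sequentially applies each of the five detection methods.
- For each detected face or person, prints the box coordinates and saves the cropped region to the output directory. Each file name combines the frame index, the box corners, and a suffix such as _faces_M2 that identifies the method.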
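Cropping with frame[top:bottom, left:right] assumes the box lies inside the frame. dlib's detector can return rectangles that extend past the image border, and a negative index in NumPy slicing counts from the end, so a crop such as frame[-5:30] comes out empty, and cv2.imwrite refuses to write an empty image. A minimal sketch of clamping first; clamp_box is a hypothetical helper that works on plain integers.

```python
def clamp_box(top, right, bottom, left, height, width):
    """Clip a (top, right, bottom, left) box to a height x width frame.

    Returns None when no part of the box lies inside the frame, so the
    caller can skip saving an empty crop.
    """
    top, left = max(0, top), max(0, left)
    bottom, right = min(height, bottom), min(width, right)
    if bottom <= top or right <= left:
        return None
    return (top, right, bottom, left)
```

With a NumPy frame, the call would look like clamp_box(top, right, bottom, left, *frame.shape[:2]), since frame.shape[:2] is (height, width).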
Iterative Frame Analysis
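The script reads frames in a while loop and increments frameIndex after each one. The index appears in every log line and output file name, which makes it easy to trace a saved crop back to its source frame. The loop must stop when cap.read() returns False for ret, which happens at the end of the file; otherwise frame is None and the next OpenCV call fails.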
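The read-check-increment pattern can be wrapped in a generator so the loop body only deals with detection. This is a sketch under the assumption that the capture object follows cv2.VideoCapture's read() contract, returning (ret, frame) with ret False at the end of the stream; iter_frames is a hypothetical name.

```python
def iter_frames(cap):
    """Yield (frame_index, frame) pairs until cap.read() reports failure.

    `cap` can be a cv2.VideoCapture or any object whose read() returns
    (ret, frame) and sets ret to False at the end of the stream.
    """
    frame_index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break  # end of file or read error; frame would be None
        yield frame_index, frame
        frame_index += 1
```

Usage would be along the lines of `for frameIndex, frame in iter_frames(cv2.VideoCapture(inputVideo)):`, which also removes the need to increment the index by hand.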
Resource Management
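After the loop ends, the script calls cap.release() to close the video file and free the decoder's resources. If a detector raises an exception partway through the video, that line is never reached, so wrapping the loop in try/finally is the safer pattern.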
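contextlib.closing would call close(), but cv2.VideoCapture frees its resources through release(), so a tiny context manager is one way to get the try/finally behavior. This is a sketch; released is a hypothetical helper, and it works with any object that has a release() method.

```python
from contextlib import contextmanager


@contextmanager
def released(cap):
    """Yield `cap` and call cap.release() on exit, even if the body raises."""
    try:
        yield cap
    finally:
        cap.release()
```

In the script this would read `with released(cv2.VideoCapture(inputVideo)) as cap:` followed by the frame loop, and the final cap.release() line would no longer be needed.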
Key Takeaways
This script runs five face and person detectors over the same frames and saves every detection, which makes it a practical harness for comparing OpenCV, face_recognition, and dlib side by side on identical footage. Running all five on every frame is slow, mostly because of the CNN model, so the script suits offline comparison better than live video. It is a useful starting point for developers who want to understand how these detectors differ before choosing one for their own image processing work in Python.