Example 3: Segment persons in webcam videos
Introduction
This tutorial builds on Example 2: Face Detection with OpenCV. You can re-use some of the scripts already developed in that tutorial.
Steps to do
Add the macro module developed in the previous example to your workspace.
Open the internal network of the module via middle mouse click, then right-click the tab of the workspace showing the internal network and select Show Enclosing Folder.
The file browser opens, showing the files of your macro module. Copy the *.mlab file to a location you can remember.
Create the macro module
Open the Project Wizard via [ File → Run Project Wizard ] and select Macro Module. Click Run Wizard.
Define the module properties as shown below, though you can choose your own name. Click Next.
Define the module properties and select the copied *.mlab file. Make sure to add a Python file and click Next.
Leave the module field reference as is and click Create. Close the Project Wizard and select [ Extras → Reload Module Database (Clear Cache) ].
Re-use script and Python code
Open the script file of the WebCamTest module and copy the contents to your new PyTorch module. The result should look something like this:
PyTorchSegmentationExample.script
Interface {
  Inputs {}
  Outputs {}
  Parameters {}
}

Commands {
  source = $(LOCAL)/PyTorchSegmentationExample.py
}

Window {
  h = 500
  w = 500
  destroyedCommand = releaseCamera
  initCommand = setupInterface

  Vertical {
    Horizontal {
      Button {
        title = Start
        command = startCapture
      }
      Button {
        title = Pause
        command = stopCapture
      }
    }
    Horizontal {
      expandX = True
      expandY = True
      Viewer View2D.self {
        type = SoRenderArea
      }
    }
  }
}
If you open the panel of your new module, you can see the added UI elements. The buttons do not work yet, because they call Python functions that are not defined yet. Copy the Python code to your new module, too.
PyTorchSegmentationExample.py
# from mevis import *
import cv2
import OpenCVUtils

_interfaces = []
camera = None
face_cascade = cv2.CascadeClassifier('C:/tmp/haarcascade_frontalface_default.xml')

# Setup the interface for PythonImage module
def setupInterface():
    global _interfaces
    _interfaces = []
    interface = ctx.module("PythonImage").call("getInterface")
    _interfaces.append(interface)

# Grab image from camera and update
def grabImage():
    _, img = camera.read()
    updateImage(img)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    # Display the output
    cv2.imshow('img', img)

# Update image in interface
def updateImage(image):
    _interfaces[0].setImage(OpenCVUtils.convertImageToML(image), minMaxValues=[0, 255])

# Start capturing WebCam
def startCapture():
    global camera
    if not camera:
        camera = cv2.VideoCapture(0)
    ctx.callWithInterval(0.1, grabImage)

# Stop capturing WebCam
def stopCapture():
    ctx.removeTimers()

# Release camera in the end
def releaseCamera(_):
    global camera, _interfaces
    ctx.removeTimers()
    _interfaces = []
    if camera:
        camera.release()
        camera = None
    cv2.destroyAllWindows()
You should now have the complete functionality of Example 2: Face Detection with OpenCV.
Adapt the network
For PyTorch, we require some additional modules in our network. Open the internal network of your module and add another PythonImage module. Connect a Resample3D and an ImagePropertyConvert module.
In the Resample3D module, set Image Size to 693, 520, 1 and change the Voxel Size to 1 for all dimensions.
Open the panel of the ImagePropertyConvert module and check World Matrix.
Then add a SoView2DOverlayMPR module and connect it to the ImagePropertyConvert and the View2D. Change Blend Mode to Blend, set Alpha to a value between 0 and 1, and define a color for the overlay.
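If you prefer scripting over the module panels, the same settings can also be applied from Python. Below is a minimal sketch: the imageSize field of Resample3D is also used later in this tutorial, while the voxelSize field name is an assumption and may differ in your MeVisLab version.

# Sketch: apply the Resample3D settings from Python instead of the panel.
# "Resample3D.imageSize" is also used later in segmentSnapshot;
# "Resample3D.voxelSize" is an assumed field name.
ctx.field("Resample3D.imageSize").value = (693, 520, 1)
ctx.field("Resample3D.voxelSize").value = (1, 1, 1)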
Save the internal network.
Remove OpenCV-specific code
We want to use PyTorch for segmentation; therefore, you need to add the necessary imports.
PyTorchSegmentationExample.py
import cv2
import OpenCVUtils
from torchvision.io.image import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image
import torch
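If these imports fail, the packages are probably not installed yet; the PythonPip module mentioned in the summary can be used to install them. As a quick sanity check, you can print the installed versions, for example in the MeVisLab scripting console (a minimal sketch):

import torch
import torchvision

# Verify that the PyTorch packages are available
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)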
Additionally, remove the face_cascade variable from your Python code. It was needed for detecting faces with OpenCV and is no longer required with PyTorch. The only global variables you need here are:
PyTorchSegmentationExample.py
_interfaces = []
camera = None
You can also remove the OpenCV-specific lines in grabImage. The function should now look like this:
PyTorchSegmentationExample.py
# Grab image from camera and update
def grabImage():
    _, img = camera.read()
    updateImage(img)
Adapt the function releaseCamera and remove the line cv2.destroyAllWindows().
PyTorchSegmentationExample.py
# Release camera in the end
def releaseCamera(_):
    global camera, _interfaces
    ctx.removeTimers()
    _interfaces = []
    if camera:
        camera.release()
        camera = None
Implement PyTorch segmentation
The first thing we need is a function for starting the camera. It closes the previous segmentation and calls the existing function startCapture.
PyTorchSegmentationExample.py
def startWebcam():
    # Close previous segmentation
    ctx.module("PythonImage1").call("getInterface").unsetImage()
    # Start webcam
    startCapture()
As this function is not yet called from our user interface, we need to update the *.script file. Change the first Button as follows:
PyTorchSegmentationExample.script
Button {
  title = "Start Webcam"
  command = startWebcam
}
Now your new function startWebcam is called whenever you click the left button. As a next step, define a Python function segmentSnapshot. We are using a pre-trained network from torchvision; the weights are downloaded automatically the first time the model is created and cached for later runs. In case you want to try other PyTorch models, you can find many examples on their website.
PyTorchSegmentationExample.py
def segmentSnapshot():
    # Step 1: Get image from webcam capture
    stopCapture()
    inImage = ctx.field("PythonImage.output0").image()
    img = inImage.getTile((0, 0, 0, 0, 0, 0), inImage.imageExtent())[0, 0, :, 0, :, :]
    # Step 2: Convert image into torch tensor
    img = torch.Tensor(img).type(torch.uint8)
    # Step 3: Initialize model with the best available weights
    weights = FCN_ResNet50_Weights.DEFAULT
    model = fcn_resnet50(weights=weights)
    model.eval()
    # Step 4: Initialize the inference transforms
    preprocess = weights.transforms()
    # Step 5: Apply inference preprocessing transforms
    batch = preprocess(img).unsqueeze(0)
    # Step 6: Use the model to segment persons in snapshot
    prediction = model(batch)["out"]
    normalized_masks = prediction.softmax(dim=1)
    class_to_idx = {cls: idx for (idx, cls) in enumerate(weights.meta["categories"])}
    mask = normalized_masks[0, class_to_idx["person"]]
    # Step 7: Set output image to module
    interface = ctx.module("PythonImage1").call("getInterface")
    interface.setImage(mask.detach().numpy())
    # Step 8: Resize network output to original image size
    origImageSize = inImage.imageExtent()
    ctx.field("Resample3D.imageSize").value = (origImageSize[0], origImageSize[1], origImageSize[2])
In order to call this function, we have to adapt the *.script file again and change the command of the right button:
PyTorchSegmentationExample.script
Button {
  title = "Segment Snapshot"
  command = segmentSnapshot
}
In step 6 we selected the class person. Whenever you click Segment Snapshot, PyTorch will try to segment all persons in the video.
The following classes are available:
- aeroplane
- bicycle
- bird
- boat
- bottle
- bus
- car
- cat
- chair
- cow
- diningtable
- dog
- horse
- motorbike
- person
- pottedplant
- sheep
- sofa
- train
- tvmonitor
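This list corresponds to weights.meta["categories"] of the pre-trained weights (the model additionally knows a background class). To segment a different class, you only need to change the lookup in step 6 of segmentSnapshot; the class "dog" below is just an arbitrary example:

# Print all categories known to the pre-trained model
weights = FCN_ResNet50_Weights.DEFAULT
print(weights.meta["categories"])

# In step 6 of segmentSnapshot, select another class, e.g. dogs:
# mask = normalized_masks[0, class_to_idx["dog"]]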
The final result of the segmentation should be a semi-transparent red overlay of the persons segmented in your webcam stream.
Summary
- You can install additional Python AI packages by using the PythonPip module.
- Trained PyTorch networks can be used directly in MeVisLab via the PythonImage module.
- You can integrate AI algorithms into your MeVisLab networks.