CS474: Human Computer Interaction - Modalities - Eye Tracking
Activity Goals
The goals of this activity are:
- To identify alternative modalities for human-computer interaction
- To write a program that uses eye tracking for engagement
- To identify signifiers and affordances for a given application and modality
Supplemental Reading
Feel free to visit these resources for supplemental background reading material.
- Filonov, S. - Tracking your eyes with Python
- Agrawal, V. - Real-time eye tracking using OpenCV and Dlib
- Haar Cascade Training Data
- dlib shape68 training files
- CMake dependency download
The Activity
Directions
Consider the activity models and answer the questions provided. First reflect on these questions briefly on your own, then discuss and compare your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Report out on areas of disagreement, or on items for which you and your group identified alternative approaches, and write down questions you encountered along the way for whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook.
Model 1: Eye Tracking
On Windows, download the Visual Studio installer and install the "Desktop development with C++" workload, which provides the C++ compiler that dlib needs to build from source.
Alternatively:
pip install cmake wheel dlib opencv-python face_recognition numpy
Alternatively:
git clone https://github.com/davisking/dlib.git && cd dlib && python setup.py install --user --no DLIB_GIF_SUPPORT
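Whichever installation route you take, a quick sanity check like the sketch below confirms that the compiled extensions built correctly (the version attributes printed are standard for these packages). Note that the script in Model 1 also expects the 68-point landmark model (linked above as the "dlib shape68 training files") to be saved as shape_68.dat alongside the script.
# Sanity check: these imports only succeed if the native extensions work.
import cv2
import dlib
import numpy as np
print("OpenCV:", cv2.__version__)
print("dlib:", dlib.__version__)
print("NumPy:", np.__version__)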
import cv2
import dlib
import numpy as np

def shape_to_np(shape, dtype="int"):
    # initialize the list of (x, y)-coordinates
    coords = np.zeros((68, 2), dtype=dtype)
    # loop over the 68 facial landmarks and convert them
    # to a 2-tuple of (x, y)-coordinates
    for i in range(0, 68):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    # return the list of (x, y)-coordinates
    return coords

def eye_on_mask(mask, side):
    # fill the convex hull of the six eye landmarks so that only the eye
    # region survives the bitwise_and below; note this reads the
    # module-level 'shape' array set inside the main loop
    points = [shape[i] for i in side]
    points = np.array(points, dtype=np.int32)
    mask = cv2.fillConvexPoly(mask, points, 255)
    return mask

def contouring(thresh, mid, img, right=False):
    # find the largest white blob (the pupil) and mark its centroid
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    try:
        cnt = max(cnts, key=cv2.contourArea)
        M = cv2.moments(cnt)
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        if right:
            cx += mid  # the right-eye slice starts at x = mid
        cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2)
    except (ValueError, ZeroDivisionError):
        # no contours found, or a zero-area contour; skip this frame
        pass

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_68.dat')

# landmark indices of the left and right eyes in the 68-point model
left = [36, 37, 38, 39, 40, 41]
right = [42, 43, 44, 45, 46, 47]

cap = cv2.VideoCapture(0)
ret, img = cap.read()
thresh = img.copy()

cv2.namedWindow('image')
kernel = np.ones((9, 9), np.uint8)

def nothing(x):
    pass

cv2.createTrackbar('threshold', 'image', 0, 255, nothing)

while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    for rect in rects:
        shape = predictor(gray, rect)
        shape = shape_to_np(shape)
        # build a mask that keeps only the two eye regions
        mask = np.zeros(img.shape[:2], dtype=np.uint8)
        mask = eye_on_mask(mask, left)
        mask = eye_on_mask(mask, right)
        mask = cv2.dilate(mask, kernel, iterations=5)
        eyes = cv2.bitwise_and(img, img, mask=mask)
        # paint everything outside the eyes white so thresholding
        # only ever picks up the dark pupil
        mask = (eyes == [0, 0, 0]).all(axis=2)
        eyes[mask] = [255, 255, 255]
        # x-coordinate halfway between the two inner eye corners
        mid = (shape[42][0] + shape[39][0]) // 2
        eyes_gray = cv2.cvtColor(eyes, cv2.COLOR_BGR2GRAY)
        threshold = cv2.getTrackbarPos('threshold', 'image')
        _, thresh = cv2.threshold(eyes_gray, threshold, 255, cv2.THRESH_BINARY)
        thresh = cv2.erode(thresh, None, iterations=2)   # 1: remove small specks
        thresh = cv2.dilate(thresh, None, iterations=4)  # 2: regrow the pupil blob
        thresh = cv2.medianBlur(thresh, 3)               # 3: smooth ragged edges
        thresh = cv2.bitwise_not(thresh)  # invert so pupils become white blobs
        contouring(thresh[:, 0:mid], mid, img)
        contouring(thresh[:, mid:], mid, img, True)
        # for (x, y) in shape[36:48]:
        #     cv2.circle(img, (x, y), 2, (255, 0, 0), -1)
    # show the image with the face detections + facial landmarks
    cv2.imshow('eyes', img)
    cv2.imshow("image", thresh)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
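To connect this back to the activity's "engagement" goal, the sketch below shows one possible way to turn a pupil centroid into a coarse gaze direction. It is illustrative only: it assumes you modify contouring() to return (cx, cy) instead of just drawing the circle, and gaze_direction with its 0.35/0.65 cutoffs is a hypothetical helper, not part of the tutorial code.
# Hypothetical helper: classify horizontal gaze from the pupil centroid.
# corner_left_x and corner_right_x are the x-coordinates of an eye's corner
# landmarks (e.g., shape[36][0] and shape[39][0] for the left eye).
def gaze_direction(cx, corner_left_x, corner_right_x):
    span = corner_right_x - corner_left_x
    if span <= 0:
        return 'center'  # degenerate corner positions; don't guess
    ratio = (cx - corner_left_x) / span  # 0.0 at one corner, 1.0 at the other
    if ratio < 0.35:
        return 'left'
    if ratio > 0.65:
        return 'right'
    return 'center'
Counting how many consecutive frames the gaze stays on a region of the screen is one simple proxy for engagement.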
Questions
- What kinds of applications can you think of that would benefit from eye tracking?
- How might eye tracking enhance the user experience in applications that might not traditionally incorporate it? In particular, how might eye tracking applications assist disabled persons using software?
- What are the pros and cons of using a threshold-based detection strategy? How might you automatically calibrate such a system, and how might you allow it to adapt to changing conditions over time?
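As one starting point for the last question, the sketch below replaces the manual trackbar with Otsu's method (cv2.THRESH_OTSU), which picks the threshold that best separates the dark pupil pixels from the bright masked background, then smooths that value over time with an exponential moving average so it can adapt to drifting lighting. The function name, alpha, and the smoothing scheme are assumptions for illustration; whether this beats a hand-tuned value depends on how much non-eye area survives the mask.
import cv2

smoothed_t = None  # running estimate of the threshold across frames

def auto_threshold(eyes_gray, alpha=0.1):
    # Otsu ignores the 0 passed as the threshold and computes its own;
    # blend it into a running average so one bad frame can't jump it.
    global smoothed_t
    t, _ = cv2.threshold(eyes_gray, 0, 255,
                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    smoothed_t = t if smoothed_t is None else (1 - alpha) * smoothed_t + alpha * t
    _, binary = cv2.threshold(eyes_gray, smoothed_t, 255, cv2.THRESH_BINARY)
    return binary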