14. Juli 2023
In the Smart Media Technology course of my media technology studies, I was significantly involved in the development of an interactive “rock, paper, scissors” game. Our group’s goal was to use a convolutional neural network (CNN) to analyze users’ webcam images and recognize their hand gesture (rock, paper, or scissors) so they could play against the computer.
The entire implementation was done in Python on the Google Colab platform. We created our own dataset with about 200 images per hand gesture to train the CNN, fine-tuning a pre-trained ResNet50 model to suit our requirements. We used PyTorch as the framework, while Ray Tune supported hyperparameter tuning during training and TensorBoard visualized the results.
Our game can be played in a specially developed Jupyter notebook on Google Colab. The notebook loads the latest model from a Git repository and performs the image recognition. To improve gesture recognition, the webcam image is first cropped to the detected hand using MediaPipe before the actual prediction by our CNN. The user interface was built with IPython widgets.
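The cropping step can be illustrated with a small sketch. MediaPipe Hands returns hand landmarks whose x/y coordinates are normalized to the image width and height; the helper below (a hypothetical `crop_to_hand`, not the project’s actual code) turns such landmarks into a padded bounding-box crop with NumPy:

```python
import numpy as np

def crop_to_hand(frame, landmarks, margin=0.1):
    """Crop a frame to the bounding box of normalized hand landmarks.

    `landmarks` is a sequence of (x, y) pairs in [0, 1], the coordinate
    format MediaPipe Hands produces; `margin` pads the box by a fraction
    of the image size so fingertips are not clipped.
    """
    h, w = frame.shape[:2]
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    x0 = max(0, int((min(xs) - margin) * w))
    x1 = min(w, int((max(xs) + margin) * w))
    y0 = max(0, int((min(ys) - margin) * h))
    y1 = min(h, int((max(ys) + margin) * h))
    return frame[y0:y1, x0:x1]

# Example: a dummy 480x640 webcam frame with a "hand" near the centre.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
hand = [(0.25, 0.25), (0.5, 0.5)]
crop = crop_to_hand(frame, hand, margin=0.125)
```

Feeding the CNN only this crop instead of the full frame keeps the hand at a roughly constant scale and removes background clutter, which is why the extra detection step pays off.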
The project gave me practical experience in machine learning and artificial intelligence for interactive applications. It provided interesting insights into the challenges and possibilities of image recognition.