Monocular Human Position Estimator
Applications of computer vision are being deployed rapidly across diverse fields, from simple object detection to advanced self-driving vehicles. Computer vision is a field of artificial intelligence that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take actions or make recommendations based on that information. Acme Robotics is looking to leverage computer vision by integrating a perception software module into its existing robotic hardware system to detect and track humans around a robot.
The scope of this project is to develop software that detects and tracks one or more humans around a robot using a monocular camera. Detection is accomplished with the pre-trained HOG (Histogram of Oriented Gradients) descriptor provided by OpenCV, together with OpenCV's SVM (Support Vector Machine) based people detector, which locates humans in the camera frame. The detector outputs the parameters of bounding boxes around the detected humans, and these parameters are used to determine the 2D position (x, y) of each human with respect to the camera. Using appropriate camera parameters and the known bounding-box size of an average human standing at a given distance from the camera, the depth (the distance between the camera and each human) is calculated by the method of equivalent ratios. The resulting 3D positions are transformed into the robot's frame, using an appropriate transformation matrix between the robot and camera frames, to determine the positions of the humans with respect to the robot. These 3D positions are then fed into the tracker part of the software, which assigns a unique identity to each detected human.
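As an illustration of this pipeline, the following minimal sketch uses OpenCV's HOGDescriptor with its default people-detecting SVM. The calibration constants (fx, fy, cx, cy, D_REF, BOX_REF) and the camera-to-robot transform are placeholder values chosen for illustration, not the project's actual parameters.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/objdetect.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>
#include <vector>

int main() {
  // Pre-trained HOG descriptor with OpenCV's default people-detecting SVM.
  cv::HOGDescriptor hog;
  hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

  // Placeholder intrinsics (pixels) and equivalent-ratio reference values:
  // an average human standing D_REF metres away appears BOX_REF pixels tall.
  const double fx = 600.0, fy = 600.0, cx = 320.0, cy = 240.0;
  const double D_REF = 2.0, BOX_REF = 400.0;

  // Placeholder camera-to-robot transform; in the real system this comes
  // from the known mounting pose of the camera on the robot.
  cv::Matx44d camToRobot = cv::Matx44d::eye();

  cv::VideoCapture cap(0);  // monocular camera
  cv::Mat frame;
  while (cap.read(frame)) {
    std::vector<cv::Rect> boxes;
    hog.detectMultiScale(frame, boxes);
    for (const cv::Rect& box : boxes) {
      // Equivalent ratio: depth is inversely proportional to box height.
      const double z = D_REF * BOX_REF / box.height;
      // Back-project the box centre into the camera frame.
      const double u = box.x + box.width / 2.0;
      const double v = box.y + box.height / 2.0;
      const cv::Matx41d pCam((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0);
      const cv::Matx41d pRobot = camToRobot * pCam;
      std::cout << "Human at (" << pRobot(0) << ", " << pRobot(1) << ", "
                << pRobot(2) << ") m in the robot frame\n";
    }
  }
  return 0;
}
```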
Detailed developer-level documentation and installation steps have been created so that new developers can contribute to and use the project with ease. Correctness is ensured by testing every class against multiple unit tests written with the gtest suite. The latest build status and code coverage are reported through continuous integration with the help of Travis CI and Coveralls.
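As a sketch of the testing approach, a unit test for a hypothetical DepthEstimator class might look like the following; the class name, constructor parameters, and estimate method are illustrative and not the project's actual API.

```cpp
#include <gtest/gtest.h>
#include "depth_estimator.hpp"  // hypothetical header, for illustration only

// With the equivalent-ratio method, the estimated depth should halve
// when the bounding-box height doubles.
TEST(DepthEstimatorTest, DepthInverselyProportionalToBoxHeight) {
  // Reference calibration: a human 2.0 m away appears 400 px tall.
  DepthEstimator estimator(2.0, 400.0);
  EXPECT_DOUBLE_EQ(2.0, estimator.estimate(400.0));
  EXPECT_DOUBLE_EQ(1.0, estimator.estimate(800.0));
}
```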
The product backlog can be found in a Google Sheet
Sprint planning notes can be found in Google Docs
Doxygen-generated documentation can be found here
Run the following command in the repository's root directory to generate new documentation
This generates an index.html page in the build/coverage sub-directory that can be viewed locally in a web browser.
To run real-time tracking using the camera, run the following command
To run tracking on a video, run the following command, providing the file path to an existing video file
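Both modes map naturally onto OpenCV's cv::VideoCapture, which accepts either a device index (live camera) or a file path. The following sketch shows how such an entry point might dispatch between the two; the argument handling is illustrative, not the project's actual command-line interface.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <string>

int main(int argc, char** argv) {
  cv::VideoCapture cap;
  if (argc < 2) {
    cap.open(0);                     // no argument: open the default camera
  } else {
    cap.open(std::string(argv[1]));  // argument: path to a video file
  }
  if (!cap.isOpened()) return 1;

  cv::Mat frame;
  while (cap.read(frame)) {
    // ... run detection and tracking on each frame ...
  }
  return 0;
}
```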
Existing cppcheck results can be found in /results/cppcheck_output.txt
Existing cpplint results can be found in /results/cpplint_output.txt
Checked for memory leaks using Valgrind; no memory leaks were found. The output can be found here
Performed tests on 15 different images: 10 were of humans and the remaining 5 were of other objects. The detector achieved 80% accuracy (12 of 15 correct), with 2 false negatives and 1 false positive. The full results can be found in the csv file