The Road to >99% Precision: Overcoming Challenges in Cell Phone Detection Accuracy

Using your cell phone while driving makes accidents nearly four times more likely. Detecting cell phone usage is no longer a 'nice-to-have' feature, but an essential one. Fleets are actively seeking solutions to curb this risky behavior, and RideView's real-time alerts are the answer. Our AI team at LightMetrics has cracked the code with a distraction algorithm boasting over 99% precision. False alerts? Rare as they come. We've tackled challenges like tricky lighting and varied hand positions, all to reach near-perfect precision without sacrificing recall. This is how we are making roads safer, one alert at a time.

Imagine you are cruising along a highway. The sun is shining down nicely, and the far horizon is visible as a shimmer. The powerful car you are driving is hitting its stride on the road. Suddenly, your cell phone rings: it's your office. You can pull over to the side and take the call, or keep driving and take the call. Many people choose the latter to save a few minutes. However, this decision increases the risk of an accident almost fourfold. When you use a cell phone while driving, your attention to the road decreases and your reaction time to unexpected events increases. Talking on a cell phone while driving is a major contributor to road fatalities, as indicated by various reports from NHTSA.

Cell phone usage detection on the RideView platform has become a major feature for most of our partners. Many who considered it a nice-to-have feature a year ago now realize that calling and texting while driving is dangerous and that they have no visibility into it. Fleets are actively looking to reduce the risk from cell phone usage with real-time alerts when drivers use the phone while driving. The cell phone detection in the RideView platform accomplishes precisely that. It alerts the driver in real time to make them aware of the risky behavior. Repeat behavior is escalated to fleet owners and managers, which enables them to initiate coaching sessions with the driver.

The AI team at LightMetrics has raised the bar with a cell phone distraction algorithm that achieves >99% precision. We have ensured that false alerts are very, very rare, which gives all stakeholders, drivers and fleet managers alike, very high confidence in the alerts. The path to over 99% precision has been challenging, particularly in maintaining an exceptionally low false positive rate while also achieving an exceptional true positive rate. Some of the challenges the team faced when developing a solution that works at scale were:

  1. Hands positioned as if they are holding a phone
  2. Challenging in-vehicle lighting that causes false positives
  3. Real-world data at scale that covers the different corner cases
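
For readers less familiar with the two metrics, precision is the share of raised alerts that were real phone usage, and recall is the share of real phone usage that raised an alert. The sketch below shows the arithmetic only; the function name and the counts are hypothetical and are not LightMetrics figures.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from alert counts.

    tp: alerts raised on real phone usage
    fp: alerts raised when no phone was in use (false alerts)
    fn: real phone usage that raised no alert (missed events)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=995, fp=5, fn=50)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Driving false alerts down raises precision, but an overly cautious model misses real events and loses recall, which is why both numbers have to be tracked together.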

Here is a brief account of how we achieved near-perfect precision without sacrificing recall:

Hand shapes that look very similar to calling cases

Look at the following set of photos. At first glance, it looks as if the person is talking on the phone. In reality, they are just keeping their hands in positions that look very similar to calling, without holding any phone.

The true calling cases, with a phone in hand, are below:

As can be observed, the object in the hand is usually black and at times barely visible because it blends into the background. Left to itself, the neural network model picks up the hand and finger positions as the dominant feature, and in many cases it raises cell phone alerts even when there is no object in the hand. Teaching the neural network to focus on the object in the hand became a major focus when designing our training experiments.

Finally, an innovative variation of the popular contrastive loss techniques helped the neural network give less weight to the hand features and more weight to the cell phone object in the hand.
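
The exact variation we used is not described here. As background, the following is a minimal sketch of the classic pairwise contrastive loss, not of our variant. The function names, the two-dimensional embeddings, and the margin value are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Classic pairwise contrastive loss.

    Same-class pairs are pulled together (loss grows with distance).
    Different-class pairs are pushed apart until they are at least
    `margin` away from each other, after which the loss is zero.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical embeddings: a phone call and an empty hand held near the ear.
phone_call = [0.9, 0.1]
empty_hand_near = [0.8, 0.2]   # still embedded close to the phone call
empty_hand_far = [-0.5, 0.1]   # already separated by more than the margin

print(contrastive_loss(phone_call, empty_hand_near, same_class=False))
print(contrastive_loss(phone_call, empty_hand_far, same_class=False))
```

Pairing look-alike hand poses with true calling frames as "different" penalizes the network whenever it embeds them close together, so hand shape alone stops being enough to separate the classes.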

Different lighting conditions that result in false positives

The neural network model is trained on a fixed set of training data, but in the field it can encounter very different data because of reflections or unfamiliar light sources. This can lead the model to make wrong predictions. Some examples can be seen below.

We addressed this by quantifying how far out of distribution an input is, and therefore how likely the prediction is to be wrong. Leveraging research on Bayesian and other uncertainty quantification techniques, the team was able to accurately flag the cases where the neural net was likely to go wrong, without affecting the normal cases.
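
One common way to estimate uncertainty is to run several stochastic forward passes on the same frame (for example, with dropout left on) and measure how much the predictions disagree. The sketch below illustrates that general idea and is not a description of our production method. The function name, the thresholds, and the probabilities are hypothetical.

```python
import statistics


def should_alert(pass_probs, prob_threshold=0.9, max_std=0.05):
    """Decide whether to raise a cell phone alert for one frame.

    pass_probs: "phone in use" probabilities from repeated stochastic
    forward passes on the same frame (assumed to be given).
    The alert fires only if the mean probability is high AND the passes
    agree with each other; large disagreement is treated as a sign that
    the input may be out of distribution.
    """
    mean_p = statistics.fmean(pass_probs)
    disagreement = statistics.pstdev(pass_probs)
    return mean_p >= prob_threshold and disagreement <= max_std


# Hypothetical probabilities, for illustration only.
normal_frame = [0.97, 0.96, 0.98, 0.97]        # passes agree
glare_frame = [0.99, 0.72, 0.98, 0.95, 0.99]   # passes disagree

print(should_alert(normal_frame))  # True
print(should_alert(glare_frame))   # False
```

In this example both frames have a mean probability above the threshold, so a plain probability cut-off would alert on both. The disagreement check suppresses only the unstable one, leaving normal cases unaffected.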

Getting enough data that covers all kinds of cases

When the solution is used by hundreds of thousands of users, the variation in camera mounting, driver appearance, and so on is huge. It is important that the model generalizes across all such variations and performs optimally. When the LightMetrics team started work on this project, we relied on an external dataset for the task, but it became clear that the domain shift between that data and the data captured by our cameras was significant. Trying out different domain adaptation techniques helped us improve the performance of the models, but the real improvement came when we moved model training away from the external dataset to a training set comprising data from our cameras only. Initially, we thought this would hurt the generalisability of the model, but making the model focus on large variations within the same camera data helped push the precision and recall boundaries beyond their earlier limits. Having this large corpus of data amplified all the previous techniques relating to architecture, loss function, and uncertainty estimation.

In conclusion, detecting small objects such as cell phones with exceptional accuracy is not an easy task. For edge AI on affordable hardware, efficient inference is key, and that makes this already challenging task even harder. Training neural networks in a way that focuses only on the absolute essentials is crucial for success in computationally constrained applications.