Building an efficient ‘hardware-aware’ object detection model in under 10 mins

Optimize model, compress model, search hyperparameters, pip package — all with just a few clicks.

Dr. Varshita Sher
12 min read · Nov 1, 2022

What are hardware-aware models?

Oftentimes, when data scientists think of an optimized model, they think in terms of one metric or another, such as accuracy, precision, or recall. In other words, they try to squeeze out every last bit of accuracy by tuning the model architecture, applying regularization techniques, preprocessing datasets, or a combination of all of the above.

Experienced data scientists may take it one step further and examine an optimized model in terms of its inference speed by analyzing its computational complexity, often measured in FLOPs (floating-point operations). The lower the FLOP count, the better the expected inference efficiency (i.e. the less time the model takes to run inference on an incoming batch of data). However, merely reducing FLOPs does not guarantee a low-latency neural network, because a previous study has shown that models with similar FLOPs can have different inference speeds.
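To make the FLOPs-versus-latency distinction concrete, here is a minimal sketch in plain Python. It assumes the common convention of counting one multiply and one add per weight (2 FLOPs per multiply-accumulate) and ignores biases; the helper names (`dense_flops`, `conv2d_flops`, `median_latency_ms`) and the layer sizes are made up for illustration and do not come from any particular library.

```python
import time


def dense_flops(in_features: int, out_features: int) -> int:
    """FLOPs for one forward pass of a dense layer on a single input.

    Assumption: 2 FLOPs (multiply + add) per weight, bias ignored.
    """
    return 2 * in_features * out_features


def conv2d_flops(in_ch: int, out_ch: int, kernel: int, out_h: int, out_w: int) -> int:
    """FLOPs for a square-kernel 2D convolution on a single input (bias ignored)."""
    return 2 * in_ch * kernel * kernel * out_ch * out_h * out_w


def median_latency_ms(fn, warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]


# Two hypothetical layers with very different shapes but identical FLOP counts.
wide_dense = dense_flops(in_features=1024, out_features=4096)
pointwise_conv = conv2d_flops(in_ch=64, out_ch=64, kernel=1, out_h=32, out_w=32)
print(wide_dense, pointwise_conv)
```

Both layers come out at the same FLOP count, yet nothing in that number says how each one uses memory or parallel compute units on a given chip. That is why latency should be measured on the target device itself, for example by passing the model's forward call to something like `median_latency_ms`, rather than inferred from FLOPs alone.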

While these are perfectly good lenses through which to view an optimized model, other things are equally (and possibly more) important. Imagine this for a moment: what happens if your super-awesome large language model (LLM) suffers from high inference latency when deployed on an edge device? Or worse, what if it can’t even fit into the memory of the device at deployment time?

This is where hardware-aware models come into the picture!

Hardware-aware models address memory constraints at deployment time. When designing optimal neural networks, they take into account both the network architecture (i.e. the number of trainable parameters) and the characteristics of the target hardware, while guaranteeing inference efficiency.
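As a rough illustration of what "taking the hardware into account" can mean, the sketch below filters a few candidate models by a device's memory and latency budget and then picks the most accurate one that survives. Everything here is hypothetical: the class names, the candidate models, and all the numbers are placeholders, and the memory estimate covers weights only (activations and runtime overhead are ignored). It is not how any specific hardware-aware search tool works internally.

```python
from dataclasses import dataclass
from typing import List, Optional

# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


@dataclass
class Candidate:
    name: str
    num_params: int      # number of trainable parameters
    precision: str       # key into BYTES_PER_PARAM
    latency_ms: float    # assumed to be measured on the target device
    accuracy: float      # validation metric, higher is better


@dataclass
class Device:
    name: str
    memory_bytes: int
    latency_budget_ms: float


def weight_bytes(c: Candidate) -> int:
    """Approximate memory footprint of the weights only."""
    return c.num_params * BYTES_PER_PARAM[c.precision]


def fits(c: Candidate, d: Device) -> bool:
    """True if the candidate meets both the memory and the latency budget."""
    return weight_bytes(c) <= d.memory_bytes and c.latency_ms <= d.latency_budget_ms


def pick_best(candidates: List[Candidate], device: Device) -> Optional[Candidate]:
    """Most accurate candidate that fits the device, or None if none fits."""
    feasible = [c for c in candidates if fits(c, device)]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Placeholder numbers, for illustration only.
candidates = [
    Candidate("large", 50_000_000, "fp32", latency_ms=120.0, accuracy=0.80),
    Candidate("medium", 25_000_000, "fp16", latency_ms=45.0, accuracy=0.76),
    Candidate("small", 5_000_000, "int8", latency_ms=12.0, accuracy=0.70),
]
edge_device = Device("edge-board", memory_bytes=64 * 1024**2, latency_budget_ms=50.0)

best = pick_best(candidates, edge_device)
print(best.name if best else "no candidate fits")
```

With these made-up numbers, the most accurate model is ruled out on both memory and latency, so the choice falls to the next best candidate that actually fits the device. The same accuracy ranking can lead to a different pick on a device with a different budget.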

The need for hardware-aware models is greater now than ever, especially given the massive influx of smart devices equipped with very diverse processors, such as GPUs, VPUs, and various AI accelerators, that have fundamentally different hardware designs. Given that the intent behind training any model is to deploy it in production, it becomes even more important to cater to different…


