

Joint ICTP-IAEA School on FPGA-based SoC and its application to nuclear and scientific Instrumentation Workshop

# Computer Vision with SoC

Sawal Hamid Md Ali

Universiti Kebangsaan Malaysia

Multidisciplinary Lab, ICTP

21 November 2022



- Image acquisition with FPGA/MCU
- Image recognition on mobile phone
- Vision processing on FPGA/SoC with MATLAB
- Computer vision for Advanced Driver Assistance Systems (ADAS)

# Image acquisition using FPGA

- Automatic Ingestion Monitoring System
  - Chewing Detection
  - Food image acquisition
  - Calorie consumption estimation
- Chewing Detection
  - Piezoresistive sensor (jaw movement sensor)
  - Chewing signal processing
- Food image acquisition
  - Capture and store image on SD Card for post-analysis
- Calorie consumption estimation
  - Images sent to nutritionist for calorie validation

Automatic Ingestion Monitoring – University of Alabama



Source: Edward Sazonov, University of Alabama

# Food Image Acquisition

- Image sensor attached to a wearable device
- System continuously captures images at a fixed interval and buffers them
- Buffered images are written to the SD card once food intake is detected
- System must be small and low power
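The buffering requirement above can be sketched as a fixed-size ring buffer that keeps only the most recent frames and writes them out when intake is detected. This is a minimal illustration, not the firmware used in the project: the class name `ImageBuffer`, the capacity of 3, and the `saved` list standing in for the SD card are all assumptions.

```python
from collections import deque


class ImageBuffer:
    """Keep the most recent frames in memory; flush them to storage on demand."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest frame automatically when full.
        self._frames = deque(maxlen=capacity)

    def capture(self, frame):
        """Add one newly captured frame to the buffer."""
        self._frames.append(frame)

    def flush(self, store):
        """Pass every buffered frame to `store`, oldest first, and empty the buffer."""
        count = 0
        while self._frames:
            store(self._frames.popleft())
            count += 1
        return count


saved = []                         # hypothetical stand-in for the SD card
buf = ImageBuffer(capacity=3)      # hypothetical buffer depth
for i in range(5):                 # five captures; only the last three are kept
    buf.capture(f"frame{i}")
written = buf.flush(saved.append)  # called once food intake is detected
```

Bounding the buffer keeps memory use fixed regardless of how long the device waits between intake events.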

# Platform selection based on benchmarking

- Several platforms were compared
  - High-performance ARM MCU with image acquisition capability
  - Low-performance ARM MCU
  - ARM + FPGA
- Low-power modes (Sleep, Standby, etc.) are used for power optimization
  - ARM MCUs offer a variety of low-power modes
- Power consumption was measured for each peripheral





# Current consumption benchmarking

### STM32F407 (Cortex-M4)

STM32F407 running at 168 MHz. Current consumption per operating mode:

| Operation                                               | Standby              | Run mode | Sleep              |
|---------------------------------------------------------|----------------------|----------|--------------------|
| MCU only (all peripherals off), measured                | 3 µA                 | 40 mA    | 17 mA              |
| MCU + DCMI, measured                                    | 3 µA                 | 41 mA    | 18 mA              |
| MCU + SDIO, measured                                    | 3 µA                 | 41 mA    | 18 mA              |
| MCU + SD card (write), measured                         | N/A                  | 42 mA    | N/A                |
| MCU (DCMI, SDIO) + OV9650 camera + SD card, measured    | 0.54 mA (no capture) | 70 mA    | 22 mA (no capture) |
| MCU + ADC + DMA (no camera, no SD card), measured       | 3 µA                 | 45 mA    | 28 mA              |
| OV9650 camera (datasheet)                               | 16 µA                | 27 mA    | N/A                |

### STM32L053 (Cortex-M0+)

Current consumption per operating mode:

| Operation                                   | | LP Run mode (clock: 32 kHz) | LP Sleep (clock: 32 kHz) |
|---------------------------------------------|---------------------|---------|--------|
| MCU only (all peripherals off), datasheet   | 7 mA (218 µA/MHz)   | 22 µA   | 4.7 µA |
| MCU only (all peripherals off), measured    | N/A                 | 20.8 µA | 3.7 µA |
| MCU only (all peripherals on), datasheet    | 9 mA (280 µA/MHz)   | 28 µA   | 6 µA   |
| MCU + ADC, datasheet                        | 446 µA (clock: 2 MHz) | 22.2 µA | 4.8 µA |
| MCU + ADC, measured                         | 500 µA (clock: 2 MHz) | N/A     | N/A    |
| MCU + DMA, datasheet                        | 7.3 mA              | 22.3 µA | 4.9 µA |
| MCU + USART, datasheet                      | 7.4 mA              | 22.4 µA | 5 µA   |
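The per-MHz figures in the first value column scale linearly with clock frequency. As a rough check, and assuming a 32 MHz system clock (an assumption; the table does not state the clock for that column), the 218 and 280 per-MHz figures reproduce the quoted 7 mA and 9 mA. The function name is illustrative.

```python
def run_current_ma(ua_per_mhz, clock_mhz):
    """Run-mode current in mA from a per-MHz figure given in uA/MHz."""
    return ua_per_mhz * clock_mhz / 1000.0


CLOCK_MHZ = 32  # assumption: not stated in the table

periph_off_ma = run_current_ma(218, CLOCK_MHZ)  # all peripherals off
periph_on_ma = run_current_ma(280, CLOCK_MHZ)   # all peripherals on
```

Under that assumption the results round to the 7 mA and 9 mA entries in the table.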

## Image acquisition – MCU + FPGA



# Image acquisition approach



- The FPGA is kept in freeze mode
- The MCU wakes it every 5 s, because single-shot mode is used
- Every 5 s an image is written to the SRAM, which holds a queue of images
- While writing to SRAM, the FPGA is active for only about 1 ms out of every 5 s
- Images are transferred from SRAM to the SD card only when the MCU initiates the transfer; this can happen outside the food intake episode
- The FPGA is active again for about 2 s while writing to the SD card
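The benefit of waking the FPGA for about 1 ms in every 5 s can be estimated with a duty-cycle average. The sketch below uses the 1 ms and 5 s timings from the list above; the active and idle currents are hypothetical placeholders, since the slides do not give FPGA current figures.

```python
def average_current_ma(active_ma, idle_ma, active_s, period_s):
    """Time-weighted average current for a periodically woken device."""
    duty = active_s / period_s
    return duty * active_ma + (1.0 - duty) * idle_ma


ACTIVE_MA = 50.0  # hypothetical FPGA current while capturing
IDLE_MA = 0.1     # hypothetical FPGA current in freeze mode

duty_cycle = 0.001 / 5.0  # 1 ms active in every 5 s, i.e. 0.02 % of the time
avg_ma = average_current_ma(ACTIVE_MA, IDLE_MA, active_s=0.001, period_s=5.0)
```

With such a small duty cycle the average is dominated by the idle current, which is why the freeze-mode consumption matters more than the active consumption.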

## Experimental setup





Automatic food intake system – new prototype

- Contactless chewing detection
- Automatic food volume and calorie estimation
- Automatic food recognition (deep learning)

# Vision processing on FPGA/SoC with MATLAB









© 2020 MathWorks




#### Iterate and Converge on Deep Learning FPGA Deployment from MATLAB





Figure: HDL Workflow Advisor window for LaneDetectionHardware/HDLLaneDetector, at step 1.1 "Set Target Device and Synthesis Tool". The workflow steps listed are 1. Set Target (1.1 Set Target Device and Synthesis Tool, 1.2 Set Target Interface), 2. Prepare Model For HDL Code Generation, and 3. HDL Code Generation. Input parameters shown:

- Target platform: Generic Xilinx Platform
- Synthesis tool: Xilinx Vivado
- Family: Zynq
- Package: fbg576
- Project folder: hdl_prj

Figure: block diagram of the deployed design. Camera input passes through an external memory buffer (VDMA) and an AXI4-Stream video module into the generated HDL IP core (FPGA lane detection), which exchanges lane data with software over AXI4-Lite.

# Computer vision for Advanced Driver Assistance Systems (ADAS)


# Object Detection

- Object detection models determine which objects are present in an image
- They draw bounding boxes around the detected objects
- Classification identifies the class of the object inside each bounding box
- Example models include R-CNN, Fast R-CNN, YOLO, and SSD
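A detector's output can be pictured as a list of records, each holding a class label, a confidence score, and a bounding box. The sketch below is a minimal illustration of that structure and of discarding low-confidence results; the `Detection` type, the 0.5 threshold, and the sample values are made up for the example and do not come from any particular model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # predicted class of the object
    score: float  # confidence in the range 0..1
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


# Illustrative values only, not measured results.
raw = [
    Detection("car", 0.99, (40, 60, 200, 180)),
    Detection("person", 0.30, (210, 50, 250, 170)),
]
kept = keep_confident(raw)
```

Only the boxes that survive the threshold would be drawn on the image.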





# YOLO models (You Only Look Once)

- The input image is split into a grid of cells
- A grid cell is responsible for predicting an object if the centre of that object falls inside the cell
- Each grid cell predicts bounding boxes, confidence scores for those boxes, and conditional class probabilities
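The grid assignment described above amounts to scaling the object's centre to grid coordinates and truncating. The sketch below assumes a 7 x 7 grid over a 448 x 448 input, values chosen only for illustration since the slides do not specify them; the function name is also illustrative.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell that contains the object centre."""
    # Clamp so a centre lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col


# Object centred at pixel (300, 100) in a 448 x 448 image (assumed size).
cell = responsible_cell(cx=300, cy=100, img_w=448, img_h=448)
```

Here the centre falls in row 1, column 4 of the grid, so that cell is the one expected to predict the object's box.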

Solution and advantages:

- Simpler network structure
- Fast enough for real-time use: can process streaming video with low latency
- Maintains reasonable accuracy

### Hardware Setup



AI Accelerator (NCS2)

Raspberry Pi Board with PiCam (edge device)

# Compilation Workflow





# Benchmark Analysis and Inference Performance

| Device                     | Price   | FPS  |
|----------------------------|---------|------|
| Intel Core i5-9300H CPU    | High    | 27.2 |
| Intel UHD Graphics 630     | High    | 46.6 |
| Dell laptop (edge) + NCS2  | Average | 20.7 |
| RPi 3B + NCS2              | Low     | 3.2  |

| Device                  | Latency (ms) | Confidence percentage (class car) |
|-------------------------|--------------|-----------------------------------|
| Intel Core i5-9300H CPU | 36.7         | 99.4                              |
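The CPU figures above are consistent with each other if frames are processed one at a time, in which case throughput is the reciprocal of per-frame latency. A small check, with an illustrative function name:

```python
def fps_from_latency(latency_ms):
    """Frames per second when each frame takes `latency_ms` and frames run back to back."""
    return 1000.0 / latency_ms


cpu_fps = fps_from_latency(36.7)  # latency reported for the Intel Core i5-9300H CPU
```

Rounded to one decimal place this gives 27.2, matching the FPS reported for the same CPU. The relationship would not hold for a pipeline that processes several frames in parallel.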

# Thank You

Sawal Hamid Md Ali, Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia

sawal@ukm.edu.my, +6012-2592475

