Chapter 4

Hardware selection

Hardware was selected according to the given requirements so that the control-tower prototype would operate as reliably as possible. A further consideration was that the control tower should be easy to reconfigure for different vessel types. The requirements set for the system were:

A similar system is intended to be applied to different vessel types in the future, so the components were chosen from among widely used and readily available devices.

Control unit / onboard computer

The control unit processes the data from the cameras connected to it and forwards the results to the autopilot. The onboard computer is located in the vessel's control tower, surrounded by three cameras. Analysis algorithms are applied to the camera data (in this context, real-time video). Their output, namely the detected objects and their relative positions in the frame, is passed to the autopilot as additional information about the surrounding environment. The autopilot then uses this information to decide how the vessel should behave.
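The text does not specify how a position in the frame is turned into a direction around the vessel. The following is a minimal sketch of one way to do it. It converts the horizontal pixel coordinate of a detection into a bearing relative to the bow. It assumes an undistorted pinhole camera with a known horizontal field of view and a known mounting heading. Both of these are hypothetical parameters, not values from this project.

```python
import math


def pixel_to_bearing(x_px: float, image_width: int, hfov_deg: float,
                     camera_heading_deg: float) -> float:
    """Bearing (degrees, 0-360, clockwise from the bow) of a pixel column.

    Assumes an undistorted pinhole camera whose optical axis points at
    camera_heading_deg relative to the bow (hypothetical mounting data).
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Horizontal offset of the pixel from the image centre (positive = right).
    dx = x_px - image_width / 2
    # Angle between the optical axis and the ray through that pixel column.
    offset_deg = math.degrees(math.atan2(dx, f_px))
    # Add the camera's mounting heading and wrap into [0, 360).
    return (camera_heading_deg + offset_deg) % 360.0
```

For a detection, x would typically be the bounding-box centre, (x_min + x_max) / 2. With three cameras mounted at, for example, 0°, 120° and 240°, each detection then maps to a bearing around the vessel. A fisheye lens does not follow the pinhole model, so its image would have to be undistorted before this conversion applies.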

Several single-board computers (Single-Board Computer, SBC [19]) and systems on module (System on Module, SoM) are suitable for this task. The most popular are the Nvidia Jetson series [20], which is specialised for AI, computer-vision, and machine-learning applications, and the Raspberry Pi [21]. The Raspberry Pi is ruled out here because the control-tower system involves working with four cameras in parallel, whereas the Raspberry Pi has only one camera input. The task also requires substantial computing power, and the Raspberry Pi lacks both the necessary throughput and the appropriate hardware. Its system-on-chip does include an integrated graphics processor, but that processor is too weak for the task at hand. Parallel workloads benefit greatly from a powerful graphics processor (Graphics Processing Unit, GPU [22]) with many specialised CUDA cores [23], and this is one of the main advantages of the Nvidia Jetson computers.

The Jetson family contains four devices: Nano, TX2, Xavier NX, and AGX Xavier, of which the AGX Xavier is the most capable (see Table 1). The Nano is the most basic version, suitable for the simplest computer-vision applications. The TX2 offers somewhat higher CPU and GPU performance. The Xavier NX and AGX Xavier are aimed at industrial use cases: they have six- and eight-core CPUs and Volta-architecture GPUs that also include matrix-operation accelerators (Tensor Cores) [24] (see Figure 5).

Table 1. Parameters of the Jetson-series computers [24]

| Parameter | Jetson Nano | Jetson TX2 Series (NX, TX2i) | Jetson Xavier Series (NX, AGX) |
|---|---|---|---|
| AI performance | 472 GFLOPS | 1.33 TFLOPS | 21/30 TOPS |
| GPU | 128-core NVIDIA Maxwell™ | 256-core NVIDIA Pascal™ | 384/512-core NVIDIA Volta™ with 48/64 Tensor Cores |
| CPU | Quad-core Arm Cortex-A57 | Dual-core NVIDIA Denver and quad-core Arm Cortex-A57 | 6/8-core NVIDIA Carmel Arm |
| DL accelerator | None | None | 2x NVDLA v1 |
| Vision accelerator | None | None | 2x PVA v1 |
| Memory | 4 GB 64-bit LPDDR4 | 4/8 GB 128-bit LPDDR4 | 8/16/32/64 GB 128/256-bit LPDDR4x |
| Storage | 16 GB | 16/32 GB | 16/32/64 GB |
| CSI cameras | Up to 4 (up to 18 Gbps) | Up to 5/6 (up to 30 Gbps) | Up to 6 (up to 62 Gbps) |

Because data from three cameras must be analysed in parallel and with high accuracy, a Xavier-series system is the most sensible choice. Compared with the other series, the Xavier-series computers derive their performance from three features. They have a larger number of GPU cores. They use a newer architecture (Volta [25]) with matrix accelerators (Tensor Cores, which speed up the matrix computations at the heart of neural networks). They also include Deep Learning Accelerators, which allow the workload from multiple camera inputs to be processed more efficiently.

Both NX and AGX Jetsons are available in this series, and the strongest candidates are compared in Table 2. The most suitable of these is the Jetson AGX Xavier. The device must run a large neural network and handle several operations at once: reading data from the cameras, processing it in the neural network, and presenting the results in the output video. This requires substantial computational resources.
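A rough calculation shows why camera bandwidth does not decide between the two modules. It assumes three 1920×1080 cameras at 30 FPS with 10-bit raw output. These are illustrative values, not the project's measured configuration, and protocol overhead is ignored. Under these assumptions the cameras would use only a small fraction of either module's CSI bandwidth from Table 2. This supports basing the choice on compute resources:

```python
def raw_stream_gbps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed sensor data rate in Gbit/s (ignores blanking and protocol overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


# Illustrative load (assumed): three 1920x1080 cameras, 30 FPS, 10-bit raw output.
per_camera = raw_stream_gbps(1920, 1080, 30, 10)
total = 3 * per_camera

# Aggregate CSI bandwidth limits as listed in Table 2.
csi_limit_gbps = {"Jetson AGX Xavier": 62.0, "Jetson Xavier NX 16GB": 30.0}

for module, limit in csi_limit_gbps.items():
    share = 100 * total / limit
    print(f"{module}: {total:.2f} of {limit:.0f} Gbps ({share:.1f} % used)")
```

The three raw streams total about 1.87 Gbps. That is roughly 3 % of the AGX Xavier's 62 Gbps and about 6 % of the Xavier NX's 30 Gbps. The deciding factor is therefore the neural-network workload rather than camera input.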

Table 2. Parameters of the Jetson AGX Xavier and the Xavier NX [24]

| Parameter | Jetson AGX Xavier | Jetson Xavier NX 16GB |
|---|---|---|
| AI performance | 32 TOPS | 21 TOPS |
| GPU | 512-core NVIDIA Volta™ GPU with 64 Tensor Cores | 384-core NVIDIA Volta™ GPU with 48 Tensor Cores |
| CPU | 8-core NVIDIA Carmel Arm®v8.2 64-bit CPU, 8 MB L2 + 4 MB L3 | 6-core NVIDIA Carmel Arm®v8.2 64-bit CPU, 6 MB L2 + 4 MB L3 |
| DL accelerator | 2x NVDLA v1 | 2x NVDLA v1 |
| Vision accelerator | 2x PVA v1 | 2x PVA v1 |
| Memory | 32 GB 256-bit LPDDR4x, 136.5 GB/s | 16 GB 128-bit LPDDR4x, 59.7 GB/s |
| Storage | 32 GB eMMC 5.1 | 16 GB eMMC 5.1 |
| CSI cameras | Up to 6 (up to 62 Gbps) | Up to 6 (up to 30 Gbps) |

Figure 5. The chosen computer: the Jetson AGX Xavier (right, without the heat-sink module) [26]

Tensor Cores and CUDA

Tensor Cores are programmable matrix-operation accelerators that work in parallel with the CUDA cores [23]. They implement newer mixed-precision floating-point and integer matrix operations: HMMA (Half-Precision Matrix Multiply and Accumulate) and IMMA (Integer Matrix Multiply and Accumulate). These operations accelerate linear algebra, signal processing, and deep-learning inference [27] (see Figure 6).

Figure 6. A Tensor Core performing a 4×4×4 matrix multiply-and-accumulate operation, D = A×B + C, on 4×4 matrices [27]
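The operation in Figure 6 can be written as D = A×B + C. Here A and B are 4×4 half-precision (FP16) matrices, and C and D are 4×4 accumulators (FP16 or FP32 on Volta). The pure-Python sketch below emulates the arithmetic of one such 4×4×4 step, which amounts to 64 multiply-adds. It illustrates the numerics only and says nothing about how the hardware schedules the work. The FP16 rounding uses the half-precision format `e` of Python's standard `struct` module:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mma_4x4x4(a, b, c):
    """Emulate one mixed-precision multiply-accumulate D = A x B + C on 4x4 matrices.

    A and B are rounded to FP16 (the Tensor Core input format). Products are
    accumulated on top of C in a wider format. Python floats are FP64, so this
    overstates the precision of a real FP32 accumulator.
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]
            for k in range(n):  # 4 x 4 x 4 = 64 multiply-adds in total
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d
```

For example, multiplying the identity matrix by a matrix B with C = 0 returns B rounded to FP16. The value 0.1, for instance, becomes 0.0999755859375. This shows the precision loss that mixed-precision training and inference have to tolerate.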

CUDA is a parallel-computing platform and programming model that can speed up computations many-fold by offloading parallelisable work from the CPU to the GPU [23].

CUDA is designed to accelerate parallel computation, and a CUDA core is loosely analogous to a CPU core. The difference lies in their design: a CPU core can solve complex computations sequentially, while a CUDA core handles simpler operations on the GPU. The GPU's advantage over the CPU is that hundreds or thousands of CUDA cores are placed on a single chip (512 on the AGX Xavier, see Table 2). A large computation can therefore be split into many simple operations that run simultaneously. As a result, it finishes faster than on a CPU with fewer but more capable cores. Nvidia developed CUDA, and it is widely used in the machine-learning field.
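CUDA's thread-indexing scheme shows how it divides a computation among many cores. A kernel is launched over a grid of blocks, and each block contains a fixed number of threads. Each thread computes its global index (in CUDA C, `blockIdx.x * blockDim.x + threadIdx.x`) and processes one element. The sketch below emulates a one-dimensional launch sequentially in Python. The function names are hypothetical, and nothing here runs on a GPU:

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body of a CUDA-style kernel: one logical thread computes one element."""
    i = block_idx * block_dim + thread_idx  # global thread index, as in CUDA C
    if i < len(out):                        # guard: the last block may overhang the data
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D kernel launch with grid_dim blocks of block_dim threads.

    On a GPU these calls would run concurrently across CUDA cores; here they
    run one after another, purely to show how the work is divided.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) = 4 blocks
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

The bounds check is needed because the launch creates 4 × 256 = 1024 threads for 1000 elements, so the last 24 threads do nothing. On the GPU, all of these threads would be distributed across the CUDA cores and executed in parallel.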

Cameras

Several parameters must be considered when selecting cameras to ensure that the system meets the requirements and operates successfully. In this task the computer-vision system is housed in the control tower, which is mounted on the vessel's mast. The cameras (or the housing in which they are placed) must therefore operate across wide temperature ranges and in changing weather conditions. The camera software should also be recent and clearly documented, and its compatibility with the host computer must be checked. This simplifies camera configuration and makes the deployed computer-vision software easier to manage.

The following parameters were considered when selecting cameras:

At higher resolution, detection accuracy improves but inference slows down, because the amount of input data grows. A high-resolution image contains more information, which in turn allows the neural network to make more confident predictions.

A polychromatic (colour) sensor, on the other hand, suits environments where the contrast between background and object is low, or where object detection requires more information. In this mode each pixel is represented by three values, the RGB (Red, Green, Blue) components. Each component lies in the range 0 to 255, or 0.0 to 1.0 when normalised; for example, [255, 0, 255]. A colour image therefore carries three times as much data as a monochromatic image, which uses a single value per pixel.
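The data-volume argument for resolution and colour can be made concrete by counting the bytes in one uncompressed frame. The sketch assumes 8 bits per channel, which is common but not universal sensor output. Normalised 0.0 to 1.0 values stored as 32-bit floats would correspond to `bytes_per_channel=4`:

```python
def frame_bytes(width: int, height: int, channels: int, bytes_per_channel: int = 1) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


w, h = 1920, 1080                # example resolution (SurveilsQUAD row of Table 3)
mono = frame_bytes(w, h, 1)      # one 8-bit value per pixel
rgb = frame_bytes(w, h, 3)       # three 8-bit values (R, G, B) per pixel
print(mono, rgb, rgb / mono)     # 2073600 6220800 3.0

# Raising the resolution scales the data in the same way.
hi_res = frame_bytes(4056, 3040, 3)
print(hi_res / rgb)              # almost six times the 1080p colour frame
```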

Devices that would meet the parameters mentioned above are listed below (see Table 3):

Table 3. Cameras matching the parameters

| Camera name | Resolution (px) | Frame rate (FPS) | Operating temp. (°C) | Interface |
|---|---|---|---|---|
| SurveilsQUAD – Sony IMX290 System [28] | 1920×1080 | 120 | −30 to 85 | CSI |
| OpenCV AI Kit: OAK-D [29] | 1280×800 (stereo), 4056×3040 (centre) | 120 (stereo), 60 (centre) | N/A | USB-C/PoE |
| OpenCV AI Kit: OAK-1-PoE [30] | 4056×3040 | 60 | N/A | USB-C/PoE |
| Atlas IP67 7.1 MP Model [31] | 3208×2200 | 74 | −20 to 55 | PoE |
| Atlas IP67 2.8 MP Model [32] | 1936×1464 | 173 | −20 to 55 | PoE |
| Arducam 12MP IMX477 [33] | 4056×3040 | 60 | N/A | USB-C |
| Arducam Fisheye Camera [34] | 2592×1944 | 30 | N/A | CSI and Ethernet |

N/A: data not available.
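The temperature criterion can be applied to Table 3 mechanically. The sketch below assumes a hypothetical requirement of −20 °C to +50 °C at the mast; the text does not state a figure. Entries marked N/A are treated as unconfirmed:

```python
from typing import NamedTuple, Optional


class Camera(NamedTuple):
    name: str
    min_temp_c: Optional[float]  # None where Table 3 lists N/A
    max_temp_c: Optional[float]
    interface: str


# Rows transcribed from Table 3 (operating temperature and interface only).
CAMERAS = [
    Camera("SurveilsQUAD - Sony IMX290 System", -30, 85, "CSI"),
    Camera("OAK-D", None, None, "USB-C/PoE"),
    Camera("OAK-1-PoE", None, None, "USB-C/PoE"),
    Camera("Atlas IP67 7.1 MP", -20, 55, "PoE"),
    Camera("Atlas IP67 2.8 MP", -20, 55, "PoE"),
    Camera("Arducam 12MP IMX477", None, None, "USB-C"),
    Camera("Arducam Fisheye Camera", None, None, "CSI and Ethernet"),
]


def covers(cam: Camera, need_min: float, need_max: float) -> bool:
    """True only if the published range is known and spans the required range."""
    if cam.min_temp_c is None or cam.max_temp_c is None:
        return False  # unknown rating: cannot be confirmed, so it does not qualify
    return cam.min_temp_c <= need_min and cam.max_temp_c >= need_max


# Hypothetical requirement (assumption): -20 °C to +50 °C.
qualified = [c.name for c in CAMERAS if covers(c, -20, 50)]
```

A camera without a published rating, such as the Arducam models, is not necessarily unsuitable. As noted above, the housing can provide the required environmental protection instead.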

Initially, the SurveilsQUAD cameras were used. They had already been prepared for the computer system, and project partners at the Department of Computer Systems had made the necessary software modifications for the Xavier. The project was still in its development phase at the time of writing and the budget was limited, so at first we made do with cameras whose baseline capabilities were sufficient for the experiments. During the design of the control tower, however, it became apparent that the placement of these cameras was limited by their short cables, and the manufacturer did not supply longer cables for them.

The Arducam Mini camera was therefore chosen instead (see Figure 7). It is connected to a Raspberry Pi, which is in turn connected via an Ethernet cable to an Ethernet switch and from there to the Jetson Xavier. This solved the cable-length issue. It also made it possible to install the latest software on the Xavier, which the older cameras' outdated drivers had not supported.
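The text does not state which protocol carries the video from the Raspberry Pi to the Jetson. One minimal way to send encoded frames over a TCP connection is length-prefixed framing. The sketch below uses Python's standard `socket` and `struct` modules. The message format and function names are hypothetical, and a production system would more likely use an established streaming protocol such as RTSP:

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned frame length, network byte order


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one encoded frame (e.g. a JPEG) prefixed with its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver the data in smaller pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


# Local demonstration: a connected socket pair stands in for the Ethernet link.
pi_side, jetson_side = socket.socketpair()
frame = b"\xff\xd8" + b"\x00" * 1000 + b"\xff\xd9"  # placeholder bytes, not a real JPEG
send_frame(pi_side, frame)
received = recv_frame(jetson_side)
```

The length prefix is needed because TCP is a byte stream with no message boundaries. Without it, the receiver could not tell where one frame ends and the next begins.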

Figure 7. Module with the Arducam Fisheye camera [34]

Control tower

The control tower sits at the bow of the vessel (see Figure 9). Its task is to take over control of the vessel on the captain's command and to steer it autonomously with as little intervention from the captain as possible. The brain of the tower is the Jetson AGX. It gathers data from all sensors and cameras, processes it, and forwards a control decision to the autopilot, which in turn sends signals to the vessel's engines and rudder. The Jetson receives information about the surrounding environment from the cameras, an X-band radar, AIS, and the wind sensor of a weather station. Data exchange within the control-tower system takes place via an Ethernet switch and a 4G router, through which data can be sent to and received from the user. Developers can also push software updates to the vessel's systems.

The three sensing sources complement one another. AIS (Automatic Identification System), in which vessels broadcast their identity and satellite-derived (GNSS) position by VHF radio, helps determine the vessel's position relative to other watercraft. The radar detects nearby objects as point clouds. The cameras, with computer vision running on their video, identify vessels and other watercraft among the detected objects. Together, AIS, radar, and camera-based computer vision help the vessel detect objects and orient itself in its surroundings. The system is powered by a direct-current power source (see Figure 8).
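The text does not detail how the Jetson combines its sensors. The sketch below illustrates one way camera detections might be matched with radar returns. It assumes that both are expressed as bearings relative to the bow and uses a hypothetical 3° gating tolerance. Each detection is associated with the radar target nearest to it in bearing:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RadarTarget:
    target_id: int
    bearing_deg: float  # clockwise from the bow
    range_m: float


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, handling 359°/1° wrap-around."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def match_detection(det_bearing_deg: float, targets: List[RadarTarget],
                    tolerance_deg: float = 3.0) -> Optional[RadarTarget]:
    """Return the radar target closest in bearing to a camera detection, or None."""
    best = min(targets, key=lambda t: angle_diff(det_bearing_deg, t.bearing_deg),
               default=None)
    if best is None or angle_diff(det_bearing_deg, best.bearing_deg) > tolerance_deg:
        return None  # no target within the (hypothetical) gating tolerance
    return best


# Hypothetical radar picture: one target just left of the bow, one off the starboard bow.
targets = [RadarTarget(1, 358.5, 420.0), RadarTarget(2, 45.0, 1200.0)]
near_bow = match_detection(1.0, targets)  # 2.5° away across north, so target 1 matches
```

This greedy nearest-neighbour gating is the simplest possible association rule. Practical maritime systems typically track targets over time and resolve conflicting matches with a global assignment step, which is beyond the scope of this sketch.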

Figure 8. Simplified operating diagram of the control tower

Figure 9. The control tower's placement on the patrol vessel [35]