2018 Winner: CNNs on FPGAs for Track Reconstruction

Project Information
CNNs on FPGAs for Track Reconstruction
Santa Cruz Institute for Particle Physics
The Large Hadron Collider (LHC) is the world’s largest and most powerful particle collider, producing 10 megabytes of data every 25 nanoseconds. A single particle collision at the LHC is called an event, and each event contains multiple tracks (particle trajectories) that must be inferred from detector measurements called hits. The volume of data produced is too large to store for later analysis, so a large portion of the data must be discarded as it is produced, a step known as the level 1 (L1) trigger. The most computationally expensive step in the L1 trigger is inferring tracks from hits, known as the tracking problem. As it currently stands, the L1 trigger must identify 50 million particle tracks per second with a latency below 10 microseconds per track. The LHC is undergoing upgrades and is projected to produce 10 times more tracks than it has in the past, an increase which current algorithmic implementations cannot support. To address this problem, we have started investigating the feasibility and performance of Convolutional Neural Networks (CNNs) implemented on Field Programmable Gate Arrays (FPGAs). CNNs are a class of machine learning algorithms that have shown promising preliminary results for solving the tracking problem. Popular programming libraries for building CNNs all rely heavily on Graphics Processing Units (GPUs) to shoulder the bulk of the computation. Unfortunately, current and projected GPU architectures do not meet the latency requirements of the L1 trigger. FPGAs are programmable integrated circuits which mimic fabricated circuits, and thus can be much more efficient than general purpose processors such as GPUs. FPGAs are generally programmed at the firmware level using Hardware Description Languages (HDLs), but can also be programmed using higher level languages such as OpenCL.
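To make the scale of the figures quoted above concrete, the following is a rough back-of-the-envelope sketch in Python. It is not part of the project itself. It assumes decimal megabytes, a steady stream of tracks, and that every track uses the full 10 microsecond budget, and it applies Little's law (items in flight = arrival rate × time in system) to estimate how many tracks a pipelined trigger would need to hold at once. It also assumes the latency budget stays the same after the upgrade, which the abstract does not state. All variable names are illustrative.

```python
# Figures quoted in the abstract, kept as integers so the arithmetic is exact.
BYTES_PER_CROSSING = 10_000_000   # 10 MB per 25 ns (assumes decimal megabytes)
CROSSING_PERIOD_NS = 25           # one batch of data every 25 ns
TRACKS_PER_SECOND = 50_000_000    # tracks the L1 trigger must identify per second
LATENCY_BUDGET_NS = 10_000        # 10 microseconds allowed per track
UPGRADE_FACTOR = 10               # projected increase in the number of tracks

NS_PER_SECOND = 1_000_000_000

# Raw detector output in bytes per second.
raw_bytes_per_second = BYTES_PER_CROSSING * NS_PER_SECOND // CROSSING_PERIOD_NS

# Average time between consecutive tracks entering the trigger.
ns_between_tracks = NS_PER_SECOND // TRACKS_PER_SECOND

# Little's law (assumption: steady state, each track uses the full budget):
# tracks in flight = arrival rate * time in system.
tracks_in_flight = TRACKS_PER_SECOND * LATENCY_BUDGET_NS // NS_PER_SECOND

# Assumption: the latency budget is unchanged after the upgrade.
tracks_in_flight_upgraded = tracks_in_flight * UPGRADE_FACTOR

print(f"raw data rate: {raw_bytes_per_second // 10**12} TB/s")
print(f"one new track every {ns_between_tracks} ns on average")
print(f"tracks in flight today: {tracks_in_flight}")
print(f"tracks in flight after upgrade: {tracks_in_flight_upgraded}")
```

Under these assumptions the raw rate works out to about 400 terabytes per second, a new track must be accepted every 20 nanoseconds on average, and roughly 500 tracks are being processed at any moment, rising to roughly 5,000 after the upgrade. This is why a deeply pipelined, parallel implementation is attractive: no single track can be processed to completion before the next one arrives.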
We demonstrated a pipelined CNN in firmware which can be scaled to maximize FPGA resource usage, along with an OpenCL implementation of the same network. We found that Digital Signal Processors (DSPs), a resource on the FPGA which can be used as a multiplier, are a limiting resource. Our firmware implementation allows larger FPGAs with more DSPs to maximize resource usage and further parallelize individual steps in the CNN. Finally, we analyzed the feasibility of CNN architectures used for tracking under very stringent requirements. Although we developed this implementation as a solution to trigger-level tracking, it can also be used for other latency-sensitive applications of CNNs such as autonomous driving, stock trading, and other real-time problems.
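The role of DSPs as a limiting resource can be illustrated with a simplified estimate. The Python sketch below is not the project's firmware or its network. It assumes a hypothetical convolution layer and an idealized model in which each DSP performs one multiplication per clock cycle and all DSPs stay fully busy, ignoring pipeline fill, memory bandwidth, and routing. The layer dimensions, DSP counts, and function names are illustrative assumptions.

```python
def conv_multiplies(h_out, w_out, c_in, c_out, kernel):
    """Multiplications needed for one 2D convolution layer.

    Each output value needs kernel * kernel * c_in multiplications,
    and there are h_out * w_out * c_out output values.
    """
    return h_out * w_out * c_out * kernel * kernel * c_in


def layer_cycles(multiplies, dsps):
    """Clock cycles for one layer under the idealized model:
    one multiplication per DSP per cycle, rounded up."""
    return (multiplies + dsps - 1) // dsps


# Hypothetical layer: 8x8 output, 4 input channels, 8 output channels, 3x3 kernel.
multiplies = conv_multiplies(h_out=8, w_out=8, c_in=4, c_out=8, kernel=3)

# Hypothetical DSP counts standing in for small, medium, and large FPGAs.
for dsps in (256, 1024, 4096):
    cycles = layer_cycles(multiplies, dsps)
    print(f"{dsps:5d} DSPs -> {cycles:3d} cycles for {multiplies} multiplications")
```

In this idealized model, the cycle count for a layer falls roughly in proportion to the number of DSPs available, which matches the observation that larger FPGAs can parallelize individual steps further. A real design would also be bounded by the factors the sketch leaves out.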
1091.pdf
  • Thomas D Boser (Eight)