# Empower Wireless Revolution with the Magic of Software

Kun Tan
Lead Researcher, Wireless and Networking Group, MSR Asia

### The Dream of Software Radio

#### From Hardware to Software

#### Radio architecture

### Approaches

[Figure: approaches plotted by performance (vertical axis, low to high) against programmability (horizontal axis, low to high)]

#### Programmable hardware (FPGA + DSP)
- Difficult to program
- Not to scale
- Expensive

#### GPP Software
- High-performance
- **Easy** to program

#### Conventional GPP Software
- Slow
- Limited capability
- No real-time

## Fundamental Challenges
- System interface throughput
  - Large volume of high-fidelity digital samples
  - From 1Gbps to 10Gbps
- Computation
  - Large amount of arithmetic calculation for digital signal processing
  - Tens of GOPS estimated
- Real-time support
  - Hard deadlines and accurate timing control for wireless protocols
  - From 10 ms (multimedia) to 1 $\mu$s

### The Sora Project
- A multi-year research project (started 2007)
- Bet on the commodity multi-core PC
- Successfully turned a commodity PC into a powerful software radio
  - Orders of magnitude better than prior art
  - First standard-compliant WiFi (802.11a/b/g)
  - First LTE demo (uplink)

## Meeting the Challenges
- Take advantage of modern PC bus technologies
  - Driven by high-speed interconnection among PC subsystems
  - PCIe-based interface (> 10Gbps)
- Ride the wave of multi-core technology
  - Sustains Moore's law as CPUs hit the heat wall
- Software innovation to unleash the full power of modern CPUs

### Sora Architecture

#### Sora Radio Control Board

## Software Optimizations
- High-performance PHY implementation on multi-core
  - Trade memory for computation
  - Exploit data parallelism with SIMD
  - Streamline across multiple cores
- Core dedication for real-time support

### Trade Memory for Computation
- Exploit the large, high-speed cache memory of the CPU
  - Multiple megabytes of L2/L3 cache
- Extensive use of lookup tables (LUTs) to store precomputed results
- Direct implementation: 8 ops per bit
- LUT implementation: 2 lookup ops for 8 bits!
  (LUT size: 32KB)
- Applicable to more than half of the common algorithms; speedups range from 1.5x to 22x

### Exploit Data Parallelism with SIMD
- Utilize the short-vector SIMD extensions in the CPU
  - Simultaneously perform calculations on multiple elements of vectors
- Applicable to many PHY algorithms, with significant speedups (1.6x to 50x)

### Pipeline across Cores
- Partition PHY processing work across cores
- Interconnect sub-pipelines with lightweight, synchronized FIFOs

### Core Dedication for Realtime
- Software timing is considered "uncertain" in a traditional OS
  - Multiplexed among multiple tasks/processes/threads
  - Interrupts and contention on memory/bus
  - RTOS: complex, adds overhead, limited functionality
- Core dedication: a dumb idea whose time has come
  - Exclusively allocate enough cores for RT tasks
  - I/O polling instead of interrupts
  - Precise timing control at CPU clock level
- Simple abstraction, and easier to implement in standard OSes
  - Even implemented in Windows :)

### Worldwide Recognition
- NSDI'09 Best Paper, Best Demo
- Called out in the SIGCOMM'09 keynote speech
- "Honorable Mention" Demo, MobiCom'09
- Highlighted in CACM, January 2011: "One of the most significant wireless papers"
- SIGCOMM'10 Best Demo Award

[Figure: CACM article page, "Technical Perspective: Sora Promises Lasting Impact"]

From the CACM Technical Perspective:

"While alternative SDR technologies employ USB 2.0 or Gigabit Ethernet, Sora opts for PCI Express. This design decision enables Sora to achieve significantly higher transfer rates, which are important for high bandwidth multi antenna designs."

"There are many reasons why the following paper about Sora stands out as one of the most significant wireless papers in the past few years. First, it presents the first SDR platform that fully implements IEEE 802.11b/g on standard PCs. Second, the design choices it makes (for example, the use of PCIe, SIMD, trading computation for memory lookups, and core dedication) are highly important if software radios are ever to meet their original goal of one radio for all wireless technologies. Third, the paper is a beautiful and impressive piece of engineering that spans signal processing, hardware design, multicore programming, kernel optimization, and so on."

### Sora Academic Kit release!
- HW/SW available for academic research
- Over 230 units shipped worldwide

#### Sora Kit Evolution
Better programmability and more capability
- Sora ver 1.02
  - Much assembly code for SIMD
  - Kernel mode only
  - Integrated programs
  - 802.11b samples
- Sora ver 1.1
  - Vector1 Library
  - Kernel mode only
  - Integrated programs
  - 802.11a/b/g samples

### Sora Kit Evolution (cont.)
- Sora ver 1.5 (latest, Sept 2011)
  - Kernel mode and User Mode Extension
  - Full capability with Windows XP
  - HW verification tool
  - Integrated program
- Sora ver 1.6 (coming soon)
  - UMX reflection
  - Brick modular architecture
  - 802.11b Brick sample
  - DebugPlot tool

### Sora Kit Evolution (cont.)
- Sora ver 1.7 (Mar 2012)
  - 802.11a/b/g Brick sample
- Sora ver 2.0
  - MIMO support
  - 802.11n sample

### Software Architecture in Sora 1.5

#### **UMX** Architecture
- Direct memory mapping for the data path
  - High performance/zero copy
- Device I/O for the control path
  - Protection
- User-mode Sora thread library for real-time

### **UMX** Reflection
- Integrate UMX app into the network stack
- Will be shipped in Sora ver 1.6

### **Brick Library**
A modular software architecture for DSP programs
- Design goals
  - Modular and flexible
  - Better code reuse
  - High performance
    - Introduce minimal overhead

#### **Brick Model**
- A brick is a basic functional module
- Model a DSP program as a directed graph

#### Define a Brick
- Ports
  - Data type to read and write
- Interface
  - Process: the main method, called to operate on input data and write results to the output port
  - Reset: reinitialize the module
  - Flush: end a data stream
- Context binding
  - Access shared state variables

### Brick Context Hierarchy
- A mechanism for sharing state among bricks in a processing graph
  - Ex. the decoding stage; channel coefficients

### Write a SDR program

```
ISource* CreateModGraph () {
    // Bricks are created from the sink back to the source; each macro
    // names the downstream brick(s) as its last argument(s).
    CREATE_BRICK_SINK   (modsink, TModSink,          fb11bModCtx);
    CREATE_BRICK_FILTER (pack,    TPackSample16to8,  fb11bModCtx, modsink);
    CREATE_BRICK_FILTER (shaper,  TQuickPulseShaper, fb11bModCtx, pack);
    CREATE_BRICK_FILTER (spread,  TBarker11Spread,   fb11bModCtx, shaper);
    CREATE_BRICK_FILTER (bpsk,    TDBPSKMap,         fb11bModCtx, spread);
    CREATE_BRICK_FILTER (qpsk,    TQBPSKMap,         fb11bModCtx, spread);
    CREATE_BRICK_DEMUX2 (mrsel,   T11bMRSel,         fb11bModCtx, bpsk, qpsk);
    CREATE_BRICK_FILTER (sc741,   TSc741,            fb11bModCtx, mrsel);
    CREATE_BRICK_SOURCE (ssrc,    TB11bPHYSrc,       fb11bModCtx, sc741);
    return ssrc;
}
```

## Previously required 6000+ lines of code!

### Multi-threading in Brick graph

```
CREATE_BRICK_FILTER    (spread, TBarker11Spread,          fb11bModCtx, shaper);
CREATE_BRICK_SEPARATOR (sep,    Tseparator<COMPLEX16,1>,  fb11bModCtx, spread);
CREATE_BRICK_FILTER    (bpsk,   TDBPSKMap,                fb11bModCtx, sep);
CREATE_BRICK_FILTER    (qpsk,   TQBPSKMap,                fb11bModCtx, sep);
```

## Brick implementation
- C++ template library instead of C++ virtual functions
  - Enables inline function optimization
  - Avoids function call overhead
  - Enables compiler optimization at compile time

### DebugPlot
- A graphical tool to help debug/monitor real-time SDR programs
- Easy-to-use, lightweight APIs to plot graphs on the DbgPlot viewer

## DebugPlot (cont.)

### Sora MIMO Kit and Whitespace
- MIMO Kit
  - External radio unit with PCIe cable extension
  - 4x4 MIMO, 20/40MHz channel
  - Extensible to 8x8
- Whitespace front-end

### Summary

[Figure: approaches plotted by performance (vertical axis, low to high) against programmability (horizontal axis, low to high), now with Sora]

#### Programmable hardware (FPGA + DSP)
- Difficult to program
- Not to scale
- Expensive

#### Sora
- High-performance
- Easy to program
  - Vector lib
  - Brick model
  - UMX Reflection
  - Debugging tools

#### Conventional GPP Software
- Slow
- Limited capability
- No real-time

### Thanks! Questions?
## Come to the Sora demo this afternoon...
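
### Backup: Trade Memory for Computation (sketch)

The "Trade Memory for Computation" slide contrasts a direct implementation with a lookup table. The sketch below illustrates that idea on a rate-1/2, constraint-length-7 convolutional encoder with generators 133 and 171 (octal), as used in 802.11a/g. It is not taken from the Sora source. All names (`encode_bit`, `ConvLut`, `encode_byte_lut`), the bit ordering, and the table layout are assumptions made for illustration. With a 6-bit state and an 8-bit input, the table holds 2^14 16-bit entries, i.e. 32KB; whether this is the table the slide has in mind is also an assumption. The sketch needs one lookup per byte, where the slide mentions two.

```cpp
#include <array>
#include <cstdint>

// Illustrative only: hypothetical names, not Sora code.
constexpr unsigned kG0 = 0133;  // generator A (octal)
constexpr unsigned kG1 = 0171;  // generator B (octal)

// XOR of all bits of a 7-bit value.
inline unsigned parity7(unsigned v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

// Direct form: several shift/mask/XOR steps for every input bit.
// `state` holds the previous six input bits, most recent in bit 5.
inline unsigned encode_bit(unsigned& state, unsigned bit) {
    const unsigned window = ((bit & 1u) << 6) | state;
    const unsigned a = parity7(window & kG0);
    const unsigned b = parity7(window & kG1);
    state = window >> 1;
    return a | (b << 1);  // two coded bits, A in bit 0
}

// Direct form over one byte (LSB first); returns 16 coded bits.
inline std::uint16_t encode_byte_direct(unsigned& state, std::uint8_t in) {
    unsigned out = 0;
    for (int i = 0; i < 8; ++i) {
        out |= encode_bit(state, (in >> i) & 1u) << (2 * i);
    }
    return static_cast<std::uint16_t>(out);
}

// LUT form: precompute the 16 coded bits for every (state, input byte) pair.
// Index = (6-bit state << 8) | byte, so 2^14 entries of 2 bytes = 32 KB.
struct ConvLut {
    std::array<std::uint16_t, 1u << 14> out;
    ConvLut() {
        for (unsigned s = 0; s < 64; ++s) {
            for (unsigned in = 0; in < 256; ++in) {
                unsigned st = s;
                out[(s << 8) | in] =
                    encode_byte_direct(st, static_cast<std::uint8_t>(in));
            }
        }
    }
};

// One table lookup replaces the eight per-bit steps above.
inline std::uint16_t encode_byte_lut(const ConvLut& lut, unsigned& state,
                                     std::uint8_t in) {
    const std::uint16_t coded = lut.out[(state << 8) | in];
    state = in >> 2;  // after 8 input bits the state is the last six of them
    return coded;
}
```

The table is built once at start-up and is small enough to stay resident in L2 cache, which is what makes the lookup cheaper than recomputing the parity bits.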
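
### Backup: Pipeline across Cores (sketch)

The "Pipeline across Cores" slide connects sub-pipelines with lightweight, synchronized FIFOs. As a rough illustration of what such a FIFO can look like, here is a generic single-producer/single-consumer ring buffer built on `std::atomic`. This is a textbook design, not Sora's FIFO; the class name `SpscFifo`, the power-of-two capacity, and the 64-byte cache-line size are all assumptions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer (not Sora's FIFO).
// One pipeline stage calls push(), the next stage calls pop(); neither blocks.
template <typename T, std::size_t N>
class SpscFifo {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: returns false when the queue is full.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;
        buf_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> buf_{};
    // Assumed 64-byte cache lines: keep the counters apart to limit false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

In a two-stage pipeline, the upstream stage would spin on `push()` and the downstream stage would spin on `pop()`, each on its own dedicated core. This matches the polling style described on the core-dedication slide.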
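
### Backup: Bricks as C++ Templates (sketch)

The "Brick implementation" slide says bricks are composed with C++ templates rather than virtual functions, so the compiler can inline across brick boundaries. The sketch below shows that pattern in miniature. It does not use the real Brick library or its `CREATE_BRICK_*` macros; `CollectSink`, `Gain2`, `Offset1`, and `RampSource` are hypothetical bricks invented for the example. Only the `Process`/`Reset` method names echo the interface listed on the "Define a Brick" slide.

```cpp
#include <cstdint>
#include <vector>

// Terminal brick: stores every sample that reaches it.
class CollectSink {
public:
    bool Process(std::int32_t sample) {
        samples_.push_back(sample);
        return true;
    }
    void Reset() { samples_.clear(); }
    const std::vector<std::int32_t>& samples() const { return samples_; }

private:
    std::vector<std::int32_t> samples_;
};

// Filter brick. The downstream type is a template parameter, so the call to
// next_.Process() is resolved at compile time and can be inlined.
template <typename Next>
class Gain2 {
public:
    explicit Gain2(Next& next) : next_(next) {}
    bool Process(std::int32_t sample) { return next_.Process(sample * 2); }
    void Reset() { next_.Reset(); }

private:
    Next& next_;
};

// A second filter brick, to show chaining.
template <typename Next>
class Offset1 {
public:
    explicit Offset1(Next& next) : next_(next) {}
    bool Process(std::int32_t sample) { return next_.Process(sample + 1); }
    void Reset() { next_.Reset(); }

private:
    Next& next_;
};

// Source brick: pushes the samples 0, 1, ..., n-1 into the graph.
template <typename Next>
class RampSource {
public:
    explicit RampSource(Next& next) : next_(next) {}
    bool Run(int n) {
        for (int i = 0; i < n; ++i) {
            if (!next_.Process(i)) return false;
        }
        return true;
    }

private:
    Next& next_;
};
```

As in the modulator graph on the "Write a SDR program" slide, the chain is wired from the sink back to the source: construct a `CollectSink`, wrap it in `Gain2<CollectSink>`, wrap that in `Offset1<...>`, and hand the result to `RampSource<...>`.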