Intel® I/O Acceleration Technology
Improved application response with higher network I/O performance and efficiency.
Today, more than ever, business success relies on the rapid transfer and processing of data. To improve data transfer to and from applications, IT managers continue to invest in new networking infrastructure to achieve higher performance. Server I/O latency has emerged as the bottleneck that must be overcome for IT to be able to realize value from these investments. This performance issue is addressed with Intel® I/O Acceleration Technology, an integrated platform I/O solution that gets data to and from applications and the network faster and with greater efficiency.
Historical Perspective
Historically, network infrastructure investments included migration from Fast Ethernet to Gigabit Ethernet (GbE) and now 10-Gigabit Ethernet (10GbE). Throughout these evolutionary stages, network server performance easily kept pace with network traffic increases, at least until recently.
Increasingly, network traffic demands are outpacing the ability of servers to keep up, and the gap continues to widen with ever-increasing network communications and transaction processing workloads. As a result, IT managers are now asking: "After investing in a 10X improvement in network bandwidth, why aren't we seeing comparable improvements in application response time and reliability?"
The answer is that the method of processing input/output (I/O) data has not changed significantly since the 1980s. In particular, the Transmission Control Protocol/Internet Protocol (TCP/IP) stack has changed little since the inception of Ethernet and the Internet. TCP/IP protocol processing is not the only issue, however: server I/O overhead, memory access, and storage I/O speed and reliability also contribute to server I/O inefficiency. These are not single-element deficiencies, but rather system-wide issues that demand a comprehensive solution. Intel’s R&D teams examined the entire server I/O architecture to find improvements that would take advantage of the higher Gigabit and 10-Gigabit Ethernet rates and the advances in mass storage. In response, they developed a system-wide solution, Intel I/O Acceleration Technology, that moves data to and from applications much faster, allowing IT to take advantage of the increased network and storage capabilities.
The Growing Server I/O Traffic Jam
Understanding the importance of Intel I/O Acceleration Technology (Intel® I/OAT) requires an understanding of the real nature of the problem being addressed. This is best done by examining the flow of a client request as it is received, processed and responded to by the server. This flow is illustrated in Figure 1, where the following numbered descriptions correspond to the circled numbers in the illustration:
1. A client sends a request in the form of TCP/IP data packets that the server receives through its network interface. Each data packet contains header information for packet identification and routing and the actual data payload relating to the client request.
2. The server processes the TCP/IP packets and routes the packet payloads to the designated application. This processing includes protocol computations involving the TCP/IP stack, multiple server memory accesses for packet descriptors and payload moves, and various other system overhead activities (e.g., interrupt handling and buffer management).
3. The application acts on the client request and recognizes that it needs data from storage in order to respond to the request.
4. The application accesses storage to obtain the necessary data to satisfy the client request.
5. Storage returns the requested data to the application.
6. The application completes processing of the client request using the additional data received from storage.
7. The server routes the response back through the network connection to be sent as TCP/IP packets to the client.
The above client-request/server-response cycle has existed since the inception of networks and continues to exist as the essential process of today’s networks. In the past, when traffic volumes and speeds were low, server performance was more than adequate for the task. Today, packet traffic volumes and speeds are much higher. The result of these higher packet volumes and faster rates of arrival is a direct increase in CPU overhead for networking and a corresponding reduction in available cycles for applications. Server application throughput is throttled and datacenter performance degrades.
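To give a rough sense of how packet arrival rates scale with link speed, the following back-of-the-envelope sketch may help. It assumes full-size 1500-byte packets and ignores Ethernet framing overhead; both the packet size and the function name are illustrative assumptions, not figures from this paper.

```python
# Rough packet-arrival estimate per link speed.
# Assumption: every packet is 1500 bytes and framing overhead is ignored.

def packets_per_second(link_bits_per_second: float, packet_bytes: int = 1500) -> float:
    """Upper-bound packet rate a saturated link could deliver."""
    return link_bits_per_second / (packet_bytes * 8)

LINKS = {
    "Fast Ethernet": 100e6,  # 100 Mb/s
    "GbE": 1e9,              # 1 Gb/s
    "10GbE": 10e9,           # 10 Gb/s
}

for name, rate in LINKS.items():
    print(f"{name}: about {packets_per_second(rate):,.0f} packets/s")
```

Under these assumptions, each tenfold increase in link speed means roughly ten times as many packets per second, each of which incurs per-packet work on the server (protocol processing, interrupts, buffer management).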
Intel® I/OAT—A Multifaceted Solution for a Multifaceted Problem
Figure 1 illustrates the point that application response latency is a multifaceted, system-wide issue. Packets must be received, recognized and processed in order to deliver a payload to the application. The application, while acting on a client request, often must fetch necessary data from storage and return a response to the server. Finally, the server must transform the response into TCP/IP packets and route them back to the client.
Intel I/OAT uses two approaches to address the overall problem—network I/O acceleration and storage I/O acceleration. Relative to Figure 1, network I/O acceleration addresses items relating to packet transformation and packet-related data movement, while storage I/O acceleration addresses items relating to data storage access and reliability. The goal is to get data to and from applications faster for greater application throughput without requiring modification of the application. Intel I/OAT can improve application response time with efficient protocol processing, lower CPU overhead for data movement and more reliable data transfer. The result? Not only does Intel I/OAT preserve existing application integrity, but it provides a platform acceleration technology that also scales with future processor improvements.
Increasing Server Network I/O Performance
Figure 2 shows the system overhead segments addressed by the network I/O portion of Intel I/OAT. It is important to note that the sizes of these segments relative to each other vary depending on packet payload sizes. For example, TCP protocol computation constitutes a 12% segment for 1K payloads typical of e-mail messages and financial transactions, but it nearly doubles to a 22% segment for 64K payloads typical of transferring or backing up large files. Similarly, memory access grows from 30% for 1K payloads to 53% for 64K payloads.
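The arithmetic behind these figures can be tabulated in a short sketch. It uses only the four percentages quoted above; the assumption that all segments in Figure 2 total 100%, and the names used in the code, are illustrative.

```python
# Overhead shares (percent of network I/O overhead) quoted in the text above.
OVERHEAD_SHARE = {
    "1K": {"tcp_protocol": 12, "memory_access": 30},
    "64K": {"tcp_protocol": 22, "memory_access": 53},
}

def combined_share(payload_size: str) -> int:
    """Percent of overhead spent on TCP computation plus memory access."""
    return sum(OVERHEAD_SHARE[payload_size].values())

def remaining_share(payload_size: str) -> int:
    """Percent left for all other segments, assuming the segments total 100%."""
    return 100 - combined_share(payload_size)

for size in OVERHEAD_SHARE:
    print(f"{size} payloads: TCP + memory access = {combined_share(size)}%, "
          f"other segments = {remaining_share(size)}%")
```

For 1K payloads the two segments together account for 42% of the overhead, leaving 58% for everything else; for 64K payloads they account for 75%. This is why a solution aimed only at TCP computation leaves most of the overhead untouched for small payloads.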
For TCP processing and memory access, it is reasonable to ask: “Aren’t those issues already covered by TCP Offload Engines (TOE) and NICs enabled with Remote Direct Memory Access (RNICs)?”
The answer is yes and no. TOE is an interface-oriented, TCP-focused approach that does not fully address the other performance bottleneck segments shown in Figure 2. The result is that TOE is only effective for large back-end file transfers and can actually degrade performance for applications running on the front-end and mid-tier servers that comprise 90% of the dual-processor server market segment. This is illustrated in Figure 3, where TOE is most effective in back-end servers while Intel I/OAT yields consistently high performance across the range of datacenter servers and applications.
As for RNICs, the Remote Direct Memory Access (RDMA) protocol supports direct placement of payload data into an application’s memory space in order to reduce data movement overhead. The RDMA protocol runs in addition to existing network protocols like TCP/IP and carries its own overhead to arrange each data transfer. Rather than place this burden on the CPU, RDMA uses the TOE as a resource to run this new workload. The combination of TOE and RDMA improves data movement and offloads the processing of two protocols. However, since RDMA requires modification of applications to make them RDMA-aware, only a few server environments can implement the RDMA protocol. Additionally, RDMA is an end-to-end protocol, so its implementation must include all systems that will transfer data. Because of these implementation issues and narrow application range, RNICs have seen limited industry acceptance.
By contrast, Intel I/OAT addresses all segments of the server I/O bottleneck problem, and it does so using TCP/IP without requiring any modification of existing or future applications. The system-wide network I/O acceleration technologies applied by Intel I/OAT are shown in Figure 4 and include:
Network Flow Affinity: partitions the network stack processing dynamically across multiple physical or logical CPUs. This allows more CPU cycles to be allocated to the application for faster execution.
Asynchronous Low-Cost Copy: provides enhanced Direct Memory Access, allowing payload data to be copied from the NIC buffer in system memory to the application buffer with far fewer CPU cycles, returning the saved CPU cycles to productive application workloads.
Improved TCP/IP Protocol with Optimized TCP/IP Stack: implements separate packet data and control paths so that the packet header is processed separately from the packet payload. This and other stack-related enhancements reduce protocol processing cycles.
The synergistic combination of the above technologies for the network I/O acceleration side of Intel I/OAT gets data to applications up to 30% faster than previously possible.
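One common way to partition network stack processing across CPUs is to hash each connection's addresses and ports so that all packets of a flow are handled by the same CPU. The sketch below illustrates that general idea only; this paper does not describe how Intel I/OAT assigns flows internally, and the function names, packet dictionary layout and use of CRC32 are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

def cpu_for_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 num_cpus: int) -> int:
    """Map a TCP flow (identified by its 4-tuple) to a CPU index.

    CRC32 is used here only because it is deterministic across runs;
    it is an illustrative choice, not the hash used by any real NIC.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode("ascii")
    return zlib.crc32(key) % num_cpus

def partition_packets(packets, num_cpus: int):
    """Group packets into per-CPU queues, keeping each flow on one CPU."""
    queues = defaultdict(list)
    for pkt in packets:
        cpu = cpu_for_flow(pkt["src_ip"], pkt["src_port"],
                           pkt["dst_ip"], pkt["dst_port"], num_cpus)
        queues[cpu].append(pkt)
    return queues

# Hypothetical traffic: three packets from one client, two from another.
packets = (
    [{"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.9",
      "dst_port": 80, "seq": i} for i in range(3)]
    + [{"src_ip": "10.0.0.2", "src_port": 6000, "dst_ip": "10.0.0.9",
        "dst_port": 80, "seq": i} for i in range(2)]
)
for cpu, queue in sorted(partition_packets(packets, num_cpus=4).items()):
    print(f"CPU {cpu}: {len(queue)} packets")
```

Keeping a flow on one CPU preserves packet ordering within the flow and keeps its connection state in that CPU's cache, while different flows can be processed in parallel.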
Increasing Storage I/O Performance
Another key element of the Intel I/OAT solution, storage I/O acceleration, uses hardware-based acceleration for faster data transfers to and from the application (see Figure 4). This includes using a storage processor so that this workload remains a peripheral function, and adding RAID 6 (Redundant Array of Independent Disks, level 6) technology for more robust error checking during data transfers with throughput close to that of RAID 5. This ensures faster data transfers with no data loss or modification as the data moves through the disks and disk storage subsystem.
Byte parity checks are used to assure data integrity through the storage subsystem. Additionally, parity data is written to the disk drives, preventing data loss in the event of multiple disk failures or bad data blocks during rebuilds. This increases system availability and reliability with shorter backup windows, faster disk rebuild times and superior protection for end-to-end data integrity.
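To illustrate how stored parity can survive the loss of two disks, here is a minimal sketch of the P+Q dual-parity scheme commonly used for RAID 6. It assumes the widely used Galois field GF(2^8) with polynomial 0x11D and generator 2; this paper does not specify the parity math used by Intel I/OAT storage hardware, so the scheme and all function names are illustrative assumptions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, since a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)

def compute_parity(blocks):
    """Return (P, Q): P is the plain XOR, Q weights disk i by 2^i."""
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(blocks, x: int, y: int, p: bytes, q: bytes):
    """Rebuild data disks x and y (entries may be None) from P and Q."""
    a, b = bytearray(p), bytearray(q)
    for i, block in enumerate(blocks):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            a[j] ^= byte                 # a becomes Dx ^ Dy
            b[j] ^= gf_mul(coeff, byte)  # b becomes 2^x*Dx ^ 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    scale = gf_inv(gx ^ gy)
    dx = bytes(gf_mul(scale, b[j] ^ gf_mul(gy, a[j])) for j in range(len(p)))
    dy = bytes(a[j] ^ dx[j] for j in range(len(p)))
    return dx, dy

# Hypothetical stripe of four data disks; disks 1 and 3 then fail.
data = [b"disk0data", b"disk1data", b"disk2data", b"disk3data"]
p, q = compute_parity(data)
damaged = [data[0], None, data[2], None]
print(recover_two(damaged, 1, 3, p, q) == (data[1], data[3]))
```

A single XOR parity block, as in RAID 5, can rebuild only one missing disk. The second, independently weighted parity block gives two equations in two unknowns, which is what allows two failed disks to be rebuilt.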
A System-Wide Solution for Higher Performance, Reliability and Efficiency
Unlike TOE solutions, Intel I/OAT is a system-wide solution that addresses all packet and payload processing bottlenecks throughout the server platform. It increases CPU efficiency and delivers data to and from applications faster than possible with current server platforms. And it mitigates the risks associated with data transfers to and from storage with comprehensive error checking. Most importantly, Intel I/OAT scales with future platform improvements, providing a path for further reducing infrastructure costs by consolidating hardware and software, and ultimately growing your business.
For More Information
To find out more about Intel I/O Acceleration Technology, visit www.intel.com/go/ioat.