Saturday, March 26, 2016

Intel 10Gb x520-da2 Performance Tuning Windows

Ripped from:
http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005811.html

Adapter installation suggestions
  • Install the Intel® Network Adapter in a slot that matches or exceeds the bus width of the adapter.
    • Example 1: if you have a 32-bit PCI adapter, put it in a 32-bit or 64-bit PCI or PCI-X* slot.
    • Example 2: if you have a 64-bit PCI-X adapter, put it in a 64-bit PCI-X slot.
    • Example 3: if you have an x4 PCIe* adapter, put it in an x4, x8, or x16 PCIe* slot.
    Note Some PCIe* slots are physically wired with fewer lanes than the size of the slot suggests. In that case, a slot with x8 dimensions may only provide x4, x2, or x1 functionality. Check with your system manufacturer (a quick bandwidth check is sketched after this list).
  • For PCI and PCI-X*, install the Intel Network Adapter in the fastest available slot.
    • Example 1: if you have a 64-bit PCI adapter, put it in a 66 MHz 64-bit PCI slot.
    • Example 2: if you have a 64-bit PCI-X adapter, put it in a 133 MHz (266 or 533 if available) 64-bit PCI-X slot.
    Note The slowest board on a bus dictates the maximum speed of the bus. Example: when a 66 MHz and a 133 MHz add-in card are installed on a 133 MHz bus, all devices on that bus run at 66 MHz.
  • Try to install the adapter in a slot on a bus by itself. If add-in cards share a bus, they compete for bus bandwidth.
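As a quick sanity check on slot choice, the sketch below compares the usable (unencoded) PCIe bandwidth of a given generation and lane count against the line rate the adapter needs. This is a minimal Python illustration: the per-lane rates are the standard PCIe figures, while the 2 x 10 Gb/s requirement for a dual-port X520-DA2 is simply this example's assumption.

    # Rough check: does a PCIe slot have enough usable bandwidth for the adapter?
    # Per-lane rates are the standard PCIe figures; the 20 Gb/s requirement
    # (2 x 10 Gb/s for a dual-port X520-DA2) is an assumption for this example.
    PCIE_LANE_GBPS = {      # usable (unencoded) Gb/s per lane, per direction
        1: 2.0,             # Gen1: 2.5 GT/s with 8b/10b encoding
        2: 4.0,             # Gen2: 5.0 GT/s with 8b/10b encoding
        3: 7.877,           # Gen3: 8.0 GT/s with 128b/130b encoding
    }

    def slot_ok(gen: int, lanes: int, adapter_gbps: float) -> bool:
        """Return True if the slot's usable one-way bandwidth covers the adapter."""
        usable = PCIE_LANE_GBPS[gen] * lanes
        print(f"PCIe Gen{gen} x{lanes}: ~{usable:.1f} Gb/s usable vs {adapter_gbps} Gb/s needed")
        return usable >= adapter_gbps

    slot_ok(2, 4, 20.0)   # Gen2 x4: ~16 Gb/s, not enough for both ports at line rate
    slot_ok(2, 8, 20.0)   # Gen2 x8 (the card's native width): comfortable headroom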
Driver configuration suggestions
  • For Intel® Ethernet 10 Gigabit Converged Network Adapters, you can choose a role-based performance profile to automatically adjust driver configuration settings. The individual settings below can also be changed by hand (see the sketch after this list).
  • Reduce Interrupt Moderation Rate to Low, Minimal, or Off
    • Also known as Interrupt Throttle Rate (ITR).
    • The default is "Adaptive" for most roles.
    • The low latency profile sets the rate to off.
    • The storage profiles set the rate to medium.
    Note Decreasing Interrupt Moderation Rate increases CPU utilization.
  • Enable Jumbo Frames to the largest size supported across the network (4KB, 9KB, or 16KB)
    • The default is Disabled.
    Note Enable Jumbo Frames only if devices across the network support them and are configured to use the same frame size.
  • Disable Flow Control.
    • The default is Generate & Respond.
    Note Disabling Flow Control can result in dropped frames.
  • Increase the Transmit Descriptors buffer size.
    • The default is 256. Maximum value is 2048.
    Note Increasing Transmit Descriptors increases system memory usage.
  • Increase the Receive Descriptors buffer size.
    • The default is 256. Maximum value is 2048.
    Note Increasing Receive Descriptors increases system memory usage.
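If you prefer to script these changes instead of clicking through Device Manager or Intel® PROSet, the sketch below drives the in-box Get-NetAdapterAdvancedProperty / Set-NetAdapterAdvancedProperty PowerShell cmdlets from Python. This is a minimal sketch, not a supported Intel tool: the adapter name and the DisplayName/DisplayValue strings are assumptions that vary by driver version, so list what your driver actually exposes before applying anything.

    # Minimal sketch: apply the driver settings above with the in-box PowerShell
    # cmdlets, driven from Python. The adapter name and the DisplayName/DisplayValue
    # strings below are typical for Intel 10 GbE drivers but vary by driver version;
    # check the output of Get-NetAdapterAdvancedProperty and adjust to match.
    import subprocess

    ADAPTER = "Ethernet 3"   # assumed adapter name; confirm with Get-NetAdapter

    SETTINGS = {
        "Interrupt Moderation Rate": "Off",   # lower latency, higher CPU utilization
        "Jumbo Packet": "9014",               # only if the whole path supports jumbo frames
        "Flow Control": "Disabled",           # may result in dropped frames
        "Transmit Buffers": "2048",           # default 256, maximum 2048
        "Receive Buffers": "2048",            # default 256, maximum 2048
    }

    def ps(command: str) -> None:
        """Run one PowerShell command and echo it for the record."""
        print(">", command)
        subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

    # Show what the driver currently exposes, then apply each change.
    ps(f"Get-NetAdapterAdvancedProperty -Name '{ADAPTER}' | Format-Table -AutoSize")
    for display_name, value in SETTINGS.items():
        ps(
            f"Set-NetAdapterAdvancedProperty -Name '{ADAPTER}' "
            f"-DisplayName '{display_name}' -DisplayValue '{value}'"
        )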
TCP configuration suggestions
  • Tune the TCP window size (Applies to Windows* Server editions before Windows Server 2008*).
    Notes Optimizing your TCP window size can be complex because every network is different. Documents are available on the Internet that explain the considerations and formulas used to set window size; the usual starting point is the bandwidth-delay product (see the sketch after this section).
    Before Windows Server 2008, the network stack used a fixed-size receive-side window. Starting with Windows Server 2008, Windows provides TCP receive window auto-tuning. The registry keywords TcpWindowSize, NumTcbTablePartitions, and MaxHashTableSize are ignored starting with Windows Server 2008.
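For the pre-2008 case where the receive window is fixed, the bandwidth-delay product gives the usual estimate: window size ≈ link speed × round-trip time. Below is a minimal sketch of that arithmetic; the 10 Gb/s link speed and the round-trip times are illustrative values, not measurements.

    # Bandwidth-delay product: roughly how much data must be in flight to keep a
    # link busy, and therefore a reasonable target for a fixed TCP receive window.
    # The link speed and round-trip times below are illustrative only.
    def tcp_window_bytes(link_gbps: float, rtt_ms: float) -> int:
        """Return the bandwidth-delay product in bytes."""
        bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000.0)
        return int(bits_in_flight / 8)

    for rtt in (0.1, 1.0, 10.0):   # LAN, campus, and WAN-like round-trip times in ms
        window = tcp_window_bytes(10.0, rtt)
        print(f"10 Gb/s link, {rtt} ms RTT -> window of about {window / 1024:.0f} KiB")

On Windows Server 2008 and later the auto-tuning described above handles this automatically, so the calculation is mainly useful on older systems or for sanity-checking measured throughput.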
Teaming considerations and suggestions
When teaming multiple adapter ports together to maximize bandwidth, the switch must be taken into account. Dynamic or static 802.3ad link aggregation is the preferred teaming mode, but it requires multiple contiguous ports on the switch. Pay attention to port groups on the switch: typically several ports are grouped together and serviced by a single PHY, and that PHY has a limited amount of bandwidth shared by all the ports it supports. This shared bandwidth may not be enough to support full utilization of every port in the group.
When the switch shares bandwidth across a group of contiguous ports, the performance gain from teaming is limited to that shared bandwidth. Example: four ports on Intel® Gigabit Network Adapters or LAN-on-motherboard connections are teamed in 802.3ad static or dynamic mode, but the four gigabit switch ports share a total PHY bandwidth of only 2 Gbps, so the team cannot reach its nominal aggregate rate. The way switch ports are grouped depends on the switch manufacturer and model and can vary from switch to switch.
Alternative teaming modes, such as Adaptive Load Balancing (ALB) with Receive Load Balancing, can sometimes mitigate these limitations. ALB places no demands on the switch and does not need to be connected to contiguous switch ports. If the link partner has port groups, an ALB team can be connected to any ports of the switch, which distributes connections across the available port groups and can increase overall network bandwidth.
Performance testing considerations
  • When copying a file from one system to another (1:1) using one TCP session, throughput is significantly lower than with multiple simultaneous TCP sessions. The lower throughput of a single 1:1 transfer is caused by the latency inherent in a single TCP/IP session. A few file transfer applications support multiple simultaneous TCP streams; examples include bbFTP*, gFTP*, and FDT*.
    The graph in the original article illustrates (but does not guarantee) the performance benefit of using multiple TCP streams, showing actual results from an Intel® 10 Gigabit CX4 Dual Port Server Adapter with default Advanced settings under Windows Server 2008* x64.
  • You can test your network interface's throughput directly with tools such as iperf* and Microsoft NTTTCP*. These tools can be configured to use one or more streams (see the sketch after this section).
  • When copying a file from one system to another, the hard drives of each system can be a significant bottleneck. Consider using high-RPM, higher-throughput hard drives, striped RAID arrays, or RAM drives in the systems under test.
  • Systems under test should connect through a full-line rate, non-blocking switch.
  • Theoretical Maximum Bus Throughput:
    • PCI Express* (PCIe*) Theoretical Bi-Directional Bus Throughput.
      PCI Express Implementation   Encoded Data Rate   Unencoded Data Rate
      x1                           5 Gb/sec            4 Gb/sec (0.5 GB/sec)
      x4                           20 Gb/sec           16 Gb/sec (2 GB/sec)
      x8                           40 Gb/sec           32 Gb/sec (4 GB/sec)
      x16                          80 Gb/sec           64 Gb/sec (8 GB/sec)
    • PCI and PCI-X Bus Theoretical Bi-Directional Bus Throughput.
      Bus and Frequency   32-Bit Transfer Rate   64-Bit Transfer Rate
      33-MHz PCI          1,064 Mb/sec           2,128 Mb/sec
      66-MHz PCI          2,128 Mb/sec           4,256 Mb/sec
      100-MHz PCI-X       Not applicable         6,400 Mb/sec
      133-MHz PCI-X       Not applicable         8,192 Mb/sec
      Note The PCIe* link width can be checked in Windows* through adapter properties. Select the Link Speed tab and click the Identify Adapter button. Intel® PROSet for Windows* Device Manager must be loaded for this utility to function.
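To reproduce the single-stream versus multi-stream comparison described in the first bullet above, the sketch below wraps an iperf client from Python. The use of iperf3 specifically, and the server address, are assumptions for this example; start the server side on the remote system first (iperf3 -s).

    # Sketch: compare single-stream vs. multi-stream TCP throughput with iperf3.
    # Assumes iperf3 is installed on both systems and a server is already running
    # on the remote host ("iperf3 -s"); the address below is a placeholder.
    import subprocess

    SERVER = "192.0.2.10"   # placeholder address of the remote system under test

    def run_iperf(parallel_streams: int, seconds: int = 10) -> None:
        """Run one iperf3 client test with the given number of parallel TCP streams."""
        cmd = [
            "iperf3",
            "-c", SERVER,                 # client mode, connect to SERVER
            "-P", str(parallel_streams),  # number of parallel TCP streams
            "-t", str(seconds),           # test duration in seconds
        ]
        print(">", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # One stream first, then several, to see the multi-stream benefit.
    for streams in (1, 4, 8):
        run_iperf(streams)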
