OoklaServer performance involves several aspects of data center and enterprise computing. There are considerations for the server hardware, the network configuration, and the server's TCP parameters that vary from one environment to the next and directly impact an OoklaServer's ability to efficiently deliver accurate tests to clients. Speed tests can be taken using a variety of connection types, such as basic TCP, WebSockets, or XML HTTP Requests. The number of connections, connection type, test duration, and other parameters are set by Ookla for each test based on perpetual data analysis used to provide the most accurate results for Web, Native, CLI, and Speedtest Powered (embedded) clients. Performance tuning for the variety of scenarios these clients present often involves working with variables across the server and the networking equipment in conjunction to produce meaningful results.

While Ookla does not offer direct assistance with server or network performance optimization, we can share general recommendations to help you ensure that your OoklaServer service is performing at an accurate and efficient level. Note that there are no configuration options in the OoklaServer software for controlling server performance.

Optimal Server Sizing with Hardware Considerations

The server's hardware deployment, operating configuration, and network connectivity should be planned to meet demand for concurrent speed tests. Speedtest clients and the OoklaServer service depend almost entirely on sufficient CPU, bandwidth, IO capability, memory, and power efficiency. There are hardware configurations which can directly alter network performance in potentially unexpected ways.

The OoklaServer software is multithreaded and can take advantage of multiple cores. It is generally preferable to avoid multi-socket systems with NUMA memory, but if a NUMA system is present, the transaction speed must sustain low latency across the server's intended maximum bandwidth. For example, when serving an aggregate of 40Gbps, the NUMA transaction rate must exceed 40Gbps plus the CPU architecture's transactional overhead while adding minimal (sub-nanosecond) latency, which will likely be around 56Gbps internally. Multi-chiplet, single-socket NUMA architectures can present this high-speed topology and are generally preferred over multi-socket NUMA.

Turbo Boost, Clock Frequency, and Offload Engines

Performance of any given core in a multi-core CPU can limit the input and output of any given workload if it is not multi-threaded. Even in the case of multi-threaded workloads, any one particular thread can become “pegged,” running as fast as possible on one CPU core while the rest of the cores remain idle. The limiting factor in any multi-core CPU for any single task is essentially the maximum speed of a single core. This maximum speed may change with Turbo Boost: if other cores are idle, the maximum speed may increase, but otherwise it may remain at a constant level below the Turbo Boost maximum.

For the most part, there is no longer an “easy” 1:1 ratio of CPU core MHz to achievable packet rate. There are many variables in a CPU choice that can directly impact the packet rate beyond per-core MHz, such as the amount of cache and the cache levels, additional CPU instruction sets that offload particular protocols or assist with IO operations, and various offload engines in the network interface chipsets that reduce CPU instruction rate pressure, such as TCP TOE, TCP LRO, and TCP LSO. Other factors impact packet rate at various workloads as well, such as the Turbo Boost version (later versions permit more dynamic mixes of slightly boosted cores while others idle), the HyperTransport/QPI bus speed, and the number of IO interconnect lanes.

In general, we recommend disabling hyperthreading or SMT if present, or purchasing CPUs without hyperthreading/SMT. These technologies impose an unpredictable per-thread performance penalty when enabled, which can directly impact latency and add jitter.

With clients at gigabit speeds, the memory type of the server generally does not matter. DDR3-based servers may be challenged by multiple 10Gbps or greater workloads. DDR5, DDR4, or faster DDR3 is recommended for serving multiple 10Gbps clients. Low-power/low-speed DDR3 should be avoided for serving at over a gigabit, and DDR2 and older memory technologies should be avoided entirely. The interconnecting buses from the CPU cores to the NICs must be capable of handling the bandwidth present on the NICs.
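The NUMA sizing rule of thumb can be sketched as a quick calculation. The 40% overhead factor below is an assumption chosen to match the 40Gbps-to-roughly-56Gbps example; the real overhead depends on the CPU architecture and protocol mix:

```python
def required_internal_bandwidth_gbps(aggregate_gbps: float,
                                     overhead_factor: float = 0.4) -> float:
    """Estimate the internal NUMA/fabric bandwidth needed to serve clients.

    overhead_factor is an illustrative assumption, not a measured value;
    it stands in for the CPU architecture's transactional overhead.
    """
    return round(aggregate_gbps * (1.0 + overhead_factor), 3)

# Serving a 40Gbps aggregate implies roughly 56Gbps of internal bandwidth.
print(required_internal_bandwidth_gbps(40))  # 56.0
```

If the result exceeds what the socket interconnect or chiplet fabric can sustain at low latency, the platform is undersized for the intended aggregate rate.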
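As a practical check on the NIC offload engines mentioned above (LSO, LRO, and related features), the output of Linux's `ethtool -k <interface>` can be inspected. A minimal parser sketch, assuming the standard `feature-name: on/off` output format; the sample text is illustrative, so run `ethtool -k` against a real interface for live data:

```python
# Illustrative sample of `ethtool -k eth0` output (abbreviated).
SAMPLE = """\
Features for eth0:
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

def parse_offloads(text: str) -> dict:
    """Map each offload feature name to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        if ":" not in line or line.endswith(":"):
            continue  # skip the "Features for eth0:" header line
        name, _, state = line.partition(":")
        # "[fixed]" suffixes mark features the driver cannot change.
        features[name.strip()] = state.split()[0] == "on"
    return features

offloads = parse_offloads(SAMPLE)
print(offloads["tcp-segmentation-offload"])  # True
print(offloads["large-receive-offload"])     # False
```

Features reported as off and `[fixed]` cannot be enabled for that driver, which is worth knowing before attributing high CPU load to a missing offload.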