A bottleneck in a computer. Megahertz in the "bottleneck"

18.10.2020

If you ask a modern person what basic parts a computer consists of, you will probably get a rather long list, with the system unit, keyboard and monitor in its first lines. It is easy to see that such a list is hardly suitable for describing the computer that controls a microwave oven, a car ignition system, or a spacecraft. In general, it has no more in common with reality than the claim that a crimson jacket is the hallmark of all vertebrates.

Any computer, regardless of its architecture and purpose, consists of a central processor and RAM, which can be supplemented with peripheral devices where needed. The latter mainly let the computer exchange information with the outside world, but overall performance is determined by how well the processor and memory work together. And it is precisely here that a bottleneck has emerged in recent years.

The IBM PC, the first mainstream 16-bit personal computer, appeared a little over 20 years ago and used the Intel 8088 processor - the younger brother of the Intel 8086, differing from it by an external data bus half as wide. This solution was economically justified, since it allowed the use of eight-bit peripherals, so the new computer did not differ too much in price from its eight-bit counterparts. And while the earlier Intel 8086 fetched and executed instructions synchronously, the new processor performed these actions asynchronously: an instruction queue appeared, which was filled whenever the processor was not busy exchanging data with memory. This made better use of the data bus bandwidth, and halving its width did not lead to a significant drop in performance.

At that time memory hardly delayed instruction execution at all: the processor ran at 4.77 MHz and spent longer computing an address than the memory took to return the data. Processor speed, however, is characterized by the clock frequency, while memory speed is characterized by the access time, which has not enjoyed the same dizzying progress: the clock frequency has grown almost 500-fold, while the access time has fallen by only about an order of magnitude. And if the access time is, say, 100 ns, then at 10 MHz this corresponds to one processor cycle, at 40 MHz to four cycles, and at 100 MHz to ten. In addition, processor architectures improved, so the same instructions came to be executed in far fewer cycles (see Table 1).
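
The arithmetic behind these figures is easy to reproduce. A minimal sketch that simply recomputes the numbers quoted above:

```python
# How many CPU clock cycles one 100 ns memory access costs at different
# clock frequencies (the figures used in the text above).
ACCESS_TIME_NS = 100

for freq_mhz in (4.77, 10, 40, 100):
    cycle_ns = 1000 / freq_mhz           # duration of one CPU cycle, ns
    cycles = ACCESS_TIME_NS / cycle_ns   # cycles "lost" per memory access
    print(f"{freq_mhz:6.2f} MHz: cycle = {cycle_ns:5.1f} ns, "
          f"one access = {cycles:4.1f} cycles")
```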

The developers took these trends into account. The IBM PC AT came out with a full 16-bit data bus, and the IBM AT-386 with a full 32-bit one; computers on the 486 had the same bus width. It went further still: the external data bus of the Pentium was 64 bits wide, twice the width of the processor itself, and for GPUs (often called 3D accelerators) it is already 128-256 bits.

Increasing the bus width is not the only way to work around slow memory. Starting with the higher-end 386 models, computers began to use cache memory - a small amount of high-speed memory that acts as a buffer between the slow main memory and the processor.

Modern memory modules are designed to operate in a narrow frequency range, so their timing diagrams are optimized for a single, nominal frequency. Running them at lower frequencies therefore reduces memory performance proportionally.

Until recently, the frequency of the processor's front side bus (FSB) had to match the clock frequency of the RAM. The low bus frequency of Celeron processors (66 MHz) limited the performance of this class of processors. The latest chipsets allow the memory to be clocked higher than the FSB, which can noticeably affect overall performance. To investigate this, we used a Gigabyte GA-6VTX motherboard on the VIA Apollo Pro 133T chipset, which allows the external processor frequency and the RAM clock frequency to be set independently. Two processors with different external bus frequencies were installed in turn: a Celeron-566 (66 MHz FSB) and a Celeron-1000 (100 MHz FSB). The test suite is the one traditionally used by our magazine. All tests were run in a single-tasking operating system (the DOS mode of Windows 98 SE). Naturally, the cache memory was not disabled during the measurements, and it too had a significant (sometimes decisive - more on that below) effect on the results.

When reading, writing and transferring a 4-MB array, quite definite patterns emerged (see Table 2).

First, increasing the memory clock from 100 to 133 MHz with a 66 MHz FSB did not change the results. This is true not only for sequential access but for all the experiments performed. Whether the FSB simply cannot "digest" a memory clock more than 1.5 times its own, or the real clock stays at 100 MHz even though BIOS Setup shows 133, is hard to say.

Secondly, the speed of a significant part of operations depends only on the memory frequency, and not on the processor frequency.

Thirdly, the actually measured memory access speeds often turn out to be significantly lower than what might be expected based on the simplest estimates.

As an alternative to sequential access, random access can be used. Within a 32-MB area, a random address was computed, and then one byte was read from or written to that address (Fig. 1).

The extremely low exchange rate is due to two factors. First, setting up the initial address takes a long time (see the sidebar). Second, read and write operations are buffered by the cache, and exchange with it happens only in 32-byte portions; in other words, to read 1 byte from memory, 32 bytes have to be transferred into the cache. In conclusion, I note that the procedure for computing a random address of course also takes some time; nevertheless, with the same algorithm, once the amount of data is reduced to the size of the L2 cache, accesses proceed at more than 70 MB/s on the 1 GHz processor used.
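
The original test was, of course, written close to the hardware; the sketch below only illustrates the idea of comparing sequential and random access, and its absolute figures are dominated by interpreter overhead rather than by the memory itself:

```python
import random
import time

SIZE = 32 * 1024 * 1024             # 32-MB working area, as in the article
N = 1_000_000                       # number of single-byte accesses to time
buf = bytearray(SIZE)

# Sequential access: walk the buffer byte by byte.
t0 = time.perf_counter()
s = 0
for i in range(N):
    s += buf[i]
seq = time.perf_counter() - t0

# Random access: read one byte at a randomly computed address each time.
addrs = [random.randrange(SIZE) for _ in range(N)]
t0 = time.perf_counter()
for a in addrs:
    s += buf[a]
rnd = time.perf_counter() - t0

print(f"sequential: {N / seq / 1e6:.1f} M accesses/s")
print(f"random:     {N / rnd / 1e6:.1f} M accesses/s")
```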

The different nature of the dependence of performance on the frequency of the processor and memory for different types of applications can be seen in Fig. 2.

For consistency, the performance of a 566 MHz processor with 66 MHz memory is taken as unity. The curves represent the following test programs:

  1. Solution of a system of partial differential equations (PDE) on a 566 MHz processor with a data volume of 40 MB.
  2. PDE solution on a 1000 MHz processor under the same conditions.
  3. Finding prime numbers by the "sieve of Eratosthenes" method on a 566 MHz processor with 40 MB of arrays.
  4. The same sieve on a 1000 MHz processor with the same amount of arrays.
  5. Quicksort a 16MB array on a 566 MHz processor.
  6. Quicksort a 16MB array on a 1GHz processor.
  7. Finding the shortest path in a graph using Dijkstra's method on a 566-MHz processor. The size of the arrays is 300 KB (more than twice the size of the cache memory).
  8. Finding the shortest path in a graph on a 1 GHz processor under the same conditions.

The data above show that most of the applications are primarily sensitive to the processor frequency. In Fig. 2 they correspond to horizontal curves near unity for 566 MHz and near 1.7-1.8 for 1000 MHz. The "sieve of Eratosthenes" turned out to be the most sensitive to the memory frequency, and with 66 MHz memory its performance was practically independent of the processor frequency. Graphs for such applications look like slanted lines: doubling the memory frequency brings the same increase in performance, while there is no dependence on the processor frequency. Solving the system of differential equations is something of a compromise: performance depends on the memory frequency, though much more weakly than in direct proportion, and at a memory frequency of 66 MHz the 1 GHz Celeron is only about a quarter faster than the 566 MHz one.
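
For reference, the "sieve" itself takes only a few lines of code. The sketch below is not the article's test program, just an illustration of the striding access pattern over a large array (40 MB, to match the data volume used in the tests) that makes this task memory-bound:

```python
def sieve(limit: int) -> int:
    """Count primes below `limit` with the sieve of Eratosthenes."""
    flags = bytearray([1]) * limit                  # one byte per number
    flags[0:2] = b"\x00\x00"                        # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            # Crossing out multiples strides through the whole array,
            # which is what makes the test sensitive to memory speed.
            flags[p * p::p] = bytes(len(range(p * p, limit, p)))
    return flags.count(1)

# Roughly a 40-MB array, comparable to the data volume used in the article.
print(sieve(40 * 1024 * 1024))
```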

I would like to make a comparison using one more problem as an example, as well as explore the effect of cache memory.

In those distant days, when computers occupied several floors of a building and were used exclusively for scientific calculations, a lot of attention was already being paid to assessing performance. True, there were no memory-related delays then, and floating-point calculations were considered the most difficult. It was for such calculations that the Dongarra test - the solution of a system of linear equations - was written. The results of a certain generalization of this test are shown in Fig. 3. It now turns out that the floating-point calculations themselves can be performed much faster than the results of those calculations can be written to memory.

The low performance on small arrays is explained by the fact that modern superscalar processors with a pipelined architecture handle a continuous sequence of instructions well, while loops and procedure calls fare somewhat worse, and the overhead of these operations grows as the array size shrinks. Until the amount of data reaches the cache size, performance increases and is determined solely by the processor frequency. Once the data no longer fits in the cache, we see a sharp drop in performance, reaching tenfold. In the transition region the curves first diverge and then converge again, but grouped quite differently - by memory frequency. The processor clock frequency loses its role, and the memory frequency comes to the fore.
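
The shape of these curves is easy to reproduce qualitatively: time the same total amount of work over working sets of increasing size and watch the throughput fall once the set no longer fits in the cache. A rough sketch (NumPy is an assumed third-party dependency, and the drop will be much softer here than in the compiled test):

```python
import time

import numpy as np  # assumed to be installed

TOTAL = 64 * 1024 * 1024                     # total elements touched per run
for size_kb in (64, 256, 1024, 4096, 16384, 65536):
    n = size_kb * 1024 // 8                  # float64 elements in the working set
    a = np.ones(n)
    passes = max(1, TOTAL // n)
    t0 = time.perf_counter()
    for _ in range(passes):
        a.sum()                              # stream through the working set
    dt = time.perf_counter() - t0
    print(f"{size_kb:6d} KB: {passes * n * 8 / dt / 1e6:8.0f} MB/s")
```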

Fortunately, a significant share of real-world applications never reach data volumes at which the processor clock speed ceases to matter. The texture processing required for each frame is taken over by the GPU, where the frequencies and bus widths are entirely different. And the rest of the bulk data, be it a video film, an archive or a multi-page document, is as a rule processed in small portions that fit into the cache. On the other hand, the cache also has to be filled, and a performance drop of "only" two or three times, or even a few tens of percent, instead of ten times is hardly a comfort.

One more conclusion follows from the measurements: if the chipset allows asynchronous operation of the processor and memory, this neutralizes the performance difference caused by the different FSB frequencies of, for example, the Celeron and the Pentium III.

From the editor: Although the conclusions obtained can be safely attributed only to motherboards based on the VIA Apollo 133T chipset, on the whole, this approach can be applied to assess the efficiency of switching to the 533 MHz bus in modern motherboards.

How dynamic memory works

The central part of a dynamic memory chip is an M x N array of capacitors, where M and N are usually powers of two. Each capacitor can be in one of two states, charged or discharged, and thus stores 1 bit of information.

The address is transferred to the memory chip in two stages: first the lower half of the address is latched in the address register by the RAS signal (row address strobe), and then the upper half by the CAS signal (column address strobe). When reading, the latched lower half of the address is fed to the row decoder and from there to the matrix, as a result of which an entire row of matrix capacitors is connected to the inputs of the sense amplifiers. In the process of reading, the capacitors are discharged, which means the information in the matrix is lost. To prevent this, the row of data just read is written back into the same row of matrix capacitors - the memory is refreshed. By the time the row from the matrix has reached the sense amplifier buffer, the upper half of the address has already been delivered to the column decoder, and with its help the single bit of information stored at the address latched in the address register is selected. After that the data can be fed to the chip's output. When writing, the entire row is likewise read first, the required bit is changed in it, and the row is written back to the same place. Widths of 1, 2, 4 or 8 bytes are achieved by operating several memory chips, or several matrices within one chip, in parallel.

As you can see, accessing a dynamic memory cell requires many sequential operations, so the access time turns out to be quite long - today it is 35-50 ns, which corresponds to 5-7 clock cycles of the external bus.

Memory operating as described above (DRAM - Dynamic Random Access Memory) was used in the first personal computers; back then a single chip held up to 64 Kbit. But if address operations inevitably take a long time, is there any way around this limitation? After all, the processor often needs long runs of bytes stored in memory consecutively, for example when executing a sequence of instructions or when processing strings and data arrays. A solution was found: after the address of the first element is transferred to the chip, several subsequent elements are read using control bus signals alone, without transferring a new address, which turned out to be roughly twice as fast. This memory was called FPM RAM (Fast Page Mode) and for a long time remained the only type of RAM used in personal computers. Sequences of numbers were used to describe the timing of such memory: for example, "7-3-3-3" meant that it took 7 system bus cycles to get the first piece of data and 3 cycles for each of the following ones. However, the growing gap between the processor clock frequency and the system bus frequency on the one hand, and advances that reduced the number of cycles the execution unit spends per operation on the other, raised the question of further improving RAM technology.

The next step was EDO RAM (Extended Data Out) - memory in which the output of the current piece of data could be overlapped with the "request" for the next one. This shaved one step off the access pattern: "6-2-2-2". However, EDO RAM was very soon replaced by SDRAM (synchronous DRAM); thanks to bank interleaving the access pattern was reduced to "5-1-1-1-1-1-1-1". A certain marketing ploy was applied at the same time: while for memory such as FPM and EDO RAM it was customary to quote the first-access time, which was 60-80 ns, for SDRAM the time of the second and subsequent accesses began to be quoted instead, which at the same clock frequencies - and therefore a similar first-access time - was already 10-12 ns. The performance of the memory subsystem thus grew by tens of percent, while the numbers describing the access time shrank several-fold.
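
The digits in these timing designations translate directly into a rough bandwidth estimate. A small sketch (the 64-bit, 66 MHz bus is an assumed example configuration, not a figure taken from the article):

```python
# Cycles per burst, as written in the timing designations above, and the
# bandwidth they imply on an assumed 64-bit, 66 MHz system bus.
BUS_MHZ = 66
BUS_BYTES = 8

timings = {
    "FPM   7-3-3-3":          [7, 3, 3, 3],
    "EDO   6-2-2-2":          [6, 2, 2, 2],
    "SDRAM 5-1-1-1-1-1-1-1":  [5, 1, 1, 1, 1, 1, 1, 1],
}

for name, cycles in timings.items():
    transfers, total = len(cycles), sum(cycles)
    mb_s = transfers * BUS_BYTES * BUS_MHZ / total
    print(f"{name:24s}: {total:2d} cycles for {transfers} transfers, "
          f"about {mb_s:4.0f} MB/s")
```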

SDRAM is still the main memory type for Intel Pentium III and Celeron processors. Alongside it, newer developments can also be used: DDR RAM (more precisely DDR SDRAM, but we will use the shorter designation), employed mainly with AMD Athlon and Duron processors, which runs at the same frequencies (100-133 MHz) but can transfer data up to twice per clock, on the rising and falling edges (hence the notion of an effective frequency, in this case 200-266 MHz); and RDRAM (Rambus RAM), aimed at Pentium 4 systems and running at 300-533 MHz (an effective frequency of 600-1066 MHz).

While for SDRAM (now often called SDR DRAM) the designations PC-100 and PC-133 were adopted, meaning the ability to run at 100 and 133 MHz respectively, for the newer types of memory - say PC-2100 - the number no longer denotes a frequency above 2 GHz, but only the "peak" data transfer rate. The word "peak" is in quotation marks because, no matter how ideal the measurement conditions, the resulting ratio of data transferred to time spent will not only never equal this number, it will not even tend to it asymptotically. The point is that this rate is quoted for the part of the burst with the first piece of data cut off - that is, as with SDRAM, only for the "second and subsequent" accesses. For DDR RAM the first-access time is the same as for SDRAM, and the subsequent accesses take half as long. Hence with sequential access the performance gain is tens of percent, and with random access there is no gain at all.
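
Where the PC-2100 figure comes from, and why real transfers never reach it, can be shown with the same kind of arithmetic; the 5-cycle first-access latency below is an assumed illustrative value, not a specification:

```python
# "Peak" rate behind the PC-2100 label: a 64-bit DDR bus at 133 MHz,
# two transfers per clock.
bus_mhz, bus_bytes = 133, 8
peak = bus_mhz * bus_bytes * 2
print(f"peak rate: {peak} MB/s")             # ~2128 MB/s, hence "PC-2100"

# A real burst still pays the first-access latency (assumed here to be
# 5 bus cycles) before the 8 transfers arrive at 2 per cycle.
first_access_cycles, transfers = 5, 8
cycles = first_access_cycles + transfers / 2
real = transfers * bus_bytes * bus_mhz / cycles
print(f"burst of {transfers} transfers: {real:.0f} MB/s")
```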

Recent Windows versions include a built-in feature that rates the performance of different PC components. This gives an overview of the system's performance and bottlenecks, but you will not find any details about the speed parameters of the components there. Nor does this diagnostic allow stress testing the hardware, which is useful for understanding peak loads when running modern games. Third-party benchmarks of the 3DMark family likewise give only an approximate characterization in arbitrary points. Meanwhile, it is no secret that many hardware manufacturers optimize the operation of video cards and other components precisely so as to score the maximum number of points in 3DMark. The program even lets you compare the performance of your hardware with similar configurations from its database, but you will not get specific values.

PC testing should therefore be carried out component by component, taking into account not only the benchmark's performance score but also the real specifications recorded when the hardware is examined. We have selected a set of utilities (both paid and free) that produce specific results and help identify the weak links.

Image processing speed and 3D

Testing video cards is one of the most important steps in assessing the power of a PC. Manufacturers of modern video adapters supply them with special software and drivers that allow the GPU to be used not only for image processing but also for other calculations, such as video encoding. Therefore the only reliable way to find out how efficiently computer graphics are processed is to resort to a special application that measures the device's performance.

Checking the stability of the video card

Program: FurMark 1.9.1 Website: www.ozone3d.net FurMark is one of the fastest and easiest tools for testing a video adapter. The utility tests the performance of the video card using OpenGL. The rendering algorithm uses multi-pass rendering, with each layer based on GLSL (the OpenGL Shading Language).

To load the graphics card processor, this benchmark renders an abstract 3D picture with a torus covered in fur. The need to process a large amount of hair leads to the maximum possible load on the device. FurMark checks the stability of the video card, and also shows the changes in the temperature of the device with increasing load.

In the FurMark settings you can specify the resolution at which the hardware test will run, and upon completion the program presents a short report on the PC configuration with a final score in arbitrary points. This value is useful for a general comparison of the performance of several video cards. You can also run the test at the stock resolutions of 1080p and 720p.

Virtual Stereo Walk

Program: Unigine Heaven DX11 Benchmark Website: www.unigine.com One of the surest ways to test what a new computer can do is to run games on it. Modern games make full use of hardware resources - graphics card, memory and processor. However, not everyone has the opportunity or desire to spend time on such entertainment. As an alternative, you can use the Unigine Heaven DX11 Benchmark. The test is based on the Unigine game engine (games such as Oil Rush, Dilogus: The Winds of War and Syndicates of Arkon are built on it), which supports the DirectX 9, 10 and 11 and OpenGL graphics APIs. After launch, the program creates a demo visualization, rendering a virtual environment in real time: the user sees a short video of a virtual walk through a fantasy world. These scenes are created by the graphics card. In addition to 3D objects, the engine simulates complex lighting, modelling a global illumination system with multiple reflections of light rays from elements of the scene.

Computer testing can be performed in stereo mode, and in the benchmark settings it is possible to select a 3D video image standard: anaglyph 3D, separate frame output for the right and left eyes, etc.

Despite the fact that the name of the program mentions the eleventh version of DirectX, this does not mean that Unigine Heaven is intended only for modern video cards. In the settings of this test, you can select one of the earlier versions of DirectX, as well as set an acceptable level of picture detail and specify the rendering quality of shaders.

Weak link detection

When a user is seized by the desire to speed up his computer, the question arises: which component is the weakest? What will make the computer faster - replacing the video card or the processor, or installing more RAM? To answer this question, you need to test the individual components and identify the "weak link" in the current configuration. A versatile multi-test utility will help you find it.

Load simulator

Program: PassMark PerformanceTest Website: www.passmark.com PassMark PerformanceTest analyzes virtually any device in your PC configuration, from motherboard and memory to optical drives.

A feature of PassMark PerformanceTest is that it uses a large number of different tasks, scrupulously measuring the computer's performance in different situations. At some point it may even seem that someone has taken control of the system: windows open by themselves, their contents scroll, images appear on the screen. All of this is the work of the benchmark, which simulates the tasks most commonly performed in Windows. Along the way it checks the data compression speed, records the time required to encrypt information, applies filters to photographs, measures the speed of rendering vector graphics, plays short demo 3D clips, and so on.

At the end of testing, PassMark PerformanceTest gives a total score in points and offers to compare this result with the data obtained on a PC with different configurations. For each of the tested parameters, the application creates a diagram on which weak computer components are very clearly visible.

Disk system check

Disk system bandwidth can be the bottleneck in PC performance. Therefore, it is extremely important to know the real characteristics of these components. Testing the hard drive will not only determine its read and write speeds, but also show how reliably the device works. We recommend trying two small utilities to test your drive.

Exams for HDD

Programs: CrystalDiskInfo and CrystalDiskMark Website: http://crystalmark.info/software/index-e.html These programs are created by one developer and complement each other perfectly. Both of them are free and can work without installation on a computer, directly from a USB flash drive.

Most hard drives support the SMART self-diagnostic technology, which makes it possible to predict impending drive failures. The CrystalDiskInfo program lets you assess the real state of your HDD in terms of reliability: it reads the SMART data and reports the number of problem sectors, the number of read-head positioning errors, the disk spin-up time and the current temperature of the device. If the latter is too high, the drive's remaining life may be very short. The program also shows the firmware version and how long the hard disk has been in use.

CrystalDiskMark is a small application that measures write and read speeds. This disk-checking tool differs from similar utilities in that it allows different conditions for writing and reading data - for example, measuring speeds for blocks of different sizes. The utility also lets you set the number of test runs and the amount of data used for them.

Speedometer for web surfing

The real speed of a network connection usually differs from that indicated in its settings or declared by the provider - as a rule, downward. Many factors can affect the data transfer speed: the level of electromagnetic interference in the room, the number of users working on the network simultaneously, the quality of the cable, and so on.

Estimating network speed

Program: SpeedTest Website: www.raccoonworks.com If you want to know the actual data transfer rate of your local network, the SpeedTest program will help you. It allows you to determine whether the provider adheres to the declared parameters. The utility measures the speed of data transfer between two users' work machines, as well as between a remote server and a personal computer.

The program consists of two parts, a server and a client. To measure the speed of data transfer from one computer to another, the first user starts the server part and specifies an arbitrary (preferably large) file that will be used for the test. The second participant starts the client component and specifies the server's parameters - address and port. The two applications establish a connection and begin exchanging data. During the transfer, SpeedTest plots a graph and collects statistics on the time it took to copy the data over the network. If you test several remote PCs, the program adds new curves to the plot.
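
The idea behind such a measurement is simple: one side streams a known amount of data, the other divides the bytes received by the elapsed time. A self-contained sketch (both roles run in one process over the loopback interface purely for illustration; SpeedTest itself works between machines and measures a real network path):

```python
import socket
import threading
import time

PORT = 50007                        # arbitrary test port (an assumption)
PAYLOAD = 64 * 1024 * 1024          # 64 MB of test data

def server():
    with socket.create_server(("127.0.0.1", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            conn.sendall(bytes(PAYLOAD))    # stream the whole payload

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                     # give the server a moment to start listening

received = 0
t0 = time.perf_counter()
with socket.create_connection(("127.0.0.1", PORT)) as sock:
    while True:
        chunk = sock.recv(1 << 20)
        if not chunk:
            break
        received += len(chunk)
dt = time.perf_counter() - t0
print(f"{received / dt / 1e6:.0f} MB/s over loopback")
```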

In addition, SpeedTest will check the speed of the Internet: in the "Web Page" mode, the program tests the connection to any site. This parameter can also be estimated by going to the specialized resource http://internet.yandex.ru.

Malfunctions of RAM may not show up immediately, but only under certain loads. To be sure that the selected modules will not let you down in any situation, it is better to test them thoroughly and choose the fastest ones.

Memory Olympiad

Program: MaxxMEM2 - PreView Website: www.maxxpi.net This program is designed to test memory speed. In a very short time it performs several tests: it measures the time taken to copy data in RAM, determines read and write speeds, and shows the memory latency. In the utility's settings you can set the test priority, and the result can be compared with the values obtained by other users. From the program menu you can quickly go to the online statistics on the official MaxxMEM2 website and see which memory is the most productive.
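
The core of such a measurement fits in a few lines: copy a large block repeatedly and divide the bytes moved by the time taken. A rough sketch (its figures will be well below what MaxxMEM2 reports, since the interpreter adds its own overhead):

```python
import time

SIZE = 256 * 1024 * 1024            # a 256-MB block, larger than any cache
src = bytearray(SIZE)

best = float("inf")
for _ in range(5):                  # keep the best of several runs
    t0 = time.perf_counter()
    dst = bytes(src)                # one full copy of the block
    best = min(best, time.perf_counter() - t0)

print(f"copy bandwidth: about {SIZE / best / 1e6:.0f} MB/s")
```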

Speed is not important for sound

When testing most devices, processing speed is usually important. But with regard to a sound card, this is not the main indicator. It is much more important for the user to check the characteristics of the analog and digital audio paths - to find out how much the sound is distorted during playback and recording, to measure the noise level, etc.

Comparison with the reference

Program: RightMark Audio Analyzer 6.2.3 Website: http://audio.rightmark.org The creators of this utility offer several ways to check audio performance. The first option is self-diagnosis of the sound card: the device plays a test signal through its audio path and immediately records it. The waveform of the recorded signal should ideally match the original; deviations indicate that the sound is being distorted by the audio card installed in your PC.

The second and third test methods are more accurate: they use a reference signal generator or an additional sound card. In both cases the quality of the signal source is taken as the reference, although the additional devices also introduce some error of their own. When a second audio card is used, its output distortion should be minimal - it should have better characteristics than the card under test. At the end of the test you can also determine parameters such as the audio card's frequency response, its noise level, its harmonic distortion, and so on.

In addition to the basic functions available in the free edition, the more powerful version of RightMark Audio Analyzer 6.2.3 PRO also includes support for the professional ASIO interface, four times more detailed spectrum resolution and the ability to use Kernel Streaming direct data transmission.

It is important that no one interferes

When running any performance test, bear in mind that many factors affect the final results, especially background services and applications. For the most accurate assessment of the PC it is therefore recommended to first disable the anti-virus scanner and close all running applications, including the mail client. And, of course, to avoid measurement errors, you should stop working until the program has finished testing the hardware.

Good day!

Nothing foreshadowed trouble. But then a problem arrived: the speed of some application became unacceptably low, even though a week / month / day ago everything was fine. It has to be solved quickly, spending as little time as possible. The problem server runs Windows Server 2003 or later.

I hope the following notes will be reasonably short and clear, and useful both to novice administrators and to more seasoned colleagues, since there is always something new to learn. Don't rush to investigate the application's behavior right away. First of all, check whether the server has enough capacity at the moment. Are there any bottlenecks limiting its performance?

This is where perfmon, a powerful tool that ships with Windows, comes in. Let's start by defining a "bottleneck": a resource that has reached the limit of its use. Bottlenecks usually arise from improper resource planning, hardware problems, or incorrect application behavior.

If we open perfmon, we will see dozens and hundreds of counters of all kinds, and their sheer number does not help a quick investigation. So, to begin with, let's single out five main potential bottlenecks to narrow down the list of counters to examine.

These are the processor, RAM, the storage subsystem (HDD/SSD), the network, and processes. Below we consider each of these areas: which counters we need and the threshold values for them.

CPU

An overloaded processor is clearly not conducive to fast application performance. To study its resources, we will single out only four counters:

Processor\% Processor Time

Measures the percentage of time the processor spends doing work rather than sitting idle - the most self-explanatory counter, CPU load. MS recommends moving to a faster processor if the value stays above 85%. But this depends on many factors; you need to know your own needs and workload, since this threshold can vary.

Processor\% User Time

Shows how much time the processor spends in user mode. If the value is large, applications are taking up a lot of processor time; it is worth taking a look at them, since they will soon need optimizing.

Processor\% Interrupt Time

Measures the share of time the processor spends servicing hardware interrupts. This counter can point to "iron" problems: MS recommends starting to worry if the value exceeds 15%. It means that some device has begun responding very slowly to requests and should be checked.

System\Processor Queue Length

Shows the number of threads queued and waiting for processor time. MS recommends thinking about a processor with more cores if this value exceeds the number of cores multiplied by two.
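
These thresholds can also be checked from a script rather than the perfmon GUI. A rough sketch using the psutil library (an assumption: psutil is a third-party package, not part of Windows, and its figures only approximate the perfmon counters; the processor queue length has no direct psutil equivalent):

```python
import psutil  # assumed to be installed separately

# Rough equivalents of % Processor Time, % User Time and % Interrupt Time,
# sampled over a 5-second interval.
cpu = psutil.cpu_times_percent(interval=5)
busy = 100.0 - cpu.idle

print(f"% Processor Time ~ {busy:.1f}%")
print(f"% User Time      ~ {cpu.user:.1f}%")
interrupt = getattr(cpu, "interrupt", None)   # reported on Windows builds
if interrupt is not None:
    print(f"% Interrupt Time ~ {interrupt:.1f}%")

if busy > 85:
    print("above the 85% threshold mentioned above")
```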

RAM

A lack of RAM can severely affect overall system performance, forcing the system to swap actively to a slow HDD. But even if a server seems to have plenty of RAM installed, memory can "leak". A memory leak is an uncontrolled decrease in the amount of free memory caused by bugs in programs. It is also worth noting that in Windows the amount of virtual memory is the sum of the RAM and the paging file.

Memory\% Committed Bytes in Use

Shows the virtual memory usage. If the value has exceeded 80%, then you should think about adding RAM.

Memory\Available Mbytes

Shows the amount of available RAM in megabytes. If less than about 5% of the total remains available, you should again think about adding RAM.

Memory\Free System Page Table Entries

The number of free entries in the page table. It is limited; moreover, pages of 2 MB and larger are gaining popularity these days in place of the classic 4 KB, which does not make for a large number of entries. A value below 5000 may indicate a memory leak.

Memory\Pool Nonpaged Bytes

The size of this pool - a region of kernel memory that holds important data and cannot be paged out. If the value exceeds 175 MB, a memory leak is the most likely cause; this is usually accompanied by event 2019 entries in the system log.

Memory\Pool Paged Bytes

Similar to the previous counter, but this area can be paged out to disk when it is not in use. For this counter, values above 250 MB are considered critical and are usually accompanied by event 2020 entries in the system log. This, too, points to a memory leak.

Memory\Pages per Second

The number of page-file requests (reads and writes) per second caused by the required data not being in RAM. Again, a value over 1000 hints at a memory leak.
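
A similar sketch for the memory counters, again with psutil as an assumed dependency (committed bytes and the pool sizes have no direct psutil equivalents, so only available memory and page-file usage are approximated here):

```python
import psutil  # assumed to be installed separately

vm = psutil.virtual_memory()
sw = psutil.swap_memory()

print(f"available RAM:    {vm.available / 2**20:.0f} MB")
print(f"RAM in use:       {vm.percent:.0f}%")
print(f"page file in use: {sw.percent:.0f}%")

if vm.available < 0.05 * vm.total:
    print("less than 5% of RAM is available - consider adding memory")
```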

HDD

An important element that can make a significant contribution to overall system performance.

LogicalDisk\% Free Space

The percentage of free space. Only partitions containing system files are of interest here - the OS, the paging file(s), and so on. MS recommends taking care to increase disk space if less than 15% is free, since under critical loads it can suddenly run out (temporary files, Windows updates, or the same paging file). But, as they say, "it depends", and you need to look at the space actually available: the paging file may be fixed in size, the temp directories may have quotas that stop them growing, and updates arrive in portions and rarely, or not at all.

PhysicalDisk\% Idle Time

Shows how much of the time the disk is idle. It is recommended to replace the disk with a faster one if this counter drops below 20%.

PhysicalDisk\Avg. Disk Sec/Read

The average time the disk takes to read data. Above 25 ms is already bad; for SQL Server and Exchange, 10 ms or less is recommended. The remedy is the same as above.

PhysicalDisk\Avg. Disk Sec/Write

The same as Avg. Disk Sec/Read, but for writes. The critical threshold is likewise 25 ms.

PhysicalDisk\Avg. Disk Queue Length

Shows the average number of I/O operations waiting for the disk to become available. It is recommended to start worrying if this number is more than twice the number of spindles in the system (without RAID arrays, the number of spindles equals the number of hard disks). The advice is the same: a faster disk subsystem.

Memory\Cache Bytes

The amount of memory used for the system cache, part of which is the file cache. A volume of more than 300 MB may indicate an HDD performance problem or the presence of an application that makes heavy use of the cache.
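
For the disk counters, a rough psutil-based approximation (the drive letter C:\ is an assumption, and the average latencies are derived from cumulative I/O counters rather than read from perfmon):

```python
import psutil  # assumed to be installed separately

# LogicalDisk\% Free Space for the system drive (path is an assumption).
usage = psutil.disk_usage("C:\\")
print(f"free space: {100 - usage.percent:.1f}%")

# Average time per read and write, a stand-in for Avg. Disk Sec/Read and /Write.
io = psutil.disk_io_counters()
if io and io.read_count and io.write_count:
    print(f"avg read:  {io.read_time / io.read_count:.1f} ms")
    print(f"avg write: {io.write_time / io.write_count:.1f} ms")
```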

Network

In the modern world you cannot get anywhere without it - a huge amount of data travels over the network.

Network Interface\Bytes Total/sec

The amount of data sent and received through the network adapter. A value above 70% of the interface bandwidth indicates a possible problem: either replace the card with a faster one, or add another one to take the load off the first.

Network Interface\Output Queue Length

Shows the number of packets waiting to be sent. If the value exceeds 2, you should think about replacing the card with a faster one.
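
Bytes Total/sec can be approximated the same way, by sampling the per-interface counters over an interval and comparing the rate with the link speed the adapter reports (psutil is again an assumed dependency):

```python
import time

import psutil  # assumed to be installed separately

INTERVAL = 5
before = psutil.net_io_counters(pernic=True)
time.sleep(INTERVAL)
after = psutil.net_io_counters(pernic=True)
stats = psutil.net_if_stats()

for nic, b in before.items():
    a = after[nic]
    rate = (a.bytes_sent + a.bytes_recv - b.bytes_sent - b.bytes_recv) / INTERVAL
    st = stats.get(nic)
    line = f"{nic}: {rate / 2**20:.2f} MB/s"
    if st and st.speed:                       # speed is 0 if the driver hides it
        line += f" ({rate * 8 / (st.speed * 1e6) * 100:.0f}% of {st.speed} Mbit/s)"
    print(line)
```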

Processes

Server performance can degrade catastrophically if there is an unoptimized application or the application starts to behave "incorrectly".

Process\Handle Count

The number of handles opened by the process - these can be files, registry keys and so on. More than 10,000 of them may indicate that the application is not working properly.

Process\Thread Count

The number of threads within the process. It is worth taking a closer look at the application's behavior if the difference between their minimum and maximum number exceeds 500.

Process\Private Bytes

Shows the amount of memory allocated to the process that cannot be shared with other processes. If this indicator fluctuates by more than 250 between its minimum and maximum, it points to a possible memory leak.
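
And a sketch for the process counters (handle counts are exposed by psutil only on Windows, and the resident set size is used here merely as a stand-in for Private Bytes, so treat the output as indicative):

```python
import psutil  # assumed to be installed separately

rows = []
for p in psutil.process_iter(["name", "num_threads", "memory_info"]):
    try:
        handles = p.num_handles() if hasattr(p, "num_handles") else 0  # Windows only
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    mem, threads = p.info["memory_info"], p.info["num_threads"]
    if mem is None or threads is None:
        continue
    rows.append((handles, threads, mem.rss, p.info["name"] or "?"))

for handles, threads, rss, name in sorted(rows, reverse=True)[:10]:
    flag = "  <-- over 10,000 handles" if handles > 10_000 else ""
    print(f"{name:30.30s} handles={handles:6d} threads={threads:4d} "
          f"rss={rss / 2**20:6.0f} MB{flag}")
```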

Most of the counters above do not give an unambiguous indication that a bottleneck has appeared in the system. The values given are based on averaged results and can vary widely between systems. To use these counters correctly, we need to know at least what the indicators look like when the system is running normally. This is called baseline performance - a perfmon log taken from a running, freshly installed system (the latter is optional; it is never too late to capture such a log, or to track changes in baseline performance over the long term) that has no problems. This is an important point, often overlooked, although in the future it can seriously reduce system downtime and noticeably speed up the analysis of the data obtained from these counters.

Taken from https://ru.intel.com/business/community/?automodule=blog&blogid=57161&sh...


Recently, various IT bloggers have gained popularity. With all due respect to them, in chasing profit from hype they have produced a lot of odd information, which various users then repeat without really understanding what they are talking about.

Genuinely simple explanations, meanwhile, are nowhere to be found: either far too much is written, in language too complex for ordinary mortals, or only scraps slip through, which have to be collected from foreign forums, resources and so on.

So I decided to write a series of posts about how games work in general, what affects what, what it means for a processor to "reveal" a video card, and so on - and to describe it all in the simplest and most accessible language possible.

A.1. "How does it work? Simpler, please!"

So that even simple things do not look like gibberish later on, let's figure out what a "game" is and how the device shows it to us.

The game is essentially a 3D application. And like any application, it is initially assembled from different "pieces", like "Lego".

By analogy we get:


1) The processor is the main brain: it builds vertices and calculates physics (in most cases). To continue the analogy, it is the one who reads out the assembly instructions.

2) The video card covers everything with textures, adds effects and "makes it pretty" - by analogy, the one who assembles the model according to the instructions being read out.

3) The hard disk stores the game files themselves - by analogy, the box holding the construction set.

4) RAM and video memory: RAM stores frequently used data, video memory stores textures. These are the pieces of the set that you take out and keep next to you so you don't have to reach into the box every time.

As we can see, every component of our device, be it a PC, console or even a smartphone, performs certain actions to launch our game. This, of course, is the most primitive representation, but this is already enough to understand how it works.

A.2. Will the processor reveal the graphics card?

There has been a lot of talk on this topic, including about whether such a concept exists at all. In my opinion it does, in a certain sense.

There is such a concept as a "bottleneck". Put simply, someone does their part slowly and the whole process stalls because of it. Returning to our analogies: either the instructions are being read out slowly, or the little video card cannot keep up assembling the "bricks", or the parts have simply been put too far away and you have to keep walking over to get them.

Now let's figure out how the processor and the video card are "friends" and who reveals whom?

Situation 1. Bottleneck - video card:


As a result, we get 15 frames per second on the screen. In this case the video card runs at full load and the processor at about half. This is the ideal situation - the one in which people say the processor "fully reveals" the video card. During the game the processor also has to handle various system programs and keep Skype / Viber / Teamspeak and much else running, so a small "margin" of processor time should remain.

What does this give us? On a PC we can lower the graphics settings so that the video card can assemble more "cars", that is, frames. This gives us more FPS in the game.

There is also a reverse option:


Here we also get 15 frames. This time the processor is working at full capacity while the video card is "idling" (resting). In this case they say the processor does not reveal the video card.

What does this give us? In this situation we cannot jump over our own head: we will not see more FPS than the processor can provide. But since our video card is resting, we can make it assemble not plain plastic bricks but metal ones with decals and rhinestones: in game-settings terms, we can raise the resolution, improve the effects and enable more advanced anti-aliasing - right up to the point where the card starts working at 100% while still producing the same 15 frames.
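
In numbers, the whole question of "who reveals whom" boils down to one line: the frame rate you see is the lower of what the processor and the video card can each deliver. A toy sketch (all figures are invented for illustration):

```python
def effective_fps(cpu_fps: float, gpu_fps: float) -> tuple[float, str]:
    """The visible frame rate is set by the slower of the two components."""
    return min(cpu_fps, gpu_fps), ("GPU" if gpu_fps < cpu_fps else "CPU")

# Situation 1: the video card is the bottleneck - lowering settings adds FPS.
print(effective_fps(cpu_fps=30, gpu_fps=15))   # (15, 'GPU')

# Situation 2: the processor is the bottleneck - raising settings is "free".
print(effective_fps(cpu_fps=15, gpu_fps=30))   # (15, 'CPU')
```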

A.2.1. So how do you decide which processor and video card to get?

The internet is full of hardware tests. When a video card is tested, ideal conditions are created for it so that nothing else holds it back; processor tests are done the same way.

What do we need for a game to run at 60 frames without problems? Let's take The Witcher 3 as an example - it has been tested on everything imaginable.

We need to determine which processor will let us see 60 frames in the game. Ideally we should take one with some margin, so the processor has headroom for background tasks.


As we can see, even a Phenom II is enough for this game: with it we will see 60 frames, provided the video card does not become the bottleneck. So let's see which card suits us:


What do we see? To play at maximum settings at 60 fps, we need a 980 or higher.

And now the most interesting part: in this particular game, at exactly these settings, the 980 will be fully revealed even by an old "hair dryer" (as the Phenom is nicknamed). So when you ask "will my processor reveal such-and-such a video card", just look at what FPS your processor achieves in the games you are interested in, and then look at what FPS the video card can deliver.

In the second part I plan to talk about the hard drive, SSD, RAM and video memory and their effect on games.

P.S. Thank you for reading. This is my first blog post, so constructive criticism is welcome. If you find any inaccuracies, errors, etc., write in the comments and I will fix them.

The theory of constraints was formulated in the 1980s and concerned the management of industrial enterprises. In brief, its essence is that every production system contains constraints that hold back its efficiency. If you remove the key constraint, the system will improve far more than if you try to influence the entire system at once. The process of improving production should therefore start with removing bottlenecks.

Now the term bottleneck can be used in any industry - services, software development, logistics, everyday life.

What is a bottleneck

A bottleneck is defined as a place in a production system where congestion occurs because material arrives faster than it can be processed. It is often a station with less capacity than the node before it. The term comes from the analogy with the narrow neck of a bottle, which slows the liquid's way out.


A bottleneck is the narrow point of a production process

In manufacturing, the bottleneck effect causes downtime and production costs, reduces overall efficiency and increases shipping times to customers.

There are two types of bottlenecks:

  1. Short-term bottlenecks are caused by temporary problems. A good example is sick leave or vacation of key employees: no one on the team can fully replace them, and work stops. In production this may be the breakdown of one machine in a group, when its load has to be redistributed among the equipment that still works.
  2. Long-term bottlenecks operate constantly. An example is the perpetual delay of monthly reports in a company because one person has to process a huge amount of information that descends on him like an avalanche at the very end of the month.

How to identify bottleneck in a manufacturing process

There are several ways of varying complexity to find a bottleneck in production, with and without special tools. Let's start with the simpler, observational methods.

Queues and congestion

The process on a production line that accumulates the longest queue of work-in-progress items in front of it is usually the bottleneck. This way of finding the bottleneck suits piece-by-piece conveyor production, such as a bottling line: it is plainly visible where bottles accumulate along the line and which mechanism has insufficient capacity, breaks down frequently, or is operated by an inexperienced worker. If there are several points of congestion on the line, the situation is more complicated and additional methods must be used to find the most critical bottleneck.

Bandwidth

The throughput of the entire production line depends directly on the output of the bottleneck equipment. This characteristic helps you find the main bottleneck of the manufacturing process: increasing the output of a piece of equipment that is not a bottleneck will not significantly affect the overall output of the line. By checking the equipment one piece at a time, you can identify the bottleneck - that is, the step whose increase in capacity will do the most for the output of the entire process.

Full power

Most production lines track the utilization percentage of each piece of equipment. Machines and stations have a fixed capacity and are used at some percentage of it during production. The station running at the highest percentage of its capacity is the bottleneck: it limits the utilization of the other equipment. Increase the bottleneck's capacity and the capacity of the whole line increases.
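
Expressed in code, this "full power" check is simply a search for the station with the highest utilization. A sketch with invented figures:

```python
# Hourly demand placed on each station versus its maximum capacity
# (all numbers are made up for illustration).
stations = {
    "cutting":   {"demand": 90, "capacity": 120},
    "molding":   {"demand": 90, "capacity": 95},
    "painting":  {"demand": 90, "capacity": 150},
    "packaging": {"demand": 90, "capacity": 110},
}

for s in stations.values():
    s["utilization"] = s["demand"] / s["capacity"]

bottleneck = max(stations, key=lambda n: stations[n]["utilization"])
print(f"bottleneck: {bottleneck} "
      f"({stations[bottleneck]['utilization']:.0%} of capacity)")

# The line as a whole cannot produce more than its bottleneck can handle.
print(f"line capacity: {min(s['capacity'] for s in stations.values())} units/hour")
```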

Waiting

The manufacturing process also involves downtime and waiting. When there is a bottleneck on the line, the equipment immediately downstream of it stands idle for long periods: the bottleneck delays production, so the next machine does not receive enough material to run continuously. When you find a machine with a long wait time, look for the bottleneck at the previous step.

In addition to monitoring production, the following tools are used to identify bottlenecks:

Value Stream Mapping - a value stream map

Once you have figured out the cause or causes of the bottleneck, action needs to be taken to widen it and increase production. You may need to move employees to the problem area, hire additional personnel, or purchase equipment.

Bottlenecks can occur where operators reconfigure equipment to produce a different product. In this case, you need to think about how to reduce downtime. For example, change the production schedule to reduce the number of changeovers or reduce their impact.

How to reduce the impact of bottlenecks

Bottleneck management suggests that manufacturing companies use three approaches to reduce the impact of bottlenecks.

First approach

Increasing the capacity of existing bottlenecks.

There are several ways to increase the capacity of bottlenecks:

  1. Add resources to the constraining process. You don't necessarily have to hire new employees: cross-functional training of personnel can reduce the impact of bottlenecks at little cost. In this case workers serve several stations at once, helping work pass through the bottleneck.
  2. Ensure a smooth flow of parts to the bottleneck. Always keep an eye on the work in progress ahead of the bottleneck, manage the supply of resources to the bottleneck station, and take overtime into account, during which the equipment must also always have parts to process.
  3. Make sure the bottleneck only works with quality parts. Don't waste the bottleneck's capacity and time on processing scrap. Place quality-control points in front of bottleneck stations; this will increase the throughput of the process.
  4. Check the production schedule. If the process makes several different products that require different amounts of bottleneck time, adjust the schedule so that the total demand on the bottleneck decreases.
  5. Increase the operating time of the limiting equipment. Let the bottleneck equipment run longer than the rest: assign an operator to keep it going during lunch breaks, planned downtime and, if necessary, overtime. While this will not shorten the cycle time, it keeps the bottleneck working while the rest of the equipment stands idle.
  6. Reduce downtime. Avoid planned and unplanned stoppages. If the bottleneck equipment breaks down during a shift, immediately send a repair team to get it fixed and running again. Also try to shorten the changeover time from one product to another.
  7. Refine the process at the bottleneck itself. Use VSM to eliminate activities that add no value and to shorten the time spent adding value by cutting out waste. You end up with a shorter cycle time.
  8. Offload the bottleneck. If possible, split the operation into parts and assign some of them to other resources. The result is a shorter cycle time and more capacity.


Second approach

Selling the spare capacity of non-bottleneck equipment.

For example, you have 20 injection presses on your line but use only 12 of them, because the bottleneck equipment cannot process the output of all 20. In this case you can find other companies interested in subcontracting injection-molding work to you. You will come out ahead, because what you earn from this subcontract work exceeds your variable costs.


Third approach

Reducing unused capacity.

The third option for optimizing production is to sell off equipment with surplus capacity and to reduce or relocate the personnel who operate it. The capacity of all equipment will then be balanced.


Bottleneck examples outside production

Transport

A classic example is traffic jams, which can constantly form in certain places, or appear temporarily during an accident or road works. Other examples are a sluice on a river, a forklift, a railway platform.

Computer networks

A slow WiFi router connected to an efficient high bandwidth network is a bottleneck.

Communication

A developer who spends six hours a day in meetings and only writes code for two hours.

Software

The application also has bottlenecks - these are the code elements on which the program "slows down", making the user wait.

Computer hardware

Computer bottlenecks are hardware constraints in which the power of the entire system is limited by a single component. The processor is often the component that limits the video card.

Bureaucracy

In everyday life we often run into bottlenecks - for example, when blank passports or driver's licenses suddenly run out and the whole system grinds to a halt, or when you need to have a medical examination and the fluorography room is open only three hours a day.

Verdict

Bottlenecks in production, management and life are points of potential improvement.

The bottleneck expansion will provide tangible performance and efficiency gains.

And ignoring the limiting elements of the system means earning less profit and working below your capabilities.
