How to track GPU usage in Windows Task Manager. Graphics processor: what is it and why is it used? How it works

17.09.2020

The integrated GPU plays an important role for both gamers and undemanding users.

The quality of games, movies, online video playback, and images depends on it.

Principle of operation

The graphics processor is integrated into the computer's motherboard - this is what integrated graphics looks like.

As a rule, it is used to remove the need to install a separate graphics adapter.

This technology helps to reduce the cost of the finished product. In addition, due to their compactness and low power consumption, such processors are often installed in laptops and low-power desktop computers.

Thus, integrated GPUs have filled this niche so much that 90% of laptops on US store shelves have such a processor.

Instead of the dedicated memory on a conventional video card, integrated graphics usually borrows part of the computer's system RAM.

However, this solution somewhat limits the device's performance, since the CPU and the GPU share the same memory bus.

So this “neighborhood” affects the performance of tasks, especially when working with complex graphics and during gameplay.

Types

Integrated graphics fall into three groups:

  1. Shared memory graphics - a device that shares memory management with the main processor. This significantly reduces cost and improves energy efficiency, but degrades performance. Accordingly, for those working with complex programs, this kind of integrated GPU is likely to be unsuitable.
  2. Discrete graphics - a video chip and one or two video memory modules soldered onto the motherboard. This technology significantly improves image quality and makes it possible to work with 3D graphics with better results. True, you will have to pay a lot for this, and if you are looking for a processor that is powerful in all respects, the cost can be incredibly high. In addition, the electricity bill will rise slightly - the power consumption of discrete GPUs is higher than usual.
  3. Hybrid discrete graphics - a combination of the two previous types, made possible by the PCI Express bus. Memory is accessed both through the soldered video memory and through system RAM. With this, manufacturers wanted to create a compromise solution, but it still does not eliminate the drawbacks of either approach.

Manufacturers

As a rule, the development and manufacture of integrated graphics processors is handled by large companies, but many smaller enterprises are also involved in this area.

Enable

Enabling the integrated video card is not difficult to do. In the BIOS, find Primary Display or Init Display First. If you don't see something like that, look for Onboard, PCI, AGP or PCI-E (it all depends on the buses installed on the motherboard).

By choosing PCI-E, for example, you enable the PCI Express video card and disable the built-in integrated one.

Thus, to enable the integrated video card, you need to find the appropriate parameters in the BIOS. The start-up process is often automatic.

Disable

Disabling is best done in the BIOS. This is the simplest and most straightforward option, suitable for almost all PCs. The only exceptions are some laptops.

Again, find Peripherals or Integrated Peripherals in BIOS if you are working on a desktop.

For laptops, the name of the function is different, and not always the same. So just find something related to graphics. For example, the required options can be placed in the Advanced and Config sections.

Disabling is also done in different ways. Sometimes it is enough to select “Disabled” and put the PCI-E video card first in the list.

If you are a laptop user, do not be alarmed if you cannot find a suitable option - your firmware may simply not offer such a function. For all other devices the rule is simple: no matter how the BIOS itself looks, the underlying settings are the same.

If you have two video cards and both of them are shown in Device Manager, then the matter is quite simple: right-click one of them and select “Disable”. However, keep in mind that the display may go blank. Most likely it will.

However, this is a solvable problem. It is enough to restart the computer or software.

Perform all subsequent settings on the remaining video card. If this method does not work, roll back your actions using Safe Mode. You can also resort to the previous method - through the BIOS.

Two programs - NVIDIA Control Panel and Catalyst Control Center - let you configure the use of a specific video adapter.

They are the least risky compared with the other two methods: the screen is unlikely to turn off, and you will not accidentally lose settings as you might in the BIOS.

For NVIDIA, all settings are in the 3D section.

You can choose your preferred video adapter for the entire operating system and for certain programs and games.

In Catalyst software, the same function is located in the Power option under the Switchable Graphics sub-item.

Thus, switching between GPUs is not difficult.

There are different methods, in particular, both through programs and through BIOS. Enabling or disabling one or another integrated graphics may be accompanied by some failures, mainly related to the image.

The screen may go blank or show distortion. Nothing should affect the files on the computer itself, unless you have changed something else in the BIOS.

Conclusion

As a result, integrated graphics processors are in demand due to their low cost and compactness.

For this you will have to pay with the level of performance of the computer itself.

In some cases, integrated graphics are not enough - discrete processors are ideal for working with 3D images.

The industry leaders are Intel, AMD and Nvidia. Each of them offers its own graphics accelerators, processors and other components.

Among the recent popular models are the Intel HD Graphics 530 and the AMD A10-7850K. They are quite functional, but they have some flaws - in particular in bandwidth, performance and the cost of the finished product.

You can enable or disable an integrated or discrete graphics processor yourself through the BIOS, utilities and various programs, but the computer may well do it for you. It all depends on which video card the monitor is connected to.

Modern devices use a graphics processor, also referred to as a GPU. What is it and what is its principle of operation? The GPU (Graphics Processing Unit) is a processor whose main task is to process graphics and floating-point calculations. The GPU takes load off the main processor when it comes to heavy games and applications with 3D graphics.

What is it?

The GPU renders graphics, textures and colors. A CPU has a few cores that run at high clock speeds; a GPU has many cores that run mostly at lower speeds. These cores handle pixel and vertex calculations - the latter are mainly transformations in the coordinate system. The graphics processor handles the various tasks involved in creating a three-dimensional space on the screen, that is, a scene in which objects move.

Principle of operation

What does a GPU do? It handles 2D and 3D graphics processing. Thanks to the GPU, the computer can complete important tasks faster and more easily. The peculiarity of the GPU is that it maximizes calculation speed for this kind of work: its architecture allows it to process visual information more efficiently than the computer's central processor (CPU).

It is responsible for positioning 3D models in the frame. In addition, it filters the triangles that make up each model: it determines which ones are in sight and removes those hidden behind other objects. It draws light sources and determines how these light sources affect color. The graphics processor (what it is is described in this article) then assembles the image and displays it to the user on the screen.

Efficiency

What keeps the GPU working efficiently? Temperature. Overheating is one of the main problems with PCs and laptops, and the main reason a device and its components fail early. GPU problems start when the processor temperature exceeds 65 °C. In this case, users notice that the processor starts to throttle, skipping clock cycles in order to lower the temperature on its own.

A temperature of 65-80 °C is critical: an emergency system reboot starts, or the computer turns off by itself. It is important for the user to make sure the GPU temperature does not exceed 50 °C. A temperature of 30-35 °C at idle and 40-45 °C under many hours of load is considered normal. The lower the temperature, the better the performance of the computer. The motherboard, video card, case and hard drives each have their own temperature limits.

Many users also wonder how to reduce the processor's temperature in order to increase its efficiency. First you need to find the cause of the overheating. It may be a clogged cooling system, dried-out thermal paste, malware, processor overclocking or immature BIOS firmware. The simplest thing a user can do is replace the thermal paste on the processor itself and clean the cooling system. Experts also advise installing a more powerful cooler, improving air circulation in the system unit and increasing the rotation speed of the graphics adapter's fan. All computers and GPUs follow the same temperature-reduction scheme: monitor the device and clean it in time.

Specificity

The GPU is located on the video card, and its main task is to handle 2D and 3D graphics. If a GPU is installed in the computer, the device's main processor does not perform unnecessary work and therefore runs faster. The key feature of the graphics processor is that it is built to increase the speed of processing objects and textures, that is, graphics data. Its architecture lets it process visual information far more efficiently than an ordinary CPU can.

Types

What is a GPU? It is a component of the video card. There are two main types of chips: integrated and discrete. Experts say that the discrete one copes with its task better. It is installed as a separate module, since it is more powerful, but it needs good cooling. Almost all computers have an integrated graphics processor. It is built into the CPU to keep power consumption several times lower. It cannot compare with a discrete chip in power, but it has decent characteristics and delivers good results.

Computer graphics

What's this? This is the name of the field of activity in which computer technology is used to create images and process visual information. Modern computer graphics, including scientific, allows you to graphically process the results, build diagrams, graphs, drawings, and also perform various kinds of virtual experiments.

Technical products are created using constructive graphics. There are other types of computer graphics:

  • animation;
  • multimedia;
  • artistic;
  • advertising;
  • illustrative.

From a technical point of view, computer graphics are two-dimensional and three-dimensional images.

CPU and GPU: the difference

What is the difference between these two designations? Many users are aware that the CPU and the GPU (described above) perform different tasks. Moreover, they differ in their internal structure. The CPU and the GPU have many similarities, but they are made for different purposes.

The CPU executes a specific chain of instructions in a short amount of time. It is made in such a way that it forms several chains at the same time: it splits the stream of instructions into many, executes them, then merges them back into one whole in a specific order. Each instruction in a thread depends on the ones that came before it, so the CPU contains a small number of execution units, and the main priority is given to execution speed and to reducing downtime. All of this is accomplished using a pipeline and cache memory.

The GPU has a different primary function - rendering visual effects and 3D graphics. It works more simply: it receives polygons at the input, performs the necessary logical and mathematical operations, and outputs pixel coordinates. The GPU's job is handling a large stream of similar tasks. Its peculiarity is high throughput at a lower per-unit speed compared to the CPU; modern GPUs have more than 2000 execution units. CPUs and GPUs also differ in how they access memory: a GPU does not need a large cache, but it does have much higher memory bandwidth. In simple terms, the CPU makes decisions in accordance with the tasks of the program, while the GPU performs many identical calculations.

Basic components of the video card:

  • outputs;
  • interfaces;
  • cooling system;
  • graphics processor;
  • video memory.

Graphic technologies:

  • glossary;
  • GPU architecture: features
    vertex / pixel units, shaders, fill rate, texture / raster units, pipelines;
  • GPU architecture: technology
    process technology, GPU frequency, local video memory (size, bus, type, frequency), multi-GPU solutions;
  • visual functions
    DirectX, high dynamic range (HDR), full-screen anti-aliasing, texture filtering, high-definition textures.

Glossary of basic graphic terms

Refresh Rate

Just like in a movie theater or TV, your computer simulates motion on a monitor by displaying a sequence of frames. The refresh rate of the monitor indicates how many times per second the picture will be refreshed on the screen. For example, 75 Hz corresponds to 75 updates per second.

If the computer renders frames faster than the monitor can display them, problems can appear in games. For example, if the computer renders 100 frames per second while the monitor's refresh rate is 75 Hz, the monitor may show parts of two different frames within a single refresh period. As a result, visible artifacts (tearing) appear.

As a solution, you can enable V-Sync (vertical sync). It limits the number of frames emitted by the computer to the refresh rate of the monitor, preventing artifacts. If you enable V-Sync, then the number of frames rendered in the game will never exceed the refresh rate. That is, at 75 Hz, the computer will output no more than 75 frames per second.
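
To make the arithmetic concrete, here is a minimal sketch (in Python, with illustrative numbers) of the simple model described above: with V-Sync enabled, the displayed frame rate is capped at the monitor's refresh rate.

```python
# A minimal sketch of the V-Sync model described above; numbers are illustrative.
def displayed_fps(rendered_fps: float, refresh_hz: float, vsync: bool) -> float:
    # With V-Sync, the computer never outputs more frames than the monitor can show.
    return min(rendered_fps, refresh_hz) if vsync else rendered_fps

print(displayed_fps(100, 75, vsync=False))  # 100 -> mismatched frames, visible tearing
print(displayed_fps(100, 75, vsync=True))   # 75 -> output locked to the refresh rate
```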

Pixel

The word "Pixel" stands for " picture element "is an image element. It is a tiny dot on the display that can glow in a specific color (in most cases, a hue is derived from a combination of three basic colors: red, green, and blue). If the screen resolution is 1024 × 768, then you can see a matrix of 1024 pixels in width and 768 pixels in height. Together, the pixels make up the image. The picture on the screen is updated from 60 to 120 times per second, depending on the type of display and the data produced by the output of the video card. CRT monitors update the display line by line, while flat panel LCD monitors can update each pixel individually.

Vertex

All objects in the 3D scene are composed of vertices. A vertex is a point in three-dimensional space with coordinates X, Y and Z. Several vertices can be grouped into a polygon: most often it is a triangle, but more complex shapes are also possible. Then a texture is applied to the polygon, which makes the object look realistic. The 3D cube shown in the illustration above has eight vertices. More complex objects have curved surfaces, which actually consist of a very large number of vertices.
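
As a small illustration of the idea, the eight vertices of a cube can be written down as (X, Y, Z) coordinates and grouped into triangles; the sketch below (Python, purely hypothetical data) shows one way to store them.

```python
# Eight vertices of a unit cube, each an (X, Y, Z) point in 3D space.
cube_vertices = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),  # bottom face
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),  # top face
]

# Vertices are grouped into polygons; one square face needs two triangles,
# given here as indices into cube_vertices.
front_face_triangles = [(0, 1, 2), (0, 2, 3)]
```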

Texture

A texture is simply a 2D image of any size that is superimposed on a 3D object to simulate its surface. For example, our 3D cube has eight vertices. Before texture mapping, it looks like a simple box. But when we apply the texture, the box becomes colored.

Shader

Pixel shader programs allow the graphics card to produce impressive effects, such as the water in The Elder Scrolls IV: Oblivion.

Today there are two types of shaders: vertex and pixel. Vertex shaders can modify or transform 3D objects. Pixel shaders allow you to change the colors of pixels based on incoming scene data. Imagine a light source in a 3D scene that makes the illuminated objects glow brighter while at the same time casting shadows on other objects. All of this is realized by changing the color information of the pixels.

Pixel shaders are used to create complex effects in your favorite games. For example, shader code can make the pixels surrounding the 3D sword glow brighter. Another shader can process all the vertices of a complex 3D object and simulate an explosion. Game developers are increasingly using sophisticated shaders to create realistic graphics. Almost every modern game with rich graphics uses shaders.
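
Real shaders are written in GPU languages such as HLSL or GLSL, but as a rough, CPU-side sketch of the idea, the hypothetical function below brightens a pixel's color depending on its distance from a light source - conceptually what a simple pixel shader might do.

```python
# Conceptual sketch only: brighten pixels near a light source, the way a simple
# pixel shader might. This runs on the CPU and is not actual shader code.
def shade_pixel(color, pixel_pos, light_pos, radius=50.0):
    dx = pixel_pos[0] - light_pos[0]
    dy = pixel_pos[1] - light_pos[1]
    distance = (dx * dx + dy * dy) ** 0.5
    glow = max(0.0, 1.0 - distance / radius)  # 1.0 at the light, fading to 0.0
    return tuple(min(255, int(c * (1.0 + glow))) for c in color)

print(shade_pixel((100, 120, 200), pixel_pos=(10, 10), light_pos=(12, 14)))
```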

With the release of the next Microsoft DirectX 10 Application Programming Interface (API), a third type of shader called geometry shaders will be released. With their help, it will be possible to break objects, modify and even destroy them depending on the desired result. The third type of shader can be programmed in the same way as the first two, but its role will be different.

Fill Rate

Very often on the box with a video card, you can find the fill rate value. Basically, the fill rate indicates how fast the GPU can deliver pixels. In older video cards, you could find the triangle fill rate. But today there are two types of fill rates: pixel fill rate and texture fill rate. As mentioned, the pixel fill rate corresponds to the pixel output rate. It is calculated as the number of raster operations (ROP) multiplied by the clock frequency.

ATi and nVidia calculate texture fill rates differently. nVidia obtains the rate by multiplying the number of pixel pipelines by the clock speed, while ATi multiplies the number of texture units by the clock speed. In principle, both methods are correct, since nVidia uses one texture unit per pixel shader unit (that is, one per pixel pipeline).
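
As a quick illustration of these formulas, here is a minimal sketch in Python; the unit counts and clock speed are made up for the example, not taken from any real card.

```python
# Fill-rate formulas from the text; all numbers are illustrative.
rop_units = 16          # raster operation units (ROPs)
tmu_units = 16          # texture mapping units (TMUs)
core_clock_hz = 500e6   # 500 MHz GPU clock

pixel_fill_rate = rop_units * core_clock_hz    # pixels written per second
texture_fill_rate = tmu_units * core_clock_hz  # texels sampled per second

print(pixel_fill_rate / 1e9, "Gpixel/s")    # 8.0
print(texture_fill_rate / 1e9, "Gtexel/s")  # 8.0
```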

With these definitions in mind, let me move on and discuss the most important GPU functions, what they do, and why they are so important.

GPU architecture: features

The realism of 3D graphics is highly dependent on the performance of the video card. The more pixel shader blocks the processor contains and the higher the frequency, the more effects can be applied to a 3D scene to improve its visual perception.

The GPU contains many different functional blocks. By the number of some components, you can estimate how powerful the GPU is. Before moving on, let me review the most important functional blocks.

Vertex processors (vertex shader units)

Like pixel shader units, vertex processors execute shader code that operates on vertices. Since a larger vertex budget allows you to create more complex 3D objects, the performance of vertex processors is very important in 3D scenes with complex objects or a large number of them. However, vertex shader units still do not affect performance as visibly as pixel processors do.

Pixel Processors (Pixel Shader Units)

A pixel processor is a component of the graphics chip dedicated to processing pixel shader programs. These processors perform pixel-only calculations. Because pixels contain color information, pixel shaders can achieve impressive graphical effects. For example, most of the water effects you've seen in games are created using pixel shaders. Typically, the number of pixel processors is used to compare the pixel performance of video cards. If one card is equipped with eight pixel shader units, and the other with 16 units, then it is quite logical to assume that a video card with 16 units will process complex pixel programs faster. The clock speed should also be considered, but today doubling the number of pixel processors is more energy efficient than doubling the frequency of the graphics chip.

Unified shaders

Unified (uniform) shaders have not yet arrived in the PC world, but the upcoming DirectX 10 standard relies on a similar architecture. That is, the structure of the code of vertex, geometric and pixel programs will be the same, although the shaders will do different jobs. The new spec can be viewed on the Xbox 360, where the GPU was specially designed by ATi for Microsoft. It will be quite interesting to see what potential the new DirectX 10 has.

Texture Mapping Units (TMU)

Textures need to be fetched and filtered. This work is done by the texture mapping units, which work in conjunction with the pixel and vertex shader units. The TMU's job is to apply texture operations to pixels. The number of texture units in a GPU is often used to compare the texture performance of video cards. It is quite reasonable to assume that a video card with a higher number of TMUs will deliver higher texture performance.

Raster Operation Units (ROPs)

ROPs are responsible for writing pixel data into memory. The rate at which this operation is performed is the fill rate. In the early days of 3D accelerators, ROPs and fill rates were very important characteristics of graphics cards. Today, ROP performance is still important, but the performance of a video card is no longer limited by these blocks as it once was. Therefore, the performance (and number) of ROPs is rarely used anymore to estimate the speed of a video card.

Pipelines

Pipelines are used to describe the architecture of video cards and provide a fairly visual representation of the performance of the GPU.

A pipeline is not a strict technical term. The GPU uses different pipelines to perform different functions. Historically, a pipeline was understood as a pixel processor connected to its own texture mapping unit (TMU). For example, the Radeon 9700 video card uses eight pixel processors, each of which is connected to its own TMU, so the card is considered to have eight pipelines.

But it is very difficult to describe modern processors by the number of pipelines. Compared to previous designs, new processors use a modular, fragmented structure. ATi can be considered the innovator here: with the X1000 line of video cards it switched to a modular structure, which allowed performance gains through internal optimization. Some GPU blocks are used more than others, and to improve GPU performance, ATi tried to balance the number of blocks needed against die area (which cannot be made too large). In this architecture, the term "pixel pipeline" has lost its meaning, since pixel processors are no longer connected to their own TMUs. For example, the ATi Radeon X1600 GPU has 12 pixel shader units and a total of four TMUs. Therefore, one cannot say that this processor's architecture has 12 pixel pipelines, just as one cannot say that it has only four. However, by tradition, pixel pipelines are still mentioned.

Taking these assumptions into account, the number of pixel pipelines in a GPU is often used to compare video cards (with the exception of the ATi X1x00 line). For example, if we take video cards with 24 and 16 pipelines, then it is quite reasonable to assume that a card with 24 pipelines will be faster.

GPU architecture: technology

Technical process

This term refers to the size of one element (transistor) of the chip and the precision of the manufacturing process. Improving the process technology yields smaller elements: for example, the 0.18-micron process produces larger elements than the 0.13-micron process, so it is not as efficient. Smaller transistors operate at lower voltages, and lower voltage means less heat is generated. Improving the process also reduces the distance between the chip's functional blocks, so data transfer takes less time. Shorter distances, lower voltages, and other improvements allow higher clock speeds to be achieved.

The understanding is somewhat complicated by the fact that today both micrometers (μm) and nanometers (nm) are used to denote the technical process. In fact, everything is very simple: 1 nanometer is equal to 0.001 micrometer, so 0.09-micron and 90-nm technical processes are one and the same. As noted above, a smaller process technology allows you to get higher clock speeds. For example, if we compare video cards with 0.18 micron and 0.09 micron (90 nm) chips, then it is quite reasonable to expect a higher frequency from a 90 nm card.

GPU clock speed

GPU clock speeds are measured in megahertz (MHz), which is millions of clock cycles per second.

The clock speed directly affects the performance of the GPU. The higher it is, the more work can be done in a second. For the first example, let's take the nVidia GeForce 6600 and 6600 GT graphics cards: the 6600 GT GPU runs at 500 MHz, while the regular 6600 card runs at 400 MHz. Since the processors are technically identical, the 25% increase in the 6600 GT's clock speed translates into better performance.

But clock speed is not everything. It should be borne in mind that architecture greatly affects performance. For the second example, let's take the GeForce 6600 GT and GeForce 6800 GT video cards. The 6600 GT has a GPU frequency of 500 MHz, but the 6800 GT runs at only 350 MHz. Now take into account that the 6800 GT uses 16 pixel pipelines, while the 6600 GT uses only eight. Therefore, a 6800 GT with 16 pipelines at 350 MHz will give about the same performance as a processor with eight pipelines and twice the clock speed (700 MHz). With that said, clock speed can only be used to compare performance between cards with a similar architecture.
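
The comparison above can be reduced to a rough back-of-the-envelope estimate - pixel throughput is roughly pipelines multiplied by clock speed. The sketch below treats that as the whole story, which of course ignores every other architectural difference.

```python
# Rough first approximation only: throughput ~ pipelines * clock (MHz).
def pixel_throughput(pipelines: int, clock_mhz: float) -> float:
    return pipelines * clock_mhz

print(pixel_throughput(16, 350))  # GeForce 6800 GT: 5600
print(pixel_throughput(8, 700))   # hypothetical 8-pipeline chip at 700 MHz: 5600
print(pixel_throughput(8, 500))   # GeForce 6600 GT: 4000
```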

Local video memory

Video card memory has a huge impact on performance. But different memory parameters affect differently.

Video memory size

The amount of video memory can probably be called the most overrated parameter of a video card. Inexperienced consumers often use the amount of video memory to compare cards, but in reality the volume has little effect on performance compared to parameters such as the memory bus frequency and interface (bus width).

In most cases, a card with 128 MB of video memory will perform almost the same as a card with 256 MB. Of course, there are situations where more memory leads to increased performance, but remember that more memory will not automatically lead to an increase in speed in games.

Where volume is useful is in games with high resolution textures. Game developers provide several texture sets for the game. And the more memory there is on the video card, the higher resolution the loaded textures can have. High resolution textures give higher clarity and detail in the game. Therefore, it is quite reasonable to take a card with a large amount of memory if all other criteria are the same. Let us remind you once again that the memory bus width and its frequency affect performance much more strongly than the amount of physical memory on the card.

Memory bus width

Memory bus width is one of the most important aspects of memory performance. Modern buses are 64 to 256 bits wide, and in some cases even 512 bits. The wider the memory bus, the more information it can transmit per clock cycle. And this directly affects performance. For example, if we take two buses with equal frequencies, then theoretically a 128-bit bus will transfer twice as much data per clock as a 64-bit bus. And the 256-bit bus is twice as large.

Higher bus bandwidth (expressed in bits or bytes per second, 1 byte = 8 bits) results in higher memory performance. That is why the memory bus width is much more important than the memory size. At equal frequencies, a 64-bit memory bus operates at only 25% of the speed of a 256-bit one!

Let's take the following example. A video card with 128 MB of video memory, but with a 256-bit bus, gives much higher memory performance than a 512 MB model with a 64-bit bus. It is important to note that for some ATi X1x00 cards the manufacturers indicate the specifications for the internal memory bus, but we are interested in the parameters of the external bus. For example, the X1600's internal ring bus is 256 bits wide, but the external one is only 128 bits wide. And in reality, the memory bus operates at 128-bit performance.
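
The effect of bus width can be shown with a simple back-of-the-envelope calculation - peak bandwidth is roughly the bus width in bytes multiplied by the effective memory frequency. The sketch below uses made-up frequencies just to illustrate the 256-bit versus 64-bit comparison.

```python
# Peak bandwidth ~ (bus_width_bits / 8) * effective_frequency; numbers are illustrative.
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    return (bus_width_bits / 8) * effective_mhz * 1e6 / 1e9

print(bandwidth_gb_s(256, 1000))  # 128 MB card, 256-bit bus: 32.0 GB/s
print(bandwidth_gb_s(64, 1000))   # 512 MB card, 64-bit bus: 8.0 GB/s (25% of the above)
```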

Memory types

Memory can be divided into two main categories: SDR (single data transfer) and DDR (double data transfer), in which data is transferred twice as fast per clock. Today, SDR single transmission technology is obsolete. Since DDR memory transfers data twice as fast as SDR memory, it is important to remember that video cards with DDR memory are usually indicated at twice the frequency, and not the physical one. For example, if DDR memory is listed as 1000 MHz, then this is the effective frequency that regular SDR memory must operate at to give the same bandwidth. In fact, the physical frequency is 500 MHz.

For this reason, many are surprised when the frequency of 1200 MHz DDR is indicated for the memory of their video card, and the utilities report 600 MHz. So you have to get used to it. DDR2 and GDDR3 / GDDR4 memory works in the same way, that is, with twice the data transfer. The difference between DDR, DDR2, GDDR3 and GDDR4 lies in the manufacturing technology and some details. DDR2 can run at higher frequencies than DDR memory, and DDR3 can run even higher than DDR2.
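
A short sketch of that discrepancy, with illustrative numbers: the specification quotes the effective (doubled) rate, while a monitoring utility reports the physical clock.

```python
# DDR transfers data on both clock edges, so the advertised "frequency" is doubled.
physical_mhz = 600                 # what a monitoring utility might report
effective_mhz = physical_mhz * 2   # what the spec sheet lists: 1200 MHz DDR
print(effective_mhz)
```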

Memory bus frequency

Like a processor, memory (or, more accurately, the memory bus) operates at specific clock speeds, measured in megahertz. Here, increasing clock speeds directly affects memory performance. And the memory bus frequency is one of the parameters used to compare the performance of video cards. For example, if all other characteristics (memory bus width, etc.) are the same, then it is quite logical to say that a video card with 700 MHz memory is faster than a 500 MHz one.

Again, clock speed isn't everything. A 700 MHz memory with a 64-bit bus will be slower than a 400 MHz memory with a 128-bit bus. The performance of 400 MHz memory on a 128-bit bus is roughly equivalent to 800 MHz memory on a 64-bit bus. It should also be remembered that the frequencies of the GPU and memory are completely different parameters, and they usually differ.

Graphics card interface

All data transferred between the video card and the processor goes through the video card interface. Today, three types of interfaces are used for video cards: PCI, AGP and PCI Express. They differ in bandwidth and other characteristics. It is clear that the higher the bandwidth, the higher the exchange rate. However, only the most modern cards can use high bandwidth, and even then only partially. At some point, the interface speed has ceased to be a "bottleneck", today it is simply enough.

The slowest bus for which video cards were produced is PCI (Peripheral Components Interconnect). If you do not go into history, of course. PCI really hurt the performance of video cards, so they switched to the AGP (Accelerated Graphics Port) interface. But even the AGP 1.0 and 2x specifications limited performance. When the standard increased the speed to AGP 4x, we began to approach the practical limit of the bandwidth that video cards can use. The AGP 8x specification doubled the bandwidth once again compared to AGP 4x (2.16 GB / s), but we did not get a tangible increase in graphics performance.

The newest and fastest bus is PCI Express. Newer graphics cards typically use PCI Express x16, which combines 16 PCI Express lanes for a total bandwidth of 4 GB / s (one way). This is twice the bandwidth of AGP 8x. The PCI Express bus provides the mentioned bandwidth for both directions (data transfer to and from the video card). But the speed of the AGP 8x standard was already sufficient, so we have not yet encountered a situation where the transition to PCI Express gave a performance increase compared to AGP 8x (if other hardware parameters are the same). For example, the AGP version of the GeForce 6800 Ultra will work identically to the 6800 Ultra for PCI Express.

Today it is best to buy a card with a PCI Express interface, it will hold out on the market for several more years. The most productive cards are no longer available with the AGP 8x interface, and PCI Express solutions, as a rule, are easier to find than AGP analogs, and they cost less.

Multi-GPU solutions

Using multiple graphics cards to increase graphics performance is not a new idea. In the early days of 3D graphics, 3dfx entered the market with a solution that allowed two graphics cards to run in parallel. But with the disappearance of 3dfx, the technology of pairing consumer video cards was consigned to oblivion, although ATi had produced similar systems for professional simulators since the release of the Radeon 9700. A couple of years ago the technology returned to the market with the advent of nVidia SLI and, a little later, ATi CrossFire.

Sharing multiple graphics cards provides enough performance to run the game at high quality settings in high definition. But choosing one solution or another is not so easy.

To begin with, solutions based on multiple video cards require a lot of energy, so the power supply must be powerful enough. All this heat will have to be removed from the video card, so you need to pay attention to the PC case and cooling so that the system does not overheat.

Also, remember that SLI/CrossFire requires an appropriate motherboard (for one technology or the other), which usually costs more than standard models. An nVidia SLI configuration will only work on certain nForce4 boards, and ATi CrossFire cards will only work on motherboards with a CrossFire chipset or on certain Intel models. To complicate matters further, some CrossFire configurations require one of the cards to be a special CrossFire Edition model. For some video card models, ATi later allowed CrossFire to work in software over the PCI Express bus, and with new driver releases the number of possible combinations keeps growing. Still, hardware CrossFire with a corresponding CrossFire Edition card gives better performance, but CrossFire Edition cards are also more expensive than regular models. For now, you can enable the software CrossFire mode (with no CrossFire Edition card) on Radeon X1300, X1600 and X1800 GTO graphics cards.

There are other factors to consider. While two graphics cards working together give a performance boost, it is far from double - yet you pay twice as much. Most often, the gain is 20-60%, and in some cases, due to the extra overhead of synchronization, there is no gain at all. For this reason, multi-card configurations are unlikely to pay off with cheaper models, since a single more expensive video card usually outperforms a pair of cheaper cards. In general, it makes no sense for most consumers to buy an SLI/CrossFire solution. But if you want to enable all the quality-enhancement options or play at extreme resolutions, for example 2560 × 1600, when you need to render more than 4 million pixels per frame, then you cannot do without two or four paired video cards.

Visual functions

In addition to purely hardware specifications, different generations and models of GPUs may differ in feature set. For example, it is often said that cards of the ATi Radeon X800 XT generation are compatible with Shader Model 2.0b (SM), while the nVidia GeForce 6800 Ultra is compatible with SM 3.0, although their hardware specifications are close to each other (16 pipelines). Therefore, many consumers make a choice in favor of one solution or another, without even knowing what this difference means.

Microsoft DirectX and Shader Model versions

These names are most often used in controversy, but few people know what they really mean. To understand, let's start with a history of graphics APIs. DirectX and OpenGL are graphical APIs, which are Application Programming Interfaces - open code standards available to everyone.

Before the advent of graphics APIs, each GPU manufacturer used its own mechanism for communicating with games. Developers had to write separate code for each GPU they wanted to support. A very expensive and ineffective approach. To solve this problem, APIs for 3D graphics were developed so that developers could write code for a specific API, and not for a particular video card. After that, compatibility problems fell on the shoulders of video card manufacturers, who had to ensure that the drivers would be API compatible.

The only complication is that today there are two different APIs, namely Microsoft DirectX and OpenGL, where GL stands for Graphics Library. Since the DirectX API is more popular in games today, we will focus on it. And this standard influenced the development of games more strongly.

DirectX is Microsoft's creation. In fact, DirectX includes several APIs, only one of which is used for 3D graphics. DirectX includes APIs for sound, music, input devices, and more. The Direct3D API is responsible for 3D graphics in DirectX. When they talk about video cards, they mean it, therefore, in this respect, the terms DirectX and Direct3D are interchangeable.

DirectX is updated periodically as graphics technology advances and game developers introduce new ways to program games. As the popularity of DirectX soared, GPU manufacturers began to tweak new product releases to match DirectX capabilities. For this reason, video cards are often tied to hardware support for one DirectX generation or another (DirectX 8, 9.0, or 9.0c).

To complicate matters, parts of the Direct3D API can change over time without changing DirectX generations. For example, the DirectX 9.0 specification specifies Pixel Shader 2.0 support. But the DirectX 9.0c update includes Pixel Shader 3.0. Thus, although the cards are classified as DirectX 9, they can support different sets of functions. For example, the Radeon 9700 supports Shader Model 2.0, and the Radeon X1800 supports Shader Model 3.0, although both cards can be attributed to the DirectX 9 generation.

Remember that when creating new games, developers take into account the owners of older machines and video cards, because ignoring this segment of users would lower sales. For this reason, several code paths are built into games. A DirectX 9 class game probably has a DirectX 8 path for compatibility, and often even a DirectX 7 path. Usually, if the old path is chosen, some of the visual effects available on new video cards disappear from the game. But at least you can play even on old hardware.

Many new games require the latest version of DirectX to be installed, even if the graphics card is from a previous generation. That is, a new game that will run along the DirectX 8 path on a DirectX 8 class video card still requires the latest version of DirectX 9 to be installed.

What are the differences between the different versions of the Direct3D API in DirectX? Early versions of DirectX - 3, 5, 6, and 7 - were relatively straightforward in terms of the Direct3D APIs. Developers could choose visual effects from a list and then test their performance in the game. The next important step in graphics programming was DirectX 8. It introduced the ability to program a video card using shaders, so developers for the first time got the freedom to program effects the way they want. DirectX 8 supported Pixel Shader 1.0 to 1.3 and Vertex Shader 1.0. DirectX 8.1, an updated version of DirectX 8, received Pixel Shader 1.4 and Vertex Shader 1.1.

In DirectX 9, you can create even more complex shader programs. DirectX 9 supports Pixel Shader 2.0 and Vertex Shader 2.0. DirectX 9c, an updated version of DirectX 9, includes the Pixel Shader 3.0 specification.

DirectX 10, the upcoming version of the API, will accompany the new version of Windows - Vista. DirectX 10 cannot be installed on Windows XP.

HDR lighting and OpenEXR HDR

HDR stands for High Dynamic Range, high dynamic range. Playing with HDR lighting can produce a much more realistic picture than playing without it, and not all graphics cards support HDR lighting.

Before the advent of DirectX 9 graphics cards, GPUs were seriously limited by the precision of their lighting calculations. Until then, lighting could only be calculated with 256 (8-bit) internal levels.

When DirectX 9 graphics cards were introduced, they were able to deliver high-fidelity lighting - a full 24 bits or 16.7 million levels.

With 16.7 million levels and taking the next step in DirectX 9 / Shader Model 2.0 graphics performance, HDR lighting is now possible on computers. This is a rather complex technology, and you need to watch it in dynamics. In simple terms, HDR lighting increases contrast (dark shades appear darker, light shades brighter), while at the same time increasing the amount of lighting detail in dark and light areas. Playing with HDR lighting feels livelier and more realistic than without it.

GPUs compliant with the latest Pixel Shader 3.0 specification allow for higher 32-bit precision lighting and floating point blending. Thus, video cards of the SM 3.0 class can support the special OpenEXR HDR lighting method, specially designed for the film industry.

Some games that only support HDR lighting via OpenEXR will not provide HDR lighting on Shader Model 2.0 graphics cards, while games that do not rely on the OpenEXR method will run on any DirectX 9 graphics card. For example, Oblivion uses the OpenEXR HDR method and only allows HDR lighting on the latest graphics cards that support the Shader Model 3.0 specification, such as the nVidia GeForce 6800 or ATi Radeon X1800. Games that use the Half-Life 2 engine, such as Counter-Strike: Source and the upcoming Half-Life 2: Aftermath, allow HDR rendering on older DirectX 9 graphics cards that only support Pixel Shader 2.0 - for example, the GeForce 5 or ATi Radeon 9500 series.

Finally, keep in mind that all forms of HDR rendering require significant processing power and can bring even the most powerful GPUs to their knees. If you want to play the latest games with HDR lighting, then high-performance graphics are essential.

Full screen anti-aliasing

Full-screen anti-aliasing (abbreviated AA) allows you to eliminate the characteristic "staircase" edges at the boundaries of polygons. However, it should be borne in mind that full-screen anti-aliasing consumes a lot of computing resources, which leads to a drop in frame rates.

Anti-aliasing is highly dependent on video memory performance, so a high-speed video card with fast memory will be able to render full-screen anti-aliasing with less damage to performance than an inexpensive video card. Anti-aliasing can be enabled in various modes. For example, 4x anti-aliasing will give a better picture than 2x anti-aliasing, but this will be a big performance hit. If 2x anti-aliasing doubles the horizontal and vertical resolution, 4x mode quadruples it.
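
To see why the performance hit grows with the anti-aliasing level, the sketch below assumes a simple supersampling model in which N-x anti-aliasing shades N samples for every displayed pixel; this is only an illustration of the scaling, not of any specific AA implementation.

```python
# Illustrative supersampling cost: N-x AA shades roughly N samples per pixel.
width, height = 1024, 768
for aa_level in (1, 2, 4):
    samples = width * height * aa_level
    print(f"{aa_level}x AA: {samples:,} samples per frame")
```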

Texture filtering

Textures are applied to all 3D objects in the game, and the larger the angle of the displayed surface, the more distorted the texture will look. To eliminate this effect, GPUs use texture filtering.

The first filtering method was called bilinear and produced characteristic banding that was not very pleasant to the eye. The situation improved with the introduction of trilinear filtering. Both options work on modern graphics cards with virtually no performance loss.

Today the best way to filter textures is anisotropic filtering (AF). Like full-screen anti-aliasing, anisotropic filtering can be enabled at different levels. For example, 8x AF gives better filtering quality than 4x AF. Like full-screen anti-aliasing, anisotropic filtering requires a certain amount of processing power, which increases as the AF level rises.

High resolution textures

All 3D games are built with specific requirements in mind, and one of those requirements determines the texture memory the game will need. All necessary textures must fit into the video card's memory during the game, otherwise performance will drop dramatically, since falling back to system RAM introduces considerable latency - not to mention the paging file on the hard disk. Therefore, if a game developer counts on 128 MB of video memory as the minimum requirement, the set of active textures should not exceed 128 MB at any time.
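
A minimal sketch of such a budget check, assuming uncompressed 32-bit RGBA textures (4 bytes per texel); real games use compressed texture formats, so the figures are only illustrative.

```python
# Hypothetical texture budget check; sizes assume uncompressed RGBA (4 bytes per texel).
def texture_size_mb(width: int, height: int, bytes_per_texel: int = 4) -> float:
    return width * height * bytes_per_texel / (1024 * 1024)

budget_mb = 128
active_textures = [(2048, 2048)] * 6 + [(1024, 1024)] * 20
used_mb = sum(texture_size_mb(w, h) for w, h in active_textures)
print(f"{used_mb:.0f} MB of the {budget_mb} MB budget")  # 176 MB -> over budget
```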

Modern games have several sets of textures, so the game will work without problems on older video cards with less video memory, as well as on newer cards with more video memory. For example, a game may contain three sets of textures: 128 MB, 256 MB, and 512 MB. There are very few games that support 512 MB of video memory today, but they are still the most objective reason for buying a video card with this amount of memory. While the increase in memory has little or no effect on performance, you will get better visual quality if the game supports the appropriate texture set.

What you need to know about video cards

In 2016, hopes for a full-fledged generational change in GPUs finally came true; that change had previously been held back by the lack of manufacturing capability needed to produce chips with significantly higher transistor density and clock speeds than the proven 28 nm process allowed. The 20 nm technology we had hoped for two years ago proved commercially unviable for chips as large as discrete GPUs. Since TSMC and Samsung, which could have served as contractors for AMD and NVIDIA, did not use FinFET at 20 nm, the potential increase in performance per watt over 28 nm was modest enough that both companies chose to wait for the mass adoption of 14/16 nm nodes, which already use FinFET.

However, the years of painful waiting have passed, and now we can evaluate how GPU manufacturers have used the capabilities of the updated technical process. As practice has shown once again, “nanometers” by themselves do not guarantee high energy efficiency of the chip, therefore the new architectures from NVIDIA and AMD turned out to be very different in this parameter. And an additional intrigue was introduced by the fact that companies no longer use the services of one factory (TSMC), as they did in previous years. AMD has chosen GlobalFoundries for its 14nm FinFET-based Polaris GPUs. NVIDIA, on the other hand, is still partnering with TSMC, which has a 16nm FinFET process, on all Pascal chips except the junior GP107 (which Samsung makes). It was the Samsung 14nm FinFET line that was licensed by GlobalFoundries at one time, so the GP107 and its rival Polaris 11 give us a convenient opportunity to compare the engineering achievements of AMD and NVIDIA on a similar manufacturing base.

However, let's not dive into the technical details prematurely. In general, the proposals of both companies based on the next generation GPU are as follows. NVIDIA has created a complete line of Pascal accelerators based on three consumer GPUs - the GP107, GP106, and GP104. However, the place of the flagship adapter, which will surely be named GeForce GTX 1080 Ti, is now vacant. A candidate for this position is a card with a GP102 processor, which so far is used only in the "pro-sumer" NVIDIA TITAN X accelerator. And finally, the main pride of NVIDIA is the GP100 chip, which the company, apparently, is not even going to implement in gaming products and left for Tesla computing accelerators.

AMD's success so far is more modest. Two processors of the Polaris family have been released, products based on which belong to the lower and middle categories of gaming graphics cards. The upper echelons will be occupied by the upcoming Vega GPUs, which are expected to feature a comprehensively redesigned GCN architecture (while Polaris does not differ significantly from the 28nm Fiji and Tonga chips in this regard).

NVIDIA Tesla P100 and the new TITAN X

Through the efforts of Jensen Huang, NVIDIA's permanent CEO, the company is already positioning itself as a maker of general-purpose computing processors, no less than a maker of gaming GPUs. A signal that NVIDIA is taking the supercomputing business more seriously than ever was the division of the Pascal GPU line into gaming positions on the one hand and computing positions on the other.

Once the 16nm FinFET process hit TSMC, NVIDIA pioneered the GP100 supercomputer chip, which debuted earlier than the consumer Pascal product line.

The GP100 is distinguished by an unprecedented number of transistors (15.3 billion) and shader ALUs (3840 CUDA cores). In addition, it is the first accelerator to be equipped with HBM2 memory (16 GB) combined with a GPU on a silicon substrate. The GP100 is used as part of the Tesla P100 accelerators, initially limited to the supercomputer field due to a special form factor with the NVLINK bus, but later NVIDIA released the Tesla P100 in a standard PCI Express expansion card format.

Experts initially speculated that the P100 could appear in gaming graphics cards. NVIDIA apparently did not rule out this possibility, since the chip has a full pipeline for rendering 3D graphics, but it is now clear that it is unlikely to ever go beyond the computing niche. For graphics, NVIDIA has a related product - the GP102, which has the same set of shader ALUs, texture mapping units and ROPs as the GP100, but lacks the ballast of a large number of 64-bit CUDA cores, not to mention other architectural changes (fewer schedulers, a trimmed-down L2 cache, and so on). The result is a more compact core (12 billion transistors) which, combined with abandoning HBM2 memory in favor of GDDR5X, allowed NVIDIA to take the GP102 to a wider market.

For now, the GP102 is reserved for the pro-sumer TITAN X accelerator (not to be confused with the GeForce GTX TITAN X based on the GM200 chip of the Maxwell architecture), which is positioned as a board for low-precision calculations (in the range from 8 to 32 bits, of which the 8- and 16-bit formats are favored in NVIDIA's deep learning work) even more than for games, although wealthy gamers can buy the video card for $1,200. Indeed, in our gaming tests, TITAN X does not justify its cost with a 15-20 percent advantage over the GeForce GTX 1080, but overclocking comes to the rescue: if we compare an overclocked GTX 1080 and TITAN X, the latter is already 34% faster. However, the upcoming GP102-based gaming flagship is likely to have fewer active computing units or lose support for some computing functions (or both).

In general, releasing such massive GPUs as the GP100 and GP102 at an early stage of the 16nm FinFET process is a great achievement for NVIDIA, especially considering the difficulties the company faced in the 40 and 28 nm period.

NVIDIA GeForce GTX 1070 and 1080

NVIDIA launched the line of GeForce 10-series gaming accelerators in its usual sequence - from the most powerful models to the more budgetary ones. The GeForce GTX 1080 and other subsequent Pascal gaming cards have most strikingly demonstrated that NVIDIA has taken full advantage of the 14/16 nm FinFET process to make the chips more dense and energy efficient.

In addition, by creating Pascal, NVIDIA not only improved performance in various computational tasks (as the GP100 and GP102 example showed), but also supplemented the Maxwell chip architecture with functions that optimize graphics rendering.

Let's briefly highlight the main innovations:

  • improved color compression with ratios of up to 8:1;
  • the Simultaneous Multi-Projection function of the PolyMorph Engine, which allows up to 16 projections of the scene geometry to be created in one pass (for VR and for multi-display systems in NVIDIA Surround configuration);
  • the ability to interrupt (preempt) a draw call during rendering and a command stream during computation, which, together with dynamic allocation of GPU computing resources, provides full support for asynchronous computing (Async Compute) - an additional source of performance in DirectX 12 games and of reduced latency in VR.

The last point is especially interesting, since Maxwell chips were technically compatible with asynchronous computations (simultaneous operation with computational and graphic command queues), but the performance in this mode left much to be desired. Asynchronous computations in Pascal work as they should, allowing more efficient GPU loading in games with a separate thread for physics calculations (although, admittedly, for NVIDIA chips, the problem of fully loading shader ALUs is not as acute as for AMD GPUs).

The GP104 processor used in the GTX 1070 and GTX 1080 is the successor to the GM204 (the second-tier chip in the Maxwell family), but NVIDIA achieved clock speeds so high that the GTX 1080 outperforms the GTX TITAN X (based on the larger GM200 GPU) by 29% on average, all within a more conservative thermal envelope (180 versus 250 W). Even the GTX 1070, which is cut down much more heavily than the GTX 970 was relative to the GTX 980 (and the GTX 1070 uses GDDR5 memory instead of the GTX 1080's GDDR5X), is still 5% faster than the GTX TITAN X.

NVIDIA has updated the display controller in Pascal, which is now compatible with DisplayPort 1.3/1.4 and HDMI 2.0b, which means it can output a picture with a higher resolution or refresh rate over a single cable - up to 5K at 60 Hz or 4K at 120 Hz. 10/12-bit color representation provides support for high dynamic range (HDR) on the few screens that have this capability. Pascal's dedicated hardware is capable of encoding and decoding HEVC (H.265) video at up to 4K resolution, 10-bit color (12-bit for decoding) and 60 Hz.

Finally, Pascal removes the limitations of the previous version of the SLI bus. The developers raised the frequency of the interface and released a new, two-channel bridge.

You can read more about these features of the Pascal architecture in our GeForce GTX 1080 review. However, before moving on to other innovations of the past year, it is worth mentioning that in the GeForce 10 lineup NVIDIA will for the first time sell reference-design cards throughout the life of the corresponding models. They are now called Founders Edition and sell for more than the recommended retail price of partner graphics cards. For example, the recommended prices for the GTX 1070 and GTX 1080 are $379 and $599 (already higher than the GTX 970 and GTX 980 at their launch), while the Founders Edition versions are priced at $449 and $699.

GeForce GTX 1050 and 1060

The GP106 chip has extended the Pascal architecture to the mass segment of gaming accelerators. Functionally, it is no different from the older models, and in terms of the number of computing units it is half of the GP104. True, the GP106, unlike the GM206 (which was half of the GM204), uses a 192-bit memory bus. In addition, NVIDIA removed the SLI connectors from the GTX 1060 board, upsetting fans of gradually upgrading the video subsystem: when this accelerator exhausts its capabilities, you cannot add a second video card to it (except in DirectX 12 games that can distribute the load between GPUs while bypassing the driver).

The GTX 1060 was originally equipped with 6 GB of GDDR5 and a fully functional GP106 chip, and went on sale for $249/$299 (partner cards and Founders Edition, respectively). But NVIDIA then released a version with 3 GB of memory, in which the number of computing units was also reduced, at a recommended price of $199. Both video cards have an attractive TDP of 120 W and are similar in speed to the GeForce GTX 970 and GTX 980.

The GeForce GTX 1050 and GTX 1050 Ti belong to the lowest category adopted by Pascal architecture. But no matter how modest they look against the background of their older brothers, NVIDIA has made the greatest step forward in the budget niche. The GTX 750/750 Ti, which occupied it earlier, belong to the first iteration of the Maxwell architecture, so the GTX 1050/1050 Ti, unlike other accelerators of the Pascal family, has advanced not one, but one and a half generations. Thanks to a significantly larger GPU and memory clocked at an increased frequency, the GTX 1050/1050 Ti delivers more performance over its predecessors than any other Pascal series (90% difference between GTX 750 Ti and GTX 1050 Ti).

Although the GTX 1050/1050 Ti draw slightly more power (75 versus 60 W), they still fall within the power limits of PCI Express cards without an auxiliary power connector. NVIDIA did not release the junior accelerators in Founders Edition format, and their recommended retail prices are $109 and $139.

AMD Polaris: Radeon RX 460/470/480

AMD's response to Pascal is the Polaris chip family. The Polaris line currently includes only two chips, on the basis of which AMD produces three video cards (Radeon RX 460, RX 470 and RX 480) that additionally vary in the amount of on-board RAM. As is easy to see even from the model numbers, the upper performance echelon of the Radeon 400 series remains unoccupied; AMD will have to fill it with products based on Vega silicon. Back in the 28 nm era, AMD acquired this habit of testing innovations on relatively small chips and only then implementing them in flagship GPUs.

It should be noted right away that in AMD's case a new family of GPUs is not the same thing as a new version of the underlying GCN (Graphics Core Next) architecture; rather, it reflects a combination of the architecture and other product features. For GPUs built on the new process technology, AMD has abandoned the various "islands" codenames (Northern Islands, Southern Islands and so on) and now denotes them with the names of stars.

Nevertheless, the GCN architecture in Polaris received its third update in a row, thanks to which (together with the transition to the 14 nm FinFET process) AMD significantly increased performance per watt.

  • The Compute Unit, the elementary building block of shader ALUs in GCN, has received a number of changes related to instruction prefetching and caching and to L2 cache accesses, which together raised per-CU performance by 15%.
  • There is now support for half-precision (FP16) calculations, which are used in computer-vision and machine-learning programs (a short illustrative sketch follows this list).
  • GCN 1.3 provides direct access to the internal instruction set (ISA) of the stream processors, which lets developers write the lowest-level and fastest code possible, in contrast to the DirectX and OpenGL shader languages that are abstracted from the hardware.
  • The geometry processors can now discard zero-size polygons, and polygons with no projected pixels, at early stages of the pipeline, and have gained an index cache that reduces resource consumption when rendering small duplicated geometry.
  • The L2 cache has been doubled.
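
To make the FP16 item above concrete, here is a minimal illustrative sketch (plain NumPy on the CPU rather than GPU code, so it demonstrates only the number format, not Polaris hardware): half precision halves the memory footprint and bandwidth needs at the cost of roughly three significant decimal digits of precision, which is often acceptable in the computer-vision and machine-learning workloads mentioned above.

```python
import numpy as np

# Illustrative only: FP16 as a compact storage format compared with FP32.
weights_fp32 = np.random.rand(1024, 1024).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes // 1024, "KiB in FP32")   # 4096 KiB
print(weights_fp16.nbytes // 1024, "KiB in FP16")   # 2048 KiB, half the footprint

# The price is precision: roughly 3 significant decimal digits instead of 7.
max_error = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print("worst-case rounding error:", max_error)      # on the order of 1e-4
```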

In addition, AMD's engineers went to great lengths to make Polaris run at the highest possible frequencies. GPU frequency is now managed with minimal latency (under 1 ns), and the voltage curve is corrected by the card at every boot to account for variation between individual chips and the aging of silicon over time.

However, the transition to the 14 nm FinFET process did not go entirely smoothly for AMD. The company did increase performance per watt by 62% (judging by the results of the Radeon RX 480 and Radeon R9 380X in gaming tests and the cards' rated TDP), but the maximum frequencies of Polaris do not exceed 1266 MHz, and only a few manufacturing partners have achieved more with additional work on cooling and power delivery. GeForce cards, meanwhile, still hold the lead in performance per watt that NVIDIA established back in the Maxwell generation. It seems that at this first stage AMD either could not extract everything the new process node has to offer, or the GCN architecture itself already needs a deep overhaul; that task was left to the Vega chips.

Polaris-based accelerators range in price from $109 to $239 (see the table), although in response to the arrival of the GeForce GTX 1050/1050 Ti, AMD reduced the prices of its two lower-end cards to $100 and $170, respectively. At the moment a similar balance of power holds in every price/performance category: the GeForce GTX 1050 Ti is faster than the Radeon RX 460 with 4 GB of RAM, the GTX 1060 with 3 GB of memory is faster than the RX 470, and the full-fledged GTX 1060 is ahead of the RX 480. AMD's graphics cards are cheaper, however, which keeps them popular.

AMD Radeon Pro Duo

The report on the past year in the field of discrete GPUs would not be complete if we ignored one more of the "red" video cards. While AMD had not yet released a flagship single-GPU adapter to replace the Radeon R9 Fury X, the company had one proven move left for conquering new frontiers: putting two Fiji chips on a single board. This card, whose release AMD repeatedly postponed, nevertheless went on sale shortly before the GeForce GTX 1080, but it fell into the professional Radeon Pro category and was positioned as a platform for creating VR games.

For gamers, at $1,499 (more than a pair of Radeon R9 Fury X cards cost at launch) the Radeon Pro Duo is of little interest, and we did not even get the chance to test it. That is a pity, because from a technical standpoint the Radeon Pro Duo looks intriguing. The card's rated TDP grew by only 27% compared with the Fury X, while the peak GPU frequencies were lowered by a mere 50 MHz. AMD has already managed to release a successful dual-GPU card, the Radeon R9 295X2, so the specifications announced by the manufacturer do not cause much skepticism.

What to expect in 2017

The main expectations for the coming year are related to AMD. NVIDIA will likely limit itself to launching a flagship GP102-based gaming card called the GeForce GTX 1080 Ti, and perhaps fill another vacancy in the 10-series GeForce - the GTX 1060 Ti. The rest of the line of Pascal accelerators has already been formed, and the debut of the next architecture, Volta, is scheduled only for 2018.

As with its CPUs, AMD has mustered all its strength to develop a truly breakthrough GPU microarchitecture, while Polaris became just a staging post on the way to it. Presumably, as early as the first quarter of 2017 the company will release its best silicon, Vega 10, to the mass market for the first time (with one or more junior chips in the line arriving alongside or later). The most reliable evidence of its capabilities was the announcement of the MI25 compute card in the Radeon Instinct line, positioned as an accelerator for deep-learning tasks. Judging by the specifications, it is based on none other than Vega 10. The card delivers 12.5 TFLOPS of single-precision (FP32) compute, more than the GP102-based TITAN X, and comes with 16 GB of HBM2 memory. The card's TDP is within 300 W. The processor's real speed is anyone's guess, but Vega is known to bring the largest update of the GPU microarchitecture since the first GCN-based chips five years ago. It should noticeably improve performance per watt and make better use of the compute power of the shader ALUs (something AMD chips have traditionally not lacked) in gaming applications.
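
As a point of reference, the peak FP32 figures quoted here and in the appendix table follow from simple arithmetic: each shader ALU can retire one fused multiply-add, that is two floating-point operations, per clock, so peak GFLOPS = number of ALUs × 2 × boost clock in GHz. A minimal sketch (card data taken from the table below):

```python
# Theoretical peak FP32 throughput: shader ALUs x 2 FLOP per clock (FMA) x boost clock.
def peak_fp32_gflops(shader_alus: int, boost_clock_mhz: float) -> float:
    return shader_alus * 2 * boost_clock_mhz / 1000.0

cards = {
    "GeForce GTX 1080": (2560, 1733),   # ~8 873 GFLOPS, as listed in the table
    "GeForce GTX 1070": (1920, 1683),   # ~6 463 GFLOPS
    "Radeon RX 480":    (2304, 1266),   # ~5 834 GFLOPS
}
for name, (alus, boost_mhz) in cards.items():
    print(f"{name}: {peak_fp32_gflops(alus, boost_mhz):,.0f} GFLOPS")
```

The 12.5 TFLOPS quoted for the MI25 is the same kind of theoretical peak, so by itself it says little about real-world gaming speed.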

It is also rumored that AMD's engineers have now fully mastered the 14 nm FinFET process and that the company is ready to release a second revision of the Polaris cards with a noticeably lower TDP. If this is true, then, as it seems to us, the updated chips are more likely to form a Radeon RX 500 line than to receive higher model numbers within the existing 400 series.

Appendix. Current lines of discrete graphics adapters from AMD and NVIDIA

Manufacturer: AMD

| | Radeon RX 460 | Radeon RX 470 | Radeon RX 480 | Radeon R9 Nano | Radeon R9 Fury | Radeon R9 Fury X |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | | | | | | |
| Name | Polaris 11 | Polaris 10 | Polaris 10 | Fiji XT | Fiji PRO | Fiji XT |
| Microarchitecture | GCN 1.3 | GCN 1.3 | GCN 1.3 | GCN 1.2 | GCN 1.2 | GCN 1.2 |
| Technological process | 14 nm FinFET | 14 nm FinFET | 14 nm FinFET | 28 nm | 28 nm | 28 nm |
| Number of transistors, million | 3 000 | 5 700 | 5 700 | 8 900 | 8 900 | 8 900 |
| Clock frequency, MHz: Base Clock / Boost Clock | 1 090 / 1 200 | 926 / 1 206 | 1 120 / 1 266 | — / 1 000 | — / 1 000 | — / 1 050 |
| Number of shader ALUs | 896 | 2 048 | 2 304 | 4 096 | 3 584 | 4 096 |
| Number of texture mapping units | 56 | 128 | 144 | 256 | 224 | 256 |
| Number of ROPs | 16 | 32 | 32 | 64 | 64 | 64 |
| RAM | | | | | | |
| Bus width, bit | 128 | 256 | 256 | 4 096 | 4 096 | 4 096 |
| Chip type | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM | HBM | HBM | HBM |
| Clock frequency, MHz (bandwidth per pin, Mbps) | 1 750 (7 000) | 1 650 (6 600) | 1 750 (7 000) / 2 000 (8 000) | 500 (1 000) | 500 (1 000) | 500 (1 000) |
| Volume, MB | 2 048 / 4 096 | 4 096 | 4 096 / 8 192 | 4 096 | 4 096 | 4 096 |
| I/O bus | PCI Express 3.0 x8 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 |
| Performance | | | | | | |
| Peak FP32 performance, GFLOPS (based on maximum specified frequency) | 2 150 | 4 940 | 5 834 | 8 192 | 7 168 | 8 602 |
| FP64:FP32 performance ratio | 1/16 | 1/16 | 1/16 | 1/16 | 1/16 | 1/16 |
| RAM bandwidth, GB/s | 112 | 211 | 196 / 224 | 512 | 512 | 512 |
| Image output | | | | | | |
| Image output interfaces | DL DVI-D, HDMI 2.0b, DisplayPort 1.3 / 1.4 | DL DVI-D, HDMI 2.0b, DisplayPort 1.3 / 1.4 | ND | HDMI 1.4a, DisplayPort 1.2 | HDMI 1.4a, DisplayPort 1.2 | HDMI 1.4a, DisplayPort 1.2 |
| TDP, W | <75 | 120 | 150 | 175 | 275 | 275 |
| MSRP at time of release (US, excluding tax), $ | 109 / 139 | 179 | 199 / 229 | 649 | 549 | 649 |
| Recommended retail price at the time of release (Russia), rubles | 8 299 / 10 299 | 15 999 | 16 310 / 18 970 | ND | ND | ND |

Manufacturer: NVIDIA

| | GeForce GTX 1050 | GeForce GTX 1050 Ti | GeForce GTX 1060 3 GB | GeForce GTX 1060 | GeForce GTX 1070 | GeForce GTX 1080 | TITAN X |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | | | | | | | |
| Name | GP107 | GP107 | GP106 | GP106 | GP104 | GP104 | GP102 |
| Microarchitecture | Pascal | Pascal | Pascal | Pascal | Pascal | Pascal | Pascal |
| Technological process | 14 nm FinFET | 14 nm FinFET | 16 nm FinFET | 16 nm FinFET | 16 nm FinFET | 16 nm FinFET | 16 nm FinFET |
| Number of transistors, million | 3 300 | 3 300 | 4 400 | 4 400 | 7 200 | 7 200 | 12 000 |
| Clock frequency, MHz: Base Clock / Boost Clock | 1 354 / 1 455 | 1 290 / 1 392 | 1 506 / 1 708 | 1 506 / 1 708 | 1 506 / 1 683 | 1 607 / 1 733 | 1 417 / 1 531 |
| Number of shader ALUs | 640 | 768 | 1 152 | 1 280 | 1 920 | 2 560 | 3 584 |
| Number of texture mapping units | 40 | 48 | 72 | 80 | 120 | 160 | 224 |
| Number of ROPs | 32 | 32 | 48 | 48 | 64 | 64 | 96 |
| RAM | | | | | | | |
| Bus width, bit | 128 | 128 | 192 | 192 | 256 | 256 | 384 |
| Chip type | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5X SDRAM | GDDR5X SDRAM |
| Clock frequency, MHz (bandwidth per pin, Mbps) | 1 750 (7 000) | 1 750 (7 000) | 2 000 (8 000) | 2 000 (8 000) | 2 000 (8 000) | 1 250 (10 000) | 1 250 (10 000) |
| Volume, MB | 2 048 | 4 096 | 3 072 | 6 144 | 8 192 | 8 192 | 12 288 |
| I/O bus | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 | PCI Express 3.0 x16 |
| Performance | | | | | | | |
| Peak FP32 performance, GFLOPS (based on maximum specified frequency) | 1 862 | 2 138 | 3 935 | 4 373 | 6 463 | 8 873 | 10 974 |
| FP64:FP32 performance ratio | 1/32 | 1/32 | 1/32 | 1/32 | 1/32 | 1/32 | 1/32 |
| RAM bandwidth, GB/s | 112 | 112 | 192 | 192 | 256 | 320 | 480 |
| Image output | | | | | | | |
| Image output interfaces | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b | DL DVI-D, DisplayPort 1.3 / 1.4, HDMI 2.0b |
| TDP, W | 75 | 75 | 120 | 120 | 150 | 180 | 250 |
| MSRP at time of release (US, excluding tax), $ | 109 | 139 | 199 | 249 / 299 (partner cards / Founders Edition) | 379 / 449 (partner cards / Founders Edition) | 599 / 699 (partner cards / Founders Edition) | 1 200 |
| Recommended retail price at the time of release (Russia), rubles | 8 490 | 10 490 | ND | 18 999 / — (Founders Edition / partner cards) | ND / 34 990 (Founders Edition / partner cards) | ND / 54 990 (Founders Edition / partner cards) | ND |

Windows 10's Task Manager has gained the ability to track graphics processing unit (GPU) performance data. Users can analyze this information to understand how the resources of the graphics card, which is increasingly being used for general computing, are spent.

This means that all GPUs installed in the PC will be shown in the “Performance” tab. In addition, in the Processes tab, you can see which processes are accessing the GPU, and the GPU memory usage data is located in the Details tab.

How to check if the GPU Performance Viewer is supported

While Task Manager doesn't have any specific requirements for monitoring CPU, memory, disk, or network adapters, the situation with GPUs is slightly different.

In Windows 10, GPU information is available in Task Manager only when using the Windows Display Driver Model (WDDM) architecture. WDDM is a graphics driver architecture for a video card that enables the rendering of the desktop and applications to the screen.

WDDM provides a graphics core that includes a scheduler (VidSch) and a video memory manager (VidMm). It is these modules that are responsible for making decisions when using GPU resources.

Task Manager receives GPU utilization information directly from the scheduler and the video memory manager of the graphics kernel, and this applies to both integrated and discrete GPUs. The feature requires WDDM version 2.0 or higher to work correctly.

To check if your devices support viewing GPU data in Task Manager, follow these steps:

  1. Use the Windows Key + R keyboard shortcut to open the Run command.
  2. Enter the command dxdiag.exe to open the DirectX Diagnostic Tool and press Enter.
  3. Click on the “Display” tab.
  4. In the “Drivers” section on the right, check the “Driver Model” value.

If you are using a WDDM 2.0 or higher model, Task Manager will display GPU usage in the Performance tab.
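
The same check can be scripted. Below is a minimal sketch that relies on two assumptions: that dxdiag accepts the /t switch for dumping its report to a text file, and that the report contains a "Driver Model: WDDM x.y" line for each display device (the exact wording can vary between Windows builds and languages).

```python
import pathlib
import re
import subprocess
import tempfile

# Dump the DirectX diagnostic report to a text file; this can take several seconds.
report = pathlib.Path(tempfile.gettempdir()) / "dxdiag_report.txt"
subprocess.run(["dxdiag", "/t", str(report)], check=True)

# Look for the driver model of each display device in the report.
text = report.read_text(errors="ignore")
for model in re.findall(r"Driver Model:\s*(WDDM\s*[\d.]+)", text):
    print(model)   # WDDM 2.0 or newer means Task Manager can show GPU data
```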

How to monitor GPU performance using Task Manager

To monitor GPU performance data using Task Manager, simply right-click the taskbar and select Task Manager. If the compact view is active, click the “More details” button, and then open the “Performance” tab.

Tip: to launch Task Manager quickly, you can use the keyboard shortcut Ctrl + Shift + Esc.

Performance tab

If your computer supports WDDM version 2.0 or later, the left pane of the Performance tab will list your GPU. If multiple GPUs are installed in the system, each of them is shown with a number corresponding to its physical location, for example GPU 0, GPU 1, GPU 2 and so on.

Windows 10 also supports bundles of multiple GPUs in Nvidia SLI and AMD CrossFire configurations. When one of these configurations is detected, the Performance tab labels each link with a number (for example Link 0, Link 1 and so on), and you can view and check every GPU within the bundle.

On a specific GPU page, you will find a summary of performance data, which is generally divided into two sections.

The first section contains current information about the GPU's engines rather than about its individual cores.

By default, Task Manager displays the four busiest GPU engines, which typically are 3D, Copy, Video Decode and Video Processing, but you can change any of these views by clicking its name and choosing a different engine.

You can even switch the graphs to a single view by right-clicking anywhere in the section and choosing the "Change graph to > Single engine" option.

Below the engine graphs is a block of data on video memory consumption.

The Task Manager shows two types of video memory: shared and dedicated.

Dedicated memory is memory that only the graphics card will use. On discrete cards this is usually the amount of VRAM; on integrated graphics it is the portion of system memory that the computer is configured to explicitly reserve for the GPU.

In the lower right corner, the Hardware Reserved Memory option is displayed - this amount of memory is reserved for the video driver.

The allocated memory in this section represents the amount of memory actively used by processes, and the total memory in this section represents the amount of system memory consumed for graphics purposes.

In addition, in the left pane under each GPU's name you will see its current utilization percentage. Note that Task Manager uses the load of the busiest engine to represent overall usage.
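
The same per-engine utilization that Task Manager draws is also exposed through Windows performance counters, so the headline figure can be approximated outside of Task Manager. A rough sketch, assuming the "GPU Engine" counter set (available since the Windows 10 Fall Creators Update) and the standard typeperf tool:

```python
import csv
import io
import subprocess
from collections import defaultdict

# Collect two samples of every GPU engine instance in CSV form (the second
# sample is used, since utilization is a rate and needs an interval to settle).
out = subprocess.run(
    ["typeperf", r"\GPU Engine(*)\Utilization Percentage", "-sc", "2"],
    capture_output=True, text=True, check=True,
).stdout

rows = [r for r in csv.reader(io.StringIO(out)) if len(r) > 1]
header, sample = rows[0], rows[-1]               # counter paths, then the last sample

# Sum utilization per engine type (3D, Copy, VideoDecode, ...) and take the busiest,
# which is roughly what Task Manager reports as overall GPU usage.
per_engine = defaultdict(float)
for path, value in zip(header[1:], sample[1:]):  # column 0 is the timestamp
    if value:
        engine_type = path.rsplit("engtype_", 1)[-1].split(")")[0]
        per_engine[engine_type] += float(value)

busiest = max(per_engine.items(), key=lambda kv: kv[1], default=("n/a", 0.0))
print(f"busiest engine: {busiest[0]} at {busiest[1]:.1f} %")
```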

To see performance data over time, run a GPU-intensive application, such as a video game.

Processes tab

You can also monitor GPU performance in the Processes tab. There you will find a generalized summary for each process.

The GPU column shows the use of the most active engine to represent the total GPU resource usage by a particular process.

However, if several engines report 100 percent utilization, this can be confusing. The additional “GPU engine” column details exactly which engine a given process is loading.

The column heading on the Processes tab shows the total resource consumption of all GPUs available on the system.

If you do not see these columns, right-click on any column header and check the appropriate boxes.

Details tab

By default, the tab does not display GPU information, but you can always right-click on a column header, select the Select Columns option, and enable the following options:

  • GPU engine
  • Dedicated GPU memory
  • Total GPU Memory

The two memory columns show, respectively, the dedicated and the total amounts of memory used by a particular process, while the “GPU” and “GPU engine” columns show the same information as on the Processes tab.

When using the Details tab, keep in mind that the memory used by individual processes, when added up, can exceed the total available memory, because shared memory ends up being counted more than once. These figures are useful for understanding a process's memory usage, but for a more accurate picture of overall graphics usage you should rely on the Performance tab.
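
This double counting can also be observed from the command line. The sketch below is a rough illustration and assumes the "GPU Process Memory" and "GPU Adapter Memory" performance counter sets are available (Windows 10 Fall Creators Update or later); the per-process sum of shared memory can exceed the adapter-wide figure precisely because shared allocations are charged to every process that maps them.

```python
import csv
import io
import subprocess

def sample_counter(path: str) -> list:
    """Collect one sample of every instance of a performance counter via typeperf."""
    out = subprocess.run(["typeperf", path, "-sc", "1"],
                         capture_output=True, text=True, check=True).stdout
    rows = [r for r in csv.reader(io.StringIO(out)) if len(r) > 1]
    return [float(v) for v in rows[-1][1:] if v]   # last sample, skip the timestamp

per_process_shared = sum(sample_counter(r"\GPU Process Memory(*)\Shared Usage"))
adapter_shared = sum(sample_counter(r"\GPU Adapter Memory(*)\Shared Usage"))

print(f"shared memory, summed over processes: {per_process_shared / 2**20:,.0f} MiB")
print(f"shared memory, adapter-wide:          {adapter_shared / 2**20:,.0f} MiB")
# The first figure can be larger whenever a shared allocation is mapped by
# more than one process.
```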

Conclusion

Microsoft is committed to providing users with a more accurate tool for assessing the performance of the graphics subsystem compared to third-party applications. Note that work on this functionality is ongoing and improvements are possible in the near future.
