Moore Threads recently held its 2022 Autumn Conference, where it introduced the MTT S80, the first Chinese-made graphics card to support the Windows environment and the DirectX graphics interface. The company also unveiled its new multi-functional GPU chip “Chunxiao,” the MTT S3000 for server applications, and the MCCX meta-computing all-in-one machine.
At first, the author assumed this would be a “PPT launch,” an announcement with slides but no product, because Moore Threads’ ambitions seemed too big this time. I had no idea that just a week later the MTT S80 would be on our desk, and that once installed in a PC it would work under Windows without any complicated troubleshooting.
In this article, let’s look at how big a step the MTT S80 represents for domestic graphics cards. The test configuration is as follows:
The packaging of the Moore Threads MTT S80 is one of a kind: its Chinese-style line pattern emphasizes that the card is made in China. This is also the first time we have tested a Chinese-made graphics card, which is quite significant.
Besides the card itself, the package includes a very simple instruction manual and an adapter cable from dual PCIe 8-pin to a CPU 8-pin connector. The manual is so brief because installation is identical to that of any standard graphics card: install the card, boot Windows, and install the driver.
The MTT S80 itself is very well designed, with an angular, metallic look. The one-piece shell is made of die-cast aluminum alloy finished with CNC machining, which greatly improves the card’s structural strength and eliminates the need for a support bracket. The cooler uses three fans, two 8 cm fans flanking a 7 cm fan in the center, in a centrally symmetrical layout.
The outer edges of the two side fans are framed by arcs inspired by the hyperbolic functions of mathematics, complementing the circular RGB fan in the middle for a strong sense of design. All three fans support intelligent speed control, keeping the card quiet while maintaining GPU stability.
The backplate is all metal, with the Moore Threads logo in the center, and the air vent on the right lights up when the card is powered on, which is very cool.
The coolest touch is the orange halo in the center, which when lit looks like an erupting volcanic crater bursting with energy.
Viewed from the side, the S80’s dense cooling fins are visible. Four 6 mm heat pipes run through the heatsink to carry heat from the GPU and memory to the fins as quickly as possible.
The placement of the 8-pin power connector on the side is the best design choice on the card. Although it requires a larger case for compatibility, it keeps the front of the case tidier and better looking.
For display outputs, the card offers three DP 1.4a ports and one HDMI 2.1 port, a combination currently found mainly on high-end graphics cards and capable of 8K video output.
Finally, the MTT S80 is the first graphics card to use the PCIe 5.0 interface, supporting PCIe 5.0 x16, so it is best paired with a relatively new motherboard to get full interface performance. For this reason, the Moore Threads flagship store on JD.com will sell it for ￥2999 bundled with an ASUS B660M motherboard.
The MTT S80 uses the multi-functional GPU chip “Chunxiao,” which is based on the MUSA architecture. Compared with “Sudi,” which Moore Threads released in March of this year, all four built-in computing engines of “Chunxiao” have been upgraded and can now operate simultaneously, handling graphics and image rendering, 8K video encoding and decoding, AI training and inference, general-purpose computing, GPU virtualization, physical simulation, and other tasks.
In terms of core specifications, the MTT S80 is built on TSMC’s 7nm process. It has 4096 MUSA cores, a 1.8 GHz core clock, 16 GB of GDDR6 memory, and a 256-bit memory bus. The chip contains 22 billion transistors, and the MUSA architecture integrates both general-purpose compute cores and tensor cores, which support FP32, FP16, and INT8 precision.
We also took the MTT S80 apart, which is very easy: remove all visible screws, then take off the backplate and shroud. The internal build quality is neat, and the memory consists of eight Samsung GDDR6 chips of 2 GB each, for a total of 16 GB.
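The teardown figures line up with the spec sheet. As a quick check, assuming each GDDR6 chip has the standard 32-bit interface (two 16-bit channels), eight chips account for both the 16 GB capacity and the 256-bit bus:

```python
# Consistency check between the teardown and the spec sheet.
# Assumption: each GDDR6 device has the standard 32-bit interface
# (two independent 16-bit channels).
CHIPS = 8            # Samsung GDDR6 chips counted on the board
GB_PER_CHIP = 2      # capacity per chip
BITS_PER_CHIP = 32   # assumed interface width per chip

total_capacity_gb = CHIPS * GB_PER_CHIP   # total memory capacity
bus_width_bits = CHIPS * BITS_PER_CHIP    # aggregate memory bus width

print(f"{total_capacity_gb} GB on a {bus_width_bits}-bit bus")
```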
The GPU is marked SD102AA-500; it is Moore Threads’ “Chunxiao” chip.
The MTT S80’s most notable feature is that it is China’s first GPU to support the Windows environment and the DirectX graphics interface. At the launch event, Moore Threads stated that the MTT S80’s Windows driver includes a built-in MUSA DirectX Driver module and has completed adaptation for more than ten games, including “Diablo 3,” “League of Legends,” and “Cross Fire,” with more games that already run but are still being adapted. We will have to test whether it lives up to these claims.
Let’s start with theoretical performance tests. Before testing, however, we found that although the MTT S80 supports Windows and DirectX and supports DirectX 11 at the hardware level, the driver has not yet completed all of its functional modules, so it currently supports only DirectX 9. That rules out the usual benchmarks, so we had to find another approach.
Unigine Valley Benchmark 1.0 can test DX9 performance under Windows, and the MTT S80 scored 2302 points in it.
We checked the Unigine official website’s leaderboard and discovered that the MTT S80 can compete with the GTX 1060 6G in this project.
Pixel and texture fill rates are also important indicators of graphics card performance. The pixel fill rate is the number of pixels the GPU can render and write to video memory per second. Measured with Fillrate Tester, the MTT S80 scored 188 GPixel/s in the “FFP - Single texture” test. For comparison, the RTX 3060 is rated at 85.30 GPixel/s and the RTX 3080 Ti at 186.5 GPixel/s.
The texture fill rate is the number of texture elements (texels) the GPU can map onto pixels per second, which we measured with 3DMark 06. The MTT S80’s Multi-Texturing result peaks at 170 GTexel/s, while the RTX 3060 is rated at 199.0 GTexel/s and the RTX 3050 at 142.2 GTexel/s. The gap between the two tests comes from the current driver, which has not yet been optimized for CPU multithreading, so the heavier the graphics load, the better the MTT S80 performs. Its performance should improve once driver optimization is complete.
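The RTX reference figures quoted above match the usual spec-sheet formula: fill rate equals unit count times boost clock (ROPs for pixels, TMUs for texels). The sketch below reproduces them from publicly listed ROP/TMU counts and boost clocks, which are assumptions not given in this article. The S80’s own ROP and TMU counts are not stated here, so it is left out.

```python
# Theoretical fill rate = number of units x boost clock.
# Assumed public specs (not from this article): ROPs, TMUs, boost clock in GHz.
SPECS = {
    "RTX 3060":    (48, 112, 1.777),
    "RTX 3050":    (32, 80, 1.777),
    "RTX 3080 Ti": (112, 320, 1.665),
}


def pixel_fill_rate(rops: int, clock_ghz: float) -> float:
    """GPixel/s, assuming each ROP writes one pixel per clock."""
    return rops * clock_ghz


def texture_fill_rate(tmus: int, clock_ghz: float) -> float:
    """GTexel/s, assuming each TMU samples one texel per clock."""
    return tmus * clock_ghz


for name, (rops, tmus, clk) in SPECS.items():
    print(f"{name}: {pixel_fill_rate(rops, clk):.1f} GPixel/s, "
          f"{texture_fill_rate(tmus, clk):.1f} GTexel/s")
```

Note that the RTX numbers are theoretical peaks while the S80 numbers are measured results, so these comparisons mix two kinds of figures.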
Aside from these two tests, little benchmark software runs on the Windows platform, so we switched to Linux to see what data we could get under Ubuntu. Using clpeak to measure memory bandwidth and single-precision floating-point (FP32) performance, we recorded a maximum memory bandwidth of 365 GB/s and a maximum FP32 throughput of 13.9 TFLOPS.
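As a sanity check on the 13.9 TFLOPS figure, the theoretical FP32 peak can be estimated from the core count and clock listed in the spec sheet. This is a back-of-the-envelope sketch that assumes each MUSA core completes one fused multiply-add (two floating-point operations) per clock; the article does not state this.

```python
# Back-of-the-envelope FP32 peak for the MTT S80.
# Assumption (not stated in the article): each MUSA core completes one fused
# multiply-add per clock, counted as 2 floating-point operations.
MUSA_CORES = 4096        # from the spec sheet
CORE_CLOCK_GHZ = 1.8     # from the spec sheet
FLOPS_PER_CLOCK = 2      # assumed: one FMA = multiply + add


def peak_fp32_tflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak in TFLOPS: cores x clock (GHz) x FLOPs per clock / 1000."""
    return cores * clock_ghz * flops_per_clock / 1000.0


theoretical = peak_fp32_tflops(MUSA_CORES, CORE_CLOCK_GHZ, FLOPS_PER_CLOCK)
measured = 13.9  # clpeak result reported above

print(f"theoretical peak: {theoretical:.2f} TFLOPS")
print(f"clpeak measured:  {measured} TFLOPS, {measured / theoretical:.0%} of peak")
```

If the FMA assumption holds, the peak works out to about 14.75 TFLOPS, and the clpeak result reaches roughly 94% of it, which makes the measured figure plausible for this core count and clock.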
Where does this place it? Compared with the theoretical specifications of the desktop RTX 3060 12G, the MTT S80 has slightly higher memory bandwidth and floating-point performance.
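For reference, the RTX 3060 12G’s theoretical figures can be derived from its public specifications. The sketch below assumes 3584 CUDA cores, a 1.777 GHz boost clock, a 192-bit bus, and 15 Gbps GDDR6; these values come from NVIDIA’s published specs, not from this article.

```python
# Assumed public specs for the desktop RTX 3060 12G (not taken from this article):
# 3584 CUDA cores, 1.777 GHz boost clock, 192-bit bus, 15 Gbps GDDR6.
def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Cores x clock x 2 FLOPs per FMA, in TFLOPS."""
    return cores * clock_ghz * 2 / 1000.0


def memory_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Bus width x per-pin data rate, converted from bits to bytes (GB/s)."""
    return bus_bits * gbps_per_pin / 8


rtx3060_fp32 = peak_fp32_tflops(3584, 1.777)   # theoretical FP32 peak
rtx3060_bw = memory_bandwidth_gbs(192, 15)     # theoretical memory bandwidth
s80_fp32, s80_bw = 13.9, 365                   # clpeak measurements above

print(f"FP32:      S80 {s80_fp32} vs RTX 3060 {rtx3060_fp32:.2f} TFLOPS")
print(f"Bandwidth: S80 {s80_bw} vs RTX 3060 {rtx3060_bw:.0f} GB/s")
```

Because the S80 side is measured and the RTX 3060 side is theoretical, this comparison, if anything, understates the S80’s margin.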
Because the MTT S80 is the first domestic graphics card to support PCIe 5.0, we also tested its PCIe bandwidth, using OCL Bandwidth Test under Ubuntu to measure both upload and download. The maximum upload bandwidth was 28 GB/s and the maximum download bandwidth was 32 GB/s. Since PCIe 5.0 doubles the theoretical bandwidth of the PCIe 4.0 interface used by most mainstream graphics cards, the MTT S80 can fairly be described as future-proof.
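For context, here is a quick calculation of theoretical x16 link bandwidth per direction for recent PCIe generations. These are standard PCIe figures rather than data from the article, and the sketch ignores packet and protocol overhead, so real transfers always come in lower.

```python
# Theoretical per-direction PCIe bandwidth. PCIe 3.0 and later use
# 128b/130b line encoding; protocol overhead is ignored, so real transfers are lower.
GT_PER_LANE = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}  # gigatransfers/s per lane


def pcie_bandwidth_gbs(gen: str, lanes: int = 16) -> float:
    """Raw rate x lanes x encoding efficiency, converted from bits to bytes."""
    return GT_PER_LANE[gen] * lanes * (128 / 130) / 8


for gen in GT_PER_LANE:
    print(f"PCIe {gen} x16: {pcie_bandwidth_gbs(gen):.1f} GB/s per direction")
```

Assuming the tool’s “G” means 10^9 bytes, the 32 GB/s download result sits just above the roughly 31.5 GB/s PCIe 4.0 x16 ceiling, consistent with a PCIe 5.0 link. Both measured results are still around half of the roughly 63 GB/s PCIe 5.0 maximum.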
Based on our results, and setting software compatibility aside, the MTT S80 can theoretically reach the level of the RTX 3060 to RTX 3060 Ti. Because the driver is still being adapted to the DirectX and OpenGL environments under Windows, performance varies greatly from one application to another. It is fair to say that the MTT S80’s hardware is already up to par. Although driver adaptation cannot yet keep up with mainstream levels, it is a good start for domestic graphics cards.
The MTT S80 is the first Chinese-made graphics card to support Windows and DirectX, but how does it perform in real games? Since it currently supports only DirectX 9, we could test only older games with large player bases. The following games were run at 1080p on low settings. First up is “League of Legends,” which reached 140-150 fps, enough for an esports monitor.
At 1080p on high settings, the frame rate averages around 136 fps, and the game plays smoothly.
Finally, at 2K (1440p) on high settings, the average frame rate still stays above 120 fps, which is excellent.
“QQ Speed” is capped at 30 fps by default and runs without trouble at that cap.
“Cross Fire” averages as high as 180 fps, making it extremely smooth to play.
Moore Threads demonstrated “Diablo 3” at the launch event, and in our testing it ran smoothly at around 90-100 fps.
“Minecraft” has also been adapted. However, the author found that the NetEase version would not launch, while the Microsoft version launched directly but averaged only around 40-50 fps: not very smooth, but playable.
Finally, “CS:GO” is still a joy to play: its built-in benchmark ran at around 213 fps.
The games adapted so far suggest that Moore Threads’ current strategy is to start with hugely popular titles to build acceptance of domestic graphics cards, and then go back and adapt high-quality niche games, an approach the author considers entirely correct.
A graphics card for home use must not only play games but also offer excellent video encoding and decoding. At the launch event, Moore Threads stated that the MTT S80 supports H.264 and H.265 (HEVC) and adds the latest AV1 codec capability, and that each of its three DP 1.4a ports and one HDMI 2.1 port can output 8K or 4K images.
The author first tried playing a 4K YouTube video. Playback looked and felt very smooth, with no stuttering caused by codec problems, and the control panel showed the MTT S80 being used for GPU acceleration as expected.
So how does it fare on video codec performance and efficiency? For this we returned to Linux and used ffmpeg to call the VA-API hardware codec acceleration interface, testing a range of stream formats. In our tests, the card decoded multiple H.264 and H.265 streams in parallel, as well as VP9 and AV1, and handled multi-channel parallel H.264 and H.265 encoding and transcoding between formats.
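The article does not list its exact commands. As a rough sketch of how such a test can be driven, the Python helpers below only build ffmpeg argument lists using ffmpeg’s standard VA-API options (`-vaapi_device`, `-hwaccel vaapi`, the `hevc_vaapi` encoder). The render node path, file names, and the `run_parallel` helper are hypothetical, and whether the S80 driver exposes every VA-API encoder is an assumption.

```python
import shlex
import subprocess

# Assumption: the card's VA-API driver is exposed at this DRM render node.
# The article does not say which node was used; adjust for your system.
RENDER_NODE = "/dev/dri/renderD128"


def vaapi_encode_cmd(yuv_path, out_path, width=1920, height=1080,
                     fps=30, encoder="hevc_vaapi"):
    """Build an ffmpeg command that hardware-encodes raw YUV420p via VA-API."""
    return [
        "ffmpeg", "-y",
        "-vaapi_device", RENDER_NODE,          # open the VA-API device
        "-f", "rawvideo",                      # input is headerless YUV
        "-pixel_format", "yuv420p",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", yuv_path,
        "-vf", "format=nv12,hwupload",         # convert, then upload to GPU surfaces
        "-c:v", encoder,                       # e.g. h264_vaapi or hevc_vaapi
        out_path,
    ]


def vaapi_decode_cmd(in_path):
    """Build an ffmpeg command that hardware-decodes a file and discards output."""
    return [
        "ffmpeg",
        "-hwaccel", "vaapi",
        "-hwaccel_device", RENDER_NODE,
        "-hwaccel_output_format", "vaapi",     # keep decoded frames on the GPU
        "-i", in_path,
        "-f", "null", "-",                     # decode only, no file written
    ]


def run_parallel(cmds):
    """Hypothetical helper: launch all commands at once, wait, return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]


# Nine parallel 1080p H.265 encode channels; file names are hypothetical.
encode_cmds = [vaapi_encode_cmd(f"input_{i}.yuv", f"out_{i}.mp4") for i in range(9)]
print(shlex.join(encode_cmds[0]))
```

Calling `run_parallel(encode_cmds)` on a machine with a working VA-API driver would start all channels at once; ffmpeg prints each channel’s fps in its progress output.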
For encoding, we fed raw 1080p YUV video into multiple parallel H.265 encoding channels, running 9 channels at once to put as much pressure on the encoder as possible. Each channel encoded at 183 fps, for a combined throughput of more than 1600 fps at 1080p.
We also ran decoding tests. In a multi-channel stress test, the total frame rate for decoding 1080p video exceeded 1200 fps: with 10 channels of 1080p VP9 video decoded in parallel, each channel ran at 122 fps.
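The totals quoted for both stress tests follow directly from channel count times per-channel frame rate. The extra “real-time streams” line is a hypothetical extrapolation that assumes throughput stays constant as streams are added; the article does not test this.

```python
def aggregate_fps(channels: int, per_channel_fps: float) -> float:
    """Total throughput of N parallel streams, each running at the same rate."""
    return channels * per_channel_fps


hevc_encode_total = aggregate_fps(9, 183)   # H.265 encode stress test
vp9_decode_total = aggregate_fps(10, 122)   # VP9 decode stress test

# Hypothetical extrapolation: real-time 1080p30 streams this throughput would cover.
for label, total in [("H.265 encode", hevc_encode_total),
                     ("VP9 decode", vp9_decode_total)]:
    print(f"{label}: {total} fps total, ~{int(total // 30)} real-time 1080p30 streams")
```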
The MTT S80’s video codec performance is very strong, and the hardware foundation is in place. Most content consumers can use it right out of the box, and 4K HDR playback is no problem. Its hardware encoding is also strong enough for video creators, but no video editing software has been adapted for it yet. According to Moore Threads’ internal product staff, the company is actively adapting its drivers and APIs with domestic and foreign video editing software, hoping to gradually meet consumers’ editing needs. Moore Threads could also partner with domestic editing software vendors to speed up this adaptation.
Thanks to its full-featured MUSA architecture, the MTT S80 can also be used for AI training. With the MUSA software stack, developers can quickly migrate existing AI models to the MTT S80. In terms of compatibility, it supports mainstream deep learning frameworks such as PyTorch and TensorFlow, and dozens of AI models, including Transformer, CNN, and RNN models, have been optimized for it.
In the earlier tests, the MTT S80 showed very strong single-precision floating-point performance. This lets it handle high-precision AI inference that depends on FP32 throughput, as well as scenarios that demand very high numerical accuracy, such as medical and financial applications. For example, the MTT S80 has been specially adapted to MONAI, an open-source AI framework for medical imaging, to perform high-precision inference across a variety of tasks.
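The following minimal sketch illustrates why precision matters. It uses Python’s `struct` module, whose `'e'` and `'f'` formats round a value to IEEE 754 half precision (FP16) and single precision (FP32). This is a generic illustration of number formats, not a test of the S80.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (FP64) to FP16 ('e') or FP32 ('f') and convert back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


x = 1.0001  # a small relative change, e.g. a fine-grained weight update
print("FP16:", round_to("e", x))  # spacing near 1.0 is 2**-10, so the change is lost
print("FP32:", round_to("f", x))  # spacing near 1.0 is 2**-23, so the change survives
```

In FP16 the value collapses to exactly 1.0, while FP32 keeps it to within about 2e-8. Errors of this kind accumulate over many operations, which is why accuracy-sensitive workloads lean on FP32.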
“CUDA on MUSA” is the standout technology. To reduce migration costs for CUDA developers, Moore Threads has built a CUDA-on-MUSA compatibility solution. With Moore Threads’ porting tool, CUDA source code can run on MUSA-architecture GPUs in two steps: compiling and running.
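The article does not describe how the porting tool works internally. As a toy illustration only, the sketch below shows the general idea behind source-to-source porting tools such as AMD’s hipify: CUDA runtime identifiers are rewritten to another runtime’s names before compiling. The `musa*` names here are assumptions made for this example. A real tool also has to handle kernels, libraries, and build flags.

```python
import re

# Hypothetical CUDA-to-MUSA name mapping for illustration only; the "musa*"
# names are assumptions, not a documented list from Moore Threads.
API_MAP = {
    "cuda_runtime.h": "musa_runtime.h",
    "cudaMalloc": "musaMalloc",
    "cudaMemcpy": "musaMemcpy",
    "cudaMemcpyHostToDevice": "musaMemcpyHostToDevice",
    "cudaFree": "musaFree",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(API_MAP, key=len, reverse=True))
    + r")\b"
)


def port_source(src: str) -> str:
    """Rewrite known CUDA runtime identifiers to their assumed MUSA counterparts."""
    return _PATTERN.sub(lambda m: API_MAP[m.group(1)], src)


cuda_src = """#include <cuda_runtime.h>
float *d;
cudaMalloc(&d, n * sizeof(float));
cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d);"""
print(port_source(cuda_src))
```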
In the author’s current assessment, the MTT S80’s hardware performance has reached the level of mainstream desktop cards, which is undoubtedly a significant step forward for the entire domestic graphics card industry. The hardest challenge, however, is future driver development. Windows graphics drivers demand highly specialized expertise in computer graphics, and few people in the world have it. Most of them are concentrated in Western countries, with only a handful in China. Domestic GPU start-ups must bring market-ready products out quickly, yet they lack talent in key areas such as chip design and low-level driver development, and they lack seasoned teams. Building a general-purpose GPU is therefore no easy task.
Even Intel, which has made integrated graphics for more than ten years and holds the largest market share, ran into driver setbacks when entering the discrete graphics card market, let alone a newcomer that has been on the market for only two years. Making domestic GPUs compatible with the existing software ecosystem will undoubtedly be a long and difficult process. Independent innovation is a hard path, but it is one that must be taken. With the recent US ban, Nvidia was forced to stop supplying certain GPU models to China. We cannot predict what conflicts will arise in the future, so we must be fully prepared.
Fortunately, Moore Threads has now taken the first step toward compatibility with mainstream platforms. As for the MTT S80 in our hands, most light users can buy it, plug it into a Windows PC, and start using it right away. Watching videos and playing LOL are easy, which is commendable. But we must also view it rationally: Moore Threads cannot leap to the top in a single step and immediately produce a mainstream-level product. That is why the author has assessed the MTT S80 with as much encouragement and patience as possible. Of course, I still hope Moore Threads will push the adaptation of more games and applications as soon as possible and fully unleash this powerful chip.
Moore Threads is a high-tech integrated circuit company that specializes in GPU chip design. Its primary focus is the development and design of full-featured GPU chips and related products that can provide powerful computing acceleration capabilities to Chinese technology partners. Founded in October 2020, the company is dedicated to developing a new generation of GPUs for meta-computing applications, constructing a comprehensive computing platform that integrates visual computing, 3D graphics computing, scientific computing, and artificial intelligence computing, and establishing an ecosystem based on cloud-native GPU computing to help drive the digital economy’s development.
UnixCloud Technology (Shenzhen) Co., Ltd. is Moore Threads’ officially authorized distributor and has a long history of product development and manufacturing. It focuses on edge computing to meet the computing power demands of artificial intelligence, offering edge computing products and solutions for a wide range of AI requirements. UnixCloud has also launched a 10G network card business, offering four-port and two-port 10G fiber network cards based on Mucse’s N10 network controller.