GCC, glibc performance in 2018

What is the GNU toolchain?

In this blog we will focus on two components of the GNU toolchain, the GNU Compiler Collection (GCC) and the GNU C library (glibc). A full toolchain contains several vital components like assemblers, linkers and debuggers, but in this blog we are focusing on the compiler and the C library.

How important is it?

Very! GCC is the platform compiler for major Linux distributions like Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Ubuntu Linux and many more. That means it is used to compile the Linux kernel, all the supporting system components, and the software packages that constitute a modern Linux distribution. It is also the default compiler for the developers using these distributions for software engineering. Correspondingly, glibc is the default C library on these systems, providing the backbone for the extraordinary diversity of functionality, performance and security required by modern software.

Given the above, we are hard at work making sure the GNU toolchain is the best it can be on Arm platforms. While some of the work presented here is by Arm engineers, we must emphasize that all of this is only possible because of our collaboration with the strong GNU toolchain community. Check out the various blogs throughout the community to get a feel for the breadth of work that is being done!

Toolchain performance

One of the areas we focus on is improving the performance of applications built with the GNU toolchain. There are many ways to do this, and in this blog we present the highlights from our work in GCC and glibc, as these are the two toolchain components that affect performance the most.

Improvements in GCC

The GNU Tools team at Arm has been hard at work doing its share to make this release the best version of GCC for Arm platforms to date. The project follows an annual release cadence, and the 2018 release, GCC 8, has too many improvements to list in this blog! I would, however, like to highlight some of the many optimisation improvements that GCC gained over the last development cycle:

  • GCC gains a new loop interchange pass. This pass transforms loop nests to improve use of the data cache and makes memory access patterns more friendly for crucial subsequent optimisations like auto-vectorisation. It is a well-studied transform that has been missing a good implementation in GCC. Until now! It is now enabled by default at high optimisation levels (-O3, controlled by the -floop-interchange option) and has already shown its utility by accelerating multiple benchmarks, the highlight being an improvement of more than 10% on the 503.bwaves benchmark from the popular SPEC CPU 2017 benchmark suite. This is a phenomenal performance improvement, reproducible across all Arm processors and provided as part of the default toolchain for all users of GCC 8. Consider the loop:
for (int j = 0; j < N; j++)
  for (int k = 0; k < N; k++)
    for (int i = 0; i < N; i++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];

The loop interchange pass can transform this into:

for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++) // i, j, k interchanged
    for (int k = 0; k < N; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];


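We can see that the memory access pattern for c[i][j] has changed to a more cache-friendly iteration. In the original nest the innermost loop steps through i, so consecutive iterations touch c[0][j], c[1][j], and so on, each in a different row. After the interchange, c[i][j] stays fixed in the innermost loop and successive j iterations walk along row i. Because the elements of a row of c are contiguous in memory and share cache lines, the interchanged access pattern makes much better use of data locality.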
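As a sketch of why this particular interchange is safe, the C below runs both loop orders on the same inputs and checks that they produce bitwise-identical results. The size N = 64, the global arrays and the names matmul_jki, matmul_ijk and interchange_preserves_result are our own illustrative choices, not anything taken from GCC. Each c[i][j] still accumulates its k terms in the same order in both versions, which is what keeps the reordering legal even for floating-point data.

```c
#include <stdbool.h>
#include <string.h>

#define N 64  /* illustrative size; the cache benefit needs arrays larger than the cache */

static double a[N][N], b[N][N];

/* Original order: i is innermost, so consecutive iterations touch
   c[0][j], c[1][j], ... and walk down a column of c (stride N). */
void matmul_jki(double c[N][N])
{
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            for (int i = 0; i < N; i++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

/* Interchanged order: c[i][j] is invariant in the innermost loop and
   successive j iterations walk along row i of c. */
void matmul_ijk(double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

/* Fills a and b with small integers, runs both orders and reports whether
   the two result matrices are bitwise identical. */
bool interchange_preserves_result(void)
{
    static double c1[N][N], c2[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }
    memset(c1, 0, sizeof c1);
    memset(c2, 0, sizeof c2);
    matmul_jki(c1);
    matmul_ijk(c2);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling interchange_preserves_result() returns true; timing the two matmul functions on arrays much larger than the data cache is where the locality difference would become visible.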

  • The loop distribution pass in GCC is extended to handle more complex situations present in real code. Complex loops that contain vectorisable sequences mixed with non-vectorisable ones (for example due to loop-carried dependencies or complex data aliasing layouts) can be separated into their own loops. The vectorisable parts can then be vectorised independently of the rest of the code, giving the expected performance uplift. Again, this is not an academic prototype but production-ready functionality that is enabled by default in the compiler at high optimisation levels, giving an improvement of over 25% on the 456.hmmer benchmark from the SPEC CPU 2006 benchmark suite. This pass is a very powerful tool: the analysis it does can be used for many exciting optimisations in the compiler. For example, the code below:
#define M (256)
#define N (512)

struct st
{
  int a[M][N];
  int c[M];
  int b[M][N];
};

void
foo (struct st *p)
{
  for (unsigned i = 0; i < M; ++i)
    {
      p->c[i] = 0;
      for (unsigned j = N; j > 0; --j)
        {
          p->a[i][j - 1] = 0;
          p->b[i][j - 1] = 0;
        }
    }
}

is now optimised into a single call to the standard memset function instead of initialising each field of the struct separately:

foo:
        mov     x2, 1024
        movk    x2, 0x10, lsl 16    // x2 = (0x10 << 16) | 1024 = 1049600: size of the whole 'st' struct in bytes
        mov     w1, 0               // initialise memory with zero
        b       memset              // tail call memset (x0 still holds p)
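To see the steps between the loop nest and that single call, the sketch below writes them out by hand: foo_distributed splits the nest into one zeroing loop per field, and foo_memset is the one call those loops collapse into because a, c and b are laid out back to back in the struct. The names foo_distributed, foo_memset and versions_agree are our own illustration; GCC performs these steps internally and emits none of these functions.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define M 256
#define N 512

struct st {
    int a[M][N];
    int c[M];
    int b[M][N];
};

/* The memset fusion relies on the three fields being contiguous. */
_Static_assert(sizeof(struct st) == (2u * M * N + M) * sizeof(int),
               "struct st has no padding");

/* The original loop nest. */
void foo(struct st *p)
{
    for (unsigned i = 0; i < M; ++i) {
        p->c[i] = 0;
        for (unsigned j = N; j > 0; --j) {
            p->a[i][j - 1] = 0;
            p->b[i][j - 1] = 0;
        }
    }
}

/* Hand-distributed version: one loop per store stream. The j direction can
   change because every store writes a distinct element. */
void foo_distributed(struct st *p)
{
    for (unsigned i = 0; i < M; ++i)
        for (unsigned j = 0; j < N; ++j)
            p->a[i][j] = 0;
    for (unsigned i = 0; i < M; ++i)
        p->c[i] = 0;
    for (unsigned i = 0; i < M; ++i)
        for (unsigned j = 0; j < N; ++j)
            p->b[i][j] = 0;
}

/* Each loop above is a memset idiom over adjacent memory, so together
   they become one memset of the whole struct. */
void foo_memset(struct st *p)
{
    memset(p, 0, sizeof *p);
}

/* Returns true if all three versions leave the struct in the same state. */
bool versions_agree(void)
{
    struct st *x = malloc(sizeof *x), *y = malloc(sizeof *y), *z = malloc(sizeof *z);
    bool ok = x && y && z;

    if (ok) {
        /* Start from non-zero bytes so the zeroing is observable. */
        memset(x, 0xFF, sizeof *x);
        memset(y, 0xFF, sizeof *y);
        memset(z, 0xFF, sizeof *z);
        foo(x);
        foo_distributed(y);
        foo_memset(z);
        ok = memcmp(x, y, sizeof *x) == 0 && memcmp(x, z, sizeof *x) == 0;
    }
    free(x);
    free(y);
    free(z);
    return ok;
}
```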

We take our role in the GNU developer community very seriously. All such impactful improvements are presented to the community, co-designed when possible and iterated through cycles of feedback until we have a solution that works not only for our convenience but is also maintainable, scalable and usable by as many consumers of the toolchain as possible. We encourage strong participation at developer conferences and present on all kinds of topics, from Bin Cheng presenting the above loop optimisation work to James Greenhalgh presenting our performance tracking methodology.

Improvements in glibc

The glibc project has been pretty active as well. Many real-world applications spend large portions of their execution time in the library. Arm collaborated with the excellent glibc community to deliver some truly exciting improvements for the 2.27 release in February 2018 and the preceding 2.26 release:

  • The most frequently used single-precision floating-point math routines, expf, powf, logf and their variants, have been rewritten from the ground up. The new approach uses double-precision hardware to accelerate single-precision operations and, together with other improvements to the approximation algorithms, achieves massive latency and throughput improvements of the order of 200% and 300% over the previous implementations. On top of that, the new implementations achieve better precision and are written in completely portable standard C, replacing existing hard-to-maintain assembly implementations on some targets and improving the maintainability of the codebase as well. Szabolcs Nagy provided the new implementations and collaborated with the community to integrate this awesome work into the upstream glibc release. Thanks to these new routines, using glibc 2.27 gives a whopping 60% improvement on the 521.wrf benchmark from the SPEC CPU 2017 suite! That alone raises the aggregate SPEC CPU 2017 fprate score by 3%.
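The core idea of doing the single-precision work in double precision and rounding to float only at the end can be sketched in portable C. This is not glibc's implementation, which uses lookup tables and carefully tuned polynomials. The names expf_via_double and pow2i are hypothetical, and the degree-7 Taylor polynomial is an illustrative choice that is accurate enough for float but slower than a production design.

```c
#include <math.h>    /* isnan, HUGE_VALF (macros only, no libm calls) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 2^k as a double for -1022 <= k <= 1023,
   built by writing the exponent field directly. */
static double pow2i(int k)
{
    uint64_t bits = (uint64_t)(k + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Sketch of a single-precision exp evaluated entirely in double. */
float expf_via_double(float xf)
{
    double x = xf;                  /* widen once: all work is in double */

    if (isnan(x))
        return xf;
    if (x > 89.0)                   /* well beyond ln(FLT_MAX), about 88.72 */
        return HUGE_VALF;
    if (x < -104.0)                 /* result rounds to zero in float */
        return 0.0f;

    /* Range reduction: x = k*ln2 + r with |r| <= ln2/2 (about 0.347). */
    const double ln2 = 0x1.62e42fefa39efp-1;
    double t = x / ln2;
    int k = (int)(t < 0 ? t - 0.5 : t + 0.5);
    double r = x - k * ln2;

    /* Degree-7 Taylor polynomial for exp(r) in Horner form. On |r| <= 0.35
       the truncation error is about 5e-9, well below float's 2^-24. */
    double p = 1.0 + r * (1.0 + r * (1.0 / 2 + r * (1.0 / 6 + r * (1.0 / 24
             + r * (1.0 / 120 + r * (1.0 / 720 + r * (1.0 / 5040)))))));

    /* Scale by 2^k (exact in double), then round to float. For x between
       about 88.72 and 89 the conversion overflows to +inf under IEEE 754. */
    return (float)(p * pow2i(k));
}
```

The extra precision of the double-precision intermediate is what lets a short, simple polynomial deliver a float result that is almost always correctly rounded.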


  • In response to a customer observation about inconsistent performance of the standard input/output function getchar, we investigated and improved its locking sequence, giving upwards of 400% improvement in single-threaded code that uses that common function heavily.
  • Wilco Dijkstra added an optimised implementation of the memcmp function, improving its performance by 25% on aligned memory arguments and by more than 500% on unaligned arguments.
  • Unnecessary synchronisation was removed when accessing Thread Local Storage (TLS) variables from a shared library. This roughly halves the access time to these variables on AArch64 platforms.
  • Memory allocation and deallocation is one of the core functions of a C library and is tricky to get right because so many workloads depend on it. Finding the right balance between memory use and execution speed, in both single-threaded and multi-threaded environments and across the whole gamut of supported architectures, is not a task for the faint-hearted! The glibc community (and a call out here to our friends at Red Hat) put a lot of effort into improving the algorithms used for memory allocation, and everyone benefits. From the malloc improvements in glibc 2.26 we see gains of 3% and above in benchmarks like 523.xalancbmk from SPEC CPU 2017 and in other malloc-heavy workloads.
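The getchar improvement lives inside glibc, so applications benefit without any change. Code that reads a stream one character at a time can also avoid per-call locking itself, using a different, application-level technique that POSIX already provides: take the stream lock once with flockfile and read with getc_unlocked. A minimal sketch, where count_lines is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Counts '\n' characters in a stream. The stream lock is taken once for
   the whole loop rather than potentially once per character with getc. */
long count_lines(FILE *f)
{
    long lines = 0;
    int ch;

    flockfile(f);
    while ((ch = getc_unlocked(f)) != EOF)
        if (ch == '\n')
            lines++;
    funlockfile(f);
    return lines;
}
```

getc_unlocked is only safe while the calling thread holds the stream lock, which is why the whole loop sits between flockfile and funlockfile.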
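The memcmp bullet does not describe how the new routine works, and its code is not reproduced here. A common approach in optimised memcmp routines is to compare a whole machine word per step and only drop to bytes once a difference is found. The portable C sketch below illustrates that idea under our own name, memcmp_words; loading through memcpy keeps unaligned inputs well defined, whereas real implementations typically locate the differing byte with bit tricks rather than the byte loop used here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time comparison sketch. Returns <0, 0 or >0 like memcmp. */
int memcmp_words(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1, *q = s2;

    /* Compare 8 bytes per iteration. memcpy into a uint64_t is the portable
       way to express a possibly unaligned load. */
    while (n >= 8) {
        uint64_t a, b;
        memcpy(&a, p, 8);
        memcpy(&b, q, 8);
        if (a != b)
            break;          /* the first difference is within these 8 bytes */
        p += 8;
        q += 8;
        n -= 8;
    }

    /* Byte loop: handles the tail, or finds the first differing byte. */
    for (size_t i = 0; i < n; i++)
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    return 0;
}
```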
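One of the glibc 2.26 malloc changes was a per-thread cache (tcache), which lets small allocations and frees be served from a thread-local list without taking the shared arena lock. The sketch below shows only that core idea for a single size class. The names block_alloc and block_free and the values BLOCK_SIZE and CACHE_LIMIT are hypothetical, not glibc's, and a real allocator must also handle many size classes and release cached blocks when a thread exits, which this sketch does not.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64     /* hypothetical size class served by the cache */
#define CACHE_LIMIT 7     /* illustrative bound on cached blocks per thread */

struct free_block { struct free_block *next; };

_Static_assert(BLOCK_SIZE >= sizeof(struct free_block),
               "a cached block must be able to hold the list link");

/* Each thread has its own list, so no lock is needed to use it. */
static _Thread_local struct free_block *cache_head;
static _Thread_local unsigned cache_count;

/* Returns a BLOCK_SIZE block, preferring this thread's cached ones. */
void *block_alloc(void)
{
    struct free_block *b = cache_head;

    if (b) {                        /* fast path: pop from the thread-local list */
        cache_head = b->next;
        cache_count--;
        return b;
    }
    return malloc(BLOCK_SIZE);      /* slow path: the general allocator */
}

/* Accepts only pointers obtained from block_alloc. */
void block_free(void *p)
{
    if (!p)
        return;
    if (cache_count < CACHE_LIMIT) {    /* keep it for reuse by this thread */
        struct free_block *b = p;
        b->next = cache_head;
        cache_head = b;
        cache_count++;
        return;
    }
    free(p);                        /* cache full: hand it back to malloc */
}
```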

Putting it all together

Users of Linux distributions that ship these newer versions of GCC and glibc get these and many more improvements as part of their out-of-the-box experience. Our performance tracking metrics show that using the 2018 state-of-the-art components of the GNU toolchain, compared with the equivalent early-2017 releases, gives an uplift of at least 1.5% on the aggregate SPEC intrate score of the SPEC CPU 2017 suite and around 8% on the aggregate SPEC fprate score. That is a pretty good uplift from just upgrading the software stack, especially as the SPEC CPU benchmarks are derived from real-world software packages, some of which have been optimisation targets for decades. And remember, these are just the aggregate scores in one benchmark suite. Individual applications, depending on their execution profile, may achieve much more.
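Why a large gain on a single benchmark moves an aggregate by only a few percent follows from how SPEC CPU 2017 combines results: each aggregate score is the geometric mean of the per-benchmark ratios, so speeding up one of n benchmarks by a factor s multiplies the aggregate by s^(1/n).

```latex
\text{aggregate} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n},
\qquad
r_j \mapsto s\,r_j \;\Longrightarrow\; \text{aggregate} \mapsto s^{1/n}\cdot\text{aggregate}
```

For example, with an illustrative n = 10, doubling the speed of one benchmark (s = 2) raises the aggregate by a factor of 2^(1/10) ≈ 1.072, or about 7%.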

This post focuses on performance improvements, but the GNU toolchain is about so much more. Check out the long list of new features and improvements in GCC 8 on the main project page. Highlights include support for bleeding-edge language standards, novel architecture features like the Arm Scalable Vector Extension (SVE), the Armv8.4-A architecture, the latest processors for everything from the smallest embedded applications to the largest HPC behemoths, and much more.

What's next?

The wheels of progress never stop turning. The GNU toolchain community and our team here at Arm are already hard at work improving the toolchain for the 2019 releases. We've got some very exciting projects in flight that we hope to share with you throughout the year.

We will be providing more visibility into the work we do to improve the GNU software ecosystem, as well as ways you can get involved and provide us with feedback and areas you'd like to see improved.

Thank you for reading, and watch this space: this will be an exciting year for the GNU toolchain on Arm.

