gcc macro language extensions

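One of the great things about gcc, and in particular its C/C++ preprocessor, is the set of handy macro features it supports. In this post I would like to briefly describe three of them. The first turns a token into a string. Here "token" loosely means anything that you can pass as an argument to a macro. The second concatenates two tokens into a new one. The third allows C/C++ macros with a variable number of arguments. Strictly speaking, the first two are part of standard C and C++, and variadic macros were standardized in C99. The gcc-specific part is the special handling of ## in front of __VA_ARGS__ described at the end of the post.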

Stringifying a token

It’s amazing how useful this is. Take the following code, for example.

std::cout << "obj.member1: " << obj.member1 << std::endl;
std::cout << ", obj.member2: " << obj.member2 << std::endl;
std::cout << ", obj.member3: " << obj.member3 << std::endl;
std::cout << ", obj.member4: " << obj.member4 << std::endl;
std::cout << ", obj.member5: " << obj.member5 << std::endl;
std::cout << ", obj.member6: " << obj.member6 << std::endl;
std::cout << ", obj.member7: " << obj.member7 << std::endl;
std::cout << ", obj.member8: " << obj.member8 << std::endl;
std::cout << ", obj.member9: " << obj.member9 << std::endl;
std::cout << ", obj.member10: " << obj.member10 << std::endl;
std::cout << ", obj.member11: " << obj.member11 << std::endl;
std::cout << ", obj.member12: " << obj.member12 << std::endl;
std::cout << ", obj.member13: " << obj.member13 << std::endl;
std::cout << ", obj.member14: " << obj.member14 << std::endl;

Wouldn’t you give a kidney just to avoid writing the name of every single member of obj twice? Well, it turns out that this can be done. Watch this:

#define PMEM(mem) #mem ": " << mem
#define PCMEM(mem) ", " #mem ": " << mem

Now you can do the following:

std::cout << PMEM(obj.member1) << std::endl;
std::cout << PCMEM(obj.member2) << std::endl;
std::cout << PCMEM(obj.member3) << std::endl;
std::cout << PCMEM(obj.member4) << std::endl;
std::cout << PCMEM(obj.member5) << std::endl;
std::cout << PCMEM(obj.member6) << std::endl;
std::cout << PCMEM(obj.member7) << std::endl;
std::cout << PCMEM(obj.member8) << std::endl;
std::cout << PCMEM(obj.member9) << std::endl;
std::cout << PCMEM(obj.member10) << std::endl;
std::cout << PCMEM(obj.member11) << std::endl;
std::cout << PCMEM(obj.member12) << std::endl;
std::cout << PCMEM(obj.member13) << std::endl;
std::cout << PCMEM(obj.member14) << std::endl;

These two macros will do most of the job for you. Unfortunately, they cannot write the code for you, so you will still have to write the name of each member of obj at least once. The # operator does one simple thing: whatever macro parameter you apply it to turns into a string literal. Just in case you’re wondering, I am also relying on another feature here, string literal concatenation: adjacent string literals are merged into one by the compiler (this is standard C and C++, not specific to gcc). First I turned the expression obj.member1 into a string using the # operator, and then it was concatenated with ": ". Note that stringification only works inside a macro definition. Writing something like this:

std::cout << #some_token << std::endl;

will produce a compilation error, and for a good reason. Another interesting thing is that you can turn anything into a string, even if it is not a valid C/C++ expression. Take a look at the code below:

#define DPRINT(a) #a
std::cout << DPRINT(a + b) << std::endl;
std::cout << DPRINT(hello world) << std::endl;

This code prints two strings: the first is a + b and the second is hello world. This works even though hello world is not valid C/C++.
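To make the behaviour concrete, here is a minimal, self-contained sketch. The Point struct, the describe() helper, the STR/XSTR macros and the BUFFER_SIZE constant are hypothetical names invented for this illustration. It also shows one detail worth knowing: # stringifies its argument exactly as written, without expanding it first when the argument is itself a macro, so a second level of macro is needed to stringify the expanded value.

```cpp
#include <sstream>
#include <string>

// The two macros from the text.
#define PMEM(mem) #mem ": " << mem
#define PCMEM(mem) ", " #mem ": " << mem

// STR stringifies its argument exactly as written.
#define STR(x) #x
// XSTR lets the argument be macro-expanded first, then stringifies it.
#define XSTR(x) STR(x)

// Hypothetical constant, only here to show the difference between
// STR and XSTR.
#define BUFFER_SIZE 128

// Hypothetical struct used only for this illustration.
struct Point {
    int x;
    int y;
};

// Builds a description of p without typing any member name twice.
std::string describe(const Point& p) {
    std::ostringstream out;
    out << PMEM(p.x) << PCMEM(p.y);
    return out.str();
}
```

With this in place, describe(Point{1, 2}) returns "p.x: 1, p.y: 2", STR(BUFFER_SIZE) is the literal "BUFFER_SIZE", and XSTR(BUFFER_SIZE) is the literal "128".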

Token concatenation

Using this feature you can construct new C/C++ tokens out of existing ones. For instance, say you have a large structure and you want to write a function for every member of the structure. One way to do that is to write the code manually, but I guess you don’t need me for that. Another way is to let a macro generate the functions:

struct some_struct {
    int member1;
    bool member2;
    unsigned long member3;
};

#define ADD_GETTER(TYPE, MEMBER) \
    TYPE get_ ## MEMBER(struct some_struct& st) { \
        return st.MEMBER; \
    }

ADD_GETTER(int, member1);
ADD_GETTER(bool, member2);
ADD_GETTER(unsigned long, member3);

Let’s analyze this piece of code for a second. First I defined a structure called some_struct. Next, I wanted a macro that defines a getter function for a member of some_struct, so I added the ADD_GETTER macro. Then I invoked it three times in a row, each time providing the type and the name of a member of some_struct.

The invocation for member1 expands to the following piece of code:

int get_member1(struct some_struct& st) {
    return st.member1;
}

Notice how the name of the function was created. This is the concatenation operation in action. The ## operator makes the preprocessor paste two tokens, get_ and member1, into a single token, get_member1. Any white space around ## in the macro definition is ignored, and the result of the paste must itself be a valid token. In addition, gcc gives ## a special meaning when it appears between a comma and __VA_ARGS__: it removes the comma when no variable arguments are passed. This is especially useful when implementing macros with a variable number of arguments.
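Here is a compact sketch of the same idea that can be compiled as is. It follows the some_struct example from the text, with two small changes that are my own assumptions: the getters take a const reference, and the macro invocations have no trailing semicolon, since each one already expands to a complete function definition. The FIELD macro is a hypothetical extra that pastes a number onto an identifier.

```cpp
struct some_struct {
    int member1;
    bool member2;
    unsigned long member3;
};

// Pastes get_ and MEMBER into one identifier, e.g. get_member1.
#define ADD_GETTER(TYPE, MEMBER) \
    TYPE get_##MEMBER(const some_struct& st) { \
        return st.MEMBER; \
    }

// Each invocation expands to a full function definition.
ADD_GETTER(int, member1)
ADD_GETTER(bool, member2)
ADD_GETTER(unsigned long, member3)

// Hypothetical helper: FIELD(3) becomes the identifier member3.
#define FIELD(n) member##n
```

After this, get_member1(st) returns st.member1, and st.FIELD(3) is just another way to spell st.member3.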

Macros with variable number of arguments

You can define a macro with a variable number of arguments in the following way:

#define VMACRO(argument1, argument2, ...) do_something()

The three dots as the last parameter of the macro tell the preprocessor that this is a variadic macro, i.e. a macro that receives a variable number of arguments. To access those arguments, you use the special identifier __VA_ARGS__, like this:

#define VMACRO(argument1, argument2, ...) do_something(__VA_ARGS__)

In this example I am ignoring argument1 and argument2 and passing the remaining arguments to the do_something() routine. When I first learned about this feature, I immediately tried to use it for debug printout macros. This is the code that I wrote:

#include <stdio.h>

#define DPRINT(format, ...) printf("DEBUG: " format, __VA_ARGS__) 

int main()
{
    DPRINT("hello world");
}

Note that the format string has to be a string literal. For instance, calling DPRINT(format, "..."); where format is a pointer to a string will not work, because the compiler cannot concatenate a pointer with the "DEBUG: " literal. Anyway, I wanted to address something different. You may be surprised to learn that this code doesn’t compile. This is because after preprocessing it turns into something that is not valid C/C++. This is what main looks like after preprocessing:

int main()
{
    printf("DEBUG: " "hello world", );
}

Note the comma after “hello world”. The thing is that gcc accepts an empty variable argument list, so __VA_ARGS__ expands to nothing and the comma is left dangling. There is a workaround for this problem: the concatenation operator. Let’s change our implementation of DPRINT a little.

#include <stdio.h>

#define DPRINT(format, ...) printf("DEBUG: " format, ##__VA_ARGS__) 

int main()
{
    DPRINT("hello world");
}

Note the concatenation operator before __VA_ARGS__. This is where the gcc extension comes in: when ## is placed between a comma and __VA_ARGS__ and no variable arguments are passed, the preprocessor removes the comma. This is exactly what happens in this case. The dangling comma after the format disappears, leaving a clean printf("DEBUG: " "hello world"); which is exactly what we needed.
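The sketch below is a variation on DPRINT that is easier to check. DFORMAT and the two helper functions are hypothetical names made up for this example. Instead of printing, the macro formats into a buffer with std::snprintf, so the result can be inspected for calls with and without extra arguments. It assumes gcc or a compatible compiler such as clang, since the comma handling is a GNU extension.

```cpp
#include <cstdio>
#include <string>

// "DEBUG: " is glued to the format by string literal concatenation,
// so format must be a string literal.
// ", ##__VA_ARGS__" is a GNU extension: the comma is dropped when no
// variable arguments are passed.
#define DFORMAT(buf, size, format, ...) \
    std::snprintf(buf, size, "DEBUG: " format, ##__VA_ARGS__)

// No variable arguments: the comma before __VA_ARGS__ is removed.
std::string debug_line_plain() {
    char buf[64];
    DFORMAT(buf, sizeof buf, "hello world");
    return buf;
}

// One variable argument: the comma stays and value is passed along.
std::string debug_line_with_args(int value) {
    char buf[64];
    DFORMAT(buf, sizeof buf, "value=%d", value);
    return buf;
}
```

Here debug_line_plain() returns "DEBUG: hello world" and debug_line_with_args(7) returns "DEBUG: value=7". If portability matters, C++20 and C23 add __VA_OPT__ as a standard way to get the same effect, by writing __VA_OPT__(,) __VA_ARGS__ in place of , ##__VA_ARGS__.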


3 Comments

  1. Nick Black says:

    Just a heads-up: stringizing and string pasting (the # and ## preprocessor operators) are standard ANSI C90, and C99 introduced variadic macros. The only GCC extension here is the special handling of the ## operator with __VA_ARGS__.

    Are you familiar with the concept of x-macros?

    http://dank.qemfd.net/dankwiki/index.php/X_Macros

  2. @Nick Black
    True. However the only place that somehow documents these extensions is gcc’s manual, where they appear as extensions. And there is standard of course.
    I was not familiar with X macros. Nice technique. However I think with these extensions you can achieve same result easier.

  3. Many thanks with regard to writing this you stored me a a lot of extra work with this kind of tip!
