printf() vs stream IO in C++

Before joining Dell I was mostly working in kernel writing in C programming language. At Dell I still work on mostly low level stuff, but this time it is user-mode, so I am not tied up to C anymore. We’re writing in C++ and I am learning C++. One of the less appealing things for me in C++ was streaming support and the way its input/output implemented. In particular I got used to printf() functions family and leaving those in favor of streams and cout was tough. What really strikes me is the fact that no C++ book explains this stuff. All C++ books just tell you – so, my dear, this is how this stuff is done in C++. It took me some time to realize how C++ style input/output is much more convenient and powerful than printf() family of functions. Here’s why.

The C++ way

So, In C++ you normally work with stream of some sort. Stream is an abstraction that may hide anything in the world. It can be string stream that writes things to string or debug output stream that sends debug printouts to some remote debug log server. Once you have a stream, you redefine operator << for various classes to write into stream instead of doing bit-shift. For input streams you define operator >> that reads data from string or from socket and initializes object of some class with values it read. All those operator << and >> seems kind of weird. And in fact, this can be any of the operators. For some reason, standard library uses bit-shift operators and this turned into a standard. Boost, popular C++ extensions library uses operator% to implement similar functionality, although in slightly different way.

Convenience

At first, defining operator << for every class I write seemed like a tedious task. But you just do it once and then you can use it forever. Moreover, you can use classes that already have it to implement more stuff. So you just do it once really.

It makes much more sense in object oriented language

Since we’re talking object oriented here, C++ way to output stuff actually makes much more sense. The way object represented as a string is a property is the object itself.

Performance

Here is the interesting part. In printf(), the standard C library has to parse format string every time you print something. So, at runtime, every time you print something, standard C library parses your format string to figure out what and how many parameters you passed to printf(). This is important for high performance systems. Intensively printing, just anything, whether this is a debug log or part of some communication protocol rises CPU consumption of your program to the roof.

I worked for one company where they came up with a really neat trick to overcome the problem of high CPU consumption. They had to implement debug logging in C for their high performance product. So, instead of formatting the string every time there is a printout, they saved format string and then every printout they just saved the arguments in their binary form. To read the debug log, they had to read format strings that were periodically saved in the debug log file. Then they read binary arguments for the debug printout and finally formatted the message. So instead of formatting the printouts at runtime, they formatted the printouts when debug log is read.

Another problem occurs when you by mistake pass wrong data type to printf() function and compiler misses it. Another problem is when you pass a string that is not properly null terminated. Etc. Etc. C++ way solves all these problems. First, data type resolving happens during compilation. It is up to compiler to detect what type of data you’re trying to print and to call right operator<<. This improves CPU consumption situation. For the same reason, it is much harder to print something that is of unexpected type. Compiler is usually pretty good at resolving data types.

Wrapping it up

So, it seems that C++ way of producing output both better conforms object oriented way of programming and yields higher quality results. The only problem is that it requires you to get used to, but then, you do it just once.

Did you know that you can receive periodical updates with the latest articles that I write right into your email box? Alternatively, you subscribe to the RSS feed!

Want to know how? Check out
Subscribe page

10 Comments

  1. Another Alex says:

    1. Really high performance stuff does not print anything at all – there are other ways of logging.

    2. With printfs it is really easy to devise log_printf() functions that would not do anything unless appropriate log level is in effect. It is much harder to devise streams with similar properties and in any case the overhead may be significant because per class operator<<() will be called anyway.

    So after 10+ years of C++ my personal choices for logging are: (a) define fixed set of debug events, collect logging data and parse format strings processed only when 'show log' command is executed, (b) use printf-like functions.

    BTW, knowing how many times particular log was hit is very important information and it could be collected even when logs are disabled.

  2. @Another Alex
    I’ll appreciate if you share other ways of logging.

    I disagree with your second point. It is similarly easy to make debug logging code that would do nothing unless appropriate log level is in effect.

  3. Another Alex says:

    Other ways of logging: I am sorry I meant debug events mentioned at the end of my post.

    Believe me I did it it is a non-trivial exercise to write debug streams and by end of the day there still be noticeable overhead from uncontrolled calls to operator<<() – so IMHO it is not worth it to do debug streams.

  4. moe says:

    I think that operators <> were chosen for IO because of being some kind of a visual cue for both in and out operations AND because of operator evaluation precedence.

    If you want to learn more about C++ IOStreams you can check out this book:
    http://www.amazon.com/Standard-IOStreams-Locales-Programmers-Reference/dp/product-description/0201183951

  5. Lucas Tan says:

    A really high performance logging mechanism would log every thing in binary and requires a custom tool to “visualize” the log.

    If you want the log to be human-readable and still want the performance, you could do a memory-map of the log file. Every log would just be a memory operation. Periodically this memory-map gets flushed to disk.

  6. @Lucas Tan – It depends on how much information you want to log.
    I guess my opinions on logs should be viewed considering what I am working on. I am working on a large embedded system. So, we have a server with 8TB of storage just for logs.

  7. ImmortalPC says:

    Cout is slower than printf.

    // g++ -O2 -W -Wall -Wextra printf-vs-cout.cpp -o printf-vs-cout.exe && ./printf-vs-cout.exe 1>/dev/null
    #include
    #include
    #include // Write
    #ifdef WIN32
    # include
    #else
    # include
    # include
    #endif
    using namespace std;

    double get_time();

    /*
    Printf no args
    4.88426
    Printf 1 arg
    18.5892

    Puts no args
    4.51563
    Fputs no args
    2.98327

    Write no args
    20.6939

    Cout no args
    5.32271
    Cout 1 arg
    27.0982
    */

    enum { LOOP_SIZE = 100000000 };

    #define makeLoop( name, … ) \
    cerr << name"\n"; \
    begin = get_time(); \
    for( int i=0; i<LOOP_SIZE; ++i ) \
    { \
    __VA_ARGS__; \
    } \
    cerr << '\t' << get_time()-begin << '\n';

    int main()
    {
    double begin = 0;

    makeLoop("Printf no args", printf("test\n"));
    makeLoop("Printf 1 arg", printf("test %d\n", i));
    cerr << '\n';
    makeLoop("Puts no args", puts("test\n"));
    makeLoop("Fputs no args", fputs("test\n", stdout));
    cerr << '\n';
    makeLoop("Write no args", write(1, "test\n", 5));
    cerr << '\n';
    makeLoop("Cout no args", cout << "test\n");
    makeLoop("Cout 1 arg", cout << "test" << i << '\n');

    return 0;
    }

    double get_time()
    {
    # ifdef WIN32
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);
    return (double)t.QuadPart/(double)f.QuadPart;
    # else
    struct timeval t;
    struct timezone tzp;
    gettimeofday(&t, &tzp);
    return t.tv_sec + t.tv_usec*1e-6;
    # endif
    }

Leave a Reply


four − = 3