Bloom filters

A Bloom filter is a data structure that represents a set of elements. Unlike regular data structures, it cannot store data associated with a key, nor can it store the keys themselves. The only information it holds is whether a given key belongs to the set, and even that answer is probabilistic: a "no" is always correct, while a "yes" may be a false positive.

You may be wondering what it is useful for. Here's a typical scenario for using a Bloom filter. Let's say you have a large data structure and you often have to check whether a particular member is in it. For example, let's say you have a large binary tree that you frequently query for some element.
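To make the scenario concrete, here is a minimal sketch of a Bloom filter guarding a tree lookup. It is an illustration under stated assumptions, not the post's implementation: the class BloomFilter and the helper tree_contains are hypothetical names, std::set stands in for the "large binary tree", and the k hash functions are derived from std::hash by double hashing, which is simple but weaker than the dedicated hash functions real filters usually use.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal Bloom filter: an m-bit array and k hash functions.
// The k indexes come from double hashing over std::hash (an assumption).
class BloomFilter {
public:
    BloomFilter(std::size_t bits, std::size_t hashes)
        : bits_(bits, false), k_(hashes) {}

    // Adding a key sets k bits; the key itself is not stored anywhere.
    void add(const std::string& key) {
        for (std::size_t i = 0; i < k_; ++i) bits_[index(key, i)] = true;
    }

    // false: the key is definitely absent.
    // true:  the key is possibly present (false positives are possible).
    bool possibly_contains(const std::string& key) const {
        for (std::size_t i = 0; i < k_; ++i)
            if (!bits_[index(key, i)]) return false;
        return true;
    }

private:
    // i-th index = (h1 + i * h2) mod m; h2 is forced odd so steps differ.
    std::size_t index(const std::string& key, std::size_t i) const {
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + '\x01') | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t k_;
};

// Hypothetical helper: consult the cheap filter before the expensive tree.
bool tree_contains(const BloomFilter& filter,
                   const std::set<std::string>& tree,
                   const std::string& key) {
    if (!filter.possibly_contains(key)) return false;  // tree never touched
    return tree.count(key) > 0;  // filter said "maybe": confirm in the tree
}
```

Every key inserted into the tree must also be added to the filter. Most queries for absent keys then return without touching the tree, and a false positive only costs one ordinary tree lookup.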

printf() vs stream IO in C++

Before joining Dell, I mostly worked in the kernel, writing in C. At Dell I still work mostly on low-level stuff, but this time in user mode, so I am no longer tied to C. We're writing in C++, and I am learning C++. One of the less appealing things about C++ for me was its streaming support and the way its input/output is implemented. In particular, I was used to the printf() family of functions, and leaving them in favor of streams and cout was tough. What really strikes me is that no C++ book explains why. All C++ books just tell you: so, my dear, this is how this stuff is done in C++. It took me some time to realize how much more convenient and powerful C++-style input/output is than the printf() family of functions. Here's why.
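As a taste of the argument, here is a small sketch of two properties streams have that printf() lacks: type safety (there is no format string that can disagree with the argument types) and extensibility (user-defined types can be taught to print themselves). Point and describe are hypothetical names used only for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
    int x;
    int y;
};

// Teach every output stream how to print a Point. Standard printf() has
// no portable way to add a conversion specifier for a new type.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "(" << p.x << ", " << p.y << ")";
}

std::string describe(const Point& p, double scale) {
    // Any std::ostream works here: std::cout, a file stream, or a string.
    std::ostringstream out;
    // The compiler selects the right operator<< for each operand's type.
    out << "point " << p << " scaled by " << scale;
    return out.str();
}
```

For example, describe(Point{1, 2}, 1.5) produces "point (1, 2) scaled by 1.5", and the same operator<< works unchanged with std::cout.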

gcc macro language extensions

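One of the great things about gcc, and its C/C++ preprocessor in particular, is the various extensions it offers. In this post I would like to briefly describe three of them. The first turns a C/C++ token into a string; here a token is anything you can pass as an argument to a macro. The second lets you concatenate two tokens to create a new one. The last allows C/C++ macros with a variable number of arguments. Strictly speaking, the first two (the # and ## operators) are part of standard C and C++, and variadic macros became standard in C99 and C++11; gcc adds its own variations on top, such as named variadic parameters and the ", ## __VA_ARGS__" comma trick.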
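A short sketch of all three features in action, using only the standard forms that gcc accepts. The macro names STRINGIFY, CONCAT and FORMAT and the function macro_demo are hypothetical, chosen for illustration.

```cpp
#include <cstdio>
#include <string>

// 1. Stringification: #x turns the macro argument into a string literal.
#define STRINGIFY(x) #x

// 2. Token pasting: a##b glues two tokens into one new token.
#define CONCAT(a, b) a##b

// 3. Variadic macro: __VA_ARGS__ stands for every argument after fmt.
//    This form needs at least one such argument; otherwise the comma
//    before __VA_ARGS__ would be left dangling.
#define FORMAT(buf, fmt, ...) std::snprintf((buf), sizeof(buf), (fmt), __VA_ARGS__)

// Hypothetical helper combining all three macros.
std::string macro_demo() {
    int CONCAT(answer, _value) = 42;  // expands to: int answer_value = 42;
    char buf[64];
    FORMAT(buf, "%s = %d", STRINGIFY(answer_value), answer_value);
    return std::string(buf);
}
```

Here macro_demo() returns "answer_value = 42": CONCAT builds the variable name, STRINGIFY turns that name into text, and FORMAT forwards the remaining arguments to snprintf.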

UML cheatsheet

Every once in a while, I have to draw a UML diagram. I rarely do serious design with UML, but sometimes I need to depict a piece of code in a diagram, and UML seems to be the best notation around.

Unfortunately, many sources of information on UML tend to overcomplicate things. I am not a software architect, and drawing UML is not my job, so my UML skills are poor by definition. Moreover, I am happy with this situation and don't see it changing in the future (even if I get promoted ;-) ).

So from time to time I need a simple UML reference card. A simple search finds references like this one, which are excellent if you are serious about UML, and I am not.

Eventually, I decided to write a short UML class diagram reference card for myself. I hope you will enjoy it as well.

Making writes durable – is your data on disk?

Here is an interesting article written by Evan Jones. It explains how you can make sure that your data has actually reached the disk.

In case you're wondering: when write(), fwrite() or any other library call that writes data to a file reports success, you are not guaranteed that the data is actually on the disk. fwrite() may still be holding the data in a user-space stdio buffer, and in Linux, write() reports success as soon as the data is in the page cache, marked dirty. Later, kernel flusher threads write the dirty pages back to disk.

Depending on circumstances, it may take a while before the kernel finishes writing the data back. In his post, Evan explains how to make sure that the data is actually stable on disk.
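A minimal sketch of the usual POSIX approach, assuming Linux: write the data with write(), then call fsync() and wait for it to succeed before treating the data as durable. The helper names durable_write and fsync_dir are hypothetical. When using fwrite(), the equivalent is fflush() followed by fsync(fileno(f)). As the Linux fsync(2) man page notes, a newly created file's directory entry needs its own fsync() on the directory. Also keep in mind that fsync() only reports what the kernel and device claim; a drive that ignores cache flushes can still lose data.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: write data to path and return true only after
// fsync() has reported success.
bool durable_write(const std::string& path, const std::string& data) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {  // write() may accept fewer bytes than requested
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            ::close(fd);
            return false;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }

    // So far the data may only be in the page cache. fsync() blocks until
    // the kernel has written the file's data and metadata to the device.
    if (::fsync(fd) != 0) {
        ::close(fd);
        return false;
    }
    return ::close(fd) == 0;
}

// Hypothetical helper: make a new directory entry durable as well.
bool fsync_dir(const std::string& dir) {
    int fd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) return false;
    int rc = ::fsync(fd);
    ::close(fd);
    return rc == 0;
}
```

Typical use is durable_write("/some/dir/file", data) followed by fsync_dir("/some/dir") when the file did not exist before.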

Models for multithreaded applications

As you know, I have changed workplaces a couple of times during my career. Long story short, one interesting thing I noticed is that different companies use different models for multi-threaded programs (mostly for large embedded systems).

Python for bash replacement

When I started learning Python, I was looking for a programming language that would replace BASH, AWK and SED. I am a C/C++ programmer, and as such I'd better invest my time in studying C and C++. Instead, every time I needed a complex script, I opened a book on BASH and refreshed my knowledge. And since it is relatively easy to bump into the limits of what BASH can do, I always opened an awk/sed book a few minutes later.

Actually, this is quite common. Every once in a while I see my colleagues, just like me, open a book on BASH. The problem is that because we don't program in BASH actively, the knowledge and experience we gain wear off over time. So the next time we approach it, we have to study BASH all over again. And again, it is not only BASH I am talking about, but also AWK and SED.

This is an utterly broken state of affairs, and I wish there were a solution. Unfortunately, there is no complete solution yet, but with some effort one may emerge. I am talking about the Python programming language.

pthread_exit() in C++

Today I ran into an interesting problem that I would like to share. I am working on multi-threaded code in C++. Here’s what happened.

I started a thread that looks like this:

void* thread_func(void*) {
    try {
        do_something();  // eventually calls pthread_exit()
    } catch (...) {
        std::cout << "Got unknown exception" << std::endl;
    }
    return nullptr;
}

The do_something() routine eventually called pthread_exit().

Once I ran this piece of code, I instantly got an unknown exception notification. Long story short, here’s what I found out.

When pthread_exit() is called from C++ code, all objects that have been created on the thread's stack have to be destroyed. This process is called stack unwinding, and it is exactly what happens when you throw an exception. On Linux with glibc, pthread_exit() uses this C++ machinery to clean up before shutting down the thread for good.

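To do that, pthread_exit() unwinds the stack with a special exception (with gcc, abi::__forced_unwind from <cxxabi.h>) all the way up to the thread's entry point. This way all objects are cleaned up nicely. On the other hand, a catch (...) block catches this exception too, so catching ... is no longer harmless: if the handler swallows the exception instead of rethrowing it, glibc aborts the process with "FATAL: exception not rethrown".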
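One way to keep a catch-all handler without breaking pthread_exit() is to catch the forced-unwind exception first and rethrow it. The sketch below assumes gcc/libstdc++ on Linux with glibc (abi::__forced_unwind is specific to that toolchain; build with -pthread). Guard, the flag variables and run_demo are hypothetical names used only to make the behaviour observable, and do_something is a stand-in for the post's routine.

```cpp
#include <cxxabi.h>   // abi::__forced_unwind (gcc/libstdc++ specific)
#include <iostream>
#include <pthread.h>

static bool g_dtor_ran = false;        // set when Guard's destructor runs
static bool g_unknown_caught = false;  // set if catch (...) sees anything

// A stack object whose destructor records that unwinding happened.
struct Guard {
    ~Guard() { g_dtor_ran = true; }
};

// Hypothetical stand-in for the post's do_something().
void do_something() {
    Guard g;
    pthread_exit(nullptr);  // unwinds the stack, so ~Guard() runs
}

void* thread_func(void*) {
    try {
        do_something();
    } catch (abi::__forced_unwind&) {
        throw;  // must rethrow so pthread_exit() can finish the thread
    } catch (...) {
        g_unknown_caught = true;
        std::cout << "Got unknown exception" << std::endl;
    }
    return nullptr;
}

// Runs thread_func in a new thread. Returns true if the destructor ran
// and the catch (...) branch was never taken.
bool run_demo() {
    g_dtor_ran = false;
    g_unknown_caught = false;
    pthread_t t;
    if (pthread_create(&t, nullptr, thread_func, nullptr) != 0) return false;
    pthread_join(t, nullptr);
    return g_dtor_ran && !g_unknown_caught;
}
```

Without the first handler, catch (...) would take the exception, print the message, and, because it does not rethrow, glibc would abort the process.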

My next programming language

This weekend I've been playing with various version control systems. Until now, I've been doing all my home coding with subversion. I've written about bazaar in the past, but it seems to me that bazaar isn't going anywhere, and that kills whatever motivation I have to keep writing about it.

The version control system I did try is git. It is very popular, and for good reason. Comparing git and subversion led me to conclude that git is really a natural consequence of how things work in the world. The thing about git is that it is a distributed version control system, while subversion is not. You could make a distributed version of subversion, but it wouldn't be subversion anymore.

Being distributed is one thing, but there is more. Take the standard trunk/branches/tags layout that you have to create in subversion: the version control system could do it for you, as git, bazaar and mercurial do. At first, after working with CVS for some time, the option of a non-standard layout seemed cool. But later it turned out to be completely useless.

This suggests that version control systems evolve, and I bet a similar process happens with programming languages. I think we can already start sketching the programming language that will replace Python.

Yep, Python isn't perfect. I thought I'd compile a list of things that ought to be different in the programming language that comes after Python.

Call a constructor or allocate an object in-place

Since I joined Dell, my main field of research and work has changed somewhat. Now I mostly work with C++ and file systems. This world is not entirely new to me, but apparently I have a lot to learn.

Today I'd like to talk about a nice trick I learned a few days ago.

When working with large software systems, memory management becomes critical. In C, you can easily allocate a large chunk of memory and place a structure right in that buffer. This is more difficult in C++, because the compiler has to call the constructor.

It turns out that you can, in a way, call an object's constructor directly. That is, you can create an object in a memory region you specify, without allocating that region. This is known as placement new.

This is how you do it.

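#include <new>  // needed for placement new

char* s = new char[1024];          // plain allocation of raw storage
SomeClass* p = new (s) SomeClass;  // constructs a SomeClass inside s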

The first new operator simply allocates 1024 bytes; this is good old allocation as we know it. Note the special syntax of the second new operator: it constructs the new object in the memory specified in parentheses. Basically, it calls SomeClass's constructor using s as storage.

One thing I don't know how to do yet is call the destructor on such an object, i.e. how to delete an object in place.
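For reference, standard C++ does provide the counterpart: an object created with placement new is destroyed with an explicit destructor call (p->~SomeClass()), which runs the destructor without freeing memory, and the buffer is then released separately. The sketch below mirrors the snippet above. The body of SomeClass, its live counter and construct_in_place_demo are hypothetical, added only so the lifecycle can be observed.

```cpp
#include <new>      // placement new
#include <string>

// Hypothetical class used only for illustration.
struct SomeClass {
    std::string name;
    static int live;  // number of currently constructed objects
    SomeClass() : name("in-place") { ++live; }
    ~SomeClass() { --live; }
};
int SomeClass::live = 0;

// Constructs a SomeClass inside a raw buffer, then destroys it in place.
std::string construct_in_place_demo() {
    static_assert(sizeof(SomeClass) <= 1024, "buffer too small");
    char* s = new char[1024];          // raw storage, as in the post
    SomeClass* p = new (s) SomeClass;  // runs the constructor inside s
    std::string seen = p->name;
    p->~SomeClass();                   // runs the destructor, keeps the memory
    delete[] s;                        // releases the raw storage
    return seen;
}
```

Calling delete p here would be wrong, because p was not obtained from an ordinary new. The destructor call and the buffer release are separate steps.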