According to test test application, gtkmm needs significantly more time when default signal handlers and virtual functions are enabled. What gtkmm basically does is creating a new class inheriting from the corresponding class in GTK and overriding all vfuncs and default signal handlers. In the overridden method, it gets the C++ wrapper object for the C object (if any) and casts it to the actual type (e.g. Gtk::Window). As an optimization it checks already whether the C++ object is a custom one that derives from the gtkmm one. If not, the vfunc could not have been overridden anyway and we just call the parent's (GtkWindow) vfunc implementation. Otherwise, a C++ virtual function is called that classes deriving from Gtk::Window might override. This way, GObject vfuncs can be overridden in the C++ code.
For Openismus, I tried to find out what makes this procedure slow. To do so I used callgrind to collect some profiling data from the test program mentioned above. Then, I installed Qt and kdelibs just to be able to use kcachegrind to visualize that data. Wouldn't someone like to write a kcachegrind-like application for GNOME?

I had a look at Container_Class::forall_vfunc because this was the one called most times (64 025 calls, to be accurate). kcachegrind shows that 3.10% of the total time was spent in forall_vfunc (actually, its not time but CPU cycles or something, but as a physicist I assume they are proportional...). Out of these 3.10%, 1.48% are spent in dynamic_cast<>, another 0.99% in ObjectBase::_get_current_wrapper(). The latter call looks up the C++ object from the C object, and the dynamic_cast casts it from ObjectBase to the actual type. Both calls are at the very beginning of the function. Also of notice is that 2.50% of the overall time is in dynamic_cast<>. This means that just the forall_vfunc issues more than the half of all dynamic_casts in the whole program, not counting all the other vfuncs.
What I tried out now is to do the dynamic_cast only if the C++ object is derived from the gtkmm class. In the test application, this is only the case for ExampleWindow but not for all the other widgets that are created. If the object is not derived, the actual C++ type is not needed because we just call the parent vfunc anyway. The result looks like this:

The time spent in forall_vfunc is now nearly only the half as before. It does no more call to dynamic_cast. This is because all containers on which the forall vfunc was called are not overridden by a C++ class. Other vfuncs, as expose_event, still showed a small amount of dynamic_cast calls. These were the expose events the Window received because of being derived by ExampleWindow. There is a bug open about this.
The next thing I had a look at was the ObjectBase::_get_current_wrapper() call. kcachegrind shows a total amount of 1.70% out of which 0.14% are spent in the function itself (and not in functions being called by get_current_wrapper() in turn). However, the function does nothing more than calling g_object_get_qdata on the given GObject to get the C++ wrapper. Therefore, it seems that a significant amount of time is lost due to the overhead of a function call here because the function is often called (106 674 times), so I tried inlining it:

Any calls to get_current_wrapper() disappeared now. Instead, g_object_get_qdata() shows up directly in forall_vfunc. A good way to verify that the compiler actually inlined the function call. Again, there is a performance gain, but way smaller.
So long, but what about actual execution times? When running the test application, there was no noticeable speedup. However, when running under valgrind (meaning about 10-30 times slower), it finished one or two seconds earlier. To conclude, for most users the changes will probably not be visible, but for large applications and/or less powerful machines gtkmm applications might run a bit more smoothly now.