Tuesday, June 19, 2007

What are dangling pointers in C, C++ and VC++?

Dangling pointer

Dangling pointers and wild pointers in computer programming are pointers that point neither to a valid object of the appropriate type nor to a distinguished null pointer value (in languages which support one). Dangling pointers arise when an object is deleted or de-allocated without modifying the value of the pointer, so that the pointer still points to the location of the de-allocated memory. Because the system may reuse the previously freed memory for another purpose, dereferencing the (now) dangling pointer can produce unpredictable behavior, as the memory may contain completely different data. This is especially the case if the program writes data through a dangling pointer: silent corruption of unrelated data may result, leading to subtle bugs that can be extremely difficult to find, or to general protection faults (Windows). If the overwritten data is bookkeeping data used by the system's memory allocator, the corruption can cause system instabilities.

Wild pointers arise when a pointer is used before it has been initialized to some known state, which is possible in some programming languages. They show the same erratic behavior as dangling pointers, though they are less likely to stay undetected.

Cause of dangling pointers

In many languages (particularly the C programming language, which assumes the programmer will take care of all design issues, and hence does not include many of the checks that are present in higher-level languages), deleting an object from memory, whether explicitly or by destroying the stack frame on return, does not alter any associated pointers. The pointer still points to the location in memory where the object or data was, even though the object or data has since been deleted and the memory may now be used for other purposes. The result is a dangling pointer.

A straightforward example is shown below:

       {
           char *cp = NULL;
           /* ... */
           {
               char c;
               cp = &c;
           } /* The memory location, which c was occupying, is released here */          
           /* cp here is now a dangling pointer */
       }

In the above, one solution to avoid the dangling pointer is to make cp a null pointer after the inner block is exited, or to otherwise guarantee that cp won't be used again without further initialization in the code which follows.
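
The first option can be sketched roughly as follows. The function name scoped_pointer_demo and its return convention are assumptions made for this illustration only; the point is simply that cp is reset as soon as the inner block ends, so later code can test it instead of dereferencing a stale address.

```c
#include <stddef.h>

/* Hypothetical demo: returns 1 if cp is a null pointer after the inner block. */
int scoped_pointer_demo(void)
{
    char *cp = NULL;
    {
        char c = 'x';
        cp = &c;
        *cp = 'y';   /* fine: c is still alive inside this block */
    }                /* c ceases to exist here; cp would now dangle */
    cp = NULL;       /* assigning (not reading) cp is safe and removes the stale address */

    /* cp can now be tested before any further use */
    return cp == NULL;
}
```

Resetting the pointer does not make later misuse impossible, but it turns a silent error into one that a simple null check can catch.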

Another frequent source of dangling pointers is a jumbled combination of malloc() and free() library calls. In such a case, a pointer becomes dangling when the block of memory it points to is freed. As with the previous example, one way to avoid this is to set the pointer back to null immediately after freeing the memory, as demonstrated below:

       #include <stdlib.h>
       {
           char *cp = malloc ( A_CONST );
           /* ... */
           free ( cp );      /* cp now becomes a dangling pointer */
           cp = NULL;        /* cp is no longer dangling */
           /* ... */
       }
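
Because the free-then-reset pair is easy to forget, it is sometimes wrapped in a small helper. The following is a minimal sketch; the name free_and_null is hypothetical and not part of the standard library, and the helper is written for char pointers only to keep the types simple.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *pp and resets the caller's pointer to NULL. */
void free_and_null(char **pp)
{
    if (pp != NULL) {
        free(*pp);    /* free(NULL) is a no-op, so calling this twice is harmless */
        *pp = NULL;   /* the caller's pointer no longer dangles */
    }
}
```

A caller would write free_and_null(&cp) instead of free(cp). Note that this only resets the one pointer that is passed in; other copies of the same address are unaffected.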

Lastly, a common programming misstep that creates a dangling pointer is returning the address of a local variable. Since local variables are de-allocated when the function returns, any pointers to them become dangling pointers once the stack frame is de-allocated.

       char * func ( void )
       {
           char ca[] = "Pointers and Arrays - II";
           /* ... */
           return ca;   /* ca is destroyed on return: the caller gets a dangling pointer */
       }

If it is required to return the address of ca, it should be declared with the static storage-class specifier, so that the array continues to exist after the function returns.
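
As a rough sketch of that fix, and of one common alternative, consider the two functions below. The names func_static and func_copy are hypothetical. The second variant, in which the caller supplies the buffer, is not taken from the text above; it is shown only as an option that avoids sharing one static array between all callers.

```c
#include <stddef.h>
#include <string.h>

/* Variant with static storage: the array outlives the call, so the
   returned pointer stays valid. All callers share the same array. */
char *func_static(void)
{
    static char ca[] = "Pointers and Arrays - II";
    return ca;
}

/* Alternative (an assumption of this sketch): the caller owns the storage.
   Returns 0 on success, -1 if buf is NULL or too small. */
int func_copy(char *buf, size_t size)
{
    static const char text[] = "Pointers and Arrays - II";
    if (buf == NULL || size < sizeof text)
        return -1;
    memcpy(buf, text, sizeof text);   /* copies the terminating '\0' too */
    return 0;
}
```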

Cause of wild pointers

Wild pointers are created by omitting necessary initialization prior to first use. Thus, strictly speaking, every pointer in programming languages which do not enforce initialization begins as a wild pointer.

In practice this most often happens because control flow jumps over the initialization, not because the initialization is omitted altogether. Most compilers are able to warn about this.
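
The sketch below illustrates the "jumping over" case and one simple defense: giving the pointer a known value at its declaration. The function first_char and its parameter are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Hypothetical example: returns the first character of a message,
   or '\0' when there is no message. */
char first_char(int have_message)
{
    const char *msg = NULL;   /* known state from the very start */

    if (!have_message)
        goto done;            /* this path jumps over the assignment below */

    msg = "hello";

done:
    /* Without the "= NULL" initializer, msg would be wild on the goto path. */
    return (msg != NULL) ? msg[0] : '\0';
}
```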

Security holes involving dangling pointers

Like buffer overflow bugs, dangling/wild pointer bugs are frequently security holes. For example, if the pointer is used to make a virtual function call, a different address (possibly pointing at exploit code) may be called due to the vtable pointer being overwritten. Alternatively, if the pointer is used for writing to memory, some other data structure may be corrupted. Even if the memory is only read once the pointer becomes dangling, it can lead to information leaks (if interesting data is put in the next structure allocated there) or privilege escalation (if the now-invalid memory is used in security checks).

Avoiding dangling pointer errors

A popular technique to avoid dangling pointers is to use smart pointers. A smart pointer typically uses reference counting to reclaim objects. Some other techniques include the tombstones method and the locks-and-keys method.
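
Smart pointers are usually a C++ facility, but the reference-counting idea behind them can be sketched in plain C. The type rc_int and the functions rc_new, rc_retain and rc_release below are hypothetical names invented for this illustration; the sketch is not thread-safe and is not how any particular library implements it.

```c
#include <stdlib.h>

/* Hypothetical reference-counted integer; not thread-safe. */
typedef struct {
    int refs;    /* number of owners still holding this object */
    int value;
} rc_int;

/* Creates an object with one owner; returns NULL if allocation fails. */
rc_int *rc_new(int value)
{
    rc_int *p = malloc(sizeof *p);
    if (p != NULL) {
        p->refs = 1;
        p->value = value;
    }
    return p;
}

/* Registers one more owner and returns the same pointer. */
rc_int *rc_retain(rc_int *p)
{
    if (p != NULL)
        p->refs++;
    return p;
}

/* Drops one owner and frees the object when the last owner is gone.
   Always returns NULL, so callers can write p = rc_release(p). */
rc_int *rc_release(rc_int *p)
{
    if (p != NULL && --p->refs == 0)
        free(p);
    return NULL;
}
```

The object is only freed when the last owner releases it, so a pointer obtained through rc_retain cannot dangle as long as its holder has not released it.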

One alternative is to use the DieHard memory allocator[1], which virtually eliminates dangling pointer errors, as well as a variety of other memory errors (like invalid and double frees).

In languages like Java, dangling pointers cannot occur because there is no mechanism to explicitly de-allocate memory. Rather, the garbage collector may de-allocate memory, but only when the object is no longer reachable from any references.

Dangling pointer detection

To expose dangling pointer errors, one common programming technique is to set pointers to the null pointer or to an invalid address once the storage they point to has been released. When the null pointer is dereferenced, the program will (on most platforms) terminate immediately, so there is little potential for data corruption or unpredictable behavior. This makes the underlying programming mistake easier to find and resolve. This technique does not help when there are multiple copies of the pointer, because only the copy that was reset is affected.
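
One possible way around the multiple-copies limitation is to let every user go through a single shared slot instead of keeping its own copy of the raw address. The following is a minimal sketch under that assumption; int_handle and the handle_* functions are hypothetical names, and the handle itself must outlive all of its users.

```c
#include <stdlib.h>

/* Hypothetical handle: users keep a pointer to the handle,
   never a copy of the raw address stored inside it. */
typedef struct {
    int *target;   /* NULL once the storage has been released */
} int_handle;

/* Allocates the storage; returns 0 on success, -1 on failure. */
int handle_init(int_handle *h, int value)
{
    h->target = malloc(sizeof *h->target);
    if (h->target == NULL)
        return -1;
    *h->target = value;
    return 0;
}

/* Frees the storage and resets the one shared slot. */
void handle_release(int_handle *h)
{
    free(h->target);
    h->target = NULL;   /* every user of h now sees NULL */
}

/* Reads through the handle; returns fallback if the storage is gone. */
int handle_get(const int_handle *h, int fallback)
{
    return (h->target != NULL) ? *h->target : fallback;
}
```

Since there is only one place where the address is stored, resetting it after the release is visible to all users at once.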

Some debugging tools will automatically overwrite and destroy data that has been freed, usually with a specific pattern, such as 0xdeadbeef. Microsoft's Visual C++ debug runtime, for example, fills uninitialized stack memory with 0xCC, newly allocated heap memory with 0xCD and freed heap memory with 0xDD. This usually prevents the data from being reused by making it useless and also very prominent (the pattern serves to show the programmer that the memory has already been freed).
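
The same idea can be approximated by hand in a debug build. The sketch below is only an illustration: poison, debug_free and POISON_BYTE are hypothetical names, the caller has to pass the allocation size itself, and an optimizing compiler may remove the memset() call because the memory is freed right afterwards.

```c
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xDD   /* pattern chosen for this sketch */

/* Overwrites n bytes at p with the poison pattern. */
void poison(void *p, size_t n)
{
    if (p != NULL)
        memset(p, POISON_BYTE, n);
}

/* Hypothetical debug free: destroys the contents, then releases the block.
   n must be the size that was originally requested for p. */
void debug_free(void *p, size_t n)
{
    poison(p, n);
    free(p);
}
```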

