Lauri Aalto's Forum Nokia Blog

Fixing out-of-memory issues in Redland RDF libraries

laa-laa | 12 November, 2008 10:36

A while ago my Symbian OS based project needed to process RDF data: parse it, store it, query it and serialize it. For that purpose I decided to port a readily available library, and after some browsing around I ended up with Redland RDF libraries. Redland had the features I needed, its LGPL2.1/Apache2.0 dual license was generous enough, and it was written in C without any complicated dependencies.

This post describes how I tested and fixed out-of-memory (OOM) robustness issues and memory leaks. I won't describe the porting process itself since it was relatively straightforward with P.I.P.S. libraries and there are already plenty of examples about porting code using P.I.P.S. and Open C.

Why test for OOM?

Redland, like many open source projects, grew up in unixland where memory is plentiful and supplemented by virtual memory. Programs are combined in shell scripting style, where individual programs run only for a short while and the process resources are then automatically freed. Devices based on Symbian OS, on the other hand, are still rather resource-constrained. Compared to desktop systems, there is little RAM and no virtual memory, so OOM errors are more likely to happen. The scripting architectural style is also rarely used in Symbian OS, so program lifecycles differ from unixland. For example, in my project I planned to use Redland in long-running background servers. Leaking memory in a long-running background server is a sure route to memory allocation failures and all kinds of errors.

My target was therefore to make the library resilient to OOMs: when OOMs do occur, they are handled gracefully. No crashes. No incorrect results. No memory leaks.

OOM loop

There's already a rather well established OOM testing technique on Symbian OS called the OOM loop. The basic idea is to inject allocation failures using the __UHEAP_SETFAIL() heap failure macro, which in turn uses User::__DbgSetAllocFail(), and then see how the code deals with the failures. John Pagonis writes extensively about the OOM loop construct in his Symbian Developer Network technical paper.

I started by implementing some integration test cases that exercised the Redland libraries in much the same way I planned to use them in my real program. I then tried running the test functions in an OOM loop with lots of iterations, for example using the EDeterministic heap failure mode to fail every kth allocation for k=1..2000. One issue I soon discovered was that the library code used abort() extensively for handling fatal errors like memory allocation failures. For my purposes, terminating the process was not the kind of recovery strategy I was looking for, so I replaced the standard C library implementation of abort() with my own version that calls User::Leave(), the Symbian OS counterpart of throwing an exception. These leaves could be caught in library caller code and dealt with properly.
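To make the idea concrete, here is a minimal sketch in portable C rather than Symbian C++. It is not the actual test or porting layer code: test_malloc(), port_abort(), lib_copy(), run_once() and count_leaves() are hypothetical names, the counter stands in for the EDeterministic failure mode, and setjmp()/longjmp() stand in for the User::Leave() mechanism.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not Symbian or Redland APIs. */
static unsigned long fail_rate;    /* fail every fail_rate-th allocation, 0 = off */
static unsigned long alloc_count;  /* allocations since the last reset */
static jmp_buf leave_target;       /* where port_abort() jumps back to */

/* Allocator with deterministic failure injection. */
static void *test_malloc(size_t size)
{
    if (fail_rate != 0 && ++alloc_count % fail_rate == 0)
        return NULL;               /* simulated allocation failure */
    return malloc(size);
}

/* Replacement for abort(): unwind to the caller instead of terminating
   the process. longjmp() plays the role of User::Leave() here. */
static void port_abort(void)
{
    longjmp(leave_target, 1);
}

/* Toy library function that treats OOM as a fatal error. */
static char *lib_copy(const char *s)
{
    char *p = test_malloc(strlen(s) + 1);
    if (p == NULL)
        port_abort();
    strcpy(p, s);
    return p;
}

/* One OOM loop iteration with every k-th allocation failing.
   Returns 1 if the test body completed, 0 if it "left". */
int run_once(unsigned long k)
{
    /* volatile: modified after setjmp() and read after longjmp() */
    char *volatile first = NULL;
    char *volatile second = NULL;
    int completed;

    fail_rate = k;
    alloc_count = 0;
    if (setjmp(leave_target) == 0) {
        first = lib_copy("subject");
        second = lib_copy("predicate");
        completed = 1;
    } else {
        completed = 0;             /* the "leave" was caught here */
    }
    fail_rate = 0;
    free(first);                   /* release whatever was allocated */
    free(second);
    return completed;
}

/* The OOM loop: try k = 1..max_k and count the iterations that left. */
unsigned long count_leaves(unsigned long max_k)
{
    unsigned long k, leaves = 0;
    for (k = 1; k <= max_k; k++)
        if (!run_once(k))
            leaves++;
    return leaves;
}
```

The test body makes two allocations, so only k=1 and k=2 trigger a failure. From k=3 onwards the body runs to completion, which is why the loop needs some sensible upper limit for k.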

This enabled me to enter the following, somewhat test-driven bug fixing loop:

  1. Write new test code or extend old tests. Run all tests. Repeat until some of the tests fail.
  2. Fix any problems discovered.
  3. Go back to step 1.

This way I was able to discover and fix literally hundreds of bugs in the libraries. Most of them were relatively simple failures to check the return code of some potentially failing function, simple memory leaks and so on. Some bugs were a little more complicated, for example, requiring design-level clarifications to object ownership passing rules.

Improved heap failure tool

The OOM loop approach described above also had its issues:

  1. It was hard to determine where to set heap failure limits (the maximum value of k).
  2. Not all allocation failures resulted in observable bugs, yet they could still leave the system under test running in a slightly inconsistent state.
  3. Some complicated bugs were hard to debug because the allocation failure and the observed error were far apart in the execution.
  4. Some integration test cases would detect errors in dependent libraries (e.g. sqlite database or libxml2 parser) that I was not interested in fixing.

To counter these issues, I decided not to use the Symbian OS heap failure tool but to write my own. Fortunately for me, the Redland libraries only used a small set of memory management functions: malloc(), calloc(), realloc() and free(). No other allocating functions like strdup() were used. This made it easy to implement my own versions of these functions in the porting layer DLL that already contained the abort() replacement. As replacements, I used User::Alloc(), User::AllocZ(), User::ReAlloc() and User::Free().

I decided to store the heap failure tool state information (failure mode, allocation counter, pseudorandom seed) in the DLL thread-local storage. I also added an OOM counter that kept track of all allocation failures, simulated and real, along with some DLL API functions to set the heap failure parameters and to query or reset the OOM counter. For heap failure simulation, I didn't feel the need to implement all heap failure modes supported by the native heap failure tool. I was happy with just the deterministic (EDeterministic) and pseudorandom (ERandom) modes.
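A tool along these lines could look roughly like the following sketch in portable C. Everything named hf_* is hypothetical, C11 _Thread_local stands in for the DLL thread-local storage, and the standard C allocator stands in for User::Alloc() and friends, so treat it as an illustration of the design rather than the actual porting layer.

```c
#include <stddef.h>
#include <stdlib.h>

/* All hf_* names are hypothetical; the real API is not shown in this post. */
typedef enum { HF_NONE, HF_DETERMINISTIC, HF_RANDOM } hf_mode;

typedef struct {
    hf_mode       mode;
    unsigned long rate;        /* fail about one allocation in `rate` */
    unsigned long alloc_count; /* allocations seen since hf_set_fail() */
    unsigned long seed;        /* state of the pseudorandom generator */
    unsigned long oom_count;   /* failures returned, simulated and real */
} hf_state;

/* C11 thread-local storage stands in for the DLL thread-local storage. */
static _Thread_local hf_state hf;

void hf_set_fail(hf_mode mode, unsigned long rate, unsigned long seed)
{
    hf.mode = mode;
    hf.rate = rate;
    hf.seed = seed;
    hf.alloc_count = 0;
}

unsigned long hf_oom_count(void) { return hf.oom_count; }
void hf_reset_oom_count(void)    { hf.oom_count = 0; }

/* Decides whether the next allocation should be failed on purpose. */
static int hf_should_fail(void)
{
    if (hf.mode == HF_NONE || hf.rate == 0)
        return 0;
    hf.alloc_count++;
    if (hf.mode == HF_DETERMINISTIC)
        return hf.alloc_count % hf.rate == 0;
    /* HF_RANDOM: a small linear congruential generator */
    hf.seed = hf.seed * 1103515245UL + 12345UL;
    return ((hf.seed >> 16) & 0x7fffUL) % hf.rate == 0;
}

/* Replacements for malloc(), calloc(), realloc() and free().
   Zero-size requests are not treated specially in this sketch. */
void *hf_malloc(size_t size)
{
    void *p = hf_should_fail() ? NULL : malloc(size);
    if (p == NULL)
        hf.oom_count++;
    return p;
}

void *hf_calloc(size_t count, size_t size)
{
    void *p = hf_should_fail() ? NULL : calloc(count, size);
    if (p == NULL)
        hf.oom_count++;
    return p;
}

void *hf_realloc(void *ptr, size_t size)
{
    /* On failure the original block is left untouched, as with realloc(). */
    void *p = hf_should_fail() ? NULL : realloc(ptr, size);
    if (p == NULL)
        hf.oom_count++;
    return p;
}

void hf_free(void *ptr) { free(ptr); }
```

Because the counter is incremented whenever an allocation function returns NULL, it registers real allocation failures as well as simulated ones.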

This setup fully addresses the issues I had:

  1. A suitable upper bound for k was reached in deterministic failure mode when the test function ran successfully, produced correct results and no OOMs were registered.
  2. I could query the porting layer state for OOM failure count and see whether there had been any (undetected) OOM errors while running the test code.
  3. On a simulated allocation failure, I could set the heap failure tool to issue a debugger breakpoint using the __BREAKPOINT() macro, i.e. the "int 3" assembly instruction on the x86/WINSCW emulator. This way I could quickly discover the root causes for errors occurring much later in the test case.
  4. Dependent libraries were not affected since they were not using my heap failure tool.
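Points 1 and 2 can be illustrated with a small sketch in portable C. The names counted_malloc(), test_case(), ooms_at_rate() and find_upper_bound() are made up for this example, and the test case contains a deliberate flaw: it swallows allocation failures and still reports success.

```c
#include <stdlib.h>

/* Hypothetical minimal failure injector: fails every fail_rate-th
   allocation and counts every failure it hands out. */
static unsigned long fail_rate, alloc_count, oom_count;

static void *counted_malloc(size_t size)
{
    void *p = NULL;
    alloc_count++;
    if (fail_rate == 0 || alloc_count % fail_rate != 0)
        p = malloc(size);
    if (p == NULL)
        oom_count++;
    return p;
}

/* Toy test case with a deliberate flaw: an allocation failure is
   swallowed and the test still reports success. */
static int test_case(void)
{
    int i;
    for (i = 0; i < 5; i++) {
        void *p = counted_malloc(16);
        if (p == NULL)
            continue;              /* failure goes unnoticed */
        free(p);
    }
    return 1;                      /* always "passes" */
}

/* Runs the test case with every k-th allocation failing.
   Stores the test verdict in *passed and returns the OOM count. */
unsigned long ooms_at_rate(unsigned long k, int *passed)
{
    fail_rate = k;
    alloc_count = 0;
    oom_count = 0;
    *passed = test_case();
    fail_rate = 0;
    return oom_count;
}

/* Smallest k for which the test passes and no OOM was registered,
   or 0 if no such k exists up to `limit`. */
unsigned long find_upper_bound(unsigned long limit)
{
    unsigned long k;
    for (k = 1; k <= limit; k++) {
        int passed;
        if (ooms_at_rate(k, &passed) == 0 && passed)
            return k;
    }
    return 0;
}
```

With k=2 the test case reports success even though two allocations failed, and only the counter reveals it. The test case makes five allocations, so k=6 is the first value at which nothing fails, and running the loop any further adds no new information.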

The OOM counter was also useful outside testing. I could use it to invalidate the results of any operation during which an OOM had been registered, to make sure the system was not running in an inconsistent state and producing "almost correct" results.
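As a rough sketch of that pattern in portable C: snapshot the counter, run the operation, and discard the result if the counter has moved. All names here (port_malloc(), port_simulate_oom(), count_words(), checked_count_words()) are hypothetical, and the simulate switch exists only so that the example can provoke a failure.

```c
#include <stdlib.h>
#include <string.h>

static unsigned long oom_count;   /* hypothetical porting layer counter */
static int simulate_oom;          /* lets this sketch provoke a failure */

void port_simulate_oom(int on) { simulate_oom = on; }

void *port_malloc(size_t size)
{
    void *p = simulate_oom ? NULL : malloc(size);
    if (p == NULL)
        oom_count++;
    return p;
}

/* Toy operation that hides an allocation failure: on OOM it returns 0,
   which looks just like the answer for an empty input. */
static size_t count_words(const char *text)
{
    size_t n = 0;
    char *tok;
    char *copy = port_malloc(strlen(text) + 1);
    if (copy == NULL)
        return 0;                 /* "almost correct" result */
    strcpy(copy, text);
    for (tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        n++;
    free(copy);
    return n;
}

/* Returns 0 and stores the result only if no OOM was registered
   while the operation ran; otherwise returns -1 and leaves *out alone. */
int checked_count_words(const char *text, size_t *out)
{
    unsigned long before = oom_count;
    size_t n = count_words(text);
    if (oom_count != before)
        return -1;                /* result cannot be trusted */
    *out = n;
    return 0;
}
```

The caller does not need to know which allocation failed or where. Any movement of the counter during the operation is enough to reject the result.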

Of course, all the bug fixes have been submitted back to the open source project to benefit the whole community.
