Steven Tattersall's personal blog
Created on: 2008-09-19
Posted in: product, 24, debugging, low-level
I originally wrote this piece for an internal Sony discussion about memory debugging, about 2 years ago. In lieu of actually creating any original content for this website, I thought I'd resurrect this piece and polish it to make it less dry than the original report. It's unusual, since games developers rarely admit in public to anything that doesn't sound like a shiny, new, perfect game engine. But game development is now more the art of the possible than the art of whizzy code.
I should also point out that a lot of the hard work was done by the two people who worked on the systems originally: Chris Sorrell and Gavin Bell. Kudos to them.
I originally started working on "24" in mid-2005. By that point the game had had over 2 years of development on it. "24" was based on Cambridge Studio's in-house PS2 engine, started in 1999; it was the fourth game built around that engine. The engine was originally developed for Primal, a third-person action adventure with on-foot gameplay. "24" had a very different set of requirements, as it attempted to incorporate free-roaming driving missions. The loading patterns for driving areas caused vastly more "problematic" memory usage than the slower on-foot sections. There were also quite a few subtle issues caused by persistent memory allocations, as we will see later.
With a couple of months left on the project we were still having serious memory and stability problems, so extra staff was added to work on diagnostics (i.e. me).
Terminology for memory allocation seems to change with every engine I see, so here is the list for this piece. (This differs from the class names in our code, which reused "pool" in more than one context).
The core memory allocator had been custom-written from the ground up six years earlier. It was, in essence, a best-fit allocator using boundary tags (as far as I could tell). It also had some custom bucketing code to boost performance: to skip any candidates that were known to be too small for a potential allocation, it tracked the free areas in "size buckets" and only looked in the buckets of sizes as big as, or bigger than, the amount of memory you wanted to allocate.
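To make the bucketing idea concrete, here is a minimal sketch (with made-up names, not the actual engine code) of how free blocks might be filed into power-of-two size buckets, with a best-fit search that skips buckets guaranteed to be too small:

    // Simplified sketch of the "size bucket" idea; names are hypothetical.
    #include <stdint.h>
    #include <stddef.h>

    struct FreeBlock
    {
        size_t     size;
        FreeBlock* next;    // next free block in the same bucket
    };

    static const int  kNumBuckets = 16;
    static FreeBlock* s_buckets[kNumBuckets];   // bucket i holds blocks of roughly 2^i bytes

    static int BucketForSize(size_t size)
    {
        int bucket = 0;
        while ((size >> (bucket + 1)) != 0 && bucket < kNumBuckets - 1)
            ++bucket;
        return bucket;
    }

    // Best-fit search, but skipping buckets that cannot possibly satisfy the request.
    FreeBlock* FindBestFit(size_t wanted)
    {
        FreeBlock* best = NULL;
        for (int b = BucketForSize(wanted); b < kNumBuckets; ++b)
        {
            for (FreeBlock* blk = s_buckets[b]; blk != NULL; blk = blk->next)
            {
                if (blk->size >= wanted && (best == NULL || blk->size < best->size))
                    best = blk;
            }
        }
        return best;
    }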
The basic heap allocator also offered the usual additions:
There were several heaps in the game:
All the heaps and their sizes were allocated at boot-up of the game. We couldn't reinitialise heaps at runtime, as the engine design meant that some systems (particularly the render system, as it turned out), couldn't cope with it.
The Really Hard Bit was getting rid of memory fragmentation.
The resource system and loading system on "24" made a separate allocation for each loaded resource (and possibly extra allocations on post-load processing). During runtime, there could be up to around 2500 allocations in the main memory heap, most of which would be resources.
These are the main root causes of the memory crashes we had in "24".
Although best-fit allocation performs well in general tests, we ended up with some interesting problems innate to its approach. In particular, memory "islands" caused by persistent allocations in otherwise free space caused us serious problems later on.
A bit of reading up showed that best-fit allocators are very prone to this, which is intuitive once you start to think about it. Guess what? We were using a best-fit allocator.
Very late in the project we added custom allocation strategies to force allocations to the top or bottom of memory heaps, or into heaps that they would not have used with the default strategy. For example, when movies with embedded WAD data were played back, movie buffers were forced to the top of the heap and resource allocations to the bottom. Although this was an unpleasant hack, it gave us a measure of robustness against the "islands" problem, and it was invaluable in actually getting the game to ship. (In essence we were switching to a first-fit allocator for critical areas.)
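A rough sketch of how a "force to top / force to bottom" hint can be layered onto a free-list allocator (names and structure are hypothetical, not the actual engine code):

    // Sketch of the top/bottom strategy idea. The free list is kept in address
    // order; the hint decides which end of the heap we search from and which end
    // of the chosen free region the allocation is carved out of.
    #include <stddef.h>

    enum AllocStrategy { ALLOC_BOTTOM_UP, ALLOC_TOP_DOWN };

    struct FreeRegion
    {
        char*       start;
        size_t      size;
        FreeRegion* next;   // free regions sorted by ascending address
    };

    void* AllocWithStrategy(FreeRegion* freeList, size_t wanted, AllocStrategy strategy)
    {
        FreeRegion* chosen = NULL;
        for (FreeRegion* r = freeList; r != NULL; r = r->next)
        {
            if (r->size < wanted)
                continue;
            chosen = r;
            if (strategy == ALLOC_BOTTOM_UP)
                break;          // first fit from the bottom of the heap
            // for ALLOC_TOP_DOWN keep going, so we end up with the highest-addressed fit
        }
        if (chosen == NULL)
            return NULL;        // out of memory (or too fragmented)

        void* result;
        if (strategy == ALLOC_BOTTOM_UP)
            result = chosen->start;                          // carve from the low end
        else
            result = chosen->start + chosen->size - wanted;  // carve from the high end

        // (Real code would now split 'chosen' and update the free list.)
        return result;
    }

Packing one class of allocation at each end of the heap keeps long-lived blocks from ending up stranded in the middle, which is exactly the "islands" problem described above.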
Resource relocation was added to the engine before "24", and it helped fix the fragmentation problems we had seen on earlier titles. However, on "24" it wasn't powerful enough to fix everything. This was mainly because the balance of relocatable to non-relocatable objects in the heap shifted between projects: there weren't enough movable objects in the heap any more, and we didn't have time to recode systems to cope with relocation.
Relocatable types were:
Everything else (in particular, character model data) was non-relocatable. Analysing the patterns of how this caused fragmentation was essential for us.
The relocation allocator code just tended to push resources to the earliest big-enough free space towards the top of memory. To keep overheads down, relocation was constrained to a maximum of 4 resources, or 64K, per frame. Generally, fragmentation wasn't limited by this relocation rate; it was more that the memory patterns were unrecoverable by that point because large, non-relocatable resources sat at inconvenient positions in memory.
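The per-frame cap might look something like this (a sketch only; the engine query functions here are assumptions, not real APIs):

    // Sketch of a per-frame relocation budget: stop once either 4 resources or
    // 64K of data has been moved this frame. Hypothetical names throughout.
    #include <stddef.h>

    struct Resource;                               // opaque here
    Resource* FindNextRelocatable();               // assumed engine query (NULL if none)
    size_t    ResourceSize(const Resource* r);     // assumed engine query
    void      MoveResource(Resource* r);           // assumed engine call

    void RelocateThisFrame()
    {
        const int    kMaxResourcesPerFrame = 4;
        const size_t kMaxBytesPerFrame     = 64 * 1024;

        int    moved      = 0;
        size_t bytesMoved = 0;

        while (moved < kMaxResourcesPerFrame)
        {
            Resource* r = FindNextRelocatable();
            if (r == NULL)
                break;                             // nothing left worth moving
            if (bytesMoved + ResourceSize(r) > kMaxBytesPerFrame)
                break;                             // would blow the 64K budget

            MoveResource(r);
            bytesMoved += ResourceSize(r);
            ++moved;
        }
    }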
In conclusion, although we had a very sophisticated memory system, it stored up quite a lot of problems for us later down the line. What had served us comparatively well in earlier games didn't stand up very well for a game with very different gameplay styles.
If I had my time again, and was the Lead, I would force non-relocatable and relocatable resources into separate heaps, and set my budgets accordingly (easier said than done, though).
As might be obvious by now, our engine did a lot of allocation. Loading up a driving level and checking the number of active allocations gave a figure of over 13000! Of these allocations, 12000 were in the bantam ("small") heap.
At the same point there were 1360 separate loaded resource files, of which 900 were theoretically relocatable. However, in a typical test these relocatable resources comprised only about 6MB of the 20+MB of allocated heap space (this ratio varied, of course).
Some tests we did after the game was released showed that we were doing around 200-300 memory operations (including freeing memory) per frame during shooting levels. On driving levels it was around 100-200, but with spikes of up to 700 per frame when streamed loading caused object creation (one spike of 5000 deallocations in a single frame was seen during gameplay). On previous PS2 titles this was actually worse; a lot of work was done to recode rendering allocations, which cut this figure down quite significantly. Nearly all these per-frame allocations were small and went in the bantam heap.
Clearly this would have an effect on real-time update for any external tool. We can get some more exhaustive statistics on this if anyone wants them.
To run reliably on driving levels, we had to maintain an overall level of at least 1.5MB free in the main heap. Any less and fragmentation was liable to cause a crash.
These are the debug fields saved per allocation on the different build types.
Feature                          | Debug | Internal disks | Master
Allocation size                  | Yes   | Yes            | Yes
Allocation time                  | Yes   | Yes            | Yes
Full name tag (supplied in code) | Yes   | No             | No
Allocation PC                    | Yes   | Yes            | No
Mission ID                       | Yes   | Yes            | No
Runtime overhead was 32 bytes per allocation on debug disks and masters, and 48 bytes on full-debug builds.
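For illustration, the per-allocation debug header could be shaped roughly like the following (a hypothetical reconstruction; the real layout, padding and exact byte counts differed):

    // A possible shape for the per-allocation debug data, not the actual "24"
    // structure. The name tag only exists in full-debug builds, which accounts
    // for the extra overhead there.
    #include <stdint.h>

    struct AllocDebugHeader
    {
        uint32_t size;         // allocation size in bytes
        uint32_t allocTime;    // when the allocation was made
        uint32_t allocPC;      // program counter of the allocating code (dropped on masters)
        uint32_t missionId;    // mission active at allocation time (dropped on masters)
    #ifdef FULL_DEBUG_BUILD
        char     nameTag[16];  // full name tag supplied in code; debug builds only
    #endif
    };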
The "mission ID" field was a late addition. It allowed us to track quite effectively leaks between missions, particularly when doing automated tests, and see how long some allocations actually persisted.
Using name tags for allocations often led to problems. Cut-and-paste code and template code often made it difficult to provide useful tags, and a lot of STL-style classes ended up with all their allocations sharing the same name until we reworked the code. This often led to a lot of subtle leaks being ignored by programmers!
There was the facility to do boundary corruption checks by tagging the data beyond allocations with known values. Surprisingly, this feature wasn't that useful to us: although it took a lot of work, we found only two cases where it fired, and they were probably debuggable anyway, as the bugs would have corrupted internal memory pointers.
The usefulness of this feature was lessened because the checks for corruption usually fired a long time after the code responsible for the corruption had run, sometimes several seconds later. Checks were usually done when the block itself, or surrounding blocks, changed state (usually by being freed). Inadvertently corrupting the memory system's own pointers usually made problems come to light earlier, typically during the debug per-frame integrity checks.
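The guard-band idea itself is simple; here is a minimal sketch, assuming the allocator reserves a few extra bytes past the user data (names are hypothetical):

    // Write a known pattern just past the user data at allocation time, and
    // check it later when the block (or a neighbouring block) changes state.
    #include <stdint.h>
    #include <string.h>

    static const uint32_t kGuardPattern = 0xDEADBEEF;

    void WriteGuard(void* userData, size_t userSize)
    {
        // Assumes the allocator reserved sizeof(kGuardPattern) extra bytes.
        memcpy((uint8_t*)userData + userSize, &kGuardPattern, sizeof(kGuardPattern));
    }

    bool GuardIntact(const void* userData, size_t userSize)
    {
        uint32_t value;
        memcpy(&value, (const uint8_t*)userData + userSize, sizeof(value));
        return value == kGuardPattern;   // false => someone wrote past the end
    }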
Memory reports were normally triggered from in-game menus and provided the usual set of filtering per-pool and per-type, full dumps, sorting by tag etc. I guess this is all pretty standard.
The in-game display of fragmentation gave us a good quick overview of memory layout. It had a small memory footprint so we could leave it operational on all disks other than master submissions.
In-game fragmentation display. Top panel is the main heap, lower left is the bantam heap (dangerously full!). Different colours represent unique allocations.
When used in-game, there were keyboard controls to browse memory and check the type of any selected allocation. There were also options to highlight allocations with specific names, allocations made after a set time, or allocations by the mission they were made in. This gave a quick and easy way of spotting obvious leaks and watching allocation patterns (a sobering experience).
Unfortunately, since the control was coarse, it wasn't that useful for in-depth analysis; we generally had to resort to written log data for that.
In the final stages we saw a lot of discrepancies between even "near disk" builds -- where all build options were the same as a disk build except for using the disk as data source -- and the disk builds themselves. We also had a lot of cases where memory fragmentation was only being reported on testers' PS2 Debug kits.
Since we couldn't afford the IOP memory to add network support, we logged quite a lot of statistics to memory card files. This had near-zero runtime/memory overhead.
Statistics included:
The dumps were all in normal ASCII format, which made parsing very easy. Generally the testers played all day, then we copied all their memory card logs off for later analysis. Using the memory cards also meant that the testers didn't need any external tools to be run; they could use it in a fire-and-forget manner.
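As an illustration of how cheap this kind of logging is, a record per allocation can be a single ASCII line; a sketch with hypothetical field names, not our actual format:

    // Write one CSV-style line per allocation to a memory card log file.
    // Plain ASCII so the offline scripts can parse it trivially.
    #include <stdio.h>
    #include <stdint.h>

    void DumpAllocRecord(FILE* log, uint32_t address, uint32_t size,
                         uint32_t allocTime, uint32_t allocPC, int missionId)
    {
        fprintf(log, "%08x,%u,%u,%08x,%d\n",
                (unsigned)address, (unsigned)size,
                (unsigned)allocTime, (unsigned)allocPC, missionId);
    }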
We then very quickly knocked up Python scripts to parse the outputs. By also parsing the MAP file that matched the disk's executable, we could do lookups on all the code that allocated the memory, rather than storing debug strings at runtime.
The scripts could then give full annotated and totalised reports, and a visual representation of the memory layout (similar to the in-game fragmentation display). There was no interactive use of the tools; we never really found a case where it was necessary. If we really wanted to know the details of an exact allocation, we just looked in the text reports.
The exception handler scripts could also generate a full binary image file that could be loaded into the PS2 debugger; this was terrific for checking out complicated crashes.
In short, this system was probably critical in hunting down most of the really nasty crash problems we had.
Output from a Python script showing the global heap memory layout extracted from memory card logs. This shows a snapshot dumped after an out-of-memory exception, drawn in a similar way to the in-game fragmentation display. White areas are remaining free memory. The text at the start of each allocation is the truncated name of the C++ function that did the allocation. In this view, all allocations from the same code have the same colour.
The same view as above, but this time filtered to show allocations made inside the current level (pink) and those from before it (green). Here it's clearer to the trained eye that we are seeing minor fragmentation in the middle of the heap. This is caused by allocations that live longer than one mission, in this case script allocations and the language manager.
The debug builds supported a rolling buffer of the last 8192 memory operations in the system. It was optionally compiled in using conditional compilation.
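A rolling buffer like this is only a handful of lines; a minimal sketch with hypothetical names, guarded by conditional compilation as ours was:

    // Ring buffer of the last kHistorySize memory operations; oldest entries
    // are overwritten once the buffer wraps. Compiled in only when enabled.
    #ifdef MEM_OP_HISTORY
    #include <stdint.h>

    enum MemOpType { MEM_OP_ALLOC, MEM_OP_FREE, MEM_OP_REALLOC };

    struct MemOp
    {
        MemOpType type;
        uint32_t  address;
        uint32_t  size;
        uint32_t  frame;     // frame number when the operation happened
    };

    static const uint32_t kHistorySize = 8192;   // power of two
    static MemOp    s_history[kHistorySize];
    static uint32_t s_historyHead = 0;

    void RecordMemOp(MemOpType type, uint32_t address, uint32_t size, uint32_t frame)
    {
        MemOp& op = s_history[s_historyHead & (kHistorySize - 1)];
        op.type = type; op.address = address; op.size = size; op.frame = frame;
        ++s_historyHead;
    }
    #endif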
Surprisingly, we only found this feature useful on one occasion. That was a crash bug involving render DMA and cache invalidation on non-disk builds, where near-consecutive allocations partially overlapped, causing unexpected data cache write-backs.
On PS1 our approach to memory leaks was simple: we made a simplified snapshot of the memory state at the start of, say, a level. Then on exit from the level we compared the memory state to the snapshot. Anything that didn't match was a leak.
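A sketch of that snapshot-and-compare idea, simplified to just block addresses (names are hypothetical):

    // Take a snapshot of live allocations at level start, then report anything
    // extra that is still live at level exit as a suspected leak.
    #include <stdint.h>
    #include <stdio.h>
    #include <set>

    typedef std::set<uint32_t> AllocSnapshot;   // addresses of live blocks

    AllocSnapshot TakeSnapshot(const uint32_t* liveAddrs, size_t count)
    {
        return AllocSnapshot(liveAddrs, liveAddrs + count);
    }

    void ReportLeaks(const AllocSnapshot& before, const uint32_t* liveAddrs, size_t count)
    {
        for (size_t i = 0; i < count; ++i)
        {
            // Anything live now that wasn't live at the start is a suspect.
            if (before.find(liveAddrs[i]) == before.end())
                printf("possible leak: block at %08x\n", (unsigned)liveAddrs[i]);
        }
    }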
On PS2 this approach fell down quite badly. As well as the fact that not all systems shut down on level exit, it seems that C++ code (or maybe just our C++ code!) gave memory usage that didn't fit this model. One example was our implementation of dynamically sized arrays, similar to STL vectors. Although they did not leak memory, they would end up with internal buffers that would grow but never shrink back if the number of objects within them decreased. Often these reallocated buffers would be created at extremely inopportune times, e.g. when the bantam heap had overflowed into the main heap, causing consistent fragmentation.
Early on in PS2 development the decision was taken not to assert when possible leaks were flagged. (This may well have been a mistake.) The warnings were always ignored by users; often they were spurious, and in many cases even valid warnings were unhelpful, e.g. along the lines of "there are more generic list objects in the world than when you started".
Given no asserts, the main approaches we found were:
The automated tests and memory card logs were invaluable in gathering the data for the "hard work".
At the start of the final work on memory debugging we thought that real-time update would be invaluable. As it turned out, it wasn't nearly as important as we expected. Although the in-game display was handy for getting an overall view of what was going on, most of the crucial events were only exposed in later analysis.
The most useful debug capability to me, by far, was the ability to dump a snapshot at a known point (triggered from code) and potentially visualise it offline. This let us see obvious leaks and fragmentation islands very clearly at important points. Because we still had the allocation time in the debug dumps, it was easy to trace patterns of allocation.
The other really obvious point was that having this data available in an easily parsed format was vital. We just used simple CSV-like text files, which meant that offline processing was trivial and allowed us to automate things.
Here are the main things to take away from all this:
And finally...