Monday, 14 January 2013

A Garbage Collection Analysis of PCGen, the popular Open Source Character Generator


I decided to combine two software loves of mine and perform some analysis on PCGen, a popular Java based open source character generator for role-playing games. I used Censum, our (jClarity's) new Garbage Collection log analysis tool to perform the analysis.

This write-up assumes you have a passing familiarity with Garbage Collection (GC) on the JVM. If you're not familiar with GC then I recommend you join our Friends of jClarity programme. We're building up a knowledge base around GC to share freely with the whole Java community, we'd love for you to come and validate this!

The projects

The two projects I'm using are are PCGen (the project I'm doing the analysis on) and Censum (the GC analysis tool).


PCGen is a popular character generator and maintenance program for d20 role playing systems such as Star Wars and Dungeons & Dragons. It's a long running project (> 10 years) and consists of a large (~750,000 LOC) Java Swing desktop tool that has a ton of proprietary format data files. Disclaimer: I'm the Chairperson for PCGen.

PCGen is a data intensive tool. In order to drive its rules engine and to meet the requirement of a responsive UI (with lots of detail) much of this data is loaded up front and held in memory. Users have reported the following issues in the past.
  1. Pauses often occur when dealing with multiple characters and/or high level characters.
  2. When creating a character of a high level or if more than 2 characters were created, PCGen simply died. More tech savvy users report that they saw an OutOfMemoryError in the pcgen log file.
Some work has been done to mitigate this poor performance in the latest version of PCGen (6.0) and so I decided to use Censum to determine if those changes had improved matters.


Censum is jClarity's new Garbage Collection analysis tool. It's focus is to use powerful analytics to crunch through the raw log data and give busy developers (like contributors to PCGen!) plain english answers quickly. Disclaimer: I'm the CTO of jClarity.

Censum is a new product that is free for Open Source projects and of course if you wish to purchase a copy you can get a free trial license (click on Start Your Free Trial) today!

TLDR - The Conclusion

We have good news, some information and bad news.

The Good News

The positive news is that the default heap settings that PCGen starts with (-Xms256m -Xmx512m) are now adequate in terms of being sized well enough to keep PCGen running. Even after creating a 5th complex character, there was no OutOfMemoryError.

Censum shows that once a full GC is run (typically after each new character has been created), a large percentage of the heap is recovered and each the character takes up about 25-50MB of heap space. We can very roughly extrapolate that with a starting (data loaded) point of ~125MB that PCGen can comfortably hold about 10-15 characters open at any one time without crashing. This is perhaps not enough for a GM to have his Goblin horde up and running, but certainly enough for most regular parties!

The Bad News

The slightly more negative news is Censum reporting that PCGen has relatively high pause times, likely triggered by too much premature promotion.

Too much premature promotion pushes memory into the old gen space more quickly that we'd like. This can have the knock-on effect of causing more old gen collections as well as full GCs, naturally leading to more pause times.

See the full analysis section for extra details on high pause times, premature promotion and what PCGen can do about it.

Where to from here?

PCGen could follow Censum's "stop-gap" recommendation to alter the size of the young generational space. By using the -XX:NewSize flag and setting that to ~256M, the high pause times problem is alleviated.

However, the longer term solution is for PCGen to continue reducing the impact of their data structures (in particular the representation of a player character). This is in fact an ongoing project for PCGen today!

The technical setup

PCGen is typically run from a shell script with the default heap settings of -Xms256m and -Xmx512m. The script was altered to provide the minimum set of flags that are required to produce GC log that can be analysed. The flags that were added to the java command were:

-verbose:gc -Xloggc:pcgen_gc.log -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
  • -verbose:gc and -Xloggc:pcgen_gc.log produce a basic log that outputs to a file called pcgen_gc.log.
  • -XX:+PrintGCDetails provides the absolute minimum set of GC allocation information that Censum needs to perform an analysis.
  • Finally -XX:+PrintTenuringDistribution gives useful information on when objects are moving from a young generational space (eden, survivor 1 and survivor 2) to an old generation space (tenured).
All of these options have little to no impact on a running JVM. You should always have these switched on in Production!

PCGen was run with Oracle's Java 7u10 on a MBP running Mac OS 10.7.5, with 8GB of RAM, a 256MB SSD drive and a hyperthreaded Dual Core 2.8Ghz i7 processor.

The PCGen activities

PCGen begins by loading up basic game mode and system files + the basic UI to load data sources. The next step is for the user to select which data sources to load (roleplaying game rule sets). The popular SRD 3.5 with Monsters set was loaded (Dungeons and Dragons 3.5e).

A Character (Karianna) was created level by level into a 20th Wizard with fully loaded spells, equipment, and a Cat familiar (effectively a 2nd character in PCGen). Several more complex characters were added after that, including a Great Wyrm Blue Dragon (loads of data!).


I'll cover the initial data loading use case and then general on going usage (character creation).

Data Loading

I was curious about the memory impact of the initial data load. Although I get fast loading times with my SSD drive having PCGen load its data without memory problems is certainly a project goal! Here is what Censum showed in terms of heap allocation after GC's.

As you can see, the initial load of data caused a number of young generational collections and one old (tenured) GC event at the end of the data load. The heap usage climbed to a max of about 325MB, but after garbage was collected, the heap usage fell back to about 100MB. Not too bad for loading about 15 thick rule books worth of data!

However, data loading for PCGen is a little like the start-up period for an web/application server such as Tomcat. In terms of your GC analysis it's generally best to be excluded as part of a one-off start-up as opposed to analysing your running application.

Creating characters

Creating Karianna and advancing her to 20th level involves filling in details of 13 main tabs ~20 sub tabs and a good deal of data! Another 4 characters were created of similar complexity, some friendly (a Cat familiar) and some not (a Great Wyrm Blue Dragon).

A few screenshots of the process follow:



The Embedded Character Sheet

Censum's Analysis

On opening the log file, Censum took me immediately to its Analytics Summary screen, which lets me know at a glance as to how PCGen's garbage collection is going.

The good news

Immediately I know that:
  1. I have the right flags to collect enough GC data for analytics to work properly
  2. My GC throughput is good (PCGen spends it's time running, not garbage collecting)
  3. There are no nasty System.gc() calls (generally not good practice).

The informational news

Memory Utilisation (which a memory leak is a subset of) and Memory Pool Sizes are informational as the log has not yet gathered 24 hours of data (our recommended minimum to see a full working day's cycle for an application).

The bad news

PCGen appears to have high pause times as well as a premature promotion problem. Let's dive into those a bit further!

High Pause Times

High pause times are caused by GC collectors having to pause application threads in order to clean up object references in a safe manner. The more object references that the collectors have to scan over and clean up, the longer the pause! The longest pauses are usually caused by full GC collections, where the entire Java Heap (e.g. Both young and old gen spaces are getting really full) are cleaned up.

As a user I noticed pauses a couple of times, not enough to really disturb me, but I am aware that I have extremely good hardware and that these pause times may be significantly worse for other users.

As Censum points out, the 0.15% time spent paused is not a major concern, it's the 0.666 second pause time that's concerning. However, I remembered that the highest pause time could come from the initial data load in PCGen. To correlate this, Censum provides a graph of pause times.

The data load was the worst offender, but certainly for each character created there was a good ½ second pause around each character creation point due to a full GC. Again ½ a second wasn't too annoying for me in the context of using PCGen, but as Censum shows, full GCs take time and so PCGen should look to reduce the number of full GC's.

In this case we know that we are probably getting more full GC's than we'd like due to the other warning that Censum gives us - too much premature promotion.

Premature Promotion

Premature promotion basically means that objects that should be getting collected in the young gen space are being promoted into the old gen space before their age is up. This 'age' is known as the tenuring threshold, and is based on a combination of software research done in the 1980's and the JVM's runtime heuristics. Premature promotion can occur due to:
  1. The rate of new objects being created overwhelming the young gen space
  2. The size of the objects being created are too large to fit into the young gen space. e.g. Large contiguous blocks of memory.
This has a knock on effect of putting pressure on the old gen space. It fills up more quickly and therefore more old gen collections and eventually full GCs occur, leading to more frequent pause times.

When I go to take a look and see how soon objects should be promoted and how early they are getting promoted, I get an answer immediately. The Tenuring Summary screen shows us that the Tenuring threshold is set to 15 (objects can survive ~15 collections in young gen before being promoted naturally to old gen). Note also that 100% of objects are being prematurely promoted!

I then go and look and see when objects are being promoted. Censum tells me that as majority are being promoted almost as soon as they arrive in young gen (Tenuring Threshold of 1). That value indicates to me that the arriving objects are too large for the current young gen size. However, I double check and make sure object allocation isn't extremely high, Censum tells me that object allocation peaks at about 2.5GB/s, which is only about ⅔ of the capability of my MBP, so we're OK there.

Steps for PCGen to take

The next step for the PCGen team would be to take a look at something like the histogram from jmap to see what large objects are being created. A quick check of jmap output means that I know that we have large (multi-MB) contiguous arrays of String, char and byte.

An option for PCGen at this point is to resize the young gen space in order to fit these large objects into young gen space using the -XX:NewRatio or -XX:NewSize flags. The size to set either of these two values would depend on the results discovered from the jmap usage. In this case setting -XX:NewSize to ~256M is required to alleviate the the number of pauses.

However, that's generally not a great long term solution. The PCGen project is better off reducing the usage of these large contiguous memory allocations. Lead by Tom Parker and James Dempsey, this is exactly what the PCGen team is doing, taking out large memory structures from the monolithic PlayerCharacter class into smaller more flexible facets.


It's fantastic to see that the incredibly hard work the PCGen volunteers have put in for the 6.0 release has paid real dividends for the end users!

A couple of years ago I was tracking down a 30+ second pause that a PCGen user had reported when loading up her 25th level Kobold Shaman. It took me several days to track down that it was a GC/memory problem as I only had raw GC log files (which are multithreaded, indeterminant beasts). Going forwards I'll definitely be using tooling like Censum to solve these issues more quickly, which means I can get on with adding new features to PCGen!

Don't forget to check out PCGen for your d20 role-playing game needs and Censum if you want to solve thorny GC problems!

Martijn (Chairperson - PCGen, CTO - jClarity)
aka "The Diabolical Developer"


  1. I understand this is a blog post meant to shill Censum - but your point that being able to tune GC using only GC log files is totally wrong... especially considering the fact that you feed GC log files INTO censum.

    The free open source GCViewer doesn't have the same "wizards" that Censum has, but it does provide a very good summary of the pause times from a GC log and it makes it very easy to tune an app's GC performance.

    With the results I've been seeing as we've been moving towards using G1 in production with our apps, G1 may make a lot of GC tuning a thing of the past (so by the time Censum would really hit traction in the market, G1 might eliminate the need for such a tool)

    1. Hi Ryan,

      Thanks for commenting!

      Log vs Realtime processing:

      Just to be clear - you can feed the log into a number of tools or even just read it yourself if you have the time and the experience. It certainly doesn't have to be fed into Censum!

      You're quite correct that post processing a log is not the only way to determine how to tune a GC. But trying to tune 'live' or in real time GC information has several disadvantages:

      * Connecting to a live running JVM using the GC MXBeans actually creates its own objects (and therefore garbage) impacting the very application you're trying to tune.

      * The MXBeans don't give you as much information as the logs can (it's a limited API).

      * Real time analysis only gives you a short window for analysis, so you don't see the full picture of how the memory in your application is behaving over it's natural cycle (e.g. A payroll application over the period of a day's work, or even a week given the Friday afternoon timesheet rush). Tuning based only on partial data can cause major problems for your application.

      Other tools vs Censum:

      GCViewer is pretty decent, as is HPJmeter - but when we polled large numbers of developers and operations folk, we discovered that a majority of them didn't have the time to learn how to interpret the results that those tools give out. They were also often dealing with an emergency production issue and wanted the tool to tell them exactly where the problem was and what to do about it.

      We're trying very hard with Censum to give that definitive answer (as much as is mathematically possible) and make a solid recommendation (e.g. Set your NewSize=X). We still have some ways to go before we can recommend in all cases, but Censum is constantly learning and we're confident the accuracy and range of analytics will continue to improve as the machine learning and modelling we use gets more data.


      Your comments here are really interesting - we haven't heard those levels of confidence about G1 from anyone, even including the Oracle guys themselves who wrote it! G1 has only just gain official blessing from Oracle as a supported GC - we'll be watching it with interest but we have our doubts that it will 'automatically' fix everything. Not even Azul claims that their pauseless C4 GC does that :-).

      I noticed the GCViewer actually has some G1 support in it - so I guess it's still of need/interest to analyse G1 and tune it.

      OpenJDK and the Java community at large could really benefit from your findings, please do post them to the GC and hotspot mailing lists! If G1 can be proven to be in almost all cases better than existing collectors, that's a great leap forward for everyone.

      Thanks again and if you'd like an eval copy of Censum, let me know :-).

  2. It looks like you are loading a lot of those text-based config files and holding on to them. If you were more aggressive about removing references to the text config files in memory, that should help...?

    1. Hi Will,

      Thanks for your comment. So the code very roughly:

      1 Has a handle on the text file (a .lst proprietary format)
      2 Loads all of the data into memory
      3 Updates a Character Document Object Model (CDOM) with the data
      4 Keeps some data cached as pure reference data (like a database table with static data, but in memory)
      5 Releases the handle to the text file

      At this stage all of the data is needed to be kept in memory in order for the CDOM to correctly model the character as it gains and loses levels, as well as having classes, abilities, templates, feats, spells, equipment and much more change its stats!

      So it's hard to see what we could remove in terms of holding references - but we certainly can reduce the size of the overal CDOM per character as well as the static data...

      It's a fun engineering effort for the team!

  3. Does your censum tool analyze logs from IBM JVM?

  4. Unfortunately not no, we'll likely plan for this next year.