Using PoolMon to Analyze RAM Cache in Nonpaged Pool Memory
In case you missed it a couple of weeks ago, Andrew Morgan (one of our CTPs) posted a great article on how to accurately determine the size of the new RAM Cache.
As Andrew pointed out in his article, we now use nonpaged pool memory, so it's easy to fire up PoolMon and investigate. But I wanted to clarify one thing, since Andrew only commented on the key pooltag, 'VhdR'. (He said he reached out to Citrix for further insight but received no response, so allow me to respond!) Andrew is spot-on that we use 'VhdR' for RAM cache allocation. But we also use 'VhdL' for internal metadata allocation, so that is the other pooltag to key on and grab for any scripting. The 'VhdL' allocation is never going to be very large, but it's worth tracking if you want a complete picture of what PVS is consuming.
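Since scripting came up, here's a minimal sketch of how you might tally both pooltags from a saved PoolMon snapshot. It assumes poolmon's classic column layout (Tag, Type, Allocs, Frees, Diff, Bytes, Per Alloc); the sample text and numbers below are made up purely for illustration, so adjust the parsing to whatever your poolmon version actually emits.

```python
import re

# Hypothetical snapshot text in the classic poolmon column layout
# (Tag, Type, Allocs, Frees, Diff, Bytes, Per Alloc); all numbers
# are invented for illustration only.
sample = """\
VhdR Nonp    123456 (  12)    23456 (   4)   100000  819200000     8192
VhdL Nonp      2048 (   0)     1024 (   0)     1024      65536        64
"""

def parse_pool_bytes(poolmon_text, tags=("VhdR", "VhdL")):
    """Sum the Bytes column for the given pooltags in a poolmon snapshot."""
    total = 0
    for line in poolmon_text.splitlines():
        # Drop the parenthesized per-interval deltas, e.g. "(  12)",
        # so the remaining fields split cleanly on whitespace.
        cleaned = re.sub(r"\(\s*\d+\)", "", line)
        parts = cleaned.split()
        if len(parts) >= 6 and parts[0] in tags:
            total += int(parts[5])  # Bytes column
    return total

print(parse_pool_bytes(sample))  # combined RAM cache + metadata bytes
```

Summing both tags gives you the true footprint of the write cache, not just the RAM cache portion Andrew measured.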
Using WPA to Really Dig Into PVS
Working at Citrix has its benefits. One of those is being able to talk to the brilliant developers and product architects who write our code and get some "inside info." In this case, I talked with Moso Lee, who is the brains behind the new RAM Cache with Overflow to Disk technology (so we all have Moso to thank!).
We were talking about monitoring and debugging PVS, and he quickly pointed out that we've always had an event provider for PVS (look for 'VhdEtw.xml' in the PVS installation directory). And if you want to go deep with PVS and identify performance bottlenecks, you might consider using Windows Performance Analyzer (WPA). I'm not going to go into detail on how WPA or Event Tracing for Windows (ETW) works, but I do want to provide a quick example of how to use this powerful tool to truly understand and debug PVS. Because if you want to understand how our PVS driver works, how we're manipulating the storage stack, or when we're failing over and writing to the VHDX disk, for example, then this tool and article are for you! It's certainly not for the average IT admin, but I know all the PVS geeks and filter driver gurus out there will love it.
Let's get started.
As I mentioned earlier, PVS ships with an ETW event provider that WPA can consume. So you'll first want to grab WPA, which is part of the latest SDK for Windows 10. You can selectively install the Windows Performance Toolkit as shown in the screenshot, which includes both WPA and WPR.
First, we're going to use Windows Performance Recorder (WPR) to capture some simulated PVS disk and file I/O activity; then we'll analyze what happened with WPA. So fire up WPR, click Add Profiles and point to this file, which is a PVS-specific recording profile that allows us to receive the events generated by the PVS event provider. Import that profile, match the other options you see in the screenshot below and click the Start button.
Now, we'll simulate some PVS activity in our lab (or your production environment if you dare!).
In this quick example, I'm using the new write cache method with a small memory buffer of 128 MB (please don't use a buffer this small in the real world!). I'm going to copy a 279 MB file to C:\Users\User\Documents\test.bin so I can force the PVS driver not only to "write" some data to the nonpaged pool, but also to fail over and start writing to the local disk (such as "D:\vdiskdif.vhdx"). After you're done copying the file and forcing the buffer to fill and spill over, you can stop the capture in WPR and open the results in WPA.
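If you'd rather script the file copy than do it by hand, a sketch like the following generates enough incompressible data to overrun a 128 MB buffer. The path and the `write_test_file` helper are purely illustrative (not a PVS tool), and the random payload just keeps any compression or dedupe layer from shrinking the writes:

```python
import os

def write_test_file(path, size_mb, chunk_mb=4):
    """Write size_mb of random data to path in chunks, forcing the PVS
    write cache to absorb the writes and, once the RAM buffer is
    exceeded, spill over to the local VHDX file."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # incompressible payload
    target = size_mb * 1024 * 1024
    written = 0
    with open(path, "wb") as f:
        while written < target:
            n = min(len(chunk), target - written)
            f.write(chunk[:n])
            written += n
    return written

# e.g. write_test_file(r"C:\Users\User\Documents\test.bin", 279)
```

Run it with the capture active in WPR, and the driver has no choice but to fill the 128 MB buffer and fail over to D:\vdiskdif.vhdx.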
In the Graph Explorer within WPA, expand System Activity and select Generic Events. If you look at the screenshot below, a couple of key lines are highlighted: WriteData and WriteRamData. These show the exact count of write events going to C:\vDisk (2419) and to our VHDX file on the D drive (348). The WriteData count is lower than you might expect because some of the data is still cached in RAM and hasn't been flushed to disk quite yet. But let's keep digging to understand more.
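If you want these counts outside the WPA UI, you can export the Generic Events table to CSV and tally the task names yourself. The CSV layout below is a simplified assumption (real exports carry many more columns), so treat this as a sketch to adapt to your actual export:

```python
import csv
import io
from collections import Counter

# Hypothetical, heavily trimmed CSV export of WPA's Generic Events
# table; only the Task Name column matters for the tally.
sample_csv = """\
Task Name,Provider Name,Time (s)
WriteRamData,PVS,0.001
WriteRamData,PVS,0.002
WriteData,PVS,0.003
WriteRamData,PVS,0.004
"""

def count_events(csv_text, column="Task Name"):
    """Tally event counts per task, e.g. WriteData vs. WriteRamData."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[column]] += 1
    return counts

print(count_events(sample_csv))
```

A tally like this makes it trivial to compare RAM writes against disk spillover writes across repeated test runs.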
Again in the Graph Explorer, expand File IO and select Count by Type. This picture (and the following screenshot) shows the reduction in I/O (file count) and the time elapsed between writing to C:\Users\User\Documents\test.bin and the spillover write cache file at D:\vdiskdif.vhdx. This is powerful stuff: you can easily identify pesky performance bottlenecks and rule out the PVS filter driver as the culprit.
I think it's probably wise to stop there, since this is a lot to digest, I'm sure. But for those PVS geeks out there like Andrew, you can go deeper and pinpoint where exactly the data is being written initially (and where it eventually lands) using disk offsets. Just go back to "Generic Events" and tweak the column view; WPA can show the data transitioning through the various storage layers. And if you really want to blow your mind, go back to your PVS environment, set the RAM cache buffer to 0 MB and re-run the Recorder and Analyzer. Then you'll get a clear picture of how we spill over to disk!
As I mentioned earlier, using WPA to debug PVS is probably not for everyone. In fact, it's probably not for 99% of our customers. But the next time you think you have a performance problem related to PVS and ProcMon and Wireshark aren't cutting it for you, this is a great tool to have in your bag of tricks. It's the only way to understand what our PVS driver is doing at a low level. I hope you enjoyed this article, and I'd like to give one last shout-out to Moso Lee; he deserves most of the credit for this article, and we should all send him an email thanking him for giving us this new write cache (wC) implementation!
Nick Rintalan, Lead Architect - Americas, Citrix Consulting
Moso Lee, Software Engineer - PVS, Citrix Product Development