Fun with Retained Mode

 

Sunday, March 29, 1998 - Secrets of MFC and DirectX

I've had this annoying bug in my code for over two years.

When my app would terminate abnormally (usually during debugging) my D3D allocated VRAM would not get freed up.
The way resource management works in D3D (and the same thing is also used for DirectSound) is that a helper program called DDHELP.EXE is fired up the first time you run a DirectX program.
When you hand an HWND to DirectDraw or DirectSound, that HWND is given to DDHELP.EXE which subclasses your window.
In Windows-speak, to subclass a window is to intercept all of its messages. The subclass procedure sees them before the original window procedure does.
It kind-of makes sense from an OO point-of-view.
Every time you allocate a resource, DDHELP.EXE gets told about it along with your HWND so it can keep track of what you've allocated.
Then, when your app dies, DDHELP.EXE is supposed to get the WM_DESTROY message for your window and free up all the resources you've allocated.
If for some reason this doesn't work, the only way the resources can be freed is if your app does it itself before exiting. Otherwise they are locked up forever.
DDHELP.EXE has gotten more reliable over the years, but the approach is inherently flaky. This resource management should have been built into the OS, but since DirectX started out as a brilliant hack, it never was. So DirectX app writers have always been encouraged to free everything themselves.
(They are also encouraged to free COM objects in the opposite order they created them but this was to avoid bugs in the use of COM by DirectX.)
If you kill off DDHELP.EXE using the ever-popular KILLHELP.EXE routine distributed with the DDK, it is true that your system will seem to 'reset', but those resources are not recovered until you reboot.
(But you can try resizing your display - that will either hang your machine or free up the resources by marking all of the surfaces 'lost'. Since your app is dead it will never call 'Restore'.)
In my own code I like to summarily kill my own app with ExitProcess(). This is a really fast way to shut down an app - in a world of perfect operating systems this is the fastest way to close down an app and free up any resources your app has consumed.
Here's the key: DirectDraw works really well if you give it the wrong HWND. You would think there would be all kinds of problems, but really, you won't have any problems at all until you try to shut down your app. (Alt-Tab won't work right either. Many people would consider this a feature.)
There's only one problem - your code can exit and your resources will never be freed unless the window that owns the HWND you gave to DirectDraw gets a WM_DESTROY message sent to it.
How do you create a window that won't ever get a WM_DESTROY message?
It's simple - you create a child window of your main window and mistakenly pass that to DirectDraw. Then shut down your app really fast, so the child windows are never sent the WM_DESTROY message.
It says quite clearly in the DirectDraw documentation that you should pass the HWND of the "active" window to DirectDraw, but this is wrong. You need to pass the HWND of your main window. It is the only window that is guaranteed to get a WM_DESTROY when your app dies.
If you're using MFC like me, it's natural to assume that your CView derived window is the active window.
Big error… Everything will work fine except for shutting down. Your CView window may never receive a WM_DESTROY, and so DDHELP.EXE is never activated and your resources are never freed.
There are problems shutting down normally as well as abnormally.
The famous "Sleep(500);" fix that I invented helps with normal shutdown because it lets the WM_DESTROY message arrive at your CView based window.
For some reason, these problems were always worse on Rendition-based boards. It could be that their shutdown code was very fast… I'm not sure. The problem appeared more often on faster computers. I suspect this is because on slower computers MFC would use up an entire time quantum which would let other threads run and deliver that message. I'm just guessing though. The behavior was definitely consistent with what parallel processing-type people call a 'race condition.'
When you create any DirectDraw or DirectSound objects, and you use MFC, be sure to pass in the HWND from

AfxGetMainWnd()->GetSafeHwnd()

This is the only safe thing to do.

Ignore the documentation when it tells you to pass in the "active" window. Always pass in the HWND for your top-level window. Only pass in a different HWND if you create a different top-level window. Never pass in the HWND for a child window.
 
NOTE: Why use MFC?
The shutdown problem was hard to solve because not that many people have experience with MFC and DirectX - particularly people at Microsoft. MFC is handy for creating user interface code but not much use for game code.
In my case, the SuperSet game engine is both the game engine and the game editor. I edit levels in 3D the way grownups ought to do it. I'm surprised more people don't do it this way but the preponderance of editors is 2D with perhaps a 3D view of what you've assembled.
In fullscreen exclusive mode, MFC is pretty much invisible in SuperSet. In windowed mode, however, all kinds of incredibly useful dialog boxes and menus are available, which is why I use MFC. I can make a new subclass for a game object and put in a UI to deal with it in just a few minutes. This is extremely productive.
Whether you use MFC or not, it's time to build game editing functions into your engine. It is very easy to do if you architect for it from the start - and very hard to do if you don't. So start thinking about it now for your next game engine.

 Back to 'Random Blts' Table of Contents 


Wednesday, April 1, 1998 - More fun with MFC, RM, and HWNDs

I was trying to remember why it was that I wanted to use the CView's HWND instead of the CMainFrame's HWND in the first place.

Then I remembered …
If you use "D3DRMDevice::CreateFromClipper()" in Retained Mode, you want to pass in the CView's HWND because you want to write to the window that is the actual size of the client area.
(I've recently written the code in Immediate Mode to create a device of the right size and it takes many lines of code to do what CreateFromClipper() does for you. CreateFromClipper() is a great way to get started.)
If you give CreateFromClipper the HWND for CMainFrame, then you'll be giving it the HWND for the window that contains the button bar, the status bar and any other stuff the MFC guys and gals invent in the future that become children of the CMainFrame window.
So what you really want is to use the CView's HWND, because it more closely matches the client area. In fact, it should match it exactly.
CreateFromClipper will also create a Clipper object and attach it (hence the name) so any overlapping windows or menu items will behave properly.
Coupled with the documentation error about an "active" window, the CView's HWND is the logical choice.
It's just that it's wrong.
The trouble is, the only time you'll notice the trouble is when shutting down your app.
According to the Windows documentation for WM_DESTROY, it is first sent to the parent window and then to the child windows, if any.
It says quite clearly that you can assume that your child windows are still around while you handle WM_DESTROY but that your window is already destroyed.
This means the children haven't been sent WM_DESTROY yet.
Which means if you kill your app off really quickly at this point with ExitProcess or, frequently, just let MFC shut itself down, your CView window will never get a WM_DESTROY.
Which means that DDHELP.EXE will never get the WM_DESTROY message for your CView.
Which means your resources are never freed.
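The chain of reasoning above can be checked with a toy model. This is a minimal sketch in portable C++, not Win32 code: ToyWindow, DestroyTree and HelperSeesDestroy are hypothetical names invented for illustration. The only behavior taken from the text is that the destroy notification goes to the parent first and to the children afterwards. The "process dies early" cutoff stands in for ExitProcess() or a fast MFC shutdown.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy model, not the Win32 API: a window, its children, and a flag that
// records whether it ever received its destroy notification.
struct ToyWindow {
    std::string name;
    std::vector<ToyWindow*> children;
    bool gotDestroy;
};

// Deliver the destroy notification parent-first, then to each child.
// processStillAlive is polled before every delivery; when it returns false
// the process is assumed to be gone and nothing further is delivered.
void DestroyTree(ToyWindow& w, const std::function<bool()>& processStillAlive) {
    if (!processStillAlive()) return;
    w.gotDestroy = true;
    for (ToyWindow* child : w.children)
        DestroyTree(*child, processStillAlive);
}

// Returns true if a helper watching `watched` would have seen the destroy
// notification when the process dies after `deliveriesBeforeExit` deliveries.
bool HelperSeesDestroy(ToyWindow& root, const ToyWindow& watched,
                       int deliveriesBeforeExit) {
    int delivered = 0;
    DestroyTree(root, [&] { return delivered++ < deliveriesBeforeExit; });
    return watched.gotDestroy;
}
```

With a frame window that owns one view window and a process that dies after a single delivery, a helper watching the frame sees the notification and a helper watching the view does not. That is the whole bug in miniature.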
 
NOTE: In DX3, there was a bug in RM whereby explicitly deleting all RM objects, including the RM device, would still not completely reset any and all state information stored in the RMDevice.
This was because the RM object was secretly the same as the D3D object, and it was only by destroying the D3D object that you could really delete the RM object.
This problem showed up, for instance, when you deleted an existing RM Device and created a new one, say, for switching between windowed mode and fullscreen mode. On the Voodoo, for instance, which definitely had to be a different device when switching modes like that, the RM Device description down deep in the code would have incorrect information about the texture formats of the Voodoo, because it never properly reinitialized itself with the new device information.
This problem showed up even though you might never have actually even explicitly created the D3D object.
RM did it for you.
And then failed to free it for you.
This was all caused by a COM reference counting bug.
Like I said, COM sucks, because it makes it way too easy to introduce errors of this kind.
In a way, this is all for the good, because DX has exercised COM in ways it was probably never intended to be used. Creating thousands of COM objects (which RM does) is pretty different from embedding a few controls in a container.
 
ANOTHER NOTE: In the article above I mentioned that lighting in DX5 is screwed up, particularly MMX lighting.
I would like to point out that Stephen Coy at Microsoft rewrote all of the lighting code for DX5 and it does the right things.
The trouble is, Retained Mode doesn't use the new lighting code and won't until DX6.
Major bummer.



Monday, May 11, 1998 - HWND problem to be fixed?

The trick to getting Microsoft to do something is to make a really clear statement of the problem.

They are always so busy chasing down 8,000 different problems that unless something is relatively simple and straightforward for them to understand, it frequently gets relegated to the bottom of the list of things to do.
Now that I have an extremely simple explanation of the memory/resource leak problem with HWNDs and DDHELP.EXE, I suspect the problem will be fixed. Or at least made a little better.
(I spent some time at CGDC making sure a number of people understood the problem. And I was told that they'd made changes to DDHELP.EXE within the last few days.)
The reason for Microsoft to care about this going forward is that, while it is true that most (all?) games run full screen and are unlikely to have this particular problem, in the future Microsoft is going to want people to develop XML / Chrome based Web sites.
ActiveX controls embedded in IE are going to have to use the wrong HWND - they have to specify their own window when they create a DirectDraw object, which is a child window of IE, and not a top-level window.
This means all 3D ActiveX / Chrome content will be hopelessly broken, as any crash (and crashes are frequent in IE, as it is one of the few remaining components of Windows that is written with a "cowboy style" of programming) will use up scarce resources in the video card.
So, expect it to be fixed. Someday. Too bad they didn't add a new API in Windows 98 specifically for use by DirectDraw / DirectSound / DDHELP in order to make sure that when a THREAD dies, rather than the top-level window of a PROCESS dies, that all the resources are properly freed.
Oh well. Maybe in NT 5. (Okay, NT 6.)
[5/21/98 - I figured out how they can fix this with a patch to DDHELP.EXE: all they need to do is each time they register a window handle, they should traverse up the tree of parent windows until they get to the root. By keeping the entire list around, anytime any one of those window handles gets a WM_DESTROY message they can free any and all remaining resources. Simple.]
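Here is a minimal sketch of that parent-chain idea in portable C++. It is a toy model and says nothing about how DDHELP.EXE is actually written: ToyHelper, Handle and the parent map are hypothetical stand-ins. In real Win32 code the walk up the tree would presumably use GetParent() on the HWND.

```cpp
#include <map>
#include <set>
#include <utility>

using Handle = int;           // stand-in for HWND in this toy model
const Handle kNoParent = 0;   // marks a top-level window

class ToyHelper {
public:
    // parentOf maps each window handle to its parent handle.
    explicit ToyHelper(std::map<Handle, Handle> parentOf)
        : parentOf_(std::move(parentOf)) {}

    // Register a window: watch the handle itself and every ancestor up to
    // the root. Stopping at an already-watched handle avoids repeat work
    // (and an endless loop if the toy parent map ever contained a cycle).
    void Register(Handle h) {
        for (Handle cur = h;
             cur != kNoParent && watched_.insert(cur).second;
             cur = Parent(cur)) {
        }
    }

    // Called when any window receives its destroy notification.
    void OnDestroy(Handle h) {
        if (watched_.count(h) != 0) {
            resourcesFreed_ = true;   // free everything that is still allocated
            watched_.clear();
        }
    }

    bool IsWatching(Handle h) const { return watched_.count(h) != 0; }
    bool ResourcesFreed() const { return resourcesFreed_; }

private:
    Handle Parent(Handle h) const {
        auto it = parentOf_.find(h);
        return it == parentOf_.end() ? kNoParent : it->second;
    }

    std::map<Handle, Handle> parentOf_;
    std::set<Handle> watched_;
    bool resourcesFreed_ = false;
};
```

Registering a grandchild handle puts the child, its parent and the top-level window on the watch list, so a destroy notification for the top-level window alone is enough to trigger the cleanup.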

 



Thursday, August 05, 1999 - Write your own rendering loop for Retained Mode

I posted a remark on the DirectXDev mailing list that I had a method for hooking Immediate Mode (IM) rendering into a Retained Mode (RM) app. Since then I got a couple of requests for more on the technique so here it is.

IMHO the problem with all scene graph systems is that they want to take over the rendering loop. This is a problem because rendering technology is changing very fast so a good scene graph system will have lots of places to hook in your own rendering loop using the scene graph data structures.

Since RM doesn't provide such a facility, I thought I would create one. As it is I did something like this for a client already - the client wanted to write straight to the S3 Virge hardware for performance and quality reasons even though they had a Retained Mode app. (This was in DX3 days and there was no support for anything except screen-door translucency or many other effects. Plus they wanted it to run on NT where there was no hardware acceleration support at all!) I observed that really RM is in two parts - the actual scene graph and the rendering and device code, which are logically separate. You can use the RM scene graph as a database and never even create a device. In fact, I've done something like this in a couple of projects. It's pretty handy because you can use the RM scene graph (frame) routines for transforming and maintaining a database of parts, and then actually use the frame routines - without rendering - to write out modified data. In the case of the second project, I also used the RM rendering code so I could see what was going on, but I considered that an extra freebie benefit.

I've posted an outline of some code here which compiles and which you can extend to provide for Immediate Mode rendering in your RM app. (You might want to "Save As…" and then open it with VC to get the right formatting - your browser might mess up the line breaks otherwise. Or here it is in a zip file.)

This is just an outline (although it compiles and runs) because there's all kinds of functionality I haven't provided yet.

There's also a bug I don't understand (see below). If you figure it out let me know.

This code solves a specific problem I had with RM - there is a bug (in my opinion) where if you modify your viewport during rendering (by calling ->Render on two different top-level frames after changing the viewport in between), then when you call update and the actual rendering occurs RM gets confused IF you have any translucency in your models. It sorts the geometry (if you ask it) and to do that it needs to keep a list of your translucent geometry and then render it after it's rendered everything else. This is the only way to make translucent objects come out correctly because normally you will render from front-to-back to maximize the z-buffer but for translucent stuff you must render back-to-front so the translucency piles up correctly. Since RM refused to do the right thing for me, I used to disable the translucent stuff, but that was cruddy. So now, I use this rendering routine which takes care of the second viewport parameters without RM even knowing it happened.
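The draw ordering described above (opaque geometry front-to-back, then translucent geometry back-to-front) can be sketched in a few lines of portable C++. This is an illustration of the ordering rule only, not RM's internal sort. DrawItem, its fields and DrawOrder are hypothetical names, and the depth is assumed to be a distance from the camera that you have already computed.

```cpp
#include <algorithm>
#include <vector>

struct DrawItem {
    int id;            // hypothetical identifier for a piece of geometry
    float viewDepth;   // distance from the camera along the view axis
    bool translucent;
};

// Returns the ids in draw order: opaque items nearest-first so the z-buffer
// rejects as much as possible, then translucent items farthest-first so the
// blending piles up correctly.
std::vector<int> DrawOrder(std::vector<DrawItem> items) {
    auto firstTranslucent = std::stable_partition(
        items.begin(), items.end(),
        [](const DrawItem& d) { return !d.translucent; });

    std::sort(items.begin(), firstTranslucent,
              [](const DrawItem& a, const DrawItem& b) {
                  return a.viewDepth < b.viewDepth;
              });
    std::sort(firstTranslucent, items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return a.viewDepth > b.viewDepth;
              });

    std::vector<int> order;
    for (const DrawItem& d : items) order.push_back(d.id);
    return order;
}
```

The point of keeping the translucent list separate is that it has to be drawn after everything opaque, which is exactly the list that gets confused when the viewport changes underneath it.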

The specific use was in drawing the HUD (Heads-Up Display) in CyberDome. Here's a picture of the HUD as drawn using RM:

 

Now here's a picture of the HUD as drawn by atgpRender:

 

Looks about the same, except that the second HUD has white letters instead of the nice gold ones - which is the bug. I don't get any errors reported when the code runs so I'm not sure what I'm doing wrong with setting up the material. But I'd rather have a white HUD and translucent particles and explosions than solid particles and explosions and a gold HUD.

Please note that the second HUD is being drawn in Immediate Mode.

The other major limitation is that atgpRender as I've posted here doesn't support textures. The problem you might run into is fighting between the RM texture manager and your own texture code. What Microsoft recommends if you are using the DX6/7 Immediate Mode texture manager is to allocate your fixed textures first and then let the texture manager work it out. You can do the same thing with the RM texture manager - load your textures first and then let it deal with the space that's left. If you've got gobs of texture memory with an AGP card this is no big issue unless you've gone insane with your textures. So don't go insane with your textures.

The way the code works is this:

It recursively visits all the frames in the hierarchy you hand it. For each frame, it looks for IDirect3DRMMeshBuilder2 objects. For each one it finds, it reads the data out of the MeshBuilder and stores it in a form suitable for drawing by IM. It then uses the "GetMatrix" call from the RM frame to get the current transformation matrix for that object. It transforms the object using IM and then draws it, being sure to save the current RM state and then restore it.
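The shape of that traversal can be sketched without any RM or IM calls at all. The following is a toy model in portable C++, not the posted atgpRender code. ToyFrame, DrawCall and CollectDrawCalls are hypothetical names, meshes are just strings, and the world matrix is accumulated by hand here where the real code asks the RM frame for it. Matrices use the Direct3D row-vector convention, with the translation in the fourth row.

```cpp
#include <array>
#include <string>
#include <vector>

using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major, row vectors

Matrix4 Identity() {
    Matrix4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Matrix4 Translation(float x, float y, float z) {
    Matrix4 m = Identity();
    m[3][0] = x; m[3][1] = y; m[3][2] = z;   // translation lives in row 3
    return m;
}

Matrix4 Multiply(const Matrix4& a, const Matrix4& b) {
    Matrix4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

struct ToyFrame {
    Matrix4 local;                     // transform relative to the parent frame
    std::vector<std::string> meshes;   // stand-ins for mesh builders on this frame
    std::vector<ToyFrame> children;
};

struct DrawCall {
    std::string mesh;
    Matrix4 world;                     // local-to-world transform to draw with
};

// Visit the frame and all of its descendants, recording one draw call per
// mesh together with the accumulated world transform.
void CollectDrawCalls(const ToyFrame& frame, const Matrix4& parentWorld,
                      std::vector<DrawCall>& out) {
    Matrix4 world = Multiply(frame.local, parentWorld);
    for (const std::string& mesh : frame.meshes)
        out.push_back(DrawCall{mesh, world});
    for (const ToyFrame& child : frame.children)
        CollectDrawCalls(child, world, out);
}
```

Collecting the draw calls into a list first, instead of drawing inside the recursion, leaves room to sort them before anything is handed to the device.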

Another thing missing is the proper code for handling the camera. I put in just enough code to handle the camera for my HUD and nothing more. Ideally, you just get the camera frame and then invert it (I think) to transform your world objects into eye space. You can fool with all these things and improve it.
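For what it's worth, here is a minimal sketch of the "invert the camera frame" idea in portable C++. It is not taken from the posted code. It assumes the camera's world matrix is rigid (rotation plus translation, no scale) and uses the Direct3D row-vector convention with the translation in the fourth row. InvertRigid and TransformPoint are hypothetical helper names. Under those assumptions the inverse is just the transposed rotation plus a re-derived translation, so no general 4x4 inverse is needed.

```cpp
#include <array>

using Matrix4 = std::array<std::array<float, 4>, 4>;  // row-major, row vectors

// Invert a rigid transform M = [R 0; t 1]. The inverse is [Rt 0; -t*Rt 1],
// where Rt is the transpose of the 3x3 rotation block.
Matrix4 InvertRigid(const Matrix4& m) {
    Matrix4 inv{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            inv[r][c] = m[c][r];                     // transpose the rotation
    for (int c = 0; c < 3; ++c)
        inv[3][c] = -(m[3][0] * inv[0][c] +
                      m[3][1] * inv[1][c] +
                      m[3][2] * inv[2][c]);          // -t * Rt
    inv[3][3] = 1.0f;
    return inv;
}

// Transform a point (implicit w = 1) by a row-vector matrix: p' = p * M.
std::array<float, 3> TransformPoint(const std::array<float, 3>& p,
                                    const Matrix4& m) {
    std::array<float, 3> out{};
    for (int c = 0; c < 3; ++c)
        out[c] = p[0] * m[0][c] + p[1] * m[1][c] + p[2] * m[2][c] + m[3][c];
    return out;
}
```

A quick sanity check on any result: the camera's own world position, pushed through the inverted matrix, should land on the origin of eye space.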

The main breakthrough is just knowing it can be done. At some point (not too far in the future), I am going to replace ALL of my rendering with an expanded version of this. Then I can use single-pass multitexture and other new IM modes (such as cubic environment maps) and hardware accelerated transforms. Yahoo! With a little work you can do the same.

Start coding.



Back to Above the Garage Productions