Time Travel Debugging
Time Travel Debugging or TTD, is the process of recording and then replay the execution of a process, both forwards and backward, to understand what is happening during the execution. It is vital to fixing bugs when the root cause is not evident, and the symptoms appear moments later when the source of the problem is gone.
It also allows you to rewind and play how many times you want, helping you to isolate when the problem happened, and setting breakpoints as you are used in Windbg. Even when you don't need to rewind, TTD has some advantages over Live Debugging, since it doesn't interrupt the process, you can create a trace and analyze it offline, but be aware that TTD is very intrusive and ideally should be used for less than a minute or the traces files can become very large (>5GB).
First, make sure you have WinDbg installed and the symbols are configured, below two ways to set it up:
- Set via an environment variable (it will work for every debugger)
- Set via Windbg. File -> Symbol File Path (Ctrl+S). Symbol Path: srv*c:\symbols\public*http://msdl.microsoft.com/download/symbols:
The machine where the traces will be recorded must be a Windows 10 >=1809 or a Windows Server 2019.
The demo application is a simple Windows Forms bugged applications that writes a log file to disk.
Click on Save several times and after some seconds it doesn't work anymore:
Recording a trace
You can currently use WinDbg Preview to record the trace, but for this Lab, we'll use the builtin TTTracer.exe available in Windows 10 (1809 or newer) and Windows Server 2019 since many times we can't install tools in a production machine.
Open the LabWindbgTTD.exe in the target machine and take note of its PID:
Now open a Command Line as admin and enter the following command, replacing <PID> with the PID of the process:
TTTracer -attach <PID>
The trace files will be created. Now click on Save several times until you receive the error. After the error, in a different Command Prompt, execute the following command to stop the trace:
TTTracer -stop <PID>
You will notice that the trace will stop in the first Command Prompt:
The traces will result in two files: LabWindbgTTD01.out and LabWindbgTTD01.run
Replay and Analyze
Copy both files to your machine, you can use WinDbg or WinDbg Preview to analyze it. Using WinDbg Preview, click File -> Open Trace File and select the ".run" file:
When debugging a TTD file, you can Step Into, Step Out and Step Over like you do when Live Debugging:
When stepping you can see the current Time Travel Position, like in the image above. Using the !tt <POSITION> command you can navigate to a specific position:
The !positions command show the positions for all threads:
But it starts to get interesting when using the Step command contrariwise, instead of p, t, and g, you can execute p- (Step Back), t- (Trace Back), and g- (Go Back):
When debugging .NET applications in a different machine from where the dump or trace was created you need to copy the mscordacwks.dll and SOS.dll from C:\Windows\Microsoft.NET\Framework\v4.0.30319\ (or C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ for 64 bits processes) and execute this command in WinDbg:
.cordll -ve -u -lp <PATH>
Finding the problem (Spoiler alert)
Now that we know how to navigate the TTD file, how can we find the bug?
Let's try to stop when the error occurs, to do this you could use "sxe clr;g" like you would do in a Live Debug, but the TTD extends the Session and Process data model objects, exposing events like Exceptions. To see all of them execute "dx @$curprocess.TTD" and "dx @$cursession.TTD":
We can take advantage of this feature and filter the Exception events, "dx -r2 @$curprocess.TTD.Events.Where(t => t.Type == "Exception").Select(e => e.Exception)":
Click on [Time Travel] to navigate to the moment when the Exception was thrown and execute !pe to see the System.ObjectDisposedException. You can now execute !clrstack to see exactly what method is throwing the Exception, but it isn't helpful since the BtnSave_Click just call StreamWriter.WriteAsync and is not disposing the object.
In this case, a log containing the Stack Trace or a Dump file wouldn't help as well. The application is small and simple enough that looking the code you will easily find the problem, but let's continue the analysis using the WinDbg.
We know that at this point the root cause already happened, so go back to the beginning, !tt 0, and now we need to find out who is calling StreamWriter.Dispose, so set a breakpoint to this method and continue the execution, "!bpmd mscorlib.dll System.IO.StreamWriter.Dispose;g":
You'll break at Dispose method of StreamWriter, execute !clrstack to see the Stack Trace and find the method that is Disposing the StreamWriter:
TTD makes it viable to analyze many scenarios that would be extremally difficult to reproduce or to collect the right data. The possibility of going back and forth is powerful and certainly has the potential to greatly reduce troubleshooting time.
Microsoft Docs has many more details about TTTracer and Windbg Preview.