debugging slow logon

I love solving tough problem, or at least cast some light. One of my customers (I’m back consulting as whitehat.berlin since 2018) was having some serious issues on the AD logon. It took above a minute, with sessions timing out, for users to logon to their workstation. Beautiful, overprovisioned setup, we didn’t really spot the error, aside of going down to update fileserver’s fiberchannel card drivers and starting moving around data between shares. I still believe it is suboptimal to either partition the load manually creating new shares, not leveraging, if it does the job, the DFS(R?) solution from Microsoft.

The fun though came for someone like me to the challenge of collecting data in a proprietary enviroment, especially when you do have a vendor on storage, one on the appliance involved, and another and another, and none of them is responsible for the whole solution… a bit like the BER airport, everyone involved, noone responsible for the overall solution. So, how to isolate the problem at least? We had data from the storage itself, all green, all performant, not much from the client OSs… that in this case are the Microsoft fileservers. So? Well, seems that Microsoft itself has a metrics interface called WMI… Windows Metrics Interface… good ah? Out of which it delivers all the info you see in the Task Manager and similar tools. Well, a bunch of skilled hackers came up with 2 nice tools, one built above the other.

On one hand leoluk/perflib_exporter which looks up in memory, actually bypassing the standard WMI interface (more details on its github page) and delivers a full data dump of ALL available metrics that are in the OS. I was having a SysOps orgasm going through it. On the other hand martinlindhe/wmi_exporter that simply reads that dataset and converts to a format understandable by prometheus. Leading to this beautiful chart…

debugging_logon

This way we could spot which server was serving, how many filedescriptors were open on a certain share… and so on and so on… and yes, I had to come up with the SMB Samba Share data class… as that was missing, but it was just a couple of hours of cut&paste work. I now need to find the time to clean it up to get it merged back in the main project.

p.s. I didn’t know… golang compiles, with no complain, from linux, a .exe windows binary

Leave a comment