One of the things I stress to young engineers is to believe that the code does what it is meant to. If the code is meant to log a message and you don’t see the message, then the code is not being executed. The following example from a few days ago shows how effective this idea can be when debugging relatively complex issues.
A customer reported that some status messages were not being sent to them. This started after their system was upgraded to the latest version of our server. The issue was escalated to engineering, and I looked at it with the engineer responsible for the customer. We examined the log files and could not see the entries that indicate status messages being sent. We checked the change log for the code that sends the messages, and nothing had been modified in the latest server release. Aside: each of our customers normally has differing requirements, so our system consists of a core set of functionality supplemented with customer-specific functionality implemented in what we call customisation classes.
I suggested to the engineer that, because we could not see the log file entry for sending the status message, the code was probably not being called at all: there must be another path through the system that was now executing. I had to go and check on something else, so I left him to see what he could find out.
Fifteen minutes later he came back with the cause of the problem identified. A change had been made in the core code that stopped the customisation code being called under certain circumstances. The way the system was meant to work was that the customisation class extended the core class and overrode certain methods. The code below shows this idea.
class Core
{
    public void methodA()
    {
        methodB();
    }

    // Customisation classes override this method.
    public void methodB()
    {
        doSomething();
    }

    private void doSomething()
    {
        // core behaviour
    }
}
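To make the intended design concrete, here is a minimal sketch of a customisation class overriding methodB. The names CustomerCustomisation and IntendedPathSketch, and the calls list that stands in for our log file, are assumptions for illustration only and are not taken from the real system.

```java
import java.util.ArrayList;
import java.util.List;

class IntendedPathSketch {
    static class Core {
        // Records which methods ran; a stand-in for log file entries.
        final List<String> calls = new ArrayList<>();

        public void methodA() {
            // Virtual dispatch: runs the subclass override when one exists.
            methodB();
        }

        public void methodB() {
            calls.add("core methodB");
        }
    }

    // Hypothetical customisation class for one customer.
    static class CustomerCustomisation extends Core {
        @Override
        public void methodB() {
            calls.add("custom methodB: status message sent");
        }
    }

    public static void main(String[] args) {
        Core core = new CustomerCustomisation();
        core.methodA();
        System.out.println(core.calls); // prints [custom methodB: status message sent]
    }
}
```

Because methodA calls methodB through normal virtual dispatch, the customer-specific behaviour runs without the core code knowing anything about it.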
What had happened was that a new method signature had been introduced for methodB, and methodA now called this new methodB. Because the customisation code did not override the new methodB, the code for sending the status message was never called. The problem is shown in the code sample below.
class Core
{
    public void methodA()
    {
        methodB("No longer overridden");
    }

    public void methodB()
    {
        // normally overridden
    }

    public void methodB(String a)
    {
    }
}
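The effect is easy to reproduce in isolation. The sketch below is a minimal, runnable version of the situation; CustomerCustomisation, BrokenPathSketch, and the calls list standing in for the log file are hypothetical names, not the real classes. The customisation still compiles and still overrides methodB(), so nothing warns you, but the recorded calls show that the override is never reached.

```java
import java.util.ArrayList;
import java.util.List;

class BrokenPathSketch {
    static class Core {
        // Records which methods ran; a stand-in for log file entries.
        final List<String> calls = new ArrayList<>();

        public void methodA() {
            // Now calls the new overload, which no customisation overrides.
            methodB("No longer overridden");
        }

        public void methodB() {
            calls.add("core methodB()");
        }

        public void methodB(String a) {
            calls.add("core methodB(String)");
        }
    }

    // Hypothetical customisation class, unchanged since before the upgrade.
    static class CustomerCustomisation extends Core {
        @Override
        public void methodB() {
            calls.add("custom methodB(): status message sent");
        }
    }

    public static void main(String[] args) {
        Core core = new CustomerCustomisation();
        core.methodA();
        System.out.println(core.calls); // prints [core methodB(String)]
    }
}
```

The missing "status message sent" entry is exactly the symptom we saw in the log files: the code was correct, it was simply no longer on the execution path.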
The key reasons the cause was found so quickly were:
- Good knowledge of the system: the engineer knew which parts of the system were related to the issue.
- He looked at what had changed.
- He used the knowledge that there must be a different path through the system.
- He “used the code”.
A fix was produced that restored the original execution path through the system, so the customisation code was called again. Issue resolved.
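For illustration only, one way to keep a new overload while preserving the old execution path is to have the overload delegate to the original overridable method by default. This is a sketch under that assumption, not necessarily what the actual fix did, and the names FixedPathSketch and CustomerCustomisation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FixedPathSketch {
    static class Core {
        // Records which methods ran; a stand-in for log file entries.
        final List<String> calls = new ArrayList<>();

        public void methodA() {
            methodB("context");
        }

        public void methodB() {
            calls.add("core methodB()");
        }

        // Assumed design: the new overload falls back to the original
        // method, so existing overrides of methodB() are still reached.
        public void methodB(String a) {
            methodB();
        }
    }

    // Hypothetical customisation class, unchanged by the fix.
    static class CustomerCustomisation extends Core {
        @Override
        public void methodB() {
            calls.add("custom methodB(): status message sent");
        }
    }

    public static void main(String[] args) {
        Core core = new CustomerCustomisation();
        core.methodA();
        System.out.println(core.calls); // prints [custom methodB(): status message sent]
    }
}
```

The design choice here is that existing customisation classes need no changes: anything that overrode methodB() before the new signature was introduced keeps working.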