
Debugging Techniques

November 29, 2014

I would like to share some of my experiences in embedded system debugging in the hope that they will help other engineers solve their daily problems. (Yes, firmware engineers face problems and debugging every day.)

Step 1: Understand the problem

To solve a problem, we must first understand it by asking the following questions:

1. What exactly is THE problem? To understand it, we must know the intended design and how the observed behaviour deviates from that design. That deviation is the problem.

2. How does the problem happen? Is there a sequence of steps that reproduces it every time? If we can define such steps, tracing the problem becomes much easier.

3. How does the system behave in other, similar cases? Comparing them helps us understand the problem better.

Step 2: Source Code Tracing/Studying – Construct A System Flow

1. Using what we learned in Step 1, find a starting point for tracing the source code. For example, start from the consequence of the problem and trace backward, or start from the point where the symptom first appears.

2. To study a large body of source code quickly, first understand the normal behaviour of the design, then skim the code. Understand what each function (a C function, e.g. void funcA(void)) does and move on without going into its details. While scanning the functions, start constructing a system flow in your mind, or draw it on paper (keeping it in your mind is preferable, since you can adjust it quickly later). Stepping through the code in a debugger or printing debug messages helps in constructing the system flow.

3. Review the constructed system flow against the functionality/features it is supposed to implement. If the flow does not match, something is missing. Depending on how important the missing piece is, we can either ignore it or go back to the source code. Going back is advisable: very often we find that our earlier system flow was incorrect, and this is where we start adjusting it.

4. To confirm that the constructed system flow is correct, we sometimes need to step through the code in a debugger or print debug messages. Validating the flow ensures we are on the right path.

5. Keep adjusting the system flow until it matches both the source code and the functionality/features.
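Printing messages is one way to build and confirm the flow described in items 2 and 4. As a minimal sketch, assuming a hosted C99 environment with printf (on a target you would route the output to a UART or similar), a TRACE macro can tag each message with the calling function and line, and also record the function names in call order so the observed flow can be dumped and compared with the one in your head. TRACE, trace_log, process_packet and parse_packet are hypothetical names; the two functions merely stand in for the code under study.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_ENABLED 1   /* set to 0 to compile every TRACE() away */
#define TRACE_MAX     64  /* capacity of the in-memory flow record */

/* Function names in call order. __func__ has static storage duration,
 * so keeping pointers to it is safe. */
static const char *trace_log[TRACE_MAX];
static int trace_count = 0;

#if TRACE_ENABLED
/* Print "[function:line] message" and append the function to trace_log. */
#define TRACE(...)                                      \
    do {                                                \
        printf("[%s:%d] ", __func__, __LINE__);         \
        printf(__VA_ARGS__);                            \
        putchar('\n');                                  \
        if (trace_count < TRACE_MAX)                    \
            trace_log[trace_count++] = __func__;        \
    } while (0)
#else
#define TRACE(...) do { } while (0)
#endif

/* Hypothetical functions standing in for the code being studied. */
static int parse_packet(int len)
{
    TRACE("len=%d", len);
    return len > 0;
}

static void process_packet(int len)
{
    TRACE("entered");
    if (parse_packet(len))
        TRACE("packet accepted");
    else
        TRACE("packet rejected");
}

/* Dump the recorded flow so it can be compared with the expected one. */
static void trace_dump(void)
{
    for (int i = 0; i < trace_count; i++)
        printf("%d: %s\n", i, trace_log[i]);
}
```

Calling process_packet(5) prints three tagged lines and leaves trace_log holding process_packet, parse_packet, process_packet, which is the flow to check against the design. Because the macro can be compiled out, the trace points can stay in the code for the next investigation.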

Step 3: Finding The Bug

1. Now review the system flow and try to figure out which part of the code may cause the problem. This is where Step 1 pays off: the better we understand the bug, the easier it is to figure out which part of the code may be going wrong.

2. If the previous steps give no clue, try the following methods:

  • Step through the source code with a debugger and watch for variables that take wrong values
  • Print debug messages and check whether a function call goes somewhere unexpected or a variable value has been changed

3. Once we find a variable that has changed to a wrong value, or an unexpected function being called, we can start tracing from there.
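When the debugger cannot stop on a data change (no hardware watchpoint available, or the problem takes hours to appear), a software watch is one substitute for spotting that "a variable value has been changed". The sketch below is an illustration under that assumption, not a standard API: watch_t, watch_init, watch_check and WATCH_CHECK are hypothetical names. It keeps a shadow copy of a variable and compares it at checkpoints placed along the system flow; the first checkpoint that reports a change brackets where the write happened.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WATCH_MAX_SIZE 16  /* largest variable this sketch can watch */

typedef struct {
    const void   *addr;                   /* variable being watched */
    size_t        size;                   /* its size in bytes */
    unsigned char shadow[WATCH_MAX_SIZE]; /* last known-good copy */
} watch_t;

/* Arm the watch on a variable (or re-arm it after an intended write). */
static void watch_init(watch_t *w, const void *addr, size_t size)
{
    assert(size <= sizeof w->shadow);
    w->addr = addr;
    w->size = size;
    memcpy(w->shadow, addr, size);
}

/* Return 1 and report the checkpoint if the variable changed since the
 * previous check, then take the new value as the reference. */
static int watch_check(watch_t *w, const char *file, int line)
{
    if (memcmp(w->shadow, w->addr, w->size) == 0)
        return 0;
    printf("watch: value changed before %s:%d\n", file, line);
    memcpy(w->shadow, w->addr, w->size);
    return 1;
}

/* Checkpoint macro: records where in the source the check was made. */
#define WATCH_CHECK(w) watch_check((w), __FILE__, __LINE__)
```

Place WATCH_CHECK(&w) around the suspected calls: if it stays quiet before funcA() and fires after it, the unexpected write happened inside funcA(), in something it called, or in an interrupt that ran in between. After a legitimate write, call watch_init again so that only unexpected changes are reported.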

Step 4: Keep Finding (5 Why Approach)

1. Some engineers stop at Step 3.3, as soon as they find that a variable has been corrupted, and start fixing the problem. That is bad practice. A fix should only be made once the root cause of the problem has been found, not just its symptom. For example, a corrupted variable can have many causes: a memory copy that runs past the end of a structure, or a timing issue that lets some function run at the wrong moment and overwrite the variable.

2. So always remember the ‘5 Whys’: keep asking why. Why does the problem occur? Why is the variable corrupted? Why was the function that corrupted the variable called? And so on; keep digging.
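The memory-copy example in item 1 is a case where the corrupted variable is only the symptom. One common way to catch such an overrun close to its source is to surround a structure with guard words (canaries) and check them after every copy. This is a minimal sketch with a hypothetical guarded_t layout and an arbitrary pattern value; the test below assumes the usual layout with no padding between the members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUARD_PATTERN 0xDEADBEEFu  /* arbitrary, easy to spot in a dump */

/* Hypothetical record with a guard word on each side of the payload.
 * A copy that runs past payload hits guard_tail before anything else. */
typedef struct {
    uint32_t guard_head;
    char     payload[8];
    uint32_t guard_tail;
} guarded_t;

static void guard_init(guarded_t *g)
{
    g->guard_head = GUARD_PATTERN;
    g->guard_tail = GUARD_PATTERN;
}

/* Return 1 if both guards are intact; otherwise report where the damage
 * was noticed. Call it right after each copy into payload. */
static int guard_ok(const guarded_t *g, const char *where)
{
    if (g->guard_head == GUARD_PATTERN && g->guard_tail == GUARD_PATTERN)
        return 1;
    printf("guard corrupted, detected %s\n", where);
    return 0;
}
```

A 12-byte copy into the 8-byte payload overwrites guard_tail, so guard_ok fails at the first check after the copy instead of the damage surfacing later as a corrupted neighbouring variable. In the spirit of the 5 Whys, the guard only tells you where the overrun happened; you still have to ask why the copy length was wrong.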

Step 5: Validation Of Root Cause

1. Once we have found the root cause of the problem, it is important to validate it.

2. In addition, every observation from Step 1 should be checked against the root cause. Can all of them be explained? If not, there may be more than one bug, or our constructed system flow may be wrong. Either way, something is missing, so we study the problem again.

Step 6: Fixing The Bug

1. Once the root cause has been found, fixing the bug should be easy. Remember that there is often more than one way to fix a problem: check which options are available and choose the one that fits best.

Case Study: DNS Resolution Stops After a Random Time (30 Minutes to a Few Hours)

‘Task A’ initiates the ‘DNS Task’ (the DNS resolver), and the ‘DNS Task’ in turn initiates the ‘ARP Task’.

Using debug message printouts, we discovered that DNS resolution stops when the internal state of the ‘DNS Task’ is stuck in ‘STATE_BUSY’. Following the 5 Whys of Step 4, we continued investigating why the state was stuck in ‘STATE_BUSY’. We found that the ‘DNS Task’ timer, which is supposed to change the state to ‘STATE_READY’ on timeout, was not doing so. When the timer times out, a message is sent to the DNS Task queue; the DNS Task wakes up, reads the message from the queue, and changes the state to ‘STATE_READY’. That was not happening. In the end we discovered that if the DNS Task timer and the ARP Task timer time out at about the same time, both send a message to the DNS Task queue, but the DNS Task wakes only once and reads a single message from the queue, so the state is never reset to ‘STATE_READY’.

In short,

DNS resolution stops <– DNS state stuck in STATE_BUSY <– DNS timer fires but has no effect <– Task queue overflow <– DNS Task and ARP Task timers fire at almost the same time

The chain above shows why finding the root cause requires detailed investigation.
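The chain can be reproduced in a small host-side model, which is also a way to validate the root cause (Step 5) by forcing both timeouts at once instead of waiting hours. The post does not say which RTOS primitives were used, so this sketch assumes the wake-up signal behaves like a binary flag: two posts before the task runs produce only one wake-up. All names (queue_t, MSG_ARP_TIMEOUT, dns_task_run_once_buggy and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message IDs and states modelled on the case study. */
typedef enum { MSG_ARP_TIMEOUT, MSG_DNS_TIMEOUT } msg_t;
typedef enum { STATE_READY, STATE_BUSY } dns_state_t;

#define QUEUE_LEN 4

/* A tiny FIFO plus a binary wake-up flag (assumed RTOS behaviour):
 * posting twice before the task runs still yields one wake-up. */
typedef struct {
    msg_t buf[QUEUE_LEN];
    int   head, count;
    bool  wake;
} queue_t;

static bool queue_post(queue_t *q, msg_t m)
{
    if (q->count == QUEUE_LEN)
        return false;                      /* queue full: message dropped */
    q->buf[(q->head + q->count) % QUEUE_LEN] = m;
    q->count++;
    q->wake = true;                        /* a flag, not a counter */
    return true;
}

static bool queue_get(queue_t *q, msg_t *m)
{
    if (q->count == 0)
        return false;
    *m = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return true;
}

static void handle(dns_state_t *state, msg_t m)
{
    if (m == MSG_DNS_TIMEOUT)
        *state = STATE_READY;              /* timer expiry releases the resolver */
    /* MSG_ARP_TIMEOUT handling omitted in this model */
}

/* Buggy: reads only one message per wake-up. */
static void dns_task_run_once_buggy(queue_t *q, dns_state_t *state)
{
    msg_t m;
    if (!q->wake)
        return;
    q->wake = false;
    if (queue_get(q, &m))
        handle(state, m);
}

/* Fixed: drains every pending message on each wake-up. */
static void dns_task_run_once_fixed(queue_t *q, dns_state_t *state)
{
    msg_t m;
    if (!q->wake)
        return;
    q->wake = false;
    while (queue_get(q, &m))
        handle(state, m);
}
```

With the ARP timeout queued ahead of the DNS timeout, the buggy handler consumes only the ARP message and is never woken again, leaving the state in STATE_BUSY; the fixed handler drains both and returns to STATE_READY. Draining the queue is one fix; as Step 6 notes there are others, for example making the wake-up a counting semaphore so that every post produces a wake-up.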

