
Uboot Speed Optimization

This post follows on from the previous article, “Embedded Linux Quick Boot”.

Below is a brief summary of the actions taken to speed up U-Boot boot time.



  1. U-Boot offers a lot of features, but depending on the custom hardware, some of them are not useful in the production version. These are the features we would like to remove. Removing redundant features saves boot time (the code is never executed) and reduces U-Boot size (the feature is not compiled in).
  2. One example is the kernel boot method. U-Boot supports booting the kernel over TFTP, NAND, SPI flash, etc. In most cases only one medium is used, and the remaining boot methods can be disabled or removed.

Command line

  1. U-Boot provides a command line to allow dynamic boot configuration. The user can execute commands such as loading the kernel over TFTP, or change the kernel boot parameters.
  2. We can simplify the boot script and remove redundant parts. A simpler script saves interpreter time and reduces size, both of which speed up boot. Keep in mind that running a script is expensive compared to executing compiled code.
  3. In a production system the command line is typically not used, so commands can be removed by removing the CONFIG_CMD_xxx entries in the config file.
  4. This config file is located at Uboot_source/configs/xxxx_defconfig. Go through the content of this file and remove any redundant items.


Default Script Configurations

  1. When U-Boot is first programmed and booted, there are no user-specific settings stored. In this case, the default settings are used.
  2. These default settings are located at Uboot_source/include/configs/xxx.h (e.g. for i.MX6ULL it is Uboot_source/include/configs/mx6ullevk.h).
  3. We can simplify the contents of this file to shorten the script, thereby reducing the time the interpreter spends on the default settings.
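As a rough illustration of what a trimmed default environment could look like, here is a sketch of a board header fragment. It assumes an older U-Boot tree in which CONFIG_BOOTCOMMAND and CONFIG_EXTRA_ENV_SETTINGS are still defined in include/configs/xxx.h (newer trees move some of this elsewhere). The file names (zImage, board.dtb), the MMC device numbers, the console and the load addresses are placeholders chosen for illustration, not values taken from a real board.

```c
/* Sketch of a minimal default environment: boot from one MMC device only,
 * with no network boot and no fallback scripts.
 * All values are placeholders (assumptions); adjust them for your board. */
#define CONFIG_EXTRA_ENV_SETTINGS \
    "loadaddr=0x80800000\0" \
    "fdt_addr=0x83000000\0" \
    "bootargs=console=ttymxc0,115200 root=/dev/mmcblk1p2 rootwait\0"

/* One short command chain: load kernel, load device tree, boot. */
#define CONFIG_BOOTCOMMAND \
    "fatload mmc 1:1 ${loadaddr} zImage; " \
    "fatload mmc 1:1 ${fdt_addr} board.dtb; " \
    "bootz ${loadaddr} - ${fdt_addr}"
```

The point is only the shape: a handful of variables and a single boot command, so there is very little left for the script interpreter to work through.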

Flat Device Tree

  1. Newer U-Boot versions support a device tree (FDT: Flat Device Tree), similar to the kernel device tree.
  2. For ARM systems, the FDT source is located at Uboot_source/arch/arm/dts/xxx.dts
  3. The main function of the device tree is to describe the hardware available in a system. The drawback is that U-Boot now has to parse the contents of this file at boot (previously U-Boot was pre-built with a fixed set of settings, which is faster).
  4. Modify the FDT to keep only the minimum set of boot devices and remove non-essential items.


  1. Not to forget the easy target: disable the bootdelay feature.
  2. Disable console text output (keep this for last, as the output helps during the optimization process).

Falcon Mode

  1. This is the ultimate step to really minimize U-Boot boot time: skip U-Boot entirely during boot.
  2. After the primary/secondary boot loader, instead of jumping into U-Boot, falcon mode lets the system boot directly into the kernel, resulting in the fastest boot time.
  3. More information can be found online, or follow this link as an introduction.


After implementing the above items (except falcon mode), the U-Boot boot time was reduced from 1 second to 0.5 seconds (excluding the boot delay, which is typically 3 or 5 seconds).


Embedded Linux Quick Boot

To enable embedded Linux quick boot, we need to understand the system boot process well and identify the bottlenecks.

Embedded hardware is usually custom made, so the boot process is not identical from one board to another. However, all systems still follow a general boot process. From power-on until the system is ready (for example the application running, or a login prompt), the system will generally go through 6 main stages:

  1. Manufacturer boot loader (First stage boot loader)
  2. Second stage boot loader
  3. U-Boot
  4. Linux Kernel
  5. Root file system service initialization
  6. Application running

Manufacturer Boot Loader

This is the first boot code that runs within an SoC upon power-up. Typically this code is stored in ROM within the SoC. Its key function is to let the SoC boot through various methods (USB, UART, SPI/I2C EEPROM). An evaluation board will usually have boot switches to select between the boot methods. As this is the manufacturer's boot ROM, there is nothing we can change here, and the boot time at this stage is usually very short compared to the other boot stages.

Second Stage Boot Loader

This is an intermediate stage between the manufacturer boot loader and U-Boot. It is the first boot loader that we build ourselves. Its key function is to read the subsequent boot loader (U-Boot), place it into RAM and boot it from RAM. In some systems, a minimal build of U-Boot is used as the second stage boot loader, in order to load and run the full-featured U-Boot. Usually this boot loader is stored on non-volatile storage external to the SoC, e.g. flash or EEPROM.

In some systems this boot loader is not required, which means the manufacturer boot ROM directly loads and runs the full-fledged U-Boot.


U-Boot

U-Boot is a common boot loader used in many embedded Linux products. Its main function is to read the Linux kernel from the boot medium, place it into RAM and finally transfer control to the kernel (where Linux starts to boot).

U-Boot provides more boot methods, such as network boot (over TFTP) and USB boot. Unlike the previous boot loaders (manufacturer boot loader or second stage boot loader), U-Boot is capable of reading file systems such as FAT/EXT3.

Apart from that, U-Boot also provides a command prompt where the user can enter boot parameters (bootargs), which are passed to the kernel to configure the kernel boot process.

U-Boot also reads the DTB (device tree blob) and passes it to the kernel to inform it about the hardware configuration.

Linux Kernel

When Linux boots, it uses the DTB to initialize the hardware and match drivers with the devices described. The kernel brings up buses and peripherals, probes which devices are available and initializes them.

At the end of device initialization, the kernel mounts the root file system and starts the first user process: init. This is where root file system service initialization starts.

Root file System Services Initialization

For the system to be ready, some services may need to run, such as the login prompt and background services like web servers, NFS or a GUI service (e.g. Wayland, X11).

Kernel modules are also loaded at this stage if configured to do so.

Usually the last thing started by the root file system services is the user application.


Application Running

This is the application that performs the product's functions. It is product dependent.

In subsequent articles, I shall write about the methods I use to reduce boot time at each boot stage.

Last Note

Before optimizing system boot time, we need to:

  • Define system boot time (from power-on until the login prompt, until the application is running, or until the application GUI is up). Different definitions translate to different scopes of work.
  • Have a tool to measure boot time. A handy tool is grabserial.

Refer here for the next article, “Uboot Speed Optimization”.

Yocto Build: My Rules Of Thumb


  1. When a Yocto build fails, try to rebuild the failed package. It has happened to me that the rebuild simply succeeds. I am still unsure why; my bet is a race between concurrent package builds, where one package depends on another that is still being built.
    1. Command to use: bitbake ‘package’ -c build -f
    2. or just: bitbake ‘package’
  2. If step 1 does not work, perform a clean and rebuild. Commands to use:
    1. bitbake ‘package’ -c clean
    2. bitbake ‘package’ -c build -f
  3. If that still fails, this is bad: a deep dive is needed to troubleshoot the problem.

Access Processor Special Function Register (SFR)

In the MCU world, we often control MCU peripherals directly through SFRs (Special Function Registers). Things are different in embedded Linux, where the world is split into kernel space and user space. Kernel space has direct access to the processor's SFRs, but user space does not. Usually a user application has to call into a kernel driver in order to access an SFR. At least that was my understanding until recently, when I found another method.

It turns out there is still a way to access SFRs from user space, and the right tool is /dev/mem together with the ‘mmap’ function. We just need to open the /dev/mem file and map a region of physical memory (which happens to be the SFR register address) into our process.

An example is shown below; the code is taken from this reference:

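#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Physical base address of the GPIO register block. This value and the
 * register indexes below are SoC specific; check your reference manual. */
#define GPIO_BASE 0x80018000
#define GPIO_MAP_SIZE 0x1000  /* one 4K page */

/* gpio_mmap is indexed in 32-bit words, so index = byte offset / 4. */
#define GPIO_WRITE_PIN(gpio,value) GPIO_WRITE((gpio)>>5, (gpio)&31, value)
#define GPIO_WRITE(bank,pin,value) (gpio_mmap[0x140+((bank)<<2)+((value)?1:2)] = 1u<<(pin))
#define GPIO_READ_PIN(gpio) GPIO_READ((gpio)>>5, (gpio)&31)
#define GPIO_READ(bank,pin) ((gpio_mmap[0x180+((bank)<<2)] >> (pin)) & 1)

/* volatile: every access must really reach the hardware register. */
volatile uint32_t *gpio_mmap = NULL;

/* Map the register block once. Returns the mapping, or NULL on failure. */
volatile uint32_t *gpio_map(void) {
    int fd;

    if (gpio_mmap != NULL) return gpio_mmap;
    fd = open("/dev/mem", O_RDWR);
    if( fd < 0 ) {
        perror("Unable to open /dev/mem");
        return NULL;
    }

    gpio_mmap = mmap(NULL, GPIO_MAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, GPIO_BASE);
    if( gpio_mmap == MAP_FAILED ) {
        perror("Unable to mmap file");
        gpio_mmap = NULL;
    }
    /* The mapping stays valid after the file descriptor is closed. */
    if( -1 == close(fd))
        perror("Couldn't close file");
    return gpio_mmap;
}

/* Read the register at the given byte offset from GPIO_BASE. */
int gpio_rd(long offset) {
    return gpio_mmap[offset/4];
}

/* Write the register at the given byte offset from GPIO_BASE. */
void gpio_wr(long offset, long value) {
    gpio_mmap[offset/4] = value;
}

void gpio_output(int bank, int pin) {
    gpio_mmap[0x1C1 + (bank*4)] = 1u << pin;
}

void gpio_input(int bank, int pin) {
    gpio_mmap[0x1C2 + (bank*4)] = 1u << pin;
}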


There is a small utility named devregs, built by Boundary Devices, that does this; refer here for the utility guide. It is handy if your device is supported by this utility.

One side note: the mmap offset should always be aligned to a 4K page, which means the mapped SFR base address should always end with 000 (hex), e.g. 0x1234 5000.
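When the register you want does not sit exactly on a page boundary, the usual trick is to map the page that contains it and add the remainder to the returned pointer. The helper below is a minimal sketch of that split; the struct and function names are made up for illustration, and it assumes the page size is a power of two (4096 on the board discussed here).

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a physical register address for use with mmap(). */
struct sfr_mapping {
    uint64_t page_base;   /* page-aligned address: pass as the mmap() offset */
    uint64_t page_offset; /* byte offset of the register inside that page */
};

/* Split phys_addr into a page-aligned base and the offset within the page.
 * page_size must be a power of two, e.g. 4096. */
static struct sfr_mapping sfr_split_address(uint64_t phys_addr, uint64_t page_size)
{
    struct sfr_mapping m;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    m.page_base = phys_addr & ~(page_size - 1);
    m.page_offset = phys_addr & (page_size - 1);
    return m;
}
```

For example, splitting 0x12345678 with a 4096-byte page gives a base of 0x12345000 and an offset of 0x678. In a real program the page size would normally come from sysconf(_SC_PAGESIZE) rather than being hard-coded.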

GitBucket: Git Error: 413 – Request Entity Too Large

Following an online guide, I set up GitBucket as a local git server and got it working with a git client. Everything worked fine (e.g. creating a new repo, cloning, staging and committing), until recently, when I tried to push kernel source code to GitBucket and was hit by the error: 413 – request entity too large.

After searching online for a solution, the advice was to add the line below to the nginx server config file at ‘/etc/nginx/nginx.conf’

client_max_body_size 2048M;

After making the change, I was still hit by the 413 error. After some troubleshooting, it seemed that the git push stops when the transfer hits 500 MBytes. This was unexpected, as I had changed the max size to 2 GB as shown above.

Searching online again, most advice was about client_max_body_size, which I had already set. Checking the nginx config files further, I bumped into the GitBucket nginx config file as below

ll /etc/nginx/sites-enabled/
 total 8
 drwxr-xr-x 2 root root 4096 Jan 16 12:34 ./
 drwxr-xr-x 7 root root 4096 Feb 22 12:30 ../
 lrwxrwxrwx 1 root root 43 Jan 16 12:31 mygitbucket.conf -> /etc/nginx/sites-available/mygitbucket.conf

I then checked the target file, ‘/etc/nginx/sites-available/mygitbucket.conf’.


Voilà: it contains a client_max_body_size field set to 500M. This must be the reason, since the more specific setting in the site config overrides the one in nginx.conf. I replaced the 500M with 2048M and restarted the service with: sudo systemctl restart nginx. Now I can push my kernel source to the GitBucket server without any issue.

Storage File Read Write – Linux File Caching

The SD interface on my custom board runs at a maximum clock of 132 MHz, as reported by Linux below:

clock: 132000000 Hz
actual clock: 132000000 Hz
vdd: 21 (3.3 ~ 3.4 V)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 2 (on)
bus width: 2 (4 bits)
timing spec: 6 (sd uhs SDR104)
signal voltage: 1 (1.80 V)
driver type: 0 (driver type B)

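From the above, SDR104 should give me a maximum throughput of 104 MBytes/s. Strictly speaking that figure only applies at the SDR104 maximum clock of 208 MHz; at the 132 MHz reported here, a 4-bit bus can carry at most 66 MBytes/s.

When I run the dd commands below, writes reach only about a third of the 104 MBytes/s figure (roughly half of the 66 MBytes/s bus limit). Most likely this is due to the slow write cycle of the SD card.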
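As a rough cross-check, the peak transfer rate of a single-data-rate SD bus is simply clock × bus width / 8. The sketch below assumes one bit per data line per clock and ignores all protocol overhead, so real throughput will always be lower. The function name is made up for illustration.

```c
/* Theoretical peak of a single-data-rate bus in MB/s (1 MB = 10^6 bytes).
 * Assumes one bit per data line per clock and no protocol overhead. */
static double sd_bus_peak_mb_per_s(double clock_hz, int bus_width_bits)
{
    return clock_hz * bus_width_bits / 8.0 / 1e6;
}
```

With the values reported for this board, sd_bus_peak_mb_per_s(132000000, 4) gives 66 MB/s, while the 104 MB/s usually quoted for SDR104 corresponds to a 208 MHz clock. Any dd result well above that limit cannot be coming from the card itself.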

//Testing with 100MByte writing
dd if=/dev/zero of=/media/sd-mmcblk0p1/test.txt bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 3.09921 s, 33.8 MB/s
//Testing with 1GB writing
dd if=/dev/zero of=/media/sd-mmcblk0p1/test.txt bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 50.8004 s, 20.6 MB/s

Now, let's have a look at the read speed

//Testing with 100MByte reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.00954 s, 52.2 MB/s
//Testing with 1GB reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 19.462 s, 53.9 MB/s

However, things get interesting when I perform another dd read. This time my result is as below:

# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.330023 s, 318 MB/s

All of a sudden the read speed has increased tremendously, to 318 MB/s, which is far more than the SD card interface can deliver. This is weird. The only logical explanation I can think of is some sort of internal caching in the Linux kernel (the page cache). That turns out to be true: after I run the following command and repeat the dd, the read speed returns to ‘normal’ values.

echo 1 > /proc/sys/vm/drop_caches

A search on the Internet brought me to this page about drop_caches.

//Testing with 100MByte reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.00954 s, 52.7 MB/s


My Printf Does Not Work

Sometimes during development we bump into mysterious problems: code that previously worked suddenly stops working. This happened again recently, when the embedded board suddenly no longer printed any debug messages. I simplified the program to print ‘Hello World’ in the first line of code, and surprisingly nothing was printed when the program ran. A sample of the program is shown below

#include <stdio.h>
#include <stdlib.h>

void function1(void)
{
    printf("run function1...");
}

int main(void) {
    printf("hello world");
    function1();  //implemented program code
    return 0;
}

After looking into this for 15 minutes without any clue, I decided it was time for a break.

After coming back from a short break, I finally spotted the problem. Apparently it is due to the missing newline (‘\n’). The fix looks like this:

printf("hello world\n");

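With this change, the debug printout works again. The reason is that printf output is buffered by the C library (rather than by the Linux kernel): when stdout is attached to a terminal it is line-buffered, so messages stay in the buffer until a newline is seen, the buffer fills up, fflush(stdout) is called or the program exits normally. In my case, the flushed data goes out through a UART port.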
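If a debug message really must appear without a trailing newline, an explicit flush does the job. The helper below is a minimal sketch of that idea; debug_print is a made-up name, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: write a debug message and push it out of the stdio
 * buffer immediately, whether or not it ends with a newline.
 * Returns 0 on success, EOF on error. */
static int debug_print(FILE *out, const char *msg)
{
    if (fputs(msg, out) == EOF)
        return EOF;
    return fflush(out);
}
```

Calling debug_print(stdout, "hello world") makes the text visible right away. An alternative is to switch off buffering once at startup with setvbuf(stdout, NULL, _IONBF, 0), at the cost of one write per printf call.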

Lesson of the day:

  1. Always end printf messages with a newline (‘\n’) to ensure the message is flushed out of the buffer.
  2. During development, when progress stalls, it is time for a break!