
Access Processor Special Function Register (SFR)

In the MCU world, we often control peripherals directly through SFRs (Special Function Registers). Things are different in embedded Linux, where the system is split into kernel space and user space. Kernel space has direct access to the processor's SFRs, but user space does not. Usually a user application has to call into a kernel driver in order to access an SFR. At least that was my understanding until recently, when I found another method.

From user space, there is still a way to access SFRs: the right tool is /dev/mem together with the ‘mmap’ function. We just need to open the /dev/mem file (this normally requires root) and map the physical memory we want, which happens to be the SFR address, into our process.

An example is shown below; the code is taken from this reference:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define GPIO_BASE 0x80018000
#define GPIO_WRITE_PIN(gpio,value) GPIO_WRITE((gpio)>>5, (gpio)&31, value)
#define GPIO_WRITE(bank,pin,value) (gpio_mmap[0x140+((bank)<<2)+((value)?1:2)] = 1<<(pin))
#define GPIO_READ_PIN(gpio) GPIO_READ((gpio)>>5, (gpio)&31)
#define GPIO_READ(bank,pin) ((gpio_mmap[0x180+((bank)<<2)] >> (pin)) & 1)

/* volatile: every access must really reach the hardware register */
volatile int *gpio_mmap = 0;

/* Map the GPIO register page; returns the mapping, or 0 on failure */
volatile int *gpio_map(void) {
    int fd;
    void *map;

    if (gpio_mmap != 0) return gpio_mmap;   /* already mapped */
    fd = open("/dev/mem", O_RDWR);
    if( fd < 0 ) {
        perror("Unable to open /dev/mem");
        return 0;
    }

    /* map one whole 4K page starting at the page-aligned GPIO_BASE */
    map = mmap(0, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, GPIO_BASE);
    if( map == MAP_FAILED ) {
        perror("Unable to mmap file");
    } else {
        gpio_mmap = map;
    }
    /* the mapping stays valid after the file is closed */
    if( -1 == close(fd))
        perror("Couldn't close file");
    return gpio_mmap;
}

/* offset is in bytes; the registers are indexed as 32-bit words */
int gpio_rd(long offset) {
    return gpio_mmap[offset/4];
}

void gpio_wr(long offset, long value) {
    gpio_mmap[offset/4] = value;
}

void gpio_output(int bank, int pin) {
    gpio_mmap[0x1C1 + (bank*4)] = 1 << pin;
}

void gpio_input(int bank, int pin) {
    gpio_mmap[0x1C2 + (bank*4)] = 1 << pin;
}


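There is a small utility named devregs, built by Boundary Devices, that does exactly this; refer here for the utility guide. It is handy if your device is supported by the utility.

One side note: the mmap offset must always be aligned to a page boundary (4K pages here), which means the mapped SFR base address should always end with 000 in hex, e.g. 0x12345000. A register that sits in the middle of a page is reached by mapping the page base and then adding the register's offset within the page.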
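
The alignment rule above can be sketched as follows. This is a minimal illustration, not code from the reference: the helper names page_base, page_offset and map_register are made up for this example, and the page size is queried at run time instead of assuming 4K.

```c
#include <stdint.h>
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a physical address down to the start of its page.
 * page_size must be a power of two. (Hypothetical helper.) */
uint64_t page_base(uint64_t phys, uint64_t page_size)
{
    return phys & ~(page_size - 1);
}

/* Distance of the address from the start of its page. (Hypothetical helper.) */
uint64_t page_offset(uint64_t phys, uint64_t page_size)
{
    return phys & (page_size - 1);
}

/* Map the page containing `phys` and return a pointer to the register
 * itself, or NULL on failure. The mapping is intentionally left in place. */
volatile uint32_t *map_register(uint64_t phys)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *map = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)page_base(phys, page_size));
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return NULL;

    return (volatile uint32_t *)((uint8_t *)map + page_offset(phys, page_size));
}
```

With 4K pages, a register at 0x80018123 would be reached by mapping 0x80018000 and adding 0x123 to the returned pointer.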


GitBucket: Git Error: 413 – Request Entity Too Large

I followed an online guide to set up GitBucket as a local git server and got it working with the git client. Everything worked fine (e.g. creating a new repo, cloning, staging and committing) until recently, when I tried to push kernel source code to GitBucket and was hit by the error: 413 – Request Entity Too Large.

After searching online for a solution, the advice was to add the line below to the nginx server config file at ‘/etc/nginx/nginx.conf’:

client_max_body_size 2048M;

After making the change, I still hit the 413 error. After some troubleshooting, it seemed that git push stopped when the transfer reached 500 MB. This was unexpected, as I had changed the maximum size to 2 GB as shown above.

Searching online again, most advice pointed to client_max_body_size, which I had already set. Checking the nginx configuration further, I bumped into the GitBucket nginx config file, as below:

ll /etc/nginx/sites-enabled/
 total 8
 drwxr-xr-x 2 root root 4096 Jan 16 12:34 ./
 drwxr-xr-x 7 root root 4096 Feb 22 12:30 ../
 lrwxrwxrwx 1 root root 43 Jan 16 12:31 mygitbucket.conf -> /etc/nginx/sites-available/mygitbucket.conf

I then checked the target file, ‘/etc/nginx/sites-available/mygitbucket.conf’.


Voilà: the file contains a client_max_body_size field set to 500M, which must be the reason, since a value set in a server block overrides the one in nginx.conf. I replaced 500M with 2048M and restarted the service with: sudo systemctl restart nginx. Now I can push my kernel source to the GitBucket server without any issue.

Storage File Read Write – Linux File Caching

The SD interface on my custom board runs at a maximum clock of 132 MHz, as reported by Linux below:

clock: 132000000 Hz
actual clock: 132000000 Hz
vdd: 21 (3.3 ~ 3.4 V)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 2 (on)
bus width: 2 (4 bits)
timing spec: 6 (sd uhs SDR104)
signal voltage: 1 (1.80 V)
driver type: 0 (driver type B)

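From the above, the card runs in SDR104 mode, which tops out at 104 MB/s at its maximum clock of 208 MHz. At the 132 MHz reported here, the 4-bit bus can move at most 132 MHz × 4 bits / 8 = 66 MB/s.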

When I run the dd commands below, the write speed is only about one third of the 104 MB/s SDR104 maximum, and lower still for the larger write. Most likely this is due to the slow write cycle of the SD card.

//Testing with 100MByte writing
# dd if=/dev/zero of=/media/sd-mmcblk0p1/test.txt bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 3.09921 s, 33.8 MB/s
//Testing with 1GB writing
# dd if=/dev/zero of=/media/sd-mmcblk0p1/test.txt bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 50.8004 s, 20.6 MB/s

Now, let's have a look at the read speed:

//Testing with 100MByte reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.00954 s, 52.2 MB/s
//Testing with 1GB reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 19.462 s, 53.9 MB/s

However, things get interesting when I perform another dd read. This time the result is as below:

# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.330023 s, 318 MB/s

All of a sudden, the read speed has increased tremendously, to 318 MB/s, which is much higher than the SD bus can deliver. This is weird. The only logical explanation (that I could think of) is that the Linux kernel does some sort of internal caching. That turns out to be true: the kernel keeps recently read file data in RAM (the page cache), so the second read never touches the card. When I run the following command and dd again, the read speed returns to ‘normal’ values.

echo 1 > /proc/sys/vm/drop_caches

A search on the Internet brought me to this page about drop_caches. After dropping the caches, the same read gives:

//Testing with 100MByte reading
# dd if=/media/sd-mmcblk0p1/test.txt of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.00954 s, 52.7 MB/s
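
Writing to drop_caches needs root and empties the cache for the whole system. As a possible alternative for benchmarking a single file, a program can ask the kernel to drop just that file's cached pages with posix_fadvise(). The sketch below is my own illustration, not taken from the drop_caches page: the function name drop_file_cache is made up, and the kernel treats the request as advice, so it is not guaranteed to evict every page.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to drop the cached pages of one file.
 * Returns 0 on success, -1 if the file cannot be opened,
 * or the error number reported by posix_fadvise(). */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Dirty pages cannot be dropped, so write them back first. */
    fsync(fd);

    /* Offset 0 with length 0 means "the whole file". */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Calling drop_file_cache("/media/sd-mmcblk0p1/test.txt") before each dd read should make the measurement hit the SD card again instead of RAM.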


My Printf Does Not Work

Sometimes during development we tend to bump into mysterious problems: code that previously worked suddenly stops working. This happened again recently, when the embedded board no longer printed any debug messages. The program was simplified to print ‘Hello World’ on the first line of code, and surprisingly nothing was printed after executing it. A sample of the program is below:

#include <stdio.h>
#include <stdlib.h>

void function1(void)
{
    printf("run function1...");
}

int main(void) {
    printf("hello world");
    function1();  //implemented program code
    return 0;
}

After looking into this for 15 minutes without any clue, I decided it was time for a break.

After coming back from a short break, I finally spotted the problem. Apparently it was caused by the missing newline (‘\n’). The fix looks like this:

printf("hello world\n");

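With this change, the debug printout works again. This shows that printf output is buffered until a newline is seen, and only then is the buffered data flushed out (in my case, through a UART port). Strictly speaking, the buffering is done by the C library's stdio layer rather than by the Linux kernel: stdout is line-buffered when it is connected to a terminal.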

Lesson of the day:

  1. Always end printf messages with a newline (‘\n’), or call fflush(stdout), to ensure the message is flushed out of the buffer
  2. During development, when progress stalls, it is time for a break!
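
When a message cannot end with a newline (a progress indicator, for example), an explicit flush does the same job. Below is a minimal sketch; the helper name debug_print is made up for this example.

```c
#include <stdio.h>

/* Print a debug message and push it out immediately,
 * even when it does not end with a newline.
 * Returns 0 on success, EOF on failure. */
int debug_print(const char *msg)
{
    if (fputs(msg, stdout) == EOF)
        return EOF;
    return fflush(stdout);  /* force the buffered bytes out now */
}
```

Another option is to call setvbuf(stdout, NULL, _IONBF, 0) once at the start of main, before anything is printed, which turns off stdout buffering entirely at the cost of one write per printf call.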

Why Arduino Code Is Not Deployed in Commercial/Industrial Products


A friend of mine asked this question, and my answers are:

1. Reliability of Source Code

Arduino was originally targeted at electronics beginners and hobbyists who want to quickly build a circuit or prototype. Its requirements are therefore less stringent than those of embedded product development. If a device is not working, a simple reset is fine on an Arduino. However, this is not the case for embedded products.
In embedded development, getting the device to work is just part of the job. The most important part is ensuring that the system runs well in all situations. This means the firmware has to anticipate errors in input parameters and sensor values, and there has to be some recovery mechanism to keep the system reliable.

2. Efficiency of System

An Arduino program is written around 2 main functions, setup() and loop(). setup() is called once when the MCU initializes at power-up, while loop() is called repeatedly as a superloop. In some cases, these 2 functions are sufficient to implement the system functionality.
In embedded systems, efficiency is a key aspect of development. A highly efficient system needs a less powerful MCU, which saves resources and cost. To achieve that, a lightweight operating system (e.g. a round-robin task scheduler or an RTOS) is used in the system. A superloop is not suitable in this case, as it tends to respond more slowly than an operating system does.
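
To give an idea of what a lightweight scheduler looks like compared with a bare loop(), here is a minimal sketch of a cooperative scheduler that visits its tasks in round-robin order on every timer tick and runs the ones whose period has elapsed. All names (task_t, sensor_task, display_task, scheduler_tick) and the periods are made up for illustration; a real RTOS adds priorities, preemption and blocking calls on top of this.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*task_fn)(void);

typedef struct {
    task_fn  run;       /* task body, must return quickly */
    uint32_t period;    /* run every `period` ticks */
    uint32_t next_due;  /* tick at which the task runs next */
} task_t;

/* Two dummy tasks that only count how often they were run. */
unsigned sensor_runs, display_runs;
void sensor_task(void)  { sensor_runs++; }
void display_task(void) { display_runs++; }

task_t tasks[] = {
    { sensor_task,  1, 0 },  /* every tick */
    { display_task, 5, 0 },  /* every 5th tick */
};

/* Call once per timer tick: runs each task whose period has elapsed. */
void scheduler_tick(uint32_t now)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        /* signed difference keeps the comparison valid across tick wrap-around */
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run();
            tasks[i].next_due = now + tasks[i].period;
        }
    }
}
```

Over ticks 0 to 9, sensor_task runs 10 times and display_task runs twice (at ticks 0 and 5), so a slow, infrequent job no longer dictates how often the fast one gets serviced.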

3. System/Code Optimization

In embedded systems, every byte (of flash or RAM) counts, so code is usually written in an optimized way to be highly efficient with the smallest code size. This may not be the case for Arduino code. When such Arduino code is used in an embedded system, more flash and RAM are required. However, with advances in technology, the impact is not as great as in the old days.

4. Arduino Pin-out

Arduino has a standard pin-out assignment to ease development, while in embedded systems each pin is used differently depending on the hardware design. If Arduino code is hard-coded to use the standard pin-out, there is a possibility that its libraries are not usable on an embedded system because of this limitation.




Enable Parallel Compilation

Nowadays, more and more MCU vendors adopt the Eclipse platform as their MCU IDE. The drawback of Eclipse is that it tends to run slowly, which I suspect is because it is implemented in Java. As a result, waiting for compilation has become the norm.

If we have a multicore processor, which is very common with i5 or i7 processors, there is an option to speed up the compilation process: parallel job compilation. Using this feature, each source file can be assigned to a different processor core for compilation, enabling concurrent compilation.

To enable this setting, right-click on the project and select ‘Properties’ to open the project settings window. In stock Eclipse CDT the option is under C/C++ Build > Behavior, as ‘Enable parallel build’; vendor IDEs may place it slightly differently.


It is possible to cut the compilation time by half or even more.

Since this is an Eclipse-based feature, the same setting is available if you are building desktop software using Eclipse.

For terminal (command line) builds with make, the same option is enabled with ‘-j4’ (for 4 parallel jobs) or ‘-j8’ (for 8 parallel jobs), e.g. make -j4. Note that -j is an option of make, not of gcc itself.

C Language ‘static’ Keyword In Function Scope

In C, when we want a variable that keeps its value between calls like a global variable, but is only visible inside one function (function scope), we can use the ‘static’ keyword. Example as below:

#include <stdint.h>
#include <stdio.h>

void function1(void){
    static uint8_t global_variable=0;  /* initialized once, keeps its value between calls */
    uint8_t variable=0;                /* re-initialized on every call */

    global_variable++;
    variable++;
    printf("global_variable:%d, variable:%d\n", global_variable, variable);
}

int main(void){
    function1(); //output: global_variable:1, variable:1
    function1(); //output: global_variable:2, variable:1
    return 0;
}

Example Code 1

In a superloop environment (all tasks are called from an always-true while loop), and as long as function1() is not called from an ISR, the above works very well.

But the above code will not work (in some cases) when it is used in an RTOS (real-time operating system) environment. Depending on the system design, a few scenarios may occur, and each requires a different change.

Case 1: Only Require One ‘global_variable’ In System

In this case, although the system has multiple tasks, we only require one global_variable for all of them.

In an RTOS environment, ‘global_variable’ becomes a shared resource, so operations on it have to be made atomic. Without that guarantee, while task1 is accessing and changing ‘global_variable’, it may be preempted by task2, which may also change ‘global_variable’. A mutex lock/unlock pair can be used to avoid this scenario, as below:

void function1(void){
    static uint8_t global_variable=0;
    uint8_t p_global_variable;
    uint8_t variable=0;

    mutex_lock();                         /* placeholder for the RTOS mutex API */
    global_variable++;
    p_global_variable = global_variable;  /* private copy taken under the lock */
    mutex_unlock();                       /* placeholder for the RTOS mutex API */

    variable++;
    printf("global_variable:%d, variable:%d\n", p_global_variable, variable);
}

Example Code 2

One example of this is when we want to know how many times function1 has been called by all the tasks in the system.

In the above code, the function has been modified so that it can be used by multiple tasks; this is what is generally called a thread-safe function.
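
For readers who want to try this on a desktop first, the same call-counting idea can be sketched with POSIX threads, using a pthread mutex as a stand-in for the RTOS mutex. This is only an illustration: the names function1_count_call and counter_lock are made up, and a real RTOS has its own mutex API.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts calls from all threads and returns the new total. */
uint32_t function1_count_call(void)
{
    static uint32_t call_count = 0;  /* one copy shared by every thread */
    uint32_t snapshot;

    pthread_mutex_lock(&counter_lock);
    call_count++;                    /* read-modify-write done under the lock */
    snapshot = call_count;           /* private copy, safe to use after unlocking */
    pthread_mutex_unlock(&counter_lock);

    return snapshot;
}
```

The printf is deliberately left out of the locked region: the lock is held only for the increment and the copy, so other threads are blocked for as short a time as possible.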

Case 2: Each Task Require One ‘global_variable’

This case assumes each task has its own copy of ‘global_variable’. E.g. there are 2 tasks in the system and each one has its own copy of ‘global_variable’.

In this scenario, we need to ‘duplicate’ the variable for each task. Since the ‘static’ keyword only gives us a single copy of ‘global_variable’, we can no longer use it. Instead, we have to create a task-dependent global variable for each task and pass it into the function:

uint8_t Task1_global_variable=0;
uint8_t Task2_global_variable=0;

void function1(uint8_t *global_variable){
    uint8_t variable=0;

    (*global_variable)++;  /* updates only the caller's own copy */
    variable++;
    printf("global_variable:%d, variable:%d\n", *global_variable, variable);
}

void Task1Loop(void){
    function1(&Task1_global_variable);
}

void Task2Loop(void){
    function1(&Task2_global_variable);
}

Example Code 3

If Task1_global_variable is also used elsewhere in the code, we still need to ensure atomic operation by using a mutex.


From the above examples, we can see how ‘static’ code changes when it is moved to an RTOS environment. In general, RTOS code is written in a different form compared to superloop code. One has to apply the same multi-thread programming methodology used on Linux in an RTOS, particularly for resource sharing and concurrency issues.