Code and Life

Programming, electronics and other cool tech stuff

Supported by

Supported by Picotech

Simple FAT and SD Tutorial Part 2

In the last week’s part 1 of my FAT and SD tutorial, we got as far as reading the file entries in root directory, and peeking into a file with hex editor. Now we’ll cover the file allocation table itself to enable reading longer files, and adapt the code into a small footprint FAT16 library!

File allocation table in FAT16

In the previous part, we learned that the data on a FAT-formatted disk is stored in clusters. In our test image, the cluster size was 32 sectors, i.e. 16 kiB (16 384 bytes). Let’s imagine a newly formatted disk with 100 free clusters, with the clusters numbered from 2 (the first cluster at the beginning of data area) to 101 (the very last cluster). Now let’s copy some files there:

File operation File allocation
Copy HAMLET.TXT (193 082 bytes) to disk Clusters 2-13 allocated for the file (196 608 bytes)
Copy README.TXT (353 bytes) to disk Cluster 14 allocated for the file
Create SUBDIR on disk Cluster 15 allocated for the directory file entries
Create 1.TXT in SUBDIR Cluster 16 allocated for the file, a new file entry written to SUBDIR (beginning of cluster 15)

For now, each file occupies consecutive clusters, but what if we now remove README.TXT? Surely it’s too slow to move all files that were after README.TXT one cluster left (consider the case where we’d have hundreds of megabytes of data after that file) – it’s easier to just remove README.TXT from root directory entry, and mark the cluster 14 as free again.

But what if we now copy a second HAMLET2.TXT to the disk? It would be a waste of space to just continue from cluster 17, leaving the free cluster 14 unused. Instead of just allocating clusters 17-28 for HAMLET2.TXT, the operating system can actually reuse cluster 14, allocating clusters 14 + 17-27 for HAMLET2.TXT.

Obviously, after N file operations, the files can be spread all over the clusters, and if we allow appending to existing files, later parts of a file may even be stored on clusters that are physically before the earlier parts. We obviously need a map of some kind to keep track of the clusters occupied a single file. And that’s exactly what a file allocation table is – a map with one slot for each of the data clusters, each slot telling where to go next when reading a file. In FAT16, each slot takes 2 bytes (i.e. one word), and the first two slots (#0 and #1) are reserved for disk type and FAT type data – in case of FAT16 on a SD card or hard drive, these four bytes will always be F8 FF FF FF. So for a 100-cluster disk, the FAT would be 102 words (204 bytes) long. If a file starts at cluster 2, fat[2] would tell the next cluster, e.g. 3, fat[3] would in turn contain the next cluster and so on, forming a linked list.

If that sounds simple enough in theory, it’s also that in practice. Here’s the FAT from our test.img offset 0x10600:

In the above screen capture, fat[2] is marked on red (note that because we’re talking words, fat[0] is the first two bytes, fat[1] the next two, and so on). Least significant byte is first, so the value is 0x0003, i.e. sector 3. What’s after sector 3? Just look up fat[3] – it’s 0x0004 (the two bytes right after fat[2]). We can continue this right until fat[0xD] (cluster 13 in decimal), which reads 0xFFFF, meaning that is the last cluster of this file. For files occupying only a single cluster, 0xFFFF is the only FAT entry, as can be seen for all files after our HAMLET.TXT.

Congratulations, if you understood the above, you now understand everything there is to know about FAT16. It’s time to transform it into code form!

Reading whole files using FAT

Let’s say we want to read HAMLET.TXT and write it to a file. From running read_root.exe and our discussion above, we know that it starts from cluster 2, the first cluster in data area (after seeing how cluster numbers are used as indices in FAT, it’s easy to see why the numbering starts from 2 – this way the cluster number works straight as an index in FAT). Let’s read it all in one go:


unsigned char buffer[16384];
unsigned short cluster = 2;

// go to first cluster
fseek(in, data_start + (cluster-2) * cluster_size, SEEK_SET);
fread(buffer, 1, sizeof(buffer), in);
// do something with data

// go to fat[cluster]
fseek(in, fat_start + cluster * 2, SEEK_SET);
fread(&cluster, 2, 1, in);

if(cluster == 0xFFFF)
  return;

fseek(in, data_start + (cluster-2) * cluster_size, SEEK_SET);
fread(buffer, 1, sizeof(buffer), in);

...

The code above is obviously missing some important details, but I trust you get the basic idea:

  • Go to data cluster start
  • Read cluster contents
  • Go to FAT entry for this cluster
  • Read the number of next cluster
  • If number != 0xFFFF, go to the start of next data cluster…

Once we add an actual loop for doing the above, and adjust the buffer size to a slightly smaller value to handle also 4 kB and 8 kB sectors gracefully, we get the following routine. Note that I’m passing values prepared in main method to this function in order to minimize distractions.


void fat_read_file(FILE * in, FILE * out,
                   unsigned long fat_start, 
                   unsigned long data_start, 
                   unsigned long cluster_size, 
                   unsigned short cluster, 
                   unsigned long file_size) {
    unsigned char buffer[4096];
    size_t bytes_read, bytes_to_read,
           file_left = file_size, cluster_left = cluster_size;

    // Go to first data cluster
    fseek(in, data_start + cluster_size * (cluster-2), SEEK_SET);
    
    // Read until we run out of file or clusters
    while(file_left > 0 && cluster != 0xFFFF) {
        bytes_to_read = sizeof(buffer);
        
        // don't read past the file or cluster end
        if(bytes_to_read > file_left)
            bytes_to_read = file_left;
        if(bytes_to_read > cluster_left)
            bytes_to_read = cluster_left;
        
        // read data from cluster, write to file
        bytes_read = fread(buffer, 1, bytes_to_read, in);
        fwrite(buffer, 1, bytes_read, out);
        printf("Copied %d bytes\", bytes_read);
        
        // decrease byte counters for current cluster and whole file
        cluster_left -= bytes_read;
        file_left -= bytes_read;

        // if we have read the whole cluster, read next cluster # from FAT        
        if(cluster_left == 0) {
            fseek(in, fat_start + cluster*2, SEEK_SET);
            fread(&cluster, 2, 1, in);
            
            printf("End of cluster reached, next cluster %d\", cluster);
            
            fseek(in, data_start + cluster_size * (cluster-2), SEEK_SET);
            cluster_left = cluster_size; // reset cluster byte counter
        }
    }
}

Take your time in understanding the above code. Basically, it’s a loop that reads data until the whole file is consumed, writing everything to output file. Whenever all bytes in a cluster have been consumed (cluster_left counter goes to zero), a FAT lookup is made to go to next cluster. Note that for valid file entries, the condition cluster==0xFFFF should never be reached, as file size should always run out before that (at best, they run out at the same time so a FAT lookup is made but the condition is still never checked before file_left == 0).

Armed with this function, I modified the read_root.c a bit to take the filesystem image and the file to be read as command-line parameters (note that the filename is case sensitive, so only HAMLET.TXT will work, not hamlet.txt or Hamlet.txt). I saved the result as read_file.c. Simply compile that and try it out:

You should also try reading the HAMLET.TXT. You can download the UTF-8 Hamlet from Project Gutenberg and use fc HAMLET.TXT pg1524.txt see if the extracted file matches – mine did :)

Abstracting it to a FAT16 library

Now that we are successfully reading the file system image on a PC, it’s time to start thinking how this knowledge can be ported to 8-bit AVR side. First issue with the current implementation is the memory footprint: AVR chips can have as little as 128 bytes of SRAM, so a 4096 byte read buffer or even reading the whole boot sector into memory is out of the question. Second thing is that the I/O that will be completely different with an actual SD card.

To deal with the memory issue, I decided to set a little less strict limits than Petit FatFS (44 bytes + some stack) to avoid excess code optimization for size, but still stay under 64 bytes. For reading data, I decided to use a 32 byte buffer, which is enough to store a full Fat16 file entry in one go. For state variables, I chose the following 19 byte structure that is enough to navigate around the file system and read files after initialization, resulting in memory footprint of 51 bytes and some stack:


// State data required by FAT16 library
typedef struct {
   unsigned long fat_start; // FAT start position
   unsigned long data_start; // data start position
   unsigned char sectors_per_cluster; // cluster size in sectors
   unsigned short cluster; // current cluster being read
   unsigned long cluster_left; // bytes left in current cluster
   unsigned long file_left; // bytes left in the file being read
} __attribute((packed)) Fat16State;

#define FAT16_BUFFER_SIZE 32

// Global variables for read data and library state
extern unsigned char fat16_buffer[FAT16_BUFFER_SIZE];
extern Fat16State fat16_state;

Note that these are basically the essential variables used in fat_read_file() routine above. To abstract out the I/O access, the FAT16 library relies on two methods that need to be provided by the user: fat16_seek(offset) and fat16_read(bytes). The first is used to go to a given offset, and the next for reading a given amount of bytes to fat16_buffer. It is assumed that these routines increment and remember the file position like normal C file streams do, so for a SD implementation, additional “current offset” variable would be needed. But while on PC, the following implementations are adequate:


FILE * in;

void fat16_seek(unsigned long offset) {
    fseek(in, offset, SEEK_SET);
}

char fat16_read(unsigned char bytes) {
    return (char)fread(fat16_buffer, 1, bytes, in);
}

And of course the in file handle needs to be initialized with fopen() before calling any functions in the FAT library. I also commented out the unused parts of Fat16BootSector so the relevant parts of it can be read in one go to the buffer (resulting structure is named Fat16BootSectorFragment). The code in read_file.c has been refactored in the following manner:

  • Structure definitions, variable and function declarations moved to a separate header file, fat16.h
  • fseek() calls replaced with fat16_seek() calls
  • fread() replaced with fat16_read()
  • Separate data structures for partitions, boot sector and file entries (pt bs, entry) replaced with macros that actually point to data currently in fat16_buffer
  • All printouts surrounded in #ifdef DEBUG statements so they can be easily omitted
  • Comparisons with memcmp() replaced with for loops
  • Initialization code becomes fat16_init(), ends with file pointer at beginning of root directory
  • Reading of file entries becomes fat16_open_file(filename,ext), code is modified so it can also scan subdirectories (they are similar to root directory, but stored like normal files that can span several clusters – it actually uses the fat16_read_file() internally)
  • Reading a single file becomes fat16_read_file(bytes) – one invocation can read 32 bytes at most, and the result is always stored in fat16_buffer
  • fat16_init() and fat16_open_file() return zero on success and nonzero error codes, fat16_read_file() returns the amount of bytes read – zero indicates end of file
  • With buffer size of 32 bytes, no method should ever invoke fat16_read(bytes) in a manner that would cross sector boundaries (useful for SD implementation later on)

With the above changes, using the FAT16 routines becomes easy: Just include fat16.h, provide the two I/O routines and link against compiled fat16.c. Here’s the main function from library test module, test_lib.c:


int main(int argc, char *argv[]) {
    FILE *out = fopen("HAMLET.TXT", "wb");
    char bytes_read;

    // Open disk image so fat16_seek and fat16_read work
    in = fopen("test.img", "rb");
    
    // Use FAT16 library
    fat16_init();
    fat16_open_file("HAMLET  ", "TXT");
        
    while(fat16_state.file_left) {
        bytes_read = fat16_read_file(FAT16_BUFFER_SIZE);
        fwrite(fat16_buffer, 1, bytes_read, out); // Write read bytes
    }
    
    // Close file handles
    fclose(out);
    fclose(in);
    
    return 0;
}

In addition to the fat16_seek(offset) and fat16_read(bytes) shown above, this is really all that is needed to use the library. Also, if you read the above notes carefully, you’ll notice that subdirectories can be accessed. To open SUBDIR\1.TXT, just replace the fat16_open_file() statement in above code with the following – open_file basically just positions the read pointer to the beginning of a given file, and subdirectories are essentially just files with file entries as their contents:

fat16_open_file("SUBDIR ", " "); fat16_open_file("1 ", "TXT");

It’s important to remember that filenames are case-sensitive and need to be padded with spaces to 8+3 characters, otherwise the library won’t find your files. Furthermore, using mixed case letters when creating file will create dummy file entries that are used to store “long filenames” for the files, so “Hamlet.txt” becomes some dummy entries and the actual file might be called HAMLET~1.TXT (I’m sure that will bring back memories for many of you :).

Closing words

We’ve now created a rather nice, although read-only library for reading FAT16 file system. I recommend that you now grab the updated project zip and spend some time going through the fat16.h, fat16.c, and test_lib.c in order to understand how the library works and how it is used (I included a makefile so you can just use “make” to compile test_lib.exe. That is the only way you’ll truly benefit from this (after all, if all you are looking for are ready-made routines, there are more functionally complete libraries available). As a bonus, you’ll then be able to make changes if you want to achieve something not covered in this tutorial! Here are some examples that you could try as exercises:

  • Add a new function fat16_eof() that will return 1 if file has been completely read
  • Instead of fat16_open_file(), make a method fat16_next_file() that advances file entries one by one, allowing you to for example print out the file structure and navigate directories
  • Add fat16_skip_bytes() method to enable “forward seeking” in a file
  • Open the image file read/write and make fat16_write() user provided function and fat16_write_file() that can overwrite bits of currently read file

In the next parts, we’ll start interfacing with the SD card and port the libraries to ATtiny2313. Until then, take care!

Proceed to the next part of this tutorial

16 comments

eddy:

can’t wait to see the next tutorial with uc …..

Saravanan:

how can we know where cluster 0 starts(sector)?

Joonas Pihlajamaa:

It begins from the sector after file allocation tables, i.e. cluster 0 = start of partition data. At least if my memory serves me correctly, it’s been a while since I wrote this.

Neeraj Deshpande:

Where can I get fat32.c and fat32.h?

These files are not part of any tutorial zips.

Neeraj

Joonas Pihlajamaa:

Wow, it seems I have at some point started talking about FAT32 when I meant FAT16. It’s just a set of typos, everything here should be about FAT16. FAT32 is a (slightly) more complex topic I haven’t tackled, and is left as an excercise to to the reader, too. :)

josh:

hi is there a way i can read boot from the sd without the .img file?

Joonas Pihlajamaa:

I’m not sure I understand, but starting HxD with Administrator rights enables you to read the SD directly, thus getting to boot sector without creating an .img first.

josh:

Hi mate thank you for your reply just wants to say that your tutorial is amazing.Ahh I see. Ya I can read the files from the SD directly, without the .img file. Am stuck with that point now I don’t want to use the .img file any directions someone can pleas point me to ?

Joonas Pihlajamaa:

I am not sure what you are trying to accomplish? Reading the boot sector with PC? Looking at the boot sector with hex editor? Reading the SD with embedded device like AVR?

The tutorial is structured so that the first two parts introduce the FAT16 file system structure and code examples read an .img file. Later parts of the tutorial then use that same code on AVR to read files from SD card directly.

josh:

Hi joonas I went through all ur tutorial I am impressed with it. Yes I am implementing the code on an embedded device, and I am reading from files from the SD card perfectly fine such as if(ret = fat16_open_file(“TEST “, “TXT”)) that works I can read the file and see what’s inside it and that is not in the .img file, so basically all I want to do is now is to know what files are inside the sd card then read them such as test1 test2 test3 and use them for my project but I don’t want to use the .img file more or less what ur doing with read_root.c but without .img or even more to know if there are mor then one .img files (example test1.img and test2.img ) I have a display on the project so I can display the text on my LCD (I can display all the files on the LCD so the user chooses something) do you know what I mean sir ?
Thank you very much for all your time

james:

how would you write to a Fat16?

Joonas Pihlajamaa:

That is a more complex matter. I have the functionality in the works and if I manage to do it, I’ll write about it, but your current option is to check out some more extensive FAT/SD libraries to see how they do it.

blake:

Really great tutorial thank you very much,
there’s a thing that I don’t really understand;
why in read_file.c, at the part where the next cluster number is accessed in the FAT it is:
fseek(in, fat_start + cluster * 2, SEEK_SET);
instead of :
fseek(in, fat_start + cluster + 2, SEEK_SET); ?
Why is the offset is doubled ?

Joonas Pihlajamaa:

I’ve mostly forgotten about this code, but probably it is because each item is 16-bit number, i.e. two bytes long…

blake:

Yes that’s it, thank you

Bryan Rodríguez:

I am glad i found this, i am actually implementing FAT32 for a college project and this information has been great for me.
Thank you!