@@ -27,16 +27,17 @@ cheap, and can be very granular. For NOR flash specifically, byte-level
programs are quite common. Erasing, however, requires an expensive operation
that forces the state of large blocks of memory to reset in a destructive
reaction that gives flash its name. The [Wikipedia entry](https://en.wikipedia.org/wiki/Flash_memory)
- has more information if you are interesting in how this works.
+ has more information if you are interested in how this works.
This leaves us with an interesting set of limitations that can be simplified
to three strong requirements:
1. **Power-loss resilient** - This is the main goal of the littlefs and the
- focus of this project. Embedded systems are usually designed without a
- shutdown routine and a notable lack of user interface for recovery, so
- filesystems targeting embedded systems must be prepared to lose power an
- any given time.
+ focus of this project.
+
+ Embedded systems are usually designed without a shutdown routine and a
+ notable lack of user interface for recovery, so filesystems targeting
+ embedded systems must be prepared to lose power at any given time.
Despite this state of things, there are very few embedded filesystems that
handle power loss in a reasonable manner, and most can become corrupted if
@@ -52,7 +53,8 @@ to three strong requirements:
which stores a file allocation table (FAT) at a specific offset from the
beginning of disk. Every block allocation will update this table, and after
100,000 updates, the block will likely go bad, rendering the filesystem
- unusable even if there are many more erase cycles available on the storage.
+ unusable even if there are many more erase cycles available on the storage
+ as a whole.
3. **Bounded RAM/ROM** - Even with the design difficulties presented by the
previous two limitations, we have already seen several flash filesystems
@@ -72,29 +74,29 @@ to three strong requirements:
## Existing designs?
- There are of course, many different existing filesystem. Heres a very rough
+ There are of course, many different existing filesystem. Here is a very rough
summary of the general ideas behind some of them.
Most of the existing filesystems fall into one big category: filesystems
designed in the early days of spinny magnet disks. While there is a vast amount
of interesting technology and ideas in this area, the nature of spinny magnet
disks encourages properties, such as grouping writes near each other, that don't
make as much sense on recent storage types. For instance, on flash, write
- locality is not important and can actually increase wear destructively.
+ locality is not important and can actually increase wear.
One of the most popular designs for flash filesystems is called the
[logging filesystem](https://en.wikipedia.org/wiki/Log-structured_file_system).
The flash filesystems [jffs](https://en.wikipedia.org/wiki/JFFS)
- and [yaffs](https://en.wikipedia.org/wiki/YAFFS) are good examples. In
- logging filesystem, data is not store in a data structure on disk, but instead
+ and [yaffs](https://en.wikipedia.org/wiki/YAFFS) are good examples. In a
+ logging filesystem, data is not stored in a data structure on disk, but instead
the changes to the files are stored on disk. This has several neat advantages,
- such as the fact that the data is written in a cyclic log format naturally
+ such as the fact that the data is written in a cyclic log format and naturally
wear levels as a side effect. And, with a bit of error detection, the entire
filesystem can easily be designed to be resilient to power loss. The
- journalling component of most modern day filesystems is actually a reduced
+ journaling component of most modern day filesystems is actually a reduced
form of a logging filesystem. However, logging filesystems have difficulty
scaling as the size of storage increases. And most filesystems compensate by
- caching large parts of the filesystem in RAM, a strategy that is unavailable
+ caching large parts of the filesystem in RAM, a strategy that is inappropriate
for embedded systems.
Another interesting filesystem design technique is that of [copy-on-write (COW)](https://en.wikipedia.org/wiki/Copy-on-write).
@@ -107,14 +109,14 @@ where the COW data structures are synchronized.
## Metadata pairs
The core piece of technology that provides the backbone for the littlefs is
- the concept of metadata pairs. The key idea here, is that any metadata that
+ the concept of metadata pairs. The key idea here is that any metadata that
needs to be updated atomically is stored on a pair of blocks tagged with
a revision count and checksum. Every update alternates between these two
pairs, so that at any time there is always a backup containing the previous
state of the metadata.
Consider a small example where each metadata pair has a revision count,
- a number as data, and the xor of the block as a quick checksum. If
+ a number as data, and the XOR of the block as a quick checksum. If
we update the data to a value of 9, and then to a value of 5, here is
what the pair of blocks may look like after each update:
```
@@ -130,7 +132,7 @@ what the pair of blocks may look like after each update:
After each update, we can find the most up-to-date value of data by looking
at the revision count.
- Now consider what the blocks may look like if we suddenly loss power while
+ Now consider what the blocks may look like if we suddenly lose power while
changing the value of data to 5:
```
block 1 block 2 block 1 block 2 block 1 block 2
@@ -149,7 +151,7 @@ check our checksum we notice that block 1 was corrupted. So we fall back to
block 2 and use the value 9.
Using this concept, the littlefs is able to update metadata blocks atomically.
- There are a few other tweaks, such as using a 32 bit crc and using sequence
+ There are a few other tweaks, such as using a 32 bit CRC and using sequence
arithmetic to handle revision count overflow, but the basic concept
is the same. These metadata pairs define the backbone of the littlefs, and the
rest of the filesystem is built on top of these atomic updates.
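To make the mechanics concrete, here is a small C sketch of how a reader could pick the active half of a metadata pair. The struct layout, the XOR checksum, and the helper names are simplified inventions for this example; as noted above, the real littlefs uses a 32-bit CRC and stores full directory entries in these blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Simplified on-disk layout for this example only: a revision count,
// one word of data, and the XOR of the two as a quick checksum.
struct metadata_block {
    uint32_t rev;
    uint32_t data;
    uint32_t check;
};

static bool block_valid(const struct metadata_block *b) {
    return b->check == (b->rev ^ b->data);
}

// Sequence arithmetic: "a is newer than b" even if the counter wrapped.
static bool rev_newer(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) > 0;
}

// Pick which half of the pair holds the current metadata. A block whose
// write was interrupted by power loss fails its checksum, so we fall back
// to the other block; if both are valid, the newer revision wins.
static const struct metadata_block *
active_block(const struct metadata_block *b0, const struct metadata_block *b1) {
    bool v0 = block_valid(b0);
    bool v1 = block_valid(b1);
    if (v0 && v1) {
        return rev_newer(b1->rev, b0->rev) ? b1 : b0;
    }
    if (v0) return b0;
    if (v1) return b1;
    return NULL;  // both halves corrupt: an unformatted or damaged pair
}
```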
@@ -161,7 +163,7 @@ requires two blocks for each block of data. I'm sure users would be very
unhappy if their storage was suddenly cut in half! Instead of storing
everything in these metadata blocks, the littlefs uses a COW data structure
for files which is in turn pointed to by a metadata block. When
- we update a file, we create a copies of any blocks that are modified until
+ we update a file, we create copies of any blocks that are modified until
the metadata blocks are updated with the new copy. Once the metadata block
points to the new copy, we deallocate the old blocks that are no longer in use.
@@ -184,7 +186,7 @@ Here is what updating a one-block file may look like:
update data in file update metadata pair
```
- It doesn't matter if we lose power while writing block 5 with the new data,
+ It doesn't matter if we lose power while writing new data to block 5,
since the old data remains unmodified in block 4. This example also
highlights how the atomic updates of the metadata blocks provide a
synchronization barrier for the rest of the littlefs.
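Sketched in C, the update sequence for a one-block file could look like the following. The helper names (alloc_block, write_block, commit_metadata_pair, free_block) are placeholders rather than littlefs's real API; the point is that the only in-place update is the atomic metadata-pair commit.

```c
#include <stdint.h>

// Hypothetical helpers standing in for the real storage layer.
extern uint32_t alloc_block(void);
extern void     write_block(uint32_t block, const void *data, uint32_t size);
extern void     commit_metadata_pair(uint32_t file_head);  // atomic, see above
extern void     free_block(uint32_t block);

// Copy-on-write update of a one-block file.
static void cow_update(uint32_t old_block, const void *data, uint32_t size) {
    uint32_t new_block = alloc_block();
    write_block(new_block, data, size);  // old data still intact in old_block
    commit_metadata_pair(new_block);     // power loss before this: old copy wins
    free_block(old_block);               // from here on, the new copy wins
}
```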
@@ -206,7 +208,7 @@ files in filesystems. Of these, the littlefs uses a rather unique [COW](https://
data structure that allows the filesystem to reuse unmodified parts of the
file without additional metadata pairs.
- First lets consider storing files in a simple linked-list. What happens when
+ First lets consider storing files in a simple linked-list. What happens when we
append a block? We have to change the last block in the linked-list to point
to this new block, which means we have to copy out the last block, and change
the second-to-last block, and then the third-to-last, and so on until we've
@@ -240,8 +242,8 @@ Exhibit B: A backwards linked-list
```
However, a backwards linked-list does come with a rather glaring problem.
- Iterating over a file _in order_ has a runtime of O(n^2). Gah! A quadratic
- runtime to just _read_ a file? That's awful. Keep in mind reading files are
+ Iterating over a file _in order_ has a runtime cost of O(n^2). Gah! A quadratic
+ runtime to just _read_ a file? That's awful. Keep in mind reading files is
usually the most common filesystem operation.
To avoid this problem, the littlefs uses a multilayered linked-list. For
@@ -266,7 +268,7 @@ Exhibit C: A backwards CTZ skip-list
```
The additional pointers allow us to navigate the data-structure on disk
- much more efficiently than in a single linked-list.
+ much more efficiently than in a singly linked-list.
Taking exhibit C for example, here is the path from data block 5 to data
block 1. You can see how data block 3 was completely skipped:
@@ -289,15 +291,15 @@ The path to data block 0 is even more quick, requiring only two jumps:
We can find the runtime complexity by looking at the path to any block from
the block containing the most pointers. Every step along the path divides
- the search space for the block in half. This gives us a runtime of O(logn).
+ the search space for the block in half. This gives us a runtime of O(log n).
To get to the block with the most pointers, we can perform the same steps
- backwards, which puts the runtime at O(2logn) = O(logn). The interesting
+ backwards, which puts the runtime at O(2 log n) = O(log n). The interesting
part about this data structure is that this optimal path occurs naturally
if we greedily choose the pointer that covers the most distance without passing
our target block.
So now we have a representation of files that can be appended trivially with
- a runtime of O(1), and can be read with a worst case runtime of O(nlogn).
+ a runtime of O(1), and can be read with a worst case runtime of O(n log n).
Given that the runtime is also divided by the amount of data we can store
in a block, this is pretty reasonable.
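The greedy pointer choice is simple to express in code. The following is only a conceptual sketch: it assumes block n holds ctz(n)+1 back-pointers, with pointer s reaching back to block n - 2^s, and plain arithmetic stands in for the pointer reads a real filesystem would perform from disk.

```c
#include <stdint.h>

// Walk a backwards CTZ skip-list from a later block toward an earlier one.
// Each hop follows the largest pointer that does not pass the target, so
// the remaining distance roughly halves every step.
static uint32_t ctz_seek(uint32_t current, uint32_t target) {
    while (current > target) {
        // Block `current` holds pointers s = 0 .. ctz(current), where
        // pointer s leads to block current - 2^s.
        uint32_t max_skip = (uint32_t)__builtin_ctz(current);
        uint32_t skip = 0;
        while (skip < max_skip && current - (2u << skip) >= target) {
            skip++;
        }
        // In the real filesystem this is one pointer read from the block.
        current -= 1u << skip;
    }
    return current;
}
```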
@@ -362,7 +364,7 @@ N = file size in bytes
And this works quite well, but is not trivial to calculate. This equation
requires O(n) to compute, which brings the entire runtime of reading a file
- to O(n^2logn). Fortunately, the additional O(n) does not need to touch disk,
+ to O(n^2 log n). Fortunately, the additional O(n) does not need to touch disk,
so it is not completely unreasonable. But if we could solve this equation into
a form that is easily computable, we can avoid a big slowdown.
@@ -379,11 +381,11 @@ unintuitive property:
![mindblown](https://latex.codecogs.com/svg.latex?%5Csum_i%5En%5Cleft%28%5Ctext%7Bctz%7D%28i%29&plus;1%5Cright%29%20%3D%202n-%5Ctext%7Bpopcount%7D%28n%29)
where:
- ctz(i) = the number of trailing bits that are 0 in i
- popcount(i) = the number of bits that are 1 in i
+ ctz(x) = the number of trailing bits that are 0 in x
+ popcount(x) = the number of bits that are 1 in x
It's a bit bewildering that these two seemingly unrelated bitwise instructions
- are related by this property. But if we start to disect this equation we can
+ are related by this property. But if we start to dissect this equation we can
see that it does hold. As n approaches infinity, we do end up with an average
overhead of 2 pointers as we found earlier. And popcount seems to handle the
error from this average as it accumulates in the CTZ skip-list.
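If the identity seems too convenient, it is easy to check numerically. This throwaway snippet (not part of littlefs) uses the GCC/Clang builtins for ctz and popcount:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Check that sum_{i=1..n} (ctz(i) + 1) == 2n - popcount(n) for small n.
int main(void) {
    uint64_t sum = 0;
    for (uint32_t n = 1; n <= 1000000; n++) {
        sum += (uint32_t)__builtin_ctz(n) + 1;
        assert(sum == 2ull * n - (uint64_t)__builtin_popcount(n));
    }
    printf("identity holds for n up to 1000000\n");
    return 0;
}
```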
@@ -410,8 +412,7 @@ a bit to avoid integer overflow:
![formulaforoff](https://latex.codecogs.com/svg.latex?%5Cmathit%7Boff%7D%20%3D%20N%20-%20%5Cleft%28B-2%5Cfrac%7Bw%7D%7B8%7D%5Cright%29n%20-%20%5Cfrac%7Bw%7D%7B8%7D%5Ctext%7Bpopcount%7D%28n%29)
The solution involves quite a bit of math, but computers are very good at math.
- We can now solve for the block index + offset while only needed to store the
- file size in O(1).
+ Now we can solve for both the block index and offset from the file size in O(1).
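As an illustration only (this is not littlefs's actual O(1) routine), the off equation above can be read as saying that block n begins at file offset (B - 2w/8)n + (w/8)popcount(n). The sketch below applies that reading with example values B = 512 and w = 32, and finds the block index with a short correction loop rather than the closed form:

```c
#include <stdint.h>

#define BLOCK_SIZE 512u  // B, example value
#define PTR_SIZE   4u    // w/8 bytes for 32-bit pointers

// File offset at which data block n of the CTZ skip-list begins,
// straight from the off equation: off = N - start(n).
static uint32_t ctz_block_start(uint32_t n) {
    return (BLOCK_SIZE - 2u * PTR_SIZE) * n
         + PTR_SIZE * (uint32_t)__builtin_popcount(n);
}

// Map a byte position to (block index, offset in block). littlefs solves
// this in O(1); the small correction loop here keeps the sketch obvious.
static uint32_t ctz_index(uint32_t pos, uint32_t *off) {
    uint32_t n = pos / (BLOCK_SIZE - 2u * PTR_SIZE);  // never below the true index
    while (ctz_block_start(n) > pos) {
        n--;  // at most a few steps, bounded by the popcount term
    }
    *off = pos - ctz_block_start(n);
    return n;
}
```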
Here is what it might look like to update a file stored with a CTZ skip-list:
```
@@ -500,16 +501,17 @@ scanned to find the most recent free list, but once the list was found the
state of all free blocks becomes known.
However, this approach had several issues:
+
- There was a lot of nuanced logic for adding blocks to the free list without
modifying the blocks, since the blocks remain active until the metadata is
updated.
- - The free list had to support both additions and removals in fifo order while
+ - The free list had to support both additions and removals in FIFO order while
minimizing block erases.
- The free list had to handle the case where the file system completely ran
out of blocks and may no longer be able to add blocks to the free list.
- If we used a revision count to track the most recently updated free list,
metadata blocks that were left unmodified were ticking time bombs that would
- cause the system to go haywire if the revision count overflowed
+ cause the system to go haywire if the revision count overflowed.
- Every single metadata block wasted space to store these free list references.
Actually, to simplify, this approach had one massive glaring issue: complexity.
@@ -539,7 +541,7 @@ would have an abhorrent runtime.
So the littlefs compromises. It doesn't store a bitmap the size of the storage,
but it does store a little bit-vector that contains a fixed set lookahead
for block allocations. During a block allocation, the lookahead vector is
- checked for any free blocks, if there are none, the lookahead region jumps
+ checked for any free blocks. If there are none, the lookahead region jumps
forward and the entire filesystem is scanned for free blocks.
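A minimal sketch of that lookahead scheme is shown below. The names (TOTAL_BLOCKS, fs_traverse, and so on) are placeholders invented for this example, not littlefs's API; the bit-vector only ever covers a fixed window of blocks, and a full filesystem scan happens only when the window is exhausted:

```c
#include <stdint.h>
#include <string.h>

#define TOTAL_BLOCKS 1024u   // placeholder device size, in blocks
#define LOOKAHEAD    128u    // blocks covered by the bit-vector

static uint8_t  lookahead[LOOKAHEAD / 8];  // 1 bit per block, 1 = in use
static uint32_t window_start;              // first block covered by the window

// Placeholder: walks every metadata pair and CTZ skip-list, reporting each
// block the filesystem currently references. Assumed to have filled the
// initial window at mount time as well.
extern void fs_traverse(void (*mark)(uint32_t block));

static void mark_used(uint32_t block) {
    uint32_t rel = (block + TOTAL_BLOCKS - window_start) % TOTAL_BLOCKS;
    if (rel < LOOKAHEAD) {
        lookahead[rel / 8] |= 1u << (rel % 8);
    }
}

// Returns a free block, or UINT32_MAX once every block has been checked.
static uint32_t alloc_block(void) {
    for (uint32_t scanned = 0; scanned <= TOTAL_BLOCKS; scanned += LOOKAHEAD) {
        for (uint32_t rel = 0; rel < LOOKAHEAD; rel++) {
            if (!(lookahead[rel / 8] & (1u << (rel % 8)))) {
                lookahead[rel / 8] |= 1u << (rel % 8);  // now in use
                return (window_start + rel) % TOTAL_BLOCKS;
            }
        }
        // Window exhausted: slide it forward and rescan the filesystem.
        window_start = (window_start + LOOKAHEAD) % TOTAL_BLOCKS;
        memset(lookahead, 0, sizeof(lookahead));
        fs_traverse(mark_used);
    }
    return UINT32_MAX;  // storage is completely full
}
```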
Here's what it might look like to allocate 4 blocks on a decently busy
@@ -622,7 +624,7 @@ So, as a solution, the littlefs adopted a sort of threaded tree. Each
directory not only contains pointers to all of its children, but also a
pointer to the next directory. These pointers create a linked-list that
is threaded through all of the directories in the filesystem. Since we
- only use this linked list to check for existance, the order doesn't actually
+ only use this linked list to check for existence, the order doesn't actually
matter. As an added plus, we can repurpose the pointer for the individual
directory linked-lists and avoid using any additional space.
@@ -773,7 +775,7 @@ deorphan step that simply iterates through every directory in the linked-list
and checks it against every directory entry in the filesystem to see if it
has a parent. The deorphan step occurs on the first block allocation after
boot, so orphans should never cause the littlefs to run out of storage
- prematurely. Note that the deorphan step never needs to run in a readonly
+ prematurely. Note that the deorphan step never needs to run in a read-only
filesystem.
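A sketch of that deorphan pass, using placeholder helpers invented for illustration (dir_first, dir_next, and friends are not littlefs's real functions); the quadratic scan mirrors the description above and runs at most once per boot:

```c
#include <stdbool.h>
#include <stdint.h>

// Placeholder helpers over the threaded directory linked-list.
extern uint32_t dir_first(void);                     // root directory block
extern uint32_t dir_next(uint32_t dir);              // 0 at the end of the thread
extern bool     dir_has_parent_entry(uint32_t dir);  // scan all directory entries
extern void     dir_remove(uint32_t dir);            // reclaim an orphan's blocks

static void deorphan(void) {
    // Skip the root: it is the only directory allowed to have no parent.
    for (uint32_t d = dir_next(dir_first()); d != 0; d = dir_next(d)) {
        if (!dir_has_parent_entry(d)) {
            dir_remove(d);  // orphaned by a power loss mid-update
        }
    }
}
```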
## The move problem
@@ -883,7 +885,7 @@ a power loss will occur during filesystem activity. We still need to handle
the condition, but runtime during a power loss takes a back seat to the runtime
during normal operations.
- So what littlefs does is unelegantly simple. When littlefs moves a file, it
+ So what littlefs does is inelegantly simple. When littlefs moves a file, it
marks the file as "moving". This is stored as a single bit in the directory
entry and doesn't take up much space. Then littlefs moves the directory,
finishing with the complete remove of the "moving" directory entry.
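The recovery rule that falls out of this is small enough to sketch directly. The helpers below are hypothetical stand-ins, not littlefs's API: a "moving" entry discovered after a power loss is removed if its target already exists elsewhere in the filesystem, and otherwise simply has its mark cleared:

```c
#include <stdbool.h>

struct dir_entry;  // opaque here; carries a single "moving" marker bit

// Hypothetical helpers for illustration only.
extern bool entry_is_moving(const struct dir_entry *e);
extern bool entry_exists_elsewhere(const struct dir_entry *e);
extern void entry_remove(struct dir_entry *e);
extern void entry_clear_moving(struct dir_entry *e);

static void recover_moving_entry(struct dir_entry *e) {
    if (!entry_is_moving(e)) {
        return;
    }
    if (entry_exists_elsewhere(e)) {
        entry_remove(e);        // the move completed; drop the stale source
    } else {
        entry_clear_moving(e);  // power died before the destination was written
    }
}
```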
@@ -979,7 +981,7 @@ if it exists elsewhere in the filesystem.
So now that we have all of the pieces of a filesystem, we can look at a more
subtle attribute of embedded storage: The wear down of flash blocks.
- The first concern for the littlefs, is that prefectly valid blocks can suddenly
+ The first concern for the littlefs, is that perfectly valid blocks can suddenly
become unusable. As a nice side-effect of using a COW data-structure for files,
we can simply move on to a different block when a file write fails. All
modifications to files are performed in copies, so we will only replace the
@@ -1151,7 +1153,7 @@ develops errors and needs to be moved.
## Wear leveling
- The second concern for the littlefs, is that blocks in the filesystem may wear
+ The second concern for the littlefs is that blocks in the filesystem may wear
unevenly. In this situation, a filesystem may meet an early demise where
there are no more non-corrupted blocks that aren't in use. It's common to
have files that were written once and left unmodified, wasting the potential
@@ -1171,7 +1173,7 @@ of wear leveling:
In littlefs's case, it's possible to use the revision count on metadata pairs
to approximate the wear of a metadata block. And combined with the COW nature
- of files, littlefs could provide your usually implementation of dynamic wear
+ of files, littlefs could provide your usual implementation of dynamic wear
leveling.
However, the littlefs does not. This is for a few reasons. Most notably, even
@@ -1210,9 +1212,9 @@ So, to summarize:
metadata block is active
4. Directory blocks contain either references to other directories or files
5. Files are represented by copy-on-write CTZ skip-lists which support O(1)
- append and O(nlogn) reading
+ append and O(n log n) reading
6. Blocks are allocated by scanning the filesystem for used blocks in a
- fixed-size lookahead region is that stored in a bit-vector
+ fixed-size lookahead region that is stored in a bit-vector
7. To facilitate scanning the filesystem, all directories are part of a
linked-list that is threaded through the entire filesystem
8. If a block develops an error, the littlefs allocates a new block, and