
Commit d82e34c

Merge pull request #21 from aldot/doc-tweaks

documentation touch up, take 2

2 parents 6d8e0e2 + 436707c

File tree: 3 files changed, +60 -58 lines changed


DESIGN.md

Lines changed: 45 additions & 43 deletions
@@ -27,16 +27,17 @@ cheap, and can be very granular. For NOR flash specifically, byte-level
 programs are quite common. Erasing, however, requires an expensive operation
 that forces the state of large blocks of memory to reset in a destructive
 reaction that gives flash its name. The [Wikipedia entry](https://en.wikipedia.org/wiki/Flash_memory)
-has more information if you are interesting in how this works.
+has more information if you are interested in how this works.

 This leaves us with an interesting set of limitations that can be simplified
 to three strong requirements:

 1. **Power-loss resilient** - This is the main goal of the littlefs and the
-   focus of this project. Embedded systems are usually designed without a
-   shutdown routine and a notable lack of user interface for recovery, so
-   filesystems targeting embedded systems must be prepared to lose power an
-   any given time.
+   focus of this project.
+
+   Embedded systems are usually designed without a shutdown routine and a
+   notable lack of user interface for recovery, so filesystems targeting
+   embedded systems must be prepared to lose power at any given time.

 Despite this state of things, there are very few embedded filesystems that
 handle power loss in a reasonable manner, and most can become corrupted if
@@ -52,7 +53,8 @@ to three strong requirements:
    which stores a file allocation table (FAT) at a specific offset from the
    beginning of disk. Every block allocation will update this table, and after
    100,000 updates, the block will likely go bad, rendering the filesystem
-   unusable even if there are many more erase cycles available on the storage.
+   unusable even if there are many more erase cycles available on the storage
+   as a whole.

 3. **Bounded RAM/ROM** - Even with the design difficulties presented by the
    previous two limitations, we have already seen several flash filesystems
@@ -72,29 +74,29 @@ to three strong requirements:

 ## Existing designs?

-There are of course, many different existing filesystem. Heres a very rough
+There are, of course, many different existing filesystems. Here is a very rough
 summary of the general ideas behind some of them.

 Most of the existing filesystems fall into the one big category of filesystems
 designed in the early days of spinny magnet disks. While there is a vast amount
 of interesting technology and ideas in this area, the nature of spinny magnet
 disks encourages properties, such as grouping writes near each other, that don't
 make as much sense on recent storage types. For instance, on flash, write
-locality is not important and can actually increase wear destructively.
+locality is not important and can actually increase wear.

 One of the most popular designs for flash filesystems is called the
 [logging filesystem](https://en.wikipedia.org/wiki/Log-structured_file_system).
 The flash filesystems [jffs](https://en.wikipedia.org/wiki/JFFS)
-and [yaffs](https://en.wikipedia.org/wiki/YAFFS) are good examples. In
-logging filesystem, data is not store in a data structure on disk, but instead
+and [yaffs](https://en.wikipedia.org/wiki/YAFFS) are good examples. In a
+logging filesystem, data is not stored in a data structure on disk, but instead
 the changes to the files are stored on disk. This has several neat advantages,
-such as the fact that the data is written in a cyclic log format naturally
+such as the fact that the data is written in a cyclic log format and naturally
 wear levels as a side effect. And, with a bit of error detection, the entire
 filesystem can easily be designed to be resilient to power loss. The
-journalling component of most modern day filesystems is actually a reduced
+journaling component of most modern-day filesystems is actually a reduced
 form of a logging filesystem. However, logging filesystems have difficulty
 scaling as the size of storage increases. And most filesystems compensate by
-caching large parts of the filesystem in RAM, a strategy that is unavailable
+caching large parts of the filesystem in RAM, a strategy that is inappropriate
 for embedded systems.

 Another interesting filesystem design technique is that of [copy-on-write (COW)](https://en.wikipedia.org/wiki/Copy-on-write).
@@ -107,14 +109,14 @@ where the COW data structures are synchronized.
 ## Metadata pairs

 The core piece of technology that provides the backbone for the littlefs is
-the concept of metadata pairs. The key idea here, is that any metadata that
+the concept of metadata pairs. The key idea here is that any metadata that
 needs to be updated atomically is stored on a pair of blocks tagged with
 a revision count and checksum. Every update alternates between these two
 pairs, so that at any time there is always a backup containing the previous
 state of the metadata.

 Consider a small example where each metadata pair has a revision count,
-a number as data, and the xor of the block as a quick checksum. If
+a number as data, and the XOR of the block as a quick checksum. If
 we update the data to a value of 9, and then to a value of 5, here is
 what the pair of blocks may look like after each update:
 ```
@@ -130,7 +132,7 @@ what the pair of blocks may look like after each update:
 After each update, we can find the most up-to-date value of data by looking
 at the revision count.

-Now consider what the blocks may look like if we suddenly loss power while
+Now consider what the blocks may look like if we suddenly lose power while
 changing the value of data to 5:
 ```
 block 1     block 2        block 1     block 2        block 1     block 2
@@ -149,7 +151,7 @@ check our checksum we notice that block 1 was corrupted. So we fall back to
 block 2 and use the value 9.

 Using this concept, the littlefs is able to update metadata blocks atomically.
-There are a few other tweaks, such as using a 32 bit crc and using sequence
+There are a few other tweaks, such as using a 32-bit CRC and using sequence
 arithmetic to handle revision count overflow, but the basic concept
 is the same. These metadata pairs define the backbone of the littlefs, and the
 rest of the filesystem is built on top of these atomic updates.
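The update-and-fallback scheme described in this hunk can be modeled in a few lines. Below is a minimal Python sketch, assuming a toy XOR checksum in place of littlefs's 32-bit CRC and hypothetical function names — an illustration of the idea, not the actual on-disk format:

```python
def checksum(rev, data):
    # Toy stand-in for littlefs's 32-bit CRC: XOR of the block's fields.
    return rev ^ data

def seq_newer(a, b, bits=32):
    # Sequence arithmetic: treat revision counts as points on a circle so
    # that comparisons still work after the counter overflows.
    return a != b and (a - b) % (1 << bits) < (1 << (bits - 1))

def read_pair(pair):
    # pair is a list of two (rev, data, csum) tuples. Discard any block
    # whose checksum fails, then prefer the newer surviving revision.
    valid = [b for b in pair if checksum(b[0], b[1]) == b[2]]
    if not valid:
        raise IOError("both blocks of the metadata pair are corrupted")
    best = valid[0]
    for b in valid[1:]:
        if seq_newer(b[0], best[0]):
            best = b
    return best
```

A torn write that corrupts the newer block simply fails its checksum, and the read falls back to the older block — the "fall back to block 2 and use the value 9" scenario above.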
@@ -161,7 +163,7 @@ requires two blocks for each block of data. I'm sure users would be very
 unhappy if their storage was suddenly cut in half! Instead of storing
 everything in these metadata blocks, the littlefs uses a COW data structure
 for files which is in turn pointed to by a metadata block. When
-we update a file, we create a copies of any blocks that are modified until
+we update a file, we create copies of any blocks that are modified until
 the metadata blocks are updated with the new copy. Once the metadata block
 points to the new copy, we deallocate the old blocks that are no longer in use.

@@ -184,7 +186,7 @@ Here is what updating a one-block file may look like:
 update data in file              update metadata pair
 ```

-It doesn't matter if we lose power while writing block 5 with the new data,
+It doesn't matter if we lose power while writing new data to block 5,
 since the old data remains unmodified in block 4. This example also
 highlights how the atomic updates of the metadata blocks provide a
 synchronization barrier for the rest of the littlefs.
@@ -206,7 +208,7 @@ files in filesystems. Of these, the littlefs uses a rather unique [COW](https://
 data structure that allows the filesystem to reuse unmodified parts of the
 file without additional metadata pairs.

-First lets consider storing files in a simple linked-list. What happens when
+First let's consider storing files in a simple linked-list. What happens when we
 append a block? We have to change the last block in the linked-list to point
 to this new block, which means we have to copy out the last block, and change
 the second-to-last block, and then the third-to-last, and so on until we've
@@ -240,8 +242,8 @@ Exhibit B: A backwards linked-list
 ```

 However, a backwards linked-list does come with a rather glaring problem.
-Iterating over a file _in order_ has a runtime of O(n^2). Gah! A quadratic
-runtime to just _read_ a file? That's awful. Keep in mind reading files are
+Iterating over a file _in order_ has a runtime cost of O(n^2). Gah! A quadratic
+runtime to just _read_ a file? That's awful. Keep in mind that reading files is
 usually the most common filesystem operation.

 To avoid this problem, the littlefs uses a multilayered linked-list. For
@@ -266,7 +268,7 @@ Exhibit C: A backwards CTZ skip-list
 ```

 The additional pointers allow us to navigate the data-structure on disk
-much more efficiently than in a single linked-list.
+much more efficiently than in a singly linked-list.

 Taking exhibit C for example, here is the path from data block 5 to data
 block 1. You can see how data block 3 was completely skipped:
@@ -289,15 +291,15 @@ The path to data block 0 is even more quick, requiring only two jumps:

 We can find the runtime complexity by looking at the path to any block from
 the block containing the most pointers. Every step along the path divides
-the search space for the block in half. This gives us a runtime of O(logn).
+the search space for the block in half. This gives us a runtime of O(log n).
 To get to the block with the most pointers, we can perform the same steps
-backwards, which puts the runtime at O(2logn) = O(logn). The interesting
+backwards, which puts the runtime at O(2 log n) = O(log n). The interesting
 part about this data structure is that this optimal path occurs naturally
 if we greedily choose the pointer that covers the most distance without passing
 our target block.

 So now we have a representation of files that can be appended trivially with
-a runtime of O(1), and can be read with a worst case runtime of O(nlogn).
+a runtime of O(1), and can be read with a worst-case runtime of O(n log n).
 Given that the runtime is also divided by the amount of data we can store
 in a block, this is pretty reasonable.
@@ -362,7 +364,7 @@ N = file size in bytes

 And this works quite well, but is not trivial to calculate. This equation
 requires O(n) to compute, which brings the entire runtime of reading a file
-to O(n^2logn). Fortunately, the additional O(n) does not need to touch disk,
+to O(n^2 log n). Fortunately, the additional O(n) does not need to touch disk,
 so it is not completely unreasonable. But if we could solve this equation into
 a form that is easily computable, we could avoid a big slowdown.

@@ -379,11 +381,11 @@ unintuitive property:
 ![mindblown](https://latex.codecogs.com/svg.latex?%5Csum_i%5En%5Cleft%28%5Ctext%7Bctz%7D%28i%29+1%5Cright%29%20%3D%202n-%5Ctext%7Bpopcount%7D%28n%29)

 where:
-ctz(i) = the number of trailing bits that are 0 in i
-popcount(i) = the number of bits that are 1 in i
+ctz(x) = the number of trailing bits that are 0 in x
+popcount(x) = the number of bits that are 1 in x

 It's a bit bewildering that these two seemingly unrelated bitwise instructions
-are related by this property. But if we start to disect this equation we can
+are related by this property. But if we start to dissect this equation we can
 see that it does hold. As n approaches infinity, we do end up with an average
 overhead of 2 pointers as we found earlier. And popcount seems to handle the
 error from this average as it accumulates in the CTZ skip-list.
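The identity in this hunk, reading the sum as running over i = 1..n, is easy to spot-check numerically. A brute-force Python verification (the helper names are mine, not littlefs's):

```python
def ctz(i):
    # Number of trailing 0-bits in i (i > 0).
    return (i & -i).bit_length() - 1

def popcount(n):
    # Number of 1-bits in n.
    return bin(n).count("1")

def total_pointers(n):
    # Left-hand side of the identity: total pointers held by blocks 1..n,
    # where block i carries ctz(i)+1 pointers.
    return sum(ctz(i) + 1 for i in range(1, n + 1))

# Exhaustively check the claimed closed form 2n - popcount(n) for small n.
for n in range(1, 2000):
    assert total_pointers(n) == 2 * n - popcount(n)
```

Since popcount(n) grows at most logarithmically, this also confirms the average overhead of 2 pointers per block as n grows.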
@@ -410,8 +412,7 @@ a bit to avoid integer overflow:
 ![formulaforoff](https://latex.codecogs.com/svg.latex?%5Cmathit%7Boff%7D%20%3D%20N%20-%20%5Cleft%28B-2%5Cfrac%7Bw%7D%7B8%7D%5Cright%29n%20-%20%5Cfrac%7Bw%7D%7B8%7D%5Ctext%7Bpopcount%7D%28n%29)

 The solution involves quite a bit of math, but computers are very good at math.
-We can now solve for the block index + offset while only needed to store the
-file size in O(1).
+Now we can solve for both the block index and offset from the file size in O(1).

 Here is what it might look like to update a file stored with a CTZ skip-list:
 ```
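The closed form for off in this hunk can also be checked numerically. The sketch below uses illustrative parameters (w = 32-bit pointers, B = 512-byte blocks — my assumptions, not values fixed by the text) and the block-capacity bookkeeping implied by the formula, comparing it against a naive O(n) walk:

```python
B = 512        # block size in bytes (illustrative assumption)
W = 32         # pointer width in bits (illustrative assumption)
WB = W // 8    # pointer width in bytes

def popcount(n):
    return bin(n).count("1")

def total(n):
    # Data capacity of blocks 0..n-1: n raw blocks minus the pointer
    # overhead, 2n - popcount(n) pointers by the ctz/popcount identity.
    return n * B - WB * (2 * n - popcount(n))

def block_and_off_naive(N):
    # O(n) reference: walk forward until byte N lands inside block n.
    n = 0
    while total(n + 1) <= N:
        n += 1
    return n, N - total(n)

def off_formula(N, n):
    # The closed form from the text:
    # off = N - (B - 2w/8)n - (w/8)popcount(n)
    return N - (B - 2 * WB) * n - WB * popcount(n)

# The closed form agrees with the naive walk for every offset we try.
for N in range(0, 50000, 7):
    n, off = block_and_off_naive(N)
    assert off_formula(N, n) == off
```

Expanding total(n) shows why: off = N - total(n) = N - (B - 2w/8)n - (w/8)popcount(n), term for term the formula above.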
@@ -500,16 +501,17 @@ scanned to find the most recent free list, but once the list was found the
 state of all free blocks becomes known.

 However, this approach had several issues:
+
 - There was a lot of nuanced logic for adding blocks to the free list without
   modifying the blocks, since the blocks remain active until the metadata is
   updated.
-- The free list had to support both additions and removals in fifo order while
+- The free list had to support both additions and removals in FIFO order while
   minimizing block erases.
 - The free list had to handle the case where the file system completely ran
   out of blocks and may no longer be able to add blocks to the free list.
 - If we used a revision count to track the most recently updated free list,
   metadata blocks that were left unmodified were ticking time bombs that would
-  cause the system to go haywire if the revision count overflowed
+  cause the system to go haywire if the revision count overflowed.
 - Every single metadata block wasted space to store these free list references.

 Actually, to simplify, this approach had one massive glaring issue: complexity.
@@ -539,7 +541,7 @@ would have an abhorrent runtime.
 So the littlefs compromises. It doesn't store a bitmap the size of the storage,
 but it does store a little bit-vector that contains a fixed set lookahead
 for block allocations. During a block allocation, the lookahead vector is
-checked for any free blocks, if there are none, the lookahead region jumps
+checked for any free blocks. If there are none, the lookahead region jumps
 forward and the entire filesystem is scanned for free blocks.

 Here's what it might look like to allocate 4 blocks on a decently busy
@@ -622,7 +624,7 @@ So, as a solution, the littlefs adopted a sort of threaded tree. Each
 directory not only contains pointers to all of its children, but also a
 pointer to the next directory. These pointers create a linked-list that
 is threaded through all of the directories in the filesystem. Since we
-only use this linked list to check for existance, the order doesn't actually
+only use this linked list to check for existence, the order doesn't actually
 matter. As an added plus, we can repurpose the pointer for the individual
 directory linked-lists and avoid using any additional space.

@@ -773,7 +775,7 @@ deorphan step that simply iterates through every directory in the linked-list
 and checks it against every directory entry in the filesystem to see if it
 has a parent. The deorphan step occurs on the first block allocation after
 boot, so orphans should never cause the littlefs to run out of storage
-prematurely. Note that the deorphan step never needs to run in a readonly
+prematurely. Note that the deorphan step never needs to run in a read-only
 filesystem.

 ## The move problem
## The move problem
@@ -883,7 +885,7 @@ a power loss will occur during filesystem activity. We still need to handle
883885
the condition, but runtime during a power loss takes a back seat to the runtime
884886
during normal operations.
885887

886-
So what littlefs does is unelegantly simple. When littlefs moves a file, it
888+
So what littlefs does is inelegantly simple. When littlefs moves a file, it
887889
marks the file as "moving". This is stored as a single bit in the directory
888890
entry and doesn't take up much space. Then littlefs moves the directory,
889891
finishing with the complete remove of the "moving" directory entry.
@@ -979,7 +981,7 @@ if it exists elsewhere in the filesystem.
 So now that we have all of the pieces of a filesystem, we can look at a more
 subtle attribute of embedded storage: the wear down of flash blocks.

-The first concern for the littlefs, is that prefectly valid blocks can suddenly
+The first concern for the littlefs is that perfectly valid blocks can suddenly
 become unusable. As a nice side-effect of using a COW data-structure for files,
 we can simply move on to a different block when a file write fails. All
 modifications to files are performed in copies, so we will only replace the
@@ -1151,7 +1153,7 @@ develops errors and needs to be moved.

 ## Wear leveling

-The second concern for the littlefs, is that blocks in the filesystem may wear
+The second concern for the littlefs is that blocks in the filesystem may wear
 unevenly. In this situation, a filesystem may meet an early demise where
 there are no more non-corrupted blocks that aren't in use. It's common to
 have files that were written once and left unmodified, wasting the potential
@@ -1171,7 +1173,7 @@ of wear leveling:

 In littlefs's case, it's possible to use the revision count on metadata pairs
 to approximate the wear of a metadata block. And combined with the COW nature
-of files, littlefs could provide your usually implementation of dynamic wear
+of files, littlefs could provide the usual implementation of dynamic wear
 leveling.

 However, the littlefs does not. This is for a few reasons. Most notably, even
@@ -1210,9 +1212,9 @@ So, to summarize:
    metadata block is active
 4. Directory blocks contain either references to other directories or files
 5. Files are represented by copy-on-write CTZ skip-lists which support O(1)
-   append and O(nlogn) reading
+   append and O(n log n) reading
 6. Blocks are allocated by scanning the filesystem for used blocks in a
-   fixed-size lookahead region is that stored in a bit-vector
+   fixed-size lookahead region that is stored in a bit-vector
 7. To facilitate scanning the filesystem, all directories are part of a
    linked-list that is threaded through the entire filesystem
 8. If a block develops an error, the littlefs allocates a new block, and
