|
59 | 59 |
|
60 | 60 | -define(SERVER, ?MODULE).
|
61 | 61 |
|
62 |
| --record(dqstate, {msg_location, % where are messages? |
| 62 | +-record(dqstate, {msg_location, %% where are messages? |
63 | 63 | file_summary, %% what's in the files?
|
64 | 64 | sequences, %% next read and write for each q
|
65 | 65 | current_file_num, %% current file name as number
|
|
71 | 71 | read_file_handles_limit %% how many file handles can we open?
|
72 | 72 | }).
|
73 | 73 |
|
| 74 | +%% The components: |
| 75 | +%% |
| 76 | +%% MsgLocation: this is a dets table which contains: |
| 77 | +%% {MsgId, RefCount, File, Offset, TotalSize} |
| 78 | +%% FileSummary: this is an ets table which contains: |
| 79 | +%% {File, ValidTotalSize, ContiguousTop, Left, Right} |
| 80 | +%% Sequences: this is an ets table which contains: |
| 81 | +%% {Q, ReadSeqId, WriteSeqId} |
| 82 | +%% rabbit_disk_queue: this is an mnesia table which contains: |
| 83 | +%% #dq_msg_loc { queue_and_seq_id = {Q, SeqId}, |
| 84 | +%% is_delivered = IsDelivered, |
| 85 | +%% msg_id = MsgId |
| 86 | +%% } |
| 87 | +%% |
| 88 | + |
| 89 | +%% The basic idea is that messages are appended to the current file up |
| 90 | +%% until that file becomes too big (> file_size_limit). At that point, |
| 91 | +%% the file is closed and a new file is created on the _right_ of the |
| 92 | +%% old file which is used for new messages. Files are named |
| 93 | +%% numerically ascending, thus the file with the lowest name is the |
| 94 | +%% eldest file. |
| 95 | +%% |
| 96 | +%% We need to keep track of which messages are in which files (this is |
| 97 | +%% the MsgLocation table); how much useful data is in each file and |
| 98 | +%% which files are on the left and right of each other. This is the |
| 99 | +%% purpose of the FileSummary table. |
| 100 | +%% |
| 101 | +%% As messages are removed from files, holes appear in these |
| 102 | +%% files. The field ValidTotalSize contains the total amount of useful |
| 103 | +%% data left in the file, whilst ContiguousTop contains the amount of |
| 104 | +%% valid data right at the start of each file. These are needed for |
| 105 | +%% garbage collection. |
| 106 | +%% |
| 107 | +%% On publish, we write the message to disk, record the changes to |
| 108 | +%% FileSummary and MsgLocation, and, should this be either a plain |
| 109 | +%% publish, or followed by a tx_commit, we record the message in the |
| 110 | +%% mnesia table. Sequences exists to enforce ordering of messages as |
| 111 | +%% they are published within a queue. |
| 112 | +%% |
| 113 | +%% On delivery, we read the next message to be read from disk |
| 114 | +%% (according to the ReadSeqId for the given queue) and record in the |
| 115 | +%% mnesia table that the message has been delivered. |
| 116 | +%% |
| 117 | +%% On ack we remove the relevant entry from MsgLocation, update |
| 118 | +%% FileSummary and delete from the mnesia table. |
| 119 | +%% |
| 120 | +%% In order to avoid extra mnesia searching, we return the SeqId |
| 121 | +%% during delivery which must be returned in ack - it is not possible |
| 122 | +%% to ack from MsgId alone. |
| 123 | + |
| 124 | +%% As messages are ack'd, holes develop in the files. When we discover |
| 125 | +%% that either a file is now empty or that it can be combined with the |
| 126 | +%% useful data in either its left or right file, we compact the two |
| 127 | +%% files together. This keeps disk utilisation high and aids |
| 128 | +%% performance. |
| 129 | +%% |
| 130 | +%% Given the compaction between two files, the left file is considered |
| 131 | +%% the ultimate destination for the good data in the right file. If |
| 132 | +%% necessary, the good data in the left file which is fragmented |
| 133 | +%% throughout the file is written out to a temporary file, then read |
| 134 | +%% back in to form a contiguous chunk of good data at the start of the |
| 135 | +%% left file. Thus the left file is garbage collected and |
| 136 | +%% compacted. Then the good data from the right file is copied onto |
| 137 | +%% the end of the left file. MsgLocation and FileSummary tables are |
| 138 | +%% updated. |
| 139 | +%% |
| 140 | +%% On startup, we scan the files we discover, dealing with the |
| 141 | +%% possibilites of a crash have occured during a compaction (this |
| 142 | +%% consists of tidyup - the compaction is deliberately designed such |
| 143 | +%% that data is duplicated on disk rather than risking it being lost), |
| 144 | +%% and rebuild the dets and ets tables (MsgLocation, FileSummary, |
| 145 | +%% Sequences) from what we find. We ensure that the messages we have |
| 146 | +%% discovered on disk match exactly with the messages recorded in the |
| 147 | +%% mnesia table. |
| 148 | + |
| 149 | +%% MsgLocation is deliberately a dets table, and the mnesia table is |
| 150 | +%% set to be a disk_only_table in order to ensure that we are not RAM |
| 151 | +%% constrained. |
| 152 | + |
74 | 153 | %% ---- PUBLIC API ----
|
75 | 154 |
|
76 | 155 | start_link(FileSizeLimit, ReadFileHandlesLimit) ->
|
|
0 commit comments