Skip to content

Commit 65344d0

Browse files
improve insertion after growth and add decision docs
1 parent 177f63f commit 65344d0

File tree

3 files changed

+16
-9
lines changed

3 files changed

+16
-9
lines changed

libc/docs/dev/undefined_behavior.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,3 +62,10 @@ Often the standard will imply an intended behavior through what it states is und
6262
Ignoring Bug-For-Bug Compatibility
6363
----------------------------------
6464
Any long running implementations will have bugs and deviations from the standard. Hyrum's Law states that “all observable behaviors of your system will be depended on by somebody” which includes these bugs. An example of a long-standing bug is glibc's scanf float parsing behavior. The behavior is specifically defined in the standard, but it isn't adhered to by all libc implementations. There is a longstanding bug in glibc where it incorrectly parses the string 100er and this caused the C standard to add that specific example to the definition for scanf. The intended behavior is for scanf, when parsing a float, to parse the longest possibly valid prefix and then accept it if and only if that complete parsed value is a float. In the case of 100er the longest possibly valid prefix is 100e but the float parsed from that string is only 100. Since there is no number after the e it shouldn't be included in the float, so scanf should return a parsing error. For LLVM's libc it was decided to follow the standard, even though glibc's version is slightly simpler to implement and this edge case is rare. Following the standard must be the first priority, since that's the goal of the library.
65+
66+
Design Decisions
67+
================
68+
69+
Resizable Tables for hsearch
70+
----------------------------
71+
The POSIX.1 standard does not delineate the behavior consequent to invoking hsearch or hdestroy without prior initialization of the hash table via hcreate. Furthermore, the standard does not articulate the outcomes of successive invocations of hsearch absent intervening hdestroy calls. Libraries such as MUSL and Glibc do not incorporate checks for these scenarios, potentially leading to memory corruption or leakage. Conversely, FreeBSD's libc and Bionic adopt a distinct methodology, automatically initializing the hash table to a minimal size if it is found uninitialized, and proceeding to destroy the table only if initialization has occurred. This approach also renders hcreate redundant if an initialized hash table is already present. Given that the hash table commences with a minimal size, resizing becomes necessary to accommodate additional user insertions. LLVM's libc mirrors the approach of FreeBSD's libc and Bionic, owing to its enhanced robustness and user-friendliness. Notably, such resizing behavior aligns with POSIX.1 standards, which explicitly permit implementations to modify the capacity of the hash table.

libc/src/__support/HashTable/table.h

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -184,7 +184,7 @@ struct HashTable {
184184
// Nor does the routine check if the table is full.
185185
// This is only to be used in grow() where we insert all existing entries
186186
// into a new table. Hence, the requirements are naturally satisfied.
187-
LIBC_INLINE void unsafe_insert(ENTRY item) {
187+
LIBC_INLINE ENTRY *unsafe_insert(ENTRY item) {
188188
uint64_t primary = oneshot_hash(item.key);
189189
uint8_t secondary = secondary_hash(primary);
190190
ProbeSequence sequence{static_cast<size_t>(primary), 0, entries_mask};
@@ -199,7 +199,7 @@ struct HashTable {
199199
entry(index).key = item.key;
200200
entry(index).data = item.data;
201201
available_slots--;
202-
return;
202+
return &entry(index);
203203
}
204204
}
205205
}
@@ -238,10 +238,8 @@ struct HashTable {
238238
// resized sccuessfully: clean up the old table and use the new one
239239
deallocate(table);
240240
table = new_table;
241-
// hash also need to be recomputed since the hash state is updated
242-
primary = table->oneshot_hash(item.key);
243-
index = table->find(item.key, primary);
244-
slot = &table->entry(index);
241+
// it is still valid to use the fastpath insertion.
242+
return table->unsafe_insert(item);
245243
}
246244

247245
table->set_ctrl(index, secondary_hash(primary));

libc/test/src/__support/HashTable/table_test.cpp

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -100,9 +100,11 @@ TEST(LlvmLibcTableTest, Insertion) {
100100
static_cast<ENTRY *>(nullptr));
101101
}
102102

103-
// one more insert should grow successfully.
104-
ASSERT_NE(HashTable::insert(table, {keys[CAP].bytes, keys[CAP].bytes}),
105-
static_cast<ENTRY *>(nullptr));
103+
// One more insert should grow the table successfully. We test the value
104+
// here because the grow finishes with a fastpath insertion that is different
105+
// from the normal insertion.
106+
ASSERT_EQ(HashTable::insert(table, {keys[CAP].bytes, keys[CAP].bytes})->data,
107+
static_cast<void *>(keys[CAP].bytes));
106108

107109
for (size_t i = 0; i <= CAP; ++i) {
108110
ASSERT_EQ(strcmp(table->find(keys[i].bytes)->key, keys[i].bytes), 0);

0 commit comments

Comments
 (0)