From ff031ab388e00afd5216cfb98505e4b59e9cfae2 Mon Sep 17 00:00:00 2001
From: David Leeds
Date: Tue, 16 Jun 2020 14:46:48 -0700
Subject: [PATCH] hashmap: v2.0 initial commit (new API + algorithm improvements)

Hashmap 2.0 Highlights:
* New generic and type-safe API. We no longer need to use a macro to
  generate type-safe wrapper functions.
* Improved linear probing algorithm. The previous algorithm could fail
  on insert, rehash, or remove if a particularly poor hash function was
  provided. The new algorithm can never fail, even with a worst-case
  hash function. This adds user confidence and reduces failure modes.
* Added a supplemental hash function. Linear probing is especially
  sensitive to clustering due to poor hash functions. Since the hash
  function is user-supplied, adding a supplemental hash function
  provides more consistent performance.
* Now, always provide hashmap statistics API with no additional
  overhead to ordinary hashmap operations.
* Now, do lazy allocation on init. We reserve no memory on the heap
  until the first item is added.
* Default hashmap size is reduced to 128 elements.
* A hashmap_reserve() function was added to pre-allocate the hashmap.
* hashmap_foreach macros have been added to hide the complexities of
  iterator usage and streamline iteration.
---
 CMakeLists.txt         |   5 +
 README.md              | 128 ++++----
 include/hashmap.h      | 502 ++++++++++++++++++----------
 include/hashmap_base.h |  56 ++++
 src/hashmap.c          | 719 +++++++++++++++++++----------------------
 test/CMakeLists.txt    |  14 +-
 test/hashmap_example.c |  98 ++++++
 test/hashmap_test.c    | 289 ++++++++---------
 8 files changed, 1036 insertions(+), 775 deletions(-)
 create mode 100644 include/hashmap_base.h
 create mode 100644 test/hashmap_example.c

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2012fea..ee48f43 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -88,3 +88,8 @@ export(EXPORT hashmap-targets
 # Register package in user's package registry
 export(PACKAGE HashMap)
+
+##############################################
+# Build unit test
+enable_testing()
+add_subdirectory(test)
diff --git a/README.md b/README.md
index 76c4a2a..6aec106 100644
--- a/README.md
+++ b/README.md
@@ -1,90 +1,106 @@
 # hashmap
-Flexible hashmap implementation in C using open addressing and linear probing for collision resolution.
+Templated type-safe hashmap implementation in C using open addressing and linear probing for collision resolution.
 ### Summary
-This project came into existence because there are a notable lack of flexible and easy to use data structures available in C. Sure, higher level languages have built-in libraries, but plenty of embedded projects or higher level libraries start with core C code. It was undesirable to add a bulky library like Glib as a dependency to my projects, or grapple with a restrictive license agreement. Searching for "C hashmap" yielded results with questionable algorithms and code quality, projects with difficult or inflexible interfaces, or projects with less desirable licenses. I decided it was time to create my own.
+This project came into existence because there is a notable lack of flexible and easy-to-use data structures available in C. C data structures with efficient, type-safe interfaces are virtually non-existent.
Sure, higher level languages have built-in libraries and templated classes, but plenty of embedded projects or higher level libraries are implemented in C. It was undesirable to add a bulky library like Glib as a dependency to my projects, or grapple with a restrictive license agreement. Searching for "C hashmap" yielded results with questionable algorithms and code quality, projects with difficult or inflexible interfaces, or projects with less desirable licenses. I decided it was time to create my own. ### Goals * **To scale gracefully to the full capacity of the numeric primitives in use.** E.g. on a 32-bit machine, you should be able to load a billion+ entries without hitting any bugs relating to integer overflows. Lookups on a hashtable with a billion entries should be performed in close to constant time, no different than lookups in a hashtable with 20 entries. Automatic rehashing occurs and maintains a load factor of 0.75 or less. -* **To provide a clean and easy-to-use interface.** C data structures often struggle to strike a balance between flexibility and ease of use. To this end, I provided a generic interface using void pointers for keys and data, and macros to generate type-specific wrapper functions, if desired. -* **To enable easy iteration and safe entry removal during iteration.** Applications often need these features, and the data structure should not hold them back. Both an iterator interface and a foreach function was provided to satisfy various use-cases. This hashmap also uses an open addressing scheme, which has superior iteration performance to a similar hashmap implemented using separate chaining (buckets with linked lists). This is because fewer instructions are needed per iteration, and array traversal has superior cache performance than linked list traversal. +* **To provide a clean and easy-to-use interface.** C data structures often struggle to strike a balance between flexibility and ease of use. 
To this end, I wrapped a generic C backend implementation with light-weight pre-processor macros to create a templated type-safe interface. All required type information is encoded in the hashmap declaration using the `HASHMAP()` macro. Unlike with header-only macro libraries, there is no code duplication or performance disadvantage over a traditional library with a non-type-safe `void *` interface.
+* **To enable easy iteration and safe entry removal during iteration.** Applications often need these features, and the data structure should not hold them back. Easy-to-use `hashmap_foreach()` macros and a more flexible iterator interface are provided. This hashmap also uses an open addressing scheme, which has superior iteration performance to a similar hashmap implemented using separate chaining (buckets with linked lists). This is because fewer instructions are needed per iteration, and array traversal has better cache performance than linked list traversal.
 * **To use a very unrestrictive software license.** Using no license was an option, but I wanted to allow the code to be tracked, simply for my own edification. I chose the MIT license because it is the most common open source license in use, and it grants full rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell the code. Basically, take this code and do what you want with it. Just be nice and leave the license comment and my name at top of the file. Feel free to add your name if you are modifying and redistributing.
### Code Example ```C #include +#include #include +#include +#include #include /* Some sample data structure with a string key */ struct blob { - char key[32]; - size_t data_len; - unsigned char data[1024]; + char key[32]; + size_t data_len; + unsigned char data[1024]; }; -/* Declare type-specific blob_hashmap_* functions with this handy macro */ -HASHMAP_FUNCS_CREATE(blob, const char, struct blob) - +/* + * Contrived function to allocate blob structures and populate + * them with randomized data. + * + * Returns NULL when there are no more blobs to load. + */ struct blob *blob_load(void) { - struct blob *b; - /* - * Hypothetical function that allocates and loads blob structures - * from somewhere. Returns NULL when there are no more blobs to load. - */ - return b; -} + static size_t count = 0; + struct blob *b; + + if (++count > 100) { + return NULL; + } + + if ((b = malloc(sizeof(*b))) == NULL) { + return NULL; + } + snprintf(b->key, sizeof(b->key), "%02lx", random() % 100); + b->data_len = random() % 10; + memset(b->data, random(), b->data_len); -/* Hashmap structure */ -struct hashmap map; + return b; +} int main(int argc, char **argv) { - struct blob *b; - struct hashmap_iter *iter; + /* Declare type-specific hashmap structure */ + HASHMAP(char, struct blob) map; + const char *key; + struct blob *b; + void *temp; + int r; + + /* Initialize with default string key hash function and comparator */ + hashmap_init(&map, hashmap_hash_string, strcmp); + + /* Load some sample data into the map and discard duplicates */ + while ((b = blob_load()) != NULL) { + r = hashmap_put(&map, b->key, b); + if (r < 0) { + /* Expect -EEXIST return value for duplicates */ + printf("putting blob[%s] failed: %s\n", b->key, strerror(-r)); + free(b); + } + } - /* Initialize with default string key functions and init size */ - hashmap_init(&map, hashmap_hash_string, hashmap_compare_string, 0); + /* Lookup a blob with key "AbCdEf" */ + b = hashmap_get(&map, "AbCdEf"); + if (b) { + 
printf("Found blob[%s]\n", b->key); + } - /* Load some sample data into the map and discard duplicates */ - while ((b = blob_load()) != NULL) { - if (blob_hashmap_put(&map, b->key, b) != b) { - printf("discarding blob with duplicate key: %s\n", b->key); - free(b); + /* Iterate through all blobs and print each one */ + hashmap_foreach(key, b, &map) { + printf("blob[%s]: data_len %zu bytes\n", key, b->data_len); } - } - - /* Lookup a blob with key "AbCdEf" */ - b = blob_hashmap_get(&map, "AbCdEf"); - if (b) { - printf("Found blob[%s]\n", b->key); - } - - /* Iterate through all blobs and print each one */ - for (iter = hashmap_iter(&map); iter; iter = hashmap_iter_next(&map, iter)) { - printf("blob[%s]: data_len %zu bytes\n", blob_hashmap_iter_get_key(iter), - blob_hashmap_iter_get_data(iter)->data_len); - } - - /* Remove all blobs with no data */ - iter = hashmap_iter(&map); - while (iter) { - b = blob_hashmap_iter_get_data(iter); - if (b->data_len == 0) { - iter = hashmap_iter_remove(&map, iter); - free(b); - } else { - iter = hashmap_iter_next(&map, iter); + + /* Remove all blobs with no data (using remove-safe foreach macro) */ + hashmap_foreach_data_safe(b, &map, temp) { + if (b->data_len == 0) { + printf("Discarding blob[%s] with no data\n", b->key); + hashmap_remove(&map, b->key); + free(b); + } } - } - /* Free all allocated resources associated with map and reset its state */ - hashmap_destroy(&map); + /* Cleanup time: free all the blobs, and destruct the hashmap */ + hashmap_foreach_data(b, &map) { + free(b); + } + hashmap_cleanup(&map); - return 0; + return 0; } - ``` diff --git a/include/hashmap.h b/include/hashmap.h index 462828c..4770c7a 100644 --- a/include/hashmap.h +++ b/include/hashmap.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2016-2018 David Leeds + * Copyright (c) 2016-2020 David Leeds * * Hashmap is free software; you can redistribute it and/or modify * it under the terms of the MIT license. See LICENSE for details. 
@@ -7,271 +7,429 @@ #pragma once +#ifdef __cplusplus +extern "C" { +#endif + #include +#include /* - * Define HASHMAP_METRICS to compile in performance analysis - * functions for use in assessing hash function performance. - */ -/* #define HASHMAP_METRICS */ - -/* - * Define HASHMAP_NOASSERT to compile out all assertions used internally. - */ -/* #define HASHMAP_NOASSERT */ - -/* - * Macros to declare type-specific versions of hashmap_*() functions to - * allow compile-time type checking and avoid the need for type casting. - */ -#define HASHMAP_FUNCS_DECLARE(name, key_type, data_type) \ - data_type *name##_hashmap_put(struct hashmap *map, \ - const key_type *key, data_type *data); \ - data_type *name##_hashmap_get(const struct hashmap *map, \ - const key_type *key); \ - data_type *name##_hashmap_remove(struct hashmap *map, \ - const key_type *key); \ - const key_type *name##_hashmap_iter_get_key( \ - const struct hashmap_iter *iter); \ - data_type *name##_hashmap_iter_get_data( \ - const struct hashmap_iter *iter); \ - void name##_hashmap_iter_set_data(const struct hashmap_iter *iter, \ - data_type *data); \ - int name##_hashmap_foreach(const struct hashmap *map, \ - int (*func)(const key_type *, data_type *, void *), void *arg); - -#define HASHMAP_FUNCS_CREATE(name, key_type, data_type) \ - data_type *name##_hashmap_put(struct hashmap *map, \ - const key_type *key, data_type *data) \ - { \ - return (data_type *)hashmap_put(map, (const void *)key, \ - (void *)data); \ - } \ - data_type *name##_hashmap_get(const struct hashmap *map, \ - const key_type *key) \ - { \ - return (data_type *)hashmap_get(map, (const void *)key); \ - } \ - data_type *name##_hashmap_remove(struct hashmap *map, \ - const key_type *key) \ - { \ - return (data_type *)hashmap_remove(map, (const void *)key); \ - } \ - const key_type *name##_hashmap_iter_get_key( \ - const struct hashmap_iter *iter) \ - { \ - return (const key_type *)hashmap_iter_get_key(iter); \ - } \ - data_type 
*name##_hashmap_iter_get_data( \ - const struct hashmap_iter *iter) \ - { \ - return (data_type *)hashmap_iter_get_data(iter); \ - } \ - void name##_hashmap_iter_set_data(const struct hashmap_iter *iter, \ - data_type *data) \ - { \ - hashmap_iter_set_data(iter, (void *)data); \ - } \ - struct __##name##_hashmap_foreach_state { \ - int (*func)(const key_type *, data_type *, void *); \ - void *arg; \ - }; \ - static inline int __##name##_hashmap_foreach_callback( \ - const void *key, void *data, void *arg) \ - { \ - struct __##name##_hashmap_foreach_state *s = \ - (struct __##name##_hashmap_foreach_state *)arg; \ - return s->func((const key_type *)key, \ - (data_type *)data, s->arg); \ - } \ - int name##_hashmap_foreach(const struct hashmap *map, \ - int (*func)(const key_type *, data_type *, void *), \ - void *arg) \ - { \ - struct __##name##_hashmap_foreach_state s = { func, arg }; \ - return hashmap_foreach(map, \ - __##name##_hashmap_foreach_callback, &s); \ - } + * INTERNAL USE ONLY: Updates an iterator structure after the current element was removed. + */ +#define __HASHMAP_ITER_RESET(iter) ({ \ + ((iter)->iter_pos = hashmap_base_iter(&(iter)->iter_map->map_base, (iter)->iter_pos)) != NULL; \ +}) +/* + * INTERNAL USE ONLY: foreach macro internals. 
+ */ +#define __HASHMAP_CONCAT_2(x, y) x ## y +#define __HASHMAP_CONCAT(x, y) __HASHMAP_CONCAT_2(x, y) +#define __HASHMAP_MAKE_UNIQUE(prefix) __HASHMAP_CONCAT(__HASHMAP_CONCAT(prefix, __COUNTER__), _) +#define __HASHMAP_UNIQUE(unique, name) __HASHMAP_CONCAT(unique, name) +#define __HASHMAP_FOREACH(x, key, data, h) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + ((key) = hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))) && \ + ((data) = hashmap_iter_get_data(&__HASHMAP_UNIQUE(x, it))); \ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it))) +#define __HASHMAP_FOREACH_SAFE(x, key, data, h, temp_ptr) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + ((temp_ptr) = (void *)((key) = hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it)))) && \ + ((data) = hashmap_iter_get_data(&__HASHMAP_UNIQUE(x, it))); \ + ((temp_ptr) == (void *)hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))) ? \ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it)) : __HASHMAP_ITER_RESET(&__HASHMAP_UNIQUE(x, it))) +#define __HASHMAP_FOREACH_KEY(x, key, h) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + (key = hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))); \ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it))) +#define __HASHMAP_FOREACH_KEY_SAFE(x, key, h, temp_ptr) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + ((temp_ptr) = (void *)((key) = hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it)))); \ + ((temp_ptr) == (void *)hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))) ? 
\ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it)) : __HASHMAP_ITER_RESET(&__HASHMAP_UNIQUE(x, it))) +#define __HASHMAP_FOREACH_DATA(x, data, h) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + (data = hashmap_iter_get_data(&__HASHMAP_UNIQUE(x, it))); \ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it))) +#define __HASHMAP_FOREACH_DATA_SAFE(x, data, h, temp_ptr) \ + for (HASHMAP_ITER(*(h)) __HASHMAP_UNIQUE(x, it) = hashmap_iter(h, &__HASHMAP_UNIQUE(x, it)); \ + ((temp_ptr) = (void *)hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))) && \ + ((data) = hashmap_iter_get_data(&__HASHMAP_UNIQUE(x, it))); \ + ((temp_ptr) == (void *)hashmap_iter_get_key(&__HASHMAP_UNIQUE(x, it))) ? \ + hashmap_iter_next(&__HASHMAP_UNIQUE(x, it)) : __HASHMAP_ITER_RESET(&__HASHMAP_UNIQUE(x, it))) -struct hashmap_iter; -struct hashmap_entry; /* - * The hashmap state structure. + * Template macro to define a type-specific hashmap. + * + * Example declarations: + * HASHMAP(int, struct foo) map1; + * // key_type: const int * + * // data_type: struct foo * + * + * HASHMAP(char, char) map2; + * // key_type: const char * + * // data_type: char * + */ +#define HASHMAP(key_type, data_type) \ + struct { \ + struct hashmap_base map_base; \ + struct { \ + const key_type *t_key; \ + data_type *t_data; \ + size_t (*t_hash_func)(const key_type *); \ + int (*t_compare_func)(const key_type *, const key_type *); \ + key_type *(*t_key_dup_func)(const key_type *); \ + void (*t_key_free_func)(key_type *); \ + int (*t_foreach_func)(const key_type *, data_type *, void *); \ + } map_types[0]; \ + } + +/* + * Template macro to define a hashmap iterator. 
+ * + * Example declarations: + * HASHMAP_ITER(my_hashmap) iter; */ -struct hashmap { - size_t table_size_init; - size_t table_size; - size_t num_entries; - struct hashmap_entry *table; - size_t (*hash)(const void *); - int (*key_compare)(const void *, const void *); - void *(*key_alloc)(const void *); - void (*key_free)(void *); -}; +#define HASHMAP_ITER(hashmap_type) \ + struct { \ + typeof(hashmap_type) *iter_map; \ + struct hashmap_entry *iter_pos; \ + } + /* * Initialize an empty hashmap. * - * hash_func should return an even distribution of numbers between 0 - * and SIZE_MAX varying on the key provided. If set to NULL, the default - * case-sensitive string hash function is used: hashmap_hash_string + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * size_t (*hash_func)(const *) - hash function that should return an + * even distribution of numbers between 0 and SIZE_MAX varying on the key provided. + * int (*compare_func)(const *, const *) - key comparison function that + * should return 0 if the keys match, and non-zero otherwise. * - * key_compare_func should return 0 if the keys match, and non-zero otherwise. - * If set to NULL, the default case-sensitive string comparator function is - * used: hashmap_compare_string + * This library provides some basic hash functions: + * size_t hashmap_hash_default(const void *data, size_t len) + * size_t hashmap_hash_string(const char *key) + * size_t hashmap_hash_string_i(const char *key) + */ +#define hashmap_init(h, hash_func, compare_func) do { \ + typeof((h)->map_types->t_hash_func) __map_hash = (hash_func); \ + typeof((h)->map_types->t_compare_func) __map_compare = (compare_func); \ + hashmap_base_init(&(h)->map_base, (size_t (*)(const void *))__map_hash, (int (*)(const void *, const void *))__map_compare); \ +} while (0) + +/* + * Free the hashmap and all associated memory. * - * initial_size is optional, and may be set to the max number of entries - * expected to be put in the hash table. 
This is used as a hint to - * pre-allocate the hash table to the minimum size needed to avoid - * gratuitous rehashes. If initial_size is 0, a default size will be used. + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + */ +#define hashmap_cleanup(h) \ + hashmap_base_cleanup(&(h)->map_base) + +/* + * Enable internal memory allocation and management for hash keys. * - * Returns 0 on success and -errno on failure. + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * *(*key_dup_func)(const *) - allocate a copy of the key to be + * managed internally by the hashmap. + * void (*key_free_func)( *) - free resources associated with a key */ -int hashmap_init(struct hashmap *map, size_t (*hash_func)(const void *), - int (*key_compare_func)(const void *, const void *), - size_t initial_size); +#define hashmap_set_key_alloc_funcs(h, key_dup_func, key_free_func) do { \ + typeof((h)->map_types->t_key_dup_func) __map_key_dup = (key_dup_func); \ + typeof((h)->map_types->t_key_free_func) __map_key_free = (key_free_func); \ + hashmap_base_set_key_alloc_funcs(&(h)->map_base, (void *(*)(const void *))__map_key_dup, (void(*)(void *))__map_key_free); \ +} while (0) /* - * Free the hashmap and all associated memory. + * Return the number of entries in the hash map. + * + * Parameters: + * const HASHMAP(, ) *h - hashmap pointer */ -void hashmap_destroy(struct hashmap *map); +#define hashmap_size(h) \ + ((const typeof((h)->map_base.size))(h)->map_base.size) /* - * Enable internal memory allocation and management of hash keys. + * Set the hashmap's initial allocation size such that no rehashes are + * required to fit the specified number of entries. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * size_t capacity - number of entries. + * + * Returns 0 on success, or -errno on failure. 
*/ -void hashmap_set_key_alloc_funcs(struct hashmap *map, - void *(*key_alloc_func)(const void *), - void (*key_free_func)(void *)); +#define hashmap_reserve(h, capacity) \ + hashmap_base_reserve(&(h)->map_base, capacity) /* - * Add an entry to the hashmap. If an entry with a matching key already - * exists and has a data pointer associated with it, the existing data - * pointer is returned, instead of assigning the new value. Compare - * the return value with the data passed in to determine if a new entry was - * created. Returns NULL if memory allocation failed. + * Add a new entry to the hashmap. If an entry with a matching key + * already exists -EEXIST is returned. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * *key - pointer to the entry's key + * *data - pointer to the entry's data + * + * Returns 0 on success, or -errno on failure. */ -void *hashmap_put(struct hashmap *map, const void *key, void *data); +#define hashmap_put(h, key, data) ({ \ + typeof((h)->map_types->t_key) __map_key = (key); \ + typeof((h)->map_types->t_data) __map_data = (data); \ + hashmap_base_put(&(h)->map_base, (const void *)__map_key, (void *)__map_data); \ +}) /* + * Do a constant-time lookup of a hashmap entry. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * *key - pointer to the key to lookup + * * Return the data pointer, or NULL if no entry exists. */ -void *hashmap_get(const struct hashmap *map, const void *key); +#define hashmap_get(h, key) ({ \ + typeof((h)->map_types->t_key) __map_key = (key); \ + (typeof((h)->map_types->t_data))hashmap_base_get(&(h)->map_base, (const void *)__map_key); \ +}) /* * Remove an entry with the specified key from the map. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * *key - pointer to the key to remove + * * Returns the data pointer, or NULL, if no entry was found. 
+ * + * Note: it is not safe to call this function while iterating, unless + * the "safe" variant of the foreach macro is used, and only the current + * key is removed. */ -void *hashmap_remove(struct hashmap *map, const void *key); +#define hashmap_remove(h, key) ({ \ + typeof((h)->map_types->t_key) __map_key = (key); \ + (typeof((h)->map_types->t_data))hashmap_base_remove(&(h)->map_base, (const void *)__map_key); \ +}) /* * Remove all entries. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer */ -void hashmap_clear(struct hashmap *map); +#define hashmap_clear(h) \ + hashmap_base_clear(&(h)->map_base) /* * Remove all entries and reset the hash table to its initial size. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer */ -void hashmap_reset(struct hashmap *map); +#define hashmap_reset(h) \ + hashmap_base_reset(&(h)->map_base) /* - * Return the number of entries in the hash map. + * Initialize an iterator for this hashmap. The iterator is a type-specific + * structure that may be declared using the HASHMAP_ITER() macro. + * + * Parameters: + * HASHMAP(, ) *h - hashmap pointer + * HASHMAP_ITER() *iter - pointer to the iterator to initialize */ -size_t hashmap_size(const struct hashmap *map); +#define hashmap_iter(h, iter) ({ \ + *(iter) = (typeof(*(iter))){ (h), hashmap_base_iter(&(h)->map_base, NULL) }; \ +}) /* - * Get a new hashmap iterator. The iterator is an opaque - * pointer that may be used with hashmap_iter_*() functions. - * Hashmap iterators are INVALID after a put or remove operation is performed. - * hashmap_iter_remove() allows safe removal during iteration. + * Return true if an iterator is valid and safe to use. + * + * Parameters: + * HASHMAP_ITER() *iter - iterator pointer */ -struct hashmap_iter *hashmap_iter(const struct hashmap *map); +#define hashmap_iter_valid(iter) \ + hashmap_base_iter_valid(&(iter)->iter_map->map_base, (iter)->iter_pos) /* - * Return an iterator to the next hashmap entry. 
Returns NULL if there are - * no more entries. + * Advance an iterator to the next hashmap entry. + * + * Parameters: + * HASHMAP_ITER() *iter - iterator pointer + * + * Returns true if the iterator is valid after the operation. */ -struct hashmap_iter *hashmap_iter_next(const struct hashmap *map, - const struct hashmap_iter *iter); +#define hashmap_iter_next(iter) \ + hashmap_base_iter_next(&(iter)->iter_map->map_base, &(iter)->iter_pos) /* - * Remove the hashmap entry pointed to by this iterator and returns an - * iterator to the next entry. Returns NULL if there are no more entries. + * Remove the hashmap entry pointed to by this iterator and advance the + * iterator to the next entry. + * + * Parameters: + * HASHMAP_ITER() *iter - iterator pointer + * + * Returns true if the iterator is valid after the operation. */ -struct hashmap_iter *hashmap_iter_remove(struct hashmap *map, - const struct hashmap_iter *iter); +#define hashmap_iter_remove(iter) \ + hashmap_base_iter_remove(&(iter)->iter_map->map_base, &(iter)->iter_pos) /* * Return the key of the entry pointed to by the iterator. + * + * Parameters: + * HASHMAP_ITER() *iter - iterator pointer */ -const void *hashmap_iter_get_key(const struct hashmap_iter *iter); +#define hashmap_iter_get_key(iter) \ + ((typeof((iter)->iter_map->map_types->t_key))hashmap_base_iter_get_key((iter)->iter_pos)) /* * Return the data of the entry pointed to by the iterator. + * + * Parameters: + * HASHMAP_ITER() *iter - iterator pointer */ -void *hashmap_iter_get_data(const struct hashmap_iter *iter); +#define hashmap_iter_get_data(iter) \ + ((typeof((iter)->iter_map->map_types->t_data))hashmap_base_iter_get_data((iter)->iter_pos)) /* * Set the data pointer of the entry pointed to by the iterator. 
+ *
+ * Parameters:
+ *   HASHMAP_ITER() *iter - iterator pointer
+ *   *data - new data pointer
 */
-void hashmap_iter_set_data(const struct hashmap_iter *iter, void *data);
+#define hashmap_iter_set_data(iter, data) ({ \
+    typeof((iter)->iter_map->map_types->t_data) __map_data = (data); \
+    hashmap_base_iter_set_data((iter)->iter_pos, (void *)__map_data); \
+})
 /*
- * Invoke func for each entry in the hashmap. Unlike the hashmap_iter_*()
- * interface, this function supports calls to hashmap_remove() during iteration.
- * However, it is an error to put or remove an entry other than the current one,
- * and doing so will immediately halt iteration and return an error.
- * Iteration is stopped if func returns non-zero. Returns func's return
- * value if it is < 0, otherwise, 0.
+ * Convenience macro to iterate through the contents of a hashmap.
+ * key and data are assigned pointers to the current hashmap entry.
+ * It is NOT safe to modify the hashmap while iterating.
+ *
+ * Parameters:
+ *   const *key - key pointer assigned on each iteration
+ *   *data - data pointer assigned on each iteration
+ *   HASHMAP(, ) *h - hashmap pointer
 */
-int hashmap_foreach(const struct hashmap *map,
- int (*func)(const void *, void *, void *), void *arg);
+#define hashmap_foreach(key, data, h) \
+    __HASHMAP_FOREACH(__HASHMAP_MAKE_UNIQUE(__map), (key), (data), (h))
 /*
- * Default hash function for string keys.
- * This is an implementation of the well-documented Jenkins one-at-a-time
- * hash function.
+ * Convenience macro to iterate through the contents of a hashmap.
+ * key and data are assigned pointers to the current hashmap entry.
+ * Unlike hashmap_foreach(), it is safe to call hashmap_remove() on the
+ * current entry.
+ * + * Parameters: + * const *key - key pointer assigned on each iteration + * *data - data pointer assigned on each iteration + * HASHMAP(, ) *h - hashmap pointer + * void *temp_ptr - opaque pointer assigned on each iteration */ -size_t hashmap_hash_string(const void *key); +#define hashmap_foreach_safe(key, data, h, temp_ptr) \ + __HASHMAP_FOREACH_SAFE(__HASHMAP_MAKE_UNIQUE(__map), (key), (data), (h), (temp_ptr)) /* - * Default key comparator function for string keys. + * Convenience macro to iterate through the keys of a hashmap. + * key is assigned a pointer to the current hashmap entry. + * It is NOT safe to modify the hashmap while iterating. + * + * Parameters: + * const *key - key pointer assigned on each iteration + * HASHMAP(, ) *h - hashmap pointer */ -int hashmap_compare_string(const void *a, const void *b); +#define hashmap_foreach_key(key, h) \ + __HASHMAP_FOREACH_KEY(__HASHMAP_MAKE_UNIQUE(__map), (key), (h)) /* - * Default key allocation function for string keys. Use free() for the - * key_free_func. + * Convenience macro to iterate through the keys of a hashmap. + * key is assigned a pointer to the current hashmap entry. + * Unlike hashmap_foreach_key(), it is safe to call hashmap_remove() on the + * current entry. + * + * Parameters: + * const *key - key pointer assigned on each iteration + * HASHMAP(, ) *h - hashmap pointer + * void *temp_ptr - opaque pointer assigned on each iteration */ -void *hashmap_alloc_key_string(const void *key); +#define hashmap_foreach_key_safe(key, h, temp_ptr) \ + __HASHMAP_FOREACH_KEY_SAFE(__HASHMAP_MAKE_UNIQUE(__map), (key), (h), (temp_ptr)) /* - * Case insensitive hash function for string keys. + * Convenience macro to iterate through the data of a hashmap. + * data is assigned a pointer to the current hashmap entry. + * It is NOT safe to modify the hashmap while iterating. 
+ *
+ * Parameters:
+ *   *data - data pointer assigned on each iteration
+ *   HASHMAP(, ) *h - hashmap pointer
 */
-size_t hashmap_hash_string_i(const void *key);
+#define hashmap_foreach_data(data, h) \
+    __HASHMAP_FOREACH_DATA(__HASHMAP_MAKE_UNIQUE(__map), (data), (h))
 /*
- * Case insensitive key comparator function for string keys.
+ * Convenience macro to iterate through the data of a hashmap.
+ * data is assigned a pointer to the current hashmap entry.
+ * Unlike hashmap_foreach_data(), it is safe to call hashmap_remove() on the
+ * current entry.
+ *
+ * Parameters:
+ *   *data - data pointer assigned on each iteration
+ *   HASHMAP(, ) *h - hashmap pointer
+ *   void *temp_ptr - opaque pointer assigned on each iteration
 */
-int hashmap_compare_string_i(const void *a, const void *b);
-
+#define hashmap_foreach_data_safe(data, h, temp_ptr) \
+    __HASHMAP_FOREACH_DATA_SAFE(__HASHMAP_MAKE_UNIQUE(__map), (data), (h), (temp_ptr))
-#ifdef HASHMAP_METRICS
 /*
 * Return the load factor.
+ *
+ * Parameters:
+ *   HASHMAP(, ) *h - hashmap pointer
 */
-double hashmap_load_factor(const struct hashmap *map);
+#define hashmap_load_factor(h) \
+    hashmap_base_load_factor(&(h)->map_base)
+
+/*
+ * Return the number of collisions for this key.
+ * This would always be 0 if a perfect hash function were used, but in ordinary
+ * usage, there may be a few collisions, depending on the hash function and
+ * load factor.
+ *
+ * Parameters:
+ *   HASHMAP(, ) *h - hashmap pointer
+ *   *key - pointer to the entry's key
+ */
+#define hashmap_collisions(h, key) ({ \
+    typeof((h)->map_types->t_key) __map_key = (key); \
+    hashmap_base_collisions(&(h)->map_base, (const void *)__map_key); \
+})
 /*
 * Return the average number of collisions per entry.
+ *
+ * Parameters:
+ *   HASHMAP(, ) *h - hashmap pointer
 */
-double hashmap_collisions_mean(const struct hashmap *map);
+#define hashmap_collisions_mean(h) \
+    hashmap_base_collisions_mean(&(h)->map_base)
 /*
- * Return the variance between entry collisions.
The higher the variance, + * Return the variance between entry collisions. The higher the variance, * the more likely the hash function is poor and is resulting in clustering. + * + * Parameters: + * HASHMAP(<key_type>, <data_type>) *h - hashmap pointer */ -double hashmap_collisions_variance(const struct hashmap *map); -#endif +#define hashmap_collisions_variance(h) \ + hashmap_base_collisions_variance(&(h)->map_base) +#ifdef __cplusplus +} +#endif diff --git a/include/hashmap_base.h b/include/hashmap_base.h new file mode 100644 index 0000000..cdf1aac --- /dev/null +++ b/include/hashmap_base.h @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2016-2020 David Leeds + * + * Hashmap is free software; you can redistribute it and/or modify + * it under the terms of the MIT license. See LICENSE for details. + */ + +#pragma once + +struct hashmap_entry; + +struct hashmap_base { + size_t table_size_init; + size_t table_size; + size_t size; + struct hashmap_entry *table; + size_t (*hash)(const void *); + int (*compare)(const void *, const void *); + void *(*key_dup)(const void *); + void (*key_free)(void *); +}; + +void hashmap_base_init(struct hashmap_base *hb, + size_t (*hash_func)(const void *), int (*compare_func)(const void *, const void *)); +void hashmap_base_cleanup(struct hashmap_base *hb); + +void hashmap_base_set_key_alloc_funcs(struct hashmap_base *hb, + void *(*key_dup_func)(const void *), void (*key_free_func)(void *)); + +int hashmap_base_reserve(struct hashmap_base *hb, size_t capacity); + +int hashmap_base_put(struct hashmap_base *hb, const void *key, void *data); +void *hashmap_base_get(const struct hashmap_base *hb, const void *key); +void *hashmap_base_remove(struct hashmap_base *hb, const void *key); + +void hashmap_base_clear(struct hashmap_base *hb); +void hashmap_base_reset(struct hashmap_base *hb); + +struct hashmap_entry *hashmap_base_iter(const struct hashmap_base *hb, + const struct hashmap_entry *pos); +bool hashmap_base_iter_valid(const struct hashmap_base *hb, const struct
hashmap_entry *iter); +bool hashmap_base_iter_next(const struct hashmap_base *hb, struct hashmap_entry **iter); +bool hashmap_base_iter_remove(struct hashmap_base *hb, struct hashmap_entry **iter); +const void *hashmap_base_iter_get_key(const struct hashmap_entry *iter); +void *hashmap_base_iter_get_data(const struct hashmap_entry *iter); +int hashmap_base_iter_set_data(struct hashmap_entry *iter, void *data); + +double hashmap_base_load_factor(const struct hashmap_base *hb); +size_t hashmap_base_collisions(const struct hashmap_base *hb, const void *key); +double hashmap_base_collisions_mean(const struct hashmap_base *hb); +double hashmap_base_collisions_variance(const struct hashmap_base *hb); + +size_t hashmap_hash_default(const void *data, size_t len); +size_t hashmap_hash_string(const char *key); +size_t hashmap_hash_string_i(const char *key); + diff --git a/src/hashmap.c b/src/hashmap.c index dfb7912..bc38ebc 100644 --- a/src/hashmap.c +++ b/src/hashmap.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2016-2018 David Leeds + * Copyright (c) 2016-2020 David Leeds * * Hashmap is free software; you can redistribute it and/or modify * it under the terms of the MIT license. See LICENSE for details. 
@@ -10,87 +10,78 @@ #include #include #include +#include #include -#include +#include -#ifndef HASHMAP_NOASSERT -#include -#define HASHMAP_ASSERT(expr) assert(expr) -#else -#define HASHMAP_ASSERT(expr) -#endif /* Table sizes must be powers of 2 */ -#define HASHMAP_SIZE_MIN (1 << 5) /* 32 */ -#define HASHMAP_SIZE_DEFAULT (1 << 8) /* 256 */ +#define HASHMAP_SIZE_MIN 32 +#define HASHMAP_SIZE_DEFAULT 128 #define HASHMAP_SIZE_MOD(map, val) ((val) & ((map)->table_size - 1)) -/* Limit for probing is 1/2 of table_size */ -#define HASHMAP_PROBE_LEN(map) ((map)->table_size >> 1) /* Return the next linear probe index */ #define HASHMAP_PROBE_NEXT(map, index) HASHMAP_SIZE_MOD(map, (index) + 1) -/* Check if index b is less than or equal to index a */ -#define HASHMAP_INDEX_LE(map, a, b) \ - ((a) == (b) || (((b) - (a)) & ((map)->table_size >> 1)) != 0) - struct hashmap_entry { void *key; void *data; -#ifdef HASHMAP_METRICS - size_t num_collisions; -#endif }; -/* - * Enforce a maximum 0.75 load factor. - */ -static inline size_t hashmap_table_min_size_calc(size_t num_entries) -{ - return num_entries + (num_entries / 3); -} - /* * Calculate the optimal table size, given the specified max number * of elements. 
*/ -static size_t hashmap_table_size_calc(size_t num_entries) +static inline size_t hashmap_calc_table_size(const struct hashmap_base *hb, size_t size) { size_t table_size; - size_t min_size; - table_size = hashmap_table_min_size_calc(num_entries); + /* Enforce a maximum 0.75 load factor */ + table_size = size + (size / 3); - /* Table size is always a power of 2 */ - min_size = HASHMAP_SIZE_MIN; - while (min_size < table_size) { - min_size <<= 1; + /* Ensure capacity is not lower than the hashmap initial size */ + if (table_size < hb->table_size_init) { + table_size = hb->table_size_init; + } else { + /* Round table size up to nearest power of 2 */ + table_size = 1UL << ((sizeof(unsigned long) << 3) - __builtin_clzl(table_size - 1)); } - return min_size; + + return table_size; } /* * Get a valid hash table index from a key. */ -static inline size_t hashmap_calc_index(const struct hashmap *map, - const void *key) +static inline size_t hashmap_calc_index(const struct hashmap_base *hb, const void *key) { - return HASHMAP_SIZE_MOD(map, map->hash(key)); + size_t index = hb->hash(key); + + /* + * Run a secondary hash on the index. This is a small performance hit, but + * reduces clustering and provides more consistent performance if a poor + * hash function is used. + */ + index = hashmap_hash_default(&index, sizeof(index)); + + return HASHMAP_SIZE_MOD(hb, index); } /* * Return the next populated entry, starting with the specified one. * Returns NULL if there are no more valid entries.
*/ -static struct hashmap_entry *hashmap_entry_get_populated( - const struct hashmap *map, struct hashmap_entry *entry) +static struct hashmap_entry *hashmap_entry_get_populated(const struct hashmap_base *hb, + const struct hashmap_entry *entry) { - for (; entry < &map->table[map->table_size]; ++entry) { - if (entry->key) { - return entry; + if (hb->size > 0) { + for (; entry < &hb->table[hb->table_size]; ++entry) { + if (entry->key) { + return (struct hashmap_entry *)entry; + } } } return NULL; @@ -100,79 +91,68 @@ static struct hashmap_entry *hashmap_entry_get_populated( * Find the hashmap entry with the specified key, or an empty slot. * Returns NULL if the entire table has been searched without finding a match. */ -static struct hashmap_entry *hashmap_entry_find(const struct hashmap *map, +static struct hashmap_entry *hashmap_entry_find(const struct hashmap_base *hb, const void *key, bool find_empty) { size_t i; size_t index; - size_t probe_len = HASHMAP_PROBE_LEN(map); struct hashmap_entry *entry; - index = hashmap_calc_index(map, key); + index = hashmap_calc_index(hb, key); /* Linear probing */ - for (i = 0; i < probe_len; ++i) { - entry = &map->table[index]; + for (i = 0; i < hb->table_size; ++i) { + entry = &hb->table[index]; if (!entry->key) { if (find_empty) { -#ifdef HASHMAP_METRICS - entry->num_collisions = i; -#endif return entry; } return NULL; } - if (map->key_compare(key, entry->key) == 0) { + if (hb->compare(key, entry->key) == 0) { return entry; } - index = HASHMAP_PROBE_NEXT(map, index); + index = HASHMAP_PROBE_NEXT(hb, index); } return NULL; } /* - * Removes the specified entry and processes the proceeding entries to reduce - * the load factor and keep the chain continuous. This is a required - * step for hash maps using linear probing. + * Removes the specified entry and processes the following entries to + * keep the chain contiguous. This is a required step for hash maps + * using linear probing. 
*/ -static void hashmap_entry_remove(struct hashmap *map, - struct hashmap_entry *removed_entry) +static void hashmap_entry_remove(struct hashmap_base *hb, struct hashmap_entry *removed_entry) { size_t i; -#ifdef HASHMAP_METRICS - size_t removed_i = 0; -#endif size_t index; size_t entry_index; - size_t removed_index = (removed_entry - map->table); + size_t removed_index = (removed_entry - hb->table); struct hashmap_entry *entry; /* Free the key */ - if (map->key_free) { - map->key_free(removed_entry->key); + if (hb->key_free) { + hb->key_free(removed_entry->key); } - --map->num_entries; + --hb->size; /* Fill the free slot in the chain */ - index = HASHMAP_PROBE_NEXT(map, removed_index); - for (i = 1; i < map->table_size; ++i) { - entry = &map->table[index]; + index = HASHMAP_PROBE_NEXT(hb, removed_index); + for (i = 0; i < hb->size; ++i) { + entry = &hb->table[index]; if (!entry->key) { /* Reached end of chain */ break; } - entry_index = hashmap_calc_index(map, entry->key); - /* Shift in entries with an index <= to the removed slot */ - if (HASHMAP_INDEX_LE(map, removed_index, entry_index)) { -#ifdef HASHMAP_METRICS - entry->num_collisions -= (i - removed_i); - removed_i = i; -#endif - memcpy(removed_entry, entry, sizeof(*removed_entry)); + entry_index = hashmap_calc_index(hb, entry->key); + /* Shift in entries in the chain with an index at or before the removed slot */ + if (HASHMAP_SIZE_MOD(hb, index - entry_index) > + HASHMAP_SIZE_MOD(hb, removed_index - entry_index)) { + *removed_entry = *entry; removed_index = index; removed_entry = entry; } - index = HASHMAP_PROBE_NEXT(map, index); + index = HASHMAP_PROBE_NEXT(hb, index); } /* Clear the last removed entry */ memset(removed_entry, 0, sizeof(*removed_entry)); @@ -183,7 +163,7 @@ static void hashmap_entry_remove(struct hashmap *map, * new_size MUST be a power of 2. * Returns 0 on success and -errno on allocation or hash function failure. 
*/ -static int hashmap_rehash(struct hashmap *map, size_t new_size) +static int hashmap_rehash(struct hashmap_base *hb, size_t table_size) { size_t old_size; struct hashmap_entry *old_table; @@ -191,59 +171,48 @@ static int hashmap_rehash(struct hashmap *map, size_t new_size) struct hashmap_entry *entry; struct hashmap_entry *new_entry; - HASHMAP_ASSERT(new_size >= HASHMAP_SIZE_MIN); - HASHMAP_ASSERT((new_size & (new_size - 1)) == 0); + assert((table_size & (table_size - 1)) == 0); + assert(table_size >= hb->size); - new_table = (struct hashmap_entry *)calloc(new_size, - sizeof(struct hashmap_entry)); + new_table = (struct hashmap_entry *)calloc(table_size, sizeof(struct hashmap_entry)); if (!new_table) { return -ENOMEM; } - /* Backup old elements in case of rehash failure */ - old_size = map->table_size; - old_table = map->table; - map->table_size = new_size; - map->table = new_table; + old_size = hb->table_size; + old_table = hb->table; + hb->table_size = table_size; + hb->table = new_table; + /* Rehash */ for (entry = old_table; entry < &old_table[old_size]; ++entry) { - if (!entry->data) { - /* Only copy entries with data */ + if (!entry->key) { continue; } - new_entry = hashmap_entry_find(map, entry->key, true); - if (!new_entry) { - /* - * The load factor is too high with the new table - * size, or a poor hash function was used. - */ - goto revert; - } - /* Shallow copy (intentionally omits num_collisions) */ - new_entry->key = entry->key; - new_entry->data = entry->data; + new_entry = hashmap_entry_find(hb, entry->key, true); + /* Failure indicates an algorithm bug */ + assert(new_entry != NULL); + + /* Shallow copy */ + *new_entry = *entry; } free(old_table); return 0; -revert: - map->table_size = old_size; - map->table = old_table; - free(new_table); - return -EINVAL; } /* * Iterate through all entries and free all keys. 
*/ -static void hashmap_free_keys(struct hashmap *map) +static void hashmap_free_keys(struct hashmap_base *hb) { - struct hashmap_iter *iter; + struct hashmap_entry *entry; - if (!map->key_free) { + if (!hb->key_free || hb->size == 0) { return; } - for (iter = hashmap_iter(map); iter; - iter = hashmap_iter_next(map, iter)) { - map->key_free((void *)hashmap_iter_get_key(iter)); + for (entry = hb->table; entry < &hb->table[hb->table_size]; ++entry) { + if (entry->key) { + hb->key_free(entry->key); + } } } @@ -251,138 +220,139 @@ static void hashmap_free_keys(struct hashmap *map) * Initialize an empty hashmap. * * hash_func should return an even distribution of numbers between 0 - * and SIZE_MAX varying on the key provided. If set to NULL, the default - * case-sensitive string hash function is used: hashmap_hash_string + * and SIZE_MAX varying on the key provided. * - * key_compare_func should return 0 if the keys match, and non-zero otherwise. - * If set to NULL, the default case-sensitive string comparator function is - * used: hashmap_compare_string - * - * initial_size is optional, and may be set to the max number of entries - * expected to be put in the hash table. This is used as a hint to - * pre-allocate the hash table to the minimum size needed to avoid - * gratuitous rehashes. If initial_size is 0, a default size will be used. - * - * Returns 0 on success and -errno on failure. + * compare_func should return 0 if the keys match, and non-zero otherwise. 
*/ -int hashmap_init(struct hashmap *map, size_t (*hash_func)(const void *), - int (*key_compare_func)(const void *, const void *), - size_t initial_size) +void hashmap_base_init(struct hashmap_base *hb, + size_t (*hash_func)(const void *), int (*compare_func)(const void *, const void *)) { - HASHMAP_ASSERT(map != NULL); + assert(hash_func != NULL); + assert(compare_func != NULL); - if (!initial_size) { - initial_size = HASHMAP_SIZE_DEFAULT; - } else { - /* Convert init size to valid table size */ - initial_size = hashmap_table_size_calc(initial_size); - } - map->table_size_init = initial_size; - map->table_size = initial_size; - map->num_entries = 0; - map->table = (struct hashmap_entry *)calloc(initial_size, - sizeof(struct hashmap_entry)); - if (!map->table) { - return -ENOMEM; - } - map->hash = hash_func ? - hash_func : hashmap_hash_string; - map->key_compare = key_compare_func ? - key_compare_func : hashmap_compare_string; - map->key_alloc = NULL; - map->key_free = NULL; - return 0; + memset(hb, 0, sizeof(*hb)); + + hb->table_size_init = HASHMAP_SIZE_DEFAULT; + hb->hash = hash_func; + hb->compare = compare_func; } /* * Free the hashmap and all associated memory. */ -void hashmap_destroy(struct hashmap *map) +void hashmap_base_cleanup(struct hashmap_base *hb) { - if (!map) { + if (!hb) { return; } - hashmap_free_keys(map); - free(map->table); - memset(map, 0, sizeof(*map)); + hashmap_free_keys(hb); + free(hb->table); + memset(hb, 0, sizeof(*hb)); } /* * Enable internal memory management of hash keys. */ -void hashmap_set_key_alloc_funcs(struct hashmap *map, - void *(*key_alloc_func)(const void *), +void hashmap_base_set_key_alloc_funcs(struct hashmap_base *hb, + void *(*key_dup_func)(const void *), void (*key_free_func)(void *)) { - HASHMAP_ASSERT(map != NULL); + hb->key_dup = key_dup_func; + hb->key_free = key_free_func; +} + +/* + * Set the hashmap's initial allocation size such that no rehashes are + * required to fit the specified number of entries. 
+ * Returns 0 on success, or -errno on failure. + */ +int hashmap_base_reserve(struct hashmap_base *hb, size_t capacity) +{ + size_t old_size_init; + int r = 0; - map->key_alloc = key_alloc_func; - map->key_free = key_free_func; + /* Backup original init size in case of failure */ + old_size_init = hb->table_size_init; + + /* Set the minimal table init size to support the specified capacity */ + hb->table_size_init = HASHMAP_SIZE_MIN; + hb->table_size_init = hashmap_calc_table_size(hb, capacity); + + if (hb->table_size_init > hb->table_size) { + r = hashmap_rehash(hb, hb->table_size_init); + if (r < 0) { + hb->table_size_init = old_size_init; + } + } + return r; } /* - * Add an entry to the hashmap. If an entry with a matching key already - * exists and has a data pointer associated with it, the existing data - * pointer is returned, instead of assigning the new value. Compare - * the return value with the data passed in to determine if a new entry was - * created. Returns NULL if memory allocation failed. + * Add a new entry to the hashmap. If an entry with a matching key + * already exists -EEXIST is returned. + * Returns 0 on success, or -errno on failure. 
*/ -void *hashmap_put(struct hashmap *map, const void *key, void *data) +int hashmap_base_put(struct hashmap_base *hb, const void *key, void *data) { struct hashmap_entry *entry; + size_t table_size; + int r = 0; - HASHMAP_ASSERT(map != NULL); - HASHMAP_ASSERT(key != NULL); + if (!key || !data) { + return -EINVAL; + } - /* Rehash with 2x capacity if load factor is approaching 0.75 */ - if (map->table_size <= hashmap_table_min_size_calc(map->num_entries)) { - hashmap_rehash(map, map->table_size << 1); + /* Preemptively rehash with 2x capacity if load factor is approaching 0.75 */ + table_size = hashmap_calc_table_size(hb, hb->size); + if (table_size > hb->table_size) { + r = hashmap_rehash(hb, table_size); } - entry = hashmap_entry_find(map, key, true); + + /* Get the entry for this key */ + entry = hashmap_entry_find(hb, key, true); if (!entry) { /* - * Cannot find an empty slot. Either out of memory, or using - * a poor hash function. Attempt to rehash once to reduce - * chain length. + * Cannot find an empty slot. Either out of memory, + * or hash or compare functions are malfunctioning. 
*/ - if (hashmap_rehash(map, map->table_size << 1) < 0) { - return NULL; - } - entry = hashmap_entry_find(map, key, true); - if (!entry) { - return NULL; + if (r < 0) { + /* Return rehash error, if set */ + return r; } + return -EADDRNOTAVAIL; + } + + if (entry->key) { + /* Do not overwrite existing data */ + return -EEXIST; } - if (!entry->key) { + + if (hb->key_dup) { /* Allocate copy of key to simplify memory management */ - if (map->key_alloc) { - entry->key = map->key_alloc(key); - if (!entry->key) { - return NULL; - } - } else { - entry->key = (void *)key; + entry->key = hb->key_dup(key); + if (!entry->key) { + return -ENOMEM; } - ++map->num_entries; - } else if (entry->data) { - /* Do not overwrite existing data */ - return entry->data; + } else { + entry->key = (void *)key; } entry->data = data; - return data; + ++hb->size; + return 0; } /* * Return the data pointer, or NULL if no entry exists. */ -void *hashmap_get(const struct hashmap *map, const void *key) +void *hashmap_base_get(const struct hashmap_base *hb, const void *key) { struct hashmap_entry *entry; - HASHMAP_ASSERT(map != NULL); - HASHMAP_ASSERT(key != NULL); + if (!key) { + return NULL; + } - entry = hashmap_entry_find(map, key, false); + entry = hashmap_entry_find(hb, key, false); if (!entry) { return NULL; } @@ -393,251 +363,248 @@ void *hashmap_get(const struct hashmap *map, const void *key) * Remove an entry with the specified key from the map. * Returns the data pointer, or NULL, if no entry was found. 
*/ -void *hashmap_remove(struct hashmap *map, const void *key) +void *hashmap_base_remove(struct hashmap_base *hb, const void *key) { struct hashmap_entry *entry; void *data; - HASHMAP_ASSERT(map != NULL); - HASHMAP_ASSERT(key != NULL); + if (!key) { + return NULL; + } - entry = hashmap_entry_find(map, key, false); + entry = hashmap_entry_find(hb, key, false); if (!entry) { return NULL; } data = entry->data; /* Clear the entry and make the chain contiguous */ - hashmap_entry_remove(map, entry); + hashmap_entry_remove(hb, entry); return data; } /* * Remove all entries. */ -void hashmap_clear(struct hashmap *map) +void hashmap_base_clear(struct hashmap_base *hb) { - HASHMAP_ASSERT(map != NULL); - - hashmap_free_keys(map); - map->num_entries = 0; - memset(map->table, 0, sizeof(struct hashmap_entry) * map->table_size); + hashmap_free_keys(hb); + hb->size = 0; + memset(hb->table, 0, sizeof(struct hashmap_entry) * hb->table_size); } /* * Remove all entries and reset the hash table to its initial size. */ -void hashmap_reset(struct hashmap *map) +void hashmap_base_reset(struct hashmap_base *hb) { struct hashmap_entry *new_table; - HASHMAP_ASSERT(map != NULL); - - hashmap_clear(map); - if (map->table_size == map->table_size_init) { - return; - } - new_table = (struct hashmap_entry *)realloc(map->table, - sizeof(struct hashmap_entry) * map->table_size_init); - if (!new_table) { - return; + hashmap_free_keys(hb); + hb->size = 0; + if (hb->table_size != hb->table_size_init) { + new_table = (struct hashmap_entry *)realloc(hb->table, + sizeof(struct hashmap_entry) * hb->table_size_init); + if (new_table) { + hb->table = new_table; + hb->table_size = hb->table_size_init; + } } - map->table = new_table; - map->table_size = map->table_size_init; -} - -/* - * Return the number of entries in the hash map. 
- */ -size_t hashmap_size(const struct hashmap *map) -{ - HASHMAP_ASSERT(map != NULL); - - return map->num_entries; + memset(hb->table, 0, sizeof(struct hashmap_entry) * hb->table_size); } /* - * Get a new hashmap iterator. The iterator is an opaque + * Get a new hashmap iterator. The iterator is an opaque * pointer that may be used with hashmap_iter_*() functions. * Hashmap iterators are INVALID after a put or remove operation is performed. * hashmap_iter_remove() allows safe removal during iteration. */ -struct hashmap_iter *hashmap_iter(const struct hashmap *map) +struct hashmap_entry *hashmap_base_iter(const struct hashmap_base *hb, + const struct hashmap_entry *pos) { - HASHMAP_ASSERT(map != NULL); - - if (!map->num_entries) { - return NULL; + if (!pos) { + pos = hb->table; } - return (struct hashmap_iter *)hashmap_entry_get_populated(map, - map->table); + return hashmap_entry_get_populated(hb, pos); } /* - * Return an iterator to the next hashmap entry. Returns NULL if there are - * no more entries. + * Return true if an iterator is valid and safe to use. */ -struct hashmap_iter *hashmap_iter_next(const struct hashmap *map, - const struct hashmap_iter *iter) +bool hashmap_base_iter_valid(const struct hashmap_base *hb, const struct hashmap_entry *iter) { - struct hashmap_entry *entry = (struct hashmap_entry *)iter; - - HASHMAP_ASSERT(map != NULL); + return hb && iter && iter->key && iter >= hb->table && iter < &hb->table[hb->table_size]; +} - if (!iter) { - return NULL; +/* + * Advance an iterator to the next hashmap entry. + * Returns false if there are no more entries. + */ +bool hashmap_base_iter_next(const struct hashmap_base *hb, struct hashmap_entry **iter) +{ + if (!*iter) { + return false; } - return (struct hashmap_iter *)hashmap_entry_get_populated(map, - entry + 1); + return (*iter = hashmap_entry_get_populated(hb, *iter + 1)) != NULL; } /* - * Remove the hashmap entry pointed to by this iterator and return an - * iterator to the next entry. 
Returns NULL if there are no more entries. + * Remove the hashmap entry pointed to by this iterator and advance the + * iterator to the next entry. + * Returns true if the iterator is valid after the operation. */ -struct hashmap_iter *hashmap_iter_remove(struct hashmap *map, - const struct hashmap_iter *iter) +bool hashmap_base_iter_remove(struct hashmap_base *hb, struct hashmap_entry **iter) { - struct hashmap_entry *entry = (struct hashmap_entry *)iter; - - HASHMAP_ASSERT(map != NULL); - - if (!iter) { - return NULL; + if (!*iter) { + return false; } - if (!entry->key) { - /* Iterator is invalid, so just return the next valid entry */ - return hashmap_iter_next(map, iter); + if ((*iter)->key) { + /* Remove entry if iterator is valid */ + hashmap_entry_remove(hb, *iter); } - hashmap_entry_remove(map, entry); - return (struct hashmap_iter *)hashmap_entry_get_populated(map, entry); + return (*iter = hashmap_entry_get_populated(hb, *iter)) != NULL; } /* * Return the key of the entry pointed to by the iterator. */ -const void *hashmap_iter_get_key(const struct hashmap_iter *iter) +const void *hashmap_base_iter_get_key(const struct hashmap_entry *iter) { if (!iter) { return NULL; } - return (const void *)((struct hashmap_entry *)iter)->key; + return (const void *)iter->key; } /* * Return the data of the entry pointed to by the iterator. */ -void *hashmap_iter_get_data(const struct hashmap_iter *iter) +void *hashmap_base_iter_get_data(const struct hashmap_entry *iter) { if (!iter) { return NULL; } - return ((struct hashmap_entry *)iter)->data; + return iter->data; } /* * Set the data pointer of the entry pointed to by the iterator. 
*/ -void hashmap_iter_set_data(const struct hashmap_iter *iter, void *data) +int hashmap_base_iter_set_data(struct hashmap_entry *iter, void *data) { if (!iter) { - return; + return -EFAULT; + } + if (!data) { + return -EINVAL; } - ((struct hashmap_entry *)iter)->data = data; + iter->data = data; + return 0; +} + +/* + * Return the load factor. + */ +double hashmap_base_load_factor(const struct hashmap_base *hb) +{ + if (!hb->table_size) { + return 0; + } + return (double)hb->size / hb->table_size; } /* - * Invoke func for each entry in the hashmap. Unlike the hashmap_iter_*() - * interface, this function supports calls to hashmap_remove() during iteration. - * However, it is an error to put or remove an entry other than the current one, - * and doing so will immediately halt iteration and return an error. - * Iteration is stopped if func returns non-zero. Returns func's return - * value if it is < 0, otherwise, 0. + * Return the number of collisions for this key. + * This would always be 0 if a perfect hash function was used, but in ordinary + * usage, there may be a few collisions, depending on the hash function and + * load factor. 
*/ -int hashmap_foreach(const struct hashmap *map, - int (*func)(const void *, void *, void *), void *arg) +size_t hashmap_base_collisions(const struct hashmap_base *hb, const void *key) { + size_t i; + size_t index; struct hashmap_entry *entry; - size_t num_entries; - const void *key; - int rc; - HASHMAP_ASSERT(map != NULL); - HASHMAP_ASSERT(func != NULL); + if (!key) { + return 0; + } + + index = hashmap_calc_index(hb, key); - entry = map->table; - for (entry = map->table; entry < &map->table[map->table_size]; - ++entry) { + /* Linear probing */ + for (i = 0; i < hb->table_size; ++i) { + entry = &hb->table[index]; if (!entry->key) { - continue; - } - num_entries = map->num_entries; - key = entry->key; - rc = func(entry->key, entry->data, arg); - if (rc < 0) { - return rc; - } - if (rc > 0) { + /* Key does not exist */ return 0; } - /* Run this entry again if func() deleted it */ - if (entry->key != key) { - --entry; - } else if (num_entries != map->num_entries) { - /* Stop immediately if func put/removed another entry */ - return -1; + if (hb->compare(key, entry->key) == 0) { + break; } + index = HASHMAP_PROBE_NEXT(hb, index); } - return 0; + + return i; } /* - * Default hash function for string keys. - * This is an implementation of the well-documented Jenkins one-at-a-time - * hash function. + * Return the average number of collisions per entry. 
*/ -size_t hashmap_hash_string(const void *key) +double hashmap_base_collisions_mean(const struct hashmap_base *hb) { - const char *key_str = (const char *)key; - size_t hash = 0; + struct hashmap_entry *entry; + size_t total_collisions = 0; - for (; *key_str; ++key_str) { - hash += *key_str; - hash += (hash << 10); - hash ^= (hash >> 6); + if (!hb->size) { + return 0; } - hash += (hash << 3); - hash ^= (hash >> 11); - hash += (hash << 15); - return hash; -} + for (entry = hb->table; entry < &hb->table[hb->table_size]; ++entry) { + if (!entry->key) { + continue; + } -/* - * Default key comparator function for string keys. - */ -int hashmap_compare_string(const void *a, const void *b) -{ - return strcmp((const char *)a, (const char *)b); + total_collisions += hashmap_base_collisions(hb, entry->key); + } + return (double)total_collisions / hb->size; } /* - * Default key allocation function for string keys. Use free() for the - * key_free_func. + * Return the variance between entry collisions. The higher the variance, + * the more likely the hash function is poor and is resulting in clustering. */ -void *hashmap_alloc_key_string(const void *key) +double hashmap_base_collisions_variance(const struct hashmap_base *hb) { - return (void *)strdup((const char *)key); + struct hashmap_entry *entry; + double mean_collisions; + double variance; + double total_variance = 0; + + if (!hb->size) { + return 0; + } + mean_collisions = hashmap_base_collisions_mean(hb); + for (entry = hb->table; entry < &hb->table[hb->table_size]; ++entry) { + if (!entry->key) { + continue; + } + variance = (double)hashmap_base_collisions(hb, entry->key) - mean_collisions; + total_variance += variance * variance; + } + return total_variance / hb->size; } /* - * Case insensitive hash function for string keys. + * Recommended hash function for data keys. + * + * This is an implementation of the well-documented Jenkins one-at-a-time + * hash function. 
See https://en.wikipedia.org/wiki/Jenkins_hash_function
  */
-size_t hashmap_hash_string_i(const void *key)
+size_t hashmap_hash_default(const void *data, size_t len)
 {
-    const char *key_str = (const char *)key;
+    const uint8_t *byte = (const uint8_t *)data;
     size_t hash = 0;

-    for (; *key_str; ++key_str) {
-        hash += tolower(*key_str);
+    for (size_t i = 0; i < len; ++i) {
+        hash += *byte++;
         hash += (hash << 10);
         hash ^= (hash >> 6);
     }
@@ -648,76 +615,40 @@ size_t hashmap_hash_string_i(const void *key)
 }

 /*
- * Case insensitive key comparator function for string keys.
- */
-int hashmap_compare_string_i(const void *a, const void *b)
-{
-    return strcasecmp((const char *)a, (const char *)b);
-}
-
-
-#ifdef HASHMAP_METRICS
-/*
- * Return the load factor.
- */
-double hashmap_load_factor(const struct hashmap *map)
-{
-    HASHMAP_ASSERT(map != NULL);
-
-    if (!map->table_size) {
-        return 0;
-    }
-    return (double)map->num_entries / map->table_size;
-}
-
-/*
- * Return the average number of collisions per entry.
+ * Recommended hash function for string keys.
+ *
+ * This is an implementation of the well-documented Jenkins one-at-a-time
+ * hash function. See https://en.wikipedia.org/wiki/Jenkins_hash_function
  */
-double hashmap_collisions_mean(const struct hashmap *map)
+size_t hashmap_hash_string(const char *key)
 {
-    struct hashmap_entry *entry;
-    size_t total_collisions = 0;
-
-    HASHMAP_ASSERT(map != NULL);
+    size_t hash = 0;

-    if (!map->num_entries) {
-        return 0;
-    }
-    for (entry = map->table; entry < &map->table[map->table_size];
-            ++entry) {
-        if (!entry->key) {
-            continue;
-        }
-        total_collisions += entry->num_collisions;
+    for (; *key; ++key) {
+        hash += *key;
+        hash += (hash << 10);
+        hash ^= (hash >> 6);
     }
-    return (double)total_collisions / map->num_entries;
+    hash += (hash << 3);
+    hash ^= (hash >> 11);
+    hash += (hash << 15);
+    return hash;
 }

 /*
- * Return the variance between entry collisions. The higher the variance,
- * the more likely the hash function is poor and is resulting in clustering.
+ * Case insensitive hash function for string keys.
  */
-double hashmap_collisions_variance(const struct hashmap *map)
+size_t hashmap_hash_string_i(const char *key)
 {
-    struct hashmap_entry *entry;
-    double mean_collisions;
-    double variance;
-    double total_variance = 0;
-
-    HASHMAP_ASSERT(map != NULL);
+    size_t hash = 0;

-    if (!map->num_entries) {
-        return 0;
-    }
-    mean_collisions = hashmap_collisions_mean(map);
-    for (entry = map->table; entry < &map->table[map->table_size];
-            ++entry) {
-        if (!entry->key) {
-            continue;
-        }
-        variance = (double)entry->num_collisions - mean_collisions;
-        total_variance += variance * variance;
+    for (; *key; ++key) {
+        hash += tolower(*key);
+        hash += (hash << 10);
+        hash ^= (hash >> 6);
     }
-    return total_variance / map->num_entries;
+    hash += (hash << 3);
+    hash ^= (hash >> 11);
+    hash += (hash << 15);
+    return hash;
 }
-#endif
diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt
index b1a1b53..6c4c125 100644
--- a/test/CMakeLists.txt
+++ b/test/CMakeLists.txt
@@ -1,8 +1,14 @@
 cmake_minimum_required(VERSION 3.5)

-add_executable(hashmap_test ../src/hashmap.c hashmap_test.c)
+# Hashmap unit test
+add_executable(hashmap_test hashmap_test.c)
 target_compile_options(hashmap_test PRIVATE $<$:-Wall -Werror>)
-target_include_directories(hashmap_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/../include)
+target_link_libraries(hashmap_test PRIVATE HashMap::HashMap)

-# Build hashmap library with metrics enabled
-target_compile_definitions(hashmap_test PRIVATE HASHMAP_METRICS)
+# Register with CTest
+add_test(NAME hashmap_test COMMAND hashmap_test)
+
+# Hashmap example
+add_executable(hashmap_example hashmap_example.c)
+target_compile_options(hashmap_example PRIVATE $<$:-Wall -Werror>)
+target_link_libraries(hashmap_example PRIVATE HashMap::HashMap)
\ No newline at end of file
diff --git a/test/hashmap_example.c b/test/hashmap_example.c
new file mode 100644
index 0000000..a46cf64
--- /dev/null
+++ b/test/hashmap_example.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2016-2020 David Leeds
+ *
+ * Hashmap is free software; you can redistribute it and/or modify
+ * it under the terms of the MIT license. See LICENSE for details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+/* Some sample data structure with a string key */
+struct blob {
+    char key[32];
+    size_t data_len;
+    unsigned char data[1024];
+};
+
+/*
+ * Contrived function to allocate blob structures and populate
+ * them with randomized data.
+ *
+ * Returns NULL when there are no more blobs to load.
+ */
+struct blob *blob_load(void)
+{
+    static size_t count = 0;
+    struct blob *b;
+
+    if (count++ > 100) {
+        return NULL;
+    }
+
+    if ((b = malloc(sizeof(*b))) == NULL) {
+        return NULL;
+    }
+    snprintf(b->key, sizeof(b->key), "%02lx", random() % 100);
+    b->data_len = random() % 10;
+    memset(b->data, random(), b->data_len);
+
+    return b;
+}
+
+int main(int argc, char **argv)
+{
+    /* Declare type-specific hashmap structure */
+    HASHMAP(char, struct blob) map;
+    const char *key;
+    struct blob *b;
+    void *temp;
+    int r;
+
+    /* Initialize with default string key hash function and comparator */
+    hashmap_init(&map, hashmap_hash_string, strcmp);
+
+    /* Load some sample data into the map and discard duplicates */
+    while ((b = blob_load()) != NULL) {
+        r = hashmap_put(&map, b->key, b);
+        if (r < 0) {
+            /* Expect -EEXIST return value for duplicates */
+            printf("putting blob[%s] failed: %s\n", b->key, strerror(-r));
+            free(b);
+        }
+    }
+
+    /* Lookup a blob with key "AbCdEf" */
+    b = hashmap_get(&map, "AbCdEf");
+    if (b) {
+        printf("Found blob[%s]\n", b->key);
+    }
+
+    /* Iterate through all blobs and print each one */
+    hashmap_foreach(key, b, &map) {
+        printf("blob[%s]: data_len %zu bytes\n", key, b->data_len);
+    }
+
+    /* Remove all blobs with no data (using remove-safe foreach macro) */
+    hashmap_foreach_data_safe(b, &map, temp) {
+        if (b->data_len == 0) {
+            printf("Discarding blob[%s] with no data\n", b->key);
+            hashmap_remove(&map, b->key);
+            free(b);
+        }
+    }
+
+    /* Cleanup time: free all the blobs, and destruct the hashmap */
+    hashmap_foreach_data(b, &map) {
+        free(b);
+    }
+    hashmap_cleanup(&map);
+
+    return 0;
+}
+
diff --git a/test/hashmap_test.c b/test/hashmap_test.c
index 6235edb..73949aa 100644
--- a/test/hashmap_test.c
+++ b/test/hashmap_test.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016-2018 David Leeds
+ * Copyright (c) 2016-2020 David Leeds
  *
  * Hashmap is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
@@ -12,37 +12,34 @@
 #include
 #include
 #include
+#include

 #include

-#define ARRAY_LEN(array) (sizeof(array) / sizeof(array[0]))
+#define ARRAY_SIZE(array) (sizeof(array) / sizeof(array[0]))

-#define TEST_NUM_KEYS 196607 /* Results in max load factor */
-#define TEST_KEY_STR_LEN 32
+#define TEST_NUM_KEYS 196607 /* Results in max load factor */
+#define TEST_KEY_STR_LEN 32

 void **keys_str_random;
 void **keys_str_sequential;
 void **keys_int_random;
 void **keys_int_sequential;

-struct hashmap str_map;
-struct hashmap int_map;
+HASHMAP(char, void) str_map;
+HASHMAP(uint64_t, uint64_t) int_map;
+typedef HASHMAP(void, void) hashmap_void_t;

 struct test {
     const char *name;
     const char *description;
-    bool (*run)(struct hashmap *map, void **keys);
+    bool (*run)(hashmap_void_t *map, void **keys);
     bool pre_load;
 };

-/*
- * Test type-specific generation macros
- */
-HASHMAP_FUNCS_DECLARE(test, const void, void)
-HASHMAP_FUNCS_CREATE(test, const void, void)


-uint64_t test_time_us(void)
+uint64_t time_mono_us(void)
 {
     struct timespec now;

@@ -93,9 +90,9 @@ void *test_key_alloc_random_int(void)
         exit(1);
     }
     /* RAND_MAX is not guaranteed to be more than 32K */
-    *key = (uint64_t)(random() & 0xffff) << 48 |
-        (uint64_t)(random() & 0xffff) << 32 |
-        (uint64_t)(random() & 0xffff) << 16 |
+    *key = ((uint64_t)(random() & 0xffff) << 48) |
+        ((uint64_t)(random() & 0xffff) << 32) |
+        ((uint64_t)(random() & 0xffff) << 16) |
         (uint64_t)(random() & 0xffff);
     return key;
 }
@@ -148,36 +145,37 @@ void test_keys_generate(void)
     keys_int_sequential[i] = NULL;
 }

-void test_load_keys(struct hashmap *map, void **keys)
+void test_load_keys(hashmap_void_t *map, void **keys)
 {
     void **key;
+    int r;

     for (key = keys; *key; ++key) {
-        if (!test_hashmap_put(map, *key, *key)) {
-            printf("hashmap_put() failed");
+        r = hashmap_put(map, *key, *key);
+        if (r < 0) {
+            printf("hashmap_put() failed: %s\n", strerror(-r));
             exit(1);
         }
     }
 }

-void test_reset_map(struct hashmap *map)
+void test_reset_map(hashmap_void_t *map)
 {
     hashmap_reset(map);
 }

-void test_print_stats(struct hashmap *map, const char *label)
+void test_print_stats(hashmap_void_t *map, const char *label)
 {
     printf("Hashmap stats: %s\n", label);
     printf(" # entries: %zu\n", hashmap_size(map));
-    printf(" Table size: %zu\n", map->table_size);
+    printf(" Table size: %zu\n", map->map_base.table_size);
     printf(" Load factor: %.4f\n", hashmap_load_factor(map));
     printf(" Collisions mean: %.4f\n", hashmap_collisions_mean(map));
-    printf(" Collisions variance: %.4f\n",
-        hashmap_collisions_variance(map));
+    printf(" Collisions variance: %.4f\n", hashmap_collisions_variance(map));
 }

-bool test_run(struct hashmap *map, void **keys, const struct test *t)
+bool test_run(hashmap_void_t *map, void **keys, const struct test *t)
 {
     bool success;
     uint64_t time_us;
@@ -192,13 +190,13 @@ bool test_run(struct hashmap *map, void **keys, const struct test *t)
         printf("done\n");
     }
     printf("Running...\n");
-    time_us = test_time_us();
+    time_us = time_mono_us();
     success = t->run(map, keys);
-    time_us = test_time_us() - time_us;
+    time_us = time_mono_us() - time_us;
     if (success) {
         printf("Completed successfully\n");
     } else {
-        printf("Failed\n");
+        printf("FAILED\n");
     }
     printf("Run time: %llu microseconds\n", (long long unsigned)time_us);
     test_print_stats(map, t->name);
@@ -206,7 +204,7 @@ bool test_run(struct hashmap *map, void **keys, const struct test *t)
     return success;
 }

-bool test_run_all(struct hashmap *map, void **keys,
+bool test_run_all(hashmap_void_t *map, void **keys,
         const struct test *tests, size_t num_tests, const char *env)
 {
     const struct test *t;
@@ -217,8 +215,7 @@ bool test_run_all(struct hashmap *map, void **keys,
     printf(" %s\n", env);
     printf("**************************************************\n\n");
     for (t = tests; t < &tests[num_tests]; ++t) {
-        printf("\n**************************************************"
-            "\n");
+        printf("\n**************************************************\n");
         printf("Test %02u: %s\n", (unsigned)(t - tests) + 1, t->name);
         if (t->description) {
             printf(" Description: %s\n", t->description);
@@ -236,74 +233,85 @@ bool test_run_all(struct hashmap *map, void **keys,
     return (num_failed == 0);
 }

-size_t test_hash_uint64(const void *key)
+/*
+ * Worst case hash function.
+ */
+size_t test_hash_uint64_bad1(const uint64_t *key)
 {
-    const uint8_t *byte = (const uint8_t *)key;
-    uint8_t i;
-    size_t hash = 0;
-
-    for (i = 0; i < sizeof(uint64_t); ++i, ++byte) {
-        hash += *byte;
-        hash += (hash << 10);
-        hash ^= (hash >> 6);
-    }
-    hash += (hash << 3);
-    hash ^= (hash >> 11);
-    hash += (hash << 15);
-    return hash;
+    return 999;
+}
+
+/*
+ * Potentially bad hash function. Depending on the linear probing
+ * implementation, this could cause clustering and long chains when
+ * consecutive numeric keys are loaded.
+ */
+size_t test_hash_uint64_bad2(const uint64_t *key)
+{
+    return *key;
+}
+
+/*
+ * Potentially bad hash function. Depending on the linear probing
+ * implementation, this could cause clustering and long chains when
+ * consecutive numeric keys are loaded.
+ */
+size_t test_hash_uint64_bad3(const uint64_t *key)
+{
+    return *key + *key;
+}
+
+/*
+ * Use generic hash algorithm supplied by the hashmap library.
+ */
+size_t test_hash_uint64(const uint64_t *key)
+{
+    return hashmap_hash_default(key, sizeof(*key));
 }

-int test_compare_uint64(const void *a, const void *b)
+int test_compare_uint64(const uint64_t *a, const uint64_t *b)
 {
-    return *(int64_t *)a - *(int64_t *)b;
+    return memcmp(a, b, sizeof(uint64_t));
 }

-bool test_put(struct hashmap *map, void **keys)
+bool test_put(hashmap_void_t *map, void **keys)
 {
     void **key;
-    void *data;
+    int r;

     for (key = keys; *key; ++key) {
-        data = test_hashmap_put(map, *key, *key);
-        if (!data) {
-            printf("malloc failed\n");
-            exit(1);
-        }
-        if (data != *key) {
-            printf("duplicate key found\n");
+        r = hashmap_put(map, *key, *key);
+        if (r < 0) {
+            printf("hashmap_put failed: %s\n", strerror(-r));
             return false;
         }
     }
     return true;
 }

-bool test_put_existing(struct hashmap *map, void **keys)
+bool test_put_existing(hashmap_void_t *map, void **keys)
 {
     void **key;
-    void *data;
+    int r;
     int temp_data = 99;

     for (key = keys; *key; ++key) {
-        data = hashmap_put(map, *key, &temp_data);
-        if (!data) {
-            printf("malloc failed\n");
-            exit(1);
-        }
-        if (data != *key) {
-            printf("did not return existing data\n");
+        r = hashmap_put(map, *key, &temp_data);
+        if (r != -EEXIST) {
+            printf("did not return existing data: %s\n", strerror(-r));
             return false;
         }
     }
     return true;
 }

-bool test_get(struct hashmap *map, void **keys)
+bool test_get(hashmap_void_t *map, void **keys)
 {
     void **key;
     void *data;

     for (key = keys; *key; ++key) {
-        data = test_hashmap_get(map, *key);
+        data = hashmap_get(map, *key);
         if (!data) {
             printf("entry not found\n");
             return false;
@@ -316,7 +324,7 @@ bool test_get(struct hashmap *map, void **keys)
     return true;
 }

-bool test_get_nonexisting(struct hashmap *map, void **keys)
+bool test_get_nonexisting(hashmap_void_t *map, void **keys)
 {
     void **key;
     void *data;
@@ -332,13 +340,13 @@ bool test_get_nonexisting(struct hashmap *map, void **keys)
     return true;
 }

-bool test_remove(struct hashmap *map, void **keys)
+bool test_remove(hashmap_void_t *map, void **keys)
 {
     void **key;
     void *data;

     for (key = keys; *key; ++key) {
-        data = test_hashmap_remove(map, *key);
+        data = hashmap_remove(map, *key);
         if (!data) {
             printf("entry not found\n");
             return false;
@@ -351,11 +359,12 @@ bool test_remove(struct hashmap *map, void **keys)
     return true;
 }

-bool test_put_remove(struct hashmap *map, void **keys)
+bool test_put_remove(hashmap_void_t *map, void **keys)
 {
     size_t i = 0;
     void **key;
     void *data;
+    int r;

     if (!test_put(map, keys)) {
         return false;
@@ -364,7 +373,7 @@ bool test_put_remove(struct hashmap *map, void **keys)
         if (i++ >= TEST_NUM_KEYS / 2) {
             break;
         }
-        data = test_hashmap_remove(map, *key);
+        data = hashmap_remove(map, *key);
         if (!data) {
             printf("key not found\n");
             return false;
@@ -380,26 +389,31 @@ bool test_put_remove(struct hashmap *map, void **keys)
         if (i++ >= TEST_NUM_KEYS / 2) {
             break;
         }
-        data = test_hashmap_put(map, *key, *key);
-        if (!data) {
-            printf("malloc failed\n");
-            exit(1);
-        }
-        if (data != *key) {
-            printf("duplicate key found\n");
+        r = hashmap_put(map, *key, *key);
+        if (r < 0) {
+            printf("hashmap_put failed: %s\n", strerror(-r));
             return false;
         }
     }
     return true;
 }

-bool test_iterate(struct hashmap *map, void **keys)
+bool test_iterate(hashmap_void_t *map, void **keys)
 {
     size_t i = 0;
-    struct hashmap_iter *iter = hashmap_iter(map);
+    const void *key;
+    void *data;

-    for (; iter; iter = hashmap_iter_next(map, iter)) {
+    hashmap_foreach(key, data, map) {
         ++i;
+        if (!key) {
+            printf("key %zu is NULL\n", i);
+            return false;
+        }
+        if (!data) {
+            printf("data %zu is NULL\n", i);
+            return false;
+        }
     }
     if (i != TEST_NUM_KEYS) {
         printf("did not iterate through all entries: "
@@ -409,23 +423,20 @@ bool test_iterate(struct hashmap *map, void **keys)
     return true;
 }

-bool test_iterate_remove(struct hashmap *map, void **keys)
+bool test_iterate_remove(hashmap_void_t *map, void **keys)
 {
     size_t i = 0;
-    struct hashmap_iter *iter = hashmap_iter(map);
     const void *key;
+    void *data, *temp;

-    while (iter) {
+    hashmap_foreach_safe(key, data, map, temp) {
         ++i;
-        key = test_hashmap_iter_get_key(iter);
-        if (test_hashmap_get(map, key) != key) {
+        if (hashmap_get(map, key) != data) {
             printf("invalid iterator on entry #%zu\n", i);
             return false;
         }
-        iter = hashmap_iter_remove(map, iter);
-        if (test_hashmap_get(map, key) != NULL) {
-            printf("iter_remove failed on entry #%zu\n", i);
-            return false;
+        if (hashmap_remove(map, key) != data) {
+            printf("key/data mismatch %zu: %p != %p\n", i, key, data);
         }
     }
     if (i != TEST_NUM_KEYS) {
@@ -436,50 +447,42 @@ bool test_iterate_remove(struct hashmap *map, void **keys)
     return true;
 }

-struct test_foreach_arg {
-    struct hashmap *map;
-    size_t i;
-};
-
-int test_foreach_callback(const void *key, void *data, void *arg)
+bool test_iterate_remove_odd(hashmap_void_t *map, void **keys)
 {
-    struct test_foreach_arg *state = (struct test_foreach_arg *)arg;
-
-    if (state->i & 1) {
-        /* Remove every other key */
-        if (!test_hashmap_remove(state->map, key)) {
-            printf("could not remove expected key\n");
-            return -1;
+    size_t size = hashmap_size(map);
+    size_t i = 0;
+    size_t removed = 0;
+    const void *key;
+    void *temp;
+
+    hashmap_foreach_key_safe(key, map, temp) {
+        if (i & 1) {
+            /* Remove odd indices */
+            if (!hashmap_remove(map, key)) {
+                printf("could not remove expected key\n");
+                return false;
+            }
+            ++removed;
         }
+        ++i;
     }
-    ++state->i;
-    return 0;
-}

-bool test_foreach(struct hashmap *map, void **keys)
-{
-    struct test_foreach_arg arg = { map, 1 };
-    size_t size = hashmap_size(map);
-
-    if (test_hashmap_foreach(map, test_foreach_callback, &arg) < 0) {
-        return false;
-    }
-    if (hashmap_size(map) != size / 2) {
+    if (hashmap_size(map) != size - removed) {
         printf("foreach delete did not remove expected # of entries: "
                 "contains %zu vs. expected %zu\n", hashmap_size(map),
-                size / 2);
+                size - removed);
         return false;
     }
     return true;
 }

-bool test_clear(struct hashmap *map, void **keys)
+bool test_clear(hashmap_void_t *map, void **keys)
 {
     hashmap_clear(map);
     return true;
 }

-bool test_reset(struct hashmap *map, void **keys)
+bool test_reset(hashmap_void_t *map, void **keys)
 {
     hashmap_reset(map);
     return true;
@@ -533,9 +536,9 @@ const struct test tests[] = {
         .pre_load = true
     },
     {
-        .name = "removal in foreach",
-        .description = "iterate and delete 1/2 using hashmap_foreach",
-        .run = test_foreach,
+        .name = "iterate remove odd indices",
+        .description = "iterate and delete alternate entries",
+        .run = test_iterate_remove_odd,
         .pre_load = true
     },
     {
@@ -560,45 +563,33 @@ int main(int argc, char **argv)
     bool success = true;

     /* Initialize */
-    printf("Initializing hash maps...");
-    if (hashmap_init(&str_map, hashmap_hash_string, hashmap_compare_string,
-            0) < 0) {
-        success = false;
-    }
-    /*
-    hashmap_set_key_alloc_funcs(&str_map, hashmap_alloc_key_string, free);
-    */
-    if (hashmap_init(&int_map, test_hash_uint64, test_compare_uint64,
-            0) < 0) {
-        success = false;
-    }
-    printf("done\n");
+    printf("Initializing hash maps...\n");
+    hashmap_init(&str_map, hashmap_hash_string, strcmp);

-    if (!success) {
-        printf("Hashmap init failed");
-        return 1;
-    }
+// hashmap_set_key_alloc_funcs(&str_map, strdup, (void(*)(char *))free);
+
+    hashmap_init(&int_map, test_hash_uint64_bad2, test_compare_uint64);

     printf("Generating test %u test keys...", TEST_NUM_KEYS);
     test_keys_generate();
     printf("done\n");

     printf("Running tests\n\n");
-    success &= test_run_all(&str_map, keys_str_random, tests,
-            ARRAY_LEN(tests), "Hashmap w/randomized string keys");
-    success &= test_run_all(&str_map, keys_str_sequential, tests,
-            ARRAY_LEN(tests), "Hashmap w/sequential string keys");
+    success &= test_run_all((hashmap_void_t *)&str_map, keys_str_random, tests,
+            ARRAY_SIZE(tests), "Hashmap w/randomized string keys");
+    success &= test_run_all((hashmap_void_t *)&str_map, keys_str_sequential, tests,
+            ARRAY_SIZE(tests), "Hashmap w/sequential string keys");

-    success &= test_run_all(&int_map, keys_int_random, tests,
-            ARRAY_LEN(tests), "Hashmap w/randomized integer keys");
+    success &= test_run_all((hashmap_void_t *)&int_map, keys_int_random, tests,
+            ARRAY_SIZE(tests), "Hashmap w/randomized integer keys");

-    success &= test_run_all(&int_map, keys_int_sequential, tests,
-            ARRAY_LEN(tests), "Hashmap w/sequential integer keys");
+    success &= test_run_all((hashmap_void_t *)&int_map, keys_int_sequential, tests,
+            ARRAY_SIZE(tests), "Hashmap w/sequential integer keys");

     printf("\nTests finished\n");

-    hashmap_destroy(&str_map);
-    hashmap_destroy(&int_map);
+    hashmap_cleanup(&str_map);
+    hashmap_cleanup(&int_map);

     if (!success) {
         printf("Tests FAILED\n");