arrow_back How to get a numeric hash of a given length?

2 votes
There is a string consisting of Russian or any alphabet. We need to get its numeric hash of a given length. Even if it is a string consisting of one or more letters, the hash must have the same length. This is needed to sort alphabetically, but through the numeric hash, to quickly search for the first occurrence to display hints.

2 Answers

0 votes
This sounds like an auto-complete problem. Wouldn't it be better to try using a prefix tree instead of a hash?
It is well described here, for example:


Ruslan . I just need a hash to store it in the database and sort by it. To sort strings alphabetically as well as by numbers in ascending or descending order from a to z.
ddddd tttt then you do not need an ordinary hash, but one that does not break the alphabetic order when recalculating from string to number. In general this is not the case, i.e. a hash changing one bit on the input can give drastically different values on the output and the string AA may turn out to be larger than ZZ.

When I wrote about the prefix tree, I meant that you build it once in memory based on database data. And then you don't access the database any more, but search the tree, and you don't need to sort anything there.

But if the goal is to search the database every time, then why don't you want to sort the text in the database without hashes? You can for example use like function to select appropriate rows from database (if you search from beginning of row and have index by text field, like works well) and sort result, and then return for example first row of received dataset - all this can be done by one SQL query.
Ruslan . It seems to me that it will be slow for 1 billion lines
ddddd tttt Ruslan .
0 votes
Take any hash function. For example, the Jenkins hash function.

uint32_t jenkins_one_at_a_time_hash(char *key, size_t len)
uint32_t hash, i;
for(hash = i = 0; i < len; ++i)
hash += key[i];
hash += (hash << 10);
hash ^= (hash >> 6);
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
return hash;


ddddd tttt , I found the appropriate on stackoverflow. The response provided a link to an article about minimum ideal hash function which converts a set of n keys into an integer number from 0 to n-1. If your tags have a length limit, this hash function will work for you.
Vasily Demin , these are tags and there won't be many collisions.
ddddd tttt , No, the function works correctly
Ruslan . This is true for any hash function, since there are no hash functions without collisions.
Vasily Demin , this function cannot sort alphabetically. So it does not fit. aa > ab, ab < as, i.e. neither in descending nor ascending order can be sorted.

145642639725 ab
179039592072 aa
179208876726 ace.
Vasily Demin , or I made a mistake with the Unicode
ddddd tttt I don't fully understand why you need the hash function. Do you want to use it to organize strings? It is hardly possible to do this with a hash function, because collisions are inevitable when using it.
Here you just need to keep in mind that if the length of the hash is less than the string length, then collisions are possible, ie absolutely different strings can get the same hash value and it should be able to handle it.