Hash functions. Cryptographic hash function Find a hash image of a message using a hash function

18.09.2023

Internet

The hash function you choose should be easy to calculate and create as few collisions as possible, i.e. should distribute the keys evenly across the existing indexes in the table. Of course, it is impossible to determine whether a particular hash function will distribute keys correctly unless those keys are known in advance. However, although the keys themselves are rarely known before choosing a hash function, some properties of those keys that affect their distribution are usually known. Let's look at the most common methods for specifying a hash function.

Division method. The initial data is some integer key key and table size m. The result of this function is the remainder when this key is divided by the size of the table. General view of the function:

int h(int key, int m) (

return key % m; // Values

For m= 10 hash function returns the least significant digit of the key.

For m= 100 hash function returns the least significant two digits of the key.

Additive method, in which the key is a character string. In a hash function, a string is converted to an integer by summing all the characters and returning the remainder after division by m(usually table size m= 256).

int h(char *key, int m) (

Collisions occur in strings consisting of the same set of characters, for example, abc And cab.

This method can be slightly modified, obtaining the result by summing only the first and last characters of the key string.

int h(char *key, int m) (

int len = strlen(key), s = 0;

if(len< 2) // Если длина ключа равна 0 или 1,

s = key; // return key

s = key + key;

In this case, collisions will only occur in lines, for example, abc And amc.

Mid-square method, in which the key is squared (multiplied by itself) and several middle digits of the resulting value are used as an index.

For example, the key is a 32-bit integer, and the hash function returns the average 10 bits of its square:

int h(int key) (

key >>= 11; // Discard 11 least significant bits

return key % 1024; // Return 10 least significant bits

Exclusive OR method for row keys (usually table size m=256). This method is similar to the additive method, but it distinguishes similar words. The method is that the “exclusive OR” operation is sequentially applied to the elements of the string.

IN multiplicative method additionally a random real number is used r from the interval . If this product is multiplied by the size of the table m, then the integer part of the resulting product will give a value in the range from 0 to m–1.

int h(int key, int m) (

double r = key * rnd();

r = r – (int)r; // Select the fractional part

In general, for large values m the indices generated by the hash function have a wide spread. Moreover, mathematical theory states that the distribution is more uniform if m is a prime number.

In the examples considered, the hash function i = h(key) only determines the position from which to search (or initially place in the table) a record with a key key. Therefore the hashing scheme must include conflict resolution algorithm , which determines the order of actions if the position i = h(key) turns out to be already occupied by a record with a different key.

Hash Schemes

In most problems, two or more keys have the same hash, but they cannot occupy the same cell in the hash table. There are two possible options: either find a different position for the new key, or create a separate list for each hash table index, which contains all the keys mapped to that index.

These options represent two classic schemes:

– hashing using the chain method (with lists), or the so-called multidimensional hashing – chaining with separate lists;

– hashing using the open addressing method with linear sampling – linear probe open addressing.

Open addressing method with linear sampling. Initially, all cells of the hash table, which is a regular one-dimensional array, are marked as unoccupied. Therefore, when adding a new key, it is checked whether the given cell is occupied. If the cell is occupied, then the algorithm searches in a circle until a free space is found (“open address”), i.e. either elements with homogeneous keys are placed near the resulting index, or double hashing is performed using different but interrelated hash functions.

In the future, when searching, first find the position by key i in the table, and if the key does not match, then the subsequent search is carried out in accordance with the conflict resolution algorithm, starting from position i by the list.

Chain method is used more often than the previous one. In this case, the index obtained by the hash function i is treated as an index in a hash table of lists, i.e. key key next entry is mapped to position i = h(key) tables. If the position is free, then the element with the key is placed in it key, if it is busy, then a conflict resolution algorithm is worked out, as a result of which such keys are added to the list starting at i th cell of the hash table. For example, denoting N –NULL:

As a result, we have a table of an array of linked lists or trees.

The process of populating (reading) a hash table is simple, but accessing the elements requires the following operations:

– index calculation i;

– search in the corresponding chain.

To improve the search when adding a new element, you can use the insertion algorithm not at the end of the list, but with ordering, i.e. add an element to the desired location.

When solving problems in practice, it is necessary to select a hash function i= h(key), which displays the key values as evenly as possible key per interval, m– size of the hash table. And most often, if there is no information about the probability of key distribution among records, using the division method, they take a hash function i = h(key) = key%m.

When solving the inverse problem, access (search) to a specific subset is possible from a hash table (hash structure), which provides quick access to the desired element by hash address (index).

What is a hash? A hash function is a mathematical transformation of information into a short, defined string.

Why is this necessary? Analysis using hash functions is often used to monitor the integrity of important operating system files, important programs, and important data. Control can be carried out either as needed or on a regular basis.

How it's done? First, determine the integrity of which files need to be monitored. For each file, its hash value is calculated using a special algorithm and the result is saved. After the required time, a similar calculation is made and the results are compared. If the values are different, then the information contained in the file has been changed.

What characteristics should a hash function have?

must be able to perform arbitrary to fixed length data conversions;
must have an open algorithm so that its cryptographic strength can be investigated;
must be one-sided, that is, it should not be mathematically possible to determine the initial data based on the result;
must “resist” collisions, that is, it must not produce the same values for different input data;
should not require large computing resources;
with the slightest change in input data, the result should change significantly.

What are the popular hashing algorithms? The following hash functions are currently used:

CRC – cyclic redundancy code or checksum. The algorithm is very simple and has a large number of variations depending on the required output length. Not cryptographic!
MD 5 is a very popular algorithm. Like its previous version, MD 4 is a cryptographic function. The hash size is 128 bits.
SHA -1 is also a very popular cryptographic function. The hash size is 160 bits.
GOST R 34.11-94 is a Russian cryptographic standard for hash function calculations. The hash size is 256 bits.

When can a system administrator use these algorithms? Often, when downloading any content, for example, programs from the manufacturer’s website, music, movies or other information, there is a value of checksums calculated using a certain algorithm. For security reasons, after downloading, you must independently calculate the hash function and compare the value with what is indicated on the website or in the attachment to the file. Have you ever done this?

What is more convenient to calculate a hash? Now there are a large number of similar utilities, both paid and free for use. I personally liked HashTab. Firstly, during installation the utility is built in as a tab in the file properties, secondly, it allows you to select a large number of hashing algorithms, and thirdly, it is free for private non-commercial use.

What is Russian? As mentioned above, in Russia there is a hashing standard GOST R 34.11-94, which is widely used by many manufacturers of information security tools. One of these tools is the program for fixing and monitoring the initial state of the FIX software package. This program is a means of monitoring the effectiveness of the use of information security.

FIX (version 2.0.1) for Windows 9x/NT/2000/XP

Calculation of checksums of given files using one of 5 implemented algorithms.
Fixation and subsequent monitoring of the initial state of the software package.
Comparison of software package versions.
Fixation and control of directories.
Monitoring changes in specified files (directories).
Generating reports in TXT, HTML, SV formats.
The product has a FSTEC certificate for NDV 3 No. 913 until June 1, 2013.

What about digital signature? The result of the hash function calculation, together with the user’s secret key, goes to the input of the cryptographic algorithm, where the electronic digital signature is calculated. Strictly speaking, the hash function is not part of the digital signature algorithm, but often this is done on purpose in order to exclude an attack using a public key.

Currently, many e-commerce applications allow you to store the user's secret key in a private token area (ruToken, eToken) without the technical ability to retrieve it from there. The token itself has a very limited memory area, measured in kilobytes. To sign a document, there is no way to transfer the document to the token itself, but it is very simple to transfer the hash of the document to the token and receive an electronic digital signature as a result.

To solve the problem of finding the required element among large data, an algorithm was proposed hashing (hashing- mixing), in which keys are created that define the array data and, on their basis, the data is written into a table called hash table . Recording keys are determined using the function i = h(key) , called hash function . The hashing algorithm determines the position of the searched element in the hash table based on the value of its key obtained by the hash function.

Concept hashing– This is the partitioning of a common (base) set of unique keys of data elements into disjoint sets with a certain property.

Take, for example, a dictionary or encyclopedia. In this case, letters of the alphabet can be taken as search keys, i.e. The main element of the hashing algorithm is key (key). In most applications, the key provides an indirect reference to the data.

In fact, hashing is a special method of addressing data to quickly find the necessary information by keys .

If the basic set contains N elements, then it can be divided into 2 N various subsets.

Hash table and hash functions

A function that maps the keys of data items to a set of integers (indexes in a table - hash table ), called hashing function , or hash function :

i = h(key);

Where key– convertible key, i– the resulting table index, i.e. the key is mapped to a set of integers ( hash addresses ), which are subsequently used to access the data.

However, a hash function for multiple key values may produce the same position value i in the table. The situation in which two or more keys share the same index (hash address) is called collision when hashing.

A good hash function is a function that minimizes collisions and distributes data evenly throughout the table, and a perfect hash function is a function that does not generate collisions:

There are two methods to resolve hashing collisions:

– open addressing method with linear testing;

– chain method.

Hash table

A hash table is a regular array with unusual addressing specified by a hash function.

Hash structure is considered a generalization of an array that provides fast, direct access to data by index.

There are many hashing schemes, differing in the choice of a successful function h(key), and a conflict resolution algorithm. The effectiveness of solving a real practical problem will significantly depend on the chosen strategy.

Examples of hash functions

int h(int key, int m) (

return key % m; // Values

For m= 10 hash function returns the least significant digit of the key.

For m= 100 hash function returns the least significant two digits of the key.

int h(char *key, int m) (

Collisions occur in strings consisting of the same set of characters, for example, abc And cab.

This method can be slightly modified, obtaining the result by summing only the first and last characters of the key string.

int h(char *key, int m) (

int len = strlen(key), s = 0;

if(len< 2) // Если длина ключа равна 0 или 1,

s = key; // return key

s = key + key;

In this case, collisions will only occur in lines, for example, abc And amc.

Mid-square method, in which the key is squared (multiplied by itself) and several middle digits of the resulting value are used as an index.

For example, the key is a 32-bit integer, and the hash function returns the average 10 bits of its square:

int h(int key) (

key >>= 11; // Discard 11 least significant bits

return key % 1024; // Return 10 least significant bits

int h(int key, int m) (

double r = key * rnd();

r = r – (int)r; // Select the fractional part

In general, for large values m the indices generated by the hash function have a wide spread. Moreover, mathematical theory states that the distribution is more uniform if m is a prime number.

Hash function is a function that converts data (file or document) into a short digital code.

For example, converts text
"Northern branch of RGUITP"
into code
745

A hash is similar to an array, which is a collection of scalar data whose individual elements are selected by index value. Unlike an array, hash index values are not small non-negative integers, but arbitrary scalars. These scalars (called keys) are used to retrieve values from an array.

The hash elements are not in any particular order. You can think of them as a stack of bibliography cards. The top half of each card is the key and the bottom half is the value. Every time you put a value into the hash, a new card is created. When you need to change a value, you specify the key and Perl finds the required card. Therefore, the order of the cards, in fact, does not matter. Perl stores all cards (that is, key-value pairs) in a special internal order that makes it easier to find a specific card, so you don't have to look through all the pairs when searching. The order in which cards are stored cannot be changed, so don't even try.

The hash function is designed to compress the signed document M to several tens or hundreds of bits. The hash function h( ) takes as an argument a message (document) M of arbitrary length and returns a hash value h(M)=H of a fixed length. Typically, the hashed information is a compressed binary representation of the underlying message of arbitrary length. It should be noted that the value of the hash function h(M) depends in a complex way on the document M and does not allow the document M itself to be restored.

The hash function must satisfy a number of conditions:

the hash function must be sensitive to all kinds of changes in the text M, such as insertions, emissions, permutations, etc.;

the hash function must have the property of irreversibility, that is, the task of selecting a document M that would have the required hash function value must be computationally intractable;

the probability that the hash values of two different documents (regardless of their lengths) will coincide should be negligible.

For example, the sum of the ASCII character code values is 65(A)+66(B)+67(C)=198. Ideally, the hash function value should be unique, i.e. the number 198 should only be returned for ABC. In practice, this is not always achieved (cases when two different sequences of characters have the same hash are called collisions). Sometimes it is imposed on the hash function irreversibility condition: i.e. the value cannot be used to restore the original string. A classic example of such a function is MD5, which is used to check the integrity of downloaded ISO images. For MD5 and similar ones, collisions are a very negative phenomenon: it reduces the “degree of irreversibility” of the hash function and allows, for example, to guess a password by knowing its MD5 hash.
Cryptographic hash functions are divided into two classes:

Hash functions without a key (MDC (Modification (Manipulation) Detect Code) codes),

- hash- functionsckey(MAC (Message Authentication Code) - codes).

Keyless hash functions are divided into two subclasses:

Weak hash functions

Strong hash functions.

A weak hash function is a one-way function H(x) that satisfies the following conditions:

1) argument x can be a bit string of arbitrary length;

2) the value of H(x) must be a fixed-length bit string;

3) the value of H(x) is easy to calculate;

4) for any fixed x it is computationally impossible to find another

May 12, 2010 at 01:28

Hash algorithms

Information Security

As I believe, many people know that since 2007, the US National Institute of Standards and Technology (NIST) has been holding a competition to develop a hash algorithm to replace SHA-1, and a family of SHA-2 algorithms. However, for some reason this topic has been neglected on the site. This is actually what brought me to you. I bring to your attention a series of articles devoted to hash algorithms. In this series, we will together study the basics of hash functions, consider the most famous hash algorithms, plunge into the atmosphere of the SHA-3 competition and consider the algorithms that claim to win it, and we will definitely test them. Also, if possible, Russian hashing standards will be considered.

About Me

Student of the Department of Information Security.

About hashing

Nowadays, almost no cryptography application is complete without the use of hashing.
Hash functions are functions designed to “compress” an arbitrary message or set of data, usually written in the binary alphabet, into some fixed-length bit pattern called a convolution. Hash functions have a variety of applications when conducting statistical experiments, testing logical devices, and constructing algorithms for quick searches and checking the integrity of records in databases. The main requirement for hash functions is the uniform distribution of their values when the argument values are randomly selected.
A cryptographic hash function is any hash function that is cryptographically stable, that is, it satisfies a number of requirements specific to cryptographic applications. In cryptography, hash functions are used to solve the following problems:
- building data integrity monitoring systems during their transmission or storage,
- data source authentication.

Any function is called a hash function h:X -> Y, easily computable and such that for any message M meaning h(M) = H (convolution) has a fixed bit length. X- set of all messages, Y- a set of binary vectors of fixed length.

As a rule, hash functions are built on the basis of so-called one-step compression functions y = f(x 1 , x 2) two variables, where x 1, x 2 And y- binary vectors of length m, n And n accordingly, and n is the length of the convolution, and m- message block length.
To get the value h(M) the message is first split into blocks of length m(at the same time, if the message length is not a multiple m then the last block is supplemented in some special way until it is complete), and then to the resulting blocks M 1, M 2,.., M N apply the following sequential procedure for calculating convolution:

H o = v,
H i = f(M i ,H i-1), i = 1,.., N,
h(M) = H N

Here v- some constant, often called an initializing vector. She gets out
for various reasons and can be a secret constant or a set of random data (a sample of date and time, for example).
With this approach, the properties of the hash function are completely determined by the properties of the one-step compression function.

There are two important types of cryptographic hash functions - key and keyless. Key hash functions are called message authentication codes. They make it possible, without additional means, to guarantee both the correctness of the data source and the integrity of the data in systems with users who trust each other.
Keyless hash functions are called error detection codes. They make it possible to guarantee data integrity using additional means (encryption, for example). These hash functions can be used in systems with both trusting and distrusting users.

About statistical properties and requirements

As I already said, the main requirement for hash functions is the uniform distribution of their values when the argument values are randomly selected. For cryptographic hash functions, it is also important that with the slightest change in the argument, the value of the function changes greatly. This is called the avalanche effect.

The key hashing functions have the following requirements:
- impossibility of fabrication,
- impossibility of modification.

The first requirement means that it is very difficult to find a message with the correct collapse value. The second is the high complexity of selecting, for a given message with a known convolution value, another message with the correct convolution value.

The requirements for keyless functions are:
- unidirectionality,
- collision resistance,
- resistance to finding a second preimage.

Unidirectionality refers to the high difficulty of finding a message based on a given convolution value. It should be noted that at the moment there are no hash functions in use with proven one-wayness.
Collision resistance refers to the difficulty of finding a pair of messages with the same convolution values. Usually, it is the finding of a way for constructing collisions by cryptanalysts that serves as the first signal that the algorithm is becoming obsolete and that it needs to be replaced quickly.
Resistance to finding a second preimage refers to the difficulty of finding a second message with the same convolution value for a given message with a known convolution value.

This was the theoretical part, which will be useful to us in the future...

About popular hash algorithms

Algorithms CRC16/32- checksum (not cryptographic conversion).

Algorithms MD2/4/5/6. They are the creation of Ron Rivest, one of the authors of the RSA algorithm.
The MD5 algorithm was once very popular, but the first prerequisites for hacking appeared in the late nineties, and now its popularity is rapidly declining.
The MD6 algorithm is a very interesting algorithm from a design point of view. It was nominated for the SHA-3 competition, but, unfortunately, the authors did not have time to bring it up to standard, and this algorithm is not on the list of candidates that made it to the second round.

Ruler algorithms SHA Algorithms that are widely used today. There is an active transition from SHA-1 to SHA-2 version standards. SHA-2 is the collective name for the SHA224, SHA256, SHA384 and SHA512 algorithms. SHA224 and SHA384 are essentially analogues of SHA256 and SHA512, respectively, only after calculating the convolution, some of the information in it is discarded. They should be used only to ensure compatibility with equipment of older models.

Russian standard - GOST 34.11-94.

In the next article

Review of MD algorithms (MD4, MD5, MD6).

Literature

A. P. Alferov, Fundamentals of cryptography.

Bruce Schneier, Applied Cryptography.