An introduction to common primary key generation methods, with a comparison of their advantages and disadvantages.

In a system, every table needs a column that stores a unique primary key ID. In a distributed system with multiple databases, IDs must also never collide across those databases. A unique ID should therefore have the following properties:

The ID is unique across the entire system.
The ID is numeric and trend-increasing (it grows over time, though not necessarily strictly by 1).
The ID is short, so queries on it are fast.

There are many ways to generate IDs. Large companies often use more elaborate schemes, but for small systems the following methods are sufficient:

UUID

This is the most common solution: generate a UUID with a utility method, such as java.util.UUID.randomUUID() in Java.
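The sketch below shows what this looks like in Java. It uses only the standard `java.util.UUID` class. The wrapper class `UuidDemo` is a hypothetical name used for illustration. `randomUUID()` returns a random (version 4) UUID, and its canonical text form is always 36 characters long.

```java
import java.util.UUID;

final class UuidDemo {
    // Returns a random (version 4) UUID in its canonical 36-character text form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String id = newId();
        System.out.println(id);          // random on every run
        System.out.println(id.length()); // 36: 32 hex digits plus 4 hyphens
    }
}
```

No coordination between machines is needed, which is why generation is purely local.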

Advantages:

Simple to implement.
Generated locally with no network round trip, so performance is good.
Globally unique, which makes data migration and merging easy.

Disadvantages:

The generated IDs are unordered, so they are not trend-increasing.
UUIDs are usually stored as strings, which makes indexing and queries slow.
They take a lot of storage space: 36 characters as text, or 16 bytes even in binary form, compared with 8 bytes for a 64-bit integer.
The ID itself has no business meaning and is not human-readable.
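The storage cost can be reduced but not eliminated. The sketch below packs a UUID's 128 bits into 16 bytes, for example for a `BINARY(16)` column, and restores it. The class name `UuidStorage` is hypothetical. The calls `getMostSignificantBits`, `getLeastSignificantBits` and the `UUID(long, long)` constructor are standard JDK APIs. Even packed, the value is twice the size of a 64-bit integer ID.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class UuidStorage {
    // Packs the UUID's 128 bits into 16 bytes (most significant half first).
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // Rebuilds the UUID from its 16-byte form; arguments are evaluated left to right.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        int textBytes = id.toString().getBytes(StandardCharsets.US_ASCII).length;
        int binaryBytes = toBytes(id).length;
        System.out.println(textBytes + " bytes as text vs " + binaryBytes + " bytes as binary");
        // prints "36 bytes as text vs 16 bytes as binary"
        System.out.println(fromBytes(toBytes(id)).equals(id)); // true
    }
}
```

Binary storage does not fix the ordering problem: random UUIDs still land at arbitrary positions in the index.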

Use cases:

Generating tokens.
Not suitable for scenarios that require a trend of increasing IDs.

MySQL Auto-Increment Primary Key

This method is also common and simple to set up. The primary key uses MySQL's auto-increment feature, so each insert receives the next ID, which is incremented by 1 by default.

Advantages:

Numeric IDs with increasing values.
High query efficiency.
Has some business readability.

Disadvantages:

Single point of failure: if MySQL goes down, no IDs can be generated.
Every ID is produced by the database, so load is high and it cannot handle high concurrency.

MySQL Multi-Instance Auto-Increment Primary Key


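The N instances start at 1, 2, 3, ..., N respectively, and each increments by a step of N, so no two instances ever issue the same ID. In this example the step is 4: instance 1 issues 1, 5, 9, ..., instance 2 issues 2, 6, 10, ..., and so on.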
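The offset-and-step scheme can be simulated in a few lines of Java. In MySQL itself, this behavior is usually configured through the `auto_increment_offset` and `auto_increment_increment` system variables. The class `SteppedSequence` below is a hypothetical in-memory stand-in, not a database client.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates one database instance whose auto-increment starts at `offset`
// and advances by a `step` shared by all instances.
final class SteppedSequence {
    private final long step;
    private long next;

    SteppedSequence(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    // Returns the current value and advances by the shared step.
    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        int instances = 4; // step size N = 4, as in the example
        List<SteppedSequence> dbs = new ArrayList<>();
        for (int i = 1; i <= instances; i++) {
            dbs.add(new SteppedSequence(i, instances));
        }
        for (int i = 0; i < instances; i++) {
            SteppedSequence db = dbs.get(i);
            System.out.println("instance " + (i + 1) + ": "
                    + db.nextId() + ", " + db.nextId() + ", " + db.nextId());
        }
        // instance 1: 1, 5, 9
        // instance 2: 2, 6, 10
        // instance 3: 3, 7, 11
        // instance 4: 4, 8, 12
    }
}
```

The sequences only stay disjoint because every instance agrees on the same step. Adding a fifth instance would require changing the step everywhere, which is why this scheme is hard to expand.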

Advantages: Solves the single point of failure issue.

Disadvantages: Once the step size is fixed, the cluster is hard to expand, because adding an instance means changing the step on every existing instance. In addition, each database still carries a high load and cannot meet the performance requirements of high concurrency.

Use cases: Scenarios where the number of database instances does not need to grow.

Snowflake Algorithm

The Snowflake algorithm generates a 64-bit positive integer. It is usually stored in a 64-bit integer type (such as a Java long or a MySQL BIGINT) and displayed in decimal. The 64 bits are divided into the following parts:


1-bit sign bit: always 0, so the ID is positive.
41-bit timestamp: the difference in milliseconds between the current time and a chosen start time (the start time is usually specified by the ID generator).
10-bit machine identifier: supports up to 1024 nodes. If machines are deployed in different data centers (IDCs), these 10 bits can be split into a 5-bit data center ID and a 5-bit machine ID.
12-bit sequence: a counter within a single millisecond, so each node can generate 4096 sequence numbers per millisecond.
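The layout above amounts to a few shifts and masks. The sketch below composes an ID from its three fields and decodes it again. The class and method names are hypothetical, and `deltaMillis` stands for "current time minus the generator's start time".

```java
// Packs and unpacks the 1 + 41 + 10 + 12 bit layout.
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 10;
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS; // 22
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;     // 4095
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;         // 1023

    // Timestamp in the high bits, then machine ID, then sequence in the low bits.
    static long compose(long deltaMillis, long workerId, long sequence) {
        return (deltaMillis << TIMESTAMP_SHIFT) | (workerId << SEQUENCE_BITS) | sequence;
    }

    static long deltaMillis(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long workerId(long id)    { return (id >>> SEQUENCE_BITS) & MAX_WORKER; }
    static long sequence(long id)    { return id & MAX_SEQUENCE; }

    public static void main(String[] args) {
        long id = compose(1_000L, 5L, 7L);
        System.out.println(id);              // 4194324487 = (1000 << 22) + (5 << 12) + 7
        System.out.println(deltaMillis(id)); // 1000
        System.out.println(workerId(id));    // 5
        System.out.println(sequence(id));    // 7
    }
}
```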

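A minimal Java generator built on this layout might look like the sketch below. It is not a reference implementation. The class name, the start time (2020-01-01T00:00:00Z) and the choice to throw an exception when the clock moves backwards are all assumptions made for illustration.

```java
// Minimal thread-safe Snowflake-style generator: 41-bit time, 10-bit worker, 12-bit sequence.
final class SnowflakeIdGenerator {
    // Hypothetical start time (2020-01-01T00:00:00Z); each deployment chooses its own.
    private static final long EPOCH_MILLIS = 1_577_836_800_000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse instead of risking duplicate IDs.
            throw new IllegalStateException("clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence numbers used in this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1);
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.nextId()); // increasing values; exact numbers depend on the clock
        }
    }
}
```

`synchronized` keeps the timestamp and sequence consistent when several threads share one generator. Throwing on rollback is the simplest policy; waiting out a short rollback is a common alternative.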

Advantages:

Each node can generate up to 4,096,000 IDs per second (4096 per millisecond), so performance is high.
The timestamp is in the high bits and the sequence number is in the low bits, so IDs are trend-increasing and ordered by time.
High flexibility: the bit allocation can be adjusted to fit business requirements.
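The capacity figures follow directly from the bit widths. The short calculation below is plain arithmetic. It shows that the throughput figure is per node and that a 41-bit millisecond timestamp runs out roughly 69.7 years after the chosen start time. A 365-day year is assumed, and the class name is hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope limits implied by the 41/10/12 bit split.
final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // IDs one node can issue per second: 2^12 per millisecond times 1000.
    static long idsPerSecondPerNode() {
        return (1L << SEQUENCE_BITS) * 1000L;
    }

    // Number of distinct machine IDs.
    static long maxNodes() {
        return 1L << WORKER_BITS;
    }

    // Years covered by a 41-bit millisecond timestamp (365-day years assumed).
    static double timestampLifetimeYears() {
        double msPerYear = 1000.0 * 60 * 60 * 24 * 365;
        return (1L << TIMESTAMP_BITS) / msPerYear;
    }

    public static void main(String[] args) {
        System.out.println(idsPerSecondPerNode()); // 4096000
        System.out.println(maxNodes());            // 1024
        System.out.println(String.format(Locale.ROOT, "%.1f", timestampLifetimeYears())); // 69.7
    }
}
```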

Disadvantages:

Depends on the machine's clock: if the server clock moves backwards, duplicate IDs may be generated.

In distributed environments, clock rollback (for example after an NTP time synchronization) is common and is usually within 10 milliseconds. Some may argue that 10 milliseconds is short enough to ignore. However, the algorithm allocates IDs per millisecond, so after a rollback the generator revisits timestamps it has already used, and duplicate IDs are likely.
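The following sketch makes the duplicate concrete. It is a deliberately naive generator with no rollback check, driven by an injected fake clock so the rollback is deterministic. The class name and the fake clock are hypothetical. One common mitigation is to detect `now < lastTimestamp` and then either wait or refuse to issue IDs.

```java
import java.util.function.LongSupplier;

// A deliberately naive Snowflake-style generator with NO clock-rollback check.
// The injected clock is assumed to return milliseconds already relative to the start time.
final class NaiveSnowflake {
    private final LongSupplier clock;
    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    NaiveSnowflake(LongSupplier clock, long workerId) {
        this.clock = clock;
        this.workerId = workerId;
    }

    long nextId() {
        long now = clock.getAsLong();
        // The sequence continues only within the same millisecond; any other timestamp resets it.
        sequence = (now == lastTimestamp) ? (sequence + 1) & 0xFFF : 0L;
        lastTimestamp = now;
        return (now << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        long[] fakeNow = {1_000L};
        NaiveSnowflake gen = new NaiveSnowflake(() -> fakeNow[0], 1L);

        long first = gen.nextId(); // t = 1000 ms, sequence 0
        fakeNow[0] = 1_005L;
        gen.nextId();              // t = 1005 ms, sequence 0
        fakeNow[0] = 1_000L;       // clock rolls back by 5 ms
        long again = gen.nextId(); // t = 1000 ms again, sequence reset to 0

        System.out.println(first == again); // true: the same ID was issued twice
    }
}
```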

Redis Generation Solution

This solution uses Redis's atomic INCR command to produce the incrementing part of the ID. The ID is usually composed as: year + day of the year + hour + the Redis counter value.
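The composition can be sketched as follows. To keep the example self-contained, a hypothetical in-memory map stands in for Redis. Like INCR, it atomically increments a per-key counter that starts at 0 and returns the new value. The key name, the one-counter-per-hour policy and the 5-digit counter width are all assumptions.

```java
import java.time.LocalDateTime;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of "date prefix + Redis INCR"; the map below is a hypothetical stand-in for Redis.
final class DatePrefixedIdGenerator {
    private static final long SEQ_LIMIT = 100_000L; // assumed 5-digit counter per hour

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Mimics INCR: atomically increments the key's counter and returns the new value.
    private long incr(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // ID layout: yyyy + day of year (3 digits) + hour (2 digits) + counter (5 digits).
    long nextId(LocalDateTime now) {
        String prefix = String.format(Locale.ROOT, "%04d%03d%02d",
                now.getYear(), now.getDayOfYear(), now.getHour());
        long seq = incr("order:id:" + prefix); // one counter key per hour; key name is hypothetical
        if (seq >= SEQ_LIMIT) {
            throw new IllegalStateException("more than 99999 IDs in hour " + prefix);
        }
        return Long.parseLong(prefix) * SEQ_LIMIT + seq;
    }

    public static void main(String[] args) {
        DatePrefixedIdGenerator gen = new DatePrefixedIdGenerator();
        LocalDateTime t = LocalDateTime.of(2024, 3, 1, 14, 30); // day 61 of 2024
        System.out.println(gen.nextId(t)); // 20240611400001
        System.out.println(gen.nextId(t)); // 20240611400002
    }
}
```

With a real Redis server, each `incr` call becomes a network request. Old per-hour keys can be cleaned up with EXPIRE.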

Advantages: Trend-increasing and highly readable.

Disadvantages: Every ID requires a request to Redis, which consumes network bandwidth and adds a round trip per ID.
