Generating sequential ids is simple and effective: you create a sequence or an auto_increment column and boom! You can easily generate ids for new records in any table in your relational database. So why generate non-sequential ids? What's wrong with sequential ids?
- You may want to hide ids from competitors. For example, if a competitor creates a user in your app and later sees through the API that the user has id=100, they'll know you don't have many users. They can also create another user weeks or months later and estimate how many new users you gained in between.
- You may want to hide ids from attackers. With guessable or predictable ids an attacker can easily go through all records of the same type.
- In distributed systems sometimes you have multiple places where you'd want to create new ids. If the database is in charge of that, this could be a limitation or bottleneck.
Even though sequential ids have some issues, they are popular for very good reasons, such as:
- They are small and cheap to generate and store. Typically you will store them in a single INT in the database, which is more efficient than an identifier composed of multiple characters, and the database will help you generate them (with a sequence or a special attribute on the column).
- They give you an indirect way to sort records by creation time. And since they are the primary key, the column is already indexed, so there's no need for an extra column and index to sort by creation time. In many cases you'll still want to store the timestamp anyway, but you may decide not to index it and keep using the primary key for sorting.
- They guarantee uniqueness.
Obfuscated sequential ids
One strategy to generate non-sequential ids is to still generate sequential ids, but obfuscate them when sending data through your API. This is, for example, what hashids do. With hashids you can encode and decode numeric ids using a salt, which should be kept secret. You get the benefits of sequential ids (cheap) and also the benefits of non-sequential ids (unguessable and unpredictable, more or less). The main problem is that you need to translate ids all the time. When a URL contains an id, you'll need to translate it to its internal version, and every time you serialize a record, you'll need to calculate the public version of the id. This is also true while debugging, which can be quite annoying.
Besides, these ids still leak some information. They get longer as the encoded numeric id grows, and they are subject to brute-force attacks to guess the salt.
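To make the translation step concrete, here is a minimal sketch of a reversible obfuscation. It is not the hashids algorithm: it swaps in a simpler technique (multiply by a secret odd number modulo 2^32, then XOR with a secret mask). The names encodeId and decodeId and both constants are assumptions made up for this illustration, and the scheme only hides the sequence from casual observers; it is not encryption.

```typescript
// Hypothetical secrets for illustration; real values would be kept private.
const MULTIPLIER = 0x9e3779b1; // any odd 32-bit number works
const XOR_MASK = 0x5bd1e995;

// Multiplicative inverse of an odd number modulo 2^32 (Newton's iteration).
function modInverse32(odd: number): number {
  let inverse = odd; // odd * odd = 1 (mod 8), so 3 low bits are already correct
  for (let i = 0; i < 4; i++) {
    // Each round doubles the number of correct low bits: 6, 12, 24, 48.
    inverse = Math.imul(inverse, 2 - Math.imul(odd, inverse));
  }
  return inverse >>> 0;
}

const INVERSE = modInverse32(MULTIPLIER);

// Internal sequential id -> public string.
function encodeId(id: number): string {
  if (!Number.isInteger(id) || id < 0 || id > 0xffffffff) {
    throw new RangeError("id must be an unsigned 32-bit integer");
  }
  const scrambled = (Math.imul(id, MULTIPLIER) ^ XOR_MASK) >>> 0;
  return scrambled.toString(36);
}

// Public string -> internal sequential id.
function decodeId(publicId: string): number {
  const scrambled = parseInt(publicId, 36);
  if (Number.isNaN(scrambled)) {
    throw new Error("invalid public id");
  }
  return Math.imul(scrambled ^ XOR_MASK, INVERSE) >>> 0;
}
```

You would call encodeId every time a record is serialized and decodeId every time an id arrives in a URL or request body, which is exactly the constant translation cost described above.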
GUIDs and UUIDs
A standard in the industry to solve this is generating UUIDs or GUIDs. There are different versions, which are basically different strategies for generating 128-bit ids, usually expressed as groups of hexadecimal characters separated by hyphens (-). Some strategies use the current time or the MAC address of the device where the code is running, and there's a version that generates completely random bits.
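As a small illustration of that layout, the sketch below generates a random (version 4) UUID and inspects it. It assumes a Node.js runtime, where randomUUID is exported by the built-in node:crypto module; the variable names are only for this example.

```typescript
import { randomUUID } from "node:crypto";

// A version 4 UUID: 122 random bits plus fixed version and variant bits.
const uuid: string = randomUUID();

// Five groups of hexadecimal characters separated by hyphens.
const groups = uuid.split("-");
const groupLengths = groups.map((group) => group.length); // [8, 4, 4, 4, 12]

// The first character of the third group encodes the version.
const version = parseInt(uuid.charAt(14), 16); // 4 for random UUIDs

// Each hexadecimal character carries 4 bits.
const totalBits = groups.join("").length * 4; // 128

console.log(uuid, groupLengths, version, totalBits);
```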
Finally, another option is to generate random ids. There are many options, with cuids among the most popular ones. You can see a great comparison on the cuid2 GitHub repo.
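To show the general idea behind random ids, here is a minimal sketch that draws characters from a custom alphabet using a cryptographically secure source. It is not the implementation of cuid2 or of any other library; the function name randomId, the default length of 21 and the alphabet are assumptions chosen for this example, and it relies on randomBytes from Node's built-in node:crypto module.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical default alphabet: digits and lowercase letters.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz";

function randomId(length: number = 21, alphabet: string = ALPHABET): string {
  if (alphabet.length < 1 || alphabet.length > 256) {
    throw new RangeError("alphabet must have between 1 and 256 characters");
  }
  // Bytes at or above this limit are discarded so that every character
  // stays equally likely (avoids modulo bias).
  const limit = 256 - (256 % alphabet.length);
  let id = "";
  while (id.length < length) {
    const bytes = randomBytes(length);
    for (let i = 0; i < bytes.length && id.length < length; i++) {
      const byte = bytes[i];
      if (byte < limit) {
        id += alphabet.charAt(byte % alphabet.length);
      }
    }
  }
  return id;
}

console.log(randomId());
```

A longer id or a larger alphabet lowers the chance of collisions at the cost of more storage, which is the main trade-off to tune with this kind of generator.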
One moment. Why not use both? That's what PlanetScale does. They store a sequential id as the primary key, but they also have a nanoid in a separate column that has a unique index.
Non-sequential ids can be very convenient. If you additionally prefix them based on the type of entity they represent, such as user_abcd1234, you get unguessable, unpredictable and globally unique ids. But how should you generate them?
- If you still really want all the benefits of sequential ids, obfuscate them, probably with hashids.
- If you want to go with random ids, I'd recommend cuid2, because they are more flexible than UUID v4 and are less likely to collide.
- If you want the simplest option, use UUID v4. In JS runtimes you can just call crypto.randomUUID() and that's it. The dashes are not great in some situations, but if that's an issue you can just strip them.
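Putting the simplest option together with the entity prefix idea, a minimal sketch could look like the following. The prefixes, the helper names prefixedId and parsePrefixedId, and the underscore separator are assumptions for illustration, and randomUUID is imported from Node's built-in node:crypto module.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical entity types for this example.
type EntityPrefix = "user" | "order";

// Builds ids such as user_3f2b..., using a UUID v4 with the dashes stripped.
function prefixedId(prefix: EntityPrefix): string {
  return `${prefix}_${randomUUID().replace(/-/g, "")}`;
}

// Splits a public id back into its entity prefix and random part.
function parsePrefixedId(id: string): { prefix: string; value: string } {
  const separator = id.indexOf("_");
  if (separator <= 0 || separator === id.length - 1) {
    throw new Error(`malformed id: ${id}`);
  }
  return { prefix: id.slice(0, separator), value: id.slice(separator + 1) };
}

const userId = prefixedId("user");
console.log(userId, parsePrefixedId(userId).prefix);
```

The prefix makes it obvious in logs and URLs what kind of record an id refers to, and it lets an API reject an order id passed where a user id is expected before touching the database.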