ID, Ego, Super-Ego

Bartek „Koziołek“ Kuczyński

Questions – sli.do

Code – #geecon

ID

Instinct without morality

How does the JVM identify objects?

Ordinary Object Pointers – OOPs

  • This internal stuff!!!
  • Mark Word
  • Klass Word

Mark Word

  • Identity hashCode
  • Locks
  • GC metadata
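
The identity hash code listed above is what `System.identityHashCode` returns, even when a class overrides `hashCode()`. A minimal sketch of the difference; the `Money` class is hypothetical and exists only for this illustration.

```java
public class IdentityHashDemo {

    // Hypothetical value class that overrides equals/hashCode.
    static final class Money {
        final long cents;

        Money(long cents) {
            this.cents = cents;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Money && ((Money) o).cents == cents;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(cents);
        }
    }

    /** True when two calls on the same instance return the same identity hash. */
    static boolean identityHashIsStable(Object o) {
        return System.identityHashCode(o) == System.identityHashCode(o);
    }

    /** True when two equal Money values also share their overridden hashCode(). */
    static boolean valueHashMatches(long cents) {
        return new Money(cents).hashCode() == new Money(cents).hashCode();
    }

    public static void main(String[] args) {
        Money a = new Money(100);
        Money b = new Money(100);
        System.out.println("equals:           " + a.equals(b));
        System.out.println("same hashCode():  " + (a.hashCode() == b.hashCode()));
        // The identity hashes belong to the instances, not to the value;
        // they usually differ for a and b, but that is not guaranteed.
        System.out.println("identity hash a:  " + System.identityHashCode(a));
        System.out.println("identity hash b:  " + System.identityHashCode(b));
        System.out.println("stable for a:     " + identityHashIsStable(a));
    }
}
```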

Klass Word

  • Klass pointer
  • Compressed class pointers

In most cases you should not care

When should you care?

  • You use an object as a monitor
  • CPU cache fit - Old code
  • High performance
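
Using an object as a monitor means passing it to `synchronized`; the lock state is then tracked through that object's header. A minimal sketch with a dedicated private lock object; the class and method names are made up for this example.

```java
public class MonitorDemo {
    // Dedicated lock object: nobody outside this class can synchronize on it.
    private final Object lock = new Object();
    private long counter;

    public void increment() {
        synchronized (lock) {
            counter++;
        }
    }

    public long get() {
        synchronized (lock) {
            return counter;
        }
    }

    /** Starts `threads` threads, each incrementing `perThread` times, and returns the total. */
    public static long run(int threads, int perThread) {
        MonitorDemo demo = new MonitorDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return demo.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

Locking on a private object rather than on `this` keeps the monitor out of reach of other code, which is one reason the object used for locking deserves attention.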

Ego

How do I identify myself?

What can be there?

int(eger) or long

Sequence

Pros

  • It is simple
  • It is fast
  • It has “business value”

Cons

  • It is not secure
  • It is slow
  • The “missed value” problem
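
A sequence can be sketched in a few lines, which shows both why it is simple and why it is predictable. This is an in-memory illustration built on `AtomicLong`, not a database sequence; the class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceIdGenerator {
    private final AtomicLong next;

    public SequenceIdGenerator(long firstId) {
        this.next = new AtomicLong(firstId);
    }

    /** Returns the next ID; thread-safe, but only within a single JVM. */
    public long nextId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        SequenceIdGenerator orders = new SequenceIdGenerator(1);
        long first = orders.nextId();
        long second = orders.nextId();
        // Anyone who sees ID 2 can guess that ID 1 exists and that ID 3 is next.
        System.out.println(first + ", " + second); // prints 1, 2
    }
}
```

Every caller shares one counter, so with several nodes the counter has to live somewhere central, such as the database.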

Ideal Identifier

Technical

  • Unique
  • Efficient & Fast
  • Easy to implement

Business

  • Unique
  • Easy to use
  • Has business meaning
  • Sortable/Gap-free

Sequence

  • Is it unique? Yes but no
  • Is it fast? Yes
  • Is it easy to use? Yes
  • Is it sortable? Yes
  • Is it gap-free? Yes but no

Other solution

UUID

  • Is it unique? Yes
  • Is it fast? Not so
  • Is it easy to use? No
  • Is it sortable? No
  • Is it gap-free? Not applicable (sparse)
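
In Java a random UUID comes from `java.util.UUID.randomUUID()`. The sketch below prints one and the properties behind the answers above: a 36-character string that carries no creation order. The `describe` helper is made up for this example.

```java
import java.util.UUID;

public class UuidDemo {

    /** Summarizes the properties of a UUID that matter when it is used as an ID. */
    static String describe(UUID id) {
        return "version=" + id.version()
                + ", variant=" + id.variant()
                + ", length=" + id.toString().length();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id);
        System.out.println(describe(id)); // prints version=4, variant=2, length=36
    }
}
```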

Other solution

ULID

What is ULID

  • Universally Unique Lexicographically Sortable Identifier
  • UUID with time
  • 48 bits of timestamp + 80 bits of randomness
  • Is it unique? Yes
  • Is it fast? Not so
  • Is it easy to use? No
  • Is it sortable? Yes
  • Is it gap-free? Not applicable
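
The 48 + 80 bit layout can be sketched by hand. The code below encodes the timestamp as 10 Crockford Base32 characters and the randomness as 16 more, giving the usual 26-character string. It is an illustration only: it does not handle the case of several IDs within the same millisecond, and a maintained library is the safer choice in real code.

```java
import java.security.SecureRandom;

public final class UlidSketch {
    // Crockford Base32: no I, L, O, U; characters are in ascending ASCII order.
    private static final char[] ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();

    private UlidSketch() {
    }

    /** Builds a ULID-shaped string for the given timestamp in milliseconds. */
    public static String generate(long timestampMillis) {
        char[] out = new char[26];
        // 48-bit timestamp -> 10 characters, most significant first.
        long ts = timestampMillis & 0xFFFFFFFFFFFFL;
        for (int i = 9; i >= 0; i--) {
            out[i] = ALPHABET[(int) (ts & 31)];
            ts >>>= 5;
        }
        // 80 random bits -> 16 characters of 5 bits each.
        for (int i = 10; i < 26; i++) {
            out[i] = ALPHABET[RANDOM.nextInt(32)];
        }
        return new String(out);
    }

    public static String generate() {
        return generate(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        String earlier = generate(1000L);
        String later = generate(2000L);
        System.out.println(earlier);
        System.out.println(later);
        // The timestamp prefix makes plain string comparison follow creation time.
        System.out.println(earlier.compareTo(later) < 0); // prints true
    }
}
```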

Super-Ego

How society identifies us

Natural identifiers

Email

Pros

  • It is simple
  • Easy to maintain

Cons

  • GDPR

National ID

Pros

  • It is simple
  • Easy to maintain

Cons

  • GDPR
  • Design flaws

Business-value ID

  • Invoice ID
  • Customer ID
  • Account number

Business-value IDs

  • Have some specific requirements
  • Could be hard to implement
  • Exist in different contexts

What about users?

  • UX
  • Links
  • Randomness

How many unique IDs can we generate?

24986644000165537791

24 quintillion 986 quadrillion 644 trillion 165 million 537 thousand 791
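
The figure above equals 58^11 − 1, which is the largest value that fits in 11 characters over a 58-symbol alphabet. That reading is an inference from the arithmetic; the slides do not name the alphabet or the length. Under that assumption, a custom random ID could be sketched as follows, using the Base58 alphabet known from Bitcoin as a stand-in.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Random;

public class CustomRandomId {
    // Assumed alphabet: Base58 (digits and letters without 0, O, I and l).
    static final String ALPHABET =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    // Assumed length, chosen because 58^11 - 1 matches the figure in the slides.
    static final int LENGTH = 11;

    /** Number of distinct IDs of the assumed length: 58^11. */
    static BigInteger keyspace() {
        return BigInteger.valueOf(ALPHABET.length()).pow(LENGTH);
    }

    /** Draws LENGTH characters uniformly from the alphabet using the given source. */
    static String generate(Random random) {
        StringBuilder sb = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keyspace().subtract(BigInteger.ONE)); // prints 24986644000165537791
        System.out.println(generate(new SecureRandom()));
    }
}
```

Passing the random source as a parameter makes it easy to compare `SecureRandom` with a cheaper, predictable source such as `ThreadLocalRandom.current()` when unguessable IDs are not required.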

Alternatives

  • Custom random ID as an alternative to UUID
  • Snowflake ID as an alternative to ULID
  • Business ID as an alternative to Sequence
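
A Snowflake ID packs a timestamp, a node number and a per-millisecond counter into one 64-bit `long`. The sketch below uses the commonly cited 41/10/12 bit split; the epoch constant and the handling of a clock that moves backwards are simplifying assumptions of this example.

```java
public class SnowflakeSketch {
    // Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds.
    private static final long EPOCH = 1_577_836_800_000L;
    private static final int NODE_BITS = 10;
    private static final int SEQ_BITS = 12;
    private static final long MAX_SEQ = (1L << SEQ_BITS) - 1;

    private final long nodeId;
    private long lastMillis = -1;
    private long sequence;

    public SnowflakeSketch(long nodeId) {
        if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId must fit in " + NODE_BITS + " bits");
        }
        this.nodeId = nodeId;
    }

    /** Returns a new ID that is larger than every ID this instance returned before. */
    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            now = lastMillis; // simplification: reuse the last timestamp if the clock went back
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQ;
            if (sequence == 0) {
                // Counter exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    Thread.yield();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH) << (NODE_BITS + SEQ_BITS)) | (nodeId << SEQ_BITS) | sequence;
    }

    /** Extracts the node number from an ID produced by this layout. */
    public static long nodeOf(long id) {
        return (id >> SEQ_BITS) & ((1L << NODE_BITS) - 1);
    }

    public static void main(String[] args) {
        SnowflakeSketch node7 = new SnowflakeSketch(7);
        long id = node7.nextId();
        System.out.println(id + " from node " + nodeOf(id));
    }
}
```

Because the node number is part of the value, several nodes can generate IDs without talking to each other, and the IDs still sort roughly by time.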

What does efficient & fast mean?

  • Easy to use/maintain
  • Works well with many nodes
  • Index friendly
  • Number of values per time unit

Stats time

  • OS: Ubuntu 20.04.6 LTS x86_64
  • Kernel: 5.8.0-43-generic
  • CPU: AMD Ryzen Threadripper 3960X (48) @ 3.800GHz
  • Memory: 128746MiB
  • Java: OpenJDK Runtime Environment (build 20+36-2344)

Generate ID ops/s (higher is better)

Threads      1              2               24             48
Seq          1943.8 ± 5.5   1076.6 ± 43.6   604.6 ± 72.7   682.3 ± 117.3
UUID         20.9 ± 0.2     9.7 ± 2         9.9 ± 0.1      9.9 ± 0.1
ULID         15.0 ± 0.2     5.0 ± 1.5       4.3 ± 0.1      4.0 ± 0.1
Custom       196.0 ± 1.0    89.0 ± 5.2      69.0 ± 2.1     66.0 ± 0.3
SecCustom    25.4 ± 0.2     12.7 ± 2.6      11.6 ± 0.2     10.8 ± 0.1
UnSecCustom  206.7 ± 4.1    106.6 ± 3.2     93.3 ± 2.4     67.0 ± 1.2

Databases…

  • Blocking and parallel inserts
  • UUID type in databases
  • Indexes

So…

  • An ID is an important part of an entity
  • Sequences are almost never the best solution
  • UUID and ULID are better
  • Think about the structure of your data
  • What is the Aggregate Root?
  • Do you need a gap-free ID?
  • Where will you use that ID?
  • Can you use natural IDs?

Feedback

Thanks!