Ticket #535 (closed enhancement: fixed)

Opened 10 months ago

Last modified 4 months ago

Implement proper UIDs for primary keys

Reported by: admin Owned by: padams
Priority: critical Milestone: 1.5.0
Component: base module Version:
Keywords: 1.5.0-final Cc:

Description (last modified by admin) (diff)

Currently OWA uses a CRC32 hash to construct it's primary keys for facts and dimensions, which, while efficient, can cause collisions under high volumes.

Instead we need to transition to better UID generation strategies.

The UID needs to be unique but also efficient. This rules out the use of string or binary UUIDs. Instead we can use 64 bit integers as the PK columns are already BIGINT in MySql?.

For fact tables the strategy for key generation will be time + random + serverId (as defined in the OWA_SERVER_ID constant).

For dimension tables we need to move to using a partial MD5 or SHA1 hash converted to INT.

Change History

Changed 4 months ago by admin

  • summary changed from Implement proper UUIDs for use as primary keys to Implement proper UIDs for primary keys
  • description modified (diff)
  • milestone changed from 1.6.0 to 1.5.0

Changed 4 months ago by admin

  • keywords 1.5.0-final added

Changed 4 months ago by admin

  • status changed from new to closed
  • resolution set to fixed

Trackers now produce a UID based on time plus a very long random number. The UID is treated as a string in PHP in order to work around the lack of good 64bit int handling on 32 bit platforms. but the UID is stored in MySQL as a valid BIGINT upon insert.

Deterministic UIDs are another story. Here we have moved from using crc32 to a partial SHA1 hash converted back into a 64 bit INT. However, this only works on a 64 bit platform and has no current fall back if you are running 32 bit. Thus the use of this new 64 bit hash requires that you explicitly turn on its use by setting the 'use_64bit_hash' config setting to 'true' in your config file.

**NOTE:** only turn on 64bit hashes if you are setting up a new OWA install as it will create duplicate rows in your existing dimensions tables. In an upcoming release there will be a migration from 32bit hashes to 64bit hashes.

Note: See TracTickets for help on using tickets.