there is no such thing as “non-relational” data

in data modeling discussions I often hear the phrase “non-relational data”. it’s usually someone making a case for why data should live denormalized in a NoSQL store. the argument goes that the data itself is somehow inherently non-relational, so it belongs in a non-relational database.

in reality, most complex data has relationships, and every conversation about storage systems is fundamentally about how to store and fetch that data most effectively. the choice of store is a lower-level implementation detail. so depending on the use cases and constraints, that data may be stored in a relational database or a non-relational database, but the data itself is not inherently “non-relational”.

for instance, let’s say you’re storing user analytics data and each user has many devices. there’s a one-to-many relationship between a user and their devices, but if the devices list is append-only and you’ll never have to mutate any individual device, maybe there’s no need for a normalized representation of users and devices, which would make you pay processing time on every read that requires a JOIN.

{
  "_id": ObjectId("5f5a1f5e3dc1264c7ee03d5a"),
  "username": "john_doe",
  "email": "[email protected]",
  "devices": [
    {
      "device_id": "123456789",
      "device_name": "Smartphone",
      "os": "iOS"
    },
    {
      "device_id": "987654321",
      "device_name": "Laptop",
      "os": "Windows"
    },
    {
      "device_id": "567890123",
      "device_name": "Tablet",
      "os": "Android"
    }
  ]
}

still, I don’t think that makes the data non-relational: there’s a one-to-many relationship. what it does mean is that there’s a processing performance trade-off if it’s stored in a normalized form (which you can usually do in a non-relational database nowadays).
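
to make that concrete, here’s a minimal sketch of the same user and devices stored both ways. it’s an illustration only: it uses a plain Python dict in place of a document store and an in-memory SQLite database in place of a relational one, and the table layout and the function names (`devices_from_document`, `devices_from_tables`) are made up for this example. the relationship is identical in both, only the read path differs.

```python
import sqlite3

# hypothetical sample data, shaped like the document above
user_doc = {
    "username": "john_doe",
    "devices": [
        {"device_id": "123456789", "device_name": "Smartphone", "os": "iOS"},
        {"device_id": "987654321", "device_name": "Laptop", "os": "Windows"},
        {"device_id": "567890123", "device_name": "Tablet", "os": "Android"},
    ],
}


def devices_from_document(doc):
    # denormalized read: the devices travel with the user, no JOIN needed
    return [(d["device_id"], d["device_name"], d["os"]) for d in doc["devices"]]


def devices_from_tables(doc):
    # normalized storage of the same data: two tables linked by a foreign key
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute(
        "CREATE TABLE devices ("
        "device_id TEXT PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(id), "
        "device_name TEXT, os TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO users (username) VALUES (?)", (doc["username"],)
    )
    user_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO devices VALUES (?, ?, ?, ?)",
        [
            (d["device_id"], user_id, d["device_name"], d["os"])
            for d in doc["devices"]
        ],
    )
    # normalized read: the one-to-many is resolved with a JOIN at query time
    rows = conn.execute(
        "SELECT d.device_id, d.device_name, d.os "
        "FROM users u JOIN devices d ON d.user_id = u.id "
        "WHERE u.username = ? ORDER BY d.rowid",
        (doc["username"],),
    ).fetchall()
    conn.close()
    return rows


# both storage shapes answer "which devices does this user have?" identically
same = devices_from_document(user_doc) == devices_from_tables(user_doc)
```

the embedded version skips the JOIN on read, which is the whole trade-off. if you later needed to update a single device or query across all users’ devices, the normalized version would likely start to look better.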
