Factless Fact table — not so absurd it may sound at first
We use Fact tables to store our facts, right? So how can a factless table even make remotely sense?
In order to understand this concept of factless fact tables, I want to remind you first that a fact and a fact table are 2 different things. A fact is a measurement, which can be recorded. This fact is stored in a fact table. This enables you to analyse the measurement with its corresponding dimensions to gain the insights you need.
Contrary in a factless fact table, you either store a transaction which has no measurement (e.g. a customer created a new account) into your fact table. Or you want to provide information about a relationship or association, without an actual transaction even occurring.
For both of these cases I want to give you a little demo data model. This should make it easy to understand this concept.
Transaction which has no measurement
We’ll use a super simple data model to showcase this type of a factless fact table. It only has 2 dimensions (customer, date) and the facttable, which collects all transactions about a new registration. For every account created by a customer, this facttable has a row with the customer_key and a date_key.
This fact table enables you to answer questions like “how many customers created an new account on a certain date” or “how many customers with the first name Anna have an account”. Aggregations are simply done by using Count().
If you just can’t live in a world, where a fact table has no facts, there is also the possibility to add a dummy fact, a so called tracking fact. This is just an additional column inside your fact table, that always has the value 1. This enables you to use Sum() instead of Count(), but brings no real further value with it.
No transaction at all — relationships/associations
The second type of factless fact tables records relationships among various dimensions. Even if no transaction occurred at all. Most of the time, the rows inside this fact table also content a ending and start date/time for this particular relation.
To explain this, let’s use the same two dimensions from before, with the addition of a sales person dimension. Inside this dimension, all people working in sales are listed.
Let’s imagine, every customer has his own sales person assigned to him and we want to provide this information within a fact table. The data model for this would look something like this:
Here we can see that the fact table stores the information of the relationship between a customer and a sales person, with corresponding start and end date from this assignement. So why do we need this?
- If a customer would interact with his sales contact, we could use something like the first type of factless fact table.
- If he purchases something, we could even use a classic fact table.
- But if a customer never interacts with his sales contact, we need a construct like this.
Through this model, we are capable of answering questions like: How many customers are assigned to this particular sales person ? How many customers do sales persons have on avarage? How often does the assignment change on average per customer?
Without the advantages gained by this type of fact table, it would be very hard or even impossible to answer some of these questions.
Hope you enjoyed this little showcase and that you could learn something!