|
I disagree that NULL is a correct value here, and I disagree that it's normal/good database design to have NULL mean 'not mandatory'. There are other more correct ways to model not mandatory, such as join tables. NULL is specifically undefined, unknown; a hole inside a row with no value. There is very little you can do with it, other than asserting its absence. In fact, the original relational database design didn't even have NULL! But I digress; my general point is that we define this data, and thus we are able to always provide a value for it. I don't see why having the empty string is not acceptable for the domain of this column - we certainly treat NULL in that manner anyway. A good example of where NULL is required is with the barcode column. In this case the empty string and NULL do have different meanings - one means "we don't know" and the other means "we know there is no barcode". We will never be in a scenario when we don't know if an artist has a disambiguation comment or not. 1. Moot point - it is a required field. Okay, we discussed this on irc and sounds like I have misunderstood a nullable column as meaning not mandatory, when in fact it means not known, usually this doesn't make much difference as often both conditions do apply but in this case I see that the disambuguation comment isn't unknown information that we are trying to find out. therefore missses the not known condition. Ok, first inevitable problem has come up By making clients and the master run this at different times, we introduce the problem of both having the last_updated column be out of sync. Lets consider a very simple database with only a single artist row with comment=NULL, last_updated=1. 1. The master runs the upgrade schema script at t=2. The last_updated column is thus updated to 2. I can see 3 solutions: 1. Forgeddaboudit! This problem isn't worth worrying about, thus we leave with this caveat. I think my personal preference is option 3, and we should do this for all tables at the start of migration. Questions: for Nevertheless, I think 3 is probably the best option, and I agree it should be for all tables – in fact, two scripts for disabling/enabling last_updated triggers would seem to me to be a generally valuable introduction to our codebase for future schema changes/debugging/etc. Another important aspect is that 0 = 0 and '' = '' but NULL <> NULL. You can still do that with null, fwiw - you just need 2 indexes. One is UNIQUE INDEX artist (name) WHERE comment IS NULL and the other is UNIQUE INDEX artist (name, comment) WHERE comment IS NOT NULL All merged to schema change branch Yes. It is just stored as an empty string '' instead of NULL. @yindesu: yes, they are. This doesn't actually change anything on the frontend or the semantics of the data (other than places where our code is incorrect...) – it just means the empty join phrase is always stored as '' rather than sometimes as NULL. On the frontend, these two didn't display differently before; this just removes the duplication of '' and NULL being the same, by disallowing NULL. There is one change (not this ticket, but within the same schema change) that affects join phrases, but only in one strange case: when the join phrase at the end of an artist credit (i.e., after the last artist) is entirely whitespace. Once again, this doesn't actually change semantics – we've always considered final join phrases without actual content to be meaningless (they're also usually mistakes, byproducts of the release editor interface being... less than ideal). |
||||||||||||||||||||||||||||||||||
This seems like the wrong way round to me for disambuguation comments, in relational databases its normal to have nullable columns for fields that don't require a value. Would make more sense to enforce a constraint to not allow the value to be set to "", and in the UI if the user delete a comment by setting to "" this should cause the underlying value in the database to be set back to NULL.
There are two advantages to this.
1. The datamodel clearly shows which fields are mandatory i.e release title is mandatory therefore not nullable, disambugation column nor mandatory so nullable..
2. I assume that we only have disambuguation comments set to "" if we have deleted a previously entered comment, so that we must have many more database rows with disambuguation comment set to NULL, then "", so the size of replication packets required to change "" to NULL would be much smaller than changing NULL to "".