Message ID | 1517228795.5965.24.camel@redhat.com |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [GIT,PULL] inode->i_version rework for v4.16 | expand |
On Mon, Jan 29, 2018 at 4:26 AM, Jeff Layton <jlayton@redhat.com> wrote: > > This pile of patches is a rework of the inode->i_version field. We have > traditionally incremented that field on every inode data or metadata > change. Typically this increment needs to be logged on disk even when > nothing else has changed, which is rather expensive. Hmm. I have pulled this, but it is really really broken in one place, to the degree that I always went "no, I won't pull this garbage". But the breakage is potential, not actual, and can be fixed trivially, so I'll let it slide - but I do require it to be fixed. And I require people to *think* about it. So what's to horribly horribly wrong? The inode_cmp_iversion{+raw}() functions are pure and utter crap. Why? You say that they return 0/negative/positive, but they do so in a completely broken manner. They return that ternary value as the sequence number difference in a 's64', which means that if you actually care about that ternary value, and do the *sane* thing that the kernel-doc of the function implies is the right thing, you would do int cmp = inode_cmp_iversion(inode, old); if (cmp < 0 ... and as a result you get code that looks sane, but that doesn't actually *WORK* right. To make it even worse, it will actually work in practice by accident in 99.99999% of all cases, so now you have (a) subtly buggy code (b) that looks fine (c) and that works in testing which is just about the worst possible case for any code. The interface is simply garbage that encourages bugs. And the bug wouldn't be in the user, the bug would be in this code you just sent me. The interface is simply wrong. So this absolutely needs to be fixed. I see two fixes: - just return a boolean. That's all that any current user actually wants, so the ternary value seems pointless. - make it return an 'int', and not just any int, but -1/0/1. That way there is no worry about uses, and if somebody *really* cares about the ternary value, they can now use a "switch" statement to get it (alternatively, make it return an enum, but whatever). That "ternary" function that has 18446744069414584320 incorrect return values really is unacceptable. Linus
On Mon, Jan 29, 2018 at 1:50 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Hmm. I have pulled this, but it is really really broken in one place, > to the degree that I always went "no, I won't pull this garbage". always=almost. I'd blame auto-correct, but I'm not on the phone. Linus
On Mon, 2018-01-29 at 13:50 -0800, Linus Torvalds wrote: > On Mon, Jan 29, 2018 at 4:26 AM, Jeff Layton <jlayton@redhat.com> wrote: > > > > This pile of patches is a rework of the inode->i_version field. We have > > traditionally incremented that field on every inode data or metadata > > change. Typically this increment needs to be logged on disk even when > > nothing else has changed, which is rather expensive. > > Hmm. I have pulled this, but it is really really broken in one place, > to the degree that I always went "no, I won't pull this garbage". > > But the breakage is potential, not actual, and can be fixed trivially, > so I'll let it slide - but I do require it to be fixed. And I require > people to *think* about it. > > So what's to horribly horribly wrong? > > The inode_cmp_iversion{+raw}() functions are pure and utter crap. > > Why? > > You say that they return 0/negative/positive, but they do so in a > completely broken manner. They return that ternary value as the > sequence number difference in a 's64', which means that if you > actually care about that ternary value, and do the *sane* thing that > the kernel-doc of the function implies is the right thing, you would > do > > int cmp = inode_cmp_iversion(inode, old); > > if (cmp < 0 ... > > and as a result you get code that looks sane, but that doesn't > actually *WORK* right. > My intent here was to have this handle wraparound using the same sort of method that the time_before/time_after macros do. Obviously, I didn't document that well. I want to make sure I understand what's actually broken here thoug. Is it only broken when the two values are more than 2**63 apart, or is there something else more fundamentally wrong here? > To make it even worse, it will actually work in practice by accident > in 99.99999% of all cases, so now you have > > (a) subtly buggy code > (b) that looks fine > (c) and that works in testing > > which is just about the worst possible case for any code. The > interface is simply garbage that encourages bugs. > > And the bug wouldn't be in the user, the bug would be in this code you > just sent me. The interface is simply wrong. > > So this absolutely needs to be fixed. I see two fixes: > > - just return a boolean. That's all that any current user actually > wants, so the ternary value seems pointless. > > - make it return an 'int', and not just any int, but -1/0/1. That way > there is no worry about uses, and if somebody *really* cares about the > ternary value, they can now use a "switch" statement to get it > (alternatively, make it return an enum, but whatever). > > That "ternary" function that has 18446744069414584320 incorrect return > values really is unacceptable. > I think I'll just make it return a boolean value like you suggested first. I'll send a patch to fix it once I've done some basic testing with it. Many thanks, -- Jeff Layton <jlayton@redhat.com>
On Tue, Jan 30, 2018 at 07:05:48AM -0500, Jeff Layton wrote: > > I want to make sure I understand what's actually broken here thoug. Is > it only broken when the two values are more than 2**63 apart, or is > there something else more fundamentally wrong here? The other problem is that returning a s64 (or a u64) is extremely inefficient on a 32-bit platform. So in general, it's better to avoid returning a 64-bit type unless it's really necessary.... - Ted
On Tue, Jan 30, 2018 at 4:05 AM, Jeff Layton <jlayton@redhat.com> wrote: > > My intent here was to have this handle wraparound using the same sort of > method that the time_before/time_after macros do. Obviously, I didn't > document that well. Oh, the intent was clear. The implementation was wrong. Note that "time_before()" returns a _boolean_. So yes, the time comparison functions do a 64-bit subtraction, exactly like yours do. BUT THEY DO NOT RETURN THAT DIFFERENCE. They test the sign in 64 bits and return that boolean single-bit value. > I want to make sure I understand what's actually broken here thoug. Is > it only broken when the two values are more than 2**63 apart, or is > there something else more fundamentally wrong here? There's something fundamentally wrong. The _intent_ is to allow numbers up to 2**63 apart, but if somebody does that int cmp = inode_cmp_iversion(inode, old); if (cmp < 0 ... then it actually only ever tests numbers 2**31 apart, because the high 32 bits will have been thrown away when the 64-bit difference is cast to "int". And what used to be a sign bit (in 64 bits) no longer exists, and the above tests the *new* sign bit that is bit #31, not #63. So you are a factor of four billion off. And while 2**63 is a big number and perfectly ok for a filesystem event difference, a difference of 2**31 is a "this can actually happen". Yes, even 2**31 is still a big difference, and it will take a very unusual situation, and quite a long time to trigger: something that does a thousand filesystem ops per second will not see the problem for 24 days. So you'll never see it in a test. But 24 days happens in real life.. That's why you need to do the comparison against zero inside the actual helper functions like the time comparisons do. Because if you return the 64-bit difference, it will be trivially lost, and the code will _look_ right, but not work right. The fact that you didn't even seem to see the problem in my example calling sequence just proves that point. Linus