Message ID | 4E02F0B8.4080301@rs.jp.nec.com |
---|---|
State | New, archived |
Headers | show |
2011/6/23 Akira Fujita <a-fujita@rs.jp.nec.com>: >> https://bugzilla.kernel.org/show_bug.cgi?id=23732 > > ext4: Fix ext4 timestamps corruption > > Officially, ext4 can handle its timestamps until 2514 > with 32bit entries plus EPOCH_BIT (2bits). > But when timestamps values use 32+ bit > (e.g. 2038-01-19 9:14:08 0x0000000080000000), > we can get corrupted values. > Because sign bit is overwritten by transferring value > between kernel space and user space. > > To fix this issue, 32th bit of extra time fields in ext4_inode structure > (e.g. i_ctime_extra) are used as the sign for 64bit user space. > Because these are used only 20bits for nano-second and bottom of 2bits > are for EXT4_EPOCH_BITS shift. > With this patch, ext4 supports timestamps Y1901-2514. Thanks for looking into this bug. However tv_nsec is in the range 0..999999999 and requires 30 bits. That is why tv_sec was only extended by 2 bits. So there are no additional spare bits in the "extra" field. 34-bit seconds can accommodate a maximum of 544.4 years, e.g. 1970..2514 or 1901..2446. Although an early version of the patch for 34-bit tv_sec in ext4 worked with years 1970..2514, prior to being committed the patch was changed to support pre-1970 timestamps (introducing the sign extension issue in the decoding): http://marc.info/?l=linux-ext4&m=118208541320999 The existing encoding simply encodes bits 31..0 of tv_sec in the regular time field and bits 33..32 in the extra field (along with the 30-bit tv_nsec). The issue is in the decoding, which I think can be addressed by changing only the body of the "if" in in the ext4_decode_extra_time function, to something like this: time->tv_sec += ((__u32)time->tv_sec + ((__u64)le32_to_cpu(extra) << 32) + 0x80000000LL) & 0x300000000LL; This is untested, and might look nicer with some macros, but it should decode the 34 bits into a timestamp in the range -0x80000000 (1901-12-13) to 0x37fffffff (2446-05-10), while retaining compatibility with the existing encoding. 2 msb of adjustment needed to convert extra 32-bit sign-extended 32-bit tv_sec bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec 1 1 1 -0x80000000..-1 0 0 0 0 0x000000000..0x07fffffff 0 0 0 1 0x080000000..0x0ffffffff 0x100000000 0 1 0 0x100000000..0x17fffffff 0x100000000 0 1 1 0x180000000..0x1ffffffff 0x200000000 1 0 0 0x200000000..0x27fffffff 0x200000000 1 0 1 0x280000000..0x2ffffffff 0x300000000 1 1 0 0x300000000..0x37fffffff 0x300000000 - Mark -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-06-23, at 1:52 AM, Akira Fujita wrote: > ext4: Fix ext4 timestamps corruption Hi Akira-san, thank you for the patch. For submitting patches, it is easier for Ted if you send the patch in a separate email with a proper subject (e.g. "[PATCH] ext4: Fix ext4 timestamp > 2038 corruption" from instead of containing the reply to an old email. Otherwise Ted has to manually reformat the patch. > Officially, ext4 can handle its timestamps until 2514 with 32bit > entries plus EPOCH_BIT (2bits). But when timestamps values use > 32+ bit (e.g. 2038-01-19 9:14:08 0x0000000080000000), we can get > corrupted values. Because sign bit is overwritten by transferring > value between kernel space and user space. > > To fix this issue, 32th bit of extra time fields in ext4_inode structure > (e.g. i_ctime_extra) are used as the sign for 64bit user space. > Because these are used only 20bits for nano-second and bottom of 2bits This should be "30 bits for nanosecond". > are for EXT4_EPOCH_BITS shift. > With this patch, ext4 supports timestamps Y1901-2514. > > The performance comparison is as follows: > ------------------------------------------------ > | | file create | ls -l | > |--------------|---------------|---------------- > |with patch | 138.2056 sec | 14.9333 sec | > |without patch | 133.4566 sec | 14.9096 sec | > ------------------------------------------------ > file count:1 million (average of 3 trials) > kernel: 3.0.0-rc3 > > There is a slightly difference, but I think it is acceptable. I'm surprised that the difference is even measurable for such a change. > Thanks and Regards, > Akira Fujita > > Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> > --- > fs/ext4/ext4.h | 30 +++++++++++++++++++++--------- > 1 file changed, 21 insertions(+), 9 deletions(-) > > diff -Nrup -X linux-3.0-rc3-a/Documentation/dontdiff linux-3.0-rc3-a/fs/ext4/ext4.h linux-3.0-rc3-b/fs/ext4/ext4.h > --- linux-3.0-rc3-a/fs/ext4/ext4.h 2011-06-14 07:29:59.000000000 +0900 > +++ linux-3.0-rc3-b/fs/ext4/ext4.h 2011-06-23 14:18:47.000000000 +0900 > @@ -645,6 +645,7 @@ struct move_extent { > #define EXT4_EPOCH_BITS 2 > #define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1) > #define EXT4_NSEC_MASK (~0UL << EXT4_EPOCH_BITS) > +#define EXT4_TIMESTAMP_SIGN_MASK 0x80000000 > > /* > * Extended fields will fit into an inode if the filesystem was formatted > @@ -665,16 +666,23 @@ struct move_extent { > static inline __le32 ext4_encode_extra_time(struct timespec *time) > { > + return cpu_to_le32((sizeof(time->tv_sec) > 4 ? > + (time->tv_sec >> 32) & > + (EXT4_EPOCH_MASK | EXT4_TIMESTAMP_SIGN_MASK) : > + time->tv_sec & EXT4_TIMESTAMP_SIGN_MASK) | > + ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); > } As mentioned by Mark, this cannot work because the high bit is already used for the most significant bit of the nanoseconds. > static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) > { > + if (sizeof(time->tv_sec) > 4) { > time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) > << 32; > + if (le32_to_cpu(extra) & EXT4_TIMESTAMP_SIGN_MASK) > + time->tv_sec |= EXT4_NSEC_MASK << 32; > + } > + > + time->tv_nsec = ((le32_to_cpu(extra) & ~EXT4_TIMESTAMP_SIGN_MASK) >> > + EXT4_EPOCH_BITS); > } > > #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ > @@ -696,19 +704,23 @@ do { \ > > #define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \ > do { \ > - (inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime); \ > - if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \ > + if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) { \ > + (inode)->xtime.tv_sec = \ > + (__u32)(le32_to_cpu((raw_inode)->xtime)); \ > ext4_decode_extra_time(&(inode)->xtime, \ > raw_inode->xtime ## _extra); \ > - else \ > + } else { \ > + (inode)->xtime.tv_sec = \ > + (signed)le32_to_cpu((raw_inode)->xtime); \ > (inode)->xtime.tv_nsec = 0; \ > + } \ > } while (0) > > #define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \ > do { \ > if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \ > - (einode)->xtime.tv_sec = \ > - (signed)le32_to_cpu((raw_inode)->xtime); \ > + (einode)->xtime.tv_sec = \ > + (__u32)(le32_to_cpu((raw_inode)->xtime)); \ > if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \ > ext4_decode_extra_time(&(einode)->xtime, \ > raw_inode->xtime ## _extra); \ Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-06-23, at 4:37 PM, Mark Harris wrote: > 2011/6/23 Akira Fujita <a-fujita@rs.jp.nec.com>: >>> https://bugzilla.kernel.org/show_bug.cgi?id=23732 >> >> ext4: Fix ext4 timestamps corruption >> >> Officially, ext4 can handle its timestamps until 2514 >> with 32bit entries plus EPOCH_BIT (2bits). >> But when timestamps values use 32+ bit >> (e.g. 2038-01-19 9:14:08 0x0000000080000000), >> we can get corrupted values. >> Because sign bit is overwritten by transferring value >> between kernel space and user space. >> >> To fix this issue, 32th bit of extra time fields in ext4_inode structure >> (e.g. i_ctime_extra) are used as the sign for 64bit user space. >> Because these are used only 20bits for nano-second and bottom of 2bits >> are for EXT4_EPOCH_BITS shift. >> With this patch, ext4 supports timestamps Y1901-2514. > > Thanks for looking into this bug. However tv_nsec is in the > range 0..999999999 and requires 30 bits. That is why tv_sec was > only extended by 2 bits. So there are no additional spare bits > in the "extra" field. > > 34-bit seconds can accommodate a maximum of 544.4 years, e.g. > 1970..2514 or 1901..2446. Although an early version of the patch > for 34-bit tv_sec in ext4 worked with years 1970..2514, prior to > being committed the patch was changed to support pre-1970 > timestamps (introducing the sign extension issue in the decoding): > http://marc.info/?l=linux-ext4&m=118208541320999 Sigh, it seems so unlikely that anyone even has a valid timestamp on a file with a date before 1970, it makes me wonder if this extra effort is even worthwhile... > The existing encoding simply encodes bits 31..0 of tv_sec in the > regular time field and bits 33..32 in the extra field (along with > the 30-bit tv_nsec). The issue is in the decoding, which I think > can be addressed by changing only the body of the "if" in in the > ext4_decode_extra_time function, to something like this: > > time->tv_sec += ((__u32)time->tv_sec + > ((__u64)le32_to_cpu(extra) << 32) + > 0x80000000LL) & 0x300000000LL; > > This is untested, and might look nicer with some macros, but > it should decode the 34 bits into a timestamp in the range > -0x80000000 (1901-12-13) to 0x37fffffff (2446-05-10), while > retaining compatibility with the existing encoding. > > 2 msb of adjustment needed to convert > extra 32-bit sign-extended 32-bit tv_sec > bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec > 1 1 1 -0x80000000..-1 0 > 0 0 0 0x000000000..0x07fffffff 0 > 0 0 1 0x080000000..0x0ffffffff 0x100000000 > 0 1 0 0x100000000..0x17fffffff 0x100000000 > 0 1 1 0x180000000..0x1ffffffff 0x200000000 > 1 0 0 0x200000000..0x27fffffff 0x200000000 > 1 0 1 0x280000000..0x2ffffffff 0x300000000 > 1 1 0 0x300000000..0x37fffffff 0x300000000 The problem with this encoding is that it requires existing 32-bit timestamps before 1970 to have encoded "11" in the extra epoch bits, which is not the case. Current pre-1970 timestamps would be encoded with "00" there, which would (according to your table) bump them past 2038 incorrectly. What we need is an encoding that preserves the times for extra epoch "00": 2 msb of adjustment needed to convert extra 32-bit sign-extended 32-bit tv_sec bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec 0 0 1 -0x80000000..-1 0 0 0 0 0x000000000..0x07fffffff 0 0 1 1 0x080000000..0x0ffffffff 0x100000000 0 1 0 0x100000000..0x17fffffff 0x100000000 1 0 1 0x180000000..0x1ffffffff 0x200000000 1 0 0 0x200000000..0x27fffffff 0x200000000 1 1 1 0x280000000..0x2ffffffff 0x300000000 1 1 0 0x300000000..0x37fffffff 0x300000000 So, looking at the above desired encoding, it looks like the error in the existing code is that it is doing a boolean operation on decode instead of a mathematical one, and it was incorrectly trying to extend the time to (1ULL<<34). The below should fix this: static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) { if (unlikely(sizeof(time->tv_sec) > 4 && (extra & cpu_to_le32(EXT4_EPOCH_MASK))) time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; } Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 24, 2011 at 15:46, Andreas Dilger wrote: > On 2011-06-23, at 4:37 PM, Mark Harris wrote: >> 2011/6/23 Akira Fujita <a-fujita@rs.jp.nec.com>: >>>> https://bugzilla.kernel.org/show_bug.cgi?id=23732 >>> >>> ext4: Fix ext4 timestamps corruption >>> >>> Officially, ext4 can handle its timestamps until 2514 >>> with 32bit entries plus EPOCH_BIT (2bits). >>> But when timestamps values use 32+ bit >>> (e.g. 2038-01-19 9:14:08 0x0000000080000000), >>> we can get corrupted values. >>> Because sign bit is overwritten by transferring value >>> between kernel space and user space. >>> >>> To fix this issue, 32th bit of extra time fields in ext4_inode structure >>> (e.g. i_ctime_extra) are used as the sign for 64bit user space. >>> Because these are used only 20bits for nano-second and bottom of 2bits >>> are for EXT4_EPOCH_BITS shift. >>> With this patch, ext4 supports timestamps Y1901-2514. >> >> Thanks for looking into this bug. However tv_nsec is in the >> range 0..999999999 and requires 30 bits. That is why tv_sec was >> only extended by 2 bits. So there are no additional spare bits >> in the "extra" field. >> >> 34-bit seconds can accommodate a maximum of 544.4 years, e.g. >> 1970..2514 or 1901..2446. Although an early version of the patch >> for 34-bit tv_sec in ext4 worked with years 1970..2514, prior to >> being committed the patch was changed to support pre-1970 >> timestamps (introducing the sign extension issue in the decoding): >> http://marc.info/?l=linux-ext4&m=118208541320999 > > Sigh, it seems so unlikely that anyone even has a valid timestamp > on a file with a date before 1970, it makes me wonder if this extra > effort is even worthwhile... > >> The existing encoding simply encodes bits 31..0 of tv_sec in the >> regular time field and bits 33..32 in the extra field (along with >> the 30-bit tv_nsec). The issue is in the decoding, which I think >> can be addressed by changing only the body of the "if" in in the >> ext4_decode_extra_time function, to something like this: >> >> time->tv_sec += ((__u32)time->tv_sec + >> ((__u64)le32_to_cpu(extra) << 32) + >> 0x80000000LL) & 0x300000000LL; >> >> This is untested, and might look nicer with some macros, but >> it should decode the 34 bits into a timestamp in the range >> -0x80000000 (1901-12-13) to 0x37fffffff (2446-05-10), while >> retaining compatibility with the existing encoding. >> >> 2 msb of adjustment needed to convert >> extra 32-bit sign-extended 32-bit tv_sec >> bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec >> 1 1 1 -0x80000000..-1 0 >> 0 0 0 0x000000000..0x07fffffff 0 >> 0 0 1 0x080000000..0x0ffffffff 0x100000000 >> 0 1 0 0x100000000..0x17fffffff 0x100000000 >> 0 1 1 0x180000000..0x1ffffffff 0x200000000 >> 1 0 0 0x200000000..0x27fffffff 0x200000000 >> 1 0 1 0x280000000..0x2ffffffff 0x300000000 >> 1 1 0 0x300000000..0x37fffffff 0x300000000 > > The problem with this encoding is that it requires existing 32-bit > timestamps before 1970 to have encoded "11" in the extra epoch bits, > which is not the case. Current pre-1970 timestamps would be encoded > with "00" there, which would (according to your table) bump them past > 2038 incorrectly. I was under the impression that the encoding code stored bits 33 & 32 of tv_sec there, which would be 1,1 for a negative value like -1. Certainly the decoding would be simpler if the extra value was only non-zero for large timestamps. On closer inspection of ext4_encode_extra_time, it looks like for tv_sec = -1, a 64-bit kernel will store 1,1 in the extra bits and a 32-bit kernel will store 0,0 in the extra bits. That is a problem if both of these need to be decoded as -1 on a 64-bit system. > > What we need is an encoding that preserves the times for extra epoch "00": > > 2 msb of adjustment needed to convert > extra 32-bit sign-extended 32-bit tv_sec > bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec > 0 0 1 -0x80000000..-1 0 > 0 0 0 0x000000000..0x07fffffff 0 > 0 1 1 0x080000000..0x0ffffffff 0x100000000 > 0 1 0 0x100000000..0x17fffffff 0x100000000 > 1 0 1 0x180000000..0x1ffffffff 0x200000000 > 1 0 0 0x200000000..0x27fffffff 0x200000000 > 1 1 1 0x280000000..0x2ffffffff 0x300000000 > 1 1 0 0x300000000..0x37fffffff 0x300000000 > > So, looking at the above desired encoding, it looks like the error in > the existing code is that it is doing a boolean operation on decode > instead of a mathematical one, and it was incorrectly trying to extend > the time to (1ULL<<34). The below should fix this: > > static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) > { > if (unlikely(sizeof(time->tv_sec) > 4 && > (extra & cpu_to_le32(EXT4_EPOCH_MASK))) > time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; > > time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; > } That is not compatible with the existing ext4_encode_extra_time. For example, 2038-01-31 (0x80101500) is encoded with extra bits equal to bits 33 & 32, i.e. 0,0. But this code would decode it as 1901-12-25 (i.e. it would leave the sign-extended 32-bit value unchanged). Possible solutions: (a) Define the current 64-bit encoding as the correct encoding since the 2 extra bits are not even decoded on 32-bit kernels, so its encoding doesn't matter much. However, if anyone with existing pre-1970 timestamps written using a 32-bit kernel wants to use their ext4 filesystem with a 64-bit kernel, the pre-1970 timestamps would be wrong unless they re-write them with a fixed kernel. Change ext4_decode_extra_time "if" body to something like: time->tv_sec += ((__u32)time->tv_sec + ((__u64)le32_to_cpu(extra) << 32) + 0x80000000LL) & 0x300000000LL; Change ext4_encode_extra_time ": 0" to something like: time->tv_sec < 0 ? EXT4_EPOCH_MASK : 0 (b) Change the encoding of the extra bits to be those in your new table. This is compatible with the 32-bit kernel encoding (which does not decode these bits) but incompatible with the 64-bit kernel encoding. Existing pre-1970 timestamps written with a 64-bit kernel would be decoded as dates far in the future. Requires your change to ext4_decode_extra_time. Also requires a change to ext4_encode_extra_time, changing (time->tv_sec >> 32) to something like: ((time->tv_sec - (signed int)time->tv_sec) >> 32) (c) If 100% compatibility with existing pre-1970 32-bit timestamps is desired even when switching between 32-bit and 64-bit kernels, both extra=1,1/msb=1 and extra=0,0/msb=1 could be treated as year 1901..1969 timestamps. However this would reduce the maximum 64-bit ext4 timestamp, and would necessarily be incompatible with the existing 64-bit kernel encoding of timestamps > year 2038 (since a current 64-bit kernel encodes a year 2039 timestamp exactly the same as a current 32-bit kernel encodes a year 1902 timestamp). This requires additional complexity in both ext4_decode_extra_time and ext4_encode_extra_time. (d) Declare that ext4 supports only timestamps with year >= 1970. i.e. 1970..2514 (64-bit), 1970..2038 (32-bit). Any existing pre-1970 timestamps would now be interpreted as a year >= 2038 timestamp on 64-bit kernels. It may be possible for users of 32-bit kernels to continue to successfully read and write 1901..1969 timestamps, but this would have to be unsupported. If such a timestamp was read with a 64-bit kernel, or a program like fsck.ext4, the time may be different. If some day, as 2038 approaches, 32-bit time_t is changed to unsigned, ext4 would once again support all 32-bit time_t values. To implement, the decoding can simply drop all casts to (signed). Optionally, the encoding could encode any negative tv_sec as 0 to make 32-bit and 64-bit behavior for pre-1970 timestamps consistent (bugzilla 5079/8643). However this may break some uses of pre-1970 timestamps that would otherwise work on 32-bit kernels. - Mark > > Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-06-24, at 11:12 PM, Mark Harris wrote: > On Fri, Jun 24, 2011 at 15:46, Andreas Dilger wrote: >> The problem with this encoding is that it requires existing 32-bit >> timestamps before 1970 to have encoded "11" in the extra epoch bits, >> which is not the case. Current pre-1970 timestamps would be encoded >> with "00" there, which would (according to your table) bump them past >> 2038 incorrectly. > > I was under the impression that the encoding code stored bits > 33 & 32 of tv_sec there, which would be 1,1 for a negative value > like -1. Certainly the decoding would be simpler if the extra > value was only non-zero for large timestamps. One problem with a symmetrical encoding is that it wastes half of the dynamic range for values that nobody will ever use. Even values before 1970 seem so unlikely that I question whether we should support them at all. > On closer inspection of ext4_encode_extra_time, it looks like for > tv_sec = -1, a 64-bit kernel will store 1,1 in the extra bits and > a 32-bit kernel will store 0,0 in the extra bits. That is a problem > if both of these need to be decoded as -1 on a 64-bit system. That is definitely a problem. >> What we need is an encoding that preserves the times for extra epoch "00": >> >> 2 msb of adjustment needed to convert >> extra 32-bit sign-extended 32-bit tv_sec >> bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec >> 0 0 1 -0x80000000..-1 0 >> 0 0 0 0x000000000..0x07fffffff 0 >> 0 1 1 0x080000000..0x0ffffffff 0x100000000 >> 0 1 0 0x100000000..0x17fffffff 0x100000000 >> 1 0 1 0x180000000..0x1ffffffff 0x200000000 >> 1 0 0 0x200000000..0x27fffffff 0x200000000 >> 1 1 1 0x280000000..0x2ffffffff 0x300000000 >> 1 1 0 0x300000000..0x37fffffff 0x300000000 >> >> So, looking at the above desired encoding, it looks like the error in >> the existing code is that it is doing a boolean operation on decode >> instead of a mathematical one, and it was incorrectly trying to extend >> the time to (1ULL<<34). The below should fix this: >> >> static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) >> { >> if (unlikely(sizeof(time->tv_sec) > 4 && >> (extra & cpu_to_le32(EXT4_EPOCH_MASK))) >> time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; >> >> time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; >> } > > That is not compatible with the existing ext4_encode_extra_time. > For example, 2038-01-31 (0x80101500) is encoded with extra bits > equal to bits 33 & 32, i.e. 0,0. But this code would decode it > as 1901-12-25 (i.e. it would leave the sign-extended 32-bit value > unchanged). Part of the problem is that the encoding/decoding of timestamps beyond 2038 is already broken today, so I don't think anyone has been using them so far. This gives us some leeway for fixing this problem I think. > Possible solutions: > > (a) Define the current 64-bit encoding as the correct encoding since > the 2 extra bits are not even decoded on 32-bit kernels, so its > encoding doesn't matter much. However, if anyone with existing > pre-1970 timestamps written using a 32-bit kernel wants to use > their ext4 filesystem with a 64-bit kernel, the pre-1970 > timestamps would be wrong unless they re-write them with a > fixed kernel. > > Change ext4_decode_extra_time "if" body to something like: > time->tv_sec += ((__u32)time->tv_sec + > ((__u64)le32_to_cpu(extra) << 32) + > 0x80000000LL) & 0x300000000LL; > > Change ext4_encode_extra_time ": 0" to something like: > time->tv_sec < 0 ? EXT4_EPOCH_MASK : 0 The real-world problem isn't with 32-bit systems, where it doesn't really matter at all how time is encoded, nor with files on 64-bit systems with timestamps 26 years in the future, but rather 256-byte inodes that were previously written with ext3 that will break if they are mounted with ext4 on a 64-bit system. > (b) Change the encoding of the extra bits to be those in your new > table. This is compatible with the 32-bit kernel encoding > (which does not decode these bits) but incompatible with the > 64-bit kernel encoding. Existing pre-1970 timestamps written > with a 64-bit kernel would be decoded as dates far in the future. > > Requires your change to ext4_decode_extra_time. > Also requires a change to ext4_encode_extra_time, changing > (time->tv_sec >> 32) to something like: > ((time->tv_sec - (signed int)time->tv_sec) >> 32) I think this is a reasonable solution, but I dislike that it breaks pre-1970 timestamps on 64-bit systems. > (c) If 100% compatibility with existing pre-1970 32-bit timestamps > is desired even when switching between 32-bit and 64-bit kernels, > both extra=1,1/msb=1 and extra=0,0/msb=1 could be treated as year > 1901..1969 timestamps. However this would reduce the maximum > 64-bit ext4 timestamp, and would necessarily be incompatible with > the existing 64-bit kernel encoding of timestamps > year 2038 > (since a current 64-bit kernel encodes a year 2039 timestamp > exactly the same as a current 32-bit kernel encodes a year 1902 > timestamp). > > This requires additional complexity in both ext4_decode_extra_time > and ext4_encode_extra_time. This would also be a good option for the short term, and then have e2fsck fix up the "11" encoded pre-1970 times to use "00", or just allow both to work. This cuts 136 years off the range (2310-04-04..2446-05-10) but to be honest I don't think that will matter very much. > (d) Declare that ext4 supports only timestamps with year >= 1970. > i.e. 1970..2514 (64-bit), 1970..2038 (32-bit). > Any existing pre-1970 timestamps would now be interpreted as a > year >= 2038 timestamp on 64-bit kernels. > > It may be possible for users of 32-bit kernels to continue to > successfully read and write 1901..1969 timestamps, but this > would have to be unsupported. If such a timestamp was read with > a 64-bit kernel, or a program like fsck.ext4, the time may be > different. I'm not so fond of this solution either. > If some day, as 2038 approaches, 32-bit time_t is changed to > unsigned, ext4 would once again support all 32-bit time_t > values. > > To implement, the decoding can simply drop all casts to (signed). > Optionally, the encoding could encode any negative tv_sec as 0 > to make 32-bit and 64-bit behavior for pre-1970 timestamps > consistent (bugzilla 5079/8643). However this may break some > uses of pre-1970 timestamps that would otherwise work on > 32-bit kernels. Hopefully Ted can chime in on this as well. Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Andreas and Mark, Thank you for looking at this issue. (2011/06/27 18:04), Andreas Dilger wrote: > On 2011-06-24, at 11:12 PM, Mark Harris wrote: >> On Fri, Jun 24, 2011 at 15:46, Andreas Dilger wrote: >>> The problem with this encoding is that it requires existing 32-bit >>> timestamps before 1970 to have encoded "11" in the extra epoch bits, >>> which is not the case. Current pre-1970 timestamps would be encoded >>> with "00" there, which would (according to your table) bump them past >>> 2038 incorrectly. >> >> I was under the impression that the encoding code stored bits >> 33& 32 of tv_sec there, which would be 1,1 for a negative value >> like -1. Certainly the decoding would be simpler if the extra >> value was only non-zero for large timestamps. > > One problem with a symmetrical encoding is that it wastes half of the > dynamic range for values that nobody will ever use. Even values before > 1970 seem so unlikely that I question whether we should support them > at all. > >> On closer inspection of ext4_encode_extra_time, it looks like for >> tv_sec = -1, a 64-bit kernel will store 1,1 in the extra bits and >> a 32-bit kernel will store 0,0 in the extra bits. That is a problem >> if both of these need to be decoded as -1 on a 64-bit system. > > That is definitely a problem. > >>> What we need is an encoding that preserves the times for extra epoch "00": >>> >>> 2 msb of adjustment needed to convert >>> extra 32-bit sign-extended 32-bit tv_sec >>> bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec >>> 0 0 1 -0x80000000..-1 0 >>> 0 0 0 0x000000000..0x07fffffff 0 >>> 0 1 1 0x080000000..0x0ffffffff 0x100000000 >>> 0 1 0 0x100000000..0x17fffffff 0x100000000 >>> 1 0 1 0x180000000..0x1ffffffff 0x200000000 >>> 1 0 0 0x200000000..0x27fffffff 0x200000000 >>> 1 1 1 0x280000000..0x2ffffffff 0x300000000 >>> 1 1 0 0x300000000..0x37fffffff 0x300000000 >>> >>> So, looking at the above desired encoding, it looks like the error in >>> the existing code is that it is doing a boolean operation on decode >>> instead of a mathematical one, and it was incorrectly trying to extend >>> the time to (1ULL<<34). The below should fix this: >>> >>> static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) >>> { >>> if (unlikely(sizeof(time->tv_sec)> 4&& >>> (extra& cpu_to_le32(EXT4_EPOCH_MASK))) >>> time->tv_sec += (u64)(le32_to_cpu(extra)& EXT4_EPOCH_MASK)<< 32; >>> >>> time->tv_nsec = (le32_to_cpu(extra)& EXT4_NSEC_MASK)>> EXT4_EPOCH_BITS; >>> } >> >> That is not compatible with the existing ext4_encode_extra_time. >> For example, 2038-01-31 (0x80101500) is encoded with extra bits >> equal to bits 33& 32, i.e. 0,0. But this code would decode it >> as 1901-12-25 (i.e. it would leave the sign-extended 32-bit value >> unchanged). > > Part of the problem is that the encoding/decoding of timestamps beyond > 2038 is already broken today, so I don't think anyone has been using > them so far. This gives us some leeway for fixing this problem I think. > >> Possible solutions: >> >> (a) Define the current 64-bit encoding as the correct encoding since >> the 2 extra bits are not even decoded on 32-bit kernels, so its >> encoding doesn't matter much. However, if anyone with existing >> pre-1970 timestamps written using a 32-bit kernel wants to use >> their ext4 filesystem with a 64-bit kernel, the pre-1970 >> timestamps would be wrong unless they re-write them with a >> fixed kernel. >> >> Change ext4_decode_extra_time "if" body to something like: >> time->tv_sec += ((__u32)time->tv_sec + >> ((__u64)le32_to_cpu(extra)<< 32) + >> 0x80000000LL)& 0x300000000LL; >> >> Change ext4_encode_extra_time ": 0" to something like: >> time->tv_sec< 0 ? EXT4_EPOCH_MASK : 0 > > The real-world problem isn't with 32-bit systems, where it doesn't > really matter at all how time is encoded, nor with files on 64-bit systems > with timestamps 26 years in the future, but rather 256-byte inodes that > were previously written with ext3 that will break if they are mounted > with ext4 on a 64-bit system. > >> (b) Change the encoding of the extra bits to be those in your new >> table. This is compatible with the 32-bit kernel encoding >> (which does not decode these bits) but incompatible with the >> 64-bit kernel encoding. Existing pre-1970 timestamps written >> with a 64-bit kernel would be decoded as dates far in the future. >> >> Requires your change to ext4_decode_extra_time. >> Also requires a change to ext4_encode_extra_time, changing >> (time->tv_sec>> 32) to something like: >> ((time->tv_sec - (signed int)time->tv_sec)>> 32) > > I think this is a reasonable solution, but I dislike that it breaks > pre-1970 timestamps on 64-bit systems. I agree with this solution. I guess that no one has pre-1970 timestamps on ext4, actually. Mark, are you working on this right now? If you have a patch to fix this issue, please send it to the list. Regards, Akira Fujita -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Akira Fujita wrote: > Hi, Andreas and Mark, > Thank you for looking at this issue. > > (2011/06/27 18:04), Andreas Dilger wrote: >> On 2011-06-24, at 11:12 PM, Mark Harris wrote: >>> On Fri, Jun 24, 2011 at 15:46, Andreas Dilger wrote: >>>> The problem with this encoding is that it requires existing 32-bit >>>> timestamps before 1970 to have encoded "11" in the extra epoch bits, >>>> which is not the case. Current pre-1970 timestamps would be encoded >>>> with "00" there, which would (according to your table) bump them past >>>> 2038 incorrectly. >>> >>> I was under the impression that the encoding code stored bits >>> 33& 32 of tv_sec there, which would be 1,1 for a negative value >>> like -1. Certainly the decoding would be simpler if the extra >>> value was only non-zero for large timestamps. >> >> One problem with a symmetrical encoding is that it wastes half of the >> dynamic range for values that nobody will ever use. Even values before >> 1970 seem so unlikely that I question whether we should support them >> at all. >> >>> On closer inspection of ext4_encode_extra_time, it looks like for >>> tv_sec = -1, a 64-bit kernel will store 1,1 in the extra bits and >>> a 32-bit kernel will store 0,0 in the extra bits. That is a problem >>> if both of these need to be decoded as -1 on a 64-bit system. >> >> That is definitely a problem. >> >>>> What we need is an encoding that preserves the times for extra epoch "00": >>>> >>>> 2 msb of adjustment needed to convert >>>> extra 32-bit sign-extended 32-bit tv_sec >>>> bits time decoded 64-bit tv_sec to decoded 64-bit tv_sec >>>> 0 0 1 -0x80000000..-1 0 >>>> 0 0 0 0x000000000..0x07fffffff 0 >>>> 0 1 1 0x080000000..0x0ffffffff 0x100000000 >>>> 0 1 0 0x100000000..0x17fffffff 0x100000000 >>>> 1 0 1 0x180000000..0x1ffffffff 0x200000000 >>>> 1 0 0 0x200000000..0x27fffffff 0x200000000 >>>> 1 1 1 0x280000000..0x2ffffffff 0x300000000 >>>> 1 1 0 0x300000000..0x37fffffff 0x300000000 >>>> >>>> So, looking at the above desired encoding, it looks like the error in >>>> the existing code is that it is doing a boolean operation on decode >>>> instead of a mathematical one, and it was incorrectly trying to extend >>>> the time to (1ULL<<34). The below should fix this: >>>> >>>> static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) >>>> { >>>> if (unlikely(sizeof(time->tv_sec)> 4&& >>>> (extra& cpu_to_le32(EXT4_EPOCH_MASK))) >>>> time->tv_sec += (u64)(le32_to_cpu(extra)& EXT4_EPOCH_MASK)<< 32; >>>> >>>> time->tv_nsec = (le32_to_cpu(extra)& EXT4_NSEC_MASK)>> EXT4_EPOCH_BITS; >>>> } >>> >>> That is not compatible with the existing ext4_encode_extra_time. >>> For example, 2038-01-31 (0x80101500) is encoded with extra bits >>> equal to bits 33& 32, i.e. 0,0. But this code would decode it >>> as 1901-12-25 (i.e. it would leave the sign-extended 32-bit value >>> unchanged). >> >> Part of the problem is that the encoding/decoding of timestamps beyond >> 2038 is already broken today, so I don't think anyone has been using >> them so far. This gives us some leeway for fixing this problem I think. >> >>> Possible solutions: >>> >>> (a) Define the current 64-bit encoding as the correct encoding since >>> the 2 extra bits are not even decoded on 32-bit kernels, so its >>> encoding doesn't matter much. However, if anyone with existing >>> pre-1970 timestamps written using a 32-bit kernel wants to use >>> their ext4 filesystem with a 64-bit kernel, the pre-1970 >>> timestamps would be wrong unless they re-write them with a >>> fixed kernel. >>> >>> Change ext4_decode_extra_time "if" body to something like: >>> time->tv_sec += ((__u32)time->tv_sec + >>> ((__u64)le32_to_cpu(extra)<< 32) + >>> 0x80000000LL)& 0x300000000LL; >>> >>> Change ext4_encode_extra_time ": 0" to something like: >>> time->tv_sec< 0 ? EXT4_EPOCH_MASK : 0 >> >> The real-world problem isn't with 32-bit systems, where it doesn't >> really matter at all how time is encoded, nor with files on 64-bit systems >> with timestamps 26 years in the future, but rather 256-byte inodes that >> were previously written with ext3 that will break if they are mounted >> with ext4 on a 64-bit system. >> >>> (b) Change the encoding of the extra bits to be those in your new >>> table. This is compatible with the 32-bit kernel encoding >>> (which does not decode these bits) but incompatible with the >>> 64-bit kernel encoding. Existing pre-1970 timestamps written >>> with a 64-bit kernel would be decoded as dates far in the future. >>> >>> Requires your change to ext4_decode_extra_time. >>> Also requires a change to ext4_encode_extra_time, changing >>> (time->tv_sec>> 32) to something like: >>> ((time->tv_sec - (signed int)time->tv_sec)>> 32) >> >> I think this is a reasonable solution, but I dislike that it breaks >> pre-1970 timestamps on 64-bit systems. > > I agree with this solution. > I guess that no one has pre-1970 timestamps on ext4, actually. > > Mark, are you working on this right now? > If you have a patch to fix this issue, please send it to the list. I think that all of the code changes needed to implement this solution (b) are in ext4_decode_extra_time and ext4_encode_extra_time and are included above, if you want to submit them in patch format as a new version of your patch. The issue with this solution is that it breaks existing 1901..1969 timestamps written with a 64-bit kernel to ext4 with 256-byte inodes. (It breaks existing year 2038+ timestamps as well, but those are already broken.) It sounds like Andreas favors either (b) or (c) but would like to hear from Ted. - Mark > > Regards, > Akira Fujita > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff -Nrup -X linux-3.0-rc3-a/Documentation/dontdiff linux-3.0-rc3-a/fs/ext4/ext4.h linux-3.0-rc3-b/fs/ext4/ext4.h --- linux-3.0-rc3-a/fs/ext4/ext4.h 2011-06-14 07:29:59.000000000 +0900 +++ linux-3.0-rc3-b/fs/ext4/ext4.h 2011-06-23 14:18:47.000000000 +0900 @@ -645,6 +645,7 @@ struct move_extent { #define EXT4_EPOCH_BITS 2 #define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1) #define EXT4_NSEC_MASK (~0UL << EXT4_EPOCH_BITS) +#define EXT4_TIMESTAMP_SIGN_MASK 0x80000000 /* * Extended fields will fit into an inode if the filesystem was formatted @@ -665,16 +666,23 @@ struct move_extent { static inline __le32 ext4_encode_extra_time(struct timespec *time) { return cpu_to_le32((sizeof(time->tv_sec) > 4 ? - (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) | - ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); + (time->tv_sec >> 32) & + (EXT4_EPOCH_MASK | EXT4_TIMESTAMP_SIGN_MASK) : + time->tv_sec & EXT4_TIMESTAMP_SIGN_MASK) | + ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); } static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) { - if (sizeof(time->tv_sec) > 4) + if (sizeof(time->tv_sec) > 4) { time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32; - time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; + if (le32_to_cpu(extra) & EXT4_TIMESTAMP_SIGN_MASK) + time->tv_sec |= EXT4_NSEC_MASK << 32; + } + + time->tv_nsec = ((le32_to_cpu(extra) & ~EXT4_TIMESTAMP_SIGN_MASK) >> + EXT4_EPOCH_BITS); } #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ @@ -696,19 +704,23 @@ do { \ #define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \ do { \ - (inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime); \ - if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \ + if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) { \ + (inode)->xtime.tv_sec = \ + (__u32)(le32_to_cpu((raw_inode)->xtime)); \ ext4_decode_extra_time(&(inode)->xtime, \ raw_inode->xtime ## _extra); \ - else \ + } else { \ + (inode)->xtime.tv_sec = \ + (signed)le32_to_cpu((raw_inode)->xtime); \ (inode)->xtime.tv_nsec = 0; \ + } \ } while (0) #define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \ do { \ if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \ - (einode)->xtime.tv_sec = \ - (signed)le32_to_cpu((raw_inode)->xtime); \ + (einode)->xtime.tv_sec = \ + (__u32)(le32_to_cpu((raw_inode)->xtime)); \ if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \ ext4_decode_extra_time(&(einode)->xtime, \ raw_inode->xtime ## _extra); \