mbox series

[0/2] Introduce dns_interval procfs setting

Message ID 20220608215444.1216-1-ematsumiya@suse.de
Headers show
Series Introduce dns_interval procfs setting | expand

Message

Enzo Matsumiya June 8, 2022, 9:54 p.m. UTC
Hi,

These 2 patches are a simple way to fix the DNS issue that
currently exists in cifs, where the upcall to key.dns_resolver will
always return 0 for the record TTL, hence, making the resolve worker
always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
(currently 600 seconds).

This also makes the new setting `dns_interval' user-configurable via
procfs (/proc/fs/cifs/dns_interval).

One disadvantage here is that the interval is applied to all hosts
resolution. This is still how it works today, because we're always using
the default value anyway, but should someday this be fixed, then we can
go back to rely on the keys infrastructure to cache each hostname with
its own separate TTL.

Please review and test. All feedback is welcome.


Cheers

Enzo

Enzo Matsumiya (2):
  cifs: create procfs for dns_interval setting
  cifs: reschedule DNS resolve worker based on dns_interval

 fs/cifs/cifs_debug.c  | 63 +++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifs_debug.h  |  2 ++
 fs/cifs/cifsfs.c      |  1 +
 fs/cifs/connect.c     |  4 +--
 fs/cifs/dns_resolve.c |  8 ++++++
 5 files changed, 76 insertions(+), 2 deletions(-)

Comments

ronnie sahlberg June 9, 2022, 12:39 a.m. UTC | #1
Looks good. Reviewed-by for both patches.

On Thu, 9 Jun 2022 at 07:55, Enzo Matsumiya <ematsumiya@suse.de> wrote:
>
> Hi,
>
> These 2 patches are a simple way to fix the DNS issue that
> currently exists in cifs, where the upcall to key.dns_resolver will
> always return 0 for the record TTL, hence, making the resolve worker
> always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
> (currently 600 seconds).
>
> This also makes the new setting `dns_interval' user-configurable via
> procfs (/proc/fs/cifs/dns_interval).
>
> One disadvantage here is that the interval is applied to all hosts
> resolution. This is still how it works today, because we're always using
> the default value anyway, but should someday this be fixed, then we can
> go back to rely on the keys infrastructure to cache each hostname with
> its own separate TTL.
>
> Please review and test. All feedback is welcome.
>
>
> Cheers
>
> Enzo
>
> Enzo Matsumiya (2):
>   cifs: create procfs for dns_interval setting
>   cifs: reschedule DNS resolve worker based on dns_interval
>
>  fs/cifs/cifs_debug.c  | 63 +++++++++++++++++++++++++++++++++++++++++++
>  fs/cifs/cifs_debug.h  |  2 ++
>  fs/cifs/cifsfs.c      |  1 +
>  fs/cifs/connect.c     |  4 +--
>  fs/cifs/dns_resolve.c |  8 ++++++
>  5 files changed, 76 insertions(+), 2 deletions(-)
>
> --
> 2.36.1
>
Tom Talpey June 9, 2022, 2:17 p.m. UTC | #2
On 6/8/2022 5:54 PM, Enzo Matsumiya wrote:
> Hi,
> 
> These 2 patches are a simple way to fix the DNS issue that
> currently exists in cifs, where the upcall to key.dns_resolver will
> always return 0 for the record TTL, hence, making the resolve worker
> always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
> (currently 600 seconds).
> 
> This also makes the new setting `dns_interval' user-configurable via
> procfs (/proc/fs/cifs/dns_interval).
> 
> One disadvantage here is that the interval is applied to all hosts
> resolution. This is still how it works today, because we're always using
> the default value anyway, but should someday this be fixed, then we can
> go back to rely on the keys infrastructure to cache each hostname with
> its own separate TTL.

Curious, why did you choose a procfs global approach? Wouldn't it be
more appropriate to make this a mount option, so it would be applied
selectively per-server?

Tom.

> Please review and test. All feedback is welcome.
> 
> 
> Cheers
> 
> Enzo
> 
> Enzo Matsumiya (2):
>    cifs: create procfs for dns_interval setting
>    cifs: reschedule DNS resolve worker based on dns_interval
> 
>   fs/cifs/cifs_debug.c  | 63 +++++++++++++++++++++++++++++++++++++++++++
>   fs/cifs/cifs_debug.h  |  2 ++
>   fs/cifs/cifsfs.c      |  1 +
>   fs/cifs/connect.c     |  4 +--
>   fs/cifs/dns_resolve.c |  8 ++++++
>   5 files changed, 76 insertions(+), 2 deletions(-)
>
Paulo Alcantara June 9, 2022, 2:53 p.m. UTC | #3
Hi Enzo,

Enzo Matsumiya <ematsumiya@suse.de> writes:

> These 2 patches are a simple way to fix the DNS issue that
> currently exists in cifs, where the upcall to key.dns_resolver will
> always return 0 for the record TTL, hence, making the resolve worker
> always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
> (currently 600 seconds).
>
> This also makes the new setting `dns_interval' user-configurable via
> procfs (/proc/fs/cifs/dns_interval).

How is the user supposed to know which TTL value will be used?  If the
expire time is set by either key.dns_resolver or any other program used
for dns_resolver key, the above setting would no longer work and users
might get confused.
Enzo Matsumiya June 9, 2022, 3:03 p.m. UTC | #4
On 06/09, Tom Talpey wrote:
>On 6/8/2022 5:54 PM, Enzo Matsumiya wrote:
>>Hi,
>>
>>These 2 patches are a simple way to fix the DNS issue that
>>currently exists in cifs, where the upcall to key.dns_resolver will
>>always return 0 for the record TTL, hence, making the resolve worker
>>always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
>>(currently 600 seconds).
>>
>>This also makes the new setting `dns_interval' user-configurable via
>>procfs (/proc/fs/cifs/dns_interval).
>>
>>One disadvantage here is that the interval is applied to all hosts
>>resolution. This is still how it works today, because we're always using
>>the default value anyway, but should someday this be fixed, then we can
>>go back to rely on the keys infrastructure to cache each hostname with
>>its own separate TTL.
>
>Curious, why did you choose a procfs global approach? Wouldn't it be
>more appropriate to make this a mount option, so it would be applied
>selectively per-server?

I tried to stick to the current behaviour, while also fixing the issue
of getting a TTL=0 from key.dns_resolver upcall.

A mount option looks indeed better initially, and that was also my
initial approach to this. But in a, e.g. multi-domain forest, with
multiple DFS targets, the per-mount setting starts to look irrelevant
again as well. So I took the "per-client" approach instead.


Cheers,

Enzo
Enzo Matsumiya June 9, 2022, 3:14 p.m. UTC | #5
Hi Paulo,

On 06/09, Paulo Alcantara wrote:
>Hi Enzo,
>
>Enzo Matsumiya <ematsumiya@suse.de> writes:
>
>> These 2 patches are a simple way to fix the DNS issue that
>> currently exists in cifs, where the upcall to key.dns_resolver will
>> always return 0 for the record TTL, hence, making the resolve worker
>> always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
>> (currently 600 seconds).
>>
>> This also makes the new setting `dns_interval' user-configurable via
>> procfs (/proc/fs/cifs/dns_interval).
>
>How is the user supposed to know which TTL value will be used?  If the
>expire time is set by either key.dns_resolver or any other program used
>for dns_resolver key, the above setting would no longer work and users
>might get confused.

The initial value is the same as before (SMB_DNS_RESOLVE_INTERVAL_DEFAULT, 600s).

The user don't need to know a value to be used, unless they know the
value the server uses (either manually set by themselves or by some
external script) and then they can use that same value for this new
dns_interval setting.

A very simple example on how one could do it, if there's no desire to
use the default 600s:

# TTL=$(dig +noall +answer my.server.domain | awk '{ print $2 }')
# echo $TTL > /proc/fs/cifs/dns_interval


Cheers,

Enzo
Paulo Alcantara June 9, 2022, 3:21 p.m. UTC | #6
Enzo Matsumiya <ematsumiya@suse.de> writes:

> The initial value is the same as before (SMB_DNS_RESOLVE_INTERVAL_DEFAULT, 600s).
>
> The user don't need to know a value to be used, unless they know the
> value the server uses (either manually set by themselves or by some
> external script) and then they can use that same value for this new
> dns_interval setting.
>
> A very simple example on how one could do it, if there's no desire to
> use the default 600s:
>
> # TTL=$(dig +noall +answer my.server.domain | awk '{ print $2 }')
> # echo $TTL > /proc/fs/cifs/dns_interval

That's not what I meant.  Sorry if I was unclear.

If the upcalled program sets the record TTL (key's expire time), then
any values set in /proc/fs/cifs/dns_interval will not work.  The user
might expect it to work, though.
Tom Talpey June 9, 2022, 3:24 p.m. UTC | #7
On 6/9/2022 11:03 AM, Enzo Matsumiya wrote:
> On 06/09, Tom Talpey wrote:
>> On 6/8/2022 5:54 PM, Enzo Matsumiya wrote:
>>> Hi,
>>>
>>> These 2 patches are a simple way to fix the DNS issue that
>>> currently exists in cifs, where the upcall to key.dns_resolver will
>>> always return 0 for the record TTL, hence, making the resolve worker
>>> always use the default value SMB_DNS_RESOLVE_INTERVAL_DEFAULT
>>> (currently 600 seconds).
>>>
>>> This also makes the new setting `dns_interval' user-configurable via
>>> procfs (/proc/fs/cifs/dns_interval).
>>>
>>> One disadvantage here is that the interval is applied to all hosts
>>> resolution. This is still how it works today, because we're always using
>>> the default value anyway, but should someday this be fixed, then we can
>>> go back to rely on the keys infrastructure to cache each hostname with
>>> its own separate TTL.
>>
>> Curious, why did you choose a procfs global approach? Wouldn't it be
>> more appropriate to make this a mount option, so it would be applied
>> selectively per-server?
> 
> I tried to stick to the current behaviour, while also fixing the issue
> of getting a TTL=0 from key.dns_resolver upcall.
> 
> A mount option looks indeed better initially, and that was also my
> initial approach to this. But in a, e.g. multi-domain forest, with
> multiple DFS targets, the per-mount setting starts to look irrelevant
> again as well. So I took the "per-client" approach instead.

Ok, I guess. It seems like kicking the can down the road, a little.
I agree that rearchitecting the DNS cache might be extreme, but many
distros consider procfs to be a user API, and they'll require it to
be supported basically forever. That would be unfortunate...

Tom.
Enzo Matsumiya June 9, 2022, 3:30 p.m. UTC | #8
On 06/09, Paulo Alcantara wrote:
>Enzo Matsumiya <ematsumiya@suse.de> writes:
>
>> The initial value is the same as before (SMB_DNS_RESOLVE_INTERVAL_DEFAULT, 600s).
>>
>> The user don't need to know a value to be used, unless they know the
>> value the server uses (either manually set by themselves or by some
>> external script) and then they can use that same value for this new
>> dns_interval setting.
>>
>> A very simple example on how one could do it, if there's no desire to
>> use the default 600s:
>>
>> # TTL=$(dig +noall +answer my.server.domain | awk '{ print $2 }')
>> # echo $TTL > /proc/fs/cifs/dns_interval
>
>That's not what I meant.  Sorry if I was unclear.
>
>If the upcalled program sets the record TTL (key's expire time), then
>any values set in /proc/fs/cifs/dns_interval will not work.  The user
>might expect it to work, though.

Exactly.

Currently, key.dns_resolver uses getaddrinfo() to resolve names, which
doesn't contain the TTL for the record, hence *always* returns 0 to cifs.ko.
This patch is just a way to provide some flexibility to the user, in
case they don't want to use the currently-always-fixed 600s.

As we discussed, this is supposed to be fixed in key.dns_resolver, but
my proposed patch for that never got ack'd, or even replied to. I might
try again at some point.


Cheers,

Enzo
Paulo Alcantara June 9, 2022, 3:49 p.m. UTC | #9
Enzo Matsumiya <ematsumiya@suse.de> writes:

> Currently, key.dns_resolver uses getaddrinfo() to resolve names, which
> doesn't contain the TTL for the record, hence *always* returns 0 to cifs.ko.
> This patch is just a way to provide some flexibility to the user, in
> case they don't want to use the currently-always-fixed 600s.

It is not limited to key.dns_resolver.  The user is free to choose
whatever program he/she wants to be upcalled for dns_resolver key.

For instance, some distros might still be using cifs.upcall(8) that
actually set record TTL, thus making it impossible for the user to
change default via /proc/fs/cifs/dns_interval.
Enzo Matsumiya June 9, 2022, 4:16 p.m. UTC | #10
On 06/09, Paulo Alcantara wrote:
>Enzo Matsumiya <ematsumiya@suse.de> writes:
>
>> Currently, key.dns_resolver uses getaddrinfo() to resolve names, which
>> doesn't contain the TTL for the record, hence *always* returns 0 to cifs.ko.
>> This patch is just a way to provide some flexibility to the user, in
>> case they don't want to use the currently-always-fixed 600s.
>
>It is not limited to key.dns_resolver.  The user is free to choose
>whatever program he/she wants to be upcalled for dns_resolver key.
>
>For instance, some distros might still be using cifs.upcall(8) that
>actually set record TTL, thus making it impossible for the user to
>change default via /proc/fs/cifs/dns_interval.

Ah sorry, I misunderstood.

But this patch isn't supposed to allow the user to change the _default_ TTL
value, but only to give them a chance to change the TTL value *iff* the
upcall returned 0.

In case the upcall returns TTL != 0, dns_resolve_server_name_to_ip()
will use that value instead, which, again, maintains the current behaviour.

But yes, if desired, I can adjust the patch to completely ignore the
TTL value from upcall and manage it by ourselves always, either by
procfs or by mount option.

What do you think?


Cheers,

Enzo
Enzo Matsumiya June 9, 2022, 4:17 p.m. UTC | #11
On 06/09, Tom Talpey wrote:
>>I tried to stick to the current behaviour, while also fixing the issue
>>of getting a TTL=0 from key.dns_resolver upcall.
>>
>>A mount option looks indeed better initially, and that was also my
>>initial approach to this. But in a, e.g. multi-domain forest, with
>>multiple DFS targets, the per-mount setting starts to look irrelevant
>>again as well. So I took the "per-client" approach instead.
>
>Ok, I guess. It seems like kicking the can down the road, a little.
>I agree that rearchitecting the DNS cache might be extreme, but many
>distros consider procfs to be a user API, and they'll require it to
>be supported basically forever. That would be unfortunate...

Thanks, I didn't know that. I'll re-work on the patch and take this in
consideration then.


Cheers,

Enzo
Paulo Alcantara June 9, 2022, 4:42 p.m. UTC | #12
Enzo Matsumiya <ematsumiya@suse.de> writes:

> On 06/09, Paulo Alcantara wrote:
>>Enzo Matsumiya <ematsumiya@suse.de> writes:
>>
>>> Currently, key.dns_resolver uses getaddrinfo() to resolve names, which
>>> doesn't contain the TTL for the record, hence *always* returns 0 to cifs.ko.
>>> This patch is just a way to provide some flexibility to the user, in
>>> case they don't want to use the currently-always-fixed 600s.
>>
>>It is not limited to key.dns_resolver.  The user is free to choose
>>whatever program he/she wants to be upcalled for dns_resolver key.
>>
>>For instance, some distros might still be using cifs.upcall(8) that
>>actually set record TTL, thus making it impossible for the user to
>>change default via /proc/fs/cifs/dns_interval.
>
> Ah sorry, I misunderstood.
>
> But this patch isn't supposed to allow the user to change the _default_ TTL
> value, but only to give them a chance to change the TTL value *iff* the
> upcall returned 0.

That was my concern.  If we expose it to users, they would probably
expect it work at all times regardless whether the key's expire time was
set or not.

> In case the upcall returns TTL != 0, dns_resolve_server_name_to_ip()
> will use that value instead, which, again, maintains the current behaviour.

Then it would start ignoring values from /proc/fs/cifs/dns_interval and
not telling the user why.

> But yes, if desired, I can adjust the patch to completely ignore the
> TTL value from upcall and manage it by ourselves always, either by
> procfs or by mount option.

That would work, too.  BTW, I'd personally go with the mount option
version.