fw4: add masquerade-prefix snat type

Message ID 20250112131635.8660-1-openwrt@jonaslochmann.de
State New

Commit Message

Jonas Lochmann Jan. 12, 2025, 1:16 p.m. UTC
OpenWrt supports requesting IPv6 network prefixes using DHCPv6. However,
the existing masquerade option delegates the rewriting to nftables, which
knows the IP address used by the router itself but not the prefixes, which
are only known by netifd and the OpenWrt services that use its data.

The masquerade-prefix target does the following:

- keep the source address for IPv6 link local addresses
- keep the source address for IPs that are assigned to the interface
- keep the source address if it belongs to an assigned IPv6 prefix
- otherwise rewrite the prefix to the shortest assigned prefix
- if there is no assigned prefix: use the regular masquerade
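
The list above can be read as a small decision procedure. The following
Python sketch is only an illustration under assumptions, not the code from
the patch: the function name, its parameters and the example addresses
(taken from the 2001:db8::/32 documentation range) are hypothetical, and
ties between equally short prefixes are resolved by taking the first one.

```python
import ipaddress

LINK_LOCAL = ipaddress.IPv6Network("fe80::/10")

def choose_action(saddr, iface_addrs, prefixes):
    """Classify one IPv6 source address (hypothetical helper).

    saddr:       source address of the packet
    iface_addrs: addresses assigned to the outgoing interface
    prefixes:    prefixes assigned to that interface, in CIDR notation
    Returns (action, prefix); prefix is only set for "snat".
    """
    addr = ipaddress.IPv6Address(saddr)
    own = {ipaddress.IPv6Address(a) for a in iface_addrs}
    nets = [ipaddress.IPv6Network(p) for p in prefixes]

    if addr in LINK_LOCAL:                  # link-local: keep
        return ("keep", None)
    if addr in own:                         # the router's own address: keep
        return ("keep", None)
    if any(addr in net for net in nets):    # already inside an assigned prefix
        return ("keep", None)
    if not nets:                            # nothing assigned: plain masquerade
        return ("masquerade", None)
    # shortest prefix = smallest prefix length; min() keeps the first on ties
    return ("snat", min(nets, key=lambda n: n.prefixlen))
```

With the prefixes 2001:db8:1200::/56 and 2001:db8:aa00::/48 assigned, a
source address such as fd00::5 would be classified as "snat" towards the
/48, because that is the shorter of the two prefixes.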

This is useful when devices in the network use a private source address
or a public source address that belongs to another network/uplink. In
simple scenarios, addresses from the current uplink prefix are used by all
devices. This stops working if there are multiple uplinks and the
(OpenWrt) router selects the uplink, e.g. with the help of the mwan3
package. While the traditional masquerade could be used in scenarios like
this, all devices would use the same outgoing IP address in case of
rewriting. Individual snat rules could be used only if all devices are
known when creating the configuration and the assigned prefix is static.

Currently, there are workarounds described in the wiki [1]. They all share
the limitation that they are not well integrated into the firewall. That
is, they are just hooks that are called whenever the firewall configuration
is replaced. Because of that, they do not benefit from the atomic
configuration updates that nftables provides: there is always a short
moment after updating the configuration during which these rules do not
take effect. My own previous solution [2] used other hooks (technically not
necessary) and its own nftables table so that it is unaffected
whenever firewall4 sends its new configuration to nftables. This is poorly
integrated with firewall4, and to disable it users need to remove the files
and know nftables well enough (or reboot the device) to stop the effect.

Another limitation of the approaches from the wiki is that they use
"snat ip6 prefix to ip6 saddr". "The optional prefix keyword allows to
map n source addresses to n destination addresses." [3]. Although
not clearly stated, it looks like this needs original and rewritten
prefixes of the same size. Moreover, this requires determining the
possible original source IPs while the suggested implementation does
not need to handle them at all.
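
The difference between the two styles of rewriting can be illustrated with
plain integer arithmetic. The Python sketch below is an assumption-laden
illustration, not the nftables implementation: map_one_to_one() models an
n-to-n mapping that needs the original prefix and equal prefix sizes, while
keep_suffix() only needs the target prefix and keeps the remaining low bits
of the source address. Both function names and all addresses are
hypothetical.

```python
import ipaddress

def map_one_to_one(saddr, src_prefix, dst_prefix):
    """n-to-n mapping: needs the original prefix and equal prefix sizes."""
    src = ipaddress.IPv6Network(src_prefix)
    dst = ipaddress.IPv6Network(dst_prefix)
    addr = ipaddress.IPv6Address(saddr)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("prefix sizes differ")
    if addr not in src:
        raise ValueError("address outside the source prefix")
    # keep the offset of the address within its prefix
    offset = int(addr) - int(src.network_address)
    return ipaddress.IPv6Address(int(dst.network_address) + offset)

def keep_suffix(saddr, dst_prefix):
    """Suffix-preserving rewrite: only the target prefix is needed."""
    dst = ipaddress.IPv6Network(dst_prefix)
    addr = ipaddress.IPv6Address(saddr)
    # clear the bits covered by the target prefix, then set the prefix
    return ipaddress.IPv6Address(
        (int(addr) & int(dst.hostmask)) | int(dst.network_address))
```

For example, keep_suffix("fd00:0:0:1::abcd", "2001:db8:1200::/56") yields
2001:db8:1200:1::abcd without knowing which prefix the source address came
from, whereas map_one_to_one() rejects a /64 to /56 mapping.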

[1] https://openwrt.org/docs/guide-user/firewall/fw3_configurations/fw3_nat?rev=1736642299#ipv6_npt
[2] https://forum.openwrt.org/t/multiwan-with-ipv6/136965/12
[3] https://www.netfilter.org/projects/nftables/manpage.html

Signed-off-by: Jonas Lochmann <openwrt@jonaslochmann.de>
---
 root/usr/share/ucode/fw4.uc | 148 ++++++++++++++++++++++++++++++++++--
 1 file changed, 143 insertions(+), 5 deletions(-)

Comments

Michael Richardson Jan. 13, 2025, 3:17 a.m. UTC | #1
Hi, I understand that this does *NPTv6* RFC6296 when forwarding traffic
with source addresses that do not fit into the uplink ISP.

You've called this masquerade-prefix, and I think that will confuse people
into thinking it's like "NAT44" aka NAPT, when it's different.
Jonas Lochmann Jan. 13, 2025, 8:17 a.m. UTC | #2
On Sun, Jan 12, 2025 at 10:17:01PM -0500, Michael Richardson wrote:
> 
> Hi, I understand that this does *NPTv6* RFC6296 when forwarding traffic
> with source addresses that do not fit into the uplink ISP.

I do not agree with this. NPTv6 as described in RFC6296 is about stateless
prefix rewriting. Due to that, it is limited to the prefix length of the
smaller network. This patch is implementing stateful address rewriting
that is not limited to the prefix size of the smaller network.

In scenarios like load balancing, the system is stateful in any case
because one TCP flow must use the same source IP (and thus uplink) during
its whole lifetime (assuming that we do not use multipath TCP).
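
The per-flow state mentioned here can be pictured as a table that pins each
flow to the uplink chosen for its first packet. The Python sketch below is
a minimal illustration under assumptions; it does not describe how mwan3 or
the kernel's connection tracking work internally, and the class name, the
uplink names and the flow tuple layout are hypothetical.

```python
import itertools

class UplinkBalancer:
    """Round-robin uplink choice that stays fixed for each flow."""

    def __init__(self, uplinks):
        self._next = itertools.cycle(uplinks)
        self._flows = {}

    def uplink_for(self, flow):
        # flow: (protocol, src addr, src port, dst addr, dst port)
        if flow not in self._flows:
            # first packet of the flow: pick the next uplink and remember it
            self._flows[flow] = next(self._next)
        return self._flows[flow]

    def close(self, flow):
        # forget the flow once the connection has ended
        self._flows.pop(flow, None)
```

Every later packet of a flow gets the uplink of its first packet, so the
source address chosen for that uplink stays stable for the whole
connection.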

Another reason for the statefulness is the processing of traffic from the
uplink: It is not clear if the corresponding outbound traffic was
rewritten or not, e.g. because the flow started before the prefix from
this uplink was announced, or after it was no longer announced, in the
private network.

All in all, I consider calling this feature NPTv6 misleading.

> You've called this masquerade-prefix, and I think that will confuse people
> into thinking it's like "NAT44" aka NAPT, when it's different.

After an internet search, I assume that NAT44 is a stateful NAT using a
source address pool with random source ip selection. I assume that NAPT
describes the rewriting of the source port in case of conflicts or in
any case (different sources indicate different behaviors).

The IPv4 pool in this case could be considered a prefix. On the other
hand, I consider it well known that a "masquerade" does not allow
configuring a source IP. Due to that, I consider the "masquerade-prefix"
something where the router should already know the possible source IPs.
In case of IPv4, I do not see anything in OpenWrt that could provide
these IPv4 prefixes.

While there is the possibility to extend this patch with a "randomize"
option to select the source IPs (for IPv6) randomly from the pool
instead of deterministically, I do not see a good use case for this.
While this could be a substitute for the IPv6 privacy extensions, the
regular masquerade for IPv6 would already solve this need.
Jonas Lochmann Feb. 27, 2025, 8:32 a.m. UTC | #3
More than a month has passed without any feedback that helped me. There was one
comment regarding an RFC that this does not implement (because its functionality
and goals are similar but different) and the name of this feature that is
considered misleading. Regarding the second point, there was no feedback suggesting
a better name. In reply to that, I stated why I chose this name and consider it
not misleading for this functionality.

This feature is used in production in one multiwan environment. I want to use it in
another production environment soon - as soon as the second ISP starts providing IPv6
instead of only advertising that dual stack connectivity is always included. Having
this upstream makes it easier for me and others to build multiwan setups with IPv6
that do not map everything to one outgoing IP.

> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Bjørn Mork Feb. 27, 2025, 10:49 a.m. UTC | #4
Jonas Lochmann <openwrt@jonaslochmann.de> writes:

> More than a month has passed without any feedback that helped me. There was one
> comment regarding an RFC that this does not implement (because its functionality
> and goals are similar but different)

It is not clear to me why the functionality and goals SHOULD be different.
Lack of justification and documentation of differences makes them bugs, not
features.

> the name of this feature that is considered misleading. Regarding the
> second point, there was no feedback suggesting a better name. In reply
> to that, I stated why I chose this name and consider it not misleading
> for this functionality.

The question is not whether the functionality is covered by the name,
but whether the name implies some other functionality to those who are
not already familiar with the feature.

But this is mostly pointing back to the first issue: Why is it that we
need a feature which is so weird and unique to OpenWrt that it has never
been described before?


Bjørn
Jonas Lochmann Feb. 27, 2025, 5:39 p.m. UTC | #5
On Thu, Feb 27, 2025 at 11:49:10AM +0100, Bjørn Mork wrote:
> But this is mostly pointing back to the first issue: Why is it that we
> need a feature which is so weird and unique to OpenWrt that it has never
> been described before?

Because this solves a problem where no solution exists yet. The following is
based on search results for the term "ipv6 multiwan".

RFC 8678 describes the solution of using source address based routing [1].
This supports failover, but this method is not supported by the mwan3
package. It has the limitation that load balancing is not possible. It
mentions NPTv6 and Multipath Transports as other possible solutions.

A Reddit discussion talks about the failover scenario [2]. NPTv6 is
discussed along with its disadvantages in practice - limited support in
products (not supported at all or only with static prefixes). Another
discussion is the one about using global addresses or ULA addresses in
the private network for this.

The documentation of pfSense states for multiwan with IPv6 that "This
[Network Prefix Translation] does not work for dynamic IPv6 types where
the subnet is not static, such as DHCP6-PD." [3] This document states
that this can be used with global or local addresses in the LAN. As far
as I know, providing both in the LAN will cause trouble. In the forum,
someone asks about other solutions but without any reply [4].

For OPNsense, someone wrote a tutorial (in German only) and just
skipped IPv6 [5]. The reason: IPv4 is enough for a failover. Sadly, the
date of this article is not clearly visible, but the year 2022 is
mentioned.

In the UniFi forum, there is a post about a failover function that
seems to ignore IPv6 [6]. The post is two years old, but the last
comment stating the issue still exists is 5 months old. Another
post [7] describes using NPT, but it looks manual, with hardcoded
prefixes. It uses local addresses within the LAN.

So the stateless NPT requires using one single prefix in the LAN
(limitation 1). To avoid side effects on traffic to the other uplink if
one uplink obtains a new prefix, local addresses must be used
(limitation 2). It requires prefixes of the same size for the internal
network and the uplinks (limitation 3). Using my approach, these
limitations do not exist. It looks like this approach is not implemented
anywhere yet. As a result, there is no well-known name for it.

The downside of this method: it is stateful. However, a multiwan with
load balancing is stateful and a stateful firewall that is normally used
at the border of a network is stateful too.

An alternative to my approach would be a dynamic NPT in OpenWrt that
uses the assigned prefixes from the uplinks. This would be similar to
my patch but the mentioned limitations would apply.

[1] https://datatracker.ietf.org/doc/rfc8678/
[2] https://www.reddit.com/r/ipv6/comments/10odci9/is_there_still_no_good_ipv6_wan_failover_solution/
[3] https://docs.netgate.com/pfsense/en/latest/recipes/multiwan-ipv6.html
[4] https://forum.netgate.com/topic/188052/is-there-a-clear-and-complete-recipe-for-ipv6-multi-wan
[5] https://www.heimnetz.de/anleitungen/firewall/opnsense/opnsense-multi-wan-einrichten/
[6] https://community.ui.com/questions/Dual-WAN-IPv6-Failover-and-Traffic-Routing-UDM-Pro/8c46d2bb-9aba-422b-ad2d-c78d6a7d5bcb
[7] https://community.ui.com/questions/Dual-WAN-IPv6-setup/1c2d7fe2-3bc3-42b1-b9bf-b7d36bc9f9cc
Goetz Goerisch Feb. 28, 2025, 7:44 a.m. UTC | #6
Thank you Jonas for the initiative.

For Multi-Homing and Load-balancing scenarios I was always looking into
RFC8678 [1] or RFC8475 [2].
But as you mentioned there is no support in OpenWrt or mwan3 as of today.

Therefore I would be interested in a solution, nevertheless I have no
deployment and test possibilities at the moment.

Did you discuss the deployment scenario elsewhere, e.g. RIPE IPv6 WG?

Goetz

[1] https://datatracker.ietf.org/doc/rfc8678/
[2] https://datatracker.ietf.org/doc/rfc8475/

Jonas Lochmann Feb. 28, 2025, 9:09 a.m. UTC | #7
On Fri, Feb 28, 2025 at 08:44:28AM +0100, Goetz Goerisch wrote:
> For Multi-Homing and Load-balancing scenarios I was always looking into
> RFC8678 [1] or RFC8475 [2].
> But as you mentioned there is no support in OpenWrt or mwan3 as of today.

RFC8475 is actually more or less supported in native OpenWrt. The
possible scenario is "3.2.3.  Single Router, Load-Balancing between
Uplinks". OpenWrt can report multiple uplinks to clients and it supports
source address based routing. The issue is the address selection by the
clients. Using a proxy server running on the clients, I could add load
balancing to existing applications [1]. This implementation implements
round robin without any weights and I don't know if it still compiles.

RFC 8678 talks about other methods to solve this issue: A DHCP option or
ICMP messages to tell the client to use another source address. Neither
clearly states that it can be used for load balancing instead of
predefined rules that assign target addresses to source addresses.

The DHCP option could provide parameters according to RFC 6724 but the
address selection algorithm in it does not permit nondeterminism/
randomness. For the ICMP method, RFC 8678 says "When H31 receives this
packet, it would then be expected to try another source address to reach
the destination.". I don't know if this is expected or actually part of
any standard and implemented in most systems.
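
To illustrate why such a selection is deterministic, the Python sketch
below models only the longest-matching-prefix comparison from the RFC 6724
source address selection; all other rules of that algorithm are omitted,
and a fixed /64 source prefix length is assumed. The function names and
addresses are hypothetical.

```python
import ipaddress

def common_prefix_len(src, dst, src_prefixlen=64):
    """Leading bits shared by src and dst, capped at src's prefix length."""
    diff = int(ipaddress.IPv6Address(src)) ^ int(ipaddress.IPv6Address(dst))
    shared = 128 - diff.bit_length()
    return min(shared, src_prefixlen)

def pick_source(candidates, dst):
    """Pick the candidate with the longest matching prefix.

    max() returns the first candidate on ties, so the same input
    always gives the same result.
    """
    return max(candidates, key=lambda s: common_prefix_len(s, dst))
```

Given the same candidate addresses and destination, pick_source() returns
the same source address every time, so it cannot spread new connections
across several uplinks on its own.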

For mwan3, an interface to the RA/DHCP daemon is needed through which mwan3
can adjust the reported prefixes. Then there is the conceptual question of
what happens if multiple services want to use this prefix adjustment API.
If multiwan were a core feature of the network management stack of
OpenWrt, then this would be easier. But I do not consider it worth it as
long as load balancing at the client is not possible with it.

> Therefore I would be interested in a solution, nevertheless I have no
> deployment and test possibilities at the moment.
> 
> Did you discuss the deployment scenario elsewhere, e.g. RIPE IPv6 WG?

Not yet, but this is a good idea.

[1] https://codeberg.org/jonas-l/socksbalance

Patch

diff --git a/root/usr/share/ucode/fw4.uc b/root/usr/share/ucode/fw4.uc
index 2d77146..0c1cd78 100644
--- a/root/usr/share/ucode/fw4.uc
+++ b/root/usr/share/ucode/fw4.uc
@@ -192,7 +192,7 @@  const dscp_classes = {
 	"EF": 0x2e
 };
 
-function to_mask(bits, v6) {
+function to_mask_raw(bits, v6) {
 	let m = [], n = false;
 
 	if (bits < 0) {
@@ -209,7 +209,19 @@  function to_mask(bits, v6) {
 		bits -= b;
 	}
 
-	return arrtoip(m);
+	return m;
+}
+
+function to_mask(bits, v6) {
+	return arrtoip(to_mask_raw(bits, v6));
+}
+
+function to_inverse_mask(bits, v6) {
+	let mask = to_mask_raw(bits, v6);
+
+	mask = map(mask, (v) => v ^ 0xff);
+
+	return arrtoip(mask);
 }
 
 function to_bits(mask) {
@@ -658,6 +670,16 @@  return {
 					}
 				}
 
+				if (type(ifc["ipv6-prefix"]) == "array") {
+					for (let prefix in ifc["ipv6-prefix"]) {
+						push(net.ip6prefixes ||= [], {
+							addr: prefix.address,
+							mask: to_mask(prefix.mask, true),
+							bits: prefix.mask
+						});
+					}
+				}
+
 				if (type(ifc.data?.firewall) == "array") {
 					let n = 0;
 
@@ -1475,6 +1497,7 @@  return {
 			"dnat",
 			"snat",
 			"masquerade",
+			"masquerade-prefix",
 			"accept",
 			"reject",
 			"drop"
@@ -3071,7 +3094,7 @@  return {
 			return;
 		}
 
-		if (!(snat.target in ["accept", "snat", "masquerade"])) {
+		if (!(snat.target in ["accept", "snat", "masquerade", "masquerade-prefix"])) {
 			this.warn_section(data, "has invalid target specified, defaulting to masquerade");
 			snat.target = "masquerade";
 		}
@@ -3108,6 +3131,121 @@  return {
 			delete snat.log;
 		}
 
+		let add_rule_3 = (n) => {
+			push(this.state.redirects ||= [], n);
+		};
+
+		let add_rule_2 = (n) => {
+			if (n.target == "masquerade-prefix") {
+				let output_device_filter = null;
+
+				if (type(n.device) == "string") {
+					output_device_filter = [n.device];
+				} else if (n.src != null && !n.src.any) {
+					// avoid creating rules for interfaces in other zones
+					let zone = n.src.zone;
+
+					let simple = true;
+
+					simple &= zone.device == null || length(zone.device) == 0;
+					simple &= zone.subnet == null || length(zone.subnet) == 0;
+
+					if (simple) {
+						output_device_filter = zone.related_physdevs;
+					}
+				}
+
+				// accept (do nothing) for link local addresses
+				// otherwise, we will get DHCP issues and thus no prefix
+				if (n.family == 6) {
+					let saddrs_masked = slice(n.saddrs_masked || []);
+
+					push(saddrs_masked, {
+						addr: 'fe80::',
+						mask: to_mask(10, true),
+						invert: false
+					});
+
+					add_rule_3({
+						...n,
+						saddrs_masked: saddrs_masked,
+						target: "accept"
+					});
+				}
+
+				for (let name, net in this.state.networks) {
+					if (output_device_filter == null || index(output_device_filter, net.device) != -1) {
+						// accept (do nothing) if the src ip is ok
+						for (let addr in net.ipaddrs) {
+							if (addr.family == n.family) {
+								let saddrs_masked = slice(n.saddrs_masked || []);
+
+								push(saddrs_masked, {
+									addr: addr.addr,
+									// only permit the single ip itself
+									mask: n.family == 4 ? to_mask(32, false) : to_mask(128, true),
+									invert: false
+								});
+
+								add_rule_3({
+									...n,
+									device: net.device,
+									saddrs_masked: saddrs_masked,
+									target: "accept"
+								});
+							}
+						}
+
+						if (n.family == 6) {
+							// accept (do nothing) if the src ip belongs to a prefix
+							for (let prefix in net.ip6prefixes) {
+								let saddrs_masked = slice(n.saddrs_masked || []);
+
+								push(saddrs_masked, {
+									addr: prefix.addr,
+									mask: prefix.mask,
+									invert: false
+								});
+
+								add_rule_3({
+									...n,
+									device: net.device,
+									saddrs_masked: saddrs_masked,
+									target: "accept"
+								});
+							}
+
+							// otherwise rewrite the src ip
+							let best_prefix = null;
+
+							for (let prefix in net.ip6prefixes) {
+								if (best_prefix == null || best_prefix.bits > prefix.bits)
+									best_prefix = prefix;
+							}
+
+							// if there is no prefix, then the masquerade fallback will resolve this
+							if (best_prefix != null) {
+								let base_addr = apply_mask(best_prefix.addr, best_prefix.mask);
+								let suffix_mask = to_inverse_mask(best_prefix.bits, true);
+								let target = "snat ip6 to ip6 saddr and " + suffix_mask + " or " + base_addr;
+
+								add_rule_3({
+									...n,
+									device: net.device,
+									target: target
+								});
+							}
+						}
+					}
+				}
+
+				// use masquerade as fallback
+				add_rule_3({ ...n, target: "masquerade" });
+			} else {
+				add_rule_3(n);
+			}
+		};
+
 		let add_rule = (family, proto, saddrs, daddrs, raddrs, sport, dport, rport, snat) => {
 			let n = {
 				...snat,
@@ -3133,7 +3271,7 @@  return {
 				chain: snat.src?.zone ? `srcnat_${snat.src.zone.name}` : "srcnat"
 			};
 
-			push(this.state.redirects ||= [], n);
+			add_rule_2(n);
 		};
 
 		for (let proto in snat.proto) {
@@ -3182,7 +3320,7 @@  return {
 			}
 
 			/* check if there's no AF specific bits, in this case we can do an AF agnostic rule */
-			if (!family && !length(sip[0]) && !length(sip[1]) && !length(dip[0]) && !length(dip[1]) && !length(rip[0]) && !length(rip[1])) {
+			if (!family && !length(sip[0]) && !length(sip[1]) && !length(dip[0]) && !length(dip[1]) && !length(rip[0]) && !length(rip[1]) && snat.target != "masquerade-prefix") {
 				add_rule(0, proto, [], [], null, sport, dport, rport, snat);
 			}
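
For illustration, the statement that the patch builds from base_addr and
suffix_mask can be reproduced outside of ucode. The Python sketch below is
an approximation under assumptions: snat_statement() is a hypothetical
helper, and the textual form of the mask produced by arrtoip() in fw4.uc
may differ from what Python's ipaddress module prints.

```python
import ipaddress

def snat_statement(prefix):
    """Build the snat statement for one assigned prefix (illustrative)."""
    net = ipaddress.IPv6Network(prefix)
    # hostmask plays the role of suffix_mask,
    # network_address the role of base_addr
    return f"snat ip6 to ip6 saddr and {net.hostmask} or {net.network_address}"

print(snat_statement("2001:db8:1200::/56"))
# prints: snat ip6 to ip6 saddr and ::ff:ffff:ffff:ffff:ffff or 2001:db8:1200::
```

The mask keeps the low 72 bits of the original source address and the "or"
sets the upper 56 bits to the assigned prefix.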