ovn-northd(8) OVN Manual ovn-northd(8)
NAME
ovn-northd and ovn-northd-ddlog - Open Virtual Network central control
daemon
SYNOPSIS
ovn-northd [options]
DESCRIPTION
ovn-northd is a centralized daemon responsible for translating the
high-level OVN configuration into logical configuration consumable by
daemons such as ovn-controller. It translates the logical network con‐
figuration in terms of conventional network concepts, taken from the
OVN Northbound Database (see ovn-nb(5)), into logical datapath flows in
the OVN Southbound Database (see ovn-sb(5)) below it.
ovn-northd is implemented in C. ovn-northd-ddlog is a compatible imple‐
mentation written in DDlog, a language for incremental database pro‐
cessing. This documentation applies to both implementations, with dif‐
ferences indicated where relevant.
OPTIONS
--ovnnb-db=database
The OVSDB database containing the OVN Northbound Database. If
the OVN_NB_DB environment variable is set, its value is used as
the default. Otherwise, the default is unix:/ovnnb_db.sock.
--ovnsb-db=database
The OVSDB database containing the OVN Southbound Database. If
the OVN_SB_DB environment variable is set, its value is used as
the default. Otherwise, the default is unix:/ovnsb_db.sock.
--ddlog-record=file
This option is for ovn-northd-ddlog only. It causes the daemon to
record the initial database state and later changes to file in
the text-based DDlog command format. The ovn_northd_cli program
can later replay these changes for debugging purposes. This
option has a performance impact. See debugging-ddlog.rst in the
OVN documentation for more details.
--dry-run
Causes ovn-northd to start paused. In the paused state,
ovn-northd does not apply any changes to the databases, although
it continues to monitor them. For more information, see the
pause command, under Runtime Management Commands below.
For ovn-northd-ddlog, one could use this option with
--ddlog-record to generate a replay log without restarting a
process or disturbing a running system.
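For example, to capture a replay log from an instance that starts paused, one might combine these options as follows (a sketch; the file name and database paths are illustrative):
    ovn-northd-ddlog --dry-run --ddlog-record=replay.dat \
        --ovnnb-db=unix:/var/run/ovn/ovnnb_db.sock \
        --ovnsb-db=unix:/var/run/ovn/ovnsb_db.sock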
database in the above options must be an OVSDB active or passive con‐
nection method, as described in ovsdb(7).
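As an illustration, a typical invocation pointing at remote databases over TCP might look like this (the address is an example; 6641 and 6642 are the conventional NB and SB ports):
    ovn-northd --ovnnb-db=tcp:192.168.0.10:6641 \
        --ovnsb-db=tcp:192.168.0.10:6642 \
        --pidfile --detach --log-file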
Daemon Options
--pidfile[=pidfile]
Causes a file (by default, program.pid) to be created indicating
the PID of the running process. If the pidfile argument is not
specified, or if it does not begin with /, then it is created in
the daemon's configured run directory.
If --pidfile is not specified, no pidfile is created.
--overwrite-pidfile
By default, when --pidfile is specified and the specified pid‐
file already exists and is locked by a running process, the dae‐
mon refuses to start. Specify --overwrite-pidfile to cause it to
instead overwrite the pidfile.
When --pidfile is not specified, this option has no effect.
--detach
Runs this program as a background process. The process forks,
and in the child it starts a new session, closes the standard
file descriptors (which has the side effect of disabling logging
to the console), and changes its current directory to the root
(unless --no-chdir is specified). After the child completes its
initialization, the parent exits.
--monitor
Creates an additional process to monitor this program. If it
dies due to a signal that indicates a programming error (SIGA‐
BRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV, SIGXCPU,
or SIGXFSZ) then the monitor process starts a new copy of it. If
the daemon dies or exits for another reason, the monitor process
exits.
This option is normally used with --detach, but it also func‐
tions without it.
--no-chdir
By default, when --detach is specified, the daemon changes its
current working directory to the root directory after it
detaches. Otherwise, invoking the daemon from a carelessly cho‐
sen directory would prevent the administrator from unmounting
the file system that holds that directory.
Specifying --no-chdir suppresses this behavior, preventing the
daemon from changing its current working directory. This may be
useful for collecting core files, since it is common behavior to
write core dumps into the current working directory and the root
directory is not a good directory to use.
This option has no effect when --detach is not specified.
--no-self-confinement
By default this daemon will try to confine itself to work with
files under well-known directories determined at build time. It
is better to stick with this default behavior and not to use
this flag unless some other access control mechanism is used to
confine the daemon. Note that in contrast to other access control
implementations that are typically enforced from kernel space
(e.g. DAC or MAC), self-confinement is imposed from the user-space
daemon itself and hence should not be considered a full
confinement strategy, but instead should be viewed as an
additional layer of security.
--user=user:group
Causes this program to run as a different user specified in
user:group, thus dropping most of the root privileges. Short
forms user and :group are also allowed, with the current user or
group assumed, respectively. Only daemons started by the root
user accept this argument.
On Linux, daemons will be granted CAP_IPC_LOCK and
CAP_NET_BIND_SERVICE before dropping root privileges. Daemons
that interact with a datapath, such as ovs-vswitchd, will be
granted three additional capabilities, namely CAP_NET_ADMIN,
CAP_NET_BROADCAST and CAP_NET_RAW. The capability change will
apply even if the new user is root.
On Windows, this option is not currently supported. For security
reasons, specifying this option will cause the daemon process
not to start.
Logging Options
-v[spec]
--verbose=[spec]
Sets logging levels. Without any spec, sets the log level for
every module and destination to dbg. Otherwise, spec is a list of
words separated by spaces or commas or colons, up to one from each
category below:
· A valid module name, as displayed by the vlog/list command
on ovs-appctl(8), limits the log level change to the speci‐
fied module.
· syslog, console, or file, to limit the log level change to
only to the system log, to the console, or to a file,
respectively. (If --detach is specified, the daemon closes
its standard file descriptors, so logging to the console
will have no effect.)
On Windows platform, syslog is accepted as a word and is
only useful along with the --syslog-target option (the word
has no effect otherwise).
· off, emer, err, warn, info, or dbg, to control the log
level. Messages of the given severity or higher will be
logged, and messages of lower severity will be filtered
out. off filters out all messages. See ovs-appctl(8) for a
definition of each log level.
Case is not significant within spec.
Regardless of the log levels set for file, logging to a file will
not take place unless --log-file is also specified (see below).
For compatibility with older versions of OVS, any is accepted as a
word but has no effect.
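For instance, the following invocation (a sketch) logs debug messages from every module to the log file while silencing the console:
    ovn-northd --log-file -vfile:dbg -vconsole:off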
-v
--verbose
Sets the maximum logging verbosity level, equivalent to --ver‐
bose=dbg.
-vPATTERN:destination:pattern
--verbose=PATTERN:destination:pattern
Sets the log pattern for destination to pattern. Refer to
ovs-appctl(8) for a description of the valid syntax for pattern.
-vFACILITY:facility
--verbose=FACILITY:facility
Sets the RFC5424 facility of the log message. facility can be one
of kern, user, mail, daemon, auth, syslog, lpr, news, uucp, clock,
ftp, ntp, audit, alert, clock2, local0, local1, local2, local3,
local4, local5, local6 or local7. If this option is not specified,
daemon is used as the default for the local system syslog and
local0 is used while sending a message to the target provided via
the --syslog-target option.
--log-file[=file]
Enables logging to a file. If file is specified, then it is used
as the exact name for the log file. The default log file name used
if file is omitted is /usr/local/var/log/ovn/program.log.
--syslog-target=host:port
Send syslog messages to UDP port on host, in addition to the sys‐
tem syslog. The host must be a numerical IP address, not a host‐
name.
--syslog-method=method
Specify method as how syslog messages should be sent to syslog
daemon. The following forms are supported:
· libc, to use the libc syslog() function. The downside of
this option is that libc adds a fixed prefix to every message
before it is actually sent to the syslog daemon over the
/dev/log UNIX domain socket.
· unix:file, to use a UNIX domain socket directly. It is
possible to specify an arbitrary message format with this
option. However, rsyslogd 8.9 and older versions use a
hard-coded parser function anyway, which limits UNIX domain
socket use. If you want to use an arbitrary message format
with older rsyslogd versions, then use a UDP socket to a
localhost IP address instead.
· udp:ip:port, to use a UDP socket. With this method it is
possible to use an arbitrary message format also with older
rsyslogd. When sending syslog messages over a UDP socket,
extra precautions need to be taken: the syslog daemon needs
to be configured to listen on the specified UDP port,
accidental iptables rules could interfere with local syslog
traffic, and there are some security considerations that
apply to UDP sockets but do not apply to UNIX domain sockets.
· null, to discard all messages logged to syslog.
The default is taken from the OVS_SYSLOG_METHOD environment vari‐
able; if it is unset, the default is libc.
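As an example, the following sketch sends syslog messages over UDP to a collector on the local host, assuming a syslog daemon listening on UDP port 514:
    ovn-northd --syslog-target=127.0.0.1:514 \
        --syslog-method=udp:127.0.0.1:514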
PKI Options
PKI configuration is required in order to use SSL for the connections
to the Northbound and Southbound databases.
-p privkey.pem
--private-key=privkey.pem
Specifies a PEM file containing the private key used as
identity for outgoing SSL connections.
-c cert.pem
--certificate=cert.pem
Specifies a PEM file containing a certificate that certi‐
fies the private key specified on -p or --private-key to be
trustworthy. The certificate must be signed by the certifi‐
cate authority (CA) that the peer in SSL connections will
use to verify it.
-C cacert.pem
--ca-cert=cacert.pem
Specifies a PEM file containing the CA certificate for ver‐
ifying certificates presented to this program by SSL peers.
(This may be the same certificate that SSL peers use to
verify the certificate specified on -c or --certificate, or
it may be a different one, depending on the PKI design in
use.)
-C none
--ca-cert=none
Disables verification of certificates presented by SSL
peers. This introduces a security risk, because it means
that certificates cannot be verified to be those of known
trusted hosts.
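Putting these options together, an SSL-enabled deployment might start the daemon as follows (file names and the address are illustrative):
    ovn-northd --ovnnb-db=ssl:192.168.0.10:6641 \
        --ovnsb-db=ssl:192.168.0.10:6642 \
        --private-key=privkey.pem --certificate=cert.pem \
        --ca-cert=cacert.pem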
Other Options
--unixctl=socket
Sets the name of the control socket on which program listens for
runtime management commands (see RUNTIME MANAGEMENT COMMANDS,
below). If socket does not begin with /, it is interpreted as
relative to the run directory. If --unixctl is not used at all,
the default socket is program.pid.ctl in the run directory,
where pid is program’s process ID.
On Windows a local named pipe is used to listen for runtime
management commands. A file is created in the absolute path
pointed to by socket or, if --unixctl is not used at all, a file
named program is created in the configured OVS_RUNDIR directory.
The file exists just to mimic the behavior of a Unix domain socket.
Specifying none for socket disables the control socket feature.
-h
--help
Prints a brief help message to the console.
-V
--version
Prints version information to the console.
RUNTIME MANAGEMENT COMMANDS
ovs-appctl can send commands to a running ovn-northd process. The cur‐
rently supported commands are described below.
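For example, assuming the default control socket, a running instance could be paused and later resumed like this (a sketch; output shown is indicative):
    $ ovs-appctl -t ovn-northd pause
    $ ovs-appctl -t ovn-northd is-paused
    true
    $ ovs-appctl -t ovn-northd resume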
exit Causes ovn-northd to gracefully terminate.
pause Pauses ovn-northd. When it is paused, ovn-northd receives
changes from the Northbound and Southbound databases as
usual, but it does not send any updates. A paused
ovn-northd also drops database locks, which allows any
other non-paused instance of ovn-northd to take over.
resume Resumes ovn-northd operation, processing Northbound and
Southbound database contents and generating logical flows.
This also instructs ovn-northd to try to acquire the lock
on the SB DB.
is-paused
Returns "true" if ovn-northd is currently paused, "false"
otherwise.
status Prints this server’s status. Status will be "active" if
ovn-northd has acquired OVSDB lock on SB DB, "standby" if
it has not or "paused" if this instance is paused.
sb-cluster-state-reset
Reset southbound database cluster status when databases
are destroyed and rebuilt.
If all databases in a clustered southbound database are
removed from disk, then the stored index of all databases
will be reset to zero. This will cause ovn-northd to be
unable to read or write to the southbound database,
because it will always detect the data as stale. In such
a case, run this command so that ovn-northd will reset
its local index so that it can interact with the south‐
bound database again.
nb-cluster-state-reset
Reset northbound database cluster status when databases
are destroyed and rebuilt.
This performs the same task as sb-cluster-state-reset
except for the northbound database client.
Only ovn-northd-ddlog supports the following commands:
enable-cpu-profiling
disable-cpu-profiling
Enables or disables profiling of CPU time used by the DDlog
engine. When CPU profiling is enabled, the profile command
(see below) will include DDlog CPU usage statistics in its
output. Enabling CPU profiling will slow ovn-northd-ddlog.
Disabling CPU profiling does not clear any previously
recorded statistics.
profile
Outputs a profile of the current and peak sizes of arrange‐
ments inside DDlog. This profiling data can be useful for
optimizing DDlog code. If CPU profiling was previously
enabled (even if it was later disabled), the output also
includes a CPU time profile. See Profiling inside the tuto‐
rial in the DDlog repository for an introduction to profil‐
ing DDlog.
ACTIVE-STANDBY FOR HIGH AVAILABILITY
You may run ovn-northd more than once in an OVN deployment. When con‐
nected to a standalone or clustered DB setup, OVN will automatically
ensure that only one of them is active at a time. If multiple instances
of ovn-northd are running and the active ovn-northd fails, one of the
hot standby instances of ovn-northd will automatically take over.
Active-Standby with multiple OVN DB servers
You may run multiple OVN DB servers in an OVN deployment with:
· OVN DB servers deployed in active/passive mode with one
active and multiple passive ovsdb-servers.
· ovn-northd also deployed on all these nodes, using unix
control sockets to connect to the local OVN DB servers.
In such deployments, the ovn-northd instances on the passive nodes
will process the DB changes and compute logical flows that will
only be thrown away, because the passive ovsdb-servers do not
allow write transactions. This results in unnecessary CPU usage.
With the help of the runtime management command pause, you can
pause ovn-northd on these nodes. When a passive node becomes
master, you can use the runtime management command resume so that
ovn-northd resumes processing the DB changes.
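For example, an operator script on a passive node might check the instance’s role before pausing it (a sketch; the exact status output format may vary):
    $ ovs-appctl -t ovn-northd status
    Status: standby
    $ ovs-appctl -t ovn-northd pause    # avoid wasted flow computation here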
LOGICAL FLOW TABLE STRUCTURE
One of the main purposes of ovn-northd is to populate the Logical_Flow
table in the OVN_Southbound database. This section describes how
ovn-northd does this for switch and router logical datapaths.
Logical Switch Datapaths
Ingress Table 0: Admission Control and Ingress Port Security - L2
Ingress table 0 contains these logical flows:
· Priority 100 flows to drop packets with VLAN tags or mul‐
ticast Ethernet source addresses.
· Priority 50 flows that implement ingress port security
for each enabled logical port. For logical ports on which
port security is enabled, these match the inport and the
valid eth.src address(es) and advance only those packets
to the next flow table. For logical ports on which port
security is not enabled, these advance all packets that
match the inport.
There are no flows for disabled logical ports because the default-drop
behavior of logical flow tables causes packets that ingress from them
to be dropped.
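For illustration, a hypothetical logical port lp1 with port security enabled for MAC 00:00:00:00:00:01 would yield a priority-50 flow along these lines (a sketch, not verbatim ovn-northd output):
    priority=50, match=(inport == "lp1" && eth.src == {00:00:00:00:00:01}),
    action=(next;)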
Ingress Table 1: Ingress Port Security - IP
Ingress table 1 contains these logical flows:
· For each element in the port security set having one or
more IPv4 or IPv6 addresses (or both),
· Priority 90 flow to allow IPv4 traffic if it has
IPv4 addresses which match the inport, valid
eth.src and valid ip4.src address(es).
· Priority 90 flow to allow IPv4 DHCP discovery
traffic if it has a valid eth.src. This is necessary
since DHCP discovery messages are sent from the
unspecified IPv4 address (0.0.0.0), as the IPv4
address has not yet been assigned.
· Priority 90 flow to allow IPv6 traffic if it has
IPv6 addresses which match the inport, valid
eth.src and valid ip6.src address(es).
· Priority 90 flow to allow IPv6 DAD (Duplicate
Address Detection) traffic if it has a valid
eth.src. This is necessary since DAD requires
joining a multicast group and sending neighbor
solicitations for the newly assigned address.
Since no address is yet assigned, these are sent
from the unspecified IPv6 address (::).
· Priority 80 flow to drop IP (both IPv4 and IPv6)
traffic which matches the inport and valid eth.src.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
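Continuing the hypothetical lp1 example, if the port is also locked to IPv4 address 10.0.0.5, the priority-90 allow flow and priority-80 drop flow would look roughly like:
    priority=90, match=(inport == "lp1" && eth.src == 00:00:00:00:00:01
                        && ip4.src == {10.0.0.5}), action=(next;)
    priority=80, match=(inport == "lp1" && eth.src == 00:00:00:00:00:01
                        && ip), action=(drop;)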
Ingress Table 2: Ingress Port Security - Neighbor discovery
Ingress table 2 contains these logical flows:
· For each element in the port security set,
· Priority 90 flow to allow ARP traffic which matches
the inport and valid eth.src and arp.sha. If the
element has one or more IPv4 addresses, then it
also matches the valid arp.spa.
· Priority 90 flow to allow IPv6 Neighbor Solicitation
and Advertisement traffic which matches the
inport, valid eth.src and nd.sll/nd.tll. If the
element has one or more IPv6 addresses, then it
also matches the valid nd.target address(es) for
Neighbor Advertisement traffic.
· Priority 80 flow to drop ARP and IPv6 Neighbor
Solicitation and Advertisement traffic which matches
the inport and valid eth.src.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
Ingress Table 3: Lookup MAC address learning table
This table looks up the MAC learning table of the logical switch
datapath to check whether the port-MAC pair is present. MACs are
learned only for logical switch VIF ports that have port security
disabled and ’unknown’ address set.
· For each such logical port p, the following flow is added.
· Priority 100 flow with the match inport == p and
action reg0[11] = lookup_fdb(inport, eth.src);
next;
· One priority-0 fallback flow that matches all packets and
advances to the next table.
Ingress Table 4: Learn MAC of ’unknown’ ports.
This table learns the MAC addresses seen on the logical ports that
have port security disabled and ’unknown’ address set if the
lookup_fdb action returned false in the previous table.
· For each such logical port p, the following flow is added.
· Priority 100 flow with the match inport == p &&
reg0[11] == 0 and action put_fdb(inport, eth.src);
next; which stores the port-MAC pair in the MAC
learning table of the logical switch datapath and
advances the packet to the next table.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
Ingress Table 5: from-lport Pre-ACLs
This table prepares flows for possible stateful ACL processing in
ingress table ACLs. It contains a priority-0 flow that simply moves
traffic to the next table. If stateful ACLs are used in the logical
datapath, a priority-100 flow is added that sets a hint (with reg0[0] =
1; next;) for table Pre-stateful to send IP packets to the connection
tracker before eventually advancing to ingress table ACLs. If special
ports such as router ports or localnet ports can’t use ct(), a
priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor
Discovery and MLD traffic also skips stateful ACLs. For
"allow-stateless" ACLs, a flow is added to bypass setting the hint for
connection tracker processing.
This table also has a priority-110 flow with the match eth.dst == E for
all logical switch datapaths to move traffic to the next table, where E
is the service monitor MAC defined in the options:svc_monitor_mac
column of the NB_Global table.
Ingress Table 6: Pre-LB
This table prepares flows for possible stateful load balancing process‐
ing in ingress table LB and Stateful. It contains a priority-0 flow
that simply moves traffic to the next table. Moreover it contains a
priority-110 flow to move IPv6 Neighbor Discovery and MLD traffic to
the next table. If load balancing rules with virtual IP addresses (and
ports) are configured in OVN_Northbound database for a logical switch
datapath, a priority-100 flow is added with the match ip to match on IP
packets and sets the action reg0[2] = 1; next; to act as a hint for ta‐
ble Pre-stateful to send IP packets to the connection tracker for
packet de-fragmentation (and to possibly do DNAT for already estab‐
lished load balanced traffic) before eventually advancing to ingress
table Stateful. If controller_event has been enabled and load balancing
rules with empty backends have been added in OVN_Northbound, a
priority-130 flow is added to trigger ovn-controller events whenever
the chassis receives a packet for that particular VIP. If an event-elb
meter has been previously created, it will be associated to the
empty_lb logical flow.
Prior to OVN 20.09 we were setting the reg0[0] = 1 only if the IP
destination matches the load balancer VIP. However, this caused issues
in cases where a logical switch doesn’t have any ACLs with
allow-related action. To understand the issue, let’s take a TCP load
balancer 10.0.0.10:80=10.0.0.3:80. If a logical port p1 with IP
10.0.0.5 opens a TCP connection with the VIP 10.0.0.10, then the packet
in the ingress pipeline of p1 is sent to p1’s conntrack zone id and the
packet is load balanced to the backend 10.0.0.3. The reply packet from
the backend lport is not sent to the conntrack of the backend lport’s
zone id. This is fine as long as the packet is valid. Suppose the
backend lport sends an invalid TCP packet (like an incorrect sequence
number); the packet gets delivered to the lport p1 without unDNATing
the packet to the VIP 10.0.0.10, and this causes the connection to be
reset by the lport p1’s VIF.
We can’t fix this issue by adding a logical flow to drop ct.inv packets
in the egress pipeline, since it would drop all other connections not
destined to the load balancers. To fix this issue, we send all the
packets to the conntrack in the ingress pipeline if a load balancer is
configured. We can then add an lflow to drop ct.inv packets.
This table also has a priority-110 flow with the match eth.dst == E for
all logical switch datapaths to move traffic to the next table, where E
is the service monitor MAC defined in the options:svc_monitor_mac
column of the NB_Global table.
This table also has a priority-110 flow with the match inport == I for
all logical switch datapaths to move traffic to the next table, where I
is the peer of a logical router port. This flow is added to skip the
connection tracking of packets which enter from the logical router
datapath to the logical switch datapath.
Ingress Table 7: Pre-stateful
This table prepares flows for all possible stateful processing in next
tables. It contains a priority-0 flow that simply moves traffic to the
next table.
· Priority-120 flows that send the packets to connection
tracker using ct_lb; as the action so that the already
established traffic destined to the load balancer VIP
gets DNATted based on a hint provided by the previous
tables (with a match for reg0[2] == 1 and on supported
load balancer protocols and address families). For IPv4
traffic the flows also load the original destination IP
and transport port in registers reg1 and reg2. For IPv6
traffic the flows also load the original destination IP
and transport port in registers xxreg1 and reg2.
· A priority-110 flow sends the packets to connection
tracker based on a hint provided by the previous tables
(with a match for reg0[2] == 1) by using the ct_lb;
action. This flow is added to handle the traffic for load
balancer VIPs whose protocol is not defined (mainly for
ICMP traffic).
· A priority-100 flow sends the packets to connection
tracker based on a hint provided by the previous tables
(with a match for reg0[0] == 1) by using the ct_next;
action.
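As a sketch, the priority-120 flow for IPv4 TCP traffic might look like the following (register usage as described above; not verbatim output):
    priority=120, match=(reg0[2] == 1 && ip4 && tcp),
    action=(reg1 = ip4.dst; reg2[0..15] = tcp.dst; ct_lb;)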
Ingress Table 8: from-lport ACL hints
This table consists of logical flows that set hints (reg0 bits) to be
used in the next stage, in the ACL processing table, if stateful ACLs
or load balancers are configured. Multiple hints can be set for the
same packet. The possible hints are:
· reg0[7]: the packet might match an allow-related ACL and
might have to commit the connection to conntrack.
· reg0[8]: the packet might match an allow-related ACL but
there will be no need to commit the connection to con‐
ntrack because it already exists.
· reg0[9]: the packet might match a drop/reject.
· reg0[10]: the packet might match a drop/reject ACL but
the connection was previously allowed so it might have to
be committed again with ct_label=1/1.
The table contains the following flows:
· A priority-65535 flow to advance to the next table if the
logical switch has no ACLs configured, otherwise a prior‐
ity-0 flow to advance to the next table.
· A priority-7 flow that matches on packets that initiate a
new session. This flow sets reg0[7] and reg0[9] and then
advances to the next table.
· A priority-6 flow that matches on packets that are in the
request direction of an already existing session that has
been marked as blocked. This flow sets reg0[7] and
reg0[9] and then advances to the next table.
· A priority-5 flow that matches untracked packets. This
flow sets reg0[8] and reg0[9] and then advances to the
next table.
· A priority-4 flow that matches on packets that are in the
request direction of an already existing session that has
not been marked as blocked. This flow sets reg0[8] and
reg0[10] and then advances to the next table.
· A priority-3 flow that matches on packets that are not
part of established sessions. This flow sets reg0[9] and
then advances to the next table.
· A priority-2 flow that matches on packets that are part
of an established session that has been marked as
blocked. This flow sets reg0[9] and then advances to the
next table.
· A priority-1 flow that matches on packets that are part
of an established session that has not been marked as
blocked. This flow sets reg0[10] and then advances to the
next table.
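For example, the priority-7 flow for packets that initiate a new session could be sketched as (the exact conntrack-state match is indicative):
    priority=7, match=(ct.new && !ct.est),
    action=(reg0[7] = 1; reg0[9] = 1; next;)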
Ingress Table 9: from-lport ACLs
Logical flows in this table closely reproduce those in the ACL table in
the OVN_Northbound database for the from-lport direction. The priority
values from the ACL table have a limited range and have 1000 added to
them to leave room for OVN default flows at both higher and lower pri‐
orities.
· allow ACLs translate into logical flows with the next;
action. If there are any stateful ACLs on this datapath,
then allow ACLs translate to ct_commit; next; (which acts
as a hint for the next tables to commit the connection to
conntrack).
· allow-related ACLs translate into logical flows with the
ct_commit(ct_label=0/1); next; actions for new connec‐
tions and reg0[1] = 1; next; for existing connections.
· allow-stateless ACLs translate into logical flows with
the next; action.
· reject ACLs translate into logical flows with the
tcp_reset { output <-> inport; next(pipeline=egress,
table=5); } action for TCP connections, icmp4/icmp6 action
for UDP connections, and sctp_abort { output <->
inport; next(pipeline=egress,table=5); } action for SCTP
associations.
· Other ACLs translate to drop; for new or untracked con‐
nections and ct_commit(ct_label=1/1); for known connec‐
tions. Setting ct_label marks a connection as one that
was previously allowed, but should no longer be allowed
due to a policy change.
This table contains a priority-65535 flow to advance to the next table
if the logical switch has no ACLs configured, otherwise a priority-0
flow to advance to the next table so that ACLs allow packets by
default.
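To illustrate the translations above, a hypothetical from-lport allow-related ACL at priority 1000 matching ip4 && tcp.dst == 80 might, combined with the hint bits set by the previous table, produce flows along these lines (a sketch; 1000 + 1000 = 2000):
    priority=2000, match=(reg0[7] == 1 && (ip4 && tcp.dst == 80)),
    action=(ct_commit(ct_label=0/1); next;)
    priority=2000, match=(reg0[8] == 1 && (ip4 && tcp.dst == 80)),
    action=(reg0[1] = 1; next;)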
If the logical datapath has a stateful ACL or a load balancer with VIP
configured, the following flows will also be added:
· A priority-1 flow that sets the hint to commit IP traffic
to the connection tracker (with action reg0[1] = 1;
next;). This is needed for the default allow policy
because, while the initiator’s direction may not have any
stateful rules, the server’s may and then its return
traffic would not be known and marked as invalid.
· A priority-65532 flow that allows any traffic in the
reply direction for a connection that has been committed
to the connection tracker (i.e., established flows), as
long as the committed flow does not have ct_label.blocked
set. We only handle traffic in the reply direction here
because we want all packets going in the request direc‐
tion to still go through the flows that implement the
currently defined policy based on ACLs. If a connection
is no longer allowed by policy, ct_label.blocked will get
set and packets in the reply direction will no longer be
allowed, either.
· A priority-65532 flow that allows any traffic that is
considered related to a committed flow in the connection
tracker (e.g., an ICMP Port Unreachable from a non-lis‐
tening UDP port), as long as the committed flow does not
have ct_label.blocked set.
· A priority-65532 flow that drops all traffic marked by
the connection tracker as invalid.
· A priority-65532 flow that drops all traffic in the reply
direction with ct_label.blocked set meaning that the con‐
nection should no longer be allowed due to a policy
change. Packets in the request direction are skipped here
to let a newly created ACL re-allow this connection.
· A priority-65532 flow that allows IPv6 Neighbor
solicitation, Neighbor discovery, Router solicitation,
Router advertisement and MLD packets.
If the logical datapath has any ACL or a load balancer with VIP config‐
ured, the following flow will also be added:
· A priority 34000 logical flow is added for each logical
switch datapath with the match eth.dst == E to allow the
service monitor reply packet destined to ovn-controller
with the action next, where E is the service monitor MAC
defined in the options:svc_monitor_mac column of the
NB_Global table.
Ingress Table 10: from-lport QoS Marking
Logical flows in this table closely reproduce those in the QoS table
with the action column set in the OVN_Northbound database for the
from-lport direction.
· For every qos_rules entry in a logical switch with DSCP
marking enabled, a flow will be added at the priority
mentioned in the QoS table.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
Ingress Table 11: from-lport QoS Meter
Logical flows in this table closely reproduce those in the QoS table
with the bandwidth column set in the OVN_Northbound database for the
from-lport direction.
· For every qos_rules entry in a logical switch with meter‐
ing enabled, a flow will be added at the priority men‐
tioned in the QoS table.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
Ingress Table 12: Stateful
· For all the configured load balancing rules for a switch
in OVN_Northbound database that include an L4 port PORT
of protocol P and IP address VIP, a priority-120 flow is
added. For IPv4 VIPs, the flow matches ct.new && ip &&
ip4.dst == VIP && P && P.dst == PORT. For IPv6 VIPs, the
flow matches ct.new && ip && ip6.dst == VIP && P && P.dst
== PORT. The flow’s action is ct_lb(args), where args
contains comma separated IP addresses (and optional port
numbers) to load balance to. The address family of the IP
addresses of args is the same as the address family of
VIP. If health check is enabled, then args will only
contain those endpoints whose service monitor status entry
in OVN_Southbound db is either online or empty. For IPv4
traffic the flow also loads the original destination IP
and transport port in registers reg1 and reg2. For IPv6
traffic the flow also loads the original destination IP
and transport port in registers xxreg1 and reg2.
· For all the configured load balancing rules for a switch
in OVN_Northbound database that include just an IP
address VIP to match on, OVN adds a priority-110 flow.
For IPv4 VIPs, the flow matches ct.new && ip && ip4.dst
== VIP. For IPv6 VIPs, the flow matches ct.new && ip &&
ip6.dst == VIP. The action on this flow is ct_lb(args),
where args contains comma separated IP addresses of the
same address family as VIP. For IPv4 traffic the flow
also loads the original destination IP and transport port
in registers reg1 and reg2. For IPv6 traffic the flow
also loads the original destination IP and transport port
in registers xxreg1 and reg2.
· If the load balancer is created with the --reject option
and it has no active backends, a TCP reset segment (for
tcp) or an ICMP port unreachable packet (for all other
kinds of traffic) will be sent whenever an incoming packet
is received for this load-balancer. Please note that using
the --reject option will disable the empty_lb SB controller
event for this load balancer.
· A priority-100 flow commits packets to connection tracker
using ct_commit; next; action based on a hint provided by
the previous tables (with a match for reg0[1] == 1).
· A priority-0 flow that simply moves traffic to the next
table.
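For the TCP load balancer 10.0.0.10:80=10.0.0.3:80 used as an example earlier, the resulting priority-120 flow could be sketched as (not verbatim output):
    priority=120, match=(ct.new && ip && ip4.dst == 10.0.0.10
                         && tcp && tcp.dst == 80),
    action=(reg1 = 10.0.0.10; reg2[0..15] = 80; ct_lb(10.0.0.3:80);)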
Ingress Table 13: Pre-Hairpin
· If the logical switch has load balancer(s) configured,
then a priority-100 flow is added with the match ip &&
ct.trk to check whether the packet needs to be hairpinned
(if, after load balancing, the destination IP matches the
source IP) by executing the actions reg0[6] =
chk_lb_hairpin(); and reg0[12] = chk_lb_hairpin_reply();
and advances the packet to the next table.
· A priority-0 flow that simply moves traffic to the next
table.
Ingress Table 14: Nat-Hairpin
· If the logical switch has load balancer(s) configured,
then a priority-100 flow is added with the match ip &&
ct.new && ct.trk && reg0[6] == 1 which hairpins the
traffic by NATting source IP to the load balancer VIP by
executing the action ct_snat_to_vip and advances the packet
to the next table.
· If the logical switch has load balancer(s) configured,
then a priority-100 flow is added with the match ip &&
ct.est && ct.trk && reg0[6] == 1 which hairpins the
traffic by NATting source IP to the load balancer VIP by
executing the action ct_snat and advances the packet to the
next table.
· If the logical switch has load balancer(s) configured,
then a priority-90 flow is added with the match ip &&
reg0[12] == 1 which matches on the replies of hairpinned
traffic (i.e., destination IP is VIP, source IP is the
backend IP and source L4 port is backend port for L4 load
balancers) and executes ct_snat and advances the packet
to the next table.
· A priority-0 flow that simply moves traffic to the next
table.
Ingress Table 15: Hairpin
· A priority-1 flow that hairpins traffic matched by non-
default flows in the Pre-Hairpin table. Hairpinning is
done at L2: Ethernet addresses are swapped and the packets
are looped back on the input port.
· A priority-0 flow that simply moves traffic to the next
table.
Ingress Table 16: ARP/ND responder
This table implements ARP/ND responder in a logical switch for known
IPs. The advantage of the ARP responder flow is to limit ARP broadcasts
by locally responding to ARP requests without the need to send to other
hypervisors. One common case is when the inport is a logical port asso‐
ciated with a VIF and the broadcast is responded to on the local hyper‐
visor rather than broadcast across the whole network and responded to
by the destination VM. This behavior is proxy ARP.
ARP requests arrive from VMs from a logical switch inport of type
default. For this case, the logical switch proxy ARP rules can be for
other VMs or logical router ports. Logical switch proxy ARP rules may
be programmed both for mac binding of IP addresses on other logical
switch VIF ports (which are of the default logical switch port type,
representing connectivity to VMs or containers), and for mac binding of
IP addresses on logical switch router type ports, representing their
logical router port peers. In order to support proxy ARP for logical
router ports, an IP address must be configured on the logical switch
router type port, with the same value as the peer logical router port.
The configured MAC addresses must match as well. When a VM sends an ARP
request for a distributed logical router port and if the peer router
type port of the attached logical switch does not have an IP address
configured, the ARP request will be broadcast on the logical switch.
One of the copies of the ARP request will go through the logical switch
router type port to the logical router datapath, where the logical
router ARP responder will generate a reply. The MAC binding of a dis‐
tributed logical router, once learned by an associated VM, is used for
all that VM’s communication needing routing. Hence, the action of a VM
re-arping for the mac binding of the logical router port should be
rare.
Logical switch ARP responder proxy ARP rules can also be hit when
receiving ARP requests externally on a L2 gateway port. In this case,
the hypervisor acting as an L2 gateway, responds to the ARP request on
behalf of a destination VM.
Note that ARP requests received from localnet or vtep logical inports
can either go directly to VMs, in which case the VM responds or can hit
an ARP responder for a logical router port if the packet is used to
resolve a logical router port next hop address. In either case, logical
switch ARP responder rules will not be hit. It contains these logical
flows:
· Priority-100 flows to skip the ARP responder if inport is
of type localnet or vtep and advances directly to the
next table. ARP requests sent to localnet or vtep ports
can be received by multiple hypervisors. Now, because the
same mac binding rules are downloaded to all hypervisors,
each of the multiple hypervisors will respond. This will
confuse L2 learning on the source of the ARP requests.
ARP requests received on an inport of type router are not
expected to hit any logical switch ARP responder flows.
However, no skip flows are installed for these packets,
as there would be some additional flow cost for this and
the value appears limited.
· If the inport V is of type virtual, a priority-100
logical flow is added for each P configured in the
options:virtual-parents column with the match
inport == P && ((arp.op == 1 && arp.spa == VIP && arp.tpa == VIP) || (arp.op == 2 && arp.spa == VIP))
and applies the action
bind_vport(V, inport);
and advances the packet to the next table.
Where VIP is the virtual ip configured in the column
options:virtual-ip.
· Priority-50 flows that match ARP requests to each known
IP address A of every logical switch port, and respond
with ARP replies directly with corresponding Ethernet
address E:
eth.dst = eth.src;
eth.src = E;
arp.op = 2; /* ARP reply. */
arp.tha = arp.sha;
arp.sha = E;
arp.tpa = arp.spa;
arp.spa = A;
outport = inport;
flags.loopback = 1;
output;
These flows are omitted for logical ports (other than
router ports or localport ports) that are down (unless
ignore_lsp_down is configured as true in options column
of NB_Global table of the Northbound database), for logi‐
cal ports of type virtual, for logical ports with
’unknown’ address set and for logical ports of a logical
switch configured with other_config:vlan-passthru=true.
· Priority-50 flows that match IPv6 ND neighbor solicita‐
tions to each known IP address A (and A’s solicited node
address) of every logical switch port except of type
router, and respond with neighbor advertisements directly
with corresponding Ethernet address E:
nd_na {
eth.src = E;
ip6.src = A;
nd.target = A;
nd.tll = E;
outport = inport;
flags.loopback = 1;
output;
};
Priority-50 flows that match IPv6 ND neighbor solicita‐
tions to each known IP address A (and A’s solicited node
address) of logical switch port of type router, and
respond with neighbor advertisements directly with corre‐
sponding Ethernet address E:
nd_na_router {
eth.src = E;
ip6.src = A;
nd.target = A;
nd.tll = E;
outport = inport;
flags.loopback = 1;
output;
};
These flows are omitted for logical ports (other than
router ports or localport ports) that are down (unless
ignore_lsp_down is configured as true in options column
of NB_Global table of the Northbound database), for logi‐
cal ports of type virtual and for logical ports with
’unknown’ address set.
· Priority-100 flows with match criteria like the ARP and
ND flows above, except that they only match packets from
the inport that owns the IP addresses in question, with
action next;. These flows prevent OVN from replying to,
for example, an ARP request emitted by a VM for its own
IP address. A VM only makes this kind of request to
attempt to detect a duplicate IP address assignment, so
sending a reply will prevent the VM from accepting the IP
address that it owns.
In place of next;, it would be reasonable to use drop;
for the flows’ actions. If everything is working as it is
configured, then this would produce equivalent results,
since no host should reply to the request. But ARPing for
one’s own IP address is intended to detect situations
where the network is not working as configured, so drop‐
ping the request would frustrate that intent.
· For each SVC_MON_SRC_IP defined in the value of the
ip_port_mappings:ENDPOINT_IP column of the Load_Balancer
table, a priority-110 logical flow is added with the match
arp.tpa == SVC_MON_SRC_IP && arp.op == 1 and applies
the action
eth.dst = eth.src;
eth.src = E;
arp.op = 2; /* ARP reply. */
arp.tha = arp.sha;
arp.sha = E;
arp.tpa = arp.spa;
arp.spa = A;
outport = inport;
flags.loopback = 1;
output;
where E is the service monitor source mac defined in the
options:svc_monitor_mac column in the NB_Global table.
This mac is used as the source mac in the service monitor
packets for the load balancer endpoint IP health checks.
SVC_MON_SRC_IP is used as the source ip in the service
monitor IPv4 packets for the load balancer endpoint IP
health checks.
These flows are required if an ARP request is sent for
the IP SVC_MON_SRC_IP.
· For each VIP configured in the table Forwarding_Group a
priority-50 logical flow is added with the match arp.tpa
== vip && arp.op == 1
and applies the action
eth.dst = eth.src;
eth.src = E;
arp.op = 2; /* ARP reply. */
arp.tha = arp.sha;
arp.sha = E;
arp.tpa = arp.spa;
arp.spa = A;
outport = inport;
flags.loopback = 1;
output;
where E is the forwarding group’s MAC address defined in
vmac.
A is used as either the destination ip for load balancing
traffic to child ports or as nexthop to hosts behind the
child ports.
These flows are required to respond to an ARP request if
an ARP request is sent for the IP vip.
· One priority-0 fallback flow that matches all packets and
advances to the next table.
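As a concrete illustration of the priority-50 ARP responder flows above, a port owning IP address 10.0.0.5 with Ethernet address 00:00:00:00:00:01 (both hypothetical) would be answered by a flow roughly like:
    priority=50, match=(arp.tpa == 10.0.0.5 && arp.op == 1),
    action=(eth.dst = eth.src; eth.src = 00:00:00:00:00:01;
            arp.op = 2; arp.tha = arp.sha; arp.sha = 00:00:00:00:00:01;
            arp.tpa = arp.spa; arp.spa = 10.0.0.5;
            outport = inport; flags.loopback = 1; output;)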
Ingress Table 17: DHCP option processing
This table adds the DHCPv4 options to a DHCPv4 packet from the logical
ports configured with IPv4 address(es) and DHCPv4 options, and simi‐
larly for DHCPv6 options. This table also adds flows for the logical
ports of type external.
· A priority-100 logical flow is added for these logical
ports which matches the IPv4 packet with udp.src = 68 and
udp.dst = 67 and applies the action put_dhcp_opts and
advances the packet to the next table.
reg0[3] = put_dhcp_opts(offerip = ip, options...);
next;
For DHCPDISCOVER and DHCPREQUEST, this transforms the
packet into a DHCP reply, adds the DHCP offer IP ip and
options to the packet, and stores 1 into reg0[3]. For
other kinds of packets, it just stores 0 into reg0[3].
Either way, it continues to the next table.
· A priority-100 logical flow is added for these logical
ports which matches the IPv6 packet with udp.src = 546
and udp.dst = 547 and applies the action put_dhcpv6_opts
and advances the packet to the next table.
reg0[3] = put_dhcpv6_opts(ia_addr = ip, options...);
next;
For DHCPv6 Solicit/Request/Confirm packets, this trans‐
forms the packet into a DHCPv6 Advertise/Reply, adds the
DHCPv6 offer IP ip and options to the packet, and stores
1 into reg0[3]. For other kinds of packets, it just
stores 0 into reg0[3]. Either way, it continues to the
next table.
· A priority-0 flow that matches all packets and advances
to the next table.
Ingress Table 18: DHCP responses
This table implements DHCP responder for the DHCP replies generated by
the previous table.
· A priority 100 logical flow is added for the logical
ports configured with DHCPv4 options which matches IPv4
packets with udp.src == 68 && udp.dst == 67 && reg0[3] ==
1 and responds back to the inport after applying these
actions. If reg0[3] is set to 1, it means that the action
put_dhcp_opts was successful.
eth.dst = eth.src;
eth.src = E;
ip4.src = S;
udp.src = 67;
udp.dst = 68;
outport = P;
flags.loopback = 1;
output;
where E is the server MAC address and S is the server
IPv4 address defined in the DHCPv4 options. Note that
ip4.dst field is handled by put_dhcp_opts.
(This terminates ingress packet processing; the packet
does not go to the next ingress table.)
· A priority 100 logical flow is added for the logical
ports configured with DHCPv6 options which matches IPv6
packets with udp.src == 546 && udp.dst == 547 && reg0[3]
== 1 and responds back to the inport after applying these
actions. If reg0[3] is set to 1, it means that the action
put_dhcpv6_opts was successful.
eth.dst = eth.src;
eth.src = E;
ip6.dst = A;
ip6.src = S;
udp.src = 547;
udp.dst = 546;
outport = P;
flags.loopback = 1;
output;
where E is the server MAC address and S is the server
IPv6 LLA address generated from the server_id defined in
the DHCPv6 options and A is the IPv6 address defined in
the logical port’s addresses column.
(This terminates packet processing; the packet does not
go on the next ingress table.)
· A priority-0 flow that matches all packets and advances
to the next table.
Ingress Table 19: DNS Lookup
This table looks up and resolves the DNS names to the corresponding
configured IP address(es).
· A priority-100 logical flow for each logical switch data‐
path if it is configured with DNS records, which matches
the IPv4 and IPv6 packets with udp.dst = 53 and applies
the action dns_lookup and advances the packet to the next
table.
reg0[4] = dns_lookup(); next;
For valid DNS packets, this transforms the packet into a
DNS reply if the DNS name can be resolved, and stores 1
into reg0[4]. For failed DNS resolution or other kinds of
packets, it just stores 0 into reg0[4]. Either way, it
continues to the next table.
Ingress Table 20: DNS Responses
This table implements DNS responder for the DNS replies generated by
the previous table.
· A priority-100 logical flow for each logical switch
datapath if it is configured with DNS records, which matches
the IPv4 and IPv6 packets with udp.dst == 53 && reg0[4] ==
1 and responds back to the inport after applying these
actions. If reg0[4] is set to 1, it means that the action
dns_lookup was successful.
eth.dst <-> eth.src;
ip4.src <-> ip4.dst;
udp.dst = udp.src;
udp.src = 53;
outport = P;
flags.loopback = 1;
output;
(This terminates ingress packet processing; the packet
does not go to the next ingress table.)
Ingress Table 21: External ports
Traffic from the external logical ports enters the ingress datapath
pipeline via the localnet port. This table adds the below logical flows
to handle the traffic from these ports.
· A priority-100 flow is added for each external logical
port which doesn’t reside on a chassis to drop the
ARP/IPv6 NS request to the router IP(s) (of the logical
switch) which matches on the inport of the external logi‐
cal port and the valid eth.src address(es) of the exter‐
nal logical port.
This flow guarantees that the ARP/NS request to the
router IP address from the external ports is responded to
only by the chassis which has claimed these external ports.
All the other chassis drop these packets.
A priority-100 flow is added for each external logical
port which doesn’t reside on a chassis to drop any packet
destined to the router mac, with the match inport ==
external && eth.src == E && eth.dst == R &&
!is_chassis_resident("external"), where E is the external
port mac and R is the router port mac.
· A priority-0 flow that matches all packets and advances
to the next table.
Ingress Table 22: Destination Lookup
This table implements switching behavior. It contains these logical
flows:
· A priority-110 flow with the match eth.src == E for all
logical switch datapaths and applies the action
handle_svc_check(inport), where E is the service monitor
MAC defined in the options:svc_monitor_mac column of the
NB_Global table.
· A priority-100 flow that punts all IGMP/MLD packets to
ovn-controller if multicast snooping is enabled on the
logical switch. The flow also forwards the IGMP/MLD pack‐
ets to the MC_MROUTER_STATIC multicast group, which
ovn-northd populates with all the logical ports that have
options :mcast_flood_reports=’true’.
· Priority-90 flows that forward registered IP multicast
traffic to their corresponding multicast group, which
ovn-northd creates based on learnt IGMP_Group entries.
The flows also forward packets to the MC_MROUTER_FLOOD
multicast group, which ovn-northd populates with all the
logical ports that are connected to logical routers with
options:mcast_relay=’true’.
· A priority-85 flow that forwards all IP multicast traffic
destined to 224.0.0.X to the MC_FLOOD multicast group,
which ovn-northd populates with all enabled logical
ports.
· A priority-85 flow that forwards all IP multicast traffic
destined to reserved multicast IPv6 addresses (RFC 4291,
2.7.1, e.g., Solicited-Node multicast) to the MC_FLOOD
multicast group, which ovn-northd populates with all
enabled logical ports.
· A priority-80 flow that forwards all unregistered IP mul‐
ticast traffic to the MC_STATIC multicast group, which
ovn-northd populates with all the logical ports that have
options :mcast_flood=’true’. The flow also forwards
unregistered IP multicast traffic to the MC_MROUTER_FLOOD
multicast group, which ovn-northd populates with all the
logical ports connected to logical routers that have
options :mcast_relay=’true’.
· A priority-80 flow that drops all unregistered IP multi‐
cast traffic if other_config :mcast_snoop=’true’ and
other_config :mcast_flood_unregistered=’false’ and the
switch is not connected to a logical router that has
options :mcast_relay=’true’ and the switch doesn’t have
any logical port with options :mcast_flood=’true’.
· Priority-80 flows for each IP address/VIP/NAT address
owned by a router port connected to the switch. These
flows match ARP requests and ND packets for the specific
IP addresses. Matched packets are forwarded only to the
router that owns the IP address and to the MC_FLOOD_L2
multicast group which contains all non-router logical
ports.
· Priority-75 flows for each port connected to a logical
router matching self originated ARP request/ND packets.
These packets are flooded to the MC_FLOOD_L2 which con‐
tains all non-router logical ports.
· A priority-70 flow that outputs all packets with an Eth‐
ernet broadcast or multicast eth.dst to the MC_FLOOD mul‐
ticast group.
· One priority-50 flow that matches each known Ethernet
address against eth.dst and outputs the packet to the
single associated output port.
For the Ethernet address on a logical switch port of type
router, when that logical switch port’s addresses column
is set to router and the connected logical router port
has a gateway chassis:
· The flow for the connected logical router port’s
Ethernet address is only programmed on the gateway
chassis.
· If the logical router has rules specified in nat
with external_mac, then those addresses are also
used to populate the switch’s destination lookup
on the chassis where logical_port is resident.
For the Ethernet address on a logical switch port of type
router, when that logical switch port’s addresses column
is set to router and the connected logical router port
specifies a reside-on-redirect-chassis and the logical
router to which the connected logical router port belongs
to has a distributed gateway LRP:
· The flow for the connected logical router port’s
Ethernet address is only programmed on the gateway
chassis.
For each forwarding group configured on the logical
switch datapath, a priority-50 flow that matches on
eth.dst == VIP
with an action of fwd_group(childports=args), where
args contains comma separated logical switch child ports
to load balance to. If liveness is enabled, then the action
also includes liveness=true.
· One priority-0 fallback flow that matches all packets
with the action outport = get_fdb(eth.dst); next;. The
action get_fdb gets the port for the eth.dst in the MAC
learning table of the logical switch datapath. If there
is no entry for eth.dst in the MAC learning table, then
it stores none in the outport.
Ingress Table 23: Destination unknown
This table handles the packets whose destination was not found or
could not be looked up in the MAC learning table of the logical switch
datapath. It contains the following flows.
· If the logical switch has logical ports with ’unknown’
addresses set, then the below logical flow is added
· Priority 50 flow with the match outport == none
that outputs the packets to the MC_UNKNOWN multicast
group, which ovn-northd populates with all enabled
logical ports that accept unknown destination
packets. As a small optimization, if no logical
ports accept unknown destination packets,
ovn-northd omits this multicast group and logical
flow.
If the logical switch has no logical ports with ’unknown’
address set, then the below logical flow is added
· Priority 50 flow with the match outport == none
that drops the packets.
· One priority-0 fallback flow that outputs the packet to
the egress stage with the outport learnt from get_fdb
action.
Egress Table 0: Pre-LB
This table is similar to ingress table Pre-LB. It contains a priority-0
flow that simply moves traffic to the next table. Moreover it contains
a priority-110 flow to move IPv6 Neighbor Discovery traffic to the next
table. If any load balancing rules exist for the datapath, a
priority-100 flow is added with a match of ip and action of reg0[2] =
1; next; to act as a hint for table Pre-stateful to send IP packets to
the connection tracker for packet de-fragmentation and to possibly DNAT
the destination VIP to one of the selected backends for already
committed load balanced traffic.
This table also has a priority-110 flow with the match eth.src == E for
all logical switch datapaths to move traffic to the next table, where E
is the service monitor MAC defined in the options:svc_monitor_mac
column of the NB_Global table.
Egress Table 1: to-lport Pre-ACLs
This is similar to ingress table Pre-ACLs except for to-lport traffic.
This table also has a priority-110 flow with the match eth.src == E for
all logical switch datapaths to move traffic to the next table, where E
is the service monitor MAC defined in the options:svc_monitor_mac
column of the NB_Global table.
This table also has a priority-110 flow with the match outport == I for
all logical switch datapaths to move traffic to the next table, where I
is the peer of a logical router port. This flow is added to skip the
connection tracking of packets which will enter the logical router
datapath from the logical switch datapath for routing.
Egress Table 2: Pre-stateful
This is similar to ingress table Pre-stateful. This table adds the
below 3 logical flows.
· A priority-120 flow that sends the packets to connection
tracker using ct_lb; as the action so that the already
established traffic gets unDNATted from the backend IP to
the load balancer VIP based on a hint provided by the
previous tables with a match for reg0[2] == 1. If the
packet was not DNATted earlier, then ct_lb functions like
ct_next.
· A priority-100 flow sends the packets to connection
tracker based on a hint provided by the previous tables
(with a match for reg0[0] == 1) by using the ct_next;
action.
· A priority-0 flow that matches all packets to advance to
the next table.
Egress Table 3: from-lport ACL hints
This is similar to ingress table ACL hints.
Egress Table 4: to-lport ACLs
This is similar to ingress table ACLs except for to-lport ACLs.
In addition, the following flows are added.
· A priority 34000 logical flow is added for each logical
port which has DHCPv4 options defined to allow the DHCPv4
reply packet and which has DHCPv6 options defined to
allow the DHCPv6 reply packet from Ingress Table 18:
DHCP responses.
· A priority 34000 logical flow is added for each logical
switch datapath configured with DNS records with the
match udp.dst = 53 to allow the DNS reply packet from
Ingress Table 20: DNS Responses.
· A priority 34000 logical flow is added for each logical
switch datapath with the match eth.src == E to allow the
service monitor request packet generated by ovn-controller
with the action next, where E is the service monitor MAC
defined in the options:svc_monitor_mac column of the
NB_Global table.
Egress Table 5: to-lport QoS Marking
This is similar to ingress table QoS marking except that it applies to
to-lport QoS rules.
Egress Table 6: to-lport QoS Meter
This is similar to ingress table QoS meter except that it applies to
to-lport QoS rules.
Egress Table 7: Stateful
This is similar to ingress table Stateful except that there are no
rules added for load balancing new connections.
Egress Table 8: Egress Port Security - IP
This is similar to the port security logic in table Ingress Port
Security - IP except that outport, eth.dst, ip4.dst and ip6.dst are
checked instead of inport, eth.src, ip4.src and ip6.src.
Egress Table 9: Egress Port Security - L2
This is similar to the ingress port security logic in ingress table
Admission Control and Ingress Port Security - L2, but with important
differences. Most obviously, outport and eth.dst are checked instead of
inport and eth.src. Second, packets directed to broadcast or multicast
eth.dst are always accepted instead of being subject to the port secu‐
rity rules; this is implemented through a priority-100 flow that
matches on eth.mcast with action output;. Moreover, to ensure that even
broadcast and multicast packets are not delivered to disabled logical
ports, a priority-150 flow for each disabled logical outport overrides
the priority-100 flow with a drop; action. Finally, if egress QoS has
been enabled on a localnet port, the outgoing queue id is set through
the set_queue action. Remember to mark the corresponding physical
interface with ovn-egress-iface set to true in external_ids.
Logical Router Datapaths
Logical router datapaths will only exist for Logical_Router rows in the
OVN_Northbound database that do not have enabled set to false.
Ingress Table 0: L2 Admission Control
This table drops packets that the router shouldn’t see at all based on
their Ethernet headers. It contains the following flows:
· Priority-100 flows to drop packets with VLAN tags or mul‐
ticast Ethernet source addresses.
· For each enabled router port P with Ethernet address E, a
priority-50 flow that matches inport == P && (eth.mcast
|| eth.dst == E), stores the router port Ethernet
address, and advances to the next table, with action
xreg0[0..47] = E; next;.
For the gateway port on a distributed logical router
(where one of the logical router ports specifies a gate‐
way chassis), the above flow matching eth.dst == E is
only programmed on the gateway port instance on the gate‐
way chassis.
· For each dnat_and_snat NAT rule on a distributed router
that specifies an external Ethernet address E, a
priority-50 flow that matches inport == GW && eth.dst == E,
where GW is the logical router gateway port, with action
xreg0[0..47] = E; next;.
This flow is only programmed on the gateway port instance
on the chassis where the logical_port specified in the
NAT rule resides.
Other packets are implicitly dropped.
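As a hypothetical example, for an enabled router port lrp0 with
Ethernet address 00:00:00:0f:01:02 (both invented for illustration),
the priority-50 admission flow described above would look roughly
like:
    priority=50,
    match=(inport == "lrp0" && (eth.mcast || eth.dst == 00:00:00:0f:01:02)),
    action=(xreg0[0..47] = 00:00:00:0f:01:02; next;)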
Ingress Table 1: Neighbor lookup
For ARP and IPv6 Neighbor Discovery packets, this table looks into the
MAC_Binding records to determine if OVN needs to learn the MAC
bindings. The following flows are added:
· For each router port P that owns IP address A, which
belongs to subnet S with prefix length L, if the option
always_learn_from_arp_request is true for this router, a
priority-100 flow is added which matches inport == P &&
arp.spa == S/L && arp.op == 1 (ARP request) with the
following actions:
reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
next;
If the option always_learn_from_arp_request is false, the
following two flows are added.
A priority-110 flow is added which matches inport == P &&
arp.spa == S/L && arp.tpa == A && arp.op == 1 (ARP
request) with the following actions:
reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
reg9[3] = 1;
next;
A priority-100 flow is added which matches inport == P &&
arp.spa == S/L && arp.op == 1 (ARP request) with the
following actions:
reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
reg9[3] = lookup_arp_ip(inport, arp.spa);
next;
If the logical router port P is a distributed gateway
router port, an additional match is_chassis_resident(cr-P)
is added for all these flows.
· A priority-100 flow which matches on ARP reply packets
and applies the actions if the option
always_learn_from_arp_request is true:
reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
next;
If the option always_learn_from_arp_request is false, the
above actions will be:
reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
reg9[3] = 1;
next;
· A priority-100 flow which matches on IPv6 Neighbor Dis‐
covery advertisement packet and applies the actions if
the option always_learn_from_arp_request is true:
reg9[2] = lookup_nd(inport, nd.target, nd.tll);
next;
If the option always_learn_from_arp_request is false, the
above actions will be:
reg9[2] = lookup_nd(inport, nd.target, nd.tll);
reg9[3] = 1;
next;
· A priority-100 flow which matches on IPv6 Neighbor Dis‐
covery solicitation packet and applies the actions if the
option always_learn_from_arp_request is true:
reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
next;
If the option always_learn_from_arp_request is false, the
above actions will be:
reg9[2] = lookup_nd(inport, ip6.src, nd.sll);
reg9[3] = lookup_nd_ip(inport, ip6.src);
next;
· A priority-0 fallback flow that matches all packets and
applies the action reg9[2] = 1; next; advancing the
packet to the next table.
Ingress Table 2: Neighbor learning
This table adds flows to learn the mac bindings from the ARP and IPv6
Neighbor Solicitation/Advertisement packets if it is needed according
to the lookup results from the previous stage.
reg9[2] will be 1 if the lookup_arp/lookup_nd in the previous table was
successful or skipped, meaning no need to learn mac binding from the
packet.
reg9[3] will be 1 if the lookup_arp_ip/lookup_nd_ip in the previous ta‐
ble was successful or skipped, meaning it is ok to learn mac binding
from the packet (if reg9[2] is 0).
· A priority-100 flow that matches reg9[2] == 1 ||
reg9[3] == 0 and advances the packet to the next table,
as there is no need to learn the neighbor.
· A priority-90 flow that matches arp and applies the
action put_arp(inport, arp.spa, arp.sha); next;
· A priority-90 flow that matches nd_na and applies the
action put_nd(inport, nd.target, nd.tll); next;
· A priority-90 flow that matches nd_ns and applies the
action put_nd(inport, ip6.src, nd.sll); next;
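Taken together, the lookup and learning stages might yield flows
along these lines (an illustrative sketch, not verbatim output):
    priority=100, match=(reg9[2] == 1 || reg9[3] == 0), action=(next;)
    priority=90, match=(arp), action=(put_arp(inport, arp.spa, arp.sha); next;)
    priority=90, match=(nd_na), action=(put_nd(inport, nd.target, nd.tll); next;)
    priority=90, match=(nd_ns), action=(put_nd(inport, ip6.src, nd.sll); next;)
The put_arp and put_nd actions record the bindings in the southbound
MAC_Binding table, which later stages consult via get_arp and get_nd.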
Ingress Table 3: IP Input
This table is the core of the logical router datapath functionality. It
contains the following flows to implement very basic IP host function‐
ality.
· For each NAT entry of a distributed logical router (with
a distributed gateway router port) of type snat, a
priority-120 flow with the match inport == P && ip4.src ==
A advances the packet to the next pipeline, where P is
the distributed logical router port and A is the
external_ip set in the NAT entry. If A is an IPv6
address, then ip6.src is used for the match.
The above flow is required to handle the routing of
east/west NAT traffic.
· For each BFD port the two following priority-110 flows
are added to manage BFD traffic:
· if ip4.src or ip6.src is any IP address owned by
the router port and udp.dst == 3784, the packet
is advanced to the next pipeline stage.
· if ip4.dst or ip6.dst is any IP address owned by
the router port and udp.dst == 3784, the
handle_bfd_msg action is executed.
· L3 admission control: A priority-100 flow drops packets
that match any of the following:
· ip4.src[28..31] == 0xe (multicast source)
· ip4.src == 255.255.255.255 (broadcast source)
· ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8
(localhost source or destination)
· ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8 (zero
network source or destination)
· ip4.src or ip6.src is any IP address owned by the
router, unless the packet was recirculated due to
egress loopback as indicated by REG‐
BIT_EGRESS_LOOPBACK.
· ip4.src is the broadcast address of any IP network
known to the router.
· A priority-100 flow parses DHCPv6 replies from IPv6
prefix delegation routers (udp.src == 547 && udp.dst ==
546). The handle_dhcpv6_reply action is used to send IPv6
prefix delegation messages to the delegation router.
· ICMP echo reply. These flows reply to ICMP echo requests
received for the router’s IP address. Let A be an IP
address owned by a router port. Then, for each A that is
an IPv4 address, a priority-90 flow matches on ip4.dst ==
A and icmp4.type == 8 && icmp4.code == 0 (ICMP echo
request). For each A that is an IPv6 address, a
priority-90 flow matches on ip6.dst == A and icmp6.type ==
128 && icmp6.code == 0 (ICMPv6 echo request). The port of the
router that receives the echo request does not matter.
Also, the ip.ttl of the echo request packet is not
checked, so it complies with RFC 1812, section 4.2.2.9.
Flows for ICMPv4 echo requests use the following actions:
ip4.dst <-> ip4.src;
ip.ttl = 255;
icmp4.type = 0;
flags.loopback = 1;
next;
Flows for ICMPv6 echo requests use the following actions:
ip6.dst <-> ip6.src;
ip.ttl = 255;
icmp6.type = 129;
flags.loopback = 1;
next;
· Reply to ARP requests.
These flows reply to ARP requests for the router’s own IP
address. The ARP requests are handled only if the
requestor’s IP belongs to the same subnet as the logical
router port. For each router port P that owns IP address
A, which belongs to subnet S with prefix length L, and
Ethernet address E, a priority-90 flow matches inport ==
P && arp.spa == S/L && arp.op == 1 && arp.tpa == A (ARP
request) with the following actions:
eth.dst = eth.src;
eth.src = xreg0[0..47];
arp.op = 2; /* ARP reply. */
arp.tha = arp.sha;
arp.sha = xreg0[0..47];
arp.tpa = arp.spa;
arp.spa = A;
outport = inport;
flags.loopback = 1;
output;
For the gateway port on a distributed logical router
(where one of the logical router ports specifies a gate‐
way chassis), the above flows are only programmed on the
gateway port instance on the gateway chassis. This behav‐
ior avoids generation of multiple ARP responses from dif‐
ferent chassis, and allows upstream MAC learning to point
to the gateway chassis.
For the logical router port with the option reside-on-re‐
direct-chassis set (which is centralized), the above
flows are only programmed on the gateway port instance on
the gateway chassis (if the logical router has a distrib‐
uted gateway port). This behavior avoids generation of
multiple ARP responses from different chassis, and allows
upstream MAC learning to point to the gateway chassis.
· Reply to IPv6 Neighbor Solicitations. These flows reply
to Neighbor Solicitation requests for the router’s own
IPv6 address and populate the logical router’s mac bind‐
ing table.
For each router port P that owns IPv6 address A,
solicited node address S, and Ethernet address E, a
priority-90 flow matches inport == P && nd_ns && ip6.dst ==
{A, S} && nd.target == A with the following actions:
nd_na_router {
eth.src = xreg0[0..47];
ip6.src = A;
nd.target = A;
nd.tll = xreg0[0..47];
outport = inport;
flags.loopback = 1;
output;
};
For the gateway port on a distributed logical router
(where one of the logical router ports specifies a gate‐
way chassis), the above flows replying to IPv6 Neighbor
Solicitations are only programmed on the gateway port
instance on the gateway chassis. This behavior avoids
generation of multiple replies from different chassis,
and allows upstream MAC learning to point to the gateway
chassis.
· These flows reply to ARP requests or IPv6 neighbor solic‐
itation for the virtual IP addresses configured in the
router for NAT (both DNAT and SNAT) or load balancing.
IPv4: For a configured NAT (both DNAT and SNAT) IP
address or a load balancer IPv4 VIP A, for each router
port P with Ethernet address E, a priority-90 flow
matches arp.op == 1 && arp.tpa == A (ARP request) with
the following actions:
eth.dst = eth.src;
eth.src = xreg0[0..47];
arp.op = 2; /* ARP reply. */
arp.tha = arp.sha;
arp.sha = xreg0[0..47];
arp.tpa <-> arp.spa;
outport = inport;
flags.loopback = 1;
output;
IPv4: For a configured load balancer IPv4 VIP, a similar
flow is added with the additional match inport == P.
If the router port P is a distributed gateway router
port, then the is_chassis_resident(P) is also added in
the match condition for the load balancer IPv4 VIP A.
IPv6: For a configured NAT (both DNAT and SNAT) IP
address or a load balancer IPv6 VIP A, solicited node
address S, for each router port P with Ethernet address
E, a priority-90 flow matches inport == P && nd_ns &&
ip6.dst == {A, S} && nd.target == A with the following
actions:
eth.dst = eth.src;
nd_na {
eth.src = xreg0[0..47];
nd.tll = xreg0[0..47];
ip6.src = A;
nd.target = A;
outport = inport;
flags.loopback = 1;
output;
}
If the router port P is a distributed gateway router
port, then the is_chassis_resident(P) is also added in
the match condition for the load balancer IPv6 VIP A.
For the gateway port on a distributed logical router with
NAT (where one of the logical router ports specifies a
gateway chassis):
· If the corresponding NAT rule cannot be handled in
a distributed manner, then a priority-92 flow is
programmed on the gateway port instance on the
gateway chassis. A priority-91 drop flow is pro‐
grammed on the other chassis when ARP requests/NS
packets are received on the gateway port. This
behavior avoids generation of multiple ARP
responses from different chassis, and allows
upstream MAC learning to point to the gateway
chassis.
· If the corresponding NAT rule can be handled in a
distributed manner, then this flow is only pro‐
grammed on the gateway port instance where the
logical_port specified in the NAT rule resides.
Some of the actions are different for this case,
using the external_mac specified in the NAT rule
rather than the gateway port’s Ethernet address E:
eth.src = external_mac;
arp.sha = external_mac;
or in the case of IPv6 neighbor solicitation:
eth.src = external_mac;
nd.tll = external_mac;
This behavior avoids generation of multiple ARP
responses from different chassis, and allows
upstream MAC learning to point to the correct
chassis.
· Priority-85 flows which drop the ARP and IPv6 Neighbor
Discovery packets.
· A priority-84 flow explicitly allows IPv6 multicast traf‐
fic that is supposed to reach the router pipeline (i.e.,
router solicitation and router advertisement packets).
· A priority-83 flow explicitly drops IPv6 multicast traf‐
fic that is destined to reserved multicast groups.
· A priority-82 flow allows IP multicast traffic if
options:mcast_relay=’true’, otherwise drops it.
· UDP port unreachable. Priority-80 flows generate ICMP
port unreachable messages in reply to UDP datagrams
directed to the router’s IP address, except in the spe‐
cial case of gateways, which accept traffic directed to a
router IP for load balancing and NAT purposes.
These flows should not match IP fragments with nonzero
offset.
· TCP reset. Priority-80 flows generate TCP reset messages
in reply to TCP datagrams directed to the router’s IP
address, except in the special case of gateways, which
accept traffic directed to a router IP for load balancing
and NAT purposes.
These flows should not match IP fragments with nonzero
offset.
· Protocol or address unreachable. Priority-70 flows gener‐
ate ICMP protocol or address unreachable messages for
IPv4 and IPv6 respectively in reply to packets directed
to the router’s IP address on IP protocols other than
UDP, TCP, and ICMP, except in the special case of gate‐
ways, which accept traffic directed to a router IP for
load balancing purposes.
These flows should not match IP fragments with nonzero
offset.
· Drop other IP traffic to this router. These flows drop
any other traffic destined to an IP address of this
router that is not already handled by one of the flows
above, which amounts to ICMP (other than echo requests)
and fragments with nonzero offsets. For each IP address A
owned by the router, a priority-60 flow matches ip4.dst
== A or ip6.dst == A and drops the traffic. An exception
is made and the above flow is not added if the router
port’s own IP address is used to SNAT packets passing
through that router.
The flows above handle all of the traffic that might be directed to the
router itself. The following flows (with lower priorities) handle the
remaining traffic, potentially for forwarding:
· Drop Ethernet local broadcast. A priority-50 flow with
match eth.bcast drops traffic destined to the local Eth‐
ernet broadcast address. By definition this traffic
should not be forwarded.
· ICMP time exceeded. For each router port P, whose IP
address is A, a priority-40 flow with match inport == P
&& ip.ttl == {0, 1} && !ip.later_frag matches packets
whose TTL has expired, with the following actions to send
an ICMP time exceeded reply for IPv4 and IPv6 respec‐
tively:
icmp4 {
icmp4.type = 11; /* Time exceeded. */
icmp4.code = 0; /* TTL exceeded in transit. */
ip4.dst = ip4.src;
ip4.src = A;
ip.ttl = 255;
next;
};
icmp6 {
icmp6.type = 3; /* Time exceeded. */
icmp6.code = 0; /* TTL exceeded in transit. */
ip6.dst = ip6.src;
ip6.src = A;
ip.ttl = 255;
next;
};
· TTL discard. A priority-30 flow with match ip.ttl == {0,
1} and actions drop; drops other packets whose TTL has
expired and that should not receive an ICMP error reply
(i.e. fragments with nonzero offset).
· Next table. A priority-0 flow matches all packets that
aren’t already handled and uses action next; to feed
them to the next table.
Ingress Table 4: DEFRAG
This table sends packets to the connection tracker for tracking and
defragmentation. It contains a priority-0 flow that simply moves
traffic to the next table.
If load balancing rules with virtual IP addresses (and ports) are
configured in the OVN_Northbound database for a Gateway router, a
priority-100 flow is added for each configured virtual IP address VIP.
For IPv4 VIPs the flow matches ip && ip4.dst == VIP. For IPv6 VIPs,
the flow matches ip && ip6.dst == VIP. The flow uses the action
ct_next; to send IP packets to the connection tracker for packet
de-fragmentation and tracking before sending them to the next table.
If ECMP routes with symmetric reply are configured in the OVN_North‐
bound database for a gateway router, a priority-300 flow is added for
each router port on which symmetric replies are configured. The match‐
ing logic for these ports essentially reverses the configured logic of
the ECMP route. So for instance, a route with a destination routing
policy will instead match if the source IP address matches the static
route’s prefix. The flow uses the action ct_next to send IP packets to
the connection tracker for packet de-fragmentation and tracking before
sending it to the next table.
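For example, a load balancer VIP configured on a gateway router with
commands such as (names and addresses hypothetical):
    ovn-nbctl lb-add lb0 10.0.0.10:80 10.1.0.2:80,10.1.0.3:80
    ovn-nbctl lr-lb-add gw-router lb0
would, per the description above, result in a DEFRAG flow roughly of
the form:
    priority=100, match=(ip && ip4.dst == 10.0.0.10), action=(ct_next;)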
Ingress Table 5: UNSNAT
This is for already established connections’ reverse traffic, i.e.,
SNAT has already been done in the egress pipeline and now the packet
has entered the ingress pipeline as part of a reply. It is unSNATted
here.
Ingress Table 5: UNSNAT on Gateway and Distributed Routers
· If the Router (Gateway or Distributed) is configured with
load balancers, then the below logical flows are added:
For each IPv4 address A defined as a load balancer VIP
with protocol P (and protocol port T if defined) that is
also present as an external_ip in the NAT table, a
priority-120 logical flow is added with the match ip4 &&
ip4.dst == A && P with the action next; to advance the
packet to the next table. If the load balancer has
protocol port T defined, then the match also has P.dst == T.
The above flows are also added for IPv6 load balancers.
Ingress Table 5: UNSNAT on Gateway Routers
· If the Gateway router has been configured to force SNAT
any previously DNATted packets to B, a priority-110 flow
matches ip && ip4.dst == B or ip && ip6.dst == B with an
action ct_snat;.
If the Gateway router is configured with
lb_force_snat_ip=router_ip, then for every logical router
port P attached to the Gateway router with the router IP
B, a priority-110 flow is added with the match inport ==
P && ip4.dst == B or inport == P && ip6.dst == B with an
action ct_snat;.
If the Gateway router has been configured to force SNAT
any previously load-balanced packets to B, a priority-100
flow matches ip && ip4.dst == B or ip && ip6.dst == B
with an action ct_snat;.
For each NAT configuration in the OVN Northbound database
that asks to change the source IP address of a packet
from A to B, a priority-90 flow matches ip &&
ip4.dst == B or ip && ip6.dst == B with an action
ct_snat;. If the NAT rule is of type dnat_and_snat and
has stateless=true in the options, then the action would
be ip4/6.dst= (B).
A priority-0 logical flow with match 1 has actions next;.
Ingress Table 5: UNSNAT on Distributed Routers
· For each configuration in the OVN Northbound database
that asks to change the source IP address of a packet
from A to B, a priority-100 flow matches ip && ip4.dst ==
B && inport == GW or ip && ip6.dst == B && inport == GW,
where GW is the logical router gateway port, with an
action ct_snat;. If the NAT rule is of type dnat_and_snat
and has stateless=true in the options, then the action
would be ip4/6.dst= (B).
If the NAT rule cannot be handled in a distributed man‐
ner, then the priority-100 flow above is only programmed
on the gateway chassis.
A priority-0 logical flow with match 1 has actions next;.
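As a sketch, an SNAT rule created with (router name, gateway port
name, and addresses hypothetical):
    ovn-nbctl lr-nat-add lr0 snat 172.16.0.10 10.0.0.0/24
would, on a distributed router with gateway port lrp-gw, produce an
UNSNAT flow along the lines of:
    priority=100, match=(ip && ip4.dst == 172.16.0.10 && inport == "lrp-gw"),
    action=(ct_snat;)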
Ingress Table 6: DNAT
Packets enter the pipeline with a destination IP address that needs to
be DNATted from a virtual IP address to a real IP address. Packets in
the reverse direction need to be unDNATted.
Ingress Table 6: Load balancing DNAT rules
The following load balancing DNAT flows are added for a Gateway router
or a Router with a gateway port. These flows are programmed only on
the gateway chassis. These flows do not get programmed for load
balancers with IPv6 VIPs.
· If controller_event has been enabled for all the
configured load balancing rules for a Gateway router or
Router with gateway port in the OVN_Northbound database
that do not have configured backends, a priority-130 flow
is added to trigger ovn-controller events whenever the
chassis receives a packet for that particular VIP. If the
event-elb meter has been previously created, it will be
associated with the empty_lb logical flow.
· For all the configured load balancing rules for a Gateway
router or Router with gateway port in OVN_Northbound
database that includes a L4 port PORT of protocol P and
IPv4 or IPv6 address VIP, a priority-120 flow that
matches on ct.new && ip && ip4.dst == VIP && P && P.dst
== PORT
(ip6.dst == VIP in the IPv6 case) with an action of
ct_lb(args), where args contains comma separated IPv4 or
IPv6 addresses (and optional port numbers) to load bal‐
ance to. If the router is configured to force SNAT any
load-balanced packets, the above action will be replaced
by flags.force_snat_for_lb = 1; ct_lb(args);. If the load
balancing rule is configured with skip_snat set to true,
the above action will be replaced by
flags.skip_snat_for_lb = 1; ct_lb(args);. If health check
is enabled, then args will only contain those endpoints
whose service monitor status entry in OVN_Southbound db
is either online or empty.
· For all the configured load balancing rules for a router
in OVN_Northbound database that includes a L4 port PORT
of protocol P and IPv4 or IPv6 address VIP, a prior‐
ity-120 flow that matches on ct.est && ip && ip4.dst ==
VIP && P && P.dst == PORT
(ip6.dst == VIP in the IPv6 case) with an action of
ct_dnat;. If the router is configured to force SNAT any
load-balanced packets, the above action will be replaced
by flags.force_snat_for_lb = 1; ct_dnat;. If the load
balancing rule is configured with skip_snat set to true,
the above action will be replaced by
flags.skip_snat_for_lb = 1; ct_dnat;.
· For all the configured load balancing rules for a router
in OVN_Northbound database that includes just an IP
address VIP to match on, a priority-110 flow that matches
on ct.new && ip && ip4.dst == VIP (ip6.dst == VIP in the
IPv6 case) with an action of ct_lb(args), where args con‐
tains comma separated IPv4 or IPv6 addresses. If the
router is configured to force SNAT any load-balanced
packets, the above action will be replaced by
flags.force_snat_for_lb = 1; ct_lb(args);. If the load
balancing rule is configured with skip_snat set to true,
the above action will be replaced by
flags.skip_snat_for_lb = 1; ct_lb(args);.
· For all the configured load balancing rules for a router
in OVN_Northbound database that includes just an IP
address VIP to match on, a priority-110 flow that matches
on ct.est && ip && ip4.dst == VIP (or ip6.dst == VIP)
with an action of ct_dnat;. If the router is configured
to force SNAT any load-balanced packets, the above action
will be replaced by flags.force_snat_for_lb = 1;
ct_dnat;. If the load balancing rule is configured with
skip_snat set to true, the above action will be replaced
by flags.skip_snat_for_lb = 1; ct_dnat;.
· If the load balancer is created with the --reject option
and it has no active backends, a TCP reset segment (for
tcp) or an ICMP port unreachable packet (for all other
kinds of traffic) will be sent whenever an incoming
packet is received for this load balancer. Note that
using the --reject option will disable the empty_lb SB
controller event for this load balancer.
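Continuing the hypothetical load balancer lb0 above (VIP 10.0.0.10:80
with backends 10.1.0.2:80 and 10.1.0.3:80), the resulting DNAT flows
would look roughly like:
    priority=120, match=(ct.new && ip && ip4.dst == 10.0.0.10 && tcp && tcp.dst == 80),
    action=(ct_lb(10.1.0.2:80,10.1.0.3:80);)
    priority=120, match=(ct.est && ip && ip4.dst == 10.0.0.10 && tcp && tcp.dst == 80),
    action=(ct_dnat;)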
Ingress Table 6: DNAT on Gateway Routers
· For each configuration in the OVN Northbound database
that asks to change the destination IP address of a
packet from A to B, a priority-100 flow matches ip &&
ip4.dst == A or ip && ip6.dst == A with an action
flags.loopback = 1; ct_dnat(B);. If the Gateway router is
configured to force SNAT any DNATed packet, the above
action will be replaced by flags.force_snat_for_dnat = 1;
flags.loopback = 1; ct_dnat(B);. If the NAT rule is of
type dnat_and_snat and has stateless=true in the options,
then the action would be ip4/6.dst= (B).
If the NAT rule has allowed_ext_ips configured, then
there is an additional match ip4.src == allowed_ext_ips.
Similarly, for IPv6, the match would be ip6.src ==
allowed_ext_ips.
If the NAT rule has exempted_ext_ips set, then there is
an additional flow configured at priority 101. The flow
matches if the source IP is an exempted_ext_ip and the
action is next;. This flow is used to bypass the ct_dnat
action for a packet originating from exempted_ext_ips.
· For all IP packets of a Gateway router, a priority-50
flow with an action flags.loopback = 1; ct_dnat;.
· A priority-0 logical flow with match 1 has actions next;.
Ingress Table 6: DNAT on Distributed Routers
On distributed routers, the DNAT table only handles packets with a
destination IP address that needs to be DNATted from a virtual IP
address to a real IP address. The unDNAT processing in the reverse
direction is handled in a separate table in the egress pipeline.
· For each configuration in the OVN Northbound database
that asks to change the destination IP address of a
packet from A to B, a priority-100 flow matches ip &&
ip4.dst == B && inport == GW, where GW is the logical
router gateway port, with an action ct_dnat(B);. The
match will include ip6.dst == B in the IPv6 case. If the
NAT rule is of type dnat_and_snat and has stateless=true
in the options, then the action would be ip4/6.dst=(B).
If the NAT rule cannot be handled in a distributed man‐
ner, then the priority-100 flow above is only programmed
on the gateway chassis.
If the NAT rule has allowed_ext_ips configured, then
there is an additional match ip4.src == allowed_ext_ips.
Similarly, for IPv6, the match would be ip6.src ==
allowed_ext_ips.
If the NAT rule has exempted_ext_ips set, then there is
an additional flow configured at priority 101. The flow
matches if the source IP is an exempted_ext_ip and the
action is next;. This flow is used to bypass the ct_dnat
action for a packet originating from exempted_ext_ips.
A priority-0 logical flow with match 1 has actions next;.
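As a sketch, a distributed dnat_and_snat rule added with (names and
addresses hypothetical):
    ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.0.11 10.0.0.5
would produce a DNAT flow of roughly the form:
    priority=100, match=(ip && ip4.dst == 172.16.0.11 && inport == "lrp-gw"),
    action=(ct_dnat(10.0.0.5);)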
Ingress Table 7: ECMP symmetric reply processing
· If ECMP routes with symmetric reply are configured in the
OVN_Northbound database for a gateway router, a prior‐
ity-100 flow is added for each router port on which sym‐
metric replies are configured. The matching logic for
these ports essentially reverses the configured logic of
the ECMP route. So for instance, a route with a destina‐
tion routing policy will instead match if the source IP
address matches the static route’s prefix. The flow uses
the action ct_commit { ct_label.ecmp_reply_eth =
eth.src; ct_label.ecmp_reply_port = K; }; next; to
commit the connection and store eth.src and the ECMP
reply port binding tunnel key K in the ct_label.
Ingress Table 8: IPv6 ND RA option processing
· A priority-50 logical flow is added for each logical
router port configured with IPv6 ND RA options which
matches IPv6 ND Router Solicitation packets, applies
the action put_nd_ra_opts, and advances the packet to
the next table.
reg0[5] = put_nd_ra_opts(options); next;
For a valid IPv6 ND RS packet, this transforms the packet
into an IPv6 ND RA reply and sets the RA options to the
packet and stores 1 into reg0[5]. For other kinds of
packets, it just stores 0 into reg0[5]. Either way, it
continues to the next table.
· A priority-0 logical flow with match 1 has actions next;.
Ingress Table 9: IPv6 ND RA responder
This table implements IPv6 ND RA responder for the IPv6 ND RA replies
generated by the previous table.
· A priority-50 logical flow is added for each logical
router port configured with IPv6 ND RA options which
matches IPv6 ND RA packets and reg0[5] == 1 and responds
back to the inport after applying these actions. If
reg0[5] is set to 1, it means that the action
put_nd_ra_opts was successful.
eth.dst = eth.src;
eth.src = E;
ip6.dst = ip6.src;
ip6.src = I;
outport = P;
flags.loopback = 1;
output;
where E is the MAC address and I is the IPv6 link local
address of the logical router port.
(This terminates packet processing in ingress pipeline;
the packet does not go to the next ingress table.)
· A priority-0 logical flow with match 1 has actions next;.
Ingress Table 10: IP Routing
A packet that arrives at this table is an IP packet that should be
routed to the address in ip4.dst or ip6.dst. This table implements IP
routing, setting reg0 (or xxreg0 for IPv6) to the next-hop IP address
(leaving ip4.dst or ip6.dst, the packet’s final destination, unchanged)
and advances to the next table for ARP resolution. It also sets reg1
(or xxreg1) to the IP address owned by the selected router port
(ingress table ARP Request will generate an ARP request, if needed,
with reg0 as the target protocol address and reg1 as the source proto‐
col address).
For ECMP routes, i.e. multiple static routes with the same policy and
prefix but different nexthops, the above actions are deferred to the
next table. This table, instead, is responsible for determining the
ECMP group id and selecting a member id within the group based on
5-tuple hashing. It stores the group id in reg8[0..15] and the member
id in reg8[16..31]. This step
is skipped if the traffic going out the ECMP route is reply traffic,
and the ECMP route was configured to use symmetric replies. Instead,
the stored ct_label value is used to choose the destination. The least
significant 48 bits of the ct_label tell the destination MAC address to
which the packet should be sent. The next 16 bits tell the logical
router port on which the packet should be sent. These values in the
ct_label are set when the initial ingress traffic is received over the
ECMP route.
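As an illustrative reading of this layout (values hypothetical), a
ct_label whose low 64 bits are 0x0004aabbccddeeff encodes the
Ethernet destination aa:bb:cc:dd:ee:ff in its low 48 bits
(ct_label.ecmp_reply_eth) and the tunnel key 4 in the next 16 bits
(ct_label.ecmp_reply_port), so the reply is sent to that MAC out of
the logical router port whose tunnel key is 4.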
This table contains the following logical flows:
· Priority-550 flow that drops IPv6 Router Solicita‐
tion/Advertisement packets that were not processed in
previous tables.
· Priority-500 flows that match IP multicast traffic des‐
tined to groups registered on any of the attached
switches and sets outport to the associated multicast
group that will eventually flood the traffic to all
interested attached logical switches. The flows also
decrement TTL.
· Priority-450 flow that matches unregistered IP multicast
traffic and sets outport to the MC_STATIC multicast
group, which ovn-northd populates with the logical ports
that have options:mcast_flood=’true’. If no router ports
are configured to flood multicast traffic, the packets
are dropped.
· IPv4 routing table. For each route to IPv4 network N with
netmask M, on router port P with IP address A and Ether‐
net address E, a logical flow with match ip4.dst == N/M,
whose priority is the number of 1-bits in M, has the fol‐
lowing actions:
ip.ttl--;
reg8[0..15] = 0;
reg0 = G;
reg1 = A;
eth.src = E;
outport = P;
flags.loopback = 1;
next;
(Ingress table 1 already verified that ip.ttl--; will not
yield a TTL exceeded error.)
If the route has a gateway, G is the gateway IP address.
Instead, if the route is from a configured static route,
G is the next hop IP address. Else it is ip4.dst.
· IPv6 routing table. For each route to IPv6 network N with
netmask M, on router port P with IP address A and Ether‐
net address E, a logical flow with match in CIDR notation
ip6.dst == N/M, whose priority is the integer value of M,
has the following actions:
ip.ttl--;
reg8[0..15] = 0;
xxreg0 = G;
xxreg1 = A;
eth.src = E;
outport = P;
flags.loopback = 1;
next;
(Ingress table 1 already verified that ip.ttl--; will not
yield a TTL exceeded error.)
If the route has a gateway, G is the gateway IP address.
Instead, if the route is from a configured static route,
G is the next hop IP address. Else it is ip6.dst.
If the address A is in the link-local scope, the route
will be limited to sending on the ingress port.
· ECMP routes are grouped by policy and prefix. A
unique non-zero id is assigned to each group, and
each member is also assigned a unique non-zero id
within its group.
For each IPv4/IPv6 ECMP group with group id GID and mem‐
ber ids MID1, MID2, ..., a logical flow with match in
CIDR notation ip4.dst == N/M, or ip6.dst == N/M, whose
priority is the integer value of M, has the following
actions:
ip.ttl--;
flags.loopback = 1;
reg8[0..15] = GID;
select(reg8[16..31], MID1, MID2, ...);
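For example, two equal-cost routes created with (names and addresses
hypothetical; the --ecmp option permits the duplicate prefix):
    ovn-nbctl lr-route-add lr0 192.168.0.0/24 10.0.0.1
    ovn-nbctl --ecmp lr-route-add lr0 192.168.0.0/24 10.0.0.2
form one ECMP group, yielding a flow sketch of:
    priority=24, match=(ip4.dst == 192.168.0.0/24),
    action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1;
            select(reg8[16..31], 1, 2);)
with the per-member nexthop actions applied in the IP_ROUTING_ECMP
table that follows.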
Ingress Table 11: IP_ROUTING_ECMP
This table implements the second part of IP routing for ECMP routes
following the previous table. If a packet matched an ECMP group in the
previous table, this table matches the group id and member id stored
from the previous table, setting reg0 (or xxreg0 for IPv6) to the next-
hop IP address (leaving ip4.dst or ip6.dst, the packet’s final destina‐
tion, unchanged) and advances to the next table for ARP resolution. It
also sets reg1 (or xxreg1) to the IP address owned by the selected
router port (ingress table ARP Request will generate an ARP request, if
needed, with reg0 as the target protocol address and reg1 as the source
protocol address).
This processing is skipped for reply traffic being sent out of an ECMP
route if the route was configured to use symmetric replies.
This table contains the following logical flows:
· A priority-150 flow that matches reg8[0..15] == 0 with
action next; directly bypasses packets of non-ECMP
routes.
· For each member with ID MID in each ECMP group with ID
GID, a priority-100 flow with match reg8[0..15] == GID &&
reg8[16..31] == MID has the following actions:
[xx]reg0 = G;
[xx]reg1 = A;
eth.src = E;
outport = P;
Ingress Table 12: Router policies
This table adds flows for the logical router policies configured on the
logical router. Please see the OVN_Northbound database Logi‐
cal_Router_Policy table documentation in ovn-nb for supported actions.
· For each router policy configured on the logical router,
a logical flow is added with specified priority, match
and actions.
· If the policy action is reroute with 2 or more nexthops
defined, then the logical flow is added with the follow‐
ing actions:
reg8[0..15] = GID;
reg8[16..31] = select(1,..n);
where GID is the ECMP group id generated by ovn-northd
for this policy and n is the number of nexthops. select
action selects one of the nexthop member id, stores it in
the register reg8[16..31] and advances the packet to the
next stage.
· If the policy action is reroute with just one nexthop,
then the logical flow is added with the following
actions:
[xx]reg0 = H;
eth.src = E;
outport = P;
reg8[0..15] = 0;
flags.loopback = 1;
next;
where H is the nexthop defined in the router policy, E
is the ethernet address of the logical router port from
which the nexthop is reachable and P is the logical
router port from which the nexthop is reachable.
· If a router policy has the option pkt_mark=m set and if
the action is not drop, then the action also includes
pkt.mark = m to mark the packet with the marker m.
Ingress Table 13: ECMP handling for router policies
This table handles the ECMP for the router policies configured with
multiple nexthops.
· A priority-150 flow is added to advance the packet to the
next stage if the ECMP group id register reg8[0..15] is
0.
· For each ECMP reroute router policy with multiple
nexthops, a priority-100 flow is added for each nexthop H
with the match reg8[0..15] == GID && reg8[16..31] == M,
where GID is the router policy group id generated by
ovn-northd and M is the member id of the nexthop H
generated by ovn-northd. The following actions are added
to the flow:
[xx]reg0 = H;
eth.src = E;
outport = P;
flags.loopback = 1;
next;
where H is the nexthop defined in the router policy, E
is the ethernet address of the logical router port from
which the nexthop is reachable and P is the logical
router port from which the nexthop is reachable.
Ingress Table 14: ARP/ND Resolution
Any packet that reaches this table is an IP packet whose next-hop IPv4
address is in reg0 or IPv6 address is in xxreg0. (ip4.dst or ip6.dst
contains the final destination.) This table resolves the IP address in
reg0 (or xxreg0) into an output port in outport and an Ethernet address
in eth.dst, using the following flows:
· A priority-500 flow that matches IP multicast traffic
that was allowed in the routing pipeline. For this kind
of traffic the outport was already set so the flow just
advances to the next table.
· Static MAC bindings. MAC bindings can be known statically
based on data in the OVN_Northbound database. For router
ports connected to logical switches, MAC bindings can be
known statically from the addresses column in the Logi‐
cal_Switch_Port table. For router ports connected to
other logical routers, MAC bindings can be known
statically from the mac and networks columns in the
Logical_Router_Port table. (Note: the flow is NOT installed
for the IP addresses that belong to a neighbor logical
router port if the current router has the
options:dynamic_neigh_routers set to true)
For each IPv4 address A whose host is known to have
Ethernet address E on router port P, a priority-100 flow
with match outport == P && reg0 == A has actions eth.dst
= E; next;.
For each virtual IP A configured on a logical port of
type virtual with its virtual parent set in its
corresponding Port_Binding record, where the virtual
parent has Ethernet address E and the virtual IP is
reachable via the router port P, a priority-100 flow with
match outport == P && reg0 == A has actions eth.dst = E;
next;.
For each virtual IP A configured on a logical port of
type virtual with its virtual parent not set in its
corresponding Port_Binding record, where the virtual IP A
is reachable via the router port P, a priority-100 flow
with match outport == P && reg0 == A has actions eth.dst =
00:00:00:00:00:00; next;. This flow is added so that the
ARP is always resolved for the virtual IP A by generating
an ARP request and not consulting the MAC_Binding table,
as it can have an incorrect value for the virtual IP A.
For each IPv6 address A whose host is known to have
Ethernet address E on router port P, a priority-100 flow
with match outport == P && xxreg0 == A has actions
eth.dst = E; next;.
For each logical router port with an IPv4 address A and a
MAC address of E that is reachable via a different
logical router port P, a priority-100 flow with match
outport == P && reg0 == A has actions eth.dst = E; next;.
For each logical router port with an IPv6 address A and a
MAC address of E that is reachable via a different
logical router port P, a priority-100 flow with match
outport == P && xxreg0 == A has actions eth.dst = E; next;.
· Static MAC bindings from NAT entries. MAC bindings can
also be known for the entries in the NAT table. The below
flows are programmed for distributed logical routers,
i.e. those with a distributed router port.
For each row in the NAT table with IPv4 address A in the
external_ip column of the NAT table, a priority-100 flow
with the match outport == P && reg0 == A has actions
eth.dst = E; next;, where P is the distributed logical
router port and E is the Ethernet address if set in the
external_mac column of the NAT table for a rule of type
dnat_and_snat, otherwise the Ethernet address of the
distributed logical router port.
For IPv6 NAT entries, the same flows are added, but using
the register xxreg0 for the match.
· Traffic with an IP destination address owned by the
router should be dropped. Such traffic is normally
dropped in ingress table IP Input except for IPs that are
also shared with SNAT rules. However, if there was no
unSNAT operation that happened successfully until this
point in the pipeline and the destination IP of the
packet is still a router owned IP, the packets can be
safely dropped.
A priority-1 logical flow with match ip4.dst == {..}
matches on traffic destined to router-owned IPv4
addresses which are also SNAT IPs. This flow has action
drop;.
A priority-1 logical flow with match ip6.dst == {..}
matches on traffic destined to router-owned IPv6
addresses which are also SNAT IPs. This flow has action
drop;.
· Dynamic MAC bindings. These flows resolve MAC-to-IP bind‐
ings that have become known dynamically through ARP or
neighbor discovery. (The ingress table ARP Request will
issue an ARP or neighbor solicitation request for cases
where the binding is not yet known.)
A priority-0 logical flow with match ip4 has actions
get_arp(outport, reg0); next;.
A priority-0 logical flow with match ip6 has actions
get_nd(outport, xxreg0); next;.
· For a distributed gateway LRP with redirect-type set to
bridged, a priority-50 flow that matches outport ==
"ROUTER_PORT" && !is_chassis_resident("cr-ROUTER_PORT")
has actions eth.dst = E; next;, where E is the Ethernet
address of the logical router port.
Ingress Table 15: Check packet length
For distributed logical routers with a distributed gateway port
configured with options:gateway_mtu set to a valid integer value, this
table adds a priority-50 logical flow with the match ip4 && outport ==
GW_PORT, where GW_PORT is the distributed gateway router port, and
applies the action check_pkt_larger, advancing the packet to the next
table.
REGBIT_PKT_LARGER = check_pkt_larger(L); next;
where L is the packet length to check for. If the packet is larger than
L, it stores 1 in the register bit REGBIT_PKT_LARGER. The value of L is
taken from options:gateway_mtu column of Logical_Router_Port row.
This table adds one priority-0 fallback flow that matches all packets
and advances to the next table.
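For instance, enabling the option on a hypothetical gateway port:
    ovn-nbctl set Logical_Router_Port lrp-gw options:gateway_mtu=1500
would produce a flow sketch of:
    priority=50, match=(ip4 && outport == "lrp-gw"),
    action=(REGBIT_PKT_LARGER = check_pkt_larger(L); next;)
where L is derived from the configured gateway_mtu (the exact header
allowance added is release-dependent).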
Ingress Table 16: Handle larger packets
For distributed logical routers with a distributed gateway port
configured with options:gateway_mtu set to a valid integer value, this
table adds the following priority-50 logical flow for each logical
router port with the match inport == LRP && outport == GW_PORT &&
REGBIT_PKT_LARGER, where LRP is the logical router port and GW_PORT is
the distributed gateway router port, and applies the following action
for IPv4 and IPv6 respectively:
icmp4 {
icmp4.type = 3; /* Destination Unreachable. */
icmp4.code = 4; /* Frag Needed and DF was Set. */
icmp4.frag_mtu = M;
eth.dst = E;
ip4.dst = ip4.src;
ip4.src = I;
ip.ttl = 255;
REGBIT_EGRESS_LOOPBACK = 1;
next(pipeline=ingress, table=0);
};
icmp6 {
icmp6.type = 2;
icmp6.code = 0;
icmp6.frag_mtu = M;
eth.dst = E;
ip6.dst = ip6.src;
ip6.src = I;
ip.ttl = 255;
REGBIT_EGRESS_LOOPBACK = 1;
next(pipeline=ingress, table=0);
};
· Where M is the fragment MTU - 58, whose value is taken
from the options:gateway_mtu column of the
Logical_Router_Port row.
· E is the Ethernet address of the logical router port.
· I is the IPv4/IPv6 address of the logical router port.
This table adds one priority-0 fallback flow that matches all packets
and advances to the next table.
Ingress Table 17: Gateway Redirect
For distributed logical routers where one of the logical router ports
specifies a gateway chassis, this table redirects certain packets to
the distributed gateway port instance on the gateway chassis. This ta‐
ble has the following flows:
· For each NAT rule in the OVN Northbound database that can
be handled in a distributed manner, a priority-100
logical flow with match ip4.src == B && outport == GW &&
is_chassis_resident(P), where GW is the logical router
distributed gateway port and P is the NAT logical port.
IP traffic matching the above rule will be managed
locally, setting reg1 to C and eth.src to D, where C is
the NAT external IP and D is the NAT external MAC.
· For each NAT rule in the OVN Northbound database that can
be handled in a distributed manner, a priority-80 logical
flow with drop action if the NAT logical port is a vir‐
tual port not claimed by any chassis yet.
· A priority-50 logical flow with match outport == GW has
actions outport = CR; next;, where GW is the logical
router distributed gateway port and CR is the chas‐
sisredirect port representing the instance of the logical
router distributed gateway port on the gateway chassis.
· A priority-0 logical flow with match 1 has actions next;.
Ingress Table 18: ARP Request
In the common case where the Ethernet destination has been resolved,
this table outputs the packet. Otherwise, it composes and sends an ARP
or IPv6 Neighbor Solicitation request. It holds the following flows:
· Unknown MAC address. A priority-100 flow for IPv4 packets
with match eth.dst == 00:00:00:00:00:00 has the following
actions:
arp {
eth.dst = ff:ff:ff:ff:ff:ff;
arp.spa = reg1;
arp.tpa = reg0;
arp.op = 1; /* ARP request. */
output;
};
Unknown MAC address. For each IPv6 static route
associated with the router with nexthop IP G, a
priority-200 flow for IPv6 packets with match eth.dst ==
00:00:00:00:00:00 && xxreg0 == G with the following
actions is added:
nd_ns {
eth.dst = E;
ip6.dst = I;
nd.target = G;
output;
};
where E is the multicast MAC derived from the gateway IP,
and I is the solicited-node multicast address
corresponding to the target address G.
Unknown MAC address. A priority-100 flow for IPv6 packets
with match eth.dst == 00:00:00:00:00:00 has the following
actions:
nd_ns {
nd.target = xxreg0;
output;
};
(Ingress table IP Routing initialized reg1 with the IP
address owned by outport and (xx)reg0 with the next-hop
IP address.)
The IP packet that triggers the ARP/IPv6 NS request is
dropped.
· Known MAC address. A priority-0 flow with match 1 has
actions output;.
Egress Table 0: UNDNAT
This is for already established connections’ reverse traffic, i.e.,
DNAT has already been done in the ingress pipeline and now the packet
has entered the egress pipeline as part of a reply. For NAT on a
distributed router, it is unDNATted here. For Gateway routers, the
unDNAT processing is carried out in the ingress DNAT table.
· For all the configured load balancing rules for a router
with a gateway port in the OVN_Northbound database that
include an IPv4 address VIP, for every backend IPv4
address B defined for the VIP, a priority-120 flow is
programmed on the gateway chassis that matches ip &&
ip4.src == B && outport == GW, where GW is the logical
router gateway port, with an action ct_dnat;. If the
backend IPv4 address B is also configured with L4 port
PORT of protocol P, then the match also includes P.src ==
PORT. These flows are not added for load balancers with
IPv6 VIPs.
If the router is configured to force SNAT any load-bal‐
anced packets, above action will be replaced by
flags.force_snat_for_lb = 1; ct_dnat;.
· For each configuration in the OVN Northbound database
that asks to change the destination IP address of a
packet from an IP address of A to B, a priority-100 flow
matches ip && ip4.src == B && outport == GW, where GW is
the logical router gateway port, with an action ct_dnat;.
If the NAT rule is of type dnat_and_snat and has state‐
less=true in the options, then the action would be
ip4/6.src= (B).
If the NAT rule cannot be handled in a distributed man‐
ner, then the priority-100 flow above is only programmed
on the gateway chassis.
If the NAT rule can be handled in a distributed manner,
then there is an additional action eth.src = EA;, where
EA is the ethernet address associated with the IP address
A in the NAT rule. This allows upstream MAC learning to
point to the correct chassis.
· A priority-0 logical flow with match 1 has actions next;.
Egress Table 1: SNAT
Packets that are configured to be SNATed get their source IP address
changed based on the configuration in the OVN Northbound database.
· A priority-120 flow to advance the IPv6 Neighbor
Solicitation packet to the next table to skip SNAT. In
the case where ovn-controller injects an IPv6 Neighbor
Solicitation packet (for the nd_ns action), we don’t want
the packet to go through conntrack.
Egress Table 1: SNAT on Gateway Routers
· If the Gateway router in the OVN Northbound database has
been configured to force SNAT a packet (that has been
previously DNATted) to B, a priority-100 flow matches
flags.force_snat_for_dnat == 1 && ip with an action
ct_snat(B);.
· If a load balancer configured to skip snat has been
applied to the Gateway router pipeline, a priority-120
flow matches flags.skip_snat_for_lb == 1 && ip with an
action next;.
· If the Gateway router in the OVN Northbound database has
been configured to force SNAT a packet (that has been
previously load-balanced) using the router IP (i.e.
options:lb_force_snat_ip=router_ip), then for each
logical router port P attached to the Gateway router, a
priority-110 flow matches flags.force_snat_for_lb == 1 &&
outport == P
with an action ct_snat(R); where R is the IP configured
on the router port. If R is an IPv4 address then the
match will also include ip4 and if it is an IPv6 address,
then the match will also include ip6.
If the logical router port P is configured with multiple
IPv4 and multiple IPv6 addresses, only the first IPv4 and
first IPv6 address is considered.
· If the Gateway router in the OVN Northbound database has
been configured to force SNAT a packet (that has been
previously load-balanced) to B, a priority-100 flow
matches flags.force_snat_for_lb == 1 && ip with an action
ct_snat(B);.
· For each configuration in the OVN Northbound database
that asks to change the source IP address of a packet
from an IP address of A, or to change the source IP
address of a packet that belongs to network A, to B, a
flow matches ip && ip4.src == A with an action
ct_snat(B);. The priority of the flow is calculated based
on the mask of A, with matches having larger masks
getting higher priorities. If the NAT rule is of type
dnat_and_snat and has stateless=true in the options, then
the action would be ip4/6.src= (B).
· If the NAT rule has allowed_ext_ips configured, then
there is an additional match ip4.dst == allowed_ext_ips.
Similarly, for IPv6, the match would be ip6.dst ==
allowed_ext_ips.
· If the NAT rule has exempted_ext_ips set, then there is
an additional flow configured at priority + 1 of the
corresponding NAT rule. The flow matches if the
destination IP is an exempted_ext_ip and the action is
next;. This flow is used to bypass the ct_snat action for
a packet which is destined to exempted_ext_ips.
· A priority-0 logical flow with match 1 has actions next;.
Egress Table 1: SNAT on Distributed Routers
· For each configuration in the OVN Northbound database
that asks to change the source IP address of a packet
from an IP address of A, or to change the source IP
address of a packet that belongs to network A, to B, a
flow matches ip && ip4.src == A && outport == GW, where
GW is the logical router gateway port, with an action
ct_snat(B);. The priority of the flow is calculated based
on the mask of A, with matches having larger masks
getting higher priorities. If the NAT rule is of type
dnat_and_snat and has stateless=true in the options, then
the action would be ip4/6.src= (B).
If the NAT rule cannot be handled in a distributed
manner, then the flow above is only programmed on the
gateway chassis, with the flow priority increased by 128
so that it is matched first.
If the NAT rule can be handled in a distributed manner,
then there is an additional action eth.src = EA;, where
EA is the ethernet address associated with the IP address
A in the NAT rule. This allows upstream MAC learning to
point to the correct chassis.
If the NAT rule has allowed_ext_ips configured, then
there is an additional match ip4.dst == allowed_ext_ips.
Similarly, for IPv6, the match would be ip6.dst ==
allowed_ext_ips.
If the NAT rule has exempted_ext_ips set, then there is
an additional flow configured at priority + 1 of the
corresponding NAT rule. The flow matches if the
destination IP is an exempted_ext_ip and the action is
next;. This flow is used to bypass the ct_snat action for
a flow which is destined to exempted_ext_ips.
· A priority-0 logical flow with match 1 has actions next;.
Egress Table 2: Egress Loopback
This table applies to distributed logical routers where one of the
logical router ports specifies a gateway chassis.
While UNDNAT and SNAT processing have already occurred by this point,
this traffic needs to be forced through egress loopback on this dis‐
tributed gateway port instance, in order for UNSNAT and DNAT processing
to be applied, and also for IP routing and ARP resolution after all of
the NAT processing, so that the packet can be forwarded to the destina‐
tion.
This table has the following flows:
· For each NAT rule in the OVN Northbound database on a
distributed router, a priority-100 logical flow with
match ip4.dst == E && outport == GW &&
is_chassis_resident(P), where E is the external IP
address specified in the NAT rule and GW is the logical
router distributed gateway port. For a dnat_and_snat NAT
rule, P is the logical port specified in the NAT rule. If
the logical_port column of the NAT table is not set, then
P is the chassisredirect port of GW. The flow has the
following actions:
clone {
ct_clear;
inport = outport;
outport = "";
flags = 0;
flags.loopback = 1;
reg0 = 0;
reg1 = 0;
...
reg9 = 0;
REGBIT_EGRESS_LOOPBACK = 1;
next(pipeline=ingress, table=0);
};
flags.loopback is set since in_port is unchanged and the
packet may return to that port after NAT processing.
REGBIT_EGRESS_LOOPBACK is set to indicate that egress
loopback has occurred, in order to skip the source IP
address check against the router address.
· A priority-0 logical flow with match 1 has actions next;.
Egress Table 3: Delivery
Packets that reach this table are ready for delivery. It contains:
· Priority-110 logical flows that match IP multicast pack‐
ets on each enabled logical router port and modify the
Ethernet source address of the packets to the Ethernet
address of the port and then execute action output;.
· Priority-100 logical flows that match packets on each
enabled logical router port, with action output;.
OVN 21.06.1 ovn-northd ovn-northd(8)