Commit 94d9f78

vladimiroltean authored and kuba-moo committed
docs: networking: timestamping: add section for stacked PHC devices
The concept of timestamping DSA switches / Ethernet PHYs is becoming more and
more popular; however, the Linux kernel timestamping code has evolved quite
organically and there are layers upon layers of new and old code that need to
work together for things to behave as expected. Add this chapter to explain
what the overall goals are.

Loosely based upon this email discussion plus some more info:
https://lkml.org/lkml/2020/7/6/481

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
1 parent e63a228 commit 94d9f78

1 file changed (+165, -0 lines changed)

Documentation/networking/timestamping.rst

Lines changed: 165 additions & 0 deletions
@@ -589,3 +589,168 @@ Time stamps for outgoing packets are to be generated as follows:
this would occur at a later time in the processing pipeline than other
software time stamping and therefore could lead to unexpected deltas
between time stamps.

3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------

There are situations in which more than one PHC (PTP Hardware Clock) is present
in the data path of a packet. The kernel has no explicit mechanism to allow the
user to select which PHC to use for timestamping Ethernet frames. Instead, the
assumption is that the outermost PHC is always the most preferable, and that
kernel drivers collaborate towards achieving that goal. Currently, there are
three cases of stacked PHCs, detailed below:

3.2.1 DSA (Distributed Switch Architecture) switches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are Ethernet switches that have one of their ports connected to an
(otherwise completely unaware) host Ethernet interface, and perform the role of
a port multiplier with optional forwarding acceleration features. Each DSA
switch port is visible to the user as a standalone (virtual) network interface,
and its network I/O is performed, under the hood, indirectly through the host
interface (redirecting to the host port on TX, and intercepting frames on RX).

When a DSA switch is attached to a host port, PTP synchronization suffers,
since the switch's variable queuing delay introduces a path delay jitter
between the host port and its PTP partner. For this reason, some DSA switches
include a timestamping clock of their own, and have the ability to perform
network timestamping on their own MAC, such that path delays only measure wire
and PHY propagation latencies. Timestamping DSA switches are supported in Linux
and expose the same ABI as any other network interface (although DSA interfaces
are virtual in terms of network I/O, they do have their own PHC). It is
typical, but not mandatory, for all interfaces of a DSA switch to share the
same PHC.

By design, PTP timestamping with a DSA switch does not need any special
handling in the driver for the host port it is attached to. However, when the
host port also supports PTP timestamping, DSA will take care of intercepting
the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
allow the delivery of multiple hardware timestamps for the same packet, so
everything in the data path except the DSA switch port must be prevented from
delivering one.
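The interception can be pictured with the following userspace model. It is a
sketch under stated assumptions, not the DSA code: every ``model_*`` name is
hypothetical, the command values are placeholders rather than the real
``SIOCSHWTSTAMP``/``SIOCGHWTSTAMP`` numbers, and reporting the refusal as
``-EBUSY`` is an assumption of this sketch. Following the text, only attempts
to *enable* timestamping are refused; everything else is passed through.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder commands for this model only; the real ioctl numbers
 * live in <linux/sockios.h>. */
enum model_cmd { MODEL_GET_HWTSTAMP, MODEL_SET_HWTSTAMP, MODEL_OTHER_CMD };

/* Hypothetical view of a DSA host (CPU-facing) port. */
struct model_host_port {
	bool tree_has_ptp_switch;           /* can any switch port timestamp? */
	int (*host_ioctl)(enum model_cmd);  /* host driver's own .ndo_do_ioctl */
};

/* Stand-in for the unmodified host driver: accepts every request. */
static int model_host_driver_ioctl(enum model_cmd cmd)
{
	(void)cmd;
	return 0;
}

/* What DSA conceptually places in front of the host driver's handler. */
static int model_dsa_host_ioctl(const struct model_host_port *hp,
				enum model_cmd cmd)
{
	/* Only the outer (switch) PHC may deliver hardware timestamps, so
	 * enabling them on the host port is refused (-EBUSY is assumed). */
	if (cmd == MODEL_SET_HWTSTAMP && hp->tree_has_ptp_switch)
		return -EBUSY;

	/* Any other request reaches the host driver untouched. */
	return hp->host_ioctl ? hp->host_ioctl(cmd) : -EOPNOTSUPP;
}
```

Note that the host driver itself needs no changes: the refusal happens before
its handler is ever reached.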

In code, DSA already provides most of the timestamping infrastructure in
generic code: it uses a BPF classifier (``ptp_classify_raw``) to identify PTP
event messages (any other packets, including PTP general messages, are not
timestamped), and it provides two hooks to drivers:

- ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb
  to be transmitted, before actually transmitting it. Typically, a switch will
  have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp
  becomes available. There may be an IRQ that is raised upon this timestamp's
  availability, or the driver might have to poll after invoking
  ``dev_queue_xmit()`` towards the host interface. Either way, in the
  ``.port_txtstamp()`` method, the driver only needs to save the clone for
  later use (when the timestamp becomes available). Each skb is annotated with
  a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
  job of keeping track of which clone belongs to which skb.

- ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
  to the driver, so that it can annotate it with a timestamp if one is
  immediately available, or defer that until later. On reception, timestamps
  might either be available in-band (through metadata in the DSA header, or
  attached in other ways to the packet), or out-of-band (through another RX
  timestamping FIFO). Deferral on RX is typically necessary when retrieving
  the timestamp needs a sleepable context. In that case, it is the
  responsibility of the DSA driver to call ``netif_rx_ni()`` on the freshly
  timestamped skb.
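The division of labour between the generic code and the two hooks can be
sketched in userspace as follows. This is an illustrative model only: all
``model_*`` names are hypothetical, the real hooks have different signatures,
and the single-slot TX register (which drops a second timestamp request while
one is pending) is an assumption chosen for brevity. The event/general split
uses the IEEE 1588 ``messageType`` nibble; how ``ptp_classify_raw`` actually
implements it is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IEEE 1588 messageType is the low nibble of the first PTP header byte.
 * 0x0-0x3 (Sync, Delay_Req, Pdelay_Req, Pdelay_Resp) are event messages;
 * general messages such as Follow_Up (0x8) are not timestamped. */
static bool model_ptp_is_event(const uint8_t *ptp_hdr, size_t len)
{
	return len >= 1 && (ptp_hdr[0] & 0x0f) <= 0x3;
}

/* Minimal stand-in for struct sk_buff. */
struct model_skb {
	uint64_t hwtstamp;        /* 0 means "no timestamp yet" */
	struct model_skb *clone;  /* plays the role of DSA_SKB_CB(skb)->clone */
};

/* Hypothetical switch with one TX timestamp register and one RX slot. */
struct model_switch {
	struct model_skb *tx_pending;  /* clone waiting for its TX timestamp */
	struct model_skb *rx_pending;  /* skb waiting for an out-of-band one */
};

/* Like .port_txtstamp(): only save the clone; nothing is read yet. */
static bool model_port_txtstamp(struct model_switch *sw,
				struct model_skb *clone)
{
	if (sw->tx_pending)
		return false;  /* register busy: this frame gets no timestamp */
	sw->tx_pending = clone;
	return true;
}

/* IRQ or poll path: the hardware has latched ts for the pending frame.
 * A kernel driver would then hand the clone to skb_complete_tx_timestamp(). */
static struct model_skb *model_tx_tstamp_ready(struct model_switch *sw,
					       uint64_t ts)
{
	struct model_skb *clone = sw->tx_pending;

	if (clone) {
		clone->hwtstamp = ts;
		sw->tx_pending = NULL;
	}
	return clone;
}

/* Like .port_rxtstamp(): returns true if the skb was deferred, in which
 * case the driver owns it and must later push it up the stack itself. */
static bool model_port_rxtstamp(struct model_switch *sw, struct model_skb *skb,
				bool inband_valid, uint64_t inband_ts)
{
	if (inband_valid) {
		skb->hwtstamp = inband_ts;  /* e.g. parsed from the DSA header */
		return false;               /* caller delivers it right away */
	}
	sw->rx_pending = skb;               /* wait for the RX timestamp FIFO */
	return true;
}
```

In this model, a ``true`` return from the RX hook transfers ownership of the
skb to the driver, which mirrors the responsibility to call ``netif_rx_ni()``
described above.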

3.2.2 Ethernet PHYs
^^^^^^^^^^^^^^^^^^^

These are devices that typically fulfill a Layer 1 role in the network stack,
hence they do not have a representation in terms of a network interface as DSA
switches do. However, PHYs may be able to detect and timestamp PTP packets, for
performance reasons: timestamps taken as close as possible to the wire have the
potential to yield a more stable and precise synchronization.

A PHY driver that supports PTP timestamping must create a
``struct mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The
presence of this pointer will be checked by the networking stack.
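A minimal sketch of that registration follows. It does not reproduce the real
``struct mii_timestamper``: the ``model_*`` types are hypothetical, trimmed down
to the three hooks this section mentions, and the presence check only mirrors
the idea that the stack tests the pointer (and the hook) before using it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_skb;  /* opaque in this sketch */

/* Hypothetical, trimmed-down mirror of struct mii_timestamper. */
struct model_mii_timestamper {
	bool (*rxtstamp)(struct model_mii_timestamper *ts,
			 struct model_skb *skb, int type);
	void (*txtstamp)(struct model_mii_timestamper *ts,
			 struct model_skb *skb, int type);
	int (*hwtstamp)(struct model_mii_timestamper *ts, void *ifr);
};

struct model_phy_device {
	struct model_mii_timestamper *mii_ts;  /* NULL: no PHY timestamping */
};

/* The stack only looks at the pointer (and the hook) before using it. */
static bool model_phy_has_hwtstamp(const struct model_phy_device *phydev)
{
	return phydev && phydev->mii_ts && phydev->mii_ts->hwtstamp;
}

/* Private state of a hypothetical timestamping PHY driver. */
struct model_ts_phy_priv {
	struct model_mii_timestamper mii_ts;  /* what phydev->mii_ts points to */
	bool hwtstamp_enabled;
};

static int model_ts_phy_hwtstamp(struct model_mii_timestamper *ts, void *ifr)
{
	/* container_of()-style recovery of the driver's private struct */
	struct model_ts_phy_priv *priv = (struct model_ts_phy_priv *)
		((char *)ts - offsetof(struct model_ts_phy_priv, mii_ts));

	(void)ifr;  /* a real driver would parse the hwtstamp configuration */
	priv->hwtstamp_enabled = true;
	return 0;
}

/* Probe: fill in the hooks, then publish the pointer in phydev->mii_ts. */
static void model_ts_phy_probe(struct model_phy_device *phydev,
			       struct model_ts_phy_priv *priv)
{
	priv->mii_ts.rxtstamp = NULL;  /* RX/TX hooks omitted in this sketch */
	priv->mii_ts.txtstamp = NULL;
	priv->mii_ts.hwtstamp = model_ts_phy_hwtstamp;
	phydev->mii_ts = &priv->mii_ts;
}
```

A PHY without timestamping support simply leaves the pointer ``NULL``, which
is all the stack needs to tell the two cases apart.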

Since PHYs do not have network interface representations, the timestamping and
ethtool ioctl operations for them need to be mediated by their respective MAC
driver. Therefore, as opposed to DSA switches, modifications need to be done
to each individual MAC driver for PHY timestamping support. This entails:

- Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
  is true or not. If it is, then the MAC driver should not process this request
  but instead pass it on to the PHY using ``phy_mii_ioctl()``.

- On RX, special intervention may or may not be needed, depending on the
  function used to deliver skbs up the network stack. In the case of plain
  ``netif_rx()`` and similar, MAC drivers must first call
  ``skb_defer_rx_timestamp(skb)`` to find out whether deferral is necessary;
  if it is, they must not call ``netif_rx()`` at all. If
  ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is enabled, and
  ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook will be called
  at this point to determine, using logic very similar to DSA, whether
  deferral for RX timestamping is necessary. Again like DSA, it becomes the
  responsibility of the PHY driver to send the packet up the stack when the
  timestamp is available.

  For other skb receive functions, such as ``napi_gro_receive`` and
  ``netif_receive_skb``, the stack automatically checks whether
  ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
  the driver.

- On TX, again, special intervention might or might not be needed. The
  function that calls the ``mii_ts->txtstamp()`` hook is named
  ``skb_clone_tx_timestamp()``. This function can be called directly (in which
  case explicit MAC driver support is indeed needed), but it is also called
  from ``skb_tx_timestamp()``, which many MAC drivers already invoke for
  software timestamping purposes. Therefore, if a MAC supports software
  timestamping, it does not need to do anything further at this stage.
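The three MAC driver obligations above can be summarised in a small userspace
model. It is a sketch, not driver code: the ``model_*`` names are hypothetical,
and plain booleans and counters stand in for what ``phy_has_hwtstamp()``,
``skb_defer_rx_timestamp()``, ``phy_mii_ioctl()`` and
``skb_clone_tx_timestamp()`` would do in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MAC driver state; the booleans stand in for kernel calls. */
struct model_mac {
	bool phy_has_hwtstamp;    /* phy_has_hwtstamp(netdev->phydev) */
	bool phy_defers_rx;       /* skb_defer_rx_timestamp(skb) result */
	int ioctls_to_phy;        /* requests forwarded via phy_mii_ioctl() */
	int ioctls_to_mac;        /* requests handled by the MAC's own PHC */
	int rx_delivered;         /* frames given to netif_rx() */
	int tx_clone_hook_calls;  /* runs of skb_clone_tx_timestamp() */
};

/* .ndo_do_ioctl for the hardware timestamping requests. */
static void model_mac_hwtstamp_ioctl(struct model_mac *mac)
{
	if (mac->phy_has_hwtstamp)
		mac->ioctls_to_phy++;  /* i.e. return phy_mii_ioctl(...) */
	else
		mac->ioctls_to_mac++;  /* program the MAC's own timestamper */
}

/* RX path of a driver that delivers frames with plain netif_rx(). */
static void model_mac_rx(struct model_mac *mac)
{
	if (mac->phy_defers_rx)
		return;  /* the PHY driver now owns the skb */
	mac->rx_delivered++;
}

/* Model of skb_tx_timestamp(): the PHY TX hook rides along with it. */
static void model_skb_tx_timestamp(struct model_mac *mac)
{
	mac->tx_clone_hook_calls++;  /* skb_clone_tx_timestamp(skb) */
	/* ...followed by the software TX timestamp, if one was requested */
}

/* .ndo_start_xmit of a MAC that already does software timestamping. */
static void model_mac_xmit(struct model_mac *mac)
{
	model_skb_tx_timestamp(mac);  /* just before handing the frame to HW */
}
```

Of the three paths, only the ioctl forwarding and the ``netif_rx()`` check
require new code in such a driver; the TX path is covered by the existing
``skb_tx_timestamp()`` call.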

3.2.3 MII bus snooping devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These perform the same role as timestamping Ethernet PHYs, save for the fact
that they are discrete devices and can therefore be used in conjunction with
any PHY, even if it doesn't support timestamping. In Linux, they are
discoverable and attachable to a ``struct phy_device`` through Device Tree, and
otherwise they use the same ``mii_ts`` infrastructure as timestamping PHYs. See
Documentation/devicetree/bindings/ptp/timestamper.txt for more details.

3.2.4 Other caveats for MAC drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A caveat of stacked PHCs, especially DSA (but not only), is that they uncover
bugs which were impossible to trigger before stacked PTP clocks existed. Since
DSA does not require any modification to MAC drivers, it is more difficult to
ensure the correctness of all possible code paths. One example has to do with
this line of code, already presented earlier::

  skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
driver or an MII bus snooping device driver, should set this flag. But a MAC
driver that is unaware of PHC stacking might get tripped up by somebody other
than itself setting this flag, and deliver a duplicate timestamp.

For example, a typical driver design for TX timestamping might be to split the
transmission part into two portions:

1. "TX": checks whether PTP timestamping has been previously enabled through
   the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
   SKBTX_HW_TSTAMP``"). If both are true, it sets the
   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
   described above, in the case of a stacked PHC system, this condition should
   never trigger, as this MAC is certainly not the outermost PHC. But this is
   not where the typical issue is. Transmission proceeds with this packet.

2. "TX confirmation": Transmission has finished. The driver checks whether it
   is necessary to collect any TX timestamp for it. Here is where the typical
   issues are: the MAC driver takes a shortcut and only checks whether
   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
   PHC system, this is incorrect because this MAC driver is not the only entity
   in the TX data path that could have set SKBTX_IN_PROGRESS in the first
   place.

The correct solution for this problem is for MAC drivers to have a compound
check in their "TX confirmation" portion, not only for
"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
that PTP timestamping is not enabled for anything other than the outermost PHC,
this enhanced check will avoid delivering a duplicate TX timestamp to user
space.
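The difference between the shortcut and the compound check can be reproduced
with a small userspace model of the two-portion design described above. It is
a sketch under stated assumptions: the ``model_*`` names are hypothetical, the
``MODEL_SKBTX_*`` values are placeholders rather than the kernel's definitions,
and "delivering" a timestamp is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this model only. */
#define MODEL_SKBTX_HW_TSTAMP   (1u << 0)
#define MODEL_SKBTX_IN_PROGRESS (1u << 2)

struct model_skb {
	unsigned int tx_flags;  /* stands in for skb_shinfo(skb)->tx_flags */
};

struct model_mac_priv {
	bool hwtstamp_tx_enabled;  /* set through .ndo_do_ioctl */
	int tstamps_delivered;     /* TX timestamps reported to user space */
};

/* Portion 1, "TX": claim the skb only if this MAC is timestamping. */
static void model_mac_tx(const struct model_mac_priv *priv,
			 struct model_skb *skb)
{
	if (priv->hwtstamp_tx_enabled &&
	    (skb->tx_flags & MODEL_SKBTX_HW_TSTAMP))
		skb->tx_flags |= MODEL_SKBTX_IN_PROGRESS;
}

/* Portion 2, buggy: trusts SKBTX_IN_PROGRESS, which an outer PHC
 * (DSA, PHY, MII snooper) may have set instead of this MAC. */
static void model_mac_tx_confirm_buggy(struct model_mac_priv *priv,
				       const struct model_skb *skb)
{
	if (skb->tx_flags & MODEL_SKBTX_IN_PROGRESS)
		priv->tstamps_delivered++;
}

/* Portion 2, fixed: compound check on the flag and the MAC's own state. */
static void model_mac_tx_confirm_fixed(struct model_mac_priv *priv,
				       const struct model_skb *skb)
{
	if (priv->hwtstamp_tx_enabled &&
	    (skb->tx_flags & MODEL_SKBTX_IN_PROGRESS))
		priv->tstamps_delivered++;
}
```

For a frame already claimed by a DSA port (both flags set, MAC timestamping
disabled), the buggy confirmation path reports a second timestamp while the
fixed one stays silent; a standalone MAC with timestamping enabled still
reports exactly one.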
