@@ -589,3 +589,168 @@ Time stamps for outgoing packets are to be generated as follows:
this would occur at a later time in the processing pipeline than other
software time stamping and therefore could lead to unexpected deltas
between time stamps.
+
+ 3.2 Special considerations for stacked PTP Hardware Clocks
+ ----------------------------------------------------------
+
+ There are situations when there may be more than one PHC (PTP Hardware Clock)
+ in the data path of a packet. The kernel has no explicit mechanism to allow the
+ user to select which PHC to use for timestamping Ethernet frames. Instead, the
+ assumption is that the outermost PHC is always the preferred one, and that
+ kernel drivers collaborate towards achieving that goal. Currently there are
+ three cases of stacked PHCs, detailed below:
+
+ 3.2.1 DSA (Distributed Switch Architecture) switches
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ These are Ethernet switches which have one of their ports connected to an
+ (otherwise completely unaware) host Ethernet interface, and perform the role of
+ a port multiplier with optional forwarding acceleration features. Each DSA
+ switch port is visible to the user as a standalone (virtual) network interface,
+ and its network I/O is performed, under the hood, indirectly through the host
+ interface (redirecting to the host port on TX, and intercepting frames on RX).
+
+ When a DSA switch is attached to a host port, PTP synchronization suffers,
+ since the switch's variable queuing delay introduces path delay jitter
+ between the host port and its PTP partner. For this reason, some DSA
+ switches include a timestamping clock of their own, and have the ability to
+ perform network timestamping on their own MAC, such that path delays only
+ measure wire and PHY propagation latencies. Timestamping DSA switches are
+ supported in Linux and expose the same ABI as any other network interface
+ (although the DSA interfaces are virtual in terms of network I/O, they do
+ have their own PHC). It is typical, but not mandatory, for all interfaces of
+ a DSA switch to share the same PHC.
+
+ By design, PTP timestamping with a DSA switch does not need any special
+ handling in the driver for the host port it is attached to. However, when the
+ host port also supports PTP timestamping, DSA will take care of intercepting
+ the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
+ hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
+ allow the delivery of multiple hardware timestamps for the same packet, so
+ any entity other than the DSA switch port must be prevented from generating
+ one.
+
+ DSA already provides most of the timestamping infrastructure in generic
+ code: a BPF classifier (``ptp_classify_raw``) is used to identify PTP event
+ messages (any other packets, including PTP general messages, are not
+ timestamped), and two hooks are provided to drivers:
+
+ - ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb
+   to be transmitted, before actually transmitting it. Typically, a switch will
+   have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp
+   becomes available. There may be an IRQ that is raised upon this timestamp's
+   availability, or the driver might have to poll after invoking
+   ``dev_queue_xmit()`` towards the host interface. Either way, in the
+   ``.port_txtstamp()`` method, the driver only needs to save the clone for
+   later use (when the timestamp becomes available). Each skb is annotated with
+   a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
+   job of keeping track of which clone belongs to which skb.
+
+ - ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
+   to the driver, for it to annotate it with a timestamp if that is immediately
+   available, or to defer delivery until later. On reception, timestamps might
+   either be available in-band (through metadata in the DSA header, or attached
+   in other ways to the packet), or out-of-band (through another RX
+   timestamping FIFO). Deferral on RX is typically necessary when retrieving
+   the timestamp needs a sleepable context. In that case, it is the
+   responsibility of the DSA driver to call ``netif_rx_ni()`` on the freshly
+   timestamped skb.
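
As a rough illustration of the "save the clone now, complete it later" flow described for ``.port_txtstamp()``, here is a minimal userspace model in C. It is not kernel code: every `model_*` name, the single pending slot, and the boolean "driver kept the clone" return convention are assumptions made for this sketch. A real driver operates on `struct sk_buff` and reads the timestamp from its hardware register or FIFO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb clone awaiting a TX timestamp. */
struct model_skb {
	bool has_hwtstamp;
	uint64_t hwtstamp_ns;
};

/* Hypothetical per-port state: one pending clone, mirroring a switch
 * with a single TX timestamp register rather than a FIFO. */
struct model_port {
	struct model_skb *pending_clone;
};

/* Models the TX hook: only remember the clone for later.
 * Returns true if the clone was kept, false if the slot was busy. */
static bool model_port_txtstamp(struct model_port *port,
				struct model_skb *clone)
{
	assert(port != NULL && clone != NULL);

	if (port->pending_clone != NULL)
		return false;

	port->pending_clone = clone;
	return true;
}

/* Models the IRQ or poll path that runs once the hardware timestamp is
 * available: annotate the saved clone and release the slot. Returns the
 * completed clone, or NULL if nothing was pending. */
static struct model_skb *model_tx_tstamp_ready(struct model_port *port,
					       uint64_t ns)
{
	struct model_skb *clone;

	assert(port != NULL);

	clone = port->pending_clone;
	if (clone == NULL)
		return NULL;

	clone->hwtstamp_ns = ns;
	clone->has_hwtstamp = true;
	port->pending_clone = NULL;
	return clone;
}
```

The point of the split is that the hook itself does no waiting: all it records is which clone the next timestamp belongs to, and the completion path does the rest.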
+
+ 3.2.2 Ethernet PHYs
+ ^^^^^^^^^^^^^^^^^^^
+
+ These are devices that typically fulfill a Layer 1 role in the network stack,
+ hence they do not have a representation in terms of a network interface as DSA
+ switches do. However, PHYs may be able to detect and timestamp PTP packets, for
+ performance reasons: timestamps taken as close as possible to the wire have the
+ potential to yield a more stable and precise synchronization.
+
+ A PHY driver that supports PTP timestamping must create a ``struct
+ mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
+ of this pointer will be checked by the networking stack.
+
+ Since PHYs do not have network interface representations, the timestamping and
+ ethtool ioctl operations for them need to be mediated by their respective MAC
+ driver. Therefore, as opposed to DSA switches, modifications need to be done
+ to each individual MAC driver for PHY timestamping support. This entails:
+
+ - Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
+   is true or not. If it is, then the MAC driver should not process this request
+   but instead pass it on to the PHY using ``phy_mii_ioctl()``.
+
+ - On RX, special intervention may or may not be needed, depending on the
+   function used to deliver skb's up the network stack. In the case of plain
+   ``netif_rx()`` and similar, MAC drivers must check whether
+   ``skb_defer_rx_timestamp(skb)`` is necessary or not, and if it is, must not
+   call ``netif_rx()`` at all. If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
+   enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
+   will be called now, to determine, using logic very similar to DSA, whether
+   deferral for RX timestamping is necessary. Again like DSA, it becomes the
+   responsibility of the PHY driver to send the packet up the stack when the
+   timestamp is available.
+
+   For other skb receive functions, such as ``napi_gro_receive`` and
+   ``netif_receive_skb``, the stack automatically checks whether
+   ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
+   the driver.
+
+ - On TX, again, special intervention might or might not be needed. The
+   function that calls the ``mii_ts->txtstamp()`` hook is named
+   ``skb_clone_tx_timestamp()``. This function can either be called directly
+   (in which case explicit MAC driver support is indeed needed), or be reached
+   indirectly through the ``skb_tx_timestamp()`` call, which many MAC drivers
+   already perform for software timestamping purposes. Therefore, if a MAC
+   supports software timestamping, it does not need to do anything further at
+   this stage.
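
The ioctl and RX items in the list above can be sketched as a small userspace model in C. It is only an illustration under stated assumptions: the `model_*` types and functions are hypothetical stand-ins (for `phy_has_hwtstamp()`, `phy_mii_ioctl()` and `skb_defer_rx_timestamp()` respectively), and the RX decision is reduced to "PTP event frame and timestamping PHY present", whereas in the kernel the PHY's ``.rxtstamp()`` hook makes the final call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PHY: counts the timestamping requests forwarded to it. */
struct model_phy {
	bool has_hwtstamp;
	int hwtstamp_requests;
};

/* Hypothetical MAC: counts the requests it handled itself. */
struct model_mac {
	struct model_phy *phydev;	/* may be NULL */
	int hwtstamp_requests;
};

enum model_rx_result {
	MODEL_RX_DELIVERED,	/* MAC driver passed the frame up itself */
	MODEL_RX_DEFERRED,	/* PHY driver will deliver it once timestamped */
};

/* Stand-in for the "does the attached PHY timestamp?" check. */
static bool model_phy_has_hwtstamp(const struct model_phy *phy)
{
	return phy != NULL && phy->has_hwtstamp;
}

/* Timestamping ioctl: hand the request to the PHY when it timestamps,
 * otherwise let the MAC process it. */
static void model_mac_hwtstamp_ioctl(struct model_mac *mac)
{
	assert(mac != NULL);

	if (model_phy_has_hwtstamp(mac->phydev)) {
		mac->phydev->hwtstamp_requests++;
		return;
	}
	mac->hwtstamp_requests++;
}

/* RX path for a driver using the plain receive function: skip delivery
 * when the frame has been deferred for PHY timestamping. */
static enum model_rx_result model_mac_rx(const struct model_mac *mac,
					 bool is_ptp_event)
{
	assert(mac != NULL);

	if (is_ptp_event && model_phy_has_hwtstamp(mac->phydev))
		return MODEL_RX_DEFERRED;

	return MODEL_RX_DELIVERED;
}
```

In both paths the MAC driver steps aside whenever the PHY is the timestamping entity, which is what keeps a single PHC in charge of a given packet.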
+
+ 3.2.3 MII bus snooping devices
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ These perform the same role as timestamping Ethernet PHYs, save for the fact
+ that they are discrete devices and can therefore be used in conjunction with
+ any PHY even if it doesn't support timestamping. In Linux, they are
+ discoverable and attachable to a ``struct phy_device`` through Device Tree, and
+ otherwise use the same mii_ts infrastructure as timestamping PHYs do. See
+ Documentation/devicetree/bindings/ptp/timestamper.txt for more details.
+
+ 3.2.4 Other caveats for MAC drivers
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ The problem with stacked PHCs, especially (but not only) with DSA, is that
+ they uncover bugs which were impossible to trigger before the existence of
+ stacked PTP clocks. DSA is particularly exposed: since it doesn't require any
+ modification to MAC drivers, it is more difficult to ensure correctness of
+ all possible code paths. One example has to do with this line of code,
+ already presented earlier::
+
+     skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+
+ Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
+ driver or a MII bus snooping device driver, should set this flag.
+ But a MAC driver that is unaware of PHC stacking might get tripped up by
+ somebody other than itself setting this flag, and deliver a duplicate
+ timestamp.
+ For example, a typical driver design for TX timestamping might be to split the
+ transmission part into 2 portions:
+
+ 1. "TX": checks whether PTP timestamping has been previously enabled through
+    the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
+    current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
+    SKBTX_HW_TSTAMP``"). If this is true, it sets the
+    "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
+    described above, in the case of a stacked PHC system, this condition should
+    never trigger, as this MAC is certainly not the outermost PHC. But this is
+    not where the typical issue is. Transmission proceeds with this packet.
+
+ 2. "TX confirmation": Transmission has finished. The driver checks whether it
+    is necessary to collect any TX timestamp for it. Here is where the typical
+    issues are: the MAC driver takes a shortcut and only checks whether
+    "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
+    PHC system, this is incorrect because this MAC driver is not the only entity
+    in the TX data path that could have set SKBTX_IN_PROGRESS in the first
+    place.
+
+ The correct solution for this problem is for MAC drivers to have a compound
+ check in their "TX confirmation" portion, not only for
+ "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
+ "``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
+ that PTP timestamping is not enabled for anything other than the outermost PHC,
+ this enhanced check will avoid delivering a duplicated TX timestamp to user
+ space.
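
To make the difference between the shortcut and the compound check concrete, here is a minimal userspace model in C of the two-portion TX design described above. It is a sketch, not driver code: the `model_*` structures and functions are hypothetical, and the two flag constants are local stand-ins that borrow the kernel's names so the conditions read like the ones quoted in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel flags of the same name; the numeric
 * values are chosen for this model only. */
#define SKBTX_HW_TSTAMP		(1u << 0)
#define SKBTX_IN_PROGRESS	(1u << 2)

struct model_skb {
	uint8_t tx_flags;
};

struct model_priv {
	bool hwtstamp_tx_enabled;	/* set via the timestamping ioctl */
};

/* Portion 1, "TX": mark the skb only if this MAC was asked to timestamp. */
static void model_mac_xmit(const struct model_priv *priv,
			   struct model_skb *skb)
{
	assert(priv != NULL && skb != NULL);

	if (priv->hwtstamp_tx_enabled && (skb->tx_flags & SKBTX_HW_TSTAMP))
		skb->tx_flags |= SKBTX_IN_PROGRESS;
}

/* Portion 2, shortcut variant: trusts the flag alone, so it also fires
 * when another entity in the TX path set SKBTX_IN_PROGRESS. */
static bool model_tx_confirm_shortcut(const struct model_skb *skb)
{
	return (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}

/* Portion 2, compound variant: also requires that this MAC itself has
 * TX timestamping enabled. */
static bool model_tx_confirm_compound(const struct model_priv *priv,
				      const struct model_skb *skb)
{
	return priv->hwtstamp_tx_enabled &&
	       (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

With a MAC whose `hwtstamp_tx_enabled` is false and an skb on which an outer PHC (for instance a DSA switch port) has already set `SKBTX_IN_PROGRESS`, the shortcut variant reports that a timestamp should be collected while the compound variant does not, which is the duplicate-timestamp case the text warns about.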