A PARALLEL PIPELINED PACKET SWITCH ARCHITECTURE FOR MESH-CONNECTED MULTIPROCESSORS WITH INDEPENDENTLY ROUTED FLITS

(Received: 2019-04-27, Revised: 2019-06-30, Accepted: 2019-07-22)
In this paper, a packet switch architecture for mesh-connected multiprocessors is proposed. It is based on a set of input FIFO buffers and an output register matrix controlled by a novel distributed timing-based scheduling scheme. Simple static routing is assumed, with each packet split into a set of independently routed w-bit-wide flits. The device achieves at least 78% throughput for uniformly distributed traffic, with an asymptotic upper bound of 100%. In contrast to state-of-the-art VOQ-based switch architectures, the proposed switch is shown to reach its maximum throughput with no internal speedup and has an order of magnitude lower hardware complexity. Compared to existing buffered crossbar non-VOQ switches with typical flit scheduling mechanisms, the proposed device demonstrates slightly higher throughput and substantially shorter delays in some practically important cases.
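As a rough illustration of what "independently routed w-bit-wide flits" could mean in practice, the sketch below splits a packet into w-bit flits that each carry the destination address, so that any flit can be routed without reference to the others. It is only a sketch under stated assumptions: the `Flit` layout, the sequence-number field, and the dimension-order (XY) next-hop rule are hypothetical choices made for illustration, and the abstract specifies only that routing is simple and static.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a node in the mesh


@dataclass(frozen=True)
class Flit:
    """Hypothetical flit layout: every flit carries its own destination."""
    dest: Coord    # destination node, repeated in each flit
    seq: int       # position of this slice within the packet
    payload: int   # w-bit slice of the packet data


def split_packet(data: int, n_bits: int, w: int, dest: Coord) -> List[Flit]:
    """Split an n_bits-wide packet into ceil(n_bits / w) flits of w bits each."""
    mask = (1 << w) - 1
    count = (n_bits + w - 1) // w
    return [Flit(dest, i, (data >> (i * w)) & mask) for i in range(count)]


def reassemble(flits: Iterable[Flit], w: int) -> int:
    """Rebuild the packet; flits may arrive in any order."""
    data = 0
    for f in flits:
        data |= f.payload << (f.seq * w)
    return data


def xy_next_hop(cur: Coord, dest: Coord) -> Coord:
    """Assumed static rule: correct the x coordinate first, then y."""
    (cx, cy), (dx, dy) = cur, dest
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return cur  # already at the destination
```

Because each flit is self-describing in this sketch, the receiver can reassemble the packet regardless of arrival order, for example `reassemble(reversed(split_packet(0xABCD, 16, 4, (2, 1))), 4)` gives back `0xABCD`.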
  1. S. Misra and S. Goswami, Network Routing: Fundamentals, Applications and Emerging Technologies, Wiley Telecom, 2014.
  2. A. A. Jerraya and W. Wolf, Multiprocessor Systems-on-Chips, San Francisco, Elsevier, Inc., 2005.
  3. Z. Yu, R. Xiao et al., "A 16-Core Processor with Shared-Memory and Message-Passing Communications," IEEE Trans. Circ. Syst. I: Regular Papers, vol. 61, no. 4, pp. 1081-1094, 2014.
  4. Tilera Corp., "Tile Processor Architecture Overview for The TILE-Gx Series," [Online], Available: http://www.mellanox.com/repository/solutions/tile-scm/docs/UG130-ArchOverview-TILE-Gx.pdf, (access date: 30.06.2019).
  5. A. Olofsson, "Epiphany-V: A 1024 Processor 64-bit RISC System-on-Chip," [Online], Available: https://www.parallella.org/docs/e5_1024core_soc.pdf, (access date: 30.06.2019).
  6. P. Lotfi-Kamran, M. Modarressi and H. Sarbazi-Azad, "An Efficient Hybrid-Switched Network-on-Chip for Chip Multiprocessors," IEEE Trans. Comput., vol. 65, no. 5, pp. 1656-1662, 2016.
  7. G. Chen, M. A. Anders et al., "A 340 mV-to-0.9 V 20.2 Tb/s Source-synchronous Hybrid Packet/Circuit-switched 16×16 Network-On-Chip in 22 nm Tri-Gate CMOS," IEEE J. Solid-St. Circ., vol. 50, no. 1, pp. 59-67, 2015.
  8. A. Mazloumi and M. Modarressi, "A Hybrid Packet/circuit-switched Router to Accelerate Memory Access in NoC-based Chip Multiprocessor," Proc. of Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 908-911, 2015.
  9. M. H. Foroozannejad, M. Hashemi et al., "Time-scalable Mapping for Circuit-switched GALS Chip Multiprocessor Platforms," IEEE Trans. Comput.-aided Design of Integr. Circ. and Syst., vol. 33, no. 5, pp. 752-762, 2014.
  10. M. Karol, M. Hluchyj and S. Morgan, "Input Versus Output Queueing on a Space-Division Packet Switch," IEEE Trans. Commun., vol. 35, no. 12, pp. 1347-1356, 1987.
  11. L. Deng, W. S. Wong et al., "Delay-constrained Input-queued Switch," IEEE J. Selected Areas Commun., vol. 36, no. 11, pp. 2464-2474, 2018.
  12. K. Kang, K.-J. Park, L. Sha and Q. Wang, "Design of a Crossbar VOQ Real-time Switch with Clock-driven Scheduling for a Guaranteed Delay Bound," Real-time Systems, vol. 49, no. 1, pp. 117-135, 2013.
"A Parallel Pipelined Packet Switch Architecture for Mesh-connected Multiprocessors with Independently Routed Flits", J. Al-Azzeh, M. Agmal and I. Zotov.
  13. M. J. Neely, E. Modiano and Y.-S. Cheng, "Logarithmic Delay for NxN Packet Switches under the Crossbar Constraint," IEEE/ACM Trans. Networking, vol. 15, no. 3, pp. 657-668, 2007.
  14. S. Durkovic and Z. Cica, "Birkhoff-von Neumann Switch Based on Greedy Scheduling," IEEE Comput. Archit. Letters, vol. 17, no. 1, pp. 13-16, 2018.
  15. C.-S. Chang, D.-S. Lee and C.-Y. Yue, "Providing Guaranteed Rate Services in the Load Balanced Birkhoff-von Neumann Switches," IEEE/ACM Trans. Networking, vol. 14, no. 3, pp. 644-656, 2006.
  16. H.-I. Lee and S.-W. Seo, "Matching Output Queueing with a Multiple Input/Output-queued Switch," IEEE/ACM Trans. Networking, vol. 14, no. 1, pp. 121-132, 2006.
  17. Y. Tamir and G. Frazier, "High Performance Multi-queue Buffers for VLSI Communication Switches," Proc. of the 15th Annu. Symp. Comput. Archit., pp. 343-354, June 1988.
  18. T. Anderson, S. S. Owicki, J. B. Saxe and C. P. Thacker, "High-speed Switch Scheduling for Local-area Networks," ACM Trans. Comput. Syst., vol. 11, no. 4, pp. 319-352, 1993.
  19. N. McKeown, "The iSLIP Scheduling Algorithm for Input-queued Switches," IEEE/ACM Trans. Networking, vol. 7, no. 2, pp. 188-201, April 1999.
  20. Y. Shen, S. S. Panwar and H. J. Chao, "SQUID: A Practical 100% Throughput Scheduler for Crosspoint Buffered Switches," IEEE/ACM Trans. Networking, vol. 18, no. 4, pp. 1119-1131, August 2010.
  21. N. McKeown, V. Anantharam and J. Walrand, "Achieving 100% Throughput in an Input-queued Switch," Proc. of the 15th IEEE INFOCOM, pp. 296-302, San Francisco, CA, USA, Mar. 1996.
  22. J. Chao, "Saturn: A Terabit Packet Switch Using Dual Round Robin," IEEE Commun. Mag., vol. 38, no. 12, pp. 78-84, Dec. 2000.
  23. S. Mneimneh, "Matching from the First Iteration: An Iterative Switching Algorithm for an Input-queued Switch," IEEE/ACM Trans. Networking, vol. 16, no. 1, pp. 206-217, Feb. 2008.
  24. B. Hu, K. L. Yeung, Q. Zhou and C. He, "On Iterative Scheduling for Input-queued Switches with a Speedup of 2-1/N," IEEE/ACM Trans. Networking, vol. 24, no. 6, pp. 3565-3577, 2016.
  25. B. Hu and K. L. Yeung, "Feedback-based Scheduling for Load Balanced Two-Stage Switches," IEEE/ACM Trans. Networking, vol. 18, no. 4, pp. 1077-1090, Aug. 2010.
  26. C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load Balanced Birkhoff-von Neumann Switches, Part I: One-stage Buffering," Comput. Commun., vol. 25, no. 6, pp. 611-622, 2002.
  27. Y. Chen, "Cell Switched Network-on-Chip Candidate for Billion-transistor System-on-Chips," Proc. of IEEE Int’l. Soc. Conf., pp. 57-60, 2006.
  28. A. Olofsson, "Mesh Network," US Patent, no. 8531943 B2, Sep. 10, 2013.
  29. M. Lin and N. McKeown, "The Throughput of a Buffered Crossbar Switch," IEEE Commun. Lett., vol. 9, no. 5, pp. 465-467, 2005.
  30. M. Nabeshima, "Performance Evaluation of a Combined Input- and Crosspoint-queued Switch," IEICE Trans. Commun., vol. E83-B, pp. 737-741, 2000.
  31. I. V. Zotov, "Distributed Virtual Bit-slice Synchronizer: A Scalable Hardware Barrier Mechanism for N-dimensional Meshes," IEEE Trans. Comput., vol. 59, no. 9, pp. 1187-1199, Sep. 2010.
  32. I. V. Zotov et al., "The VisualQChart Simulation Environment," Computer Program Certificate RU 2007611310, appl. 13.02.2007, publ. 27.03.2007.
  33. D. Zydek, H. Selvaraj and L. Gewali, "Synthesis of Processor Allocator for Torus-based Chip Multiprocessors," Proc. of the 7th Int’l. Conf. on Information Technology: New Generations, pp. 13-18, 2010.
  34. A. Samad, M. Q. Rafiq and O. Farooq, "Performance Evaluation of Task Assignment Algorithms in Cube-based Multiprocessor Systems," Proc. of the 1st Int’l. Conf. on Emerging Trends and Applications in Computer Science, pp. 48-51, 2013.
Jordanian Journal of Computers and Information Technology (JJCIT), Vol. 05, No. 02, August 2019.
  35. J. M. Camara, M. Moreto et al., "Twisted Torus Topologies for Enhanced Interconnection Networks," IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 12, pp. 1765-1778, 2010.
  36. K. Li, Y. Mu, K. Li and G. Min, "Exchanged Crossed Cube: A Novel Interconnection Network for Parallel Computation," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 11, pp. 2211-2219, 2013.
  37. J. Al Azzeh, "Distributed Mutual Inter-unit Test Method for D-Dimensional Mesh-connected Multiprocessors with Round-Robin Collision Resolution," Jordanian Journal of Computers and Information Technology (JJCIT), vol. 05, no. 01, pp. 1-16, April 2019.
  38. J. Al Azzeh, "Improved Testability Method for Mesh-connected VLSI Multiprocessors," Jordanian Journal of Computers and Information Technology (JJCIT), vol. 04, no. 02, pp. 116-128, August 2018.
  39. J. Al Azzeh, "Fault-tolerant Routing in Mesh-connected Multicomputers Based on Majority-operator-produced Transfer Direction Identifiers," Jordan Journal of Electrical Engineering, vol. 03, no. 02, pp. 102-111, April 2017.