(Received: 2019-04-27, Revised: 2019-06-30, Accepted: 2019-07-22)
This paper proposes a packet switch architecture for mesh-connected multiprocessors. The switch is built from a set of input FIFO buffers and an output register matrix, both controlled by a novel distributed timing-based scheduling scheme. Simple static routing is assumed, with each packet split into a set of independently routed w-bit-wide flits. The device achieves at least 78% throughput for uniformly distributed traffic, with an asymptotic upper bound of 100%. In contrast to state-of-the-art switch architectures based on virtual output queueing (VOQ), the proposed switch is shown to reach its maximum throughput with no internal speedup and has an order of magnitude lower hardware complexity. Compared with existing buffered crossbar non-VOQ switches that use typical flit scheduling mechanisms, the proposed device demonstrates slightly higher throughput and substantially shorter delays in some practically important cases.
