TY - GEN
T1 - Improving the scalability of parallel jobs by adding parallel awareness to the operating system
AU - Jones, Terry
AU - Dawson, Shawn
AU - Neely, Rob
AU - Tuel, William
AU - Brenner, Larry
AU - Fier, Jeffrey
AU - Blackmore, Robert
AU - Caffrey, Patrick
AU - Maskell, Brian
AU - Tomlinson, Paul
AU - Roberts, Mark
PY - 2003
Y1 - 2003
AB - A parallel application benefits from scheduling policies that include a global perspective of the application's process working set. As the interactions among cooperating processes increase, mechanisms to ameliorate waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to even usually harmless events such as context switches among members of the process working set. For the last 18 months, we have been researching the impact of random short-lived interruptions such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large processor-count SPMD bulk-synchronous programming styles. We present a novel co-scheduling scheme for improving performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
UR - http://www.scopus.com/inward/record.url?scp=84877078118&partnerID=8YFLogxK
U2 - 10.1145/1048935.1050161
DO - 10.1145/1048935.1050161
M3 - Conference contribution
AN - SCOPUS:84877078118
SN - 1581136951
SN - 9781581136951
T3 - Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003
BT - Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003
T2 - 2003 ACM/IEEE Conference on Supercomputing, SC 2003
Y2 - 15 November 2003 through 21 November 2003
ER -