BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20141118T163000Z
DTEND:20141118T170000Z
LOCATION:393-94-95
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The gap between the cost of moving data and the cost of computing=0A continues to grow, making it ever harder to design iterative solvers on=0A extreme-scale architectures. This problem can be alleviated by=0A alternative algorithms that reduce the amount of data movement. We=0A investigate this in the context of Lattice Quantum Chromodynamics=0A and implement such an alternative solver algorithm, based on domain=0A decomposition, on Intel(R) Xeon Phi(TM) co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single and half precision, the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a=0A standard solver [1], our full multi-node domain-decomposition solver=0A strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
SUMMARY:Lattice QCD with Domain Decomposition on Intel(R) Xeon Phi(TM) Co-Processors
PRIORITY:3
END:VEVENT
END:VCALENDAR