Mar 17, 2016 : HPC Code Modernization Workshop for Advanced HPC Programmers
This is a 2-day workshop for HPC/parallel programmers that equips them with state-of-the-art tools and techniques to enhance application performance. The course emphasizes the road-map and features of Intel® processors and tools, and how to use them to improve program performance. Participants can bring their own HPC application (C/C++, with a test case of 8-10 minutes runtime) for hands-on Code Modernization.
Dates : Mar 17, 18, 2016.
Location : RISE Lab.
Detailed Agenda:
Day 1
- Introduction to Code Modernization – what is it / why is it needed?
- Intel’s Road-map for HPC – Intel® Multi-core and Many-core Processors for HPC: Evolution and Road-map; Xeon Phi and its operational modes
- Intel Parallel Studio XE 2016 – Intel® Compilers & Tools; Performance Libraries; Tools for Clusters – Intel MPI and ITAC; Profiling tools; Standards-based Intel Parallel Programming Models
- Single Thread Optimization – General Optimization Techniques, Optimization Switches, Profile Guided Optimization, Inter-Procedural Optimization, Floating-Point Optimization, Use of Performance Libraries – Math Kernel Library (MKL), Threading Building Blocks (TBB), Integrated Performance Primitives (IPP)
- LAB1: Intel® Compiler – Compiler Optimization features, use of performance libraries – Intel® Threading Building Blocks (TBB), Intel Math Kernel Library (MKL) and Intel Integrated Performance Primitives (IPP)
- Vectorization – Need for Vectorization, Vector Hardware and Instruction sets, Reading compiler reports, Automatic, Guided and Low-level vectorization with examples, Compiler vector options, Intel® Vector options, OpenMP 4.0 SIMD, Best Practices for Vectorization
- LAB2: Vector Analyzer – Tool features, Combining tool inference with compiler report, Memory access patterns, trip count analysis
Day 2
- OpenMP – History; Task Parallelism on Shared Memory Applications; Thread Spawning, Work-sharing, Data Environment, Synchronization; Reduction; Task and Taskwait; Thread Affinity; OpenMP 4.0 features – Cancel, SIMD (vectorization topic), target, teams, user-defined reduction, etc.
- LAB3: Intel Advisor and VTune Amplifier – Performance profiler; Annotations for parallelism; Core utilization and re-annotations. Intel® Inspector XE – Memory and Threading Debugger
- Message Passing Interface – New Generation MPI Library for scaling to Exascale, Dynamic Communication Patterns, Heterogeneous Systems with Multi- and Many-core, Fault tolerance, New Features in MPI-3 such as Non-blocking collectives, Neighborhood collectives, One-sided Communication enhancements, Matched probe (MPI_Mprobe); Process Topology – Cartesian Topology, Grid Topology; MPI error handling
- LAB4: Intel Trace Analyzer and Collector (ITAC) – Tool features, Transport-compute patterns and balancing
- Performance Flow – Flow-charts for use of tools and techniques to gain enhancements in performance; trade-off analysis and deriving a rule of thumb for an application