Evaluating memory access pattern specializations in OoO, server-grade cores (M)
- 1 Introduction
- 2 Project Description
- 2.1 Part I – Familiarizing with the simulator, baseline model, and state-of-the-art techniques
- 2.2 Part II – Implementation of the stream-management coprocessor
- 2.3 Part III – Validation of the model against published results
- 2.4 Part IV (optional) – Implementation and validation of alternative solutions
- 2.5 Part V (optional) – Open-sourcing the implementation
- 3 Project Management
- 4 Meetings
- 5 Project Report
- 6 Presentation
- 7 Required Skills
- 8 Project Supervisors
- 9 References
In recent research, we explored the opportunity of adding streaming semantics to the processor memory architecture. This was done for in-order CPUs with the concept of stream-semantic registers. These registers implicitly track a memory stream and effectively offload stream access management from the main core pipeline, achieving significant benefits on in-order CPUs. Similar ideas have been proposed in the context of more complex, out-of-order (OoO) cores, using an ISA extension as the interface to define memory access streams and a stream-management coprocessor to map streams into registers and manage memory accesses. The goal of this MSc thesis project is to implement an evaluation workbench to further analyze these state-of-the-art techniques and the related tradeoffs when implemented in superscalar, OoO CPUs. This analysis will serve as a basis for a detailed study of the interaction of the stream-management coprocessor with the memory subsystem and the OoO cores, as well as for an exploration of new ideas, such as offloading address computation for indirect accesses into the memory controller.
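To make the idea of a register that implicitly tracks a memory stream more concrete, the following toy Python model may help. It is only a behavioural sketch under stated assumptions: the class name `StreamRegister`, its fields (`base`, `stride`, `length`), and the word-addressed `memory` list are hypothetical and are not taken from the ISA extensions of the referenced proposals.

```python
from dataclasses import dataclass


@dataclass
class StreamRegister:
    """Toy model of a register bound to an affine memory stream (hypothetical)."""
    memory: list      # word-addressed backing store standing in for main memory
    base: int         # index of the first element of the stream
    stride: int       # distance in words between consecutive elements
    length: int       # number of elements in the stream
    _pos: int = 0     # number of elements consumed so far

    def read(self):
        """Return the next stream element and advance, as a register read would."""
        if self._pos >= self.length:
            raise IndexError("stream exhausted")
        value = self.memory[self.base + self._pos * self.stride]
        self._pos += 1
        return value


def dot_product(a: StreamRegister, b: StreamRegister, n: int) -> int:
    """The loop body has no explicit loads or address arithmetic:
    each read of a stream register implies both."""
    acc = 0
    for _ in range(n):
        acc += a.read() * b.read()
    return acc


# Two unit-stride streams over one (made-up) memory image.
memory = [1, 2, 3, 4, 10, 20, 30, 40]
a = StreamRegister(memory, base=0, stride=1, length=4)
b = StreamRegister(memory, base=4, stride=1, length=4)
result = dot_product(a, b, 4)
```

In this sketch the address generation lives entirely inside `read`, which loosely mirrors how stream access management is moved out of the main instruction sequence. Speculation, memory ordering, and indirect streams are deliberately ignored here.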
The primary goal of this project is to develop and validate an implementation of a stream coprocessor similar to the state-of-the-art proposals [1,2] in the gem5 simulator, on top of an existing model of an ARM server-grade multicore processor. The project will be co-mentored by researchers from Huawei's Zurich Research Center, who will provide the baseline model and support on extending it. The gem5 model developed during the project will then be useful to explore the microarchitectural implementation of such techniques in greater detail and possibly to evaluate new solutions. The project can be roughly split into the following main parts:
Part I – Familiarizing with the simulator, baseline model, and state-of-the-art techniques
The first few weeks will be devoted to understanding the relevant parts of the gem5 simulator and the baseline model, and to reviewing and understanding the techniques we want to reproduce. This part will conclude with the implementation of a simple component in gem5 for a "best case" study of the performance improvements achievable with the stream-semantic concepts. This simplified implementation will mainly involve modifications to the ISA description and implementation to support the new stream-related instructions, as well as to the Load/Store Unit of the OoO model and to the hardware data prefetchers.
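As a rough illustration of what a "best case" estimate could look like before any simulation is run, the sketch below computes an upper bound on speedup when all stream-management instructions are removed from a loop. It assumes that execution time is proportional to the dynamic instruction count, which an OoO core will generally not satisfy exactly. The function name and the instruction mix are hypothetical and are not measurements.

```python
def best_case_speedup(total_instructions: int,
                      stream_overhead_instructions: int) -> float:
    """Upper bound on speedup if all stream-management instructions disappear.

    Assumes execution time scales with dynamic instruction count; this is a
    back-of-the-envelope model, not a substitute for the gem5 limit study.
    """
    if not 0 <= stream_overhead_instructions < total_instructions:
        raise ValueError("overhead must be non-negative and smaller than the total")
    remaining = total_instructions - stream_overhead_instructions
    return total_instructions / remaining


# Hypothetical loop body: 2 loads, 2 address increments, 1 multiply-add,
# 1 counter update, 1 branch. The loads and address increments are assumed
# to be absorbed by streams. Illustrative numbers only.
per_iteration_total = 7
per_iteration_removed = 4
bound = best_case_speedup(per_iteration_total, per_iteration_removed)
```

A bound of this kind only indicates how much room there is in the instruction stream. The limit study in gem5 is what accounts for memory latency, prefetching, and the OoO window.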
Part II – Implementation of the stream-management coprocessor
This part aims to refine the work done in Part I, extending it to make it more realistic. The goal is to understand and overcome the challenges of integrating the stream-management coprocessor into an OoO pipeline, especially regarding speculation. Most parts of the back-end of the OoO model's pipeline will need to be modified, and details not fully described in the papers might need to be architected. This part will also include an evaluation of the performance difference between the limit study and the more realistic implementation.
Part III – Validation of the model against published results
To validate the robustness of the implementation, the results will be compared with those published in the literature. They are not expected to match perfectly, but we expect to see the same trends. This activity will partly proceed concurrently with Part II.
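One possible way to quantify whether two sets of results show "the same trends" is a rank correlation over per-benchmark speedups, sketched below. This is an assumption about how the comparison could be done, not a method prescribed by the project. The speedup values are placeholders for hypothetical kernels and are not real results.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for equal-length sequences without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Placeholder speedups for four hypothetical kernels (not real data).
published = [1.2, 1.8, 2.5, 1.5]
simulated = [1.1, 1.6, 2.0, 1.3]
rho = spearman(published, simulated)
```

A value of `rho` close to 1 would indicate that the simulated model orders the benchmarks the same way as the published results, even if the absolute speedups differ.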
Part IV (optional) – Implementation and validation of alternative solutions
While working on Parts I and II, new ideas might emerge; if time permits, these ideas will be implemented and tested against the base model. This part is optional, as it depends strongly on the outcomes of the previous parts. If interesting new solutions can be validated, the mentors will advise on preparing a paper submission to an appropriate venue for publication.
Part V (optional) – Open-sourcing the implementation
We encourage open-sourcing the gem5 components implemented in the project to the gem5 community through the official reviewing process. However, this step will require further review and code cleanup, and it might not be possible to complete it within the timeline of the thesis project, so it is not a requirement for successfully completing the project.
This project will be co-mentored by ETH IIS and Huawei's Zurich Research Center. Huawei will provide support with the simulation infrastructure.
There will be a regular schedule of meetings (e.g., weekly), plus any additional on-demand meetings to address specific issues or discussions. Depending on the COVID-related measures, the meetings will take place online via a conferencing platform, or at the ETH or Huawei offices, as agreed between the student and the supervisors.
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed as part of this project. The common language of engineering is de facto English; therefore, the final report should preferably be written in English. Any word processing software may be used to write the report; nevertheless, the use of LaTeX with Inkscape, Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS and Huawei staff.
There will be a presentation (20 min presentation and 5 min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.
- Necessary skills for successfully taking on the project:
- Good proficiency with modern C++ (at least C++11) and Python 3
- Understanding of basic microarchitectural concepts (pipelining, out-of-order execution, caches, prefetching)
- Willingness to "get your hands dirty" and implement advanced techniques in gem5
- Ability to work independently on the implementation and raise questions and issues to get help and guidance as necessary
- Previous experience with gem5 (desirable, but not necessary)
- Previous experience with adding opcodes to LLVM (desirable, but not necessary)
- Davide Basilio Bartolini (Huawei): email@example.com
- Paul Scheffler (IIS): firstname.lastname@example.org