Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)
- 1 Introduction
- 2 Project description
- 3 Milestones
- 4 Project Realization
- 5 Required skills
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems. These systems integrate a very large number of small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.
A compromise is needed to push the core count of such systems to its limits. One solution is to prevent memory sharing between cores, which then access only private memory banks; this, however, impacts the programmability of the system. Alternatively, memory sharing can be achieved through a cache hierarchy, which hurts the energy efficiency of the system through its non-negligible power consumption.
At IIS, PULP's many-core system is MemPool, which integrates 256 Snitch cores and 1 MiB of shared L1 memory. Our example system, implemented in GlobalFoundries' 22 nm 22FDX technology, runs at 500 MHz, and the L1 memory can be accessed by any of the cores through a high-throughput interconnection network with a round-trip latency of at most five cycles. This is a challenging physical implementation, limited by routing congestion and wire propagation delay rather than by the more common gate delay. Because of this, placing and routing MemPool was a unique experience, which required diving deep into the technology properties and the implementation flow.
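To put the figures above into perspective, the following minimal sketch converts them into a cycle time, a worst-case round-trip time, and an average L1 share per core. It uses only the numbers quoted in the text; the function and constant names are illustrative assumptions and not part of the MemPool code base. Since the L1 memory is shared, the per-core value is only an average, not a private partition.

```python
# Figures quoted in the text for the MemPool example system.
NUM_CORES = 256
L1_SIZE_BYTES = 1 * 1024 * 1024   # 1 MiB of shared L1 memory
CLOCK_HZ = 500e6                  # 500 MHz
MAX_ROUND_TRIP_CYCLES = 5         # upper bound on L1 round-trip latency


def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def round_trip_ns(cycles: int, clock_hz: float) -> float:
    """Round-trip time in nanoseconds for a latency given in cycles."""
    return cycles * cycle_time_ns(clock_hz)


def l1_share_per_core_kib(l1_bytes: int, cores: int) -> float:
    """Average amount of L1 memory per core in KiB (the memory is shared)."""
    return l1_bytes / cores / 1024


if __name__ == "__main__":
    print(f"cycle time:           {cycle_time_ns(CLOCK_HZ)} ns")
    print(f"worst-case round trip: {round_trip_ns(MAX_ROUND_TRIP_CYCLES, CLOCK_HZ)} ns")
    print(f"average L1 per core:  {l1_share_per_core_kib(L1_SIZE_BYTES, NUM_CORES)} KiB")
```

Any request to the shared L1 therefore has to cross the interconnect and return within roughly ten nanoseconds, which helps explain why wire propagation delay dominates the physical implementation.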
The goal of this project is to tape out a smaller version of MemPool, affectionately called MinPool, on a larger technology node, as a proof of concept. While a 256-core design would probably be infeasible at the technology node we are aiming for, we are curious to know how far the design can be pushed in an older technology. Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or be a basis for the work of other people. Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.
The whole project can be subdivided into four main phases, which are described in detail in the following.
Part I: Familiarizing with the design
MemPool has been the work of the two advisors of this thesis for more than a year and has already led to three Semester/Master projects. As such, the student needs to become familiar with the architecture and the code base at the start of this project.
Part II: Implementing MemPool's tile
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a portion of the L1 memory, accessible within one cycle. This module is the building block of the MemPool cluster. It is, however, a "normal" module, which can be placed and routed as a rather dense macro. We need to determine how fast MemPool's tiles can be clocked at the target technology node, also considering that they should be rather dense. The student can play with parameters such as the number of Snitch cores in each tile, the size of the instruction cache, and the size of the L1 memory per tile. This should also give the student a feel for the next step of the project.
Part III: Implementing MinPool
By implementing several MinPool tiles and interconnecting them cleverly, we can build the many-core cluster. We do not know how many such tiles we can instantiate before degrading timing or latency too much. In MemPool, for example, we managed to implement 64 tiles, each with four cores and 16 KiB of L1 memory. After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, and thereby on how large it should be. Our goal is to have the system with the largest core count of the PULP systems so far.
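The sizing arithmetic behind this design exploration can be sketched as follows. The MemPool configuration (64 tiles, four cores and 16 KiB of L1 memory per tile) is taken from the text; the class, the function, and the candidate MinPool values are hypothetical and only illustrate how a parameter space could be enumerated. The sketch models neither timing nor routing congestion, which are what the actual place-and-route runs have to determine.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ClusterConfig:
    """One point in the design space (names are illustrative assumptions)."""
    num_tiles: int
    cores_per_tile: int
    l1_kib_per_tile: int

    @property
    def num_cores(self) -> int:
        return self.num_tiles * self.cores_per_tile

    @property
    def l1_kib(self) -> int:
        return self.num_tiles * self.l1_kib_per_tile


# Configuration quoted in the text for MemPool.
MEMPOOL = ClusterConfig(num_tiles=64, cores_per_tile=4, l1_kib_per_tile=16)


def enumerate_configs(tile_counts, cores_options, l1_options):
    """Return every combination of the given parameter values."""
    return [
        ClusterConfig(tiles, cores, l1)
        for tiles, cores, l1 in product(tile_counts, cores_options, l1_options)
    ]


# Hypothetical candidate values for MinPool, not taken from the project.
candidates = enumerate_configs([4, 8, 16], [2, 4], [8, 16])

if __name__ == "__main__":
    print(f"MemPool: {MEMPOOL.num_cores} cores, {MEMPOOL.l1_kib} KiB of L1")
    for cfg in sorted(candidates, key=lambda c: c.num_cores, reverse=True):
        print(f"{cfg.num_tiles:2d} tiles x {cfg.cores_per_tile} cores, "
              f"{cfg.l1_kib_per_tile:2d} KiB/tile -> "
              f"{cfg.num_cores:2d} cores, {cfg.l1_kib:3d} KiB")
```

The MemPool entry reproduces the 256 cores and 1 MiB (1024 KiB) of L1 memory mentioned earlier, which is a quick consistency check on the per-tile figures.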
Part IV: Making MinPool standalone
During this phase, some RTL is needed to make MinPool a stand-alone design. The student will need to integrate an FLL, a boot ROM, and a JTAG interface to access and initialize the system. While there are many IPs and much know-how at IIS for that, this also depends strongly on the final design parameters for MinPool. In this phase, the student should also check whether the placed-and-routed design passes the sanity checks.
Throughout the project, a number of milestones have to be reached. In the following, a tentative list of expected milestones is provided, which might be modified during the first weeks of the work:
1. Install MemPool and test it in different configurations
2. Synthesize the tile
3. Place-and-route the tile as a macro
4. Synthesize the cluster, using the tile's black-box module
5. Place-and-route MinPool
6. Finalize the implementation flow
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student's and the assistants' schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.
The student is required to write a weekly report at the end of each week and to send it to the advisors by email. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps toward making your code easy to understand. If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming convention, all members immediately feel at home with each other's code. The naming conventions we follow in the PULP project are available here.
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape, Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff. If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded here.
There will be a presentation (20 min presentation and 5 min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.
To work on this project, you will need:
- to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed the VLSI 1 course is recommended.
- to have prior knowledge of hardware design and computer architecture
- to be motivated to work hard on a super cool open-source project