CS5052: Data-Intensive Systems
This module is offered in 2024-25.
CS5052 is an advanced, research-focused module that presents the programming paradigms, algorithmic techniques and design principles for large-scale distributed systems, such as those used by companies like Google, Amazon and Facebook. It has a strong systems research flavour, drawing on areas such as operating systems, databases, distributed systems and networking.
Please note the format of the course. The first few weeks of lectures will be presented by the lecturer and will introduce the basics of distributed systems and distributed algorithms. The remainder of the class will be run as a series of student-led seminars based on the latest research in the field. Students will read, present and discuss several papers taken from a range of topic areas. In addition, there will be at least one programming assignment.
Aims
- To present the programming paradigms, algorithmic techniques and design principles for large-scale distributed systems.
- To teach students how to engineer and work with systems which need to process big data.
Learning Outcomes
On successful completion of this module, the student should:
- be aware of programming paradigms, algorithmic techniques and design principles for large-scale distributed systems
- be aware of the current research in the field
- have gained practical experience in developing systems
- have gained experience in reading, analysing, and discussing research papers
- have gained experience in preparing and giving presentations
Syllabus
- Distributed systems architecture and design
- Replication and fault tolerance
- Storage systems
- Coordination algorithms, e.g. Paxos
- Scheduling algorithms
- Cloud computing and virtualisation
- Programming models for big data, e.g. MapReduce and Spark
- Stream processing
- Decentralised systems, e.g. Chord
- Incentive-based systems, e.g. BitTorrent
- Social computing, e.g. crowdsourcing techniques
Compulsory Elements
This module has the following compulsory elements in addition to those common to all modules (mark of 4 in each assessment component):
- attend at least 70% of all discussion sessions