Teaching

  • Graduate course: CS4570 - Machine Learning for Software Engineering (Editions: TA/lecturer in 2020 to 2022, responsible professor for 2023, 2024, and 2025), ~100 students, TU Delft.
  • Undergraduate course: TI3115TU - Databases and Software Engineering minor (Editions: lecturer for 2024, responsible professor for 2022 and 2025), ~160-220 students, TU Delft.
  • Undergraduate course: Software Engineering Methods (lecturer for 2024 edition), ~500 students, TU Delft.
  • Undergraduate course: Mentorate (Mentor for 2023 edition), ~30 students, TU Delft.

ML4SE General description

Goal: CS4570 aims to give students hands-on experience in applying deep neural networks and modern NLP techniques, especially large language models (LLMs) based on the Transformer architecture, to solve existing SE problems.

Software analytics and big data have transformed software development by leveraging vast repositories of engineering data (such as source code, bug reports, execution traces, and historical changes) to enhance software quality, maintainability, and evolution. The rise of machine and deep learning, particularly transformers and large language models (LLMs), has further expanded these capabilities, enabling automated code generation, bug detection, anomaly identification, type inference, refactoring, code summarization, and program repair. These techniques not only boost developer productivity and reduce technical debt but also facilitate development workflows, marking a shift toward AI-driven software engineering.

Machine Learning for Software Engineering (ML4SE) applies AI-driven solutions, including statistical models and deep learning algorithms, to enhance various stages of the software development lifecycle. Transformer-based large language models, such as those behind tools like Copilot, have significantly advanced code understanding and generation. This course focuses on improving developer productivity and the efficiency of software processes by tailoring LLMs to software engineering challenges. Additionally, we tackle fundamental issues within LLMs themselves, such as evaluation, memorization, compression, support for low-resource languages, and other limitations.

Learning Objectives

This course will enable students to:

  • Explain current literature in the area of software analytics and machine learning for software engineering.
  • Apply machine learning / deep learning algorithms to solve software engineering tasks.
  • Apply software analytics techniques to extract actionable software engineering insights.
  • Evaluate model performance on a software engineering task.

Before you decide to join the course

We expect students to have experience with ML (especially Transformers) already. We also expect them to have some interest in program analysis. This is NOT an introductory course. Here are the requirements:

  • Experience with programming is required (preferably Python).
  • Experience with machine learning and, specifically, deep learning techniques is a must-have.
  • Familiarity with transformer-based language models and modern NLP is needed.
  • Experience with research methods is nice to have.

Course Organization

  • 5 ECTS: This means that each student needs to devote at least 140 hours of study to this course. That is 17.5 hours per week for 8 weeks. Of these, two hours are for the class, and the rest should be spent on reading papers and working on the project.

  • Reading sessions: As seminars are a main part of this course, class attendance and engagement in discussions are vital. Per session, we will be discussing papers (presentations are given either by the lecturer or by teams) in terms of techniques, insights, and impact.

  • Action items: Before each lecture, you must read and prepare questions about the papers that will be discussed during the lecture. Please keep in mind that you are attending this course on a voluntary basis. Coming to the classroom unprepared will not be the best use of your time, so do your homework first!

  • Course deliverables: To finish the course you will need to:

    • Give a 30-minute presentation on one paper, plus 15 minutes for Q&A.
    • Apply ML to a software engineering problem.
    • Provide a mid-term project progress update, plus weekly progress reports to the TAs.
    • Document your project’s results in the form of a research paper.
    • Give a final presentation about your project, plus individual Q&A.
    • Provide peer feedback for two projects from other groups.

  • Groups: You will work in groups of 4-5 people (we may need to adjust this number based on the total number of students). CS and DSAIT students are already assigned to 13 groups. If you are a new student from another program and do not have a group, you are free to choose your group partners during the first lecture. Presence in the first lecture is therefore mandatory; otherwise, you cannot contribute to a team and will fail the class.

  • Labs: You are required to spend a minimum of 4 hours per week working on your project with your teammates. The lecturers will not provide feedback during lab hours, but you can schedule on-demand weekly meetings with your TAs.

The project

In this project, we aim to tailor language models to source code to solve software engineering tasks, including, but not limited to, code completion, type completion, and code summarization. Each group will use fine-tuning, few-shot learning, or prompt engineering to improve a pre-trained code model for a software engineering problem of their choosing. Alternatively, a group can target a limitation of this area, such as evaluation, and address it. Each group will then evaluate the model outputs on a test set. You will use Python as the main language. More details about the project will be shared during the course.
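As a rough illustration of the workflow described above, the sketch below builds a few-shot prompt for code completion and scores model outputs on a small test set. Everything in it is an assumption made for illustration: the prompt layout, the two metrics (exact match and a character-level similarity from Python's difflib), and the names build_few_shot_prompt, evaluate, and dummy_model are hypothetical and are not prescribed by the course. In a real project, dummy_model would be replaced by a call to the pre-trained code model you chose.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# One example is a (code context, expected completion) pair.
Example = Tuple[str, str]


def build_few_shot_prompt(shots: List[Example], query: str) -> str:
    """Concatenate solved examples, followed by the unsolved query."""
    parts = [f"# Context:\n{ctx}\n# Completion:\n{out}\n" for ctx, out in shots]
    parts.append(f"# Context:\n{query}\n# Completion:\n")
    return "\n".join(parts)


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings agree after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()


def char_similarity(prediction: str, reference: str) -> float:
    """Character-level similarity in [0, 1], using difflib's ratio()."""
    return SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()


def evaluate(
    model: Callable[[str], str],
    shots: List[Example],
    test_set: List[Example],
) -> Dict[str, float]:
    """Average both metrics over a held-out test set."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    matches = 0
    similarity = 0.0
    for context, reference in test_set:
        prediction = model(build_few_shot_prompt(shots, context))
        matches += exact_match(prediction, reference)
        similarity += char_similarity(prediction, reference)
    n = len(test_set)
    return {"exact_match": matches / n, "char_similarity": similarity / n}


def dummy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real code model: always predicts one line."""
    return "return a + b"


if __name__ == "__main__":
    shots = [("def sub(a, b):", "return a - b")]
    test_set = [
        ("def add(a, b):", "return a + b"),
        ("def mul(a, b):", "return a * b"),
    ]
    print(evaluate(dummy_model, shots, test_set))
```

Keeping the model behind a plain function from prompt to completion makes it possible to compare a fine-tuned model, a few-shot setup, and a prompt-engineered baseline on the same test set without changing the evaluation code.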

Required action items for week 1:

  • Form a group of 4-5 people with your classmates (if you are not in a group already) and choose a research paper for presentation. As a group, you will present a research paper together and do the course project.
  • Add your group to the Google Sheet shared in the classroom.
  • The deadline for the two items above is Friday (Nov 15th) at 17:00. After that, you cannot take the class if you do not have a group.

Paper presentation and discussion guidelines

Action items before your presentation

  • Read your selected paper carefully, and discuss it within your group to help each other get a good grasp of the problem, method, and results.
  • Prepare a presentation to present the paper in the classroom.
  • Compile a list of questions and discussion points before the presentation.
  • One working day before your presentation, share the presentation, discussion points, and questions with your TAs and lecturers using your Mattermost group channel (each group has its own channel). They will provide you with feedback if needed.
  • At your discretion, choose one of your teammates as the moderator, whose task is to manage the flow of the discussion.
  • At your discretion, choose another teammate as a transcriber to take notes of the discussion. You will use these notes in your report of the discussion.

Be prepared to answer the audience’s questions about the work you present!

For the presentation itself, make sure to at least include the following information about the paper. The rest depends on your creativity and style.

Required content in the presentation

  • Full title of the paper
  • Paper information including (1) Appeared in (name of conference/journal), (2) Citation count, (3) Why is it important?
  • People: A brief profile of the main authors
  • Motivation: Why do this research? Why is it needed?
  • The problem: What exactly is the problem that is solved?
  • Research method: What does the paper do?
  • Dataset or Benchmark: What data has been used to conduct the experiments?
  • ML4SE techniques used: How does the paper use ML techniques to solve the above problem?
  • Evaluation: How do the authors evaluate their work?
  • Results: What is the highlight of the paper’s findings?
  • Implications: Why are the results important? What is their impact?
  • Technical questions: A list of questions to gauge the audience’s understanding of the paper.
  • Discussion points: A list of questions to trigger general discussions about the paper.
  • Summary of the discussion: to be filled in after the discussion by the discussion group

Action items after your presentation

For each discussed paper, the responsible group must document the items above, add a summary of the highlights of the classroom discussion, and return the report to the instructor.

Lectures and Seminar Contents

TBA for 2026

Teaching Assistants

TBA for 2026

Assessment

Refer to the study guide.

Deadlines

TBA for 2026.

Exam day

TBA for 2026.