CS4570 - Machine Learning for Software Engineering (2024/25 Q2)
General description
Software repositories archive valuable software engineering data, such as source code, execution traces, historical code changes, mailing lists, and bug reports. This data contains a wealth of information about a project’s status and history. By doing data science on software repositories, researchers can gain an empirically based understanding of software development practices, and practitioners can better manage, maintain, and evolve complex software projects.
In recent years, advances in Machine Learning and AI technologies, as demonstrated by the successful application of Deep Neural Networks (DNNs) in various domains, have not gone unnoticed in the field of Software Engineering. Researchers have applied DNNs to tasks such as automated program repair, code summarization, code completion, and code structure representation.
CS4570 is a seminar course that aims to give students a hands-on approach to applying neural networks and NLP techniques to solve existing SE problems.
Learning Objectives
This course will enable students to:
- Understand current literature in the area of software analytics and machine learning for software engineering.
- Apply machine learning / deep learning algorithms to solve software engineering tasks.
- Apply software analytics techniques to extract actionable software engineering insights.
- Evaluate model performance on a software engineering task.
Before you decide to join the course
We expect students to have experience with ML already. We also expect them to have some interest in program analysis. This is NOT an introductory course. Here are the requirements:
- Experience with programming is required (preferably Python).
- Experience with machine learning and specifically deep learning techniques is a must-have.
- Familiarity with transformer-based language models and modern NLP is needed.
Recommended:
- Experience with research methods is nice to have.
- Taking the NLP course before this class is recommended.
Course Organization
- 5 ECTS: This means that each student needs to devote at least 140 hours of study to this course, that is, 17.5 hours per week for 8 weeks. Two hours are for the class, and the rest should be spent on reading papers and working on the project.
- Reading sessions: Seminars are a main part of this course, so class attendance and engagement in discussions are vital. In each session, we will discuss papers (presented either by the lecturer or by teams) in terms of techniques, insights, and impact.
- Action items: Before each lecture, you must read and prepare questions about the papers that will be discussed during the lecture. Please keep in mind that you are attending this course on a voluntary basis. Coming to the classroom unprepared will not be the best use of your time, so do your homework first!
- Lecturers: The course is supervised by Maliheh Izadi, who is responsible for the content and the project.
- Course deliverables: To finish the course you will need to:
  - Give a 30-minute presentation on one paper, plus 15 minutes for Q&A.
  - Apply ML to a software engineering problem.
  - Provide a mid-term project progress report, plus weekly progress reports to the TAs.
  - Document your project’s results in the form of a research paper.
  - Give a final presentation about your project, plus individual Q&A.
  - Provide peer feedback for two projects from other groups.
- Groups: You will work in groups of 6-7 people (we may need to adjust the number based on the total number of students). CS and DSAIT students are already assigned to 13 groups. If you are a new student from another program and do not have a group, you are free to choose your group partners during the first lecture. Presence in the first lecture is therefore mandatory; otherwise, you cannot contribute to a team and will fail the class.
- Labs: You are encouraged to spend a minimum of 4 hours per week working on your project with your teammates. The lecturers will not provide feedback during lab hours.
The project
Lately, machine-learning techniques have been successfully tailored to many software engineering problems. For instance, intelligent code completion helps developers finish their programming tasks faster and more efficiently by decreasing the typing effort, providing type-correct solutions, and enabling them to explore APIs. InCoder, UniXcoder, and Copilot are among the recent deep learning-based solutions for an enhanced software development experience. In this project, we aim to tailor language models to source code to solve software engineering tasks including code completion, type completion, and code summarization. Each group will use fine-tuning, few-shot learning, or prompt engineering to improve a pre-trained code model for a software engineering problem of their choosing. Then, they will evaluate the model outputs on a test set. You will use Python as the main language. More details about the project will be shared during the course.
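To make the "prompt, then evaluate on a test set" workflow concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the prescribed project setup: the function names (`build_few_shot_prompt`, `exact_match`, `edit_similarity`, `evaluate`) and the prompt layout are hypothetical, the metrics are two common choices for code completion rather than the ones the course requires, and the model call itself is left out. The similarity score uses the ratio from Python's standard `difflib`, which is not the same as a Levenshtein-based score.

```python
from difflib import SequenceMatcher


def build_few_shot_prompt(examples, query):
    """Concatenate (context, completion) demonstrations, then the open query.

    The "# Context:" / "# Completion:" layout is an assumed, illustrative format.
    """
    shots = [f"# Context:\n{ctx}\n# Completion:\n{comp}\n" for ctx, comp in examples]
    return "\n".join(shots) + f"\n# Context:\n{query}\n# Completion:\n"


def exact_match(prediction, reference):
    """Return 1.0 if both strings are equal after stripping whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())


def edit_similarity(prediction, reference):
    """Character-level similarity in [0, 1], computed with difflib's ratio."""
    return SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()


def evaluate(predictions, references):
    """Average both metrics over a test set given as two aligned lists."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and aligned")
    n = len(references)
    pairs = list(zip(predictions, references))
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "edit_similarity": sum(edit_similarity(p, r) for p, r in pairs) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy data; real predictions would come from a code model.
    prompt = build_few_shot_prompt(
        [("def add(a, b):", "return a + b")], "def sub(a, b):"
    )
    print(prompt)
    print(evaluate(["return a - b", "return x"], ["return a - b", "return y"]))
```

In a real project, the strings in `predictions` would be produced by the fine-tuned or prompted model, and the test set would be held out from any data used for fine-tuning or for choosing the demonstrations.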
Required action items for week 1:
- Form a group of 6-7 people with your classmates (if you are not in a group already) and choose a research paper for presentation. As a group, you will present a research paper together and do the course project.
- Add your group to the Google sheet shared here. Share your GitHub ID as well.
- The deadline for the two items above is Friday (Nov 15th) at 17:00. After that, you cannot take the class because you will not have a group.
Paper presentation and discussion guidelines
NOTE! Please refer to the sample template.
Lecturer
- Maliheh Izadi
Assistants
- Ali Al-kaswan
- Jonathan Katzy
- Daniele Cipolone
- Razvan Popescu
- Nadine Kuo
- Roham Koohestani
Assessment
The course grade will be calculated as:
- Project results in the form of a research paper (70%)
- Final presentation/discussion of results (20%)
- Individual Q&A in presentation (10%)
- One research paper presentation during the course (Pass/Fail)
- Mid-term project report (Pass/Fail)
- Peer feedback on project reports (Pass/Fail)
Deadlines
- 13/12 at 17:00: Mid-term progress report shared with TAs. Then, arrange a short meeting with your TA on Dec 13th or 16th to discuss your progress and ask your questions.
- 13/01 at 17:00: Submission of the first version of your own group's project report and code to Brightspace and peer
- 17/01 at 11:30: Submission of peer feedback on the reports and code of other groups on peer
- 20/01 at 17:00: Submission of the revised version of your own group's report and code (incorporating the feedback from your peers)
Exam day
- 22/01 at 11:30: Submit final project presentations to Brightspace
- 22/01 at 13:45: Final project presentations in Hall D@ta and Pi