AISE lab
Meet our team

Assistant Professor
Computer Science
AISE research explores several topics at the intersection of AI and Software Engineering, including:
- LLMs for Code Generation, Summarization, Refactoring, and Bug Fixing: Leverage LLMs to accelerate various development tasks.
- Longitudinal Evaluation and Benchmarking of Code LLMs: Study the long-term performance of LLMs across languages, tools, and developer workflows. Develop comprehensive benchmarks to assess and compare LLM effectiveness in diverse software engineering tasks.
- Autonomous Software Engineering Agents: Build intelligent, task-driven agents capable of independently executing and managing software engineering workflows.
- Mitigating Memorization and Hallucination: Investigate strategies to reduce factual inaccuracies, hallucinated code, and overfitting in LLM outputs, ensuring reliability in practical applications.
- Human-AI Collaboration in IDEs: Design intuitive IDE interfaces and workflows that foster seamless collaboration between developers and GenAI assistants, maximizing productivity and usability.
- Explainability in Code LLMs: Improve the transparency of LLM-generated suggestions to enhance developer trust and facilitate understanding of model behavior.
- Domain Adaptation and Personalization: Fine-tune models to specific domains or codebases to improve contextual relevance, precision, and performance.
- Repository Management: Develop techniques to automatically link commits to relevant issues; triage, assign, and resolve issues; and generate human-readable documentation from codebases, commit history, and other project artifacts.
PhD Students

PhD candidate (Sep'22)
Former MSc student
Privacy/Security in LLMs
- Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks, ACM International Conference on the Foundations of Software Engineering (FSE), main track, 2025
- How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning, IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 2025
- Traces of Memorisation in Large Language Models for Code, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Towards Safe, Secure, and Usable LLMs4Code, IEEE/ACM 46th International Conference on Software Engineering (ICSE), Doctoral Symposium, 2024
- Extending Source Code Pre-trained Language Models to Summarise Decompiled Binaries, IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023
- STACC: Code Comment Classification using SentenceTransformers, IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023
- Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge, The 1st IEEE Conference on Secure and Trustworthy Machine Learning, 2023
- The (ab)use of Open Source Code to Train Large Language Models, IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023

PhD candidate (Jan'23)
Multilinguality in LLMs
- A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics, The International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), 2025
- The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2025
- An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language models for code completion: A practical evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Programming Language Models in Multilingual Settings, IEEE/ACM 46th International Conference on Software Engineering (ICSE), Doctoral Symposium, 2024
- On the Impact of Language Selection for Training and Evaluating Programming Language Models, IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM), 2023

PhD candidate (Mar'24)
Evaluation in LLMs
- Long Code Arena: a Set of Benchmarks for Long-context Code Models, under review, 2025

PhD candidate (Apr'24)
Human-AI Interaction in IDE
- Human-AI Experience in Integrated Development Environments: A Systematic Literature Review, under review, 2025
- The Design Space of in-IDE Human-AI Experience, under review, 2024
- In-IDE Human-AI Experience in the Era of Large Language Models; A Literature Review, The 1st Workshop on Integrated Development Environments (IDE), 2024

PhD candidate (Sep'24)
LLM Integration in IDE
- Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues, The 2nd International Workshop on LLM4Code, co-located with ICSE, 2025
- Enhancing Large Language Model Integration in Integrated Development Environments, ACM International Conference on the Foundations of Software Engineering (FSE), Doctoral Symposium, 2025

PhD candidate (Dec'24)
AI/AI Interaction in IDE
- Enhancing Human-IDE Interaction in the SDLC using LLM-based Mediator Agents, The 1st International Workshop on AI-Augmented SDLC, co-located with FSE, 2025
- Mediating between Human Programmers and Integrated Development Environments using LLM-based Agents, ACM International Conference on the Foundations of Software Engineering (FSE), Doctoral Symposium, 2025

PhD candidate (Feb'25)
Former BSc/MSc student
Robust Datasets for LLM4Code
- The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2025
- An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language models for code completion: A practical evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
Research Assistants/Honour Students

Scientific Dev (Jan'24)
BSc student
Guarantees in GenAI
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol, under review, 2025
- Leveraging large language models for enhancing the understandability of generated unit tests, IEEE/ACM 47th International Conference on Software Engineering (ICSE), main track, 2025
- HyperSeq: A Hyper-Adaptive Representation for Predictive Sequencing of States, ACM International Conference on the Foundations of Software Engineering (FSE), 2025
- Rethinking IDE Customization for Enhanced HAX: A Hyperdimensional Perspective, The 2nd Workshop on Integrated Development Environments (IDE), 2025

Research Assistant
BSc/MSc student
Refactoring via LLMs
- A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics, The International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), 2025
MSc Students



Visitors

MSc student from Italy
Memorization in LLMs4Code
Next: SE @ Stema
- How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning, IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 2025
Alumni

Former BSc/MSc student
Onboarding Agents
Next: Intern @ Microsoft
- A Multi-agent Onboarding Assistant based on Large Language Models, Retrieval Augmented Generation, and Chain-of-Thought, ACM International Conference on the Foundations of Software Engineering (FSE), 2025

Scientific Dev (Sep'23)
Former BSc student
Smart Trigger Models
Next: ML engineer @ JetBrains
- A Transformer-based Approach for Smart Invocation of Automatic Code Completion, The 1st ACM International Conference on AI-powered Software (AIware), 2024

Former BSc/MSc student
AutoCompletion via LLMs
Next: Software Engineer @ Teifi
- Investigating the performance of language models for completing code in functional programming languages: a Haskell case study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language models for code completion: A practical evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study, IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023

Former BSc/MSc student
AutoCompletion via LLMs
Next: CTO @ Teifi Digital
- Investigating the performance of language models for completing code in functional programming languages: a Haskell case study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024

Former MSc student
AI4SE Benchmarking
Next: SE @ Booking.com
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol, under review, 2025
- Investigating the performance of language models for completing code in functional programming languages: a Haskell case study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024

Former MSc student
AI Assistants Impact
Next: SE @ Booking.com
- Beyond Acceptance Rates: The Impact of JetBrains AI Assistant and FLCC, TU Delft - MSc Thesis, 2024
Thesis Supervision
I have (co-)supervised 41 MSc/BSc students, many of whom have graduated cum laude (top 5% of class).
MSc Level
# | Degree | Year | University | Student | Title |
---|---|---|---|---|---|
11 | MSc | 2025 | TUDelft | R. Popescu | Dataset Development for LLMs4Code: Licensing, Contamination, and Reproducibility Challenges |
10 | MSc | 2024 | TUDelft | A.C. Ionescu | Meet Your Onboarding Buddy: A Smart, Adaptive, and Conversational LLM Assistant |
9 | MSc | 2024 | TUDelft | R. Schrijver | Beyond Acceptance Rates: The Impact of JetBrains AI Assistant and FLCC |
8 | MSc | 2024 | TUDelft | T. van Dam | Black-box Context-Aware Code Completion |
7 | MSc | 2024 | TUDelft | P. de Bekker | AI for Software Engineering: Reviewing and Improving Benchmarking Practices |
6 | MSc | 2024 | TUDelft | F. van der Heijden | Interactive & Adaptive LLMs: Building and Evaluating an LLM-based Code Completion Plugin |
5 | MSc | 2024 | U. of Milano-Bicocca | F. Salerno | Extracting Training Data from Fine-tuned Large Language Models |
4 | MSc | 2022 | TUDelft | A. Al-Kaswan | Limits of Binary Code Summarization with Transformers |
3 | MSc | 2021 | Sharif | M. Nejati | Missing Software Tag Recommendation |
2 | MSc | 2021 | Sharif | P. Rostami | Issue Commit Linking |
1 | MSc | 2020 | Sharif | K. Akbari | Issue Report Classification |
BSc Level
# | Degree | Year | University | Student | Title |
---|---|---|---|---|---|
30 | BSc | 2024 | TUDelft | B. Koc | Implications of LLMs4Code on Copyright Infringement |
29 | BSc | 2024 | TUDelft | P. Deatc | Red Teaming LLMs for Dangerous and Unfair Software Applications |
28 | BSc | 2024 | TUDelft | C. Ionescu | Red-Teaming Code LLMs for Malware Generation |
27 | BSc | 2024 | TUDelft | F. Ignijic | Evaluating Adaptive Activation Functions in Language Models |
26 | BSc | 2024 | TUDelft | Y. Wu | Sparse Transformers are (in)Efficient Learners |
25 | BSc | 2024 | TUDelft | R. Mota Borges | Tokenization Matters: Training Your Tokenizer Right |
24 | BSc | 2024 | TUDelft | P. Loizides | LLM of Babel: Evaluation of LLMs on Code (Greek Focus) |
23 | BSc | 2024 | TUDelft | G. Panchu | LLM of Babel: Java Code Summarization in Dutch |
22 | BSc | 2024 | TUDelft | M. Ziemlewski | LLM of Babel: Code Summarization in Polish |
21 | BSc | 2024 | TUDelft | S. Vermeulen | Evaluating CodeGemma-7B for Dutch Code Comment Generation |
20 | BSc | 2024 | TUDelft | Y. Huang | LLM of Babel: Broader Multilingual Evaluation |
19 | BSc | 2024 | TUDelft | I. Vasiliauskas | Detecting Weaknesses in LLM Generated Code |
18 | BSc | 2024 | TUDelft | I. Moruz | How Can LLMs Harm Privacy? Red-Teaming Exploration |
17 | BSc | 2024 | TUDelft | K. Gulamov | Speed/Quality Trade-offs in Attention Mechanisms |
16 | BSc | 2023 | TUDelft | D. Sochirca | Compressing Code Generation Language Models on CPUs |
15 | BSc | 2023 | TUDelft | M. Keeler | Cross-Lingual Evaluation of CodeGen in Code Completion |
14 | BSc | 2023 | TUDelft | E. Malmsten | Distil-CodeGPT: Distilling Code-Generation Models |
13 | BSc | 2023 | TUDelft | M. Storti | Efficient Transformer Quantization for CodeGPT |
12 | BSc | 2023 | TUDelft | H. Kuo | Cross-Lingual Performance of CodeGPT in Completion Tasks |
11 | BSc | 2023 | TUDelft | E. Malmsten | Distil-CodeGPT: Distilling Code-Generation Models |
10 | BSc | 2023 | TUDelft | A. de Moor | Compressing CodeGPT via Layer Reduction and Quantisation |
9 | BSc | 2023 | TUDelft | R. Popescu | Common Code Structures Impact on CodeParrot Completion |
8 | BSc | 2022 | TUDelft | T. van Dam | Performance Analysis of UniXcoder |
7 | BSc | 2022 | TUDelft | F. van der Heijden | Analysis of InCoder on Statement Prediction |
6 | BSc | 2022 | TUDelft | M. Turk | Improving Source Code Conversion for Code Completion |
5 | BSc | 2022 | TUDelft | J. de Weerdt | User Evaluation of UniXcoder with Statement Completion |
4 | BSc | 2022 | TUDelft | M. Otten | User Evaluation of InCoder with Statement Completion |
3 | BSc | 2022 | TUDelft | A.C. Ionescu | Repository Recommender System Using Tag Hierarchies |
2 | BSc | 2022 | TUDelft | C. Botocan | Duplicate Stack Overflow Detection Using Tags and Text |
1 | BSc | 2022 | TUDelft | A. van der Rande | Improving GitHub Tag Recommenders Using Tag Hierarchies |