AISE lab
Meet our team
AISE research explores several topics at the intersection of AI and Software Engineering, including:
Intelligent systems for software creation with large language models: Exploring how generative technologies can support and transform key aspects of software development workflows across diverse contexts and coding tasks inclduing code generation, summarization, refactoring, and bug fixing.
Trust, Transparency, and Model Behavior: Investigating challenges around explainability, hallucination, memorization, and aligned behavior in learning-based systems as they integrate into developer-facing tools.
Evaluating Generative Capabilities in Practice: Developing strategies to assess evolving generative systems across tasks, time, and toolchains, with a focus on actionable insights and practical relevance.
Hybrid Intelligence in Development Environments: Designing future collaboration paradigms between human developers and adaptive, assistive systems—ranging from conversational interfaces to agentic behavior.
Learning from Software Histories at Scale: Using large-scale project data to uncover patterns, enable automation, and shape more intelligent development support systems, including issue report management, triage, and automated documentation.
PhD Students
- Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks, ACM International Conference on the Foundations of Software Engineering (FSE), main track, 2025
- How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning, IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 2025
- Traces of Memorisation in Large Language Models for Code, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Towards Safe, Secure, and Usable LLMs4Code, IEEE/ACM 46th International Conference on Software Engineering (ICSE), Doctoral Symposium, 2024
- Extending Source Code Pre-trained Language Models to Summarise Decompiled Binaries, IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023
- STACC: Code Comment Classification using SentenceTransformers, IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023
- Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge, The 1st IEEE Conference on Secure and Trustworthy Machine Learning, 2023
- The (ab)use of Open Source Code to Train Large Language Models, IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023
- A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics, The International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), 2025
- The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2025
- An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language Models for Code Completion: A Practical Evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Programming Language Models in Multilingual Settings, IEEE/ACM 46th International Conference on Software Engineering (ICSE), Doctoral Symposium, 2024
- On the Impact of Language Selection for Training and Evaluating Programming Language Models, IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM), 2023
- Long Code Arena: A Set of Benchmarks for Long-context Code Models, under review, 2025
- Human-AI Experience in Integrated Development Environments: A Systematic Literature Review, under review, 2025
- The Design Space of in-IDE Human-AI Experience, under review, 2024
- In-IDE Human-AI Experience in the Era of Large Language Models: A Literature Review, The 1st Workshop on Integrated Development Environments (IDE), 2024
- Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues, The 2nd International Workshop on LLM4Code, co-located with ICSE, 2025
- Enhancing Large Language Model Integration in Integrated Development Environments, ACM International Conference on the Foundations of Software Engineering (FSE), Doctoral Symposium, 2025
- Enhancing Human-IDE Interaction in the SDLC using LLM-based Mediator Agents, The 1st International Workshop on AI-Augmented SDLC, co-located with (FSE), 2025
- Mediating between Human Programmers and Integrated Development Environments using LLM-based Agents, ACM International Conference on the Foundations of Software Engineering (FSE), Doctoral Symposium, 2025
- The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2025
- An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets, The 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language Models for Code Completion: A Practical Evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
Research Assistants/Honour Students
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol, under review, 2025
- Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests, IEEE/ACM 47th International Conference on Software Engineering (ICSE), main track, 2025
- HyperSeq: A Hyper-Adaptive Representation for Predictive Sequencing of States, ACM International Conference on the Foundations of Software Engineering (FSE), 2025
- Rethinking IDE Customization for Enhanced HAX: A Hyperdimensional Perspective, The 2nd Workshop on Integrated Development Environments (IDE), 2025
- A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics, The International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), 2025
MSc Students

- Nadine Kuo (joint w/ JetBrains & CMU)
- Yash Mudhra (joint w/ ASML)
- Venelina Pocheva (joint w/ NXP)
Visitors
- How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning, IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 2025
Alumni
- A Multi-agent Onboarding Assistant based on Large Language Models, Retrieval Augmented Generation, and Chain-of-Thought, ACM International Conference on the Foundations of Software Engineering (FSE), 2025

Scientific Dev (Sep'23)
Former BSc student
Smart Trigger Models
Next: ML engineer @ JetBrains
- A Transformer-based Approach for Smart Invocation of Automatic Code Completion, The 1st ACM International Conference on AI-powered Software (AIware), 2024
- Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: A Haskell Case Study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Language Models for Code Completion: A Practical Evaluation, IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study, IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023
- Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: A Haskell Case Study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol, under review, 2025
- Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: A Haskell Case Study, The 1st ACM International Conference on AI Foundation Models and Software Engineering (FORGE), 2024
- Beyond Acceptance Rates: The Impact of JetBrains AI Assistant and FLCC, TU Delft - MSc Thesis, 2024
Thesis Supervision
I have (co-)supervised 41 MSc/BSc students, and many of my students have graduated cum laude (top 5% of the class).
MSc Level
# | Year | University | Student | Title |
---|---|---|---|---|
11 | 2025 | TUDelft | R. Popescu | Dataset Development for LLMs4Code: Licensing, Contamination, and Reproducibility Challenges |
10 | 2024 | TUDelft | A.C. Ionescu | Meet Your Onboarding Buddy: A Smart, Adaptive, and Conversational LLM Assistant |
9 | 2024 | TUDelft | R. Schrijver | Beyond Acceptance Rates: The Impact of JetBrains AI Assistant and FLCC |
8 | 2024 | TUDelft | T. van Dam | Black-box Context-Aware Code Completion |
7 | 2024 | TUDelft | P. de Bekker | AI for Software Engineering: Reviewing and Improving Benchmarking Practices |
6 | 2024 | TUDelft | F. van der Heijden | Interactive & Adaptive LLMs: Building and Evaluating an LLM-based Code Completion Plugin |
5 | 2024 | UMB | F. Salerno | Extracting Training Data from Fine-tuned Large Language Models |
4 | 2022 | TUDelft | A. Al-Kaswan | Limits of Binary Code Summarization with Transformers |
3 | 2021 | Sharif | M. Nejati | Missing Software Tag Recommendation |
2 | 2021 | Sharif | P. Rostami | Issue Commit Linking |
1 | 2020 | Sharif | K. Akbari | Issue Report Classification |
BSc Level
# | Year | University | Student | Title |
---|---|---|---|---|
30 | 2024 | TUDelft | B. Koc | Implications of LLMs4Code on Copyright Infringement |
29 | 2024 | TUDelft | P. Deatc | Red Teaming LLMs for Dangerous and Unfair Software Applications |
28 | 2024 | TUDelft | C. Ionescu | Red-Teaming Code LLMs for Malware Generation |
27 | 2024 | TUDelft | F. Ignijic | Evaluating Adaptive Activation Functions in Language Models |
26 | 2024 | TUDelft | Y. Wu | Sparse Transformers are (in)Efficient Learners |
25 | 2024 | TUDelft | R. Mota Borges | Tokenization Matters: Training Your Tokenizer Right |
24 | 2024 | TUDelft | P. Loizides | LLM of Babel: Evaluation of LLMs on Code (Greek Focus) |
23 | 2024 | TUDelft | G. Panchu | LLM of Babel: Java Code Summarization in Dutch |
22 | 2024 | TUDelft | M. Ziemlewski | LLM of Babel: Code Summarization in Polish |
21 | 2024 | TUDelft | S. Vermeulen | Evaluating CodeGemma-7B for Dutch Code Comment Generation |
20 | 2024 | TUDelft | Y. Huang | LLM of Babel: Broader Multilingual Evaluation |
19 | 2024 | TUDelft | I. Vasiliauskas | Detecting Weaknesses in LLM Generated Code |
18 | 2024 | TUDelft | I. Moruz | How Can LLMs Harm Privacy? Red-Teaming Exploration |
17 | 2024 | TUDelft | K. Gulamov | Speed/Quality Trade-offs in Attention Mechanisms |
16 | 2023 | TUDelft | D. Sochirca | Compressing Code Generation Language Models on CPUs |
15 | 2023 | TUDelft | M. Keeler | Cross-Lingual Evaluation of CodeGen in Code Completion |
14 | 2023 | TUDelft | E. Malmsten | Distil-CodeGPT: Distilling Code-Generation Models |
13 | 2023 | TUDelft | M. Storti | Efficient Transformer Quantization for CodeGPT |
12 | 2023 | TUDelft | H. Kuo | Cross-Lingual Performance of CodeGPT in Completion Tasks |
11 | 2023 | TUDelft | E. Malmsten | Distil-CodeGPT: Distilling Code-Generation Models |
10 | 2023 | TUDelft | A. de Moor | Compressing CodeGPT via Layer Reduction and Quantisation |
9 | 2023 | TUDelft | R. Popescu | Common Code Structures Impact on CodeParrot Completion |
8 | 2022 | TUDelft | T. van Dam | Performance Analysis of UniXcoder |
7 | 2022 | TUDelft | F. van der Heijden | Analysis of InCoder on Statement Prediction |
6 | 2022 | TUDelft | M. Turk | Improving Source Code Conversion for Code Completion |
5 | 2022 | TUDelft | J. de Weerdt | User Evaluation of UniXcoder with Statement Completion |
4 | 2022 | TUDelft | M. Otten | User Evaluation of InCoder with Statement Completion |
3 | 2022 | TUDelft | A.C. Ionescu | Repository Recommender System Using Tag Hierarchies |
2 | 2022 | TUDelft | C. Botocan | Duplicate Stack Overflow Detection Using Tags and Text |
1 | 2022 | TUDelft | A. van der Rande | Improving GitHub Tag Recommenders Using Tag Hierarchies |