CV | Varun Sampath Kumar

Basics

Name	Varun Sampath Kumar
Label	PhD Student
Email	vaku@mmmi.sdu.dk
Url	https://Lordvarun23.github.io
Summary	PhD researcher in Responsible AI, specializing in efficient machine unlearning algorithms for both vision (discriminative and generative) models and large language models (LLMs).

Work

2024.12 - 2025.06
Software Engineering Intern

KLA

Worked on large-scale data management, query optimization, and data ingestion pipelines.
- Implemented a rule-based maintenance plan for Apache Iceberg tables using PySpark and Nessie GC, enabling efficient data cleanup and optimization.
- Designed and deployed a FastAPI-based dashboard to visualize real-time Iceberg table metrics and trigger maintenance operations via PyIceberg and PySpark.
- Optimized an existing clustering algorithm, achieving a 3–4× execution time improvement using OpenMP, and explored distributed processing with Ray.
- Generated and ingested 520 billion records to benchmark Trino and DuckDB performance using Ibis, evaluating large-scale query efficiency.
- Contributed to the DuckDB Iceberg extension, resolving issues with DayTransform partition handling to improve query latency.
2024.05 - 2024.07
Software Engineering Intern

KLA

Worked on database performance optimization, memory management in WPF applications, and LLM-based system log analysis.
- Optimized large-scale data processing by profiling code, rewriting SQL queries, and replacing legacy DB connectors (SourcePro → Pro*C).
- Resolved critical WPF memory issues by diagnosing and fixing Out-of-Memory exceptions, improving application stability.
- Fine-tuned a large language model (LLM) to analyze system metrics and application logs during a hackathon, enabling automated error diagnosis and resolution suggestions.
2023.06 - 2023.11
Software Engineering Intern

KLA

Worked on self-supervised representation learning and model optimization for wafer defect classification pipelines.
- Developed an image clustering pipeline leveraging self-supervised representation learning techniques.
- Improved the wafer defect classification pipeline performance by 1.25× through pretraining-based feature enhancement.
- Accelerated the inference pipeline by 2.5× using model quantization, operator fusion, and pruning techniques.
- Prototyped a gRPC-based image streaming system that reduced latency by 4× compared to SFTP.

Education

2025.08 - 2028.07

Odense, Denmark
PhD

University of Southern Denmark

Responsible AI
2020.08 - 2025.06

Tamil Nadu, India
Integrated MSc

PSG College of Technology

Data Science

Skills

	Programming & Tools
	Python
	C++
	SQL
	PyTorch
	TensorFlow
	Keras
	scikit-learn
	NumPy
	Pandas
	spaCy
	NLTK
	DuckDB
	PySpark
	Docker
	Flask
	Postman
	Git
