cv
Below is a condensed version of my CV. For the full CV, please click the Download PDF button.
Basics
| Name | Varun Sampath Kumar |
| Label | PhD Student |
| Email | vaku@mmmi.sdu.dk |
| Url | https://Lordvarun23.github.io |
| Summary | PhD researcher in Responsible AI, specializing in efficient machine unlearning algorithms for both vision (discriminative and generative) models and large language models (LLMs). |
Work
-
2024.12 - 2025.06 Software Engineering Intern
KLA
Worked on large-scale data management, query optimization, and data ingestion pipelines.
- Implemented a rule-based maintenance plan for Apache Iceberg tables using PySpark and Nessie GC, enabling efficient data cleanup and optimization.
- Designed and deployed a FastAPI-based dashboard to visualize real-time Iceberg table metrics and trigger maintenance operations via PyIceberg and PySpark.
- Optimized an existing clustering algorithm, achieving a 3–4× execution time improvement using OpenMP, and explored distributed processing with Ray.
- Generated and ingested 520 billion records to benchmark Trino and DuckDB performance using Ibis, evaluating large-scale query efficiency.
- Contributed to the DuckDB Iceberg extension, resolving issues with DayTransform partition handling to reduce query latency.
-
2024.05 - 2024.07 Software Engineering Intern
KLA
Worked on database performance optimization, memory management in WPF applications, and LLM-based system log analysis.
- Optimized large-scale data processing by profiling code, rewriting SQL queries, and replacing legacy DB connectors (SourcePro → Pro*C).
- Resolved critical WPF memory issues by diagnosing and fixing Out-of-Memory exceptions, improving application stability.
- Fine-tuned a large language model (LLM) to analyze system metrics and application logs during a hackathon, enabling automated error diagnosis and resolution suggestions.
-
2023.06 - 2023.11 Software Engineering Intern
KLA
Worked on self-supervised representation learning and model optimization for wafer defect classification pipelines.
- Developed an image clustering pipeline leveraging self-supervised representation learning techniques.
- Improved the wafer defect classification pipeline performance by 1.25× through pretraining-based feature enhancement.
- Accelerated the inference pipeline by 2.5× using model quantization, operator fusion, and pruning techniques.
- Prototyped a gRPC-based image streaming system that reduced latency by 4× compared to SFTP.
Education
-
2025.08 - 2028.07 Odense, Denmark
-
2020.08 - 2025.06 Tamil Nadu, India
Skills
| Programming & Tools | |
| Python | |
| C++ | |
| SQL | |
| PyTorch | |
| TensorFlow | |
| Keras | |
| scikit-learn | |
| NumPy | |
| Pandas | |
| spaCy | |
| NLTK | |
| DuckDB | |
| PySpark | |
| Docker | |
| Flask | |
| Postman | |
| Git | |