CV

Below is a condensed version of my CV. For the full CV, please click the Download PDF button.

Basics

Name: Varun Sampath Kumar
Label: PhD Student
Email: vaku@mmmi.sdu.dk
URL: https://Lordvarun23.github.io
Summary: PhD researcher in Responsible AI, specializing in efficient machine unlearning algorithms for both vision (discriminative and generative) models and large language models (LLMs).

Work

  • 2024.12 - 2025.06
    Software Engineering Intern
    KLA
    Worked on large-scale data management, query optimization, and data ingestion pipelines.
    • Implemented a rule-based maintenance plan for Apache Iceberg tables using PySpark and Nessie GC, enabling efficient data cleanup and optimization.
    • Designed and deployed a FastAPI-based dashboard to visualize real-time Iceberg table metrics and trigger maintenance operations via PyIceberg and PySpark.
    • Optimized an existing clustering algorithm with OpenMP, achieving a 3–4× speedup in execution time, and explored distributed processing with Ray.
    • Generated and ingested 520 billion records to benchmark Trino and DuckDB performance using Ibis, evaluating large-scale query efficiency.
    • Contributed to the DuckDB Iceberg extension, resolving issues with DayTransform partition handling to improve query latency.
  • 2024.05 - 2024.07
    Software Engineering Intern
    KLA
    Worked on database performance optimization, memory management in WPF applications, and LLM-based system log analysis.
    • Optimized large-scale data processing by profiling code, rewriting SQL queries, and replacing legacy DB connectors (SourcePro → Pro*C).
    • Resolved critical WPF memory issues by diagnosing and fixing Out-of-Memory exceptions, improving application stability.
    • Fine-tuned a large language model (LLM) to analyze system metrics and application logs during a hackathon, enabling automated error diagnosis and resolution suggestions.
  • 2023.06 - 2023.11
    Software Engineering Intern
    KLA
    Worked on self-supervised representation learning and model optimization for wafer defect classification pipelines.
    • Developed an image clustering pipeline leveraging self-supervised representation learning techniques.
    • Improved the wafer defect classification pipeline performance by 1.25× through pretraining-based feature enhancement.
    • Accelerated the inference pipeline by 2.5× using model quantization, operator fusion, and pruning techniques.
    • Prototyped a gRPC-based image streaming system that reduced latency by 4× compared to SFTP.

Education

  • 2025.08 - 2028.07

    Odense, Denmark

    PhD
    University of Southern Denmark
    Responsible AI
  • 2020.08 - 2025.06

    Tamil Nadu, India

    Integrated MSc
    PSG College of Technology
    Data Science

Skills

Programming & Tools
Python
C++
SQL
PyTorch
TensorFlow
Keras
scikit-learn
NumPy
pandas
spaCy
NLTK
DuckDB
PySpark
Docker
Flask
Postman
Git