Projects
In the last two years, I have investigated a broad range of topics within the field of AI.
Building on my PhD work on Active Learning with object images in human-robot teaching scenarios, I delved deeper into explainable Object Detection, focusing on visualizing the intermediate processing, receptive fields, and trained features of single-stage object detectors from the YOLO family.
With the rise of Large Language Models (LLMs) and advances like ChatGPT, the advantages of these models for Natural Language Processing, as well as for other modalities, have become evident. This shift led me to focus my research on Transformer methods, the foundation of these LLMs.
In the domain of Vision-Language Models (VLMs), I found that a combination of two text-promptable VLMs can serve as a highly flexible perception framework on drones. Specifically, I applied YOLO World for detecting people and GPT-4V for classifying their actions, experimenting with various configurations and prompting techniques to make the knowledge of these models accessible. Currently, I am continuing this work by investigating the trustworthiness of the explanations GPT-4V provides and how they relate to classification accuracy.
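To give an idea of the prompting side of such a pipeline, here is a minimal sketch of how detections from the open-vocabulary detector could be turned into an action-classification query for the VLM. The action labels and message wording are hypothetical, not the exact setup used in the project:

```python
# Sketch: compose an action-classification prompt from person detections.
# The action list and phrasing are illustrative assumptions.

ACTIONS = ["walking", "running", "waving", "standing"]  # assumed label set

def build_action_prompt(detections, actions=ACTIONS):
    """detections: list of dicts with a 'box' entry (x1, y1, x2, y2)."""
    lines = ["You see an aerial image with the following detected people:"]
    for i, det in enumerate(detections):
        x1, y1, x2, y2 = det["box"]
        lines.append(f"  person {i}: bounding box ({x1}, {y1}, {x2}, {y2})")
    lines.append(
        "For each person, answer with exactly one action from: "
        + ", ".join(actions) + "."
    )
    return "\n".join(lines)

prompt = build_action_prompt(
    [{"box": (10, 20, 60, 120)},
     {"box": (200, 40, 260, 150)}]
)
print(prompt)
```

The prompt text, together with the image (or person crops), would then be sent to the VLM; constraining the answer to a fixed label set is one of the prompting techniques that makes the free-form model usable for classification.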
Beyond vision, I ventured into generative AI for audio, where I train a 2D latent audio map with a Variational Autoencoder that is then used to condition a GPT-2 model. Because the map compresses the manifold of the audio training set into two dimensions, selecting points on it enables a more natural conditioning of the Transformer model.
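The conditioning step can be sketched as follows: a point picked on the 2D map is projected into the Transformer's embedding space and prepended as a prefix to the audio-token embeddings. This is a simplified stand-in (random projection, small dimensions), not the trained model:

```python
import numpy as np

# Sketch: condition a Transformer on a selected 2D latent point by
# projecting it into embedding space and prepending it as a prefix.

rng = np.random.default_rng(0)
d_model = 16                       # assumed embedding width of the model
W = rng.normal(size=(2, d_model))  # learned projection (random stand-in here)

def latent_prefix(point_2d):
    """Map a selected (x, y) point on the audio map to a prefix embedding."""
    return np.asarray(point_2d) @ W           # shape: (d_model,)

def condition_sequence(point_2d, token_embeddings):
    """Prepend the latent prefix to the token embedding sequence."""
    prefix = latent_prefix(point_2d)[None, :]
    return np.concatenate([prefix, token_embeddings], axis=0)

tokens = rng.normal(size=(8, d_model))        # stand-in audio-token embeddings
seq = condition_sequence((0.3, -1.2), tokens)
print(seq.shape)                              # one extra prefix position
```

Because the prefix attends like any other position, the generated audio tokens can depend smoothly on where the user clicks on the map.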
Please navigate to the Publications sub-page to find the work mentioned above.
Other Projects
ChordClash
Combining my interests in programming, signal processing, and music production, I am currently implementing a toolbox for musicians. This is a long-term project that I will update continuously.
ALeFra – Active Learning Framework
ALeFra converts any classifier (online or offline) into an actively trainable one. The classifier can be trained easily, and the training progress can be visualized in different ways to better understand what is going on. Please check the GitHub page for full usage examples and interactive result plots.
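As a generic illustration of what "actively trainable" means (this is pool-based uncertainty sampling with a scikit-learn classifier, not ALeFra's actual API), the training loop repeatedly queries the label of the sample the classifier is least certain about:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of pool-based active learning: query the least-confident pool
# sample, add its label, retrain. Data and oracle are synthetic.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic ground-truth "oracle"

# small initial labeled set containing both classes
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]
clf = LogisticRegression()

for _ in range(20):                           # 20 labeling rounds
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    # least-confident sample: its maximum class probability is smallest
    query = pool[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)                     # "ask the oracle" for its label
    pool.remove(query)

accuracy = clf.score(X, y)
```

Plotting which samples get queried over the rounds is the kind of training-progress visualization the framework is meant to make easy.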
OpenSCAD Polygon Editor
I created this web app for creating and editing 2D polygons, which can be exported to and imported from OpenSCAD code. A variety of tools is available for manipulating polygons, e.g., subdividing them or drawing them using images as masks.
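The export step boils down to rendering a point list as OpenSCAD source. A minimal sketch (the editor's actual serialization may differ in details):

```python
# Sketch: serialize a 2D polygon as an OpenSCAD polygon() statement.
# The real editor also parses such code back into editable points.

def to_openscad(points):
    pts = ", ".join(f"[{x}, {y}]" for x, y in points)
    return f"polygon(points=[{pts}]);"

triangle = [(0, 0), (10, 0), (5, 8)]
scad = to_openscad(triangle)
print(scad)  # polygon(points=[[0, 0], [10, 0], [5, 8]]);
```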
A2VQ Visualization Label Interface (demonstration video)
One sub-goal of my PhD was to enable intuitive and effective cooperation between human and robot. In an object-teaching setting, this calls for a powerful labeling interface. We found that current dimension-reduction techniques like t-SNE or UMAP can be used to create high-quality 2D embeddings of image samples, which can then be visualized as thumbnails for the user to label. I implemented a flexible web-based frontend with the d3 library for comfortably labeling image samples and for animating the querying method proposed in the paper.
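The embedding step behind the interface can be sketched in a few lines: high-dimensional image features are reduced to 2D, and each 2D coordinate is where that sample's thumbnail is drawn for labeling. The features below are random stand-ins, and the parameters are illustrative, not the values used in the paper:

```python
import numpy as np
from sklearn.manifold import TSNE

# Sketch: reduce image features to 2D thumbnail positions with t-SNE.
# Random features stand in for real image descriptors here.

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 128))  # e.g. 60 samples, 128-d features

coords = TSNE(n_components=2, perplexity=10,
              random_state=0).fit_transform(features)
print(coords.shape)
```

The resulting `coords` array is what the d3 frontend would consume to place one thumbnail per sample on the labeling canvas.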