Talks
Introduction to SQLAlchemy
18-05-2023
During this talk, I discuss ORMs in Python and then turn to SQLAlchemy as a framework. I compare it against its main alternative for SQL Server, PyODBC, and show the advantages of both as well as potential pitfalls and drawbacks. The conclusion of the talk: for most larger or more complex projects, SQLAlchemy is clearly preferable to the alternatives in Python.
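To give a flavour of the ORM style discussed in the talk, here is a minimal sketch. It is not taken from the slides: the `Talk` model, the table name, and the in-memory SQLite database are assumptions chosen so the example runs without a SQL Server instance. It assumes SQLAlchemy 1.4 or newer is installed.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Talk(Base):
    """Hypothetical model used only for this illustration."""

    __tablename__ = "talks"

    id = Column(Integer, primary_key=True)
    title = Column(String(200), nullable=False)


# Assumption: in-memory SQLite stands in for SQL Server here.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Rows are created as Python objects instead of hand-written INSERTs.
    session.add(Talk(title="Introduction to SQLAlchemy"))
    session.commit()

    # Queries are composed from Python expressions rather than SQL strings.
    titles = session.execute(select(Talk.title)).scalars().all()

print(titles)
```

With a plain driver such as PyODBC, the same steps would typically be written as SQL strings passed to a cursor, which is less code for a one-off script but leaves schema and query construction to the developer.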
Diving into NLP Tokenizers - DSFC 2022
April 2022
A deep-dive into NLP tokenizers - their differences, similarities and advantages.
This presentation was prepared for my talk at DSFC 2022. DSFC is a data science conference hosted by the five largest banks in the Netherlands.
The full presentation consists of two sets of slides: one non-interactive and one interactive. The easiest way to view the non-interactive set is to click preview in GitHub. The interactive slides are made in a Jupyter notebook with RISE, which converts a notebook into slides. You can view the notebook without it, but to see it as slides you will need the RISE plugin.
Practical Data Science - tips, tricks and pitfalls
April 2022
This talk, given at an ABN AMRO hackathon to prepare for a Kaggle competition, offers practical tips for data scientists. The tips are aimed at people fairly new to the field, but mid-level data scientists, and hopefully even some seniors, can learn something from them too. The goal is to clarify some unexpected behaviour when working with data and models, and to put a spotlight on some common pitfalls.
It covers six main topics:
- Business tips
- Short tip on models
- Ordinal/Nominal data encodings
- Feature Importance with Trees
- Class Imbalance
- Order of pre-processing
Some highlights:
Effects of imbalanced data sampling techniques with lower amounts of data
Behaviour of KNN with different encodings
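As a rough illustration of why the encoding matters for a distance-based model such as KNN, the sketch below compares pairwise distances between three nominal categories under ordinal and one-hot encoding. The colour categories are a made-up example, not data from the talk.

```python
import math

# Hypothetical nominal feature: the categories have no natural order.
colours = ["red", "green", "blue"]

# Ordinal encoding: each category becomes its index (red=0, green=1, blue=2).
ordinal = {c: [float(i)] for i, c in enumerate(colours)}

# One-hot encoding: each category becomes its own axis.
one_hot = {
    c: [1.0 if i == j else 0.0 for j in range(len(colours))]
    for i, c in enumerate(colours)
}


def pairwise_distances(encoding):
    """Euclidean distance between every pair of distinct categories."""
    return {
        (a, b): math.dist(encoding[a], encoding[b])
        for a in colours
        for b in colours
        if a < b
    }


ordinal_d = pairwise_distances(ordinal)
one_hot_d = pairwise_distances(one_hot)

print(ordinal_d)  # red and blue end up twice as far apart as red and green
print(one_hot_d)  # every pair is equally far apart
```

Under ordinal encoding KNN treats "red" as closer to "green" than to "blue", an ordering that exists only because of the arbitrary index assignment. One-hot encoding removes that artefact at the cost of extra dimensions.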
Code first introduction to Machine Learning
July 2020
Having been an avid participant in Fast.ai, I try to take a code-first approach to most things I'm learning or teaching. In this talk, I presented some of the basics of machine learning to Python developers.
The talk covers machine-learning basics such as:
- Training a simple model
- Influence of hyperparameters on performance
- Under- and overfitting
- Plotting decision trees
- Types of machine learning
- Feature Engineering
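In the same code-first spirit, the sketch below shows how a single hyperparameter can move a model between overfitting and a smoother fit. It is not code from the talk: the toy 1-D dataset, the noise level, and the hand-written k-nearest-neighbours classifier are all assumptions picked to keep the example dependency-free.

```python
import random
from collections import Counter


def make_data(n, rng, noise=0.2):
    """Toy 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        rows.append((x, label))
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of rows in `data` that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)

for k in (1, 15):
    print(f"k={k}: train={accuracy(train, train, k):.2f}, "
          f"test={accuracy(train, test, k):.2f}")
```

With k=1 every training point is its own nearest neighbour, so the model memorises the noisy labels and scores perfectly on the training set. A larger k averages over neighbours and usually narrows the gap between training and test accuracy, which is the under- versus overfitting trade-off in miniature.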
FakeFynder: Deepfake detection for the masses
August 2019
In our submission for the Hackathon for Good, we created a working POC: a website where people can paste YouTube links or upload videos to have them checked for manipulated sections. The deepfake detection uses the model from FaceForensics++, which has around 80% accuracy on a combination of compressed videos, but achieves around 99% accuracy on a single type of compression.
The POC also makes it easy to check whether a video has been seen before, using a searchable database of video hashes.
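The idea of a hash lookup can be sketched as follows. This is an illustration only: the description above does not say which hash function or storage the POC used, so SHA-256 over the raw bytes, the in-memory dictionary, and the names `video_hash` and `SeenVideos` are all assumptions.

```python
import hashlib


def video_hash(data: bytes) -> str:
    """Hex digest of the raw video bytes (assumed hash: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class SeenVideos:
    """Hypothetical in-memory stand-in for a database of video hashes."""

    def __init__(self):
        self._source_by_hash = {}

    def check_and_add(self, data: bytes, source: str):
        """Return the source of an earlier identical upload, or None if new."""
        digest = video_hash(data)
        previous = self._source_by_hash.get(digest)
        if previous is None:
            self._source_by_hash[digest] = source
        return previous


db = SeenVideos()
first = db.check_and_add(b"fake video bytes", "upload-1")
second = db.check_and_add(b"fake video bytes", "upload-2")
print(first, second)
```

A cryptographic hash like this only matches byte-identical files. Recognising re-encoded or trimmed copies would need a perceptual hash instead, which is outside the scope of this sketch.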
On the Generation and Evaluation of Synthetic Tabular Data using GANs - Master Thesis
September 2019
In my thesis, I researched improvements to Generative Adversarial Networks (GANs) that make them better suited to tabular data. In contrast to GANs for vision tasks, GANs for tabular data were, and still are, at an early stage, with only a few researchers advancing the field. Apart from two improvements to the GAN architecture, I also wrote an open-source library that focuses on evaluating synthetic data. The presentation also covers use cases and the value created for companies, specifically ABN AMRO. You can find the GitHub repo, which includes the thesis PDF, at the link above.