Master Thesis Student / Software Developer
SAP Innovation Center Potsdam
Total years of experience: 9 years, 2 months
Wrote a Master's thesis on applying ensemble machine learning techniques in the SAP HANA database to analyze large biological datasets. The first part of the thesis set up a fully working proteomics analysis pipeline to preprocess mass spectrometry data from cancer patient and healthy control groups. This was done by porting previously implemented C++ algorithms linked against the HANA database to the SAP Application Function Library (AFL) and setting up the pipeline via SQLScript procedures. In the second part, ensemble machine learning algorithms (namely Bagging and AdaBoost) were implemented as SAP SQLScript procedures using the SAP Predictive Analysis Library. Finally, the two parts were combined by feeding the results of the proteomics pipeline into the machine learning procedures and performing statistical analysis on the results.
Responsible for extending the SAP HANA database with domain-specific functionality for analyzing mass spectrometry (MS) data.
Skills acquired:
- C++ database extensions for
-> morphological filters, i.e. erosion/dilation
-> feature detection
-> feature selection
-> clustering
- SQLScript procedures to call C++ implementations
Analyzed the suitability of the SAP HANA in-memory database for efficient storage and retrieval of large biological datasets, with a focus on mass spectrometry (MS) data.
Skills acquired and responsibilities:
- designing the database data layout
- implementing procedures for uploading and accessing data in SQL/SQLScript
- implementing filtering functionality in the SAP database language L
Developed parts of the backend and graphical user interface of a Java-based software framework for analyzing biological mass data on cloud and/or grid systems.
Skills acquired:
- Agile software development
- Subversion
- Java
- Maven
- Eclipse plug-ins
- OpenGL / JOGL
Evaluated Java frameworks for flexible data storage and access via Java Content Repositories, as the basis for a prototype platform for developing algorithms for effective storage and analysis of large datasets. Implemented proof-of-concept (PoC) platforms.
Skills acquired / frameworks used:
- Spring
- Apache Jackrabbit
- JBoss ModeShape
- Subversion
- Maven
- Eclipse SDK
Responsible for the implementation of a Java-based educational multi-player online game.
Skills acquired / frameworks used:
- Apache Maven
- PulpCore Java API
- Sun Labs' Project Darkstar
Investigated oxygen transport in proteins using Molecular Dynamics (MD) Simulations.
Skills acquired:
- Visualization of MD Simulations (using VMD - Visual Molecular Dynamics)
- Tcl scripting for display and video generation
- MD Simulations (using NAMD - Scalable Molecular Dynamics)
- Usage of Supercomputers (HLRN)
- Cluster Computing
- C/C++ Programming
- Bash scripting