Badr Tajini, a former student of the Applied MSc in Data Science and AI, describes what he did during his 6-month internship at EDF. Today he is a first-year PhD student in Deep Learning and Digital Security at EURECOM Engineering School.
The main objective of my end-of-studies project was to be at the heart of R&D innovation by participating in several transverse use cases dealing with Machine Learning, Deep Learning, Big Data architecture, Embedded Systems, and even Business Applications. During the internship, I also studied their state of the art, the underlying technologies, and their application in a test/pre-production environment.
In the first part of my internship, we saw that the choice of distribution was a real challenge, because it determines the long-term durability of the platform. We chose the Hortonworks distribution because it is the only one that is entirely free and open source while offering quality documentation and support. At Hortonworks, as elsewhere, there is a multitude of services; you need to understand their roles and how they work within Hadoop to ensure a stable system. Governance and data security are often overlooked in Hadoop deployments. In the R&D department at EDF, we used Kerberos to secure the platform, but it adds significant complexity.
Finally, we saw how to create a deep learning environment from scratch with an Nvidia GTX 1080 graphics card. We also saw the usefulness of a local Docker registry, which gives us an agile and secure ecosystem for experimenting with different approaches, technologies, and algorithms across projects.
In the second part of my internship, we saw how the lack of data in the current state of EDF was solved by developing a unique, personalized data generator to augment the data for training a neural network. We therefore developed and industrialized a custom Docker image for this generator, which was built specifically for this type of problem using electrical data. The generator's output is used to train our neural network.
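The generator itself is internal to EDF, but the idea can be sketched in a few lines of numpy: superimpose rectangular "appliance" activations onto a slowly varying base load and add sensor noise. Everything here (the wattages, the one-minute sampling, the function name) is a hypothetical stand-in, not the internship code:

```python
import numpy as np

def generate_load(n_samples=1440, seed=0):
    """Generate one day of synthetic aggregate power readings (1 sample/min).

    Hypothetical stand-in for the internship's generator: a sinusoidal
    base load, plus rectangular appliance activations, plus noise.
    """
    rng = np.random.default_rng(seed)
    base = 200 + 50 * np.sin(np.linspace(0, 2 * np.pi, n_samples))  # watts
    signal = base.copy()
    appliances = []
    for power in (1500, 800, 300):  # assumed appliance wattages
        start = rng.integers(0, n_samples - 120)
        duration = rng.integers(30, 120)
        activation = np.zeros(n_samples)
        activation[start:start + duration] = power
        appliances.append(activation)
        signal += activation
    signal += rng.normal(0, 5, n_samples)  # measurement noise
    return signal, np.stack(appliances)

aggregate, per_appliance = generate_load()
print(aggregate.shape, per_appliance.shape)  # (1440,) (3, 1440)
```

Keeping the per-appliance signals alongside the aggregate is what makes the output usable as labeled training data for the network.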
Our choice of neural networks focused on Convolutional Neural Networks and Stacked Denoising Autoencoders. These two architectures were the most appropriate for the problem of "energy disaggregation", in which the aggregate energy consumption signal is decomposed into the consumption of individual appliances.
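The core idea of a denoising autoencoder, and the reason it suits disaggregation, is that it learns to reconstruct a clean target from a corrupted input. Here is a minimal single-layer numpy sketch of that principle (not the internship model, which was a deeper stacked network): corrupt each training window, then reconstruct the clean version.

```python
import numpy as np

def train_dae(X, hidden=16, noise_std=0.1, lr=0.05, epochs=300, seed=0):
    """Train one denoising-autoencoder layer on windows X of shape (n, d).

    Minimal sketch: tanh encoder, linear decoder, full-batch gradient
    descent on mean squared reconstruction error of the CLEAN input.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X + rng.normal(0, noise_std, X.shape)  # corrupt the input
        H = np.tanh(X_noisy @ W1 + b1)                   # encode
        X_hat = H @ W2 + b2                              # decode
        err = X_hat - X                                  # clean reconstruction target
        # backpropagate the squared error
        dW2 = H.T @ err / n
        db2 = err.mean(axis=0)
        dH = err @ W2.T * (1 - H ** 2)
        dW1 = X_noisy.T @ dH / n
        db1 = dH.mean(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

# toy windows: sinusoids of varying phase, standing in for load windows
t = np.linspace(0, 2 * np.pi, 32)
X = np.stack([np.sin(t + p) for p in np.linspace(0, np.pi, 64)])
W1, b1, W2, b2 = train_dae(X)
```

Stacking such layers, and feeding aggregate windows while targeting a single appliance's signal instead of the clean input, turns the same mechanism into a disaggregator.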
We also proposed a perspective to develop in the near future: industrializing an architecture that would allow us to embed our neural network in a Raspberry Pi in order to detect and predict the energy consumption of a household, bill consumption in real time, perform predictive maintenance of home appliances, and build a smartphone application that visualizes the consumption results.
In the third part of my internship, we built a machine learning model for predictive maintenance, but the most important part was tuning our hyperparameters using different algorithms such as Grid Search, Random Search, Dask-SearchCV, and Bayesian Optimization, in a distributed and parallel environment.
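To make the contrast between the first two of those strategies concrete, here is a small self-contained sketch. The `objective` function is a cheap hypothetical stand-in for "train the predictive-maintenance model and return its validation error"; grid search scores every combination of a fixed grid, while random search samples the space (learning rate on a log scale, as is common practice):

```python
import itertools
import random

def objective(lr, hidden):
    """Hypothetical stand-in for a model's validation error, with its
    minimum at lr=0.01, hidden=64 (the real objective would train a model)."""
    return (lr - 0.01) ** 2 * 1e4 + (hidden - 64) ** 2 / 1e3

def grid_search(lrs, hiddens):
    # exhaustively score every combination of the grid
    return min(itertools.product(lrs, hiddens), key=lambda p: objective(*p))

def random_search(n_trials=100, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)   # sample learning rate on a log scale
        hidden = rng.randrange(8, 256)
        score = objective(lr, hidden)
        if score < best_score:
            best, best_score = (lr, hidden), score
    return best

print(grid_search([1e-3, 1e-2, 1e-1], [32, 64, 128]))  # (0.01, 64)
```

Because every trial is independent of the others, both strategies parallelize trivially, which is exactly what makes them a good fit for a distributed scheduler.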
We used Dask for this problem. Dask supports TensorFlow in a few ways, and we found it convenient that the two played nicely together: Dask supported TensorFlow without getting in the way. We also saw how to couple Dask-MPI and HPC with random datasets.
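The Dask code itself is not reproduced here, but the fan-out/fan-in pattern it enables can be sketched with the standard library's `concurrent.futures`, used below as an explicit stand-in: score many candidate configurations in parallel, then reduce to the best one. Dask's distributed scheduler applies the same pattern across a cluster (e.g. with `Client.map` from `dask.distributed`). The `score` function is hypothetical; the real task would train a TensorFlow model:

```python
from concurrent.futures import ThreadPoolExecutor
import random

def score(params):
    """Hypothetical stand-in for training one configuration and
    returning its validation error."""
    lr, hidden = params
    return (lr - 0.01) ** 2 * 1e4 + (hidden - 64) ** 2 / 1e3

rng = random.Random(42)
candidates = [(10 ** rng.uniform(-4, -1), rng.randrange(8, 256))
              for _ in range(32)]

# Fan the independent trials out across workers, then reduce to the best.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(score, candidates))
best = min(zip(scores, candidates))[1]
print(best)
```

Swapping the executor for a Dask client is what lets the same search scale from one machine to an HPC cluster.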
In the last part of my internship, we proposed a proof of concept on meteorological data. The main objective was to use a recurrent neural network (a Long Short-Term Memory network, or LSTM) to predict the weather. The data source is a stream retrieved in real time from sensors, and we used Spark to manage that sensor stream.
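A step this pipeline needs, whatever the framework, is turning the 1-D sensor stream into the `(samples, timesteps, features)` windows an LSTM consumes, with the next reading as the target. Here is a minimal numpy sketch of that windowing (the hourly-temperature series is invented for illustration; the actual pipeline ingested the stream with Spark before batching):

```python
import numpy as np

def make_windows(series, timesteps=24):
    """Slice a 1-D sensor stream into (samples, timesteps, 1) LSTM inputs
    and next-step targets."""
    X = np.stack([series[i:i + timesteps]
                  for i in range(len(series) - timesteps)])
    y = series[timesteps:]
    return X[..., None], y

# hypothetical two weeks of hourly temperature readings
hours = np.arange(24 * 14)
temps = (15 + 8 * np.sin(2 * np.pi * hours / 24)
         + np.random.default_rng(1).normal(0, 0.5, hours.size))
X, y = make_windows(temps)
print(X.shape, y.shape)  # (312, 24, 1) (312,)
```

Each row of `X` holds the 24 readings preceding its target in `y`, so the network learns to predict the next hour from the previous day.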