Article
26 November 2020

Explore your data in depth using hosted Python notebooks

How can you explore data easily?

If you’re looking for a tool to load, manipulate, and explore your data, there’s a good chance you have already encountered Jupyter Notebook, whether you’re a data scientist, a data engineer, or a domain expert with basic coding skills.

Jupyter Notebook is an open-source web environment that lets you write and execute code in several languages (Python, Julia, R…), display results as interactive visualizations, and document your work using Markdown. A notebook can easily be exported to PDF or HTML and sent to collaborators by email, but a notebook server is also available, which makes collaboration simpler since several users can work on the same shared notebook.

Adoption of this tool has grown steadily over the past few years: universities now routinely have students use notebooks in computer science courses, and it has become a standard feature of many IT platforms. Many engineers start by installing it locally on their laptops, see its potential, and would like to see it used more widely in their organizations.

Meanwhile, we know that our platform is great if you want to model a problem with complex event processing rules, if you need to search for a precise moment in your data, or if you want to explore your data by drilling down on a dashboard. We also know it can be improved when it comes to building a fleet model, applying machine learning, or processing a large dataset in memory.

Notebook server available for everyone

Based on these observations, the latest version of InUse now includes, natively and at no extra cost, a hosted notebook server for all users who have access to the Studio (i.e. https://studio.{yourcode}.productinuse.com). It is accessible from the Explore section of the left menu.

How does it work?

The notebook server works like a virtual machine on which you can upload files or start a Python notebook, as in the animated image below.

All files are accessible to any other member of your instance with access to the Studio, which is very convenient when you want to share data with colleagues. The notebooks you create are also automatically accessible to others: they can read and execute the code inside, so you can work collaboratively on the same data analysis.

Leveraging the python data stack

Among other languages, the notebook supports Python and the data science libraries that come with it. A Python notebook is a very handy option when you need to:

  • Do some data cleansing on a dataset.
  • Do some data exploration on a dataset.
  • Build and explain a predictive (statistical/machine learning/deep learning) model.

By default, a Python 3.8 environment is installed with the following libraries: ipywidgets, pandas, matplotlib, numpy, scipy, sklearn, seaborn, and statsmodels. This setup should be a good starting point for most of your analyses, and other libraries can easily be added in the future if needed.
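As a taste of the first two tasks above, here is a minimal data-cleansing pass with pandas and numpy; the column names and value ranges are illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative sensor readings with two common quality issues:
# a missing value and a physically implausible outlier (180.0).
df = pd.DataFrame({
    "timestamp": pd.date_range("2020-11-01", periods=6, freq="h"),
    "temperature": [21.5, np.nan, 22.1, 22.3, 180.0, 22.0],
})

# Mask readings outside a plausible physical range...
df.loc[~df["temperature"].between(-40, 120), "temperature"] = np.nan

# ...then fill the gaps by linear interpolation.
df["temperature"] = df["temperature"].interpolate()

print(df["temperature"].tolist())
```

The same DataFrame can then be summarized with `df.describe()` or plotted directly, without leaving the notebook.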

Fully integrated in our solution

This feature is fully integrated into our solution, which is why we’ve added our own Python client library to interact with our API. It will help you to:

  • Fetch acquired data.
  • Automate administration of organizations & assets.
  • Prototype models you can easily put back in your engine later on.
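The client library itself ships on the notebook server, so to give a flavor of the first bullet here is only a hypothetical sketch: the function, endpoint path, parameter names, and base URL below are illustrative, not the library’s real API surface.

```python
# Hypothetical base URL -- the real one depends on your instance.
API_BASE = "https://studio.example.productinuse.com/api"

def fetch_acquired_data(session, asset_id, start, end):
    """Fetch acquired time-series data for one asset over a period.

    `session` is any object with a requests-style .get() method,
    e.g. an authenticated requests.Session.
    """
    resp = session.get(
        f"{API_BASE}/assets/{asset_id}/data",
        params={"from": start, "to": end},
    )
    resp.raise_for_status()
    return resp.json()

# In a notebook you would authenticate once and reuse the session:
# import requests
# session = requests.Session()
# session.headers["Authorization"] = "Bearer <your token>"
# rows = fetch_acquired_data(session, "pump-1", "2020-11-01", "2020-11-02")
```

Check the examples directory on your server for the real method names and authentication flow.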

Is it secure?

As usual, we are very careful about security, especially with this feature, since the notebook server acts as a virtual machine on which you have broad permissions. You can upload, view, and edit any file on the virtual machine, but the machine sits in an isolated virtual network with very limited access to the rest of our infrastructure.

We expose only read-only access to the Elasticsearch instance that stores the time-series data. You’ll find below an example of how to make authenticated queries to Elasticsearch and fetch live data from your instance.
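A minimal sketch of such a query with the elasticsearch-py client; the hostname, index pattern, and credentials are placeholders for the read-only ones provided on your server:

```python
def last_hour_query(field="@timestamp"):
    """Build a range query matching documents acquired in the last hour."""
    return {"query": {"range": {field: {"gte": "now-1h"}}}, "size": 10}

# In a notebook cell, the query can then be run against the cluster
# (host, index pattern and credentials below are illustrative):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(["https://elasticsearch.internal:9200"],
#                    http_auth=("readonly_user", "secret"))
# resp = es.search(index="timeseries-*", body=last_hour_query())
# for hit in resp["hits"]["hits"]:
#     print(hit["_source"])
```

Because the credentials are read-only, queries like this can explore live data without any risk of modifying it.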

This looks great, but I don’t know how to start…

There are many online resources for getting started: DataCamp, OpenClassrooms, Towards Data Science. In addition, we’ve added our own examples, which range from simple CSV loading to complex data modeling applied to industrial use cases. They are available in the examples directory located at the root of your server.

Some explanatory examples

Load a CSV file

The first example consists of loading a CSV file and plotting some data. The illustration below shows how easily this can be done with the pandas and matplotlib libraries.
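In code, the same steps look roughly like this; an inline CSV stands in for a file you would upload to the server, and the column names are made up:

```python
import io
import pandas as pd

# Inline CSV for a self-contained example; in practice you would
# point pd.read_csv at a file uploaded to the notebook server,
# e.g. pd.read_csv("measurements.csv").
csv_data = io.StringIO(
    "timestamp,flow_rate\n"
    "2020-11-01 00:00,4.2\n"
    "2020-11-01 01:00,4.5\n"
    "2020-11-01 02:00,4.1\n"
)
df = pd.read_csv(csv_data, parse_dates=["timestamp"])
print(df["flow_rate"].mean())

# Plotting is then a one-liner, with matplotlib behind pandas:
# df.plot(x="timestamp", y="flow_rate")
```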

Fetch live data from your models

We know that the sandbox is not suited to every use case, in particular when you want to test a model on a dataset with more than 50k rows (the sandbox’s upper limit). This is why we expose our own Python client library, which provides methods to securely fetch data from our Elasticsearch cluster.
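The library’s exact methods aren’t reproduced here, but the general pattern for assembling a dataset larger than any single request allows is plain pagination. A generic sketch, where `fetch_page` is a hypothetical stand-in for whichever paged fetch method the client exposes:

```python
def fetch_in_pages(fetch_page, page_size=10_000):
    """Yield every row from a source that caps each request at
    `page_size` rows, paging until a short page signals the end.

    `fetch_page(offset, limit)` is a placeholder for the client
    library's paged fetch method.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

Because it is a generator, rows can be streamed into a DataFrame or processed incrementally without holding every page in memory at once.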

Using exploration tools

The notebook also comes with the altair_widgets library, which lets you do point-and-click data exploration on a DataFrame, as the animated GIF below illustrates.

What’s next?

To sum up, the notebook is likely to be useful at at least two moments:

  • At the beginning: when you start exploring a newly acquired dataset (either from our acquisition system or another one).
  • To improve existing models: once you have implemented your first models using our complex event processing engine and your prior knowledge, you may want to improve their accuracy.

At InUse, we believe that a successful IoT project requires several tools for the several steps of the project, and our mission is to provide you with a fully integrated IoT solution that turns raw data into business applications and improves your team’s efficiency. Notebooks are one of those tools, and we’ll be pleased to help you use them in your project.


Guillaume Thomas
InUse, CTO
© OptimData 2020    |     Privacy policy - Terms of use - Cookies policy - Legal Notices