A couple of weeks ago, a few members of our team had the pleasure of participating in PyData Warsaw 2017, held at the Copernicus Science Center in Warsaw on October 19-20, 2017.

PyData conferences bring together users and developers of data analysis tools so they can meet, share ideas, and learn from one another. This was the first edition of the event, and it was already a huge success! The community plans to gather every year to discuss applications of Python tools (plus tools using R and Julia) and to tackle challenges in data management, processing, analytics, and visualization.

Here’s what this year’s PyData conference looked like.

Upon Arrival at PyData Warsaw 2017

PyData attracted over 300 participants, and it’s clear that the number was much higher than expected by the organizers – which is a great thing! The location was just excellent. The Copernicus Science Center is located in the central part of the city, but somehow you don’t feel that annoying city buzz – many people we talked to really liked the venue.

A minor organizational glitch was the division of the auditorium space. The conference was divided into 3 streams, and 2 of them always got only about 1/4 of the conference room. As a result, participants often had a hard time finding a seat in these streams.

Keynote Lectures

The first keynote speaker was Jarek Kuśmierek, a Senior Engineering Manager at Google. He talked about the current revolutions in data science, especially in machine learning. He used many examples to show how Google applies machine learning – for example, they used it in the network that operates the ventilation system at their data centers, which led to 40% savings in energy use.


He also presented Google’s API for machine learning. A developer who has never worked with machine learning before will be able to create an application that can recognize human speech or classify images automatically and tag them. All in all, it was hard not to agree with Jarek – we are right in the middle of this revolution.

The second keynote lecture was by Radim Řehůřek, the creator of Gensim and a machine learning consultant. Radim’s fascinating talk was about interpretable data models. As machine learning algorithms become more popular and advanced, we are beginning to lose our understanding and control of them. We increasingly treat these algorithms like black boxes: we just feed them data and they give us a result, without us knowing exactly how the program arrived at that conclusion. A potential solution to that problem lies in building and using tools that help developers understand how a given neural network works. We should also try to be more responsible about using machine learning and stop feeding algorithms with vast amounts of data without a second thought.

Radim said we should try to come up with the result on our own first. He suggested that we tend to trust in machine learning a little too much today – and we couldn’t agree more.

Other Interesting Talks

In general, we saw a lot of talks about Natural Language Processing (NLP), and it’s clear that there is much work still to be done in this area. We’re talking primarily about the Slavic language family, which includes fusional languages – unlike English, languages like Polish or Russian have more complex rules and contain many ambiguities. They also have a much smaller community around them and are hard to process.

We learned about tools that could be useful in our projects at Sunscrapers: word2vec, GloVe, and fastText, to name just a few.

Here are 3 talks we found particularly interesting:

Szymon Warda offered an interesting review of alternatives to databases and told us which types of databases are most useful for specific applications. Fun fact: Apache Accumulo is a database that was created for security purposes by none other than the NSA…

Another interesting talk was given by Kornel Lewandowski, who looked at personal data security in medical documentation. Analyzing medical documentation can be very productive – however, these documents often contain plenty of personal data that needs to be removed before analysis. Kornel showed us various techniques for identifying personal data, for example regular expressions, dictionaries, rule-based methods, and machine-learning-based named entity recognizers. We got a view of the entire workflow responsible for that type of function, as well as the architecture.
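To give a feel for the two simplest techniques on that list, here is a minimal Python sketch of regular-expression and dictionary-based redaction. It is our own illustration, not Kornel’s implementation: the patterns, the tiny name dictionary, and the `redact` function are all hypothetical, and real medical records would need far more careful rules.

```python
import re

# Hypothetical patterns, chosen only for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NATIONAL_ID_RE = re.compile(r"\b\d{11}\b")  # assumes an 11-digit ID number
PHONE_RE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")  # assumes 9-digit phones

# Toy dictionary of first names; a real system would use a much larger list.
KNOWN_NAMES = {"Anna", "Jan", "Maria"}


def redact(text):
    """Replace e-mails, ID numbers, phone numbers, and known names with tags."""
    # Regular-expression pass: mask values that follow a fixed format.
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = NATIONAL_ID_RE.sub("[ID]", text)
    text = PHONE_RE.sub("[PHONE]", text)

    # Dictionary pass: mask any whole word found in the name list.
    def replace_name(match):
        word = match.group(0)
        return "[NAME]" if word in KNOWN_NAMES else word

    return re.sub(r"\b\w+\b", replace_name, text)


sample = "Patient Anna, tel. 601 234 567, mail anna.k@example.com, ID 44051401359."
print(redact(sample))
```

A sketch like this only catches values with a fixed format or names already in the dictionary, which is presumably why rule-based methods and named entity recognizers appear on the list as well.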

The last talk that made a great impression on us was “Despicable machines: how computers can be assholes” by Maciek Gryka. As you can tell from the title, the talk was dedicated to a phenomenon known as machine bias. Machine learning algorithms that analyze data about humans can easily learn behaviors that were not intended by developers. For example, there is an algorithm for calculating the likelihood that a person will commit a crime in the future, and it learned to take into account factors like skin color or facial expression. We don’t have an easy solution for this controversial issue yet, though interpretability might be one.

To put it simply, models that serve to describe and judge humans need to be understandable to us. We should know exactly how they work and why they deliver specific results. With that knowledge, we will be able to modify these models to avoid machine bias. You can see Maciek’s talk on the topic here.
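As a rough illustration of what “understandable” can mean in practice, here is a toy Python sketch of a linear score that reports how much each input contributed to the result. It is our own example, not something shown in the talk: the feature names, the weights, and the `explain` function are hypothetical and do not come from any real system.

```python
# Hypothetical weights for a toy linear score; not taken from any real model.
WEIGHTS = {"prior_offenses": 0.5, "age_under_25": 0.25, "employed": -0.25}


def explain(features):
    """Return the total score and each feature's contribution to it."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in features.items()
    }
    return sum(contributions.values()), contributions


score, parts = explain({"prior_offenses": 2, "age_under_25": 1, "employed": 1})
print(score)
for name, contribution in parts.items():
    print(name, contribution)
```

With a model this simple, anyone can see which inputs pushed the score up or down and remove a factor that should not be there. More complex models need dedicated tooling to offer the same kind of view.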

Naturally, there were many more interesting presentations, and we wish we could talk about them all here. To get an idea of the rest, have a look at the schedule to see short descriptions and abstracts of all talks.

PyData Warsaw 2017 was packed with inspiring talks that showed us all some pretty smart solutions and potential directions for the future. We wish to thank the organizers for making that happen – it’s great to be part of the PyData community!


We had a blast and will surely be there next year, so see you at PyData Warsaw 2018!

Sunscrapers Team
