Use a Supercomputer easily (SLURM-emission)

I’ve created the slurm-emission package to make my life easier when I need to run experiments on a supercomputer. It allows me to focus on the Python code and not worry about the SLURM details. In this post, I will explain how to use it.

How to settle in Milan for non-EU researchers

This article was written by Xiaohe Zhou, after all her struggles to settle in Milan as a non-EU researcher. She is currently a postdoc at the University of Milan, Bicocca.

I don't get the Strong Law of Large Numbers

I’ve been reading the article An Elementary Proof of the Strong Law of Large Numbers since it’s closely related to a result that I used in my RNN theory. I thought, once I understand it, I can refine my proof. However, I don’t understand it. Yet :)

Linear Recurrent Unit in TensorFlow 2.10

I wanted to check if my RNN theory worked on state-space models. So I implemented the Linear Recurrent Unit in TensorFlow 2.10, and since I have it, why not share it? I tried to make the code clean, easy to use and easy to understand. In the coming days I’ll turn it into a pip package. The LRU was introduced in Resurrecting Recurrent Neural Networks for Long Sequences at ICML, and belongs to the family of state-space models, which handle extremely long sequences more gracefully than attention-based architectures. You can find here the JAX implementation that I took as a reference, as recommended by one of the authors of the LRU.

Neural ODEs and Continuization in Spirit

There’s something missing in Neural ODEs. Let me know if I convinced you. If we use the standard Neural ODE for a linear map we have

Desample and Deslice

This is a hyper-technical post, only for despicable freaks like me. Maybe somebody is looking for similar functions, so if that’s the case, feel free to copy and paste. For a project I’m working on, I needed to sample a tensor but keep it inside the original TensorFlow graph. So I had to sample the tensor, keep the remainder, and then put the original tensor back together, as if nothing had happened. I found a way to do it. I call it sample and desample. For another part of the project I needed to take a random slice of a multidimensional tensor and then put it back into its original form, so that it stays in the graph. I call it slice and deslice. I’m going to explain how I did it, and how it works. You can find the code here, and try the containing package that I’m building here. I’d love to get some feedback.
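To make the sample and desample idea concrete, here is a minimal sketch in plain Python. It is not the code from the post: it works on lists instead of tensors, and the names `sample`, `desample` and `perm` are hypothetical. The only point is the bookkeeping: remember which original index every element came from, so the sampled part and the remainder can be merged back in the original order.

```python
import random


def sample(items, k, seed=None):
    """Split items into k randomly sampled elements and the remainder.

    Also returns the permutation used, which desample needs to undo the split.
    """
    rng = random.Random(seed)
    perm = list(range(len(items)))
    rng.shuffle(perm)
    sampled = [items[i] for i in perm[:k]]    # the random sample
    remainder = [items[i] for i in perm[k:]]  # everything that was not sampled
    return sampled, remainder, perm


def desample(sampled, remainder, perm):
    """Put sampled and remainder back together in the original order."""
    shuffled = sampled + remainder
    restored = [None] * len(perm)
    # perm[position] is the original index of the element at shuffled[position]
    for position, original_index in enumerate(perm):
        restored[original_index] = shuffled[position]
    return restored


items = ["a", "b", "c", "d", "e", "f"]
sampled, remainder, perm = sample(items, k=2, seed=0)
restored = desample(sampled, remainder, perm)
```

In TensorFlow the same bookkeeping could plausibly be written with `tf.random.shuffle`, `tf.gather` and `tf.scatter_nd` so that every step stays in the graph, but that is an assumption about one possible approach, not a description of the linked code.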

Starting a Startup

Since last February I’ve been working with two friends on what we would like to become a full-fledged startup. Since we are all busy with our own lives, we have honestly been working on this project too little for my taste. And since I have to control my tendency to blindly follow my enthusiasm, I’ve been restricting the hours I allow myself to work on it to Saturdays.

Bot with Big Personality

I’m working on a bot that has to keep a consistent personality while talking about anything. What I have now is a bot that can be given a description of an environment, a description of the personality it should adopt, and the sentence that a human says to it. For now it is able to provide varied replies. The language it uses is not yet the sharpest possible, but I think it’s fun to interact with.

Take a Derivative wrt the Moon

For the article I’m working on, I needed to control some properties of the derivative of the hidden state of a recurrent network wrt the hidden state at the previous time step, basically

Intel and CogHear presentations

Just as a way to keep track of things I’ve been doing: I recently presented my work at Intel’s INRC meeting, and my colleague Maryam Hosseini presented the work I coauthored at CogHear.

Our Workshop at NeurIPS

What started as wishful thinking among like-minded friends ended up as a complete Workshop at what is probably the most important venue for Artificial Intelligence, NeurIPS. They realized not enough attention was being paid to the emerging field that lies in between deep learning (DL) and differential equations (DE) and they wanted to provide a platform for that scientific community to be able to discuss and share ideas. In recent years, there has been a rapid increase of machine learning applications in computational sciences, with some of the most impressive results at the interface of those two fields.

Consequences of zero variance distribution

M asked: what happens when the variance of a distribution is equal to zero? Can it still have higher moments different from zero?
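One way to make the question precise, as a sketch and not necessarily the argument the post follows: write \mu for the mean of X. A nonnegative random variable with zero expectation is zero almost surely, so zero variance pins X to its mean. All central moments then vanish, while the raw moments are powers of \mu and can be nonzero whenever \mu is.

```latex
\operatorname{Var}(X) = \mathbb{E}\big[(X-\mu)^2\big] = 0
\;\Longrightarrow\; X = \mu \ \text{almost surely}

\mathbb{E}\big[(X-\mu)^k\big] = 0 \quad (k \ge 1),
\qquad
\mathbb{E}\big[X^k\big] = \mu^k
```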

BiGamma distribution

The gamma distribution is defined as

TensorBoard for Grads and Individual Weights

I needed to record gradients during training, and individual weights instead of tensor means, using TensorBoard. The former is no longer available by default, and the latter never was. So I updated the tf2 callback, and since some of you might find it useful, you can find it here.

A chatbot to talk about the real stuff [Ep. 2]

I wanted to follow up on the previous post, where I reproduced in TensorFlow one of the generative chatbots proposed in Wizard of Wikipedia: Knowledge-Powered Conversational Agents. Funnily enough, they linked my project in their official repository (check the Note: an unofficial …), which could not make me more proud! To be honest, I asked if they could, but they did! I also mentioned in the previous post that I didn’t know if they optimized on the perplexity (PPL) or masked perplexity, and they confirmed by email that they optimized on PPL, which is good news for me, since my results are better on PPL, but not when I compare my masked PPL with their PPL.

Reacting to 'Demystifying the Writing of a Scientific Paper'

My PhD supervisor recommended that I watch the video Demystifying the Writing of a Scientific Paper. It is super interesting, and I fully recommend it too. As a teaser I’ll present a summary here, structured along three axes: plagiarism, clarity and revision.

ReQU might not be the activation you want

The book The Principles of Deep Learning Theory proposes in chapter 5 a method to study the quality of activation functions. The network they study is a fully connected network. They assume that a desirable property for a network is that the covariance of its activations remains constant through the network, neither exploding nor imploding. They use this desideratum to apply fixed point analysis and study how well a given activation encourages such representations.

A chatbot to talk about the real stuff [Ep. 1]

Have you ever wondered what it is like to talk to an AI that can answer precise questions about life? Well, I wonder too, and as I wanted an answer, I tried to code it myself. What I did instead was to translate into TensorFlow what is known as a grounded chatbot, which was originally written in PyTorch for the ParlAI library. If you are interested, you can find my TensorFlow implementation here.

A higher derivative gives info from further away

The definition of a derivative states that the derivative quantifies the relationship between two points. We prove that higher order derivatives quantify the relationship among several points.
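A quick numerical illustration of that claim, using the standard forward finite difference and not the proof from the post: approximating the n-th derivative at x needs the function values at n + 1 points, so each extra order of differentiation pulls in information from one point further away. The helper names `forward_difference` and `stencil_points` are hypothetical.

```python
from math import comb


def stencil_points(x, n, h=1e-2):
    """The n + 1 points used to approximate the n-th derivative at x."""
    return [x + k * h for k in range(n + 1)]


def forward_difference(f, x, n, h=1e-2):
    """Approximate the n-th derivative of f at x with a forward difference.

    Combines f at x, x + h, ..., x + n*h with alternating binomial weights.
    """
    total = 0.0
    for k, point in enumerate(stencil_points(x, n, h)):
        total += (-1) ** (n - k) * comb(n, k) * f(point)
    return total / h ** n


def cube(x):
    return x ** 3


first = forward_difference(cube, 1.0, n=1)   # uses 2 points, close to 3
second = forward_difference(cube, 1.0, n=2)  # uses 3 points, close to 6
third = forward_difference(cube, 1.0, n=3)   # uses 4 points, close to 6
```

The first derivative only looks at x and x + h, while the third derivative already reaches out to x + 3h.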

Towards a Machine Scientist against Covid19

For those who are interested in Language Generation, here I’ll show you how to use the HuggingFace/Transformers library to finetune GPT2 on the Kaggle Covid19 dataset. Ideally the goal is to make an automatic scientist. If the model were first finetuned on a larger scientific dataset and then finetuned on the Covid19 dataset, the connections it makes would be more interesting. The full code is available here.