M asked: if the variance of a distribution is equal to zero, can it still have higher moments that are different from zero?
The gamma distribution is defined as
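In the standard shape–rate parameterization (one of several common conventions), with shape $\alpha > 0$ and rate $\beta > 0$, the density is:

$$ f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \qquad x > 0. $$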
I needed TensorBoard to record gradients during training, and individual weights instead of tensor means, but the former is no longer available by default, and the latter never was. So I updated the TF2 callback, and since some of you might find it useful, you can find it here.
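As a sketch of the kind of callback involved (my own hypothetical names and structure, not necessarily the actual updated callback), one can recompute gradients on a fixed batch at the end of each epoch and write per-weight histograms with `tf.summary`:

```python
def make_gradient_callback(batch, log_dir="logs/gradients"):
    # Hypothetical sketch: a Keras callback that logs per-weight values and
    # gradients as TensorBoard histograms. TensorFlow is imported lazily so
    # the sketch stays self-contained.
    import tensorflow as tf

    x, y = batch

    class GradientTensorBoard(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            # Recompute the loss on the stashed batch to get fresh gradients.
            with tf.GradientTape() as tape:
                loss = self.model.compiled_loss(
                    tf.convert_to_tensor(y),
                    self.model(tf.convert_to_tensor(x), training=True),
                )
            grads = tape.gradient(loss, self.model.trainable_weights)
            writer = tf.summary.create_file_writer(log_dir)
            with writer.as_default():
                for w, g in zip(self.model.trainable_weights, grads):
                    # One histogram per weight tensor, not just the mean.
                    tf.summary.histogram(w.name + "/weight", w, step=epoch)
                    tf.summary.histogram(w.name + "/gradient", g, step=epoch)

    return GradientTensorBoard()
```

Such a callback would then be passed to `model.fit(..., callbacks=[make_gradient_callback((x_val, y_val))])`.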
I wanted to complete the previous post, where I reproduced in TensorFlow one of the generative chatbots proposed in Wizard of Wikipedia: Knowledge-Powered Conversational Agents. Funnily enough, they linked my project in their official repository (check the "Note: an unofficial …"), which could not make me prouder! To be honest, I asked if they could, but they did! I also mentioned in the previous post that I didn't know whether they optimized on perplexity (PPL) or masked perplexity, and they confirmed by email that they optimized on PPL. That is good news for me, since my results are better on PPL, though not when I compare my masked PPL with their PPL.
My PhD supervisor recommended that I watch the video Demystifying the Writing of a Scientific Paper. It is super interesting and, in my opinion, fully recommendable too. As a teaser, I'll present a summary here, structured along three axes: plagiarism, clarity and revision.
The book The Principles of Deep Learning Theory proposes, in chapter 5, a method to study the quality of activation functions. The network they study is a fully connected network. They work under the assumption that a desirable property of a network is that the covariance of its activations remains constant through depth, neither exploding nor imploding. This desideratum lets them use fixed-point analysis to study how well a given activation function encourages such representations.
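As an illustration of this fixed-point view (a minimal numerical sketch with names of my own, not the book's code), one can iterate the single-input kernel recursion of a wide fully connected network, $K_{l+1} = C_b + C_W\, \mathbb{E}_{z \sim \mathcal{N}(0, K_l)}[\phi(z)^2]$, and check whether the activation variance settles at a fixed point or collapses:

```python
import numpy as np

def kernel_fixed_point(K0, phi, C_W=1.0, C_b=0.0, depth=30, n_mc=400_000, seed=0):
    """Iterate K_{l+1} = C_b + C_W * E_{z ~ N(0, K_l)}[phi(z)^2],
    estimating the Gaussian expectation by Monte Carlo."""
    rng = np.random.default_rng(seed)
    K = K0
    for _ in range(depth):
        z = rng.normal(0.0, np.sqrt(K), n_mc)
        K = C_b + C_W * np.mean(phi(z) ** 2)
    return K

relu = lambda z: np.maximum(z, 0.0)

# ReLU with He-style C_W = 2: E[relu(z)^2] = K/2, so the variance is preserved.
K_relu = kernel_fixed_point(1.0, relu, C_W=2.0)

# tanh with C_W = 1: the kernel slowly collapses towards zero with depth.
K_tanh = kernel_fixed_point(1.0, np.tanh, C_W=1.0)
```

With these settings `K_relu` stays near its initial value of 1, while `K_tanh` shrinks with depth, the kind of behavior the fixed-point analysis is meant to diagnose.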
Have you ever wondered what it is like to talk to an AI that can answer precise questions about life? Well, we wonder too, and we don't have the answer for you tonight. What we did is translate into TensorFlow what can be defined as a grounded chatbot, originally written in PyTorch for ParlAI.
The definition of the derivative states that it quantifies the relationship between the values of a function at two nearby points. We prove that higher-order derivatives quantify the relationship among several points.
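The claim can be made concrete with the standard finite-difference limits: the first derivative compares two points, while the second derivative already compares three (a sketch of the pattern, not the post's full argument):

$$ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}, \qquad f''(x) = \lim_{h \to 0} \frac{f(x+2h) - 2 f(x+h) + f(x)}{h^2}. $$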
For those interested in language generation, here I'll show you how to use the HuggingFace/Transformers library to finetune GPT2 on the Kaggle Covid19 dataset. Ideally, the goal is to make an automatic scientist. If the model were first finetuned on a larger scientific dataset and then on the Covid19 dataset, the connections the algorithm would make would be more interesting. The full code is available here.
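A condensed sketch of such a finetuning run using the Transformers `Trainer` API might look like the following (the file path and hyperparameters are placeholders of mine, not the post's actual code):

```python
def finetune_gpt2(train_file="covid19_abstracts.txt", out_dir="gpt2-covid19"):
    # Hypothetical sketch: train_file and the hyperparameters are placeholders.
    # Imports are kept inside the function so the sketch stays self-contained.
    from transformers import (
        DataCollatorForLanguageModeling,
        GPT2LMHeadModel,
        GPT2TokenizerFast,
        TextDataset,
        Trainer,
        TrainingArguments,
    )

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Chunk the raw text file into fixed-length blocks of token ids.
    dataset = TextDataset(tokenizer=tokenizer, file_path=train_file, block_size=128)
    # mlm=False gives the causal language-modeling objective GPT2 uses.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(
        output_dir=out_dir,
        num_train_epochs=3,
        per_device_train_batch_size=4,
    )
    Trainer(
        model=model,
        args=args,
        data_collator=collator,
        train_dataset=dataset,
    ).train()
    model.save_pretrained(out_dir)
```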
Argmax allows us to identify the most likely item in a probability distribution. But there is a problem: it is not differentiable, at least in the implementation that is commonly used.
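One common workaround (a minimal sketch, not the post's full treatment) is a "soft" argmax: a sharp softmax turns the scores into weights, and the weighted average of the indices approaches the argmax index while remaining differentiable:

```python
import numpy as np

def soft_argmax(x, beta=100.0):
    # Sharp softmax over the scores; as beta -> infinity the weights
    # concentrate on the maximum and the weighted index -> argmax(x).
    w = np.exp(beta * (x - x.max()))  # subtract the max for numerical stability
    w /= w.sum()
    return float(np.sum(w * np.arange(len(x))))

x = np.array([0.1, 2.0, -1.0, 0.5])
# soft_argmax(x) is close to np.argmax(x) == 1, but unlike argmax it has
# a well-defined gradient with respect to x.
```

The price is the temperature `beta`: too small and the result blurs across indices, too large and the gradients themselves become vanishingly sharp.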