Take a Derivative wrt the Moon

For the article I’m working on, I needed to control some properties of the derivative of the hidden state of a recurrent network wrt the hidden state at the previous time step, basically

\[\begin{align*} \frac{\partial h_t}{\partial h_{t-1}} \end{align*}\]
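In tensorflow, the general mechanism for differentiating one tensor wrt another (non-variable) tensor is a tf.GradientTape with an explicit watch. Here is a minimal sketch of that pattern on an arbitrary toy function, just to fix ideas before getting to the recurrent case:

import tensorflow as tf

x = tf.random.normal((3,))   # an intermediate tensor, not a tf.Variable
with tf.GradientTape() as tape:
    tape.watch(x)            # trainable variables are watched automatically, plain tensors are not
    y = tf.tanh(x)           # any computation recorded on the tape
jac = tape.jacobian(y, x)    # full Jacobian dy/dx, shape (3, 3)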

I couldn’t find anybody’s code for the recurrent case, and it’s kind of tricky, so I thought I would post it here in case anybody else needs it. It would be very cool if tensorflow and pytorch had a built-in way to ask for the derivative of any tensor wrt any other, but it seems much easier to get the derivative wrt a parameter than wrt an intermediate tensor. I start off by defining a custom RNN, to keep things simple and easy to check mathematically at the end that I’m getting what I want. The mathematical definition of my custom RNN is

\[\begin{align*} h_t &= W_h \, \mathrm{ReLU}(h_{t-1}) \\ c_t &= c_{t-1} - h_{t-1} + h_{t} \\ o_t &= \mathrm{sigmoid}(W_o x_{t} + c_t) \end{align*}\]

where we have two hidden states, $h_t, c_t$, kind of like in an LSTM, the output $o_t$ is built as a combination of both, and the neuron receives the input $x_t$. As I said, it’s not a genius architecture, but it makes the code output easy to check. In tensorflow it looks like this:


import tensorflow as tf


class customRNN(tf.keras.layers.Layer):

    def __init__(self, num_neurons=None, initializer='glorot_uniform', **kwargs):
        self.init_args = dict(num_neurons=num_neurons, initializer=initializer)
        super().__init__(**kwargs)
        self.__dict__.update(self.init_args)  # exposes num_neurons and initializer as attributes

        # recurrent activation applied to the previous hidden state
        self.activation_rec = tf.nn.relu
        # self.activation_rec = tf.nn.sigmoid
        # self.activation_rec = lambda x: x  # linear variant used for the second check at the end
        # output nonlinearity
        self.activation_in = tf.nn.sigmoid

        # two hidden states, h and c, each of size num_neurons
        self.state_size = [num_neurons, num_neurons]

        self.linear_rec = tf.keras.layers.Dense(num_neurons, kernel_initializer=initializer)
        self.linear_in = tf.keras.layers.Dense(num_neurons, use_bias=False, kernel_initializer=initializer)

    def call(self, inputs, states, training=None):
        old_h, old_c = states

        h = self.linear_rec(self.activation_rec(old_h))  # h_t = W_h ReLU(h_{t-1}) + b_h
        c = old_c - old_h + h                            # c_t = c_{t-1} - h_{t-1} + h_t

        output = self.activation_in(self.linear_in(inputs) + c)  # o_t = sigmoid(W_o x_t + c_t)

        new_state = [h, c]
        return output, new_state

    def get_config(self):
        return self.init_args
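As a quick sanity check (a sketch with made-up small sizes), the cell just defined can be called on its own, before wrapping it in a Keras RNN layer:

cell = customRNN(num_neurons=3)
x = tf.random.normal((1, 2))   # (batch_size, input_dim)
h0 = tf.random.normal((1, 3))  # (batch_size, num_neurons)
c0 = tf.random.normal((1, 3))
out, (h1, c1) = cell(x, [h0, c0])  # out, h1 and c1 all have shape (1, 3)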

We define the model construction as


def build_model():
    # input_dim, batch_size and units are the globals defined in the script further down
    input_layer = tf.keras.Input(shape=(1, input_dim), batch_size=batch_size)  # one time step per call
    hi = tf.keras.Input(shape=(units,), batch_size=batch_size)  # initial h
    ci = tf.keras.Input(shape=(units,), batch_size=batch_size)  # initial c

    cell = customRNN(units)
    # stateful=True needs the fixed batch_size passed to the Inputs above
    rnn = tf.keras.layers.RNN(cell, return_state=True, return_sequences=True, stateful=True)

    rnn_out, hidden_state, cell_state = rnn(input_layer, initial_state=(hi, ci))

    model = tf.keras.Model(inputs=[input_layer, hi, ci], outputs=[rnn_out, hidden_state, cell_state])
    return model
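Before taking any derivatives, a quick shape check (again just a sketch, with the same constants that the full script below uses) confirms that the model returns the output plus both states:

input_dim, batch_size, units = 2, 1, 100
model = build_model()
x = tf.random.normal((batch_size, 1, input_dim))
h0 = tf.random.normal((batch_size, units))
c0 = tf.random.normal((batch_size, units))
o, h1, c1 = model([x, h0, c0])
print(o.shape, h1.shape, c1.shape)  # (1, 1, 100) (1, 100) (1, 100)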

Finally, the code that will give you the desired derivative is the following

import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)

input_dim = 2
time_steps = 4
batch_size = 1
units = 100

batch = tf.random.normal((batch_size, time_steps, input_dim))
h0 = tf.random.normal((batch_size, units))
c0 = tf.random.normal((batch_size, units))

ht, ct = h0, c0

for t in range(time_steps):
    print(t, '-' * 30)
    bt = batch[:, t, :][:, None]  # shape (batch_size, 1, input_dim): a single time step
    with tf.GradientTape() as tape:
        tape.watch(bt)
        tape.watch(ht)  # ht and ct are plain tensors, so they have to be watched explicitly
        tape.watch(ct)
        model = build_model()  # fresh Glorot-initialized weights at every time step
        otp1, htp1, ctp1 = model([bt, ht, ct])

    # Jacobian of h_{t+1} wrt h_t, shape (batch_size, units, units)
    grad = tape.batch_jacobian(htp1, ht)

    ht, ct = htp1, ctp1

    print(np.var(grad) * units)

The output of that last print is $0.5$, which is exactly what we would expect for a Glorot initialization of a ReLU architecture, and that is the mathematical check I mentioned above. It also prints $1.0$ if we switch from the ReLU to a linear activation (the commented-out lambda in the cell), which is what we would expect for a Glorot initialization of a linear architecture.
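For completeness, here is the quick calculation behind that check. The bias of the recurrent Dense layer does not enter the Jacobian, so

\[\begin{align*} \frac{\partial h_t}{\partial h_{t-1}} = W_h \, \mathrm{diag}\left(\mathrm{ReLU}'(h_{t-1})\right), \end{align*}\]

and every entry of the Jacobian is either zero or one of the Glorot-initialized weights. Glorot uniform on a square $n \times n$ matrix, with $n$ the number of units, gives $\mathrm{Var}(W_h) = 2/(n+n) = 1/n$, and $\mathrm{ReLU}'$ is nonzero for roughly half of the (roughly zero-mean) entries of $h_{t-1}$, so the Jacobian entries have variance about $0.5/n$, which is $0.5$ after multiplying by the number of units. With a linear activation the diagonal factor becomes the identity, and the same argument gives $1.0$.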

If you can figure out a better way to calculate these derivatives, let me know; it would be very interesting!
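One possible simplification, sketched below and reusing the customRNN class and the constants from the script above: since we only take one step at a time anyway, the cell can be called directly inside the tape, without the Keras RNN wrapper or the model rebuild. Note that, unlike the loop above, this keeps a single set of weights across time steps instead of resampling them at every step.

cell = customRNN(units)  # one cell, weights shared across time steps
ht, ct = h0, c0
for t in range(time_steps):
    xt = batch[:, t, :]  # (batch_size, input_dim), no dummy time axis needed
    with tf.GradientTape() as tape:
        tape.watch(ht)
        _, (htp1, ctp1) = cell(xt, [ht, ct])
    grad = tape.batch_jacobian(htp1, ht)  # (batch_size, units, units)
    ht, ct = htp1, ctp1
    print(t, np.var(grad) * units)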

Written on August 18, 2022