2. Create a neural network

Now let’s look at how to create neural networks in Gluon. In addition to the NDArray package (nd) that we just covered, we will now also import the neural network package nn from gluon.

In [1]:
from mxnet import nd
from mxnet.gluon import nn

2.1. Create your neural network’s first layer

Let’s start with a dense layer with 2 output units.

In [2]:
layer = nn.Dense(2)
layer
Dense(None -> 2, linear)

Then we initialize its weights with the default initialization method, which draws random values uniformly from \([-0.07, 0.07]\).

In [3]:
layer.initialize()
Then we do a forward pass with random data. We create a \((3,4)\) shape random input x and feed it into the layer to compute the output.

In [4]:
x = nd.random.uniform(-1,1,(3,4))
layer(x)

[[-0.02524132 -0.00874885]
 [-0.06026538 -0.01308061]
 [ 0.02468396 -0.02181557]]
<NDArray 3x2 @cpu(0)>

As can be seen, the layer’s output size of 2 produced a \((3,2)\) shape output from our \((3,4)\) input. Note that we didn’t specify the input size of the layer before (though we can specify it with the argument in_units=4 here); the system automatically infers it the first time we feed in data, then creates and initializes the weights. So we can access the weight after the first forward pass:

In [5]:
layer.weight.data()
[[-0.00873779 -0.02834515  0.05484822 -0.06206018]
 [ 0.06491279 -0.03182812 -0.01631819 -0.00312688]]
<NDArray 2x4 @cpu(0)>
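
Under the hood, a dense layer with this \((2,4)\) weight computes a plain affine map \(y = xW^T + b\). The following numpy sketch illustrates that computation (numpy is used here only for illustration; the weight values are fresh random draws, not the ones printed above):

```python
import numpy as np

x = np.random.uniform(-1, 1, (3, 4))        # same shape as the input above
W = np.random.uniform(-0.07, 0.07, (2, 4))  # weight shape: (output units, input units)
b = np.zeros(2)                             # Gluon initializes biases to zero by default
y = x @ W.T + b                             # linear output, as in Dense(2) with no activation
print(y.shape)  # (3, 2)
```

This makes explicit why a \((3,4)\) input and a \((2,4)\) weight yield a \((3,2)\) output.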

2.2. Chain layers into a neural network

Let’s first consider a simple case where a neural network is a chain of layers. During the forward pass, we run the layers sequentially, one by one. The following code implements a famous network called LeNet through nn.Sequential.

In [6]:
net = nn.Sequential()
# Creating layers in a name scope to assign each layer a unique
# name so we can load/save their parameters later.
with net.name_scope():
    # Add a sequence of layers.
    net.add(
        # Similar to Dense, it is not necessary to specify the
        # input channels by the argument `in_channels`, which will be
        # automatically inferred in the first forward pass. Also,
        # we apply a relu activation on the output.
        # In addition, we can use a tuple to specify a
        # non-square kernel size, such as `kernel_size=(2,4)`
        nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        # One can also use a tuple to specify non-symmetric
        # pool and stride sizes
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        # flatten the 4-D input into 2-D with shape
        # `(x.shape[0], x.size/x.shape[0])` so that it can be used
        # by the following dense layers
        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(10)
    )
net
Sequential(
  (0): Conv2D(None -> 6, kernel_size=(5, 5), stride=(1, 1))
  (1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (2): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1))
  (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (4): Flatten
  (5): Dense(None -> 120, Activation(relu))
  (6): Dense(None -> 84, Activation(relu))
  (7): Dense(None -> 10, linear)
)
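
To see where the input to the first dense layer comes from, we can trace the spatial sizes by hand. Below is a small sketch of the shape arithmetic, assuming 28x28 single-channel inputs (as used in the forward pass below) and the no-padding output-size formula (size - kernel) // stride + 1:

```python
def out_size(size, kernel, stride=1):
    # output size of a convolution/pooling layer with no padding
    return (size - kernel) // stride + 1

s = 28
s = out_size(s, 5)            # Conv2D(kernel_size=5): 28 -> 24
s = out_size(s, 2, stride=2)  # MaxPool2D(2, 2):       24 -> 12
s = out_size(s, 3)            # Conv2D(kernel_size=3): 12 -> 10
s = out_size(s, 2, stride=2)  # MaxPool2D(2, 2):       10 -> 5
flat = 16 * s * s             # 16 channels flattened: 400 features into Dense(120)
print(s, flat)  # 5 400
```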

The usage of nn.Sequential is similar to nn.Dense. In fact, both of them are subclasses of nn.Block. The following code shows how to initialize the weights and run the forward pass.

In [7]:
net.initialize()
# Input shape is (batch_size, color_channels, height, width)
x = nd.random.uniform(shape=(4,1,28,28))
y = net(x)
y.shape
(4, 10)

We can use [] to index a particular layer. For example, the following accesses the 1st layer’s weight and 6th layer’s bias.

In [8]:
(net[0].weight.data().shape, net[5].bias.data().shape)
((6, 1, 5, 5), (120,))
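
These shapes follow the usual conventions: (out_channels, in_channels, kernel_h, kernel_w) for convolution weights and (out_units, in_units) for dense weights. As a consistency check, here is a sketch of the network’s total parameter count, assuming every layer carries a bias (the Gluon default):

```python
# weights + biases per layer, using the shapes inferred above
conv1  = 6 * 1 * 5 * 5 + 6    # weight (6, 1, 5, 5), bias (6,)
conv2  = 16 * 6 * 3 * 3 + 16  # weight (16, 6, 3, 3), bias (16,)
dense1 = 120 * 400 + 120      # 400 = 16 * 5 * 5 flattened features
dense2 = 84 * 120 + 84
dense3 = 10 * 84 + 10
total = conv1 + conv2 + dense1 + dense2 + dense3
print(total)  # 60170
```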

2.3. Create a neural network flexibly

In nn.Sequential, MXNet will automatically construct the forward function that sequentially executes added layers. Now let’s introduce another way to construct a network with a flexible forward function.

To do it, we create a subclass of nn.Block and implement two methods:

  • __init__ creates the layers
  • forward defines the forward function.
In [9]:
class MixMLP(nn.Block):
    def __init__(self, **kwargs):
        # Run `nn.Block`'s init method
        super(MixMLP, self).__init__(**kwargs)
        with self.name_scope():
            self.blk = nn.Sequential()
            # Already within a name scope, no need to create
            # another scope.
            self.blk.add(
                nn.Dense(3, activation='relu'),
                nn.Dense(4, activation='relu')
            )
            self.dense = nn.Dense(5)
    def forward(self, x):
        y = nd.relu(self.blk(x))
        return self.dense(y)

net = MixMLP()
net
MixMLP(
  (blk): Sequential(
    (0): Dense(None -> 3, Activation(relu))
    (1): Dense(None -> 4, Activation(relu))
  )
  (dense): Dense(None -> 5, linear)
)

In the sequential chaining approach, we can only add instances with nn.Block as the base class and run them one by one in the forward pass. In this example, we call nd.relu directly to apply a relu activation inside forward, and we can use print to inspect intermediate results, neither of which a fixed chain can express. So this approach provides a more flexible way to define the forward function.

The usage of net is similar to before.

In [10]:
net.initialize()
x = nd.random.uniform(shape=(2,2))
print(net.blk(x))
print(net(x))

[[  0.00000000e+00   0.00000000e+00   6.29003858e-04   7.64455399e-05]
 [  0.00000000e+00   0.00000000e+00   1.19893858e-03   1.23752037e-03]]
<NDArray 2x4 @cpu(0)>

[[ -3.80618403e-05   1.55683501e-05   4.36682149e-06   4.28530584e-05
 [ -1.83455195e-05   2.64030787e-05   2.46857308e-05   7.70193728e-05
<NDArray 2x5 @cpu(0)>
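
The two shapes above can be traced with a plain numpy sketch. The dense helper below is a hypothetical stand-in for nn.Dense with freshly drawn weights, so only the shapes (not the values) match the output above:

```python
import numpy as np

def dense(x, units):
    # hypothetical stand-in for nn.Dense: weight shape (units, in_units)
    W = np.random.uniform(-0.07, 0.07, (units, x.shape[-1]))
    return x @ W.T

def relu(x):
    return np.maximum(x, 0)

x = np.random.uniform(size=(2, 2))
h = relu(dense(relu(dense(x, 3)), 4))  # self.blk: Dense(3, relu) -> Dense(4, relu)
y = dense(relu(h), 5)                  # extra nd.relu in forward, then self.dense
print(h.shape, y.shape)  # (2, 4) (2, 5)
```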

Finally, let’s access a particular layer’s weight

In [11]:
net.blk[1].weight.data()
[[-0.0343901  -0.05805862 -0.06187592]
 [-0.06210143 -0.00918167 -0.00170272]
 [-0.02634858  0.05334064  0.02748809]
 [ 0.06669661 -0.01711474  0.01647211]]
<NDArray 4x3 @cpu(0)>