Symbolic Frameworks
Frameworks for symbolic computation (MXNet, TensorFlow, Theano) represent models as symbolic graphs of tensor operations, such as matrix addition/multiplication or convolution. A layer is just a set of those operations. Thanks to this granularity, users can build new, complex types of layers without resorting to low-level languages.
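As a minimal sketch of this idea (in Theano, the framework I settle on below; the layer sizes are purely illustrative), a fully connected layer is nothing more than a few symbolic tensor operations composed into a graph and then compiled:

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic input: a minibatch of feature vectors
X = T.matrix('X')

# Layer parameters stored as shared variables (784 -> 256 is just an example)
W = theano.shared(np.random.randn(784, 256).astype('float32'), name='W')
b = theano.shared(np.zeros(256, dtype='float32'), name='b')

# The "layer" is simply matrix multiplication + addition + a nonlinearity
hidden = T.nnet.sigmoid(T.dot(X, W) + b)

# Compile the symbolic graph into a callable function
layer = theano.function([X], hidden)
```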
I have used various symbolic computation frameworks, and, as it turned out, there is no such thing as a perfect symbolic framework that fits all requirements. Each framework has its advantages and disadvantages. Currently, I’m using Theano.
Let’s compare the three major symbolic frameworks below.
Symbolic vs. Non-symbolic Frameworks
Non-symbolic frameworks
Pros:
- Non-symbolic (imperative) neural network frameworks, such as Torch and Caffe, have a very similar structure in terms of computation.
- In terms of expressiveness, imperative frameworks are designed quite well, and they can also expose a graph-like interface.
Cons:
- Manual optimization is a major drawback of imperative frameworks. For example, in-place operations need to be implemented by hand (see the sketch after this list).
- Most imperative frameworks are inferior to symbolic frameworks in terms of expressiveness.
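To illustrate the point about in-place operations (using NumPy as a stand-in for an imperative tensor library; this is a hedged sketch, not code from any of the frameworks above): the programmer has to request the in-place update explicitly, whereas a symbolic framework can apply such rewrites when it compiles the graph.

```python
import numpy as np

a = np.random.randn(1000, 1000)
b = np.random.randn(1000, 1000)

# Naive version: allocates a brand-new array for the result
c = a + b

# In-place version: must be written manually by the programmer
np.add(a, b, out=a)
```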
Symbolic frameworks
Pros:
- Symbolic frameworks support automatic optimization based on the dependency graph (see the sketch after this list).
- Symbolic frameworks offer wider possibilities for memory reuse (as in MXNet).
- Symbolic frameworks can automatically plan data transfers and the order of computation from that graph.
Cons:
- The existing open source symbolic frameworks show lower performance than their imperative counterparts.
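As a small illustration of the first point (automatic optimization), the Theano function below is not a literal translation of the expression: the optimizer is free to rewrite the graph, for instance replacing log(1 + exp(x)) with the numerically stable softplus, and the graph that will actually run can be inspected with `debugprint`. This is a sketch of the mechanism, not a benchmark.

```python
import theano
import theano.tensor as T

x = T.vector('x')
y = T.log(1 + T.exp(x))      # numerically unstable as written

f = theano.function([x], y)  # Theano's optimizer rewrites the graph here

# Print the optimized graph that will actually be executed
theano.printing.debugprint(f)
```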
Adding new operations
In all the examined frameworks, adding new operations with reasonable performance is pretty complicated.
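To give a sense of what is involved, here is a minimal custom operation in Theano, modeled on the DoubleOp-style example from the Theano documentation. The hard part is not this Python skeleton but supplying a gradient and an efficient C/GPU implementation, which this sketch omits.

```python
import theano
import theano.tensor as T

class DoubleOp(theano.Op):
    """Toy operation that multiplies its input by two."""
    __props__ = ()

    def make_node(self, x):
        x = T.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        # Pure-Python implementation; a production op would also define
        # grad() and C/GPU code for reasonable performance
        (x,) = inputs
        output_storage[0][0] = 2 * x

x = T.matrix('x')
f = theano.function([x], DoubleOp()(x))
```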
Code re-usability
Since training deep networks is extremely time-consuming, Caffe released collections of pre-trained models (so-called “model zoos”) that can be used as initial weights for transfer learning or fine-tuning of deep networks.
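As a rough sketch of how such pre-trained weights are typically consumed (here with Keras and its bundled ImageNet weights for VGG16 rather than a Caffe model zoo; the exact API depends on the Keras version, and the 10-class head is an assumption), the convolutional base is frozen and only a new task-specific head is trained:

```python
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Reuse the convolutional base pre-trained on ImageNet
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained weights

# Attach a small head for the domain-specific task (10 classes assumed)
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```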
Low-level Tensor operators
All the frameworks have a pretty efficient implementation of low-level operators. Those can be used as components for building new models, eliminating the need to write new operations.
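For example, a batch-standardization step can be assembled from existing mean, variance, and square-root operators, with no new operation required (a hedged sketch in Theano):

```python
import theano.tensor as T

def standardize(x, eps=1e-5):
    """Feature-wise standardization built entirely from low-level operators."""
    mean = T.mean(x, axis=0, keepdims=True)
    var = T.var(x, axis=0, keepdims=True)
    return (x - mean) / T.sqrt(var + eps)
```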
Control flow operators
Control flow operators enhance the expressiveness and versatility of a symbolic system.
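Theano’s `scan` is one example of such an operator: a symbolic loop. The sketch below uses it for a running sum over a vector; the same mechanism underlies recurrent networks.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')

# scan applies the step function along the sequence, carrying state forward
result, updates = theano.scan(
    fn=lambda value, running_total: running_total + value,
    sequences=x,
    outputs_info=T.zeros_like(x[0]),
)

running_sum = theano.function([x], result, updates=updates)
# e.g. running_sum(np.asarray([1., 2., 3.], dtype=theano.config.floatX)) -> [1., 3., 6.]
```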
High-level support
Performance
Single-GPU
I measured the performance of the LeNet model on the MNIST dataset using a single GPU (NVIDIA Quadro K1200).
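For reference, the network being benchmarked is small; a Lasagne definition of a LeNet-style model (my approximation of the architecture, not the exact benchmark script) looks roughly like this:

```python
from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer
from lasagne.nonlinearities import softmax

# LeNet-style network for 28x28 MNIST digits
net = InputLayer(shape=(None, 1, 28, 28))
net = Conv2DLayer(net, num_filters=20, filter_size=5)
net = MaxPool2DLayer(net, pool_size=2)
net = Conv2DLayer(net, num_filters=50, filter_size=5)
net = MaxPool2DLayer(net, pool_size=2)
net = DenseLayer(net, num_units=500)
net = DenseLayer(net, num_units=10, nonlinearity=softmax)
```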
Memory
Because GPU memory is limited, memory consumption can be a major problem for large models.
Single-GPU speed
Theano takes too long to compile graphs, especially when it comes to complex models. TensorFlow is even slower.
Parallel and distributed support
Final considerations
Theano (with the higher-level Lasagne and Keras libraries on top) is a great choice for building deep learning models. Lasagne and Keras make it amazingly easy to build new networks and modify existing ones. Since I prefer Python, I choose Lasagne/Keras for their well-developed Python interfaces; note, however, that they don’t support R. Thanks to their transfer learning and fine-tuning capabilities, Lasagne and Keras make it easy to modify existing networks and adapt them to domain-specific data.
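As a hedged sketch of what such fine-tuning looks like in Lasagne (here `build_base`, `pretrained_values`, and `num_domain_classes` are hypothetical names for the base architecture, its model-zoo weights, and the new task’s class count):

```python
from lasagne.layers import DenseLayer, get_all_layers, set_all_param_values
from lasagne.nonlinearities import softmax

# Rebuild the original architecture and load its pre-trained parameters
net = build_base()                            # hypothetical network builder
set_all_param_values(net, pretrained_values)  # weights taken from a model zoo

# Replace the final classifier with one sized for the domain-specific task;
# the new layer is then trained (optionally with the rest of the net frozen)
penultimate = get_all_layers(net)[-2]
net = DenseLayer(penultimate, num_units=num_domain_classes, nonlinearity=softmax)
```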
Based on this comparison of frameworks, the overall winner is MXNet: it delivers higher performance and uses memory more efficiently. It also has great R support; in fact, MXNet is the only one of these platforms whose functionality is fully available from R. MXNet does offer transfer learning and fine-tuning, but these features are harder to use than in Lasagne/Keras, which makes modifying existing trained networks and working with domain-specific data in MXNet quite a challenge.