Applications
Keras Applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.
Weights are downloaded automatically when instantiating a model. They are stored at ~/.keras/models/
.
Available models
Models for image classification with weights trained on ImageNet:
All of these architectures (except Xception) are compatible with both TensorFlow and Theano, and upon instantiation the models will be built according to the image dimension ordering set in your Keras configuration file at ~/.keras/keras.json
. For instance, if you have set image_dim_ordering=tf
, then any model loaded from this repository will get built according to the TensorFlow dimension ordering convention, "Width-Height-Depth".
The Xception model is only available for TensorFlow, due to its reliance on SeparableConvolution
layers.
Model for music audio file auto-tagging (taking as input Mel-spectrograms):
Usage examples for image classification models
Classify ImageNet classes with ResNet50
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
model = ResNet50(weights='imagenet')
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
Extract features with VGG16
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
Extract features from an arbitrary intermediate layer with VGG19
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np
base_model = VGG19(weights='imagenet')
model = Model(input=base_model.input, output=base_model.get_layer('block4_pool').output)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
block4_pool_features = model.predict(x)
Fine-tune InceptionV3 on a new set of classes
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(input=base_model.input, output=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
print(i, layer.name)
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]:
layer.trainable = False
for layer in model.layers[172:]:
layer.trainable = True
# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers
model.fit_generator(...)
Build InceptionV3 over a custom input tensor
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input
# this could also be the output a different Keras model or layer
input_tensor = Input(shape=(224, 224, 3)) # this assumes K.image_dim_ordering() == 'tf'
model = InceptionV3(input_tensor=input_tensor, weights='imagenet', include_top=True)
Documentation for individual models
Xception
keras.applications.xception.Xception(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
Xception V1 model, with weights pre-trained on ImageNet.
On ImageNet, this model gets to a top-1 validation accuracy of 0.790 and a top-5 validation accuracy of 0.945.
Note that this model is only available for the TensorFlow backend,
due to its reliance on SeparableConvolution
layers. Additionally it only supports
the dimension ordering "tf" (width, height, channels).
The default input size for this model is 299x299.
Arguments
- include_top: whether to include the fully-connected layer at the top of the network.
- weights: one of
None
(random initialization) or "imagenet" (pre-training on ImageNet). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - inputs_shape: optional shape tuple, only to be specified
if
include_top
is False (otherwise the input shape has to be(299, 299, 3)
. It should have exactly 3 inputs channels, and width and height should be no smaller than 71. E.g.(150, 150, 3)
would be one valid value.
Returns
A Keras model instance.
References
License
These weights are trained by ourselves and are released under the MIT license.
VGG16
keras.applications.vgg16.VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
VGG16 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of
None
(random initialization) or "imagenet" (pre-training on ImageNet). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - inputs_shape: optional shape tuple, only to be specified
if
include_top
is False (otherwise the input shape has to be(224, 224, 3)
(withtf
dim ordering) or(3, 224, 244)
(withth
dim ordering). It should have exactly 3 inputs channels, and width and height should be no smaller than 48. E.g.(200, 200, 3)
would be one valid value.
Returns
A Keras model instance.
References
- Very Deep Convolutional Networks for Large-Scale Image Recognition: please cite this paper if you use the VGG models in your work.
License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons Attribution License.
VGG19
keras.applications.vgg19.VGG19(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
VGG19 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
Arguments
- include_top: whether to include the 3 fully-connected layers at the top of the network.
- weights: one of
None
(random initialization) or "imagenet" (pre-training on ImageNet). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - inputs_shape: optional shape tuple, only to be specified
if
include_top
is False (otherwise the input shape has to be(224, 224, 3)
(withtf
dim ordering) or(3, 224, 244)
(withth
dim ordering). It should have exactly 3 inputs channels, and width and height should be no smaller than 48. E.g.(200, 200, 3)
would be one valid value.
Returns
A Keras model instance.
References
License
These weights are ported from the ones released by VGG at Oxford under the Creative Commons Attribution License.
ResNet50
keras.applications.resnet50.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
ResNet50 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 224x224.
Arguments
- include_top: whether to include the fully-connected layer at the top of the network.
- weights: one of
None
(random initialization) or "imagenet" (pre-training on ImageNet). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - inputs_shape: optional shape tuple, only to be specified
if
include_top
is False (otherwise the input shape has to be(224, 224, 3)
(withtf
dim ordering) or(3, 224, 244)
(withth
dim ordering). It should have exactly 3 inputs channels, and width and height should be no smaller than 197. E.g.(200, 200, 3)
would be one valid value.
Returns
A Keras model instance.
References
License
These weights are ported from the ones released by Kaiming He under the MIT license.
InceptionV3
keras.applications.inception_v3.InceptionV3(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
Inception V3 model, with weights pre-trained on ImageNet.
This model is available for both the Theano and TensorFlow backend, and can be built both with "th" dim ordering (channels, width, height) or "tf" dim ordering (width, height, channels).
The default input size for this model is 299x299.
Arguments
- include_top: whether to include the fully-connected layer at the top of the network.
- weights: one of
None
(random initialization) or "imagenet" (pre-training on ImageNet). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - inputs_shape: optional shape tuple, only to be specified
if
include_top
is False (otherwise the input shape has to be(299, 299, 3)
(withtf
dim ordering) or(3, 299, 299)
(withth
dim ordering). It should have exactly 3 inputs channels, and width and height should be no smaller than 139. E.g.(150, 150, 3)
would be one valid value.
Returns
A Keras model instance.
References
License
These weights are trained by ourselves and are released under the MIT license.
MusicTaggerCRNN
keras.applications.music_tagger_crnn.MusicTaggerCRNN(weights='msd', input_tensor=None, include_top=True)
A convolutional-recurrent model taking as input a vectorized representation of the MelSpectrogram of a music track and capable of outputting the musical genre of the track. You can use keras.applications.music_tagger_crnn.preprocess_input
to convert a sound file to a vectorized spectrogram. This requires to have installed the Librosa library. See the usage example.
Arguments
- weights: one of
None
(random initialization) or "msd" (pre-training on Million Song Dataset). - input_tensor: optional Keras tensor (i.e. output of
layers.Input()
) to use as image input for the model. - include_top: whether to include the 1 fully-connected layer (output layer) at the top of the network. If False, the network outputs 32-dim features.
Returns
A Keras model instance.
References
License
These weights are ported from the ones released by Keunwoo Choi under the MIT license.
Examples: music tagging and audio feature extraction
from keras.applications.music_tagger_crnn import MusicTaggerCRNN
from keras.applications.music_tagger_crnn import preprocess_input, decode_predictions
import numpy as np
# 1. Tagging
model = MusicTaggerCRNN(weights='msd')
audio_path = 'audio_file.mp3'
melgram = preprocess_input(audio_path)
melgrams = np.expand_dims(melgram, axis=0)
preds = model.predict(melgrams)
print('Predicted:')
print(decode_predictions(preds))
# print: ('Predicted:', [[('rock', 0.097071797), ('pop', 0.042456303), ('alternative', 0.032439161), ('indie', 0.024491295), ('female vocalists', 0.016455274)]])
#. 2. Feature extraction
model = MusicTaggerCRNN(weights='msd', include_top=False)
audio_path = 'audio_file.mp3'
melgram = preprocess_input(audio_path)
melgrams = np.expand_dims(melgram, axis=0)
feats = model.predict(melgrams)
print('Features:')
print(feats[0, :10])
# print: ('Features:', [-0.19160545 0.94259131 -0.9991011 0.47644514 -0.19089699 0.99033844 0.1103896 -0.00340496 0.14823607 0.59856361])