Geeks With Blogs
Josh Reuben

Keras QuickRef

Keras is a high-level neural networks API, written in Python, that runs on top of the deep learning framework TensorFlow. In fact, tf.keras will be integrated directly into TensorFlow 1.2!
Here are my API notes:

Model API

load_weights(filepath, by_name)

Model Sequential / Functional APIs

compile(optimizer, loss, metrics, sample_weight_mode)
fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
evaluate(x, y, batch_size, verbose, sample_weight)

predict(x, batch_size, verbose)
predict_classes(x, batch_size, verbose)
predict_proba(x, batch_size, verbose)

train_on_batch(x, y, class_weight, sample_weight)
test_on_batch(x, y, sample_weight)

fit_generator(generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
evaluate_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
predict_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)

get_layer(name, index)
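
As a small illustration of the validation_split argument of fit: as far as I understand it, Keras holds out the last fraction of the samples (before any shuffling) for validation. The helper below is a pure-Python sketch of that slicing; the name split_validation is hypothetical and is not part of Keras.

```python
def split_validation(x, y, validation_split):
    """Hold out the last `validation_split` fraction of the samples.

    Hypothetical helper: a sketch of the slicing, not Keras code.
    """
    split_at = int(len(x) * (1.0 - validation_split))
    train = (x[:split_at], y[:split_at])
    val = (x[split_at:], y[split_at:])
    return train, val


x = list(range(8))
y = [v * 2 for v in x]
(train_x, train_y), (val_x, val_y) = split_validation(x, y, 0.25)
```

Because the held-out samples are simply the tail of the arrays, data that is sorted by class should be shuffled before calling fit with validation_split.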



Core Layers

Layer | Description | Input --> output shape | Arguments
Dense | vanilla fully connected NN layer | (nb_samples, input_dim) --> (nb_samples, output_dim) | output_dim/shape, init, activation, weights, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim/shape
Activation | applies an activation function to an output | TN --> TN | activation
Dropout | randomly sets a fraction p of input units to 0 at each update during training, to reduce overfitting | TN --> TN | p
SpatialDropout2D/3D | drops entire 2D/3D feature maps to counter pixel / voxel proximity correlation | (samples, rows, cols, [stacks,] channels) --> (samples, rows, cols, [stacks,] channels) | p
Flatten | flattens the input to 1D | (nb_samples, D1, D2, D3) --> (nb_samples, D1xD2xD3) | -
Reshape | reshapes an output to a different factorization | e.g. (None, 3, 4) --> (None, 12) or (None, 2, 6) | target_shape
Permute | permutes the dimensions of the input; the output shape is the input shape with the dimensions re-ordered | e.g. (None, A, B) --> (None, B, A) | dims
RepeatVector | repeats the input n times | (nb_samples, features) --> (nb_samples, n, features) | n
Merge | merges a list of tensors into a single tensor | [TN] --> TN | layers, mode, concat_axis, dot_axes, output_shape, output_mask, node_indices, tensor_indices, name
Lambda | wraps an arbitrary backend (e.g. TensorFlow) expression as a layer | flexible | function, output_shape, arguments
ActivityRegularization | passes the input through unchanged, but adds an L1/L2 penalty on the activations to the cost function | TN --> TN | l1, l2
Masking | identifies timesteps in D1 to be skipped | TN --> TN | mask_value
Highway | densely connected highway layer: LSTM-style gating applied to a feed-forward network | (nb_samples, input_dim) --> (nb_samples, output_dim) | same as Dense + transform_bias
MaxoutDense | takes the element-wise maximum of nb_feature linear Dense layers, which lets it learn a convex, piecewise linear activation function over the inputs | (nb_samples, input_dim) --> (nb_samples, output_dim) | same as Dense + nb_feature
TimeDistributed | applies a layer (e.g. Dense) to each step of the D1 time dimension | (nb_sample, time_dimension, input_dim) --> (nb_sample, time_dimension, output_dim) | layer (e.g. Dense)
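
The shape rules of the shape-only layers (Flatten, Reshape, Permute, RepeatVector) can be checked with a few lines of plain Python. This is only a sketch of the shape arithmetic, with hypothetical helper names; None stands for the unknown batch size, and the 1-indexed dims convention for Permute is my reading of the Keras docs.

```python
from functools import reduce
from operator import mul


def flatten_shape(shape):
    """(nb_samples, D1, D2, ...) -> (nb_samples, D1*D2*...)."""
    return (shape[0], reduce(mul, shape[1:], 1))


def reshape_shape(shape, target_shape):
    """Keep the batch axis; the element count per sample must not change."""
    if reduce(mul, shape[1:], 1) != reduce(mul, target_shape, 1):
        raise ValueError("total size of new array must be unchanged")
    return (shape[0],) + tuple(target_shape)


def permute_shape(shape, dims):
    """dims is 1-indexed and excludes the batch axis."""
    return (shape[0],) + tuple(shape[d] for d in dims)


def repeat_vector_shape(shape, n):
    """(nb_samples, features) -> (nb_samples, n, features)."""
    return (shape[0], n, shape[1])
```

For example, flatten_shape((None, 3, 4)) gives (None, 12), matching the Reshape example in the table.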


Convolutional Layers

Layer | Description | Input --> output shape | Arguments
Convolution1D | filters neighborhoods of 1D inputs | (samples, steps, input_dim) --> (samples, new_steps, nb_filter) | nb_filter, filter_length, init, activation, weights, border_mode, subsample_length, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim, input_length
Convolution2D | filters neighborhoods of 2D inputs | (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter) | like Convolution1D, with nb_row, nb_col instead of filter_length, plus subsample, dim_ordering
AtrousConvolution1D/2D | dilated convolution (convolution with holes) | same as Convolution1D/2D | same as Convolution1D/2D + atrous_rate
SeparableConvolution2D | first performs a depthwise spatial convolution on each input channel separately, then a pointwise convolution which mixes together the resulting output channels | same as Convolution2D | same as Convolution2D + depth_multiplier, depthwise_regularizer, pointwise_regularizer, depthwise_constraint, pointwise_constraint
Deconvolution2D | transposed convolution | - | -
Convolution3D | filters neighborhoods of 3D inputs | (samples, conv_dim1, conv_dim2, conv_dim3, channels) --> (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter) | kernel_dim1, kernel_dim2, kernel_dim3
Cropping1D/2D/3D | crops along the dimension(s) | (samples, depth, [axes_to_crop]) --> (samples, depth, [cropped_axes]) | cropping, dim_ordering
UpSampling1D/2D/3D | repeats each step x times along the specified axes | (samples, [dims], channels) --> (samples, [upsampled_dims], channels) | size, dim_ordering
ZeroPadding1D/2D/3D | zero padding | (samples, [dims], channels) --> (samples, [padded_dims], channels) | padding, dim_ordering
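
The new_steps / new_rows / new_cols values in the shapes above depend on border_mode, the stride (subsample) and, for atrous convolutions, the dilation rate. The function below is a sketch of that length arithmetic as I understand it, for the 'valid' and 'same' modes only; it is written for illustration and is not copied from Keras.

```python
def conv_output_length(input_length, filter_size, border_mode, stride=1, dilation=1):
    """Output length along one convolved axis (sketch, 'valid'/'same' only)."""
    if border_mode not in ("valid", "same"):
        raise ValueError("border_mode must be 'valid' or 'same'")
    # A dilated (atrous) filter spans more input positions than it has taps.
    dilated_size = filter_size + (filter_size - 1) * (dilation - 1)
    if border_mode == "same":
        length = input_length
    else:
        length = input_length - dilated_size + 1
    # Ceiling division by the stride.
    return (length + stride - 1) // stride
```

With 10 input steps and a filter of length 3, 'valid' gives 8 steps and 'same' gives 10; a stride of 2 reduces the 'valid' case to 4.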

Pooling && Locally Connected

Layer | Description | Input --> output shape | Arguments
Max/AveragePooling1D/2D/3D | downscales each pooling window to its max / average | (samples, [len_pool_dimN], channels) --> (samples, [pooled_dimN], channels) | pool_size, strides, border_mode, dim_ordering
GlobalMax/GlobalAveragePooling1D/2D | reduces each channel to its max / average over all pooled dimensions | (samples, [len_pool_dimN], channels) --> (samples, channels) | dim_ordering
LocallyConnected1D/2D | similar to ConvolutionxD, but weights are unshared: a different filter is applied at each patch | - | like ConvolutionxD + subsample
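
To make the difference between windowed and global pooling concrete, here is a single-channel, pure-Python sketch. The function names are hypothetical, and the default stride equal to the pool length is an assumption for the example.

```python
def max_pool_1d(steps, pool_length=2, stride=None):
    """Max over each window; stride defaults to pool_length (no padding)."""
    stride = stride or pool_length
    return [max(steps[i:i + pool_length])
            for i in range(0, len(steps) - pool_length + 1, stride)]


def global_average_pool_1d(steps):
    """Collapse the whole steps axis to one value per channel."""
    return sum(steps) / len(steps)


signal = [1, 3, 2, 5, 4, 0]
pooled = max_pool_1d(signal)            # one value per window
summary = global_average_pool_1d(signal)  # one value for the whole axis
```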


Recurrent Layers

Layer | Description | Input --> output shape | Arguments
Recurrent | abstract base class | (nb_samples, timesteps, input_dim) --> (nb_samples, timesteps, output_dim) if return_sequences, else (nb_samples, output_dim) | weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length
SimpleRNN | fully connected RNN where the output is fed back as input | like Recurrent | like Recurrent + output_dim, init, inner_init, activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U
GRU | Gated Recurrent Unit | like Recurrent | like SimpleRNN
LSTM | Long Short-Term Memory unit | like Recurrent | like SimpleRNN
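
The effect of return_sequences and go_backwards is easiest to see on a toy recurrence. The sketch below uses scalar weights and a tanh activation, h_t = tanh(w * x_t + u * h_(t-1) + b); it is a simplified stand-in for SimpleRNN written for this note, not the Keras implementation.

```python
import math


def simple_rnn(inputs, w, u, b=0.0, return_sequences=False, go_backwards=False):
    """Scalar toy RNN: returns every state or only the last one."""
    if go_backwards:
        inputs = inputs[::-1]
    h = 0.0
    outputs = []
    for x_t in inputs:
        # The previous output h is fed back in alongside the new input.
        h = math.tanh(w * x_t + u * h + b)
        outputs.append(h)
    return outputs if return_sequences else outputs[-1]


sequence = simple_rnn([1.0, 0.5, -1.0], w=0.5, u=0.1, return_sequences=True)
last = simple_rnn([1.0, 0.5, -1.0], w=0.5, u=0.1)
```

Stacking recurrent layers requires return_sequences on all but the last one, since the next layer needs one input per timestep.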


Embedding and Normalization Layers

Layer | Description | Input --> output shape | Arguments
Embedding | turns positive integers (indexes) into dense vectors of fixed size | (nb_samples, sequence_length) --> (nb_samples, sequence_length, output_dim) | input_dim, output_dim, init, input_length, W_regularizer, activity_regularizer, W_constraint, mask_zero, weights, dropout
BatchNormalization | at each batch, normalizes the activations of the previous layer (mean: 0, sd: 1) | TN --> TN | epsilon, mode, axis, momentum, weights, beta_init, gamma_init, gamma_regularizer, beta_regularizer


Advanced Activation Layers

Layer | Description | Input --> output shape | Arguments
LeakyReLU | ReLU that allows a small gradient when the unit is inactive: f(x) = alpha * x for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha
PReLU | Parametric ReLU, where the negative slope is a learned array: f(x) = alphas * x for x < 0, f(x) = x for x >= 0 | TN --> TN | init, weights
ELU | Exponential Linear Unit: f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha
ParametricSoftplus | alpha * log(1 + exp(beta * x)) | TN --> TN | alpha, beta
ThresholdedReLU | f(x) = x for x > theta, f(x) = 0 otherwise | TN --> TN | theta
SReLU | S-shaped ReLU | TN --> TN | t_left_init, a_left_init, t_right_init, a_right_init
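
The formulas in this table translate directly into scalar Python functions. This is a sketch for checking values by hand; the default parameter values are illustrative assumptions, and the real layers operate on whole tensors.

```python
import math


def leaky_relu(x, alpha=0.3):
    """f(x) = alpha * x for x < 0, x otherwise."""
    return x if x >= 0 else alpha * x


def elu(x, alpha=1.0):
    """f(x) = alpha * (exp(x) - 1) for x < 0, x otherwise."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def thresholded_relu(x, theta=1.0):
    """f(x) = x for x > theta, 0 otherwise."""
    return x if x > theta else 0.0


def parametric_softplus(x, alpha=0.2, beta=5.0):
    """f(x) = alpha * log(1 + exp(beta * x))."""
    return alpha * math.log(1.0 + math.exp(beta * x))
```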


Noise Layers

Layer | Description | Input --> output shape | Arguments
GaussianNoise | mitigates overfitting by smoothing: adds 0-centered Gaussian noise with standard deviation sigma | TN --> TN | sigma
GaussianDropout | mitigates overfitting by smoothing: multiplies by 1-centered Gaussian noise with standard deviation sqrt(p/(1-p)) | TN --> TN | p


Preprocessing

Module | Function / class | Description | Arguments
sequence | pad_sequences | list of nb_samples scalar sequences --> 2D array of shape (nb_samples, nb_timesteps) | sequences, maxlen, dtype
sequence | skipgrams | word index list of int --> list of (word, word) couples | sequence, vocabulary_size, window_size, negative_samples, shuffle, categorical, sampling_table
sequence | make_sampling_table | generates the word-rank-based sampling probability array of shape (size,) used by skipgrams | size, sampling_factor
text | text_to_word_sequence | sentence --> list of words | text, filters, lower, split
text | one_hot | text --> list of word indexes in a vocabulary of size n | text, n, filters, lower, split
text | Tokenizer | text --> list of word indexes | nb_words, filters, lower, split
image | ImageDataGenerator | generates batches of image tensors | featurewise_center, samplewise_center, featurewise_std_normalization, samplewise_std_normalization, zca_whitening, rotation_range, width_shift_range, height_shift_range, shear_range, zoom_range, channel_shift_range, fill_mode, cval, horizontal_flip, vertical_flip, rescale, dim_ordering
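
What pad_sequences does to ragged input is worth seeing once. The sketch below is a pure-Python stand-in (the name pad_sequences_sketch is hypothetical) that assumes the 'pre' padding and 'pre' truncation behaviour that I believe is the Keras default, returns nested lists rather than a NumPy array, and expects maxlen to be positive.

```python
def pad_sequences_sketch(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) each sequence to a common length."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s)[-maxlen:]  # 'pre' truncation keeps the last maxlen items
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


batch = pad_sequences_sketch([[1, 2, 3], [4, 5], [6]])
short = pad_sequences_sketch([[1, 2, 3], [4, 5], [6]], maxlen=2)
```

Padding with 0 pairs naturally with mask_zero on the Embedding layer, so that padded timesteps can be skipped downstream.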

Objectives (Loss Functions)

  • mean_squared_error / mse
  • mean_absolute_error / mae
  • mean_absolute_percentage_error / mape
  • mean_squared_logarithmic_error / msle
  • squared_hinge
  • hinge
  • binary_crossentropy (logloss)
  • categorical_crossentropy (multiclass logloss) - requires labels to be binary arrays of shape (nb_samples, nb_classes)
  • sparse_categorical_crossentropy - as above, but accepts sparse (integer) labels
  • kullback_leibler_divergence / kld - information gain from a predicted probability distribution Q to a true probability distribution P
  • poisson - mean of (predictions - targets * log(predictions))
  • cosine_proximity - negative mean cosine proximity between predictions and targets
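
A few of these objectives written out in plain Python make the definitions above concrete. This is a sketch over flat lists of floats; the epsilon used to keep log() finite is an assumption of the example, and the real Keras functions operate on backend tensors.

```python
import math

EPS = 1e-7  # assumed small constant to avoid log(0)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def hinge(y_true, y_pred):
    """Labels are expected to be -1 or 1."""
    return sum(max(1.0 - t * p, 0.0) for t, p in zip(y_true, y_pred)) / len(y_true)


def poisson(y_true, y_pred):
    return sum(p - t * math.log(p + EPS) for t, p in zip(y_true, y_pred)) / len(y_true)


def kld(y_true, y_pred):
    """Sum of P * log(P / Q) with both distributions clipped to [EPS, 1]."""
    clip = lambda v: min(max(v, EPS), 1.0)
    return sum(clip(t) * math.log(clip(t) / clip(p)) for t, p in zip(y_true, y_pred))
```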


Metrics

  • binary_accuracy - for binary classification
  • categorical_accuracy - for multiclass classification
  • sparse_categorical_accuracy
  • top_k_categorical_accuracy - counts a hit when the target class is within the top-k predictions provided
  • mean_squared_error (mse) - for regression
  • mean_absolute_error (mae)
  • mean_absolute_percentage_error (mape)
  • mean_squared_logarithmic_error (msle)
  • hinge - hinge loss: max(1 - y_true * y_pred, 0)
  • squared_hinge - hinge ^ 2
  • categorical_crossentropy - for multiclass classification
  • sparse_categorical_crossentropy
  • binary_crossentropy - for binary classification
  • kullback_leibler_divergence
  • poisson
  • cosine_proximity
  • matthews_correlation - for quality of binary classification
  • fbeta_score - weighted harmonic mean of precision and recall in multi-label classification
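
The difference between categorical_accuracy and top_k_categorical_accuracy can be sketched in a few lines over one-hot targets and per-class scores. The helper argmax is hypothetical, and the default k=5 is an assumption of this example.

```python
def argmax(row):
    return max(range(len(row)), key=lambda i: row[i])


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class is the target."""
    hits = [argmax(t) == argmax(p) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


def top_k_categorical_accuracy(y_true, y_pred, k=5):
    """Fraction of samples whose target is among the k highest scores."""
    hits = []
    for t, p in zip(y_true, y_pred):
        top_k = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        hits.append(argmax(t) in top_k)
    return sum(hits) / len(hits)


y_true = [[0, 0, 1], [0, 1, 0]]
y_pred = [[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]
```

Here the second sample is a miss for plain accuracy but a hit for k=2, because the target class has the second-highest score.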


Optimizers

  • SGD - stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum
  • RMSprop - good for RNNs
  • Adagrad
  • Adadelta
  • Adamax
  • Adam
  • Nadam
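
How momentum, learning rate decay and Nesterov momentum interact in SGD is easier to follow on a single scalar parameter. The update rule below follows my understanding of how Keras combines them; treat it as a sketch rather than the library's code, and the function name sgd_step as hypothetical.

```python
def sgd_step(param, grad, velocity, iteration,
             lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    """One SGD update for a scalar parameter; returns (param, velocity)."""
    # Time-based decay shrinks the learning rate as iterations accumulate.
    lr_t = lr / (1.0 + decay * iteration)
    new_velocity = momentum * velocity - lr_t * grad
    if nesterov:
        # Look ahead along the freshly updated velocity.
        new_param = param + momentum * new_velocity - lr_t * grad
    else:
        new_param = param + new_velocity
    return new_param, new_velocity


# Minimise f(w) = w^2 (gradient 2w) for a few steps.
w, v = 1.0, 0.0
for step in range(50):
    w, v = sgd_step(w, 2.0 * w, v, step, lr=0.1, momentum=0.5)
```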

Activation Functions

  • softmax
  • softplus
  • softsign
  • relu
  • tanh
  • sigmoid
  • hard_sigmoid
  • linear
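
For reference, scalar versions of several of these activations are sketched below. The piecewise-linear form used for hard_sigmoid (slope 0.2, offset 0.5) is what I recall from the TensorFlow backend and should be treated as an assumption.

```python
import math


def softmax(xs):
    """Normalised exponentials; subtracting the max keeps exp() stable."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    return math.log(1.0 + math.exp(x))


def softsign(x):
    return x / (1.0 + abs(x))


def relu(x):
    return max(x, 0.0)


def hard_sigmoid(x):
    # Assumed piecewise-linear approximation of the sigmoid.
    return min(max(0.2 * x + 0.5, 0.0), 1.0)
```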


Callbacks

Callback | Description | Arguments
Callback | abstract base class; hooks include on_epoch_end, on_batch_begin, on_batch_end | -
BaseLogger | accumulates epoch averages of the metrics being monitored | -
ProgbarLogger | writes to stdout | -
History | records events into a History object (automatic) | -
ModelCheckpoint | saves the model after every epoch, according to the monitored quantity | filepath, monitor, verbose, save_best_only, save_weights_only, mode
EarlyStopping | stops training when a monitored quantity has not improved for patience epochs | monitor, min_delta, patience, verbose, mode
RemoteMonitor | streams events to a server | root, path, field
TensorBoard | writes a log for TensorBoard to visualize | log_dir, histogram_freq, write_graph, write_images
ReduceLROnPlateau | reduces the learning rate when a metric has stopped improving | monitor, factor, patience, verbose, mode, epsilon, cooldown, min_lr
CSVLogger | streams epoch results to a csv file | filename, separator, append
LambdaCallback | custom callback built from functions | on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end
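
The interplay of patience and min_delta in EarlyStopping can be sketched over a plain list of per-epoch losses. The function below is a hypothetical stand-in, not the Keras class; in particular it assumes training stops once more than patience consecutive epochs fail to improve, and the exact off-by-one may differ between Keras versions.

```python
def early_stopping_epoch(losses, patience=0, min_delta=0.0):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait > patience:
                return epoch
    return None


val_loss = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stop_at = early_stopping_epoch(val_loss, patience=2)
```

Note that with patience=2 the run above stops before reaching the better loss at the final epoch, which is the usual trade-off when choosing patience.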

Init Functions

  • uniform
  • lecun_uniform
  • identity
  • orthogonal
  • zero
  • glorot_normal - Gaussian initialization scaled by fan_in + fan_out
  • glorot_uniform
  • he_uniform
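
The fan-based initializers differ only in how they scale the random draw. The constants below are the ones I recall for these schemes (for example, a uniform limit of sqrt(6 / (fan_in + fan_out)) for glorot_uniform); the helper names are hypothetical and the code is a sketch, not the Keras implementation.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_normal_stddev(fan_in, fan_out):
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_uniform_limit(fan_in):
    return math.sqrt(6.0 / fan_in)


def lecun_uniform_limit(fan_in):
    return math.sqrt(3.0 / fan_in)


def glorot_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_in x fan_out weight matrix as nested lists."""
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```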



Regularizers

  • W_regularizer, b_regularizer (WeightRegularizer)
  • activity_regularizer (ActivityRegularizer)

Available penalties:

  • l1 - LASSO
  • l2 - weight decay, Ridge
  • l1l2 - ElasticNet
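
Each of these penalties is a term added to the loss. A minimal sketch over a flat list of weights, with a hypothetical function name:

```python
def l1l2_penalty(weights, l1=0.0, l2=0.0):
    """l1 * sum(|w|) + l2 * sum(w^2); set one coefficient to 0 for pure L1 or L2."""
    return (l1 * sum(abs(w) for w in weights)
            + l2 * sum(w * w for w in weights))


weights = [1.0, -2.0]
lasso = l1l2_penalty(weights, l1=0.5)             # L1 only
ridge = l1l2_penalty(weights, l2=0.5)             # L2 only
elastic = l1l2_penalty(weights, l1=0.5, l2=0.5)   # both (ElasticNet-style)
```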



Constraints

  • W_constraint - for the main weights matrix
  • b_constraint - for the bias

Available constraints:

  • maxnorm - maximum-norm
  • nonneg - non-negativity
  • unitnorm - unit-norm
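
Constraints are projections applied to the weights after each update. The sketch below applies them to a single weight vector held in a list; the epsilon guarding against division by zero and the default m=2.0 are assumptions of the example, and the function names mirror the list above only for readability.

```python
import math

EPS = 1e-7  # assumed guard against division by zero


def maxnorm(w, m=2.0):
    """Rescale w so that its L2 norm is at most m."""
    norm = math.sqrt(sum(v * v for v in w))
    desired = min(norm, m)
    return [v * desired / (EPS + norm) for v in w]


def nonneg(w):
    """Zero out negative weights."""
    return [v if v >= 0 else 0.0 for v in w]


def unitnorm(w):
    """Rescale w to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / (EPS + norm) for v in w]
```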

Tuning Hyper-Parameters:

  • batch size
  • number of epochs
  • training optimization algorithm
  • learning rate
  • momentum
  • network weight initialization
  • activation function
  • dropout regularization
  • number of neurons in a hidden layer
  • depth of hidden layers
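
One simple way to explore these hyper-parameters is an exhaustive grid. The generator below only enumerates the combinations; training and scoring a model for each one is left out, and the parameter names and values in the example grid are illustrative assumptions.

```python
from itertools import product


def grid(param_grid):
    """Yield one dict per combination of the listed values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


search_space = {
    "batch_size": [16, 32],
    "learning_rate": [0.1, 0.01, 0.001],
}
combinations = list(grid(search_space))
```

The number of combinations is the product of the list lengths, so the grid grows quickly as more of the parameters above are added.
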
Posted on Friday, March 17, 2017 3:26 PM | Artificial Intelligence, TensorFlow
