References: Keras documentation (Chinese), Keras documentation (English), Keras tutorials, W3CSCHOOL, TensorFlow Chinese community

  1. Sequential model

    • Definition:

      from keras.models import Sequential
      model = Sequential()
    • Stack layers with add:

      from keras.layers import Dense
      model.add(Dense(units=64, activation='relu', input_dim=100))
      model.add(Dense(units=10, activation='softmax'))
      • activation: the activation function of the layer
      • kernel_regularizer / bias_regularizer: regularization schemes applied to the layer's weights (kernel) and bias
      • kernel_initializer / bias_initializer: initialization schemes for the layer's weights and bias when the layer is created; the default is Glorot uniform
        # A linear layer with L1 regularization of factor 0.01 applied to the kernel matrix:
        layers.Dense(64, kernel_regularizer=keras.regularizers.l1(0.01))

        # A linear layer with L2 regularization of factor 0.01 applied to the bias vector:
        layers.Dense(64, bias_regularizer=keras.regularizers.l2(0.01))

        # A linear layer with a kernel initialized to a random orthogonal matrix:
        layers.Dense(64, kernel_initializer='orthogonal')

        # A linear layer with a bias vector initialized to 2.0s:
        layers.Dense(64, bias_initializer=keras.initializers.constant(2.0))
    • Configure the learning process with compile

      model.compile(loss='categorical_crossentropy',
                    optimizer='sgd',
                    metrics=['accuracy'])
    • Configure the optimizer

      model.compile(loss=keras.losses.categorical_crossentropy,
                    optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))
    • Train on data

      # x_train and y_train are Numpy arrays
      model.fit(x_train, y_train, epochs=5, batch_size=32)
    • Evaluate the model

      loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
    • Predict

      classes = model.predict(x_test, batch_size=128)
    • More examples

  2. Functional API

    • Can be used to define more complex models.
    • Layers are callable and return a tensor.
    • Input tensors and output tensors are used to define a tf.keras.Model instance.
    • Training works the same way as for Sequential models.

      inputs = keras.Input(shape=(32,))  # Returns a placeholder tensor

      # A layer instance is callable on a tensor, and returns a tensor.
      x = keras.layers.Dense(64, activation='relu')(inputs)
      x = keras.layers.Dense(64, activation='relu')(x)
      predictions = keras.layers.Dense(10, activation='softmax')(x)

      # Instantiate the model given inputs and outputs.
      model = keras.Model(inputs=inputs, outputs=predictions)

      # The compile step specifies the training configuration.
      model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

      # Trains for 5 epochs
      model.fit(data, labels, batch_size=32, epochs=5)
    • More examples

  3. Model methods and attributes

    • model.layers is a flattened list of the layers comprising the model.
    • model.inputs is the list of input tensors of the model.
    • model.outputs is the list of output tensors of the model.
    • model.summary() prints a summary of the model. It is a shortcut for utils.print_summary.
    • model.get_config() returns a dictionary containing the configuration of the model. The model can be re-instantiated from this configuration, as shown in the sketch after this list.
    • model.get_weights() returns a list of all weight tensors in the model, as Numpy arrays.
    • model.set_weights(weights) sets the model's weights from a list of Numpy arrays. The arrays in the list must have the same shapes as those returned by get_weights().
    • model.save_weights(filepath) saves the model's weights as an HDF5 file.
    • model.load_weights(filepath, by_name=False) loads the weights from an HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), set by_name=True to load only those layers that have the same name.
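    A minimal sketch of re-instantiating a model from its configuration (only the architecture is recovered, not the weights):

      config = model.get_config()

      # Rebuild the model from the config dictionary.
      model = keras.Model.from_config(config)
      # or, for a Sequential model:
      model = keras.Sequential.from_config(config)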
  4. Model subclassing
    Build a fully customizable model by subclassing tf.keras.Model. Create the layers in the __init__ method and define the forward pass in the call method. Inside call you can also specify custom losses by calling self.add_loss(loss_tensor).

    import keras

    class SimpleMLP(keras.Model):

        def __init__(self, use_bn=False, use_dp=False, num_classes=10):
            super(SimpleMLP, self).__init__(name='mlp')
            self.use_bn = use_bn
            self.use_dp = use_dp
            self.num_classes = num_classes

            self.dense1 = keras.layers.Dense(32, activation='relu')
            self.dense2 = keras.layers.Dense(num_classes, activation='softmax')
            if self.use_dp:
                self.dp = keras.layers.Dropout(0.5)
            if self.use_bn:
                self.bn = keras.layers.BatchNormalization(axis=-1)

        def call(self, inputs):
            x = self.dense1(inputs)
            if self.use_dp:
                x = self.dp(x)
            if self.use_bn:
                x = self.bn(x)
            return self.dense2(x)

    model = SimpleMLP()
    model.compile(...)
    model.fit(...)

    In a subclassed model, the topology is defined by Python code (rather than as a static graph of layers). That means the topology cannot be inspected or serialized, so the following methods and attributes are not available for subclassed models:

    • model.inputs and model.outputs
    • model.to_yaml() and model.to_json()
    • model.get_config() and model.save()
  5. tf.keras.layers

    • Input: defines the model's input
    • Embedding: defines an embedding layer [reference]
      • Keras provides an embedding layer suitable for neural networks on text data.
      • It requires the input data to be integer-encoded, so that each word is represented by a unique integer. This data preparation step can be performed with the Tokenizer API that Keras provides.
      • The embedding layer is initialized with random weights and learns an embedding for every word in the training dataset.
      • e = Embedding(input_dim=200, output_dim=32, input_length=50)
    • add: element-wise sum of two outputs
    • Concatenate: concatenates two tensors
    • dot: dot product of two tensors (see the sketch below)
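    A minimal sketch (with two hypothetical 16-dimensional input branches) showing how these merge layers are used in the functional API:

      import keras

      a = keras.Input(shape=(16,))
      b = keras.Input(shape=(16,))

      summed = keras.layers.add([a, b])                    # element-wise sum, shape (None, 16)
      concat = keras.layers.Concatenate(axis=-1)([a, b])   # concatenation, shape (None, 32)
      dotted = keras.layers.dot([a, b], axes=-1)           # dot product, shape (None, 1)

      model = keras.Model(inputs=[a, b], outputs=[summed, concat, dotted])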
  6. Custom layers
    A custom layer can be created by subclassing tf.keras.layers.Layer and implementing the following methods:

    • build: creates the layer's weights; add them with add_weight
    • call: defines the forward pass
    • compute_output_shape: specifies how to compute the layer's output shape from its input shape
    • Optionally, the layer can be serialized by implementing the get_config and from_config methods.
      class MyLayer(keras.layers.Layer):

          def __init__(self, output_dim, **kwargs):
              self.output_dim = output_dim
              super(MyLayer, self).__init__(**kwargs)

          def build(self, input_shape):
              shape = tf.TensorShape((input_shape[1], self.output_dim))
              # Create a trainable weight variable for this layer.
              self.kernel = self.add_weight(name='kernel',
                                            shape=shape,
                                            initializer='uniform',
                                            trainable=True)
              # Be sure to call this at the end
              super(MyLayer, self).build(input_shape)

          def call(self, inputs):
              return tf.matmul(inputs, self.kernel)

          def compute_output_shape(self, input_shape):
              shape = tf.TensorShape(input_shape).as_list()
              shape[-1] = self.output_dim
              return tf.TensorShape(shape)

          def get_config(self):
              base_config = super(MyLayer, self).get_config()
              base_config['output_dim'] = self.output_dim
              return base_config

          @classmethod
          def from_config(cls, config):
              return cls(**config)


      # Create a model using the custom layer
      model = keras.Sequential([MyLayer(10),
                                keras.layers.Activation('softmax')])

      # The compile step specifies the training configuration
      model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

      # Trains for 5 epochs.
      model.fit(data, targets, batch_size=32, epochs=5)
  7. LSTM

    import keras
    from keras.layers import Input, Embedding, LSTM, Dense
    from keras.models import Model

    # Headline input: receives sequences of 100 integers, each between 1 and 10000.
    # Note that we can name any layer by passing it a "name" argument.
    main_input = Input(shape=(100,), dtype='int32', name='main_input')

    # The Embedding layer encodes the input sequence
    # into a sequence of dense 512-dimensional vectors.
    x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

    # The LSTM layer transforms the vector sequence into a single vector
    # that contains information about the entire sequence.
    lstm_out = LSTM(32)(x)

    # Insert an auxiliary loss
    auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

    auxiliary_input = Input(shape=(5,), name='aux_input')
    x = keras.layers.concatenate([lstm_out, auxiliary_input])

    # Stack several fully-connected layers
    x = Dense(64, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    x = Dense(64, activation='relu')(x)

    # Finally, add the main logistic regression layer
    main_output = Dense(1, activation='sigmoid', name='main_output')(x)

    # Define the model
    model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

    # Compile
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  loss_weights=[1., 0.2])

    # Pass the input arrays and target arrays
    model.fit([headline_data, additional_data], [labels, labels],
              epochs=50, batch_size=32)

    # Alternatively, compile using the "name" arguments
    model.compile(optimizer='rmsprop',
                  loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
                  loss_weights={'main_output': 1., 'aux_output': 0.2})

    # And then train with:
    model.fit({'main_input': headline_data, 'aux_input': additional_data},
              {'main_output': labels, 'aux_output': labels},
              epochs=50, batch_size=32)
  8. Tokenizer and embedding
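
    A minimal sketch, using a hypothetical toy corpus, of how the Keras Tokenizer API (mentioned in section 5) integer-encodes text before it is fed to an Embedding layer:

      from keras.preprocessing.text import Tokenizer
      from keras.preprocessing.sequence import pad_sequences
      from keras.layers import Embedding

      docs = ['well done', 'nice work', 'could be better']  # hypothetical toy corpus

      # Integer-encode the words: each word gets a unique integer index.
      tokenizer = Tokenizer()
      tokenizer.fit_on_texts(docs)
      sequences = tokenizer.texts_to_sequences(docs)

      # Pad to a fixed length so the sequences can be batched.
      padded = pad_sequences(sequences, maxlen=4, padding='post')

      # vocab_size = number of distinct words + 1 (index 0 is reserved for padding).
      vocab_size = len(tokenizer.word_index) + 1
      embedding = Embedding(input_dim=vocab_size, output_dim=32, input_length=4)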

  9. GELU

    import numpy as np
    import tensorflow as tf
    from keras.layers import Activation, Dense
    from keras.utils.generic_utils import get_custom_objects

    # tanh approximation of the GELU activation
    def custom_gelu(x):
        return 0.5 * x * (1 + tf.tanh(tf.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))

    # Register the activation so it can also be referenced by the name 'custom_gelu'
    get_custom_objects().update({'custom_gelu': Activation(custom_gelu)})
    # fit1 is a previously defined Sequential model
    fit1.add(Dense(units=1, activation=custom_gelu))
  10. RNN (LSTM, GRU) models

    • lstm1, lstm_h, lstm_c = LSTM(hidden_size, return_sequences=True, return_state=True)(input)
      returns the hidden state of the LSTM at every time step (lstm1), the last hidden state (lstm_h), and the last cell state (lstm_c); see the sketch below.
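      A minimal runnable sketch (with hypothetical shapes: a batch of 2 sequences, 3 time steps, 8 features, hidden_size of 4) showing the three outputs and their shapes:

        import numpy as np
        from keras.layers import Input, LSTM
        from keras.models import Model

        hidden_size = 4
        inp = Input(shape=(3, 8))  # (time_steps, feature_size)
        lstm1, lstm_h, lstm_c = LSTM(hidden_size, return_sequences=True, return_state=True)(inp)
        model = Model(inputs=inp, outputs=[lstm1, lstm_h, lstm_c])

        seq, last_h, last_c = model.predict(np.random.rand(2, 3, 8))
        print(seq.shape)     # (2, 3, 4): hidden state at every time step
        print(last_h.shape)  # (2, 4):    hidden state of the last time step
        print(last_c.shape)  # (2, 4):    cell state of the last time step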
  11. Bidirectional layer

    • lstm_out = Bidirectional(LSTM(10, return_sequences=True))(input)
    • The forward and backward layers can also be defined separately:

      forward_layer = LSTM(10, return_sequences=True)
      backward_layer = LSTM(10, activation='relu', return_sequences=True,
                            go_backwards=True)
      lstm_out = Bidirectional(forward_layer, backward_layer=backward_layer)(input)
    • Using return_state:

      encoder_inputs = Input(shape=(None, num_encoder_tokens))
      encoder = Bidirectional(LSTM(latent_dim, return_state=True))
      encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder(encoder_inputs)

      state_h = Concatenate()([forward_h, backward_h])
      state_c = Concatenate()([forward_c, backward_c])

      encoder_states = [state_h, state_c]

      decoder_inputs = Input(shape=(None, num_decoder_tokens))
      decoder_lstm = LSTM(latent_dim * 2, return_sequences=True, return_state=True)
      decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
  12. How to use masks

    • Masking layer:

      mask = keras.layers.Masking(mask_value=0, input_shape=(time_step, feature_size))(input)
      lstm_output = keras.layers.LSTM(hidden_size, return_sequences=True)(mask)
    • Embedding layer:

      embed = keras.layers.Embedding(vocab_size, embedding_size, mask_zero=True)(input)
      lstm_output = keras.layers.LSTM(hidden_size, return_sequences=True)(embed)
  13. optimizer

    • tf.keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
      • decay: learning rate decay
    • Learning rate

      • tf 2.0: tf.keras.optimizers.schedules.LearningRateSchedule

        class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
            def __init__(self, d_model, warmup_steps=4000):
                super(CustomSchedule, self).__init__()

                self.d_model = d_model
                self.d_model = tf.cast(self.d_model, tf.float32)

                self.warmup_steps = warmup_steps

            def __call__(self, step):
                arg1 = tf.math.rsqrt(step)
                arg2 = step * (self.warmup_steps ** -1.5)

                return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)

        learning_rate = CustomSchedule(200)
        custom_adam = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98,
                                               epsilon=1e-9)
      • keras: keras.callbacks.LearningRateScheduler(schedule)

        import keras.backend as K
        from keras.callbacks import LearningRateScheduler

        def scheduler(epoch):
            # Every 100 epochs, reduce the learning rate to 1/10 of its current value
            if epoch % 100 == 0 and epoch != 0:
                lr = K.get_value(model.optimizer.lr)
                K.set_value(model.optimizer.lr, lr * 0.1)
                print("lr changed to {}".format(lr * 0.1))
            return K.get_value(model.optimizer.lr)

        reduce_lr = LearningRateScheduler(scheduler)
        model.fit(train_x, train_y, batch_size=32, epochs=300, callbacks=[reduce_lr])
    • Reduce LR On Plateau
      keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0)

      from keras.callbacks import ReduceLROnPlateau
      reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=10, mode='auto')
      model.fit(train_x, train_y, batch_size=32, epochs=300, validation_split=0.1, callbacks=[reduce_lr])