Spaces:

MCP-1st-Birthday
/

ML-Starter

Running

App Files Files Community

ML-Starter / knowledge_base /timeseries /timeseries_classification_transformer.py

emreatilgan

feat: Initialize mcp_server with embedding and loader modules

9ce984a 20 days ago

raw

history blame contribute delete

5 kB

	"""
	Title: Timeseries classification with a Transformer model
	Author: [Theodoros Ntakouris](https://github.com/ntakouris)
	Date created: 2021/06/25
	Last modified: 2021/08/05
	Description: This notebook demonstrates how to do timeseries classification using a Transformer model.
	Accelerator: GPU
	"""

	"""
	## Introduction

	This is the Transformer architecture from
	[Attention Is All You Need](https://arxiv.org/abs/1706.03762),
	applied to timeseries instead of natural language.

	This example requires TensorFlow 2.4 or higher.

	## Load the dataset

	We are going to use the same dataset and preprocessing as the
	[TimeSeries Classification from Scratch](https://keras.io/examples/timeseries/timeseries_classification_from_scratch)
	example.
	"""

	import numpy as np
	import keras
	from keras import layers


	def readucr(filename):
	data = np.loadtxt(filename, delimiter="\t")
	y = data[:, 0]
	x = data[:, 1:]
	return x, y.astype(int)


	root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

	x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
	x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

	x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
	x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))

	n_classes = len(np.unique(y_train))

	idx = np.random.permutation(len(x_train))
	x_train = x_train[idx]
	y_train = y_train[idx]

	y_train[y_train == -1] = 0
	y_test[y_test == -1] = 0

	"""
	## Build the model

	Our model processes a tensor of shape `(batch size, sequence length, features)`,
	where `sequence length` is the number of time steps and `features` is each input
	timeseries.

	You can replace your classification RNN layers with this one: the
	inputs are fully compatible!

	We include residual connections, layer normalization, and dropout.
	The resulting layer can be stacked multiple times.

	The projection layers are implemented through `keras.layers.Conv1D`.
	"""

	# This implementation applies Layer Normalization before the residual connection
	# to improve training stability by producing better-behaved gradients and often
	# eliminating the need for learning rate warm-up.


	def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
	# Attention and Normalization
	x = layers.MultiHeadAttention(
	key_dim=head_size, num_heads=num_heads, dropout=dropout
	)(inputs, inputs)
	x = layers.Dropout(dropout)(x)
	x = layers.LayerNormalization(epsilon=1e-6)(x)
	res = x + inputs

	# Feed Forward Part
	x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
	x = layers.Dropout(dropout)(x)
	x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
	x = layers.LayerNormalization(epsilon=1e-6)(x)
	return x + res


	"""
	The main part of our model is now complete. We can stack multiple of those
	`transformer_encoder` blocks and we can also proceed to add the final
	Multi-Layer Perceptron classification head. Apart from a stack of `Dense`
	layers, we need to reduce the output tensor of the `TransformerEncoder` part of
	our model down to a vector of features for each data point in the current
	batch. A common way to achieve this is to use a pooling layer. For
	this example, a `GlobalAveragePooling1D` layer is sufficient.
	"""


	def build_model(
	input_shape,
	head_size,
	num_heads,
	ff_dim,
	num_transformer_blocks,
	mlp_units,
	dropout=0,
	mlp_dropout=0,
	):
	inputs = keras.Input(shape=input_shape)
	x = inputs
	for _ in range(num_transformer_blocks):
	x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)

	x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
	for dim in mlp_units:
	x = layers.Dense(dim, activation="relu")(x)
	x = layers.Dropout(mlp_dropout)(x)
	outputs = layers.Dense(n_classes, activation="softmax")(x)
	return keras.Model(inputs, outputs)


	"""
	## Train and evaluate
	"""

	input_shape = x_train.shape[1:]

	model = build_model(
	input_shape,
	head_size=256,
	num_heads=4,
	ff_dim=4,
	num_transformer_blocks=4,
	mlp_units=[128],
	mlp_dropout=0.4,
	dropout=0.25,
	)

	model.compile(
	loss="sparse_categorical_crossentropy",
	optimizer=keras.optimizers.Adam(learning_rate=1e-4),
	metrics=["sparse_categorical_accuracy"],
	)
	model.summary()

	callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

	model.fit(
	x_train,
	y_train,
	validation_split=0.2,
	epochs=150,
	batch_size=64,
	callbacks=callbacks,
	)

	model.evaluate(x_test, y_test, verbose=1)

	"""
	## Conclusions

	In about 110-120 epochs (25s each on Colab), the model reaches a training
	accuracy of ~0.95, validation accuracy of ~84 and a testing
	accuracy of ~85, without hyperparameter tuning. And that is for a model
	with less than 100k parameters. Of course, parameter count and accuracy could be
	improved by a hyperparameter search and a more sophisticated learning rate
	schedule, or a different optimizer.

	"""