# DeepWeeds on Vertex AI 
Adapted from this [GCP codelab](https://codelabs.developers.google.com/vertex_notebook_executor#4).


The DeepWeeds dataset consists of 17,509 images capturing eight different weed species photographed in situ across northern Australia, plus a "negative" class of non-weed images, so the label has nine classes. In this section, you'll write the code to preprocess the DeepWeeds dataset, then build and train an image classification model on top of a pretrained feature-vector model downloaded from TensorFlow Hub.

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
```

```python
# Load DeepWeeds as (image, label) pairs along with its metadata.
data, info = tfds.load(name='deep_weeds', as_supervised=True, with_info=True)
NUM_CLASSES = info.features['label'].num_classes
DATASET_SIZE = info.splits['train'].num_examples
```

The first `tfds.load` call downloads and prepares version 3.0.0 of the dataset (469.32 MiB download, 939.31 MiB total on disk) into the local `tensorflow_datasets` directory.

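```python
def preprocess_data(image, label):
    # Resize every image to 300x300 (tf.image.resize returns float32).
    image = tf.image.resize(image, (300, 300))
    # Scale pixel values from [0, 255] to [0, 1].
    return tf.cast(image, tf.float32) / 255., label
```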
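To see what `preprocess_data` does to a single image without running TensorFlow, here is a NumPy sketch. It assumes DeepWeeds images are 256x256 RGB `uint8` arrays, and it swaps the bilinear resize of `tf.image.resize` for nearest-neighbour sampling to keep it short, so the hypothetical `preprocess_np` helper will not reproduce TensorFlow's pixel values exactly; it only shows the shape and value-range changes.

```python
import numpy as np

def preprocess_np(image: np.ndarray, size=(300, 300)) -> np.ndarray:
    """Hypothetical NumPy stand-in for preprocess_data (nearest-neighbour resize)."""
    h, w = image.shape[:2]
    # Map each output row/column back to a source row/column.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = image[rows][:, cols]
    # uint8 values in [0, 255] -> float32 values in [0, 1].
    return resized.astype(np.float32) / 255.0

# A random 256x256 RGB uint8 image standing in for one DeepWeeds sample.
img = np.random.default_rng(0).integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
out = preprocess_np(img)
print(out.shape, out.dtype, out.min(), out.max())
```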

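```python
# Create train/validation splits (80% / 20%).

# Shuffle once. reshuffle_each_iteration=False keeps the order fixed across
# epochs, so take()/skip() select the same examples every epoch and no
# validation image leaks into training. (The default reshuffles every epoch.)
dataset = data['train'].shuffle(1000, seed=42, reshuffle_each_iteration=False)

train_split = 0.8
train_size = int(train_split * DATASET_SIZE)

# First 80% for training, reshuffled every epoch.
train_data = dataset.take(train_size)
train_data = train_data.shuffle(1000)
train_data = train_data.map(preprocess_data)
train_data = train_data.batch(64)

# Remaining ~20% for validation.
validation_data = dataset.skip(train_size)
validation_data = validation_data.map(preprocess_data)
validation_data = validation_data.batch(64)
```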
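Why the shuffle order matters for a take/skip split can be simulated with the standard library. This sketch treats each example as an integer index and uses a full shuffle instead of `tf.data`'s 1000-element buffered shuffle (a simplification); the `split` helper is hypothetical. It also shows the split sizes for 17,509 images.

```python
import random

DATASET_SIZE = 17509                    # number of DeepWeeds images
train_size = int(0.8 * DATASET_SIZE)    # 14007
indices = list(range(DATASET_SIZE))     # stand-ins for the examples

def split(order):
    """take(train_size) / skip(train_size) applied to one epoch's order."""
    return set(order[:train_size]), set(order[train_size:])

rng = random.Random(0)
order_epoch1 = rng.sample(indices, len(indices))
order_epoch2 = rng.sample(indices, len(indices))

# Reshuffling every epoch: epoch 2's validation set overlaps epoch 1's training set.
train_e1, _ = split(order_epoch1)
_, val_e2 = split(order_epoch2)
leaked = len(train_e1 & val_e2)

# Fixed order (reshuffle_each_iteration=False): every epoch reuses order_epoch1.
_, val_fixed = split(order_epoch1)
leaked_fixed = len(train_e1 & val_fixed)

print(f"train={train_size} val={DATASET_SIZE - train_size}")
print(f"leaked with reshuffling: {leaked}, with a fixed order: {leaked_fixed}")
```

With reshuffling, thousands of "validation" images were already seen in training, which inflates validation accuracy; with a fixed order the two sets stay disjoint.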


```python
# Pretrained backbone to load from TensorFlow Hub.
feature_extractor_model = "inception_v3"
tf_hub_uri = f"https://tfhub.dev/google/imagenet/{feature_extractor_model}/feature_vector/5"
```


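```python
# Frozen backbone: maps each image to a feature vector; its weights are not updated.
feature_extractor_layer = hub.KerasLayer(
    tf_hub_uri,
    trainable=False)

# Only the Dense head is trained; it outputs one logit per class (no softmax).
model = tf.keras.Sequential([
    feature_extractor_layer,
    tf.keras.layers.Dense(units=NUM_CLASSES)
])
```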
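Because the backbone is frozen, the only trainable weights are those of the `Dense` head. The NumPy sketch below mimics that head on random stand-in features. It assumes the Inception v3 feature vector has 2048 dimensions and that `NUM_CLASSES` is 9 (eight species plus the negative class); the random weights are placeholders, not trained values.

```python
import numpy as np

FEATURE_DIM = 2048   # assumed size of the Inception v3 feature vector
NUM_CLASSES = 9      # assumed: 8 weed species + 1 negative class

rng = np.random.default_rng(0)
# Stand-in for the frozen backbone's output on one batch of 64 images.
features = rng.standard_normal((64, FEATURE_DIM)).astype(np.float32)

# Dense(NUM_CLASSES): one weight matrix plus one bias per class.
W = (rng.standard_normal((FEATURE_DIM, NUM_CLASSES)) * 0.01).astype(np.float32)
b = np.zeros(NUM_CLASSES, dtype=np.float32)
logits = features @ W + b            # raw scores, shape (64, NUM_CLASSES)

trainable_params = W.size + b.size   # 2048 * 9 + 9
print(logits.shape, trainable_params)
```

Training only about 18k parameters on top of a fixed backbone is what makes this transfer-learning setup fast compared with training the whole network.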


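```python
# The head outputs raw logits, so the loss applies softmax itself (from_logits=True).
# Labels are integer class ids, hence the "sparse" cross-entropy.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['acc'])

model.fit(train_data, validation_data=validation_data, epochs=20)
```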
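For intuition about what `SparseCategoricalCrossentropy(from_logits=True)` and the `'acc'` metric measure, here is a standard-library sketch for individual examples. The helper names are hypothetical; Keras averages the per-example loss over the batch by default, and for this sparse loss we understand `'acc'` to resolve to sparse categorical accuracy (arg-max logit equals the integer label).

```python
import math

def sparse_ce_from_logits(logits, label):
    """-log(softmax(logits)[label]) for one example, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def accuracy(batch_logits, labels):
    """Fraction of examples whose highest logit is at the true label index."""
    hits = sum(
        max(range(len(z)), key=lambda i: z[i]) == y
        for z, y in zip(batch_logits, labels)
    )
    return hits / len(labels)

# All-equal logits over 9 classes means uniform guessing: loss = log(9).
print(sparse_ce_from_logits([0.0] * 9, label=3))

# One correct and one wrong prediction: accuracy 0.5.
print(accuracy([[2.0, 0.1, 0.3], [0.1, 0.2, 3.0]], [0, 1]))
```

Subtracting the maximum logit before exponentiating keeps the computation finite even for very large logits, which is one reason to let the loss handle the softmax instead of adding a softmax layer to the model.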
