Neural machine translation with attention  |  Text  |  TensorFlow (2023)


This tutorial demonstrates how to train a sequence-to-sequence (seq2seq) model for Spanish-to-English translation roughly based on Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015).

[Figure: This tutorial's model, an encoder/decoder connected by attention.]

While this architecture is somewhat outdated, it is still a very useful project to work through to get a deeper understanding of sequence-to-sequence models and attention mechanisms (before going on to Transformers).

This example assumes some knowledge of TensorFlow fundamentals below the level of a Keras layer:

  • Working with tensors directly
  • Writing custom keras.Models and keras.layers

After training the model in this notebook, you will be able to input a Spanish sentence, such as "¿todavia estan en casa?", and return the English translation: "are you still at home?"

The resulting model is exportable as a tf.saved_model, so it can be used in other TensorFlow environments.

The translation quality is reasonable for a toy example, but the generated attention plot is perhaps more interesting. It shows which parts of the input sentence have the model's attention while translating:

[Figure: Example attention plot.]

Setup

pip install "tensorflow-text>=2.10"
pip install einops
import numpy as np

import typing
from typing import Any, Tuple

import einops
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

import tensorflow as tf
import tensorflow_text as tf_text

This tutorial uses a lot of low-level APIs where it's easy to get shapes wrong. The following class is used to check shapes throughout the tutorial.

class ShapeChecker():
  def __init__(self):
    # Keep a cache of every axis-name seen
    self.shapes = {}

  def __call__(self, tensor, names, broadcast=False):
    if not tf.executing_eagerly():
      return

    parsed = einops.parse_shape(tensor, names)

    for name, new_dim in parsed.items():
      old_dim = self.shapes.get(name, None)

      if (broadcast and new_dim == 1):
        continue

      if old_dim is None:
        # If the axis name is new, add its length to the cache.
        self.shapes[name] = new_dim
        continue

      if new_dim != old_dim:
        raise ValueError(f"Shape mismatch for dimension: '{name}'\n"
                         f"    found: {new_dim}\n"
                         f"    expected: {old_dim}\n")
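The idea behind this class can be sketched without TensorFlow or einops: each named axis length is cached the first time it is seen, and any later disagreement raises. This is a simplified illustration (the real class parses shapes with einops and skips graph mode), with illustrative shapes:

```python
# A minimal pure-Python sketch of the idea behind ShapeChecker: cache each
# named axis length on first sight, and raise on any later disagreement.
class SimpleShapeChecker:
    def __init__(self):
        # Cache of axis-name -> length, filled in on first sight.
        self.shapes = {}

    def __call__(self, shape, names):
        for name, dim in zip(names.split(), shape):
            expected = self.shapes.setdefault(name, dim)
            if dim != expected:
                raise ValueError(
                    f"Shape mismatch for dimension '{name}': "
                    f"found {dim}, expected {expected}")

checker = SimpleShapeChecker()
checker((64, 19), 'batch s')             # records batch=64, s=19
checker((64, 19, 256), 'batch s units')  # consistent with the cache: passes
try:
    checker((32, 19, 256), 'batch s units')  # batch changed: raises
except ValueError as err:
    print(err)
```

Threading one checker instance through a method, as the tutorial's layers do, catches shape bugs at the first place they appear rather than several ops later.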

The data

The tutorial uses a language dataset provided by Anki. This dataset contains language translation pairs in the format:

May I borrow this book? ¿Puedo tomar prestado este libro?

They have a variety of languages available, but this example uses the English-Spanish dataset.

Download and prepare the dataset

For convenience, a copy of this dataset is hosted on Google Cloud, but you can also download your own copy. After downloading the dataset, here are the steps you need to take to prepare the data:

  1. Add a start and end token to each sentence.
  2. Clean the sentences by removing special characters.
  3. Create a word index and reverse word index (dictionaries mapping from word → id and id → word).
  4. Pad each sentence to a maximum length.
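Steps 1, 3 and 4 can be sketched in pure Python before seeing how the tutorial does them with a TextVectorization layer (step 2, cleaning, is handled by the standardization function later). The sentences and token IDs here are illustrative:

```python
# A minimal pure-Python sketch of the preparation steps; the tutorial itself
# implements these with a TextVectorization layer.
sentences = ["may i borrow this book ?", "no ."]

# 1. Add a start and end token to each sentence.
tokenized = [f"[START] {s} [END]".split() for s in sentences]

# 3. Word index and reverse word index ('' is the padding token, id 0).
vocab = [''] + sorted({w for sent in tokenized for w in sent})
word_to_id = {w: i for i, w in enumerate(vocab)}
id_to_word = {i: w for w, i in word_to_id.items()}

# 4. Pad each sentence to the maximum length with id 0.
max_len = max(len(s) for s in tokenized)
padded = [[word_to_id[w] for w in s] + [0] * (max_len - len(s))
          for s in tokenized]

for row in padded:
    print(row)
```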
# Download the file
import pathlib

path_to_zip = tf.keras.utils.get_file(
    'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
    extract=True)

path_to_file = pathlib.Path(path_to_zip).parent/'spa-eng/spa.txt'
Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
2638744/2638744 [==============================] - 0s 0us/step
def load_data(path):
  text = path.read_text(encoding='utf-8')

  lines = text.splitlines()
  pairs = [line.split('\t') for line in lines]

  context = np.array([context for target, context in pairs])
  target = np.array([target for target, context in pairs])

  return target, context
target_raw, context_raw = load_data(path_to_file)
print(context_raw[-1])
Si quieres sonar como un hablante nativo, debes estar dispuesto a practicar diciendo la misma frase una y otra vez de la misma manera en que un músico de banjo practica el mismo fraseo una y otra vez hasta que lo puedan tocar correctamente y en el tiempo esperado.
print(target_raw[-1])
If you want to sound like a native speaker, you must be willing to practice saying the same sentence over and over in the same way that banjo players practice the same phrase over and over until they can play it correctly and at the desired tempo.

Create a tf.data dataset

From these arrays of strings you can create a tf.data.Dataset of strings that shuffles and batches them efficiently:

BUFFER_SIZE = len(context_raw)
BATCH_SIZE = 64

is_train = np.random.uniform(size=(len(target_raw),)) < 0.8

train_raw = (
    tf.data.Dataset
    .from_tensor_slices((context_raw[is_train], target_raw[is_train]))
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE))
val_raw = (
    tf.data.Dataset
    .from_tensor_slices((context_raw[~is_train], target_raw[~is_train]))
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE))
for example_context_strings, example_target_strings in train_raw.take(1):
  print(example_context_strings[:5])
  print()
  print(example_target_strings[:5])
  break
tf.Tensor(
[b'A m\xc3\xad no me mires.'
 b'Hay que cambiar la chapa de la puerta.'
 b'No pod\xc3\xada recordar el nombre de aquella canci\xc3\xb3n.'
 b'\xc2\xbfPod\xc3\xa9s pensar en otra cosa?'
 b'Mi padre siempre est\xc3\xa1 ocupado.'], shape=(5,), dtype=string)

tf.Tensor(
[b"It's not my fault."
 b'You have to change the lock on the door.'
 b"I couldn't remember the title of that song."
 b'Can you think of anything else?'
 b'My father is always busy.'], shape=(5,), dtype=string)

Text preprocessing

One of the goals of this tutorial is to build a model that can be exported as a tf.saved_model. To make that exported model useful it should take tf.string inputs and return tf.string outputs: all the text processing happens inside the model, mainly using a layers.TextVectorization layer.

Standardization

The model is dealing with multilingual text with a limited vocabulary. So it will be important to standardize the input text.

The first step is Unicode normalization to split accented characters and replace compatibility characters with their ASCII equivalents.
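The same normalization can be seen with the standard library's unicodedata module, which mirrors what tf_text.normalize_utf8 does below: NFKD splits 'í' into 'i' plus a combining accent, which a later regex can drop.

```python
import unicodedata

# NFKD decomposes accented characters into a base letter plus a combining
# mark; stripping the combining marks leaves the ASCII-friendly base text.
text = '¿Todavía está en casa?'
nfkd = unicodedata.normalize('NFKD', text)

# The decomposed form is longer: each accent becomes its own codepoint.
print(len(text), len(nfkd))

stripped = ''.join(c for c in nfkd if not unicodedata.combining(c))
print(stripped)
```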

The tensorflow_text package contains a unicode normalize operation:

example_text = tf.constant('¿Todavía está en casa?')

print(example_text.numpy())
print(tf_text.normalize_utf8(example_text, 'NFKD').numpy())
b'\xc2\xbfTodav\xc3\xada est\xc3\xa1 en casa?'
b'\xc2\xbfTodavi\xcc\x81a esta\xcc\x81 en casa?'

Unicode normalization will be the first step in the text standardization function:


def tf_lower_and_split_punct(text):
  # Split accented characters.
  text = tf_text.normalize_utf8(text, 'NFKD')
  text = tf.strings.lower(text)
  # Keep space, a to z, and select punctuation.
  text = tf.strings.regex_replace(text, '[^ a-z.?!,¿]', '')
  # Add spaces around punctuation.
  text = tf.strings.regex_replace(text, '[.?!,¿]', r' \0 ')
  # Strip whitespace.
  text = tf.strings.strip(text)

  text = tf.strings.join(['[START]', text, '[END]'], separator=' ')
  return text
print(example_text.numpy().decode())
print(tf_lower_and_split_punct(example_text).numpy().decode())
¿Todavía está en casa?
[START] ¿ todavia esta en casa ? [END]

Text Vectorization

This standardization function will be wrapped up in a tf.keras.layers.TextVectorization layer which will handle the vocabulary extraction and conversion of input text to sequences of tokens.

max_vocab_size = 5000

context_text_processor = tf.keras.layers.TextVectorization(
    standardize=tf_lower_and_split_punct,
    max_tokens=max_vocab_size,
    ragged=True)

The TextVectorization layer and many other Keras preprocessing layers have an adapt method. This method reads one epoch of the training data, and works a lot like Model.fit. This adapt method initializes the layer based on the data. Here it determines the vocabulary:
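What adapt computes can be sketched in pure Python: a frequency-sorted vocabulary with the reserved '' (padding) and '[UNK]' entries in front, which matches the shape of the vocabulary printed below. The corpus here is illustrative:

```python
from collections import Counter

# A pure-Python sketch of what TextVectorization.adapt computes: count the
# tokens in the corpus, then build a frequency-sorted vocabulary with the
# reserved padding ('') and out-of-vocabulary ('[UNK]') entries first.
corpus = ["el gato .", "el perro .", "que gato !"]

counts = Counter(w for line in corpus for w in line.split())
vocab = ['', '[UNK]'] + [w for w, _ in counts.most_common()]
print(vocab)
```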

context_text_processor.adapt(train_raw.map(lambda context, target: context))

# Here are the first 10 words from the vocabulary:
context_text_processor.get_vocabulary()[:10]
['', '[UNK]', '[START]', '[END]', '.', 'que', 'de', 'el', 'a', 'no']

That's the Spanish TextVectorization layer, now build and .adapt() the English one:

target_text_processor = tf.keras.layers.TextVectorization(
    standardize=tf_lower_and_split_punct,
    max_tokens=max_vocab_size,
    ragged=True)

target_text_processor.adapt(train_raw.map(lambda context, target: target))
target_text_processor.get_vocabulary()[:10]
['', '[UNK]', '[START]', '[END]', '.', 'the', 'i', 'to', 'you', 'tom']

Now these layers can convert a batch of strings into a batch of token IDs:

example_tokens = context_text_processor(example_context_strings)
example_tokens[:3, :]
<tf.RaggedTensor [[2, 8, 24, 9, 18, 2635, 4, 3], [2, 60, 5, 823, 11, 1, 6, 11, 181, 4, 3], [2, 9, 254, 1053, 7, 237, 6, 934, 457, 4, 3]]>

The get_vocabulary method can be used to convert token IDs back to text:

context_vocab = np.array(context_text_processor.get_vocabulary())
tokens = context_vocab[example_tokens[0].numpy()]
' '.join(tokens)
'[START] a mi no me mires . [END]'

The returned token IDs are zero-padded. This can easily be turned into a mask:

plt.subplot(1, 2, 1)
plt.pcolormesh(example_tokens.to_tensor())
plt.title('Token IDs')

plt.subplot(1, 2, 2)
plt.pcolormesh(example_tokens.to_tensor() != 0)
plt.title('Mask')
Text(0.5, 1.0, 'Mask')

[Figure: Token IDs and the corresponding mask.]

Process the dataset

The process_text function below converts the Datasets of strings into 0-padded tensors of token IDs. It also converts from a (context, target) pair to a ((context, target_in), target_out) pair for training with keras.Model.fit. Keras expects (inputs, labels) pairs; the inputs are (context, target_in) and the labels are target_out. The difference between target_in and target_out is that they are shifted by one step relative to each other, so that at each location the label is the next token.
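The shift can be shown on a single tokenized sentence (the token IDs here are illustrative): at each position, the decoder input is the current token and the label is the one after it.

```python
# The (target_in, target_out) shift on one tokenized sentence: at each
# position the label is the next token. IDs here are illustrative,
# with 2 = [START] and 3 = [END] as in the tutorial's vocabularies.
target = [2, 6, 27, 23, 4, 3]

target_in = target[:-1]   # decoder input: everything but the last token
target_out = target[1:]   # labels: everything but the first token

for x, y in zip(target_in, target_out):
    print(f'input {x} -> label {y}')
```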

def process_text(context, target):
  context = context_text_processor(context).to_tensor()
  target = target_text_processor(target)
  targ_in = target[:,:-1].to_tensor()
  targ_out = target[:,1:].to_tensor()
  return (context, targ_in), targ_out


train_ds = train_raw.map(process_text, tf.data.AUTOTUNE)
val_ds = val_raw.map(process_text, tf.data.AUTOTUNE)

Here is the first sequence of each, from the first batch:

for (ex_context_tok, ex_tar_in), ex_tar_out in train_ds.take(1):
  print(ex_context_tok[0, :10].numpy())
  print()
  print(ex_tar_in[0, :10].numpy())
  print(ex_tar_out[0, :10].numpy())
[   2   39    9   48  259 1114    4    3    0    0]

[  2   6  27  23  10 763 329   4   0   0]
[  6  27  23  10 763 329   4   3   0   0]

The encoder/decoder

The following diagrams show an overview of the model. In both, the encoder is on the left and the decoder is on the right. At each time-step the decoder's output is combined with the encoder's output to predict the next word.

The original [left] contains a few extra connections that are intentionally omitted from this tutorial's model [right], as they are generally unnecessary and difficult to implement. Those missing connections are:

  1. Feeding the state from the encoder's RNN to the decoder's RNN
  2. Feeding the attention output back to the RNN's input.
[Figure: The original model from Effective Approaches to Attention-based Neural Machine Translation (left) and this tutorial's model (right).]

Before getting into it, define constants for the model:

UNITS = 256

The encoder

The goal of the encoder is to process the context sequence into a sequence of vectors that are useful for the decoder as it attempts to predict the next output for each timestep. Since the context sequence is constant, there is no restriction on how information can flow in the encoder, so use a bidirectional-RNN to do the processing:

[Figure: A bidirectional RNN.]

The encoder:

  1. Takes a list of token IDs (from context_text_processor).
  2. Looks up an embedding vector for each token (Using a layers.Embedding).
  3. Processes the embeddings into a new sequence (Using a bidirectional layers.GRU).
  4. Returns the processed sequence. This will be passed to the attention head.
class Encoder(tf.keras.layers.Layer):
  def __init__(self, text_processor, units):
    super(Encoder, self).__init__()
    self.text_processor = text_processor
    self.vocab_size = text_processor.vocabulary_size()
    self.units = units

    # The embedding layer converts tokens to vectors
    self.embedding = tf.keras.layers.Embedding(self.vocab_size, units,
                                               mask_zero=True)

    # The RNN layer processes those vectors sequentially.
    self.rnn = tf.keras.layers.Bidirectional(
        merge_mode='sum',
        layer=tf.keras.layers.GRU(units,
                                  # Return the sequence and state
                                  return_sequences=True,
                                  recurrent_initializer='glorot_uniform'))

  def call(self, x):
    shape_checker = ShapeChecker()
    shape_checker(x, 'batch s')

    # 2. The embedding layer looks up the embedding vector for each token.
    x = self.embedding(x)
    shape_checker(x, 'batch s units')

    # 3. The GRU processes the sequence of embeddings.
    x = self.rnn(x)
    shape_checker(x, 'batch s units')

    # 4. Returns the new sequence of embeddings.
    return x

  def convert_input(self, texts):
    texts = tf.convert_to_tensor(texts)
    if len(texts.shape) == 0:
      texts = tf.convert_to_tensor(texts)[tf.newaxis]
    context = self.text_processor(texts).to_tensor()
    context = self(context)
    return context

Try it out:

# Encode the input sequence.
encoder = Encoder(context_text_processor, UNITS)
ex_context = encoder(ex_context_tok)

print(f'Context tokens, shape (batch, s): {ex_context_tok.shape}')
print(f'Encoder output, shape (batch, s, units): {ex_context.shape}')
Context tokens, shape (batch, s): (64, 19)
Encoder output, shape (batch, s, units): (64, 19, 256)

The attention layer

The attention layer lets the decoder access the information extracted by the encoder. It computes a vector from the entire context sequence, and adds that to the decoder's output.

The simplest way you could calculate a single vector from the entire sequence would be to take the average across the sequence (layers.GlobalAveragePooling1D). An attention layer is similar, but calculates a weighted average over the context sequence, where the weights are calculated from the combination of context and "query" vectors.
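The weighted-average view can be computed by hand for one decoder position. This sketch uses simple dot-product scoring for illustration (the layer below uses MultiHeadAttention); all values are random:

```python
import numpy as np

# Attention as a weighted average: score each context vector against the
# query, softmax the scores into weights that sum to 1, then take the
# weighted average of the context vectors.
rng = np.random.default_rng(0)
context = rng.normal(size=(5, 4))   # (s, units): 5 context vectors
query = rng.normal(size=(4,))       # (units,): one decoder position

scores = context @ query                          # (s,): one score each
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over s
attended = weights @ context                      # (units,): weighted avg

print(weights)
```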


[Figure: The attention layer.]
class CrossAttention(tf.keras.layers.Layer):
  def __init__(self, units, **kwargs):
    super().__init__()
    self.mha = tf.keras.layers.MultiHeadAttention(key_dim=units, num_heads=1, **kwargs)
    self.layernorm = tf.keras.layers.LayerNormalization()
    self.add = tf.keras.layers.Add()

  def call(self, x, context):
    shape_checker = ShapeChecker()

    shape_checker(x, 'batch t units')
    shape_checker(context, 'batch s units')

    attn_output, attn_scores = self.mha(
        query=x,
        value=context,
        return_attention_scores=True)

    shape_checker(x, 'batch t units')
    shape_checker(attn_scores, 'batch heads t s')

    # Cache the attention scores for plotting later.
    attn_scores = tf.reduce_mean(attn_scores, axis=1)
    shape_checker(attn_scores, 'batch t s')
    self.last_attention_weights = attn_scores

    x = self.add([x, attn_output])
    x = self.layernorm(x)

    return x
attention_layer = CrossAttention(UNITS)

# Attend to the encoded tokens
embed = tf.keras.layers.Embedding(target_text_processor.vocabulary_size(),
                                  output_dim=UNITS, mask_zero=True)
ex_tar_embed = embed(ex_tar_in)

result = attention_layer(ex_tar_embed, ex_context)

print(f'Context sequence, shape (batch, s, units): {ex_context.shape}')
print(f'Target sequence, shape (batch, t, units): {ex_tar_embed.shape}')
print(f'Attention result, shape (batch, t, units): {result.shape}')
print(f'Attention weights, shape (batch, t, s): {attention_layer.last_attention_weights.shape}')
Context sequence, shape (batch, s, units): (64, 19, 256)
Target sequence, shape (batch, t, units): (64, 18, 256)
Attention result, shape (batch, t, units): (64, 18, 256)
Attention weights, shape (batch, t, s): (64, 18, 19)

The attention weights will sum to 1 over the context sequence, at each location in the target sequence.

attention_layer.last_attention_weights[0].numpy().sum(axis=-1)
array([1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994, 0.99999994], dtype=float32)

Here are the attention weights across the context sequences at t=0:

attention_weights = attention_layer.last_attention_weights
mask = (ex_context_tok != 0).numpy()

plt.subplot(1, 2, 1)
plt.pcolormesh(mask*attention_weights[:, 0, :])
plt.title('Attention weights')

plt.subplot(1, 2, 2)
plt.pcolormesh(mask)
plt.title('Mask');

[Figure: Attention weights and the context mask.]

Because of the small, random initialization the attention weights are initially all close to 1/(sequence_length). The model will learn to make these less uniform as training progresses.

The decoder

The decoder's job is to generate predictions for the next token at each location in the target sequence.

  1. It looks up embeddings for each token in the target sequence.
  2. It uses an RNN to process the target sequence, and keep track of what it has generated so far.
  3. It uses the RNN output as the "query" to the attention layer when attending to the encoder's output.
  4. At each location in the output it predicts the next token.

When training, the model predicts the next word at each location. So it's important that the information only flows in one direction through the model. The decoder uses a unidirectional (not bidirectional) RNN to process the target sequence.

When running inference with this model it produces one word at a time, and those are fed back into the model.

[Figure: A unidirectional RNN.]

Here is the Decoder class' initializer. The initializer creates all the necessary layers.

class Decoder(tf.keras.layers.Layer):
  @classmethod
  def add_method(cls, fun):
    setattr(cls, fun.__name__, fun)
    return fun

  def __init__(self, text_processor, units):
    super(Decoder, self).__init__()
    self.text_processor = text_processor
    self.vocab_size = text_processor.vocabulary_size()
    self.word_to_id = tf.keras.layers.StringLookup(
        vocabulary=text_processor.get_vocabulary(),
        mask_token='', oov_token='[UNK]')
    self.id_to_word = tf.keras.layers.StringLookup(
        vocabulary=text_processor.get_vocabulary(),
        mask_token='', oov_token='[UNK]',
        invert=True)
    self.start_token = self.word_to_id('[START]')
    self.end_token = self.word_to_id('[END]')

    self.units = units

    # 1. The embedding layer converts token IDs to vectors
    self.embedding = tf.keras.layers.Embedding(self.vocab_size,
                                               units, mask_zero=True)

    # 2. The RNN keeps track of what's been generated so far.
    self.rnn = tf.keras.layers.GRU(units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

    # 3. The RNN output will be the query for the attention layer.
    self.attention = CrossAttention(units)

    # 4. This fully connected layer produces the logits for each
    # output token.
    self.output_layer = tf.keras.layers.Dense(self.vocab_size)

Training

Next, the call method takes 3 arguments:

  • inputs - a context, x pair where:
    • context - is the context from the encoder's output.
    • x - is the target sequence input.
  • state - Optional, the previous state output from the decoder (the internal state of the decoder's RNN). Pass the state from a previous run to continue generating text where you left off.
  • return_state - [Default: False] - Set this to True to return the RNN state.
@Decoder.add_method
def call(self,
         context, x,
         state=None,
         return_state=False):
  shape_checker = ShapeChecker()
  shape_checker(x, 'batch t')
  shape_checker(context, 'batch s units')

  # 1. Lookup the embeddings
  x = self.embedding(x)
  shape_checker(x, 'batch t units')

  # 2. Process the target sequence.
  x, state = self.rnn(x, initial_state=state)
  shape_checker(x, 'batch t units')

  # 3. Use the RNN output as the query for the attention over the context.
  x = self.attention(x, context)
  self.last_attention_weights = self.attention.last_attention_weights
  shape_checker(x, 'batch t units')
  shape_checker(self.last_attention_weights, 'batch t s')

  # Step 4. Generate logit predictions for the next token.
  logits = self.output_layer(x)
  shape_checker(logits, 'batch t target_vocab_size')

  if return_state:
    return logits, state
  else:
    return logits

That will be sufficient for training. Create an instance of the decoder to test out:

decoder = Decoder(target_text_processor, UNITS)

In training you'll use the decoder like this:

Given the context and target tokens, for each target token it predicts the next target token.

logits = decoder(ex_context, ex_tar_in)

print(f'encoder output shape: (batch, s, units) {ex_context.shape}')
print(f'input target tokens shape: (batch, t) {ex_tar_in.shape}')
print(f'logits shape: (batch, t, target_vocabulary_size) {logits.shape}')
encoder output shape: (batch, s, units) (64, 19, 256)
input target tokens shape: (batch, t) (64, 18)
logits shape: (batch, t, target_vocabulary_size) (64, 18, 5000)

Inference

To use it for inference you'll need a couple more methods:

@Decoder.add_method
def get_initial_state(self, context):
  batch_size = tf.shape(context)[0]
  start_tokens = tf.fill([batch_size, 1], self.start_token)
  done = tf.zeros([batch_size, 1], dtype=tf.bool)
  embedded = self.embedding(start_tokens)
  return start_tokens, done, self.rnn.get_initial_state(embedded)[0]
@Decoder.add_method
def tokens_to_text(self, tokens):
  words = self.id_to_word(tokens)
  result = tf.strings.reduce_join(words, axis=-1, separator=' ')
  result = tf.strings.regex_replace(result, '^ *\[START\] *', '')
  result = tf.strings.regex_replace(result, ' *\[END\] *$', '')
  return result
@Decoder.add_method
def get_next_token(self, context, next_token, done, state, temperature = 0.0):
  logits, state = self(
      context, next_token,
      state = state,
      return_state=True)

  if temperature == 0.0:
    next_token = tf.argmax(logits, axis=-1)
  else:
    logits = logits[:, -1, :]/temperature
    next_token = tf.random.categorical(logits, num_samples=1)

  # If a sequence produces an `end_token`, set it `done`
  done = done | (next_token == self.end_token)
  # Once a sequence is done it only produces 0-padding.
  next_token = tf.where(done, tf.constant(0, dtype=tf.int64), next_token)

  return next_token, done, state
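The effect of the temperature argument can be shown in isolation: dividing the logits by a small temperature sharpens the softmax toward argmax, while a large temperature flattens it toward uniform sampling. The logit values here are illustrative:

```python
import math

# How temperature shapes sampling: probabilities after dividing illustrative
# logits by three different temperatures.
logits = [2.0, 1.0, 0.1]

def softmax(xs):
    m = max(xs)                            # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for temperature in (0.1, 1.0, 10.0):
    probs = softmax([x / temperature for x in logits])
    print(temperature, [round(p, 3) for p in probs])
```

At temperature 0.1 nearly all the mass lands on the largest logit (close to greedy decoding); at 10.0 the distribution is nearly uniform.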

With those extra functions, you can write a generation loop:

# Setup the loop variables.
next_token, done, state = decoder.get_initial_state(ex_context)
tokens = []

for n in range(10):
  # Run one step.
  next_token, done, state = decoder.get_next_token(
      ex_context, next_token, done, state, temperature=1.0)
  # Add the token to the output.
  tokens.append(next_token)

# Stack all the tokens together.
tokens = tf.concat(tokens, axis=-1)   # (batch, t)

# Convert the tokens back to a string
result = decoder.tokens_to_text(tokens)
result[:3].numpy()
array([b'target entered president shown creep wed sixthirty soul pocket farmer', b'ive fan osaka native songs factories confused fierce hurry nephew', b'wondering remain humiliating impression slightly time cards gold also tortured'], dtype=object)

Since the model's untrained, it outputs items from the vocabulary almost uniformly at random.

The model

Now that you have all the model components, combine them to build the model for training:

class Translator(tf.keras.Model):
  @classmethod
  def add_method(cls, fun):
    setattr(cls, fun.__name__, fun)
    return fun

  def __init__(self, units,
               context_text_processor,
               target_text_processor):
    super().__init__()
    # Build the encoder and decoder
    encoder = Encoder(context_text_processor, units)
    decoder = Decoder(target_text_processor, units)

    self.encoder = encoder
    self.decoder = decoder

  def call(self, inputs):
    context, x = inputs
    context = self.encoder(context)
    logits = self.decoder(context, x)

    #TODO(b/250038731): remove this
    try:
      # Delete the keras mask, so keras doesn't scale the loss+accuracy.
      del logits._keras_mask
    except AttributeError:
      pass

    return logits

During training the model will be used like this:


model = Translator(UNITS, context_text_processor, target_text_processor)

logits = model((ex_context_tok, ex_tar_in))

print(f'Context tokens, shape: (batch, s) {ex_context_tok.shape}')
print(f'Target tokens, shape: (batch, t) {ex_tar_in.shape}')
print(f'logits, shape: (batch, t, target_vocabulary_size) {logits.shape}')
Context tokens, shape: (batch, s) (64, 19)
Target tokens, shape: (batch, t) (64, 18)
logits, shape: (batch, t, target_vocabulary_size) (64, 18, 5000)

Train

For training, you'll want to implement your own masked loss and accuracy functions:

def masked_loss(y_true, y_pred):
  # Calculate the loss for each item in the batch.
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
      from_logits=True, reduction='none')
  loss = loss_fn(y_true, y_pred)

  # Mask off the losses on padding.
  mask = tf.cast(y_true != 0, loss.dtype)
  loss *= mask

  # Return the total.
  return tf.reduce_sum(loss)/tf.reduce_sum(mask)
def masked_acc(y_true, y_pred):
  # Calculate the accuracy for each item in the batch.
  y_pred = tf.argmax(y_pred, axis=-1)
  y_pred = tf.cast(y_pred, y_true.dtype)

  match = tf.cast(y_true == y_pred, tf.float32)
  mask = tf.cast(y_true != 0, tf.float32)

  return tf.reduce_sum(match)/tf.reduce_sum(mask)
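The masking can be hand-checked on a tiny example: a padding position (label 0) must contribute nothing to the average. This sketch works from illustrative per-token probabilities rather than logits:

```python
import math

# Hand-check of the masked-loss idea: two positions, the second padded.
labels = [3, 0]               # label 0 marks padding
probs_for_label = [0.5, 0.9]  # illustrative model probability of the label

# Per-token cross-entropy, then zero out padded positions via the mask.
ce = [-math.log(p) for p in probs_for_label]
mask = [1.0 if y != 0 else 0.0 for y in labels]

# Average over real (unmasked) tokens only.
masked = sum(c * m for c, m in zip(ce, mask)) / sum(mask)
print(masked)  # equals -log(0.5): only the unpadded token counts
```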

Configure the model for training:

model.compile(optimizer='adam',
              loss=masked_loss,
              metrics=[masked_acc, masked_loss])

The model is randomly initialized, and should give roughly uniform output probabilities. So it's easy to predict what the initial values of the metrics should be:

vocab_size = 1.0 * target_text_processor.vocabulary_size()

{"expected_loss": tf.math.log(vocab_size).numpy(),
 "expected_acc": 1/vocab_size}
{'expected_loss': 8.517193, 'expected_acc': 0.0002}

That should roughly match the values returned by running a few steps of evaluation:

model.evaluate(val_ds, steps=20, return_dict=True)
20/20 [==============================] - 7s 20ms/step - loss: 8.5317 - masked_acc: 0.0000e+00 - masked_loss: 8.5317
{'loss': 8.531679153442383,
 'masked_acc': 0.0,
 'masked_loss': 8.531679153442383}
history = model.fit(
    train_ds.repeat(),
    epochs=100,
    steps_per_epoch=100,
    validation_data=val_ds,
    validation_steps=20,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=3)])
Epoch 1/100
100/100 [==============================] - 13s 31ms/step - loss: 5.0253 - masked_acc: 0.2604 - masked_loss: 5.0253 - val_loss: 4.0925 - val_masked_acc: 0.3579 - val_masked_loss: 4.0925
Epoch 2/100
100/100 [==============================] - 3s 31ms/step - loss: 3.6952 - masked_acc: 0.4027 - masked_loss: 3.6952 - val_loss: 3.3559 - val_masked_acc: 0.4449 - val_masked_loss: 3.3559
Epoch 3/100
100/100 [==============================] - 3s 30ms/step - loss: 3.1026 - masked_acc: 0.4865 - masked_loss: 3.1026 - val_loss: 2.8402 - val_masked_acc: 0.5164 - val_masked_loss: 2.8402
Epoch 4/100
100/100 [==============================] - 3s 30ms/step - loss: 2.7049 - masked_acc: 0.5376 - masked_loss: 2.7049 - val_loss: 2.5211 - val_masked_acc: 0.5658 - val_masked_loss: 2.5211
Epoch 5/100
100/100 [==============================] - 3s 30ms/step - loss: 2.4025 - masked_acc: 0.5825 - masked_loss: 2.4025 - val_loss: 2.2021 - val_masked_acc: 0.6061 - val_masked_loss: 2.2021
Epoch 6/100
100/100 [==============================] - 3s 30ms/step - loss: 2.1764 - masked_acc: 0.6155 - masked_loss: 2.1764 - val_loss: 2.0614 - val_masked_acc: 0.6296 - val_masked_loss: 2.0614
Epoch 7/100
100/100 [==============================] - 3s 31ms/step - loss: 1.9957 - masked_acc: 0.6416 - masked_loss: 1.9957 - val_loss: 1.8755 - val_masked_acc: 0.6600 - val_masked_loss: 1.8755
Epoch 8/100
100/100 [==============================] - 3s 31ms/step - loss: 1.8411 - masked_acc: 0.6654 - masked_loss: 1.8411 - val_loss: 1.8128 - val_masked_acc: 0.6654 - val_masked_loss: 1.8128
Epoch 9/100
100/100 [==============================] - 3s 31ms/step - loss: 1.7877 - masked_acc: 0.6690 - masked_loss: 1.7877 - val_loss: 1.6793 - val_masked_acc: 0.6839 - val_masked_loss: 1.6793
Epoch 10/100
100/100 [==============================] - 3s 31ms/step - loss: 1.6820 - masked_acc: 0.6827 - masked_loss: 1.6820 - val_loss: 1.6389 - val_masked_acc: 0.6888 - val_masked_loss: 1.6389
Epoch 11/100
100/100 [==============================] - 3s 31ms/step - loss: 1.6351 - masked_acc: 0.6906 - masked_loss: 1.6351 - val_loss: 1.5408 - val_masked_acc: 0.7064 - val_masked_loss: 1.5408
Epoch 12/100
100/100 [==============================] - 3s 30ms/step - loss: 1.5624 - masked_acc: 0.6989 - masked_loss: 1.5624 - val_loss: 1.5168 - val_masked_acc: 0.7050 - val_masked_loss: 1.5168
Epoch 13/100
100/100 [==============================] - 3s 30ms/step - loss: 1.5186 - masked_acc: 0.7084 - masked_loss: 1.5186 - val_loss: 1.5057 - val_masked_acc: 0.6994 - val_masked_loss: 1.5057
Epoch 14/100
100/100 [==============================] - 3s 31ms/step - loss: 1.4886 - masked_acc: 0.7131 - masked_loss: 1.4886 - val_loss: 1.4049 - val_masked_acc: 0.7253 - val_masked_loss: 1.4049
Epoch 15/100
100/100 [==============================] - 3s 32ms/step - loss: 1.4060 - masked_acc: 0.7204 - masked_loss: 1.4065 - val_loss: 1.4144 - val_masked_acc: 0.7209 - val_masked_loss: 1.4144
Epoch 16/100
100/100 [==============================] - 3s 31ms/step - loss: 1.2321 - masked_acc: 0.7427 - masked_loss: 1.2321 - val_loss: 1.3486 - val_masked_acc: 0.7303 - val_masked_loss: 1.3486
Epoch 17/100
100/100 [==============================] - 3s 31ms/step - loss: 1.1963 - masked_acc: 0.7505 - masked_loss: 1.1963 - val_loss: 1.4044 - val_masked_acc: 0.7227 - val_masked_loss: 1.4044
Epoch 18/100
100/100 [==============================] - 3s 31ms/step - loss: 1.2161 - masked_acc: 0.7459 - masked_loss: 1.2161 - val_loss: 1.3877 - val_masked_acc: 0.7269 - val_masked_loss: 1.3877
Epoch 19/100
100/100 [==============================] - 3s 31ms/step - loss: 1.2075 - masked_acc: 0.7470 - masked_loss: 1.2075 - val_loss: 1.3780 - val_masked_acc: 0.7274 - val_masked_loss: 1.3780
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch #')
plt.ylabel('CE/token')
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f043bb20>

Neural machine translation with attention | Text | TensorFlow (14)

plt.plot(history.history['masked_acc'], label='accuracy')
plt.plot(history.history['val_masked_acc'], label='val_accuracy')
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch #')
plt.ylabel('Accuracy')
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f03662e0>

Neural machine translation with attention | Text | TensorFlow (15)

Translate

Now that the model is trained, implement a function to execute the full text => text translation. This code is basically identical to the inference example in the decoder section, but it also captures the attention weights.

@Translator.add_method
def translate(self, texts, *, max_length=50, temperature=0.0):
  # Process the input texts
  context = self.encoder.convert_input(texts)
  batch_size = tf.shape(texts)[0]

  # Setup the loop inputs
  tokens = []
  attention_weights = []
  next_token, done, state = self.decoder.get_initial_state(context)

  for _ in range(max_length):
    # Generate the next token
    next_token, done, state = self.decoder.get_next_token(
        context, next_token, done, state, temperature)

    # Collect the generated tokens
    tokens.append(next_token)
    attention_weights.append(self.decoder.last_attention_weights)

    if tf.executing_eagerly() and tf.reduce_all(done):
      break

  # Stack the lists of tokens and attention weights.
  tokens = tf.concat(tokens, axis=-1)  # t*[(batch 1)] -> (batch, t)
  self.last_attention_weights = tf.concat(attention_weights, axis=1)  # t*[(batch 1 s)] -> (batch, t s)

  result = self.decoder.tokens_to_text(tokens)
  return result
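The loop collects one `(batch, 1)` token tensor per step, and the final `tf.concat` stacks them into a `(batch, t)` tensor. Here is a minimal NumPy sketch of that concatenation step (the token ids and shapes are illustrative assumptions, not values from the model):

```python
import numpy as np

# Hypothetical decode loop output: 3 steps, batch of 2, one token id per step.
tokens = [np.array([[5], [9]]),   # step 1: shape (batch, 1)
          np.array([[2], [4]]),   # step 2
          np.array([[0], [7]])]   # step 3

# t * [(batch, 1)] -> (batch, t), equivalent to tf.concat(tokens, axis=-1)
stacked = np.concatenate(tokens, axis=-1)
print(stacked.shape)  # (2, 3)
```

Each row of `stacked` is now one sequence of generated token ids, ready to be decoded back into text.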

Test the translation function on a sentence:

result = model.translate(['¿Todavía está en casa?'])  # Are you still home
result[0].numpy().decode()
'is he still in this house ? '

Use that to generate the attention plot:

@Translator.add_method
def plot_attention(self, text, **kwargs):
  assert isinstance(text, str)
  output = self.translate([text], **kwargs)
  output = output[0].numpy().decode()

  attention = self.last_attention_weights[0]

  context = tf_lower_and_split_punct(text)
  context = context.numpy().decode().split()

  output = tf_lower_and_split_punct(output)
  output = output.numpy().decode().split()[1:]

  fig = plt.figure(figsize=(10, 10))
  ax = fig.add_subplot(1, 1, 1)

  ax.matshow(attention, cmap='viridis', vmin=0.0)

  fontdict = {'fontsize': 14}

  ax.set_xticklabels([''] + context, fontdict=fontdict, rotation=90)
  ax.set_yticklabels([''] + output, fontdict=fontdict)

  ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
  ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

  ax.set_xlabel('Input text')
  ax.set_ylabel('Output text')
model.plot_attention('¿Todavía está en casa?') # Are you still home
/tmpfs/tmp/ipykernel_10962/3355722706.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([''] + context, fontdict=fontdict, rotation=90)
/tmpfs/tmp/ipykernel_10962/3355722706.py:24: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([''] + output, fontdict=fontdict)
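The `UserWarning` appears because `set_xticklabels` assigns fixed labels without first fixing the tick positions. One way to avoid it (a sketch, not part of the tutorial's code; it requires Matplotlib >= 3.5, where `set_xticks` accepts labels) is to set positions and labels in one call:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical tokenized input and output, stand-ins for the model's tokens.
context = ['¿', 'todavia', 'esta', 'en', 'casa', '?']
output = ['are', 'you', 'still', 'at', 'home', '?']

fig, ax = plt.subplots()
# A dummy attention matrix with one row per output token, one column per input token.
ax.matshow([[0.1] * len(context)] * len(output), cmap='viridis', vmin=0.0)

# Setting tick positions and labels together avoids the FixedFormatter warning.
ax.set_xticks(range(len(context)), context, rotation=90)
ax.set_yticks(range(len(output)), output)
```

With this approach the `[''] +` padding and the `MultipleLocator` calls are no longer needed, since each tick position gets its label directly.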

Neural machine translation with attention | Text | TensorFlow (16)

Translate a few more sentences and plot them:

%%time
# This is my life.
model.plot_attention('Esta es mi vida.')
CPU times: user 227 ms, sys: 48.3 ms, total: 275 ms
Wall time: 192 ms
/tmpfs/tmp/ipykernel_10962/3355722706.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([''] + context, fontdict=fontdict, rotation=90)
/tmpfs/tmp/ipykernel_10962/3355722706.py:24: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([''] + output, fontdict=fontdict)

Neural machine translation with attention | Text | TensorFlow (17)

%%time
# Try to find out.
model.plot_attention('Tratar de descubrir.')
CPU times: user 260 ms, sys: 26 ms, total: 286 ms
Wall time: 185 ms
/tmpfs/tmp/ipykernel_10962/3355722706.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([''] + context, fontdict=fontdict, rotation=90)
/tmpfs/tmp/ipykernel_10962/3355722706.py:24: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([''] + output, fontdict=fontdict)

Neural machine translation with attention | Text | TensorFlow (18)

The short sentences often work well, but if the input is too long the model literally loses focus and stops providing reasonable predictions. There are two main reasons for this:

  1. The model was trained with teacher forcing, feeding the correct token at each step regardless of the model's predictions. The model could be made more robust if it were sometimes fed its own predictions.
  2. The model only has access to its previous output through the RNN state. If the RNN state loses track of where it was in the context sequence there's no way for the model to recover. Transformers improve on this by letting the decoder look at what it has output so far.
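The first fix is often called scheduled sampling: during training, at each step the decoder is fed the ground-truth token with some probability and its own prediction otherwise. A minimal sketch of the idea, where `predict_next` is a stand-in for a real decoder step and the schedule is a fixed probability (both are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_next(prev_token):
    # Stand-in for a real decoder step; returns a "predicted" token id.
    return (prev_token + 1) % 100

def decode_with_scheduled_sampling(target_tokens, p_teacher=0.5):
    """Build decoder inputs, mixing teacher forcing with model predictions."""
    inputs, prev = [], target_tokens[0]
    for t in range(1, len(target_tokens)):
        inputs.append(prev)
        predicted = predict_next(prev)
        # With probability p_teacher use the ground truth, else the prediction.
        prev = target_tokens[t] if rng.random() < p_teacher else predicted
    return inputs

print(decode_with_scheduled_sampling([3, 7, 1, 9], p_teacher=1.0))  # [3, 7, 1]
```

With `p_teacher=1.0` this reduces to pure teacher forcing; lowering `p_teacher` over training exposes the model to its own mistakes, which is what makes it more robust at inference time.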

The raw data is sorted by length, so try translating the longest sequence:

long_text = context_raw[-1]

import textwrap
print('Expected output:\n', '\n'.join(textwrap.wrap(target_raw[-1])))
Expected output:
 If you want to sound like a native speaker, you must be willing to
practice saying the same sentence over and over in the same way that
banjo players practice the same phrase over and over until they can
play it correctly and at the desired tempo.
model.plot_attention(long_text)
/tmpfs/tmp/ipykernel_10962/3355722706.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([''] + context, fontdict=fontdict, rotation=90)
/tmpfs/tmp/ipykernel_10962/3355722706.py:24: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels([''] + output, fontdict=fontdict)

Neural machine translation with attention | Text | TensorFlow (19)

The translate function works on batches, so if you have multiple texts to translate you can pass them all at once, which is much more efficient than translating them one at a time:


inputs = [
    'Hace mucho frio aqui.',     # "It's really cold here."
    'Esta es mi vida.',          # "This is my life."
    'Su cuarto es un desastre.'  # "His room is a mess"
]
%%time
for t in inputs:
  print(model.translate([t])[0].numpy().decode())

print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 558 ms, sys: 1.14 ms, total: 559 ms
Wall time: 515 ms
%%time
result = model.translate(inputs)

print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 203 ms, sys: 3.71 ms, total: 206 ms
Wall time: 190 ms

So overall this text generation function mostly gets the job done, but you've only used it here in Python with eager execution. Next, try exporting it:

Export

If you want to export this model you'll need to wrap the translate method in a tf.function. This basic implementation will get the job done:

class Export(tf.Module):
  def __init__(self, model):
    self.model = model

  @tf.function(input_signature=[tf.TensorSpec(dtype=tf.string, shape=[None])])
  def translate(self, inputs):
    return self.model.translate(inputs)
export = Export(model)

Run the tf.function once to compile it:

%%time
_ = export.translate(tf.constant(inputs))
2022-10-07 12:34:08.140302: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla P100-PCIE-16GB" frequency: 1328 num_cores: 56 environment { key: "architecture" value: "6.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 16023093248 bandwidth: 732160000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
} num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.170269&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.171146&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 
12&colon;34&colon;08.171950&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.172754&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.173574&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { 
unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.174391&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.175222&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } 
environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.176044&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.176940&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; 
DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.177852&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.179135&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.179963&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } 
} inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.180783&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.181599&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { 
key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;34&colon;08.182421&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }CPU times&colon; user 58.6 s, sys&colon; 16 ms, total&colon; 58.6 sWall time&colon; 57.8 s
%%time
result = export.translate(tf.constant(inputs))

print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 121 ms, sys: 14.8 ms, total: 136 ms
Wall time: 91.2 ms

Now that the function has been traced, it can be exported using tf.saved_model.save:

%%time
tf.saved_model.save(export, 'translator',
                    signatures={'serving_default': export.translate})
WARNING:absl:Found untraced functions such as embedding_3_layer_call_fn, embedding_3_layer_call_and_return_conditional_losses, embedding_4_layer_call_fn, embedding_4_layer_call_and_return_conditional_losses, cross_attention_2_layer_call_fn while saving (showing 5 of 32). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: translator/assets
CPU times: user 1min 16s, sys: 834 ms, total: 1min 17s
Wall time: 1min 17s
%%time
reloaded = tf.saved_model.load('translator')
_ = reloaded.translate(tf.constant(inputs))  # warmup
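Besides calling reloaded.translate directly, a reloaded SavedModel can also be invoked through its serving signature, which is what TF Serving and other deployment tools use. Here is a minimal, self-contained sketch of that mechanism using a toy tf.Module (a stand-in, not the translator trained above):

```python
import tempfile

import tensorflow as tf


class Doubler(tf.Module):
    """Toy module standing in for the exported translator."""

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def double(self, x):
        # Returning a dict gives the signature named outputs.
        return {'y': x * 2.0}


module = Doubler()
path = tempfile.mkdtemp()
tf.saved_model.save(module, path, signatures={'serving_default': module.double})

reloaded = tf.saved_model.load(path)
# Signature functions take keyword tensor arguments and return a dict of tensors.
out = reloaded.signatures['serving_default'](x=tf.constant([1.0, 2.0]))
print(out['y'].numpy())
```

Calling through reloaded.signatures always yields a flat dict of named tensors, whereas calling the attached method (reloaded.translate above) returns whatever structure the tf.function was traced with.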
2022-10-07 12:35:42.645464: W tensorflow/core/common_runtime/graph_constructor.cc:805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.
[... similar graph_constructor warnings for 'cond' and 'cond/while' nodes omitted ...]
2022-10-07 12:35:57.971008: W tensorflow/core/common_runtime/graph_constructor.cc:805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs.
Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;57.983157&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.196002&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.208589&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.312797&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.324074&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.371506&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.383218&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.510843&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.522510&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.712950&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.725032&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.914023&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;58.926078&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.052205&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.064129&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.076966&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.088391&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.170490&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.182642&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.226717&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.239845&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.251471&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.366905&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.379876&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.624205&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.635881&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.649282&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.660786&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.679774&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.691087&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.790042&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;35&colon;59.801936&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.092139&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.103930&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.179856&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.191580&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.210465&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.270704&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.282553&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.532618&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.545757&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;00.777202&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.031293&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.093620&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.107086&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.118801&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.189323&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.201380&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.273660&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.285517&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.405824&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.425131&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.489287&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.501795&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.591080&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.603126&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.622784&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.634192&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.717227&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.729033&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.749479&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.760951&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.927862&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 13 outputs but the _output_shapes attribute specifies shapes for 46 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;01.939949&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 46 outputs. 
Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;03.226052&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;03.239105&colon; W tensorflow/core/common_runtime/graph_constructor.cc&colon;805] Node 'cond' has 4 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.2022-10-07 12&colon;36&colon;14.849919&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.850813&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 
l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.851644&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.852511&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 
12&colon;36&colon;14.853411&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.854272&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.855157&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { 
unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.856029&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.856902&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } 
environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.857806&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } }2022-10-07 12&colon;36&colon;14.858689&colon; W tensorflow/core/grappler/costs/op_level_cost_estimator.cc&colon;690] Error in PredictCost() for the op&colon; op&colon; "Softmax" attr { key&colon; "T" value { type&colon; DT_FLOAT } } inputs { dtype&colon; DT_FLOAT shape { unknown_rank&colon; true } } device { type&colon; "GPU" vendor&colon; "NVIDIA" model&colon; "Tesla P100-PCIE-16GB" frequency&colon; 1328 num_cores&colon; 56 environment { key&colon; "architecture" value&colon; "6.0" } environment { key&colon; "cuda" value&colon; "11020" } environment { key&colon; "cudnn" value&colon; "8100" } num_registers&colon; 65536 l1_cache_size&colon; 24576 l2_cache_size&colon; 4194304 shared_memory_size_per_multiprocessor&colon; 65536 memory_size&colon; 16023093248 bandwidth&colon; 732160000 } outputs { dtype&colon; 
CPU times: user 54.3 s, sys: 892 ms, total: 55.2 s
Wall time: 54 s
%%time
result = reloaded.translate(tf.constant(inputs))
print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 128 ms, sys: 11.7 ms, total: 140 ms
Wall time: 93.9 ms

[Optional] Use a dynamic loop

It's worth noting that this initial implementation is not optimal. It uses a python loop:

for _ in range(max_length):
  ...
  if tf.executing_eagerly() and tf.reduce_all(done):
    break

The python loop is relatively simple, but when tf.function converts this to a graph, it statically unrolls that loop. Unrolling the loop has three disadvantages:

  1. It makes max_length copies of the loop body. So the generated graphs take longer to build, save and load.
  2. You have to choose a fixed value for the max_length.
  3. You can't break from a statically unrolled loop. The tf.function version will run the full max_length iterations on every call. That's why the break only works with eager execution. This is still marginally faster than eager execution, but not as fast as it could be.
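As a minimal, self-contained illustration of this unrolling behavior (independent of the translation model), compare the graph traced from a Python range loop with one traced from a tf.range loop:

```python
import tensorflow as tf

# A Python `range` loop is unrolled at trace time: the traced graph
# contains one copy of the loop body per iteration.
@tf.function
def unrolled(x):
  for _ in range(10):
    x = x + 1
  return x

# A `tf.range` loop is converted to a single dynamic while-loop op.
@tf.function
def dynamic(x):
  for _ in tf.range(10):
    x = x + 1
  return x

x = tf.constant(0)
print(unrolled(x).numpy(), dynamic(x).numpy())  # both compute 10

# Inspect the traced graph: the unrolled version contains ten separate
# AddV2 ops, while the dynamic version holds a single while-loop op.
ops = [op.type for op in unrolled.get_concrete_function(x).graph.get_operations()]
print(ops.count('AddV2'))
```

With a loop body as large as the decoder step, those extra copies are what make the unrolled graph slow to build, save, and load.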

To fix these shortcomings, the dynamic translate implementation below uses a tensorflow loop:

for t in tf.range(max_length):
  ...
  if tf.reduce_all(done):
    break

It looks like a python loop, but when you use a tensor as the input to a for loop (or the condition of a while loop) tf.function converts it to a dynamic loop using operations like tf.while_loop.
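You can watch this conversion happen with tf.autograph.to_code, which prints the source autograph generates; the tensor-driven for loop is routed through autograph's for_stmt helper (backed by tf.while_loop) rather than being unrolled. A small sketch:

```python
import tensorflow as tf

def count(n):
  # A tensor-driven loop: iterating over `tf.range(n)` makes autograph
  # generate a dynamic loop instead of unrolling it.
  total = tf.constant(0)
  for i in tf.range(n):
    total += i
  return total

# Print the generated source; the `for` becomes a call to `ag__.for_stmt`.
print(tf.autograph.to_code(count))
print(tf.function(count)(tf.constant(5)).numpy())  # 0+1+2+3+4 = 10
```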

There's no need for a max_length here; it's just a safety limit in case the model gets stuck generating a loop like: the united states of the united states of the united states....

On the down side, to accumulate tokens from this dynamic loop you can't just append them to a python list; you need to use a tf.TensorArray:

tokens = tf.TensorArray(tf.int64, size=1, dynamic_size=True)
...
for t in tf.range(max_length):
  ...
  tokens = tokens.write(t, next_token)  # next_token shape is (batch, 1)
  ...

tokens = tokens.stack()
tokens = einops.rearrange(tokens, 't batch 1 -> batch t')

This version of the code can be quite a bit more efficient:

@Translator.add_method
def translate(self, texts, *, max_length=500, temperature=tf.constant(0.0)):
  shape_checker = ShapeChecker()
  context = self.encoder.convert_input(texts)
  batch_size = tf.shape(context)[0]
  shape_checker(context, 'batch s units')

  next_token, done, state = self.decoder.get_initial_state(context)

  # initialize the accumulator
  tokens = tf.TensorArray(tf.int64, size=1, dynamic_size=True)

  for t in tf.range(max_length):
    # Generate the next token
    next_token, done, state = self.decoder.get_next_token(
        context, next_token, done, state, temperature)
    shape_checker(next_token, 'batch t1')

    # Collect the generated tokens
    tokens = tokens.write(t, next_token)

    # if all the sequences are done, break
    if tf.reduce_all(done):
      break

  # Convert the list of generated token ids to a list of strings.
  tokens = tokens.stack()
  shape_checker(tokens, 't batch t1')
  tokens = einops.rearrange(tokens, 't batch 1 -> batch t')
  shape_checker(tokens, 'batch t')

  text = self.decoder.tokens_to_text(tokens)
  shape_checker(text, 'batch')

  return text

With eager execution this implementation performs on par with the original:

%%time
result = model.translate(inputs)

print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 200 ms, sys: 17.5 ms, total: 218 ms
Wall time: 202 ms

But when you wrap it in a tf.function you'll notice two differences.

class Export(tf.Module):
  def __init__(self, model):
    self.model = model

  @tf.function(input_signature=[tf.TensorSpec(dtype=tf.string, shape=[None])])
  def translate(self, inputs):
    return self.model.translate(inputs)
export = Export(model)

First, it's much quicker to trace, since it only creates one copy of the loop body:

%%time
_ = export.translate(inputs)
2022-10-07 12:36:34.182501: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "Softmax" ...
CPU times: user 4.11 s, sys: 1.65 ms, total: 4.11 s
Wall time: 4.03 s
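To see that only one copy of the loop body is traced, a small standalone sketch (the names dynamic_sum and trace_count are illustrative, not from the tutorial) can count Python-level executions of the function body; the Python counter only increments while tracing, not on later graph executions:

```python
import tensorflow as tf

trace_count = 0

@tf.function
def dynamic_sum(n):
    global trace_count
    trace_count += 1  # Python side effect: runs only during tracing
    total = tf.constant(0)
    for t in tf.range(n):  # traced once as a single graph loop
        total += t
    return total

dynamic_sum(tf.constant(5))
dynamic_sum(tf.constant(100))  # same signature: reuses the trace, no unrolling
print(trace_count)  # 1
```

Because both calls pass a scalar int32 tensor, the second call reuses the first trace, whereas an unrolled implementation would re-trace a body per time step.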

The tf.function version is much faster than running with eager execution, and on small inputs it's often several times faster than the unrolled version, because it can break out of the loop early.

%%time
result = export.translate(inputs)

print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 46.9 ms, sys: 1.32 ms, total: 48.2 ms
Wall time: 30.2 ms

So save this version as well:

%%time
tf.saved_model.save(export, 'dynamic_translator',
                    signatures={'serving_default': export.translate})
WARNING:absl:Found untraced functions such as embedding_3_layer_call_fn, embedding_3_layer_call_and_return_conditional_losses, embedding_4_layer_call_fn, embedding_4_layer_call_and_return_conditional_losses, cross_attention_2_layer_call_fn while saving (showing 5 of 32). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: dynamic_translator/assets
CPU times: user 31.1 s, sys: 0 ns, total: 31.1 s
Wall time: 31 s
%%time
reloaded = tf.saved_model.load('dynamic_translator')
_ = reloaded.translate(tf.constant(inputs))  # warmup
2022-10-07 12:37:07.272360: W tensorflow/core/common_runtime/graph_constructor.cc:805] Node 'cond/while' has 14 outputs but the _output_shapes attribute specifies shapes for 48 outputs. Output shapes may be inaccurate.
(the same graph_constructor warning repeats many times)
2022-10-07 12:37:18.575753: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "Softmax" ...
CPU times: user 13.7 s, sys: 0 ns, total: 13.7 s
Wall time: 13.5 s
%%time
result = reloaded.translate(tf.constant(inputs))

print(result[0].numpy().decode())
print(result[1].numpy().decode())
print(result[2].numpy().decode())
print()
it makes great cold here .
this is my life .
his room is a disaster .

CPU times: user 36.8 ms, sys: 0 ns, total: 36.8 ms
Wall time: 19.7 ms

Next steps

  • Download a different dataset to experiment with translations, for example, English to German, or English to French.
  • Experiment with training on a larger dataset, or using more epochs.
  • Try the transformer tutorial which implements a similar translation task but uses transformer layers instead of RNNs. This version also uses a text.BertTokenizer to implement word-piece tokenization.
  • Visit the tensorflow_addons.seq2seq tutorial, which demonstrates higher-level functionality for implementing this sort of sequence-to-sequence model, such as seq2seq.BeamSearchDecoder.

