Standard layers (like `tf.layers.dense`) fail to report their trainable weights when used from within a custom layer #8253
Comments
So, it does look like the other stuff is still a problem, so far as I can see.
You haven't registered the weights from the child layers to the parent layer in the `build` method:

```js
this.trainableWeights = [...this.childLayer.trainableWeights]
```

The same goes for `nonTrainableWeights`. We really need some documentation written up for the layers API, especially the `RNNCell` base class.
Thanks for the suggestion, @lukemovement. Your suggestion does work, however you also need to build each of the child layers:

```js
this.inProj.build(inputShape)
this.outProj.build(inputShape)
```

I actually discovered this trick previously, and it also has problems. The issue relates to restoring a model from disk: when you add child layers to a parent this way, the end result causes issues when attempting to restore those child layers from a checkpoint. I thought it might be possible to fix this with:

```js
getWeights() {
  return this.trainableWeights.map((weights) => weights.read())
}

setWeights(weights) {
  this.inProj.kernel.write(weights[0])
  this.inProj.bias.write(weights[1])
  this.outProj.kernel.write(weights[2])
  this.outProj.bias.write(weights[3])
}
```

Anyway, I hope that additional context helps. If you can point me to somewhere I could contribute to the docs, I might be able to write a tutorial or something. I've probably built 100 custom layers at this point.
I'm unsure as to where this would best be placed. The weights are always in the same order on the layer, so I use this. It works as long as you don't reach the maximum string length.

```typescript
import * as tf from "@tensorflow/tfjs";
import { mkdir, readFile, writeFile } from "fs/promises";
import { resolve } from "path";

// Serialize each layer's weights to a JSON file named after the layer.
export const SaveModel = async ({
  model,
  dir,
}: {
  model: tf.LayersModel;
  dir: string;
}) => {
  for (const layer of model.layers) {
    const name = layer.name;
    if (layer.weights.length === 0) {
      continue;
    }
    const weights = JSON.stringify(
      layer.getWeights().map((weight) => weight.arraySync()),
    );
    await mkdir(dir, { recursive: true });
    await writeFile(resolve(dir, `${name}.json`), weights);
  }
  console.log(`Saved to ${dir}`);
};

// Restore weights by matching layer names to the saved JSON files.
export const LoadModel = async ({
  model,
  dir,
}: {
  model: tf.LayersModel;
  dir: string;
}) => {
  for (const layer of model.layers) {
    const name = layer.name;
    if (layer.weights.length === 0) {
      continue;
    }
    try {
      const weights = JSON.parse(
        await readFile(resolve(dir, `${name}.json`), "utf-8"),
      );
      const tensors = weights.map((weight: number[]) => tf.tensor(weight));
      layer.setWeights(tensors);
    } catch (e) {
      console.log(layer.name, (e as Error).message);
    }
  }
  console.log(`Loaded from ${dir}`);
};
```
System information
Describe the current behavior
When building custom layers, it is often useful to use "standard" layer types like `tf.layers.dense` and `tf.layers.LSTM` from inside of that layer. However, layers added in this way have 2 major problems:

1. Their trainable weights are not reported by `model.summary()`.
2. Their weights are not included by `model.save()`.

This is problematic for obvious reasons. The alternative is to use the `this.addWeight()` API; however, weights added in this way also have problems: `this.addWeight()` cannot use string activations, like `mish` and `swish`.
If there is already a supported way to integrate the weights from a standard layer like `tf.layers.dense` from within a custom model, the method is not clear from any of the documentation I've seen.

Describe the expected behavior
I would expect weights used by the computational graph to be included in `model.summary()`'s "trainable parameters" report. But they are not:

```
___________________________________________________________________________________________
Layer (type)                Input Shape                  Output shape     Param #   Receives inputs
===========================================================================================
inp-t0B (InputLayer)        [[null,null]]                [null,null]      0
___________________________________________________________________________________________
emb-gza (SharedEmbedding)   [[null,null]],[[null,null,2  multiple         5091328   inp-t0B[0][0]
                                                                                    mlp-adG[0][0]
___________________________________________________________________________________________
enc-RC2 (SinusoidalPositio  [[null,null,256]]            [null,null,256]  0         emb-gza[0][0]
___________________________________________________________________________________________
attn-FBz (SelfAttention)    [[null,null,256]]            [null,null,256]  0         enc-RC2[0][0]
___________________________________________________________________________________________
mlp-3kL (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-FBz[0][0]
___________________________________________________________________________________________
attn-VZK (SelfAttention)    [[null,null,256]]            [null,null,256]  0         mlp-3kL[0][0]
___________________________________________________________________________________________
mlp-Jfy (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-VZK[0][0]
___________________________________________________________________________________________
attn-j0b (SelfAttention)    [[null,null,256]]            [null,null,256]  0         mlp-Jfy[0][0]
___________________________________________________________________________________________
mlp-oyK (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-j0b[0][0]
___________________________________________________________________________________________
attn-L1y (SelfAttention)    [[null,null,256]]            [null,null,256]  0         mlp-oyK[0][0]
___________________________________________________________________________________________
mlp-9r1 (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-L1y[0][0]
___________________________________________________________________________________________
attn-Yha (SelfAttention)    [[null,null,256]]            [null,null,256]  0         mlp-9r1[0][0]
___________________________________________________________________________________________
mlp-GV8 (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-Yha[0][0]
___________________________________________________________________________________________
attn-R5D (SelfAttention)    [[null,null,256]]            [null,null,256]  0         mlp-GV8[0][0]
___________________________________________________________________________________________
mlp-adG (MultiLayerPercept  [[null,null,256]]            [null,null,256]  0         attn-R5D[0][0]
===========================================================================================
Total params: 5091328
Trainable params: 5091328
Non-trainable params: 0
```
Standalone code to reproduce the issue

Add the following custom layer to any model, then call `model.compile()`, then `model.summary()`
. You will see that it reports 0 trainable parameters:

Other info / logs
If there is a supported way to add the trainable parameters from `tf.layers.dense()` to my custom layer, please let me know!