The PyTorch-Lightning tutorial doesn't run properly

A tutorial that doesn't run? What's the point?

To cut to the conclusion, changing the first cell to

!pip install segmentation-models-pytorch
# !pip install pytorch-lightning==1.5.4
!pip install pytorch-lightning==1.9.5

was all it took.

I tried to run the tutorial below on Google Colab.

It was introduced on the following site as a sample for segmentation tasks.

note-tech.com

https://colab.research.google.com/github/qubvel/segmentation_models.pytorch/blob/master/examples/binary_segmentation_intro.ipynb

!pip install segmentation-models-pytorch
!pip install pytorch-lightning==1.5.4

import os
import torch
import matplotlib.pyplot as plt
import pytorch_lightning as pl
import segmentation_models_pytorch as smp

from pprint import pprint
from torch.utils.data import DataLoader

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-495d7f80e44f> in <cell line: 4>()
      2 import torch
      3 import matplotlib.pyplot as plt
----> 4 import pytorch_lightning as pl
      5 import segmentation_models_pytorch as smp
      6 

4 frames
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/apply_func.py in <module>
     28 if _TORCHTEXT_AVAILABLE:
     29     if _compare_version("torchtext", operator.ge, "0.9.0"):
---> 30         from torchtext.legacy.data import Batch
     31     else:
     32         from torchtext.data import Batch

ModuleNotFoundError: No module named 'torchtext.legacy'

It falls over at the import? Seriously?
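The traceback shows why: pytorch-lightning 1.5.4 picks its import path from the torchtext version, and for anything at or above 0.9.0 it reaches for `torchtext.legacy`. As far as I know, that subpackage was dropped in torchtext 0.12.0, so a recent torchtext on Colab sends it down a branch that can no longer succeed. Below is a minimal sketch of that branch. The helper names `parse_version` and `batch_import_target` are made up for illustration and are not part of either library.

```python
import operator


def parse_version(text):
    """Turn '0.15.2' or '2.0.1+cu118' into a comparable tuple of ints."""
    core = text.split("+")[0]  # drop a local suffix such as '+cu118'
    return tuple(int(part) for part in core.split(".")[:3])


def batch_import_target(torchtext_version):
    """Mirror the version branch from apply_func.py seen in the traceback."""
    if operator.ge(parse_version(torchtext_version), parse_version("0.9.0")):
        return "torchtext.legacy.data"  # the path that fails above
    return "torchtext.data"


# Two of the versions that pip lists as available later in this post.
for version in ["0.6.0", "0.15.2"]:
    print(version, "->", batch_import_target(version))
```

Installing torchtext again does not change which branch is taken, which fits the result of the next attempt.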

For a start, I tried installing torchtext as well.

!pip install segmentation-models-pytorch
!pip install pytorch-lightning==1.5.4
!pip install torchtext

Same error, no progress.

----> 4 import pytorch_lightning as pl
ModuleNotFoundError: No module named 'torchtext.legacy'

I then tried the suggestions I found by googling, one at a time.

www.datasciencelearner.com

!pip install segmentation-models-pytorch
!pip install pytorch-lightning==1.5.4
!pip install torchtext==0.10.0

ERROR: Could not find a version that satisfies the requirement torchtext==0.10.0 (from versions: 0.1.1, 0.2.0, 0.2.1, 0.2.3, 0.3.1, 0.4.0, 0.5.0, 0.6.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.1, 0.15.2)
ERROR: No matching distribution found for torchtext==0.10.0

💢

!pip install segmentation-models-pytorch
!pip install pytorch-lightning==1.5.4
!pip install torchtext==0.14.0

- (output omitted) -

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
Successfully installed torch-1.13.0 torchtext-0.14.0

pip reports errors, but 0.14.0 appears to have been installed, so I carried on.
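Those pip messages are worth reading before carrying on: installing torchtext 0.14.0 pulled in torch 1.13.0, while the preinstalled torchaudio, torchdata and torchvision all still expect torch 2.0.1. A small sketch that extracts this from the pip output above follows. The regular expression assumes the message format shown in this post, and `find_conflicts` is a hypothetical helper, not a pip API.

```python
import re

# The conflict lines copied from the pip output above.
PIP_OUTPUT = """\
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 1.13.0 which is incompatible.
"""

# Assumed message shape: "<pkg> <ver> requires <req>, but you have <dep> <ver> which is incompatible."
CONFLICT = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<dep>\S+) (?P<installed>\S+) which is incompatible\.$"
)


def find_conflicts(text):
    """Return one dict per conflict line found in pip's output."""
    conflicts = []
    for line in text.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


conflicts = find_conflicts(PIP_OUTPUT)
for c in conflicts:
    print(f"{c['pkg']} wants {c['requirement']}, found {c['dep']} {c['installed']}")
```

Three packages pinned to a torch that is no longer installed is a sign that the environment has drifted away from what Colab ships.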

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-495d7f80e44f> in <cell line: 4>()
      2 import torch
      3 import matplotlib.pyplot as plt
----> 4 import pytorch_lightning as pl
      5 import segmentation_models_pytorch as smp
      6 

4 frames
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/apply_func.py in <module>
     30         from torchtext.legacy.data import Batch
     31     else:
---> 32         from torchtext.data import Batch
     33 else:
     34     Batch = type(None)

ImportError: cannot import name 'Batch' from 'torchtext.data' (/usr/local/lib/python3.10/dist-packages/torchtext/data/__init__.py)

Hmm, I feel like I'm digging myself in deeper.
Time to reset everything to the initial state.

This time I changed the pytorch-lightning version instead.

!pip install segmentation-models-pytorch
# !pip install pytorch-lightning==1.5.4
!pip install pytorch-lightning==1.9.5

With this, the import error went away.
The training that follows also ran, so I'm calling it good.
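To confirm which versions a given runtime actually ended up with, something like the following can go in a cell right after the install. It is a minimal sketch using only the standard library. `installed_version` is a hypothetical helper, and the package names are the pip distribution names used in this post.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as passed to pip in this post.
for name in ["pytorch-lightning", "torch", "torchtext", "segmentation-models-pytorch"]:
    print(name, installed_version(name) or "not installed")
```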

By the way

If you install without pinning a version...

!pip install segmentation-models-pytorch
# !pip install pytorch-lightning==1.5.4
!pip install pytorch-lightning

The import raises no error, but running this cell

trainer = pl.Trainer(
    gpus=1,
    max_epochs=5,
)

trainer.fit(
    model,
    train_dataloaders=train_dataloader,
    val_dataloaders=valid_dataloader,
)

fails with

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-f1c61f5ae996> in <cell line: 1>()
----> 1 trainer = pl.Trainer(
      2     gpus=1,
      3     max_epochs=5,
      4 )
      5 

/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/argparse.py in insert_env_defaults(self, *args, **kwargs)
     67 
     68         # all args were already moved to kwargs
---> 69         return fn(self, **kwargs)
     70 
     71     return cast(_T, insert_env_defaults)

TypeError: Trainer.__init__() got an unexpected keyword argument 'gpus'

Apparently the API changed as of pytorch-lightning v2.
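The error says that `gpus` is no longer accepted by `Trainer`. In 2.x the device is selected with `accelerator` and `devices` instead. Here is a rough sketch of choosing the arguments by major version, so that the same cell works on either side of the change. `trainer_kwargs` is a hypothetical helper of my own, not something pytorch-lightning provides.

```python
def trainer_kwargs(pl_version, max_epochs=5):
    """Pick Trainer keyword arguments for one GPU based on the major version."""
    major = int(pl_version.split(".")[0])
    if major >= 2:
        # `gpus` is rejected in 2.x (see the TypeError above).
        return {"accelerator": "gpu", "devices": 1, "max_epochs": max_epochs}
    # 1.x style, as written in the original tutorial cell.
    return {"gpus": 1, "max_epochs": max_epochs}


print(trainer_kwargs("1.9.5"))
print(trainer_kwargs("2.0.2"))

# In the notebook this would be used roughly as:
#   trainer = pl.Trainer(**trainer_kwargs(pl.__version__))
```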

stackoverflow.com

Following that answer, I changed the cell to

trainer = pl.Trainer(max_epochs=5,accelerator="auto")

trainer.fit(
    model, 
    train_dataloaders=train_dataloader, 
    val_dataloaders=valid_dataloader,
)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs

The GPU seems to have been recognized, but

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-10-dbbdb5087c44> in <cell line: 3>()
      1 trainer = pl.Trainer(max_epochs=5,accelerator="auto")
      2 
----> 3 trainer.fit(
      4     model,
      5     train_dataloaders=train_dataloader,

5 frames
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py in __verify_train_val_loop_configuration(trainer, model)
     77     # check legacy hooks are not present
     78     if callable(getattr(model, "training_epoch_end", None)):
---> 79         raise NotImplementedError(
     80             f"Support for `training_epoch_end` has been removed in v2.0.0. `{type(model).__name__}` implements this"
     81             " method. You can use the `on_train_epoch_end` hook instead. To access outputs, save them in-memory as"

NotImplementedError: Support for `training_epoch_end` has been removed in v2.0.0. `PetModel` implements this method. You can use the `on_train_epoch_end` hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in https://github.com/Lightning-AI/lightning/pull/16520.
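So on v2 the tutorial's model itself needs changes: the `training_epoch_end(outputs)` hook is gone, and the message suggests collecting the step outputs on the instance and consuming them in `on_train_epoch_end`. Below is a framework-free sketch of that pattern. The class, the placeholder loss and the attribute names are my own assumptions for illustration and not the tutorial's actual `PetModel` code. In the notebook these methods would live on the `pl.LightningModule` subclass.

```python
class EpochEndSketch:
    """Stand-in for a LightningModule; only the hook pattern is shown."""

    def __init__(self):
        self.training_step_outputs = []  # filled during the epoch
        self.epoch_means = []            # one summary value per epoch

    def training_step(self, batch, batch_idx):
        loss = sum(batch) / len(batch)  # placeholder for a real loss
        # Previously the return value was handed to training_epoch_end(outputs);
        # now it is kept on the instance instead.
        self.training_step_outputs.append(loss)
        return loss

    def on_train_epoch_end(self):
        outputs = self.training_step_outputs
        self.epoch_means.append(sum(outputs) / len(outputs))
        outputs.clear()  # free the stored outputs before the next epoch


# Simulate one epoch by hand with two tiny "batches".
model = EpochEndSketch()
for batch_idx, batch in enumerate([[1.0, 3.0], [2.0, 6.0]]):
    model.training_step(batch, batch_idx)
model.on_train_epoch_end()
print(model.epoch_means)
```

Pinning pytorch-lightning to 1.9.5, as in the conclusion at the top, sidesteps this rewrite entirely.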