Developing new machine learning code is often error-prone and takes many iterations of the write-run-debug loop. In this context, I specifically mean saving time fixing errors that crash the program, not those that cause models to be incorrect in subtle ways (for that, see Andrej Karpathy's blog post). Here are a few tricks I use to preserve my sanity while developing new model code written in Python.
A quick outline of the tips:
- Debuggers
- Document tensor shapes
- Verbose logging with the logging module
- Small debug dataset
- Unit tests
Use Debuggers
Modern machine learning code often contains several abstraction levels (a good thing!), which unfortunately makes it harder to dig into the plumbing to fix data loading or tensor shape errors. Debuggers are exceptionally useful in these cases. I use them in one of two ways.
- If I know which line the program fails on, I add import pdb; pdb.set_trace() or breakpoint() on the line that fails (a short sketch follows this list).
- If it fails for only some iterations (e.g., one data point is bad), I run python -m pdb mytrainingscript.py to start a session and press c to start the program. The interpreter drops into a debug session when the failure occurs.
- For allennlp specifically, this command works well: ipython -m ipdb (which allennlp) -- train config.jsonnet
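As a concrete illustration of the breakpoint() approach, here is a minimal sketch; batch_loss and the toy batches are made up for this post, not taken from a real model:

import torch

def batch_loss(batch):
    # Expects a (batch_size, seq_length) tensor; a 1-D batch will crash below.
    return batch.sum(dim=1).mean()

batches = [torch.ones(4, 3), torch.ones(4)]  # the second "batch" has a bad shape
for batch in batches:
    breakpoint()  # drops into pdb (or ipdb if PYTHONBREAKPOINT=ipdb.set_trace); press c to continue
    loss = batch_loss(batch)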
Sidenote: I use ipdb instead of pdb since it's more similar to the ipython terminal.
Document Tensor Shapes
It's extremely helpful to know tensor shapes during development, and shape comments save time when you revisit the code later. Here is a sample forward pass of a PyTorch model with shape annotations:
def forward(self, text, length):
    output = {}
    # (batch_size, seq_length)
    text = text['tokens'].cuda()
    # (batch_size, 1)
    length = length.cuda()
    # (batch_size, seq_length, word_dim)
    text_embed = self._word_embeddings(text)
    # (batch_size, word_dim)
    text_embed = self._word_dropout(text_embed.sum(1) / length)
    # (batch_size, hidden_dim)
    hidden = self._encoder(text_embed)
    # (batch_size, n_classes)
    output['logits'] = self._classifier(hidden)
    return output
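Shape comments can drift out of date, so a lightweight complement (not part of the snippet above, just a suggestion) is a runtime assertion so a bad shape fails loudly at the line that introduces it:

import torch

# Stand-in values; in a real model these come from the config and the batch.
batch_size, seq_length, word_dim = 32, 20, 100
text_embed = torch.randn(batch_size, seq_length, word_dim)

# Same information as the "# (batch_size, seq_length, word_dim)" comment,
# but checked at runtime, with the offending shape in the error message.
assert text_embed.shape == (batch_size, seq_length, word_dim), text_embed.shape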
Verbose Logging to Terminal and Files
I often see print statements, but not much usage of the Python logging module in model code. Although it takes some setup, there are several benefits to using logging.info over print:
- Timestamps are logged "for free", which is helpful for understanding where most of the execution time is spent.
- Logging can be configured to output the module a statement comes from, which makes debugging faster.
- Logging can also be configured to write to a file. This has saved me a few times when I didn't expect to need print output when I ran the model, but later needed it.
- This leads me to: be verbose in what you log. I love that the logging in allennlp includes things like model parameters (Sample Log).
I typically include this code in my package for logging to standard error and a file:
# In a file like util.py
import logging

def get(name):
    log = logging.getLogger(name)
    if len(log.handlers) < 2:
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        fh = logging.FileHandler('mylogfile.log')
        fh.setLevel(logging.INFO)
        fh.setFormatter(formatter)
        sh = logging.StreamHandler()
        sh.setLevel(logging.INFO)
        sh.setFormatter(formatter)
        log.addHandler(fh)
        log.addHandler(sh)
        log.setLevel(logging.INFO)
    return log
# In my model file
from mypackage import util
log = util.get(__name__)
log.info("hello world")
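With the formatter above, every record written to the terminal and to mylogfile.log carries the timestamp, the logger name (__name__), and the level; the log.info("hello world") call produces a line roughly like this (the timestamp and module name here are illustrative):

2020-01-15 10:30:12,345 - mypackage.model - INFO - hello world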
Small Debug Dataset
Another common issue is debugging an error when dataset loading takes a long time.
It's very annoying to debug shape issues when your dataset takes twenty minutes to load.
One trick (aside from using pdb) is to make a small debug dataset and test with it before using the full dataset.
If your dataset is in a line-delimited format like jsonlines, then it may be as easy as:
$ cat mydata.jsonl | head -n 100 > debug_dataset.jsonl
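Another option, so you don't have to maintain a separate debug file at all, is to cap the number of examples behind a command-line flag. A rough sketch, where load_examples and the --limit flag are made up for illustration:

import argparse
import itertools
import json

def load_examples(path):
    # Stand-in for a real dataset reader that yields one example at a time.
    with open(path) as f:
        for line in f:
            yield json.loads(line)

parser = argparse.ArgumentParser()
parser.add_argument("--limit", type=int, default=None,
                    help="Only read this many examples while debugging.")
args = parser.parse_args()

examples = load_examples("mydata.jsonl")
if args.limit is not None:
    # islice stops after `limit` examples without reading the rest of the file.
    examples = itertools.islice(examples, args.limit)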
Unit Tests
Last but not least, writing a few unit tests is often helpful. Specifically, I like writing unit tests for data or metric code that is not obviously correct by inspection. pytest has worked very well for this purpose since it's easy to use and configure.
Here is a simple example of my configuration (pytest.ini):
[pytest]
testpaths = awesomemodel/tests/
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning
A sample test in awesomemodel/tests/test_zero.py:
from numpy.testing import assert_almost_equal
import pytest

from awesomemodel.zero import AlmostZero

def test_zero():
    zero = AlmostZero()
    assert_almost_equal(0.0, zero())
And running the test:
$ pytest
======================================================================================= test session starts ========================================================================================
platform linux -- Python 3.8.0, pytest-5.3.1, py-1.8.0, pluggy-0.13.1
rootdir: /tmp/src, inifile: pytest.ini, testpaths: awesomemodel/tests/
collected 1 item
awesomemodel/tests/test_zero.py . [100%]
======================================================================================== 1 passed in 0.13s =========================================================================================
For reference, the directory structure:
$ tree
.
├── awesomemodel
│   ├── __init__.py
│   ├── tests
│   │   ├── __init__.py
│   │   └── test_zero.py
│   └── zero.py
└── pytest.ini

2 directories, 5 files
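The AlmostZero test above is intentionally trivial; metric code is a better illustration because the expected value can be computed by hand. A hypothetical sketch (the accuracy function and the extra test file are made up, not part of the tree above):

# A hypothetical awesomemodel/tests/test_accuracy.py
from numpy.testing import assert_almost_equal

def accuracy(predictions, labels):
    # Stand-in for a metric that would normally live in the package itself.
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def test_accuracy_hand_computed():
    # 3 of the 4 predictions match the labels, so accuracy should be 0.75.
    assert_almost_equal(accuracy([1, 0, 1, 1], [1, 0, 0, 1]), 0.75)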
Hopefully you'll find some of these tricks helpful for efficiently developing your own models! Thanks to Joe Barrow for the discussion that inspired this post, and to Shi Feng for edits and comments. In my next post, I'll briefly describe how I use Semantic Scholar for writing literature reviews or related-work sections in papers.