Here is a collection of software, tips, and resources I've found. I like having them on a publicly accessible webpage, and I hope they're helpful to others as well. I've framed some of these as FAQs to improve discoverability.
What are useful Python cheatsheets?
- PDB Cheatsheet from https://github.com/nblock/pdb-cheatsheet
- Pandas Cheatsheet from https://pandas.pydata.org/
Software
What software is useful for writing research papers?
- Paperpile for organizing research papers
- Altair for figures
- Plotnine for figures
- draw.io for diagrams
How do you do X in Plotnine?
Operating Systems
- I use Arch Linux on machines I own.
- For cloud instances, I use Ubuntu.
What are some awesome (mostly Rust-based) command line tools?
- bat: cat replacement
- exa: ls replacement
- Glances: a better top/htop (be sure to pip install nvidia-ml-py3 for GPU support)
- fd: find replacement
- starship: make a better prompt
- dust: check disk usage
- ripgrep: grep replacement
- watchexec: run commands on file changes
How can I install software on Linux without root?
What is some software for managing ML experiments?
What are ways to make git better?
What are good Python debugging tools/tricks?
- ipdb and pdb are fantastic for command line debugging
- To drop into the debugger if allennlp errors, run ipython -m ipdb $(which allennlp) -- train config.jsonnet and press c to continue once the debugger starts
- If pip installations of source packages under Anaconda cause g++ errors like "file format not recognized", rename Anaconda's ld to ld_ so that pip uses the system version: https://github.com/pytorch/pytorch/issues/16683#issuecomment-459982988
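The ipython -m ipdb trick runs a whole script under the debugger and lands you in post-mortem mode when it crashes. For a function you call yourself, here is a minimal sketch of the same idea as a decorator, using only the standard library's pdb.post_mortem. The names debug_on_error and the post_mortem parameter are hypothetical, not part of pdb or ipdb.

```python
import functools
import pdb
from types import TracebackType
from typing import Any, Callable, Optional


def debug_on_error(
    func: Callable[..., Any],
    post_mortem: Callable[[Optional[TracebackType]], Any] = pdb.post_mortem,
) -> Callable[..., Any]:
    """Wrap func so an uncaught exception opens a post-mortem debugger, then re-raises it."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Inspect the frames where the exception was raised (u/d to move, p to print).
            post_mortem(exc.__traceback__)
            raise  # keep the original error after you quit the debugger

    return wrapper
```

Decorate an entry point with @debug_on_error, or call debug_on_error(main)(). If ipdb is installed, passing post_mortem=ipdb.post_mortem should give the IPython-flavored prompt instead; making the debugger a parameter also keeps the wrapper testable without an interactive session.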
How can I search for types of Wikipedia pages?
What is some software for data analytics/distributed computing?
- Apache Spark: SQL-based analytics and distributed computing
- Dask: pure Python distributed computing
- Slurm: a cluster job scheduler; in general it works very well
- submitit: a useful Python tool for submitting jobs to Slurm
What are good python libraries for creating websites?
- For small APIs, or websites where you don't need/want a pre-made user system, use FastAPI
- For something more "out of the box" but more opinionated, use Django
- For static sites (like this page), use Pelican
What are some good NLP libraries?
- AllenNLP is an amazing library for research in natural language processing; use it!
- spaCy: fantastic, easy-to-use tools for tokenization, dependency parsing, named entity recognition, and more; often used in other NLP software.
What data formats should I use?
- Unless you have a very good reason and purely numerical data, never use CSV: there is no single CSV standard (delimiters, quoting, escaping, and header conventions vary between tools), so saying a file is in CSV format is not enough information to parse it
- Default to JSON
- For large JSON files that are table-like (the root object is an array whose elements look like rows), consider JSON Lines (jsonl), which stores one JSON object per line. Large JSON objects can be expensive to parse and make it difficult to run parallel jobs (e.g., Apache Spark reads text files as line-delimited rows)
- For data you expect to analyze, consider creating a read-only SQLite database and running your analysis in SQL.
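To make these suggestions concrete, here is a minimal sketch using only Python's standard library (csv, json, sqlite3). The function names, the table name data, and the assumption that every row has the same keys are illustrative choices, not a fixed convention. parse_csv shows why "it's a CSV" is not enough: the same text splits differently depending on a delimiter the file never declares.

```python
from __future__ import annotations

import csv
import io
import json
import sqlite3
from pathlib import Path
from typing import Iterable, Iterator


def parse_csv(text: str, delimiter: str) -> list[list[str]]:
    """Parse text as CSV; the result depends on a dialect the file itself doesn't declare."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def write_jsonl(path: Path, rows: Iterable[dict]) -> None:
    """Write one JSON object per line, so the file can be streamed or split by lines."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield rows one line at a time instead of parsing one huge JSON array."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def jsonl_to_sqlite(jsonl_path: Path, db_path: Path, table: str = "data") -> None:
    """Load JSONL rows into a new SQLite table (assumes all rows share the first row's keys)."""
    rows = list(read_jsonl(jsonl_path))
    columns = list(rows[0])
    col_sql = ", ".join(f'"{c}"' for c in columns)  # quoted identifiers from trusted data
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
    conn.close()


def query_readonly(db_path: Path, sql: str, params: tuple = ()) -> list[tuple]:
    """Open the database read-only (SQLite URI mode=ro) so analysis can't modify the data."""
    conn = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Opening with mode=ro makes any accidental INSERT, UPDATE, or DELETE fail with sqlite3.OperationalError, which is a cheap guard when the database is meant to be a frozen snapshot for analysis.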
Hardware
How can I improve my ergonomics and avoid repetitive strain injuries?
- I bought a Vari electric sit/stand desk and love it. It encourages me to stretch, improves my posture, and gets me moving throughout the day.
How can I improve my home internet?
- Option 1: Use a very long ethernet cable with No Damage Wall Hangers
- Option 2: Ethernet over powerline network adapters are magical and work extremely well. Personally, I bought the "TP-Link AV2000 Powerline Adapter".
AllenNLP Tips
allennlp sets random seeds deterministically, which helps make experiments reproducible. Occasionally, for example when running multiple trials with identical hyperparameters, this behavior causes every trial to produce identical results. In these cases, it's helpful to specify the random seed manually, for example by using the trial number as the seed.
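A minimal sketch of the trial-number-as-seed idea, assuming AllenNLP's top-level random_seed, numpy_seed, and pytorch_seed config keys and the --overrides option of allennlp train (verify both against your installed version). It only builds the commands; the helper names and the runs/trial output prefix are hypothetical.

```python
from __future__ import annotations

import json
import shlex

# Top-level seed keys in an AllenNLP training config (assumed; check your AllenNLP version).
SEED_KEYS = ("random_seed", "numpy_seed", "pytorch_seed")


def seed_overrides(trial: int, offset: int = 0) -> str:
    """JSON string for --overrides that seeds every RNG with the trial number (+ offset)."""
    seed = offset + trial
    return json.dumps({key: seed for key in SEED_KEYS})


def train_commands(config: str, n_trials: int, out_prefix: str = "runs/trial") -> list[list[str]]:
    """One `allennlp train` command per trial, each with its own serialization dir and seed."""
    return [
        ["allennlp", "train", config, "-s", f"{out_prefix}{t}", "--overrides", seed_overrides(t)]
        for t in range(n_trials)
    ]


if __name__ == "__main__":
    # Print shell-quoted commands; run them by hand or pass each list to subprocess.run.
    for cmd in train_commands("config.jsonnet", n_trials=3):
        print(shlex.join(cmd))
```

Deriving the seed from the trial number keeps trials different from each other while still letting you rerun any single trial exactly.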
Tips from Others
Docs
Where do you keep your configuration files for applications?
Where can I find resources related to the UMD CLIP lab?
Links
- CLIP Homepage
- CLIP Wiki
- UMD-themed Google Slides: I created these based on graphics from a UMD PowerPoint theme. To use them, make a new presentation and import the theme from this sample slide deck.
Where can I store files at UMD?
UMIACS offers long-term file storage and hosting through object stores, accessed with a set of S3-like utilities.
Specific to the clip-quiz group: keep the contents of the clip-quiz bucket mirroring the layout of /fs/clip-quiz to make storing/restoring files easy.
For example, moving the directory /fs/clip-quiz/code/old-big-project/ into the bucket could be done using:
cpobj -V -r -f /fs/clip-quiz/code/old-big-project clip-quiz:code/
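Because the bucket mirrors /fs/clip-quiz, the destination for any path can be derived mechanically. Below is a sketch of a hypothetical helper (cpobj_mirror_command is not a UMIACS tool) that builds the same cpobj invocation as the example above for an arbitrary path. It reuses that example's flags verbatim and only builds the command; it does not run it. How cpobj treats a top-level destination like clip-quiz: is an assumption.

```python
from pathlib import PurePosixPath

FS_ROOT = PurePosixPath("/fs/clip-quiz")  # local tree that the bucket mirrors
BUCKET = "clip-quiz"


def cpobj_mirror_command(local_path: str) -> list:
    """Build a cpobj command copying local_path to the matching prefix in the bucket."""
    # Raises ValueError if local_path is not under /fs/clip-quiz.
    rel = PurePosixPath(local_path).relative_to(FS_ROOT)
    parent = rel.parent.as_posix()
    # The bucket prefix is the parent directory, e.g. "code/" for code/old-big-project.
    prefix = "" if parent == "." else parent + "/"
    # Flags copied verbatim from the example above.
    return ["cpobj", "-V", "-r", "-f", str(FS_ROOT / rel), f"{BUCKET}:{prefix}"]
```

Pass the result to subprocess.run(cmd, check=True), or print it with shlex.join(cmd) to double-check before copying anything.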
LaTeX
What format should figures be in?
- Create PDF versions of figures (vector graphics stay sharp at any zoom level)
What is ~ in LaTeX? A non-breaking space.
- LaTeX will not break a line between alpha and beta in alpha~beta
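A typical use is keeping a reference or a number attached to the word before it. Below is a minimal compilable sketch; the label sec:results and the names and values in the sentence are made up for illustration.

```latex
\documentclass{article}
\begin{document}
\section{Results}\label{sec:results}
% ~ is an unbreakable space: the words on either side always stay on the same line.
As discussed in Section~\ref{sec:results}, Dr.~Smith measured the delay at 10~ms.
\end{document}
```

In Dr.~Smith the tie also stops LaTeX from treating the period as the end of a sentence and adding extra space after it.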