Writing


How to Run Sbatch in Slurm

March 21, 2023  •  slurm | sbatch | compute | remote server
How to create and run an Sbatch job on Slurm

Flaregun - A Tiny PyTorch Helper Library

March 20, 2023  •  pytorch | ml
Figure out how much GPU memory is available on an Nvidia device in real-time, how many params are in a PyTorch model, etc.

Notes for Harvard's CS 287 NLP Course

March 14, 2023  •  AI | nlp | lecture notes

NOTE: These are all taken from Chris Tanner’s great NLP course CS 287 (taught at Harvard). None of this is my own work. This is just a collection of screenshots and notes from my own reading of the course’s lecture slides for my own reference and understanding. I would highly recommend reading the full lecture slides available here.

Helpful Linux Commands

February 28, 2023  •  linux | terminal
A list of helpful Linux commands, such as checking GPU memory usage, listing processes running on a specific port, etc.

How to Analyze Memory Usage of Folders/Files on your Mac

January 3, 2023  •  Mac | memory

Three ways to view memory usage (i.e. “disk usage”) on your Mac, in descending order of preference.

Walkthrough of the OMOP CDM (Part 1)

December 30, 2022  •  omop | databases | health IT | ehrs | healthcare
Defining core concepts in the OMOP CDM

In Part 1 of this series on the OMOP CDM, we explain what the OMOP CDM is, define key terms such as “concept”, “source value”, “vocabulary”, and “domain”, and describe how they relate to each other.

Python Line-by-Line Profiling of a Program's Speed

December 12, 2022  •  python | profiling | devops
How to profile your Python program line-by-line with 0 lines of code and 2 lines of shell commands

How long does each line in your Python program take to run?

How are Stanford's STARR database, the OMOP Common Data Model, and Epic's EHR Related?

December 1, 2022  •  omop | ehrs | stanford | epic | health IT
Overview of Epic, OMOP CDM, and Stanford's STARR database, and how to use them for EHR research

In this post, I explain what Epic (Chronicles, Clarity, etc.), OMOP CDM, and Stanford STARR are, how they are all related, and what the benefits/features of each are.

How to connect VSCode to your remote server via SSH

October 21, 2022  •  devops | vscode | ssh
Plus debugging tips for common error messages and pitfalls.

If you’ve ever had to ssh into a server to run programs, you may be taking an unnecessary productivity hit each time you relegate yourself to coding in a Jupyter notebook on localhost:8000.

Plotting the Distribution of MLB Batting Statistics Over Time

September 13, 2022  •  baseball | visualization | matplotlib
What is a "good" batting average? An "average" OBP? An "elite" SLG?

Most people know that a batting average over .300 is the mark of a great hitter, and that hitting .200 will land you on the bench.
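For reference, the three rate stats mentioned above are computed from standard counting stats. The helpers below are a minimal sketch using the usual textbook formulas (AVG = H/AB, OBP = (H + BB + HBP)/(AB + BB + HBP + SF), SLG = total bases/AB); the function names and the sample season line are hypothetical, not taken from the post's data pipeline.

```python
def batting_average(h: int, ab: int) -> float:
    # AVG = hits / at-bats
    return h / ab


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(h: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    # Singles are whatever hits are left after extra-base hits
    singles = h - doubles - triples - hr
    # Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Hypothetical season: 150 H (30 2B, 5 3B, 25 HR) in 500 AB, 50 BB, 5 HBP, 5 SF
print(f"AVG {batting_average(150, 500):.3f}")
print(f"OBP {on_base_percentage(150, 50, 5, 500, 5):.3f}")
print(f"SLG {slugging_percentage(150, 30, 5, 25, 500):.3f}")
```

Plotting the distribution of each stat across all qualified hitters in a season is then just a matter of applying these to every player row.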

How to Publish Jupyter .ipynb Notebooks to a Jekyll Static Blog

September 13, 2022  •  jupyter | devops | blogging
How to convert .ipynb to .md files and publish them on your static blog (with images, SVGs, etc.)

Goal: Publish a .ipynb on my Jekyll static site as painlessly as possible.

Matplotlib Tips + Tricks

August 13, 2022  •  python | visualization | matplotlib
Lesser-known matplotlib tricks for frequent users of the Python library

These are all taken from this great 3-hr YouTube tutorial by Ben Root from SciPy 2018. I’ve condensed the main takeaways of the talk into the following list of key concepts / tricks that I hadn’t previously been aware of.

Combining ROC Curves with Indifference Curves to Measure an ML Model's Utility

August 10, 2022  •  AI | model evaluation | statistics

For the below examples, assume we have a binary classification task where the class label is $y \in \{0, 1\}$, and the model’s predictions are $\hat{y} \in \{0, 1\}$.
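One standard way to attach a utility to a point on an ROC curve is to weight each confusion-matrix outcome by a payoff and average over the class prevalence $p$. The sketch below assumes that formulation, which may differ from the post's exact setup; the payoff arguments `u_tp`, `u_fn`, `u_fp`, `u_tn` and all function names are hypothetical. Setting expected utility constant gives straight "indifference" lines in ROC space whose slope is $\frac{(1-p)(u_{TN} - u_{FP})}{p\,(u_{TP} - u_{FN})}$.

```python
def expected_utility(tpr: float, fpr: float, prevalence: float,
                     u_tp: float, u_fn: float, u_fp: float, u_tn: float) -> float:
    """Expected utility per example at one ROC operating point (TPR, FPR).

    The u_* arguments are assumed payoffs for each confusion-matrix cell.
    """
    from_positives = prevalence * (tpr * u_tp + (1 - tpr) * u_fn)
    from_negatives = (1 - prevalence) * (fpr * u_fp + (1 - fpr) * u_tn)
    return from_positives + from_negatives


def indifference_slope(prevalence: float, u_tp: float, u_fn: float,
                       u_fp: float, u_tn: float) -> float:
    """Slope dTPR/dFPR of a line of constant expected utility in ROC space."""
    return ((1 - prevalence) * (u_tn - u_fp)) / (prevalence * (u_tp - u_fn))


def best_operating_point(roc_points, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Return the (fpr, tpr) pair with the highest expected utility."""
    return max(
        roc_points,
        key=lambda pt: expected_utility(pt[1], pt[0], prevalence,
                                        u_tp, u_fn, u_fp, u_tn),
    )


# Hypothetical ROC curve as (fpr, tpr) pairs, balanced classes, symmetric payoffs
roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.85), (1.0, 1.0)]
print(best_operating_point(roc, 0.5, u_tp=1, u_fn=-1, u_fp=-1, u_tn=1))
```

With symmetric payoffs and $p = 0.5$ the indifference lines have slope 1, so the chosen point is the one maximizing TPR − FPR; skewing the payoffs or the prevalence tilts the lines and moves the optimum along the curve.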

Sensitivity + Specificity + PPV + TP/FP/TN/FN Formulas

August 2, 2022  •  statistics | AI | models | probability
Formulas for sensitivity, specificity, PPV, NPV, TPR, FPR, prevalence, etc.

A brief cheat sheet / reference guide containing the definitions, formulas, and explanations of the most commonly used model evaluation metrics for binary classification tasks.
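As a quick companion to the formulas, here is a minimal sketch that tallies TP/FP/TN/FN from 0/1 labels and computes the metrics from them. The `Confusion` class and `confusion_from_labels` helper are hypothetical names; in practice a library such as scikit-learn offers equivalent utilities.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary confusion matrix."""
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        # Sensitivity = recall = TPR = TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Specificity = TNR = TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def fpr(self) -> float:
        # FPR = 1 - specificity = FP / (FP + TN)
        return self.fp / (self.fp + self.tn)

    def ppv(self) -> float:
        # PPV = precision = TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # NPV = TN / (TN + FN)
        return self.tn / (self.tn + self.fn)

    def prevalence(self) -> float:
        # Fraction of all examples whose true label is positive
        return (self.tp + self.fn) / (self.tp + self.fp + self.tn + self.fn)


def confusion_from_labels(y_true, y_pred) -> Confusion:
    """Tally TP/FP/TN/FN from parallel sequences of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for y, p in pairs if y == 1 and p == 1),
        fp=sum(1 for y, p in pairs if y == 0 and p == 1),
        tn=sum(1 for y, p in pairs if y == 0 and p == 0),
        fn=sum(1 for y, p in pairs if y == 1 and p == 0),
    )


c = confusion_from_labels([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(c, c.sensitivity(), c.specificity(), c.ppv(), c.npv(), c.prevalence())
```

Note that PPV and NPV depend on prevalence while sensitivity and specificity do not, which is why the same model can have very different PPVs on populations with different base rates.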

What is Tidy Data?

January 25, 2022  •  data | databases
The simplest definition of tidy data.

The definition of “tidy data” is simple:

Notes on Vim Tutor

January 3, 2022  •  vim | cheatsheet | terminal
Cheatsheet of keyboard shortcuts for Vim, learned from vimtutor.

I’ve always been intimidated by vim. After taking MIT’s Missing Semester Course (a free online course that I’d highly recommend), I learned about a built-in utility called vimtutor that automatically comes installed with vim.

Tips to Free Up Mac Storage

December 24, 2021  •  Mac | memory
How I tripled the available storage on my Mac by deleting useless files.

tl;dr – If you have Xcode installed, you could easily be wasting up to 10% of your disk space.

Malcolm Gladwell v. Chess

October 28, 2021  •  Malcolm Gladwell | chess | LSAT
What can competitive chess teach us about "fast" v. "slow" thinking skills?

In Season 4, Episode 3 of his podcast Revisionist History, Malcolm Gladwell argues against the use of the LSAT in law school admissions. A full transcript of the episode can be read here.