September 27, 2023  •  llms | ai | data exploration | pandas | python
Trying out LIDA, an LLM-based data exploration tool from Microsoft Research

Diffusion Models from Scratch

July 1, 2023  •  diffusion models | machine learning | python | tutorial
Beginner's tutorial on how diffusion models work, with Python code + mathematical derivations and explanations

Python Argparse Cheatsheet

June 16, 2023  •  python | cli | argparse
Basic argparse template for Python 3
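
As a rough illustration of what such a template can look like, here is a minimal sketch using only the standard library. The argument names (`input_path`, `--n-epochs`, `--verbose`) are hypothetical placeholders, not taken from the post itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a small example parser (all argument names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Example command-line tool")
    # Required positional argument
    parser.add_argument("input_path", type=str, help="Path to the input file")
    # Optional argument with a default; stored as `args.n_epochs`
    parser.add_argument("--n-epochs", type=int, default=10, help="Number of epochs")
    # Boolean flag: False unless passed on the command line
    parser.add_argument("--verbose", action="store_true", help="Print extra output")
    return parser


# Pass an explicit list here for demonstration; call parse_args() with no
# arguments in a real script to read from sys.argv instead.
args = build_parser().parse_args(["data.csv", "--n-epochs", "3", "--verbose"])
print(args.input_path, args.n_epochs, args.verbose)  # data.csv 3 True
```

Wrapping the parser in a function keeps it easy to test without touching `sys.argv`.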

Productivity Tips for Jupyter Notebook

June 15, 2023  •  jupyter | python | vscode
How to be more productive using Jupyter notebook

Publish Python Package on PyPI with Poetry

April 27, 2023  •  python | poetry | packaging

How to package a Python library using Poetry and publish it on PyPI.

Screen utility - Enable scrolling by default

March 23, 2023  •  screen | linux | cli
Add `termcapinfo xterm* ti@:te@` to your `~/.screenrc` to enable normal scrolling

By default, the screen utility annoyingly requires you to hit Ctrl+A [ (which switches to “copy mode”) before you can scroll back through your terminal’s output buffer. Otherwise, scrolling cycles through your previous commands instead of moving back up through your terminal’s output.

Flaregun - A Tiny PyTorch Helper Library

March 20, 2023  •  pytorch | ml
Check how much GPU memory is available on an Nvidia device in real time, count the parameters in a PyTorch model, etc.

Notes for Harvard's CS 287 NLP Course

March 14, 2023  •  AI | nlp | lecture notes

NOTE: These are all taken from Chris Tanner’s great NLP course CS 287 (taught at Harvard). None of this is my own work. This is just a collection of screenshots and notes from my own reading of the course’s lecture slides for my own reference and understanding. I would highly recommend reading the full lecture slides available here.

Helpful Linux Commands

February 28, 2023  •  linux | terminal
A list of helpful Linux commands, such as checking GPU memory usage, listing processes running on a specific port, etc.

How to Analyze Memory Usage of Folders/Files on your Mac

January 3, 2023  •  Mac | memory

Three ways to view memory usage (i.e. “disk usage”) on your Mac, in descending order of preference.

Walkthrough of the OMOP CDM (Part 1)

December 30, 2022  •  omop | databases | health IT | ehrs | healthcare
Defining core concepts in the OMOP CDM

In Part 1 of this series on the OMOP CDM, we explain what the OMOP CDM is, define key terms such as “concept”, “source value”, “vocabulary”, and “domain”, and describe how they relate to each other.

Python Line-by-Line Profiling of a Program's Speed

December 12, 2022  •  python | profiling | devops
How to profile your Python program line-by-line with 0 lines of code and 2 shell commands

How long does each line in your Python program take to run?

How are Stanford's STARR database, the OMOP Common Data Model, and Epic's EHR Related?

December 1, 2022  •  omop | ehrs | stanford | epic | health IT
Overview of Epic, OMOP CDM, and Stanford's STARR database, and how to use them for EHR research

In this post, I explain what Epic (Chronicles, Clarity, etc.), OMOP CDM, and Stanford STARR are, how they are all related, and what the benefits/features of each are.

How to connect VSCode to your remote server via SSH

October 21, 2022  •  devops | vscode | ssh
Plus debugging tips for common error messages and pitfalls.

If you’ve ever had to ssh into a server to run programs, you may be taking an unnecessary productivity hit each time you relegate yourself to coding in a Jupyter notebook on localhost:8000.

Plotting the Distribution of MLB Batting Statistics Over Time

September 13, 2022  •  baseball | visualization | matplotlib
What is a "good" batting average? An "average" OBP? An "elite" SLG?

Most people know that a batting average over .300 is the mark of a great hitter, and that hitting .200 will land you on the bench.

How to Publish Jupyter .ipynb Notebooks to a Jekyll Static Blog

September 13, 2022  •  jupyter | devops | blogging
How to convert .ipynb to .md files and publish them on your static blog (with images, SVGs, etc.)

Goal: Publish a .ipynb on my Jekyll static site as painlessly as possible.

Matplotlib Tips + Tricks

August 13, 2022  •  python | visualization | matplotlib
Lesser-known matplotlib tricks for frequent users of the Python library

These are all taken from this great 3-hr YouTube tutorial by Ben Root from SciPy 2018. I’ve condensed the main takeaways of the talk into the following list of key concepts / tricks that I hadn’t previously been aware of.

Combining ROC Curves with Indifference Curves to Measure an ML Model's Utility

August 10, 2022  •  AI | model evaluation | statistics

For the examples below, assume we have a binary classification task where the class label is $y \in \{0, 1\}$ and the model’s predictions are $\hat{y} \in \{0, 1\}$.

Sensitivity + Specificity + PPV + TP/FP/TN/FN Formulas

August 2, 2022  •  statistics | AI | models | probability
Formulas for sensitivity, specificity, PPV, NPV, TPR, FPR, prevalence, etc.

A brief cheat sheet / reference guide containing the definitions, formulas, and explanations of the most commonly used model evaluation metrics for binary classification tasks.
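
For a quick sense of how these metrics relate to the confusion matrix, here is a minimal sketch that computes them from raw TP/FP/TN/FN counts using the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, one predicted positive, and one predicted negative).
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),    # TPR / recall: P(pred = 1 | y = 1)
        "specificity": tn / (tn + fp),    # TNR: P(pred = 0 | y = 0)
        "ppv": tp / (tp + fp),            # precision: P(y = 1 | pred = 1)
        "npv": tn / (tn + fn),            # P(y = 0 | pred = 0)
        "fpr": fp / (fp + tn),            # 1 - specificity
        "prevalence": (tp + fn) / total,  # share of actual positives
    }


# Made-up counts: 100 actual positives and 100 actual negatives
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics["sensitivity"], metrics["specificity"])  # 0.8 0.9
```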

What is Tidy Data?

January 25, 2022  •  data | databases
The simplest definition of tidy data.

The definition of “tidy data” is simple: