Quantisation Notes

References: Lei Mao’s blog (very detailed, with code); https://iq.opengenus.org/basics-of-quantization-in-ml/; code for activation-aware quantization; Hugging Face quantisation; [PyTorch quantisation API docs]. There are 4 ways to quantise a model: GPTQ, activation-aware quantisation, bitsandbytes, and packages like Hugging Face Optimum or the PyTorch API itself. Scripting the model can make inference faster for a few reasons: Reduced overhead: scripted models can have lower overhead than their original Python counterparts because the script represents a more optimized version of the model’s forward pass.
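To make the basic idea behind these methods concrete, here is a minimal sketch of affine (scale and zero-point) quantisation of a list of weights to 8-bit integers. It is plain Python, not the API of GPTQ, bitsandbytes, Optimum or PyTorch; the function names `quantize` and `dequantize` are made up for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using a scale and a zero point (illustrative only)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Make sure the representable range includes 0.0 so that zero maps cleanly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round each value to the nearest integer step and clamp to the integer range.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
# The reconstruction error per weight stays within one quantisation step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The libraries listed above differ mainly in how they choose the scale (per tensor, per channel, or guided by activations), but the store-as-integer, rescale-on-use pattern is the same.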

Distributed training Notes

Data Parallel: Each GPU holds a full copy of the model. Each GPU/process/worker gets a different portion of the data to train on. After each backward pass, the master node averages the model parameters, and this averaged model is shared with the workers again. Distributed Data Parallel: After a forward and backward pass, gradients are calculated on each worker; the master node then averages the gradients, calculates the new model weights, and shares these with the workers.
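The gradient-averaging step can be sketched in a few lines of plain Python. This is a single-process simulation under stated assumptions, not PyTorch code: the model is a one-parameter linear fit, the "workers" are just list slices, and `local_gradient` and `ddp_step` are hypothetical helper names. In a real setup the averaging would be done by a collective operation across processes.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def ddp_step(w, shards, lr):
    """One simulated step: per-worker gradients, averaged, then a shared update."""
    grads = [local_gradient(w, shard) for shard in shards]  # one gradient per worker
    avg_grad = sum(grads) / len(grads)                      # the averaging step
    return w - lr * avg_grad                                # same new weight everywhere


# Toy data generated by y = 2 * x, split evenly across two simulated workers.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = ddp_step(w, shards, lr=0.05)
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, so every worker ends up with the same weights it would have had if trained on all the data at once.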

Portfolio

Hi, after my extensive work on Content Moderation Systems at Sharechat, I consider myself a generalist who can train, fine-tune, and deploy Deep Learning models across different modalities (Vision, NLP, Audio) in scalable production environments. My everyday work has included everything from Microsoft Excel to Kubernetes, and working together with folks from Product and Operations, Data Scientists and Engineers alike. With a base in Computer Vision, I have a thorough understanding of how to train/fine-tune LLMs and how to build RAG applications with them as well.

Deploying Models at Scale using Torchserve

AI Shit TorchServe is a flexible and easy to use tool for serving PyTorch machine learning (ML) models at scale. It is part of the PyTorch ecosystem and was developed in collaboration with AWS to facilitate the deployment of PyTorch models in production environments. TorchServe simplifies the process of deploying PyTorch models by providing a straightforward and standardized way to package and serve them. It supports multiple types of models, including those for image and text classification, object detection, and more.

Pushing New Rows to BigQuery Table in GCP using Go

In this blog post, we’ll explore how to push new rows into a BigQuery table using Go. BigQuery, a serverless and highly scalable data warehouse, is part of Google Cloud Platform (GCP). We will use the cloud.google.com/go/bigquery package for Go to interact with BigQuery. Introduction. Prerequisites: Before diving into the code, make sure you have the following set up: a GCP project with BigQuery enabled; service account credentials in a JSON file; Go installed on your machine. Template: package main import ( "context" "fmt" "log" "time" "cloud.

Dev Containers: Open the VS Code Editor Inside Any Docker Image

Let’s say you have a Docker image for an application and you want to run some tests or experiments, or add a new feature to that application. Normally I would build that image locally and mount the application directory as a volume when running the container. But something better exists. Using Dev Containers is better because: It gives the VS Code experience for any Docker image. Different VS Code extensions can be used here, like linting, Copilot, etc.

The Annotated LLaMA

Foreword: Welcome to “The Annotated LLaMA”. Two of the most brilliant and well-explained articles I have read are The Annotated Transformer and The Annotated DETR. The former introduced attention like no other post. The simple idea was to present an “annotated” version of the paper Attention Is All You Need along with code. I have tried to do something similar with the LLaMA models by Meta Research, without which the commercial use of many large language models would not have been possible.

A Self-Supervised Descriptor for Image Copy Detection - Review

[Paper][Code] They have built upon the work of SimCLR and successfully tackled its limitations. Do give this paper a read if you are looking for a way of generating powerful embeddings/descriptors for your image dataset. Good things about the paper: It introduces a regularisation term based on entropy, which is used to make the descriptors more sparse. This means that negative images won’t be as “close” to each other as they used to be in SimCLR.
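The excerpt does not spell out the formula, so the following is only a sketch under an assumption: that the entropy term is a nearest-neighbour (Kozachenko-Leonenko style) estimate, which penalises descriptors that sit very close to their nearest neighbour in the batch. The function names `l2_normalize` and `entropy_regulariser` are hypothetical, and the code is plain Python rather than the paper's implementation.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def entropy_regulariser(descriptors, eps=1e-8):
    """Assumed form: minus the mean log distance to each descriptor's nearest neighbour."""
    zs = [l2_normalize(d) for d in descriptors]
    total = 0.0
    for i, zi in enumerate(zs):
        # Distance from descriptor i to its closest other descriptor in the batch.
        nearest = min(math.dist(zi, zj) for j, zj in enumerate(zs) if j != i)
        total += math.log(nearest + eps)
    return -total / len(zs)


# Descriptors bunched together are penalised more than descriptors spread apart.
clustered = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_clustered = entropy_regulariser(clustered)
loss_spread = entropy_regulariser(spread)
```

Minimising a term like this alongside the contrastive loss pushes each descriptor away from its nearest neighbour, which matches the observation that negatives end up less “close” than in SimCLR.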

TMUX for Machine Learning Engineers

What is Tmux? TMUX (Terminal Multiplexer) is a program that helps create and manage multiple terminal sessions from within a single terminal. We can detach these newly created sessions, which helps in running multiple programs asynchronously. A detached session keeps executing its command in the background until we attach to it again and explicitly stop it. We can create multiple terminal sessions and view and manage them in the same window by toggling between them.

Docker Cheatsheet

“But it works on my machine!?” That sentence is exactly the problem Docker solves. Earlier, there was no way to run 2 applications (on different operating systems) on the same machine. VMware solved this problem by introducing virtual machines, but we had to assign RAM and storage separately for the second machine. This was still a bottleneck, as we can’t ship applications effectively this way, which is why Docker was invented.

Mobile-VIT [Paper Summary]

Papers With Code. Observations: There’s a global inductive bias in CNNs (invariance to shift and scale), which is why CNNs have comparable performance w.r.t. Transformers (the reference for this statement is in the Transformer survey paper). Transformer models overcome this with the help of extensive training regimes, large datasets and larger models. (It will be good if we mention this in the paper somewhere.) The CoreML library was used to perform testing on an iPhone 12. Good things about the paper: The paper has two significant contributions. A novel architecture which combines the convolution block from MobileNetV2 with the self-attention block.

About Me

Who am I? Hi, my name is Nishant Bhansali. I’m working as a Machine Learning Engineer at Sharechat, working remotely from Ahmedabad, India. I have mostly worked on solving Computer Vision and Digital Image Processing problems, and that’s where I would say my expertise lies. Be it recent transformer architectures or archaic image processing algorithms, I have gotten my hands dirty with almost everything vision-based. My work at Sharechat has been around Image Enhancement and Image Quality Assessment.