Anshumaan Chauhan

Projects 💻

Guided Conditional Image Generation with Conditional Flow Matching

This project integrates Conditional Optimal Transport flow matching into an attention-based UNet for both conditional and unconditional image generation. A Classifier-Free Guidance (CFG) mechanism lets a single unified model handle both tasks. To address the limited descriptiveness of CIFAR10 labels, the BLIP2 FLAN-T5 model is used for image captioning, enriching the conditioning signal. Self- and cross-attention, incorporating the timestep and tokenized text, carry the conditioning. Extensive experiments yield an optimized architecture with an FID of 105.54 for unconditional generation and CLIPScore/FID of 22.19/385.56 for conditional generation. The results highlight the model’s potential and suggest further gains from architectural refinements and longer training.
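
As a rough sketch of the CFG mechanism: at each solver step the model is evaluated twice, with and without the conditioning, and the two velocity predictions are blended. The `model` signature, the Euler solver, and the guidance scale below are illustrative assumptions, not the project's exact code.

```python
import numpy as np

def cfg_velocity(v_uncond, v_cond, guidance_scale):
    # Classifier-free guidance: 0 -> purely unconditional,
    # 1 -> purely conditional, >1 -> amplified conditioning.
    return v_uncond + guidance_scale * (v_cond - v_uncond)

def euler_sample(model, x, cond, steps=50, guidance_scale=3.0):
    # Integrate the learned flow from t=0 (noise) to t=1 (data)
    # with a plain Euler solver, applying CFG at every step.
    # `model(x, t, cond)` is a hypothetical velocity predictor.
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_u = model(x, t, cond=None)  # conditioning dropped
        v_c = model(x, t, cond=cond)
        x = x + dt * cfg_velocity(v_u, v_c, guidance_scale)
    return x
```

During training the same network sees the conditioning randomly dropped, which is what makes a single model usable for both conditional and unconditional generation.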


Patient Tracker System

The Patient Tracker System is a comprehensive solution for medical institutions to efficiently manage patient information, doctor details, appointments, and medical cases. It offers a user-friendly interface for doctors and staff to streamline their workflow and enhance patient care.


Visual Story Generation

Large language models such as GPT-2, GPT-3, PaLM, and Llama are rated highly on general text generation. On story generation, however (the task of producing a coherent, fluent synthetic story), they often suffer from inconsistency, introducing new characters or plot points out of nowhere, and drifting away from the storyline. To address these issues, we propose a framework called Visual Story Telling, which comprises a text generation model and a Stable Diffusion model. The text generation model is fine-tuned on a custom dataset for content-conditioned story generation, inspired by plan-based/hierarchical story generation methodology. We introduce the Plot Summary Dataset, which records the Title, Plot, Characters, Inter-Character Relations, and Genre of each story; these fields condition the outputs of DistilGPT and T5. The generated story is then passed, sentence by sentence, to Stable Diffusion models for visual conversion.
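
A minimal sketch of how the dataset fields might be serialized into a conditioning prompt for the fine-tuned language model; the field tags and ordering are illustrative assumptions, not the project's exact scheme.

```python
def build_story_prompt(record):
    # Serialize one Plot Summary Dataset record into a single
    # conditioning string; the <...> field tags are hypothetical.
    return (
        f"<title> {record['title']} "
        f"<genre> {record['genre']} "
        f"<characters> {', '.join(record['characters'])} "
        f"<relations> {record['relations']} "
        f"<plot>"
    )

example = {
    "title": "The Lighthouse Keeper",
    "genre": "mystery",
    "characters": ["Mara", "Elias"],
    "relations": "Mara is Elias's estranged daughter",
}
prompt = build_story_prompt(example)  # fed to DistilGPT / T5
```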


Recipe Infusion

This project introduces Recipe Infusion, a framework for generating style-infused recipes, built from two main components: Recipe Generation and Style Infusion. For Recipe Generation, a distilgpt2 model is fine-tuned on a custom dataset built by combining the RecipeBox and RecipeNLG sources; the fine-tuned model generates coherent, sensible recipes. For Style Infusion, a T5-small conditional generation model is fine-tuned for style transfer. Because no parallel datasets exist for the selected celebrities’ styles, the project uses back translation to create one: styled sentences are translated into another language and back, and the resulting pairs train the T5 model. Once trained, the T5 model performs style transfer on the generated recipes, infusing different styles into the recipe content and giving users recipe variations that reflect the styles of the selected celebrities or other sources. The results demonstrate the effectiveness of the approach and its potential to enhance recipe personalization and creativity.
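
The back-translation step can be sketched as a round trip through a pivot language; the translator callables below are placeholders for real machine-translation models, not the project's actual components.

```python
def back_translate(sentence, to_pivot, from_pivot):
    # Round-trip through a pivot language; the output tends to lose
    # stylistic quirks, yielding a "neutral" paraphrase.
    return from_pivot(to_pivot(sentence))

def make_parallel_dataset(styled_sentences, to_pivot, from_pivot):
    # Pair each neutral paraphrase (source) with the original styled
    # sentence (target) for style-transfer fine-tuning of T5.
    return [(back_translate(s, to_pivot, from_pivot), s)
            for s in styled_sentences]
```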


VisionNet

Several brain-inspired characteristics have gained popularity in recent years, either for performing computations efficiently (spiking neurons) or for improving task performance (attention mechanisms). In computer vision, the way Convolutional Neural Networks (CNNs) process images differs significantly from how the brain processes vision. We perform an investigative study that incorporates brain-inspired features, namely i) attention, ii) multi-feature extraction, and iii) lateral connections, into a CNN architecture and observe the effects of these features on image classification accuracy. Experiments show that the brain-inspired characteristics improve image classification performance on CIFAR10 and CIFAR100 by 1.6% and 3.35%, respectively.
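
As one concrete instance of the attention feature, a squeeze-and-excitation-style channel attention block can be sketched in a few lines of numpy; this is an illustrative stand-in, not the project's exact module.

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    # feature_map: (C, H, W). Global-average-pool to one value per
    # channel, pass it through a small bottleneck MLP, and use the
    # sigmoid output to reweight the channels.
    c = feature_map.shape[0]
    squeezed = feature_map.reshape(c, -1).mean(axis=1)   # (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)              # ReLU, (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))          # sigmoid, (C,)
    return feature_map * gate[:, None, None]             # reweighted map
```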


CheMapBERT

In this project, we present a novel approach to ingredient matching in cosmetic products using a knowledge-infused language model. We fine-tuned a large language model pretrained on domain-specific corpora to generate (or match) the label, i.e., the intended cosmetic use, given a list of cosmetic ingredients. We additionally incorporated an external source of knowledge, a list of possible matches, into the model, and we show quantitatively and qualitatively how introducing this external knowledge affects performance on the downstream task.
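
A rough sketch of one way the external knowledge can be injected: the candidate-match list is appended to the ingredient list before tokenization. The separator tokens and format are hypothetical, not the project's exact scheme.

```python
def build_model_input(ingredients, candidate_uses):
    # [ING]/[KNOW] are illustrative separator tokens marking the
    # ingredient list and the externally retrieved candidate matches.
    return ("[ING] " + " ; ".join(ingredients) +
            " [KNOW] " + " ; ".join(candidate_uses))

text = build_model_input(
    ["aqua", "glycerin", "niacinamide"],
    ["moisturizer", "brightening serum"])
```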


Human Vs AI Sarcasm Detection

This binary text classification task was to determine whether a sarcastic sentence was written by a human or by an AI. We chose this task because people usually cannot easily tell whether a text is sarcastic in the first place; given that it is sarcastic, it is even harder to distinguish human-written sarcasm from AI-generated sarcasm. There are some characteristics that can reveal that a sentence was written by an AI, but LLMs such as ChatGPT usually do not exhibit them in short sentences; they may only become evident in a paragraph or a short story.


Scalability Check for Machine Learning System Predicting Flight Delays

Machine learning algorithms have made tremendous progress recently and have been applied to many real-world problems, including predicting flight delays, one of the serious problems facing the airline business. However, deep learning models require high computational power to train and store, and the end-to-end scalability of the surrounding system is often ignored in recent research, which concentrates on predictive modeling. This paper describes our approach to flight delay prediction framed as a prediction problem, with a focus on the end-to-end aspect of the system, using industry-standard high-performance tools such as MySQL and SparkSQL. We show that our solution supports not only predictive modeling but a complete end-to-end product with faster real-time predictions and scalability.
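
The prediction framing can be sketched as turning each flight record into a feature vector plus a binary "delayed" label; the column names and the 15-minute threshold below are illustrative assumptions, not the project's actual schema.

```python
def make_examples(rows, delay_threshold_min=15):
    # Label a flight delayed if arrival delay exceeds the threshold;
    # the features here are a hypothetical minimal subset.
    X, y = [], []
    for r in rows:
        X.append([r["dep_hour"], r["day_of_week"], r["distance_km"]])
        y.append(1 if r["arr_delay_min"] > delay_threshold_min else 0)
    return X, y

rows = [
    {"dep_hour": 7, "day_of_week": 1, "distance_km": 400, "arr_delay_min": 3},
    {"dep_hour": 18, "day_of_week": 5, "distance_km": 2500, "arr_delay_min": 42},
]
X, y = make_examples(rows)
```

At scale, the same transformation would run as a SparkSQL query over the flight table rather than a Python loop.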


Airline Booking Database Management

This project aimed to imitate the functionality of Google’s flight booking system. An artificial flight dataset was created manually and then queried for flight search and booking via JDBC and MySQL, based on user preferences specified through a custom Graphical User Interface (GUI).
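
The search flow can be sketched with a parameterized query; sqlite3 stands in for MySQL/JDBC here and the schema is illustrative, but the `?` placeholders play the same role as a JDBC `PreparedStatement`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flights (
    id INTEGER PRIMARY KEY, origin TEXT, dest TEXT,
    depart_date TEXT, price REAL, seats_left INTEGER)""")
conn.executemany(
    "INSERT INTO flights (origin, dest, depart_date, price, seats_left)"
    " VALUES (?, ?, ?, ?, ?)",
    [("BOS", "SFO", "2024-03-01", 310.0, 5),
     ("BOS", "SFO", "2024-03-01", 280.0, 0),   # sold out
     ("BOS", "JFK", "2024-03-01", 120.0, 9)])

def search_flights(conn, origin, dest, date):
    # Parameterized query driven by GUI preferences; placeholders
    # avoid SQL injection, as with PreparedStatement in JDBC.
    cur = conn.execute(
        """SELECT id, price FROM flights
           WHERE origin = ? AND dest = ? AND depart_date = ?
             AND seats_left > 0
           ORDER BY price""",
        (origin, dest, date))
    return cur.fetchall()
```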


Implementations of the above projects are linked on my GitHub.