MolSearchGPT - A chemical similarity search using AI-generated embeddings

Through my work at Frontier Medicines, an early-stage covalent drug discovery company, I developed a similarity search tool for a space of 15 billion synthesizable molecules.

To do this, I represented molecules as vector embeddings using a proprietary transformer model fine-tuned for Frontier’s use case. I also featurized every molecule in the search space, running the computational workloads on distributed clusters, so that the tool could support multiple search fields and relevant filters. This rich feature information is stored in a data warehouse on AWS.
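To make the idea of searching by embedding concrete, here is a minimal sketch of a top-k cosine similarity search with NumPy. It is not the production system: the embeddings are random placeholders rather than output from the proprietary model, the function name `top_k_cosine` is made up for illustration, and a brute-force scan like this would not scale to billions of molecules (which is where a vector database comes in).

```python
import numpy as np


def top_k_cosine(query, library, k=5):
    """Return indices and scores of the k library rows most similar to query."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    scores = lib @ q

    # argpartition picks the k best without a full sort; then order those k.
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


# Placeholder data: 1,000 random 64-dimensional "molecule embeddings".
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 64))

# A query that is a slightly perturbed copy of library row 42.
query = library[42] + 0.01 * rng.normal(size=64)

idx, sims = top_k_cosine(query, library, k=3)
print(idx, sims)
```

Because the query is a near copy of row 42, that row comes back as the best match, with the remaining hits scoring far lower.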

The end product was offered to chemists through a UI I built with Streamlit, which kicked off an AWS SageMaker pipeline job. The pipeline job ran each component of the workflow: parsing the query molecules, running the vector similarity search, retrieving metadata from the data warehouse, and applying the appropriate clustering and filtering.
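The flow of those pipeline stages can be sketched in plain Python. Everything here is a hypothetical stand-in: the `METADATA` dictionary plays the role of the data warehouse, the character-overlap score is a toy replacement for the embedding search, the molecular-weight cutoff is just one example of a filter, and the clustering step is left out for brevity. None of it reflects the actual SageMaker implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    smiles: str
    score: float
    mol_weight: float


# Stand-in for the data warehouse: SMILES string -> molecular weight (g/mol).
METADATA = {"CCO": 46.07, "CCN": 45.08, "c1ccccc1": 78.11}


def parse_queries(raw):
    """Split user input into unique, non-empty query strings, keeping order."""
    seen, queries = set(), []
    for line in raw.splitlines():
        smiles = line.strip()
        if smiles and smiles not in seen:
            seen.add(smiles)
            queries.append(smiles)
    return queries


def similarity_search(query, k=2):
    """Toy search: rank library molecules by Jaccard overlap of characters."""
    def jaccard(a, b):
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb)

    scored = [(s, jaccard(query, s)) for s in METADATA if s != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def run_pipeline(raw, max_weight):
    """Parse queries, search, look up metadata, then filter by weight."""
    hits = []
    for query in parse_queries(raw):
        for smiles, score in similarity_search(query):
            weight = METADATA[smiles]      # metadata retrieval
            if weight <= max_weight:       # filtering
                hits.append(Hit(smiles, score, weight))
    return hits


hits = run_pipeline("CCO\n\nCCO\n", max_weight=50.0)
print(hits)
```

Keeping each stage as its own function mirrors the way a pipeline job runs one component at a time, so a single stage can be swapped out or tested in isolation.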

More details about how I used vector databases for semantic search can be found on the Pinecone website, which describes Frontier’s adoption of their serverless product during an early-access program.

A bite-sized version of the project, intended as a proof of concept, can be found on my GitHub.

First-author publication in computational neuroscience

I’m excited to announce the publication of a first-author paper, written under the mentorship of Dean Buonomano at UCLA and in collaboration with Ash Tanwar. This paper is the culmination of years of work using deep learning models, trained on tasks designed to tease out how the brain represents timing and memory through the intrinsic dynamics of neural circuits.

The paper can be read in full here. Here are the highlights:

  • A recurrent neural network (RNN) can implement timing, working memory (WM), and the comparison of intervals.
  • RNN units (neurons) exhibit mixed selectivity (i.e., they are tuned to both timing and WM rather than to only one of them).
  • Units contain more information about the timing of an interval than about the WM of that interval.
  • Our work predicts how a human brain could represent timing and WM information.

About This Website

I am currently using this website as a portfolio. You’ll find a short bio and a link to my GitHub on the About page. Links to my publication and presentations can be found on the Research page.