Chetna Khanna

Master of Science - Data Analytics Engineering, Northeastern University

I am an educationist with 5+ years of experience in the EdTech domain who is transitioning into tech as a data professional. I am in love with the field of data science and analytics. The reason: data never fails to surprise me with patterns I could not even have imagined, and that is where science comes in, to find the unknown. As a Mathematics student, I have always loved finding the unknowns (x 😉). The “why” behind these patterns connects my analytical and creative sides and keeps me gripped by my work.

I consider myself a problem solver and a forever learner. 📘 I love to read, I love to write and I love to share my knowledge.

To better understand how patterns are found in data and how decisions are made, I decided to build on my previous master's in mathematics with a master's in data science. I am a proud Husky of Northeastern University. 🎓

Experience

NLP Engineer Co-op

Jeevahealth.ai (Medufin LLC) - MA, USA

Jeevahealth is a health, wellness, and fitness startup using artificial intelligence to build mental health solutions for school and college students.

  • Open Source Project: Designed a DialogFlow conversion process that reduces chatbot building to a one-line terminal command, with full support for input & output context setting, parameter saving, lifespan setting & entity tagging
  • Built command-line utilities to extract data from online forums like Reddit, Discord & counseling websites
  • Building a conversational AI using an RNN & deploying it with Flask, Gunicorn & Docker on an AWS EC2 instance
  • Performed data cleaning on corpora extracted from various online forums & counseling websites
  • Coordinated with team members across the globe for server setup, local installations & pipeline setup
  • Researched existing health & wellness companies & the solutions they provide
Technologies Used: Python, TensorFlow, Hugging Face, DialogFlow, Git, GitLab, Jira, Unix, Docker, Node.js, Flask, AWS

Feb 2021 - Present


Machine Learning Engineer Intern

CITAP - Seattle, WA, USA
Guided by: Parag Kulkarni (SVP SaaS Engineering, Nutanix) and Ankur Rastogi (SDE, Amazon AWS)

Developed an end-to-end AI-enabled customer support chatbot as a service, integrated with Slack & deployed on the cloud [GitHub]

  • Scraped & processed the Nutanix public documentation to build a knowledge base in JSON format
  • Used keyword extraction & an LSTM neural network to predict responses & related article headings, returning links along with excerpts of the article content
  • The chatbot listens to customer queries in passive mode & answers only those it can handle with high confidence (it responds only if the prediction probability exceeds a threshold value)
  • Customers can also explicitly wake the chatbot with the mention "@missionx"
  • Uses threads to organize discussions
  • The chatbot will be rolled out to 14,000+ enterprise customers

Technologies Used: Python, Flask, Google Cloud Platform, Neural Networks, NumPy, BeautifulSoup, NLTK, JSON, Git, GitHub

Jun 2020 - Aug 2020

Subject Matter Expert

E-Learning Companies - Pearson Education, S Chand, Meritnation - India

  • Designed Profit & Loss statements to support the commissioning of profitable educational products
  • Performed weekly per-user analysis of the Ask and Answer forum (queries from over 6.5 million users) & ensured answers were submitted within 24 hours, leading to an almost 10% increase in user registrations
  • Researched international curriculums (leading websites, publishers and books) to gain insight into the business model
  • Gathered stakeholders' requirements & worked with cross-functional teams to incorporate them
  • Conducted product presentations at the school and national sales levels
Aug 2011 - Aug 2018

Education

Northeastern University

Master of Science - Data Science, GPA: 3.95/4.00
Coursework:
  • Deep Learning and Neural Networks
  • Natural Language Processing
  • Data Mining in Engineering
  • Data Management and Database Design
  • Probability and Statistics for Engineering
  • Computation and Visualization for Analytics
  • Special Topics in Artificial Intelligence
Research Work: Automated an e-learning client’s FAANG interview preparation system, covering business understanding, data cleaning, building rule-based and machine learning models & evaluating performance (under the guidance of Professor Raman Chandrasekar and Professor Ian Gorton)
Sep 2019 - Dec 2021

Hindu College, University of Delhi

Master of Science - Mathematics, GPA: 3.75/4.00
Relevant Coursework:
Probability Theory, Statistics, Linear Algebra, Complex Analysis, Optimization Techniques & Control Theory, Mathematical Programming, Operations Research
Jul 2009 - Apr 2011

Ramjas College, University of Delhi

Bachelor of Science - Mathematics, GPA: 3.67/4.00
Relevant Coursework:
Probability & Mathematical Statistics, Algebra, Analysis, Linear Programming & Game Theory, Numerical Analysis & Computer Programming, Calculus
Jul 2006 - Apr 2009

Projects

CIFAR-100 Object Recognition

CIFAR-100 Object Recognition Using Convolutional Neural Networks [GitHub]

  • Built a CNN from scratch in TensorFlow to classify color images into one of 100 classes & achieved 59% accuracy using image augmentation, max pooling, zero padding, dropout & early stopping
  • Trained a transfer learning model based on the state-of-the-art EfficientNetB0 with ImageNet weights & achieved 82% accuracy using image augmentation, early stopping, learning-rate reduction on plateau & hyperparameter tuning
  • Trained the transfer learning model on a Google Cloud Platform machine with an NVIDIA GPU & 8 vCPUs
Technologies Used: Python, Jupyter Notebook, GCP, Git, GitHub, GPU

Text Summarization

CNN/Daily Mail Media Dataset Summarization Using Long Short-Term Memory

  • Designed an attention-based sequence-to-sequence LSTM model that generates abstractive summaries of text
Technologies Used: Python, Jupyter Notebook, Git, GitHub



Predict Member Churn

Leading US Wholesale Corporation, Customer Dataset

  • Processed 10 GB of industry data to predict member churn using Logistic Regression, k-NN, Naive Bayes, Decision Tree & Random Forest
  • Used cross-validation for model selection; the Random Forest model achieved the highest accuracy, 72.2%
Technologies Used: Python, Jupyter Notebook, Git, GitHub



COVID-19 Monitor

Developed a COVID-19 dashboard in R Shiny to monitor geospatial trends of COVID-19 across the globe [App]

  • The monitor displayed live data on total confirmed, death, active & recovered cases using animated & static visualizations
  • Data was fetched through an automated ETL pipeline connected to the Johns Hopkins COVID-19 data repository on GitHub
Technologies Used: R, R Shiny

Property Management System

Developed a database to store information on rental services and maintenance for residential properties, residents, and staff as a one-stop solution [GitHub]

  • Developed the database schema, tables, table-level check constraints, computed columns & encryption in SQL Server
  • Created views showing apartment portfolio availability, pet owner information, upcoming lease expirations & pending maintenance requests
  • Queried the database to present business reports in Tableau
Technologies Used: DBeaver, SQL, SQL Server, MySQL Workbench, Tableau

Oyewiki (Blogging Website)

Developed a DIY platform for authors to write & publish articles & earn per view (a YouTube for readers & writers)

  • Wrote a Python script to extract data from MySQL database dumps (SQL files) stored in AWS S3 for data analysis [GitHub]
  • Analyzed data on 10,000 users, 22,000 articles & millions of views to find the most-read articles, categories & users
  • Marketed the website via Quora, email & text messages to increase traffic
  • Tracked fake views generated on articles by users
Technologies Used: Python, Jupyter Notebook, MySQL, SQL, AWS

Conferences COVID-19

Analyzed a Kaggle dataset to understand how COVID-19 affected conferences worldwide, using graphs and stylized tables with custom backgrounds built with PyData tools [GitHub] [Dataset]

Technologies Used: Jupyter Notebook

Students Performance in Exams

Prepared a statistical report in R, using exploratory analysis and hypothesis testing, to identify the factors that affect students' academic performance and the extent of their effect [GitHub] [Dataset]

Skills

  • Language: Python, R, SQL, Java (Basic), HTML (Basic)
  • Library: Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn, TensorFlow, Keras, BeautifulSoup, NLTK, spaCy, Flask, R Shiny, ggplot2, dplyr, tidyverse
  • Database: SQL Server, PostgreSQL, MySQL
  • Software/Tool: Git, GitHub, GitLab, GCP, AWS (Basic), Tableau, Jupyter Notebook, Colab, Excel, Docker, PowerPoint, Jira, PyCharm
  • Algorithm: Regression, Classification, Clustering, Machine Learning, Deep Learning / Neural Networks
  • Statistical Methods: Hypothesis Testing, Confidence Interval

Interests

Apart from being a technology enthusiast, I enjoy going on long drives with my family 🚘. Traveling is my hobby; it brings me fun, adventure, excitement, and happiness 😇. I love to capture the beauty of nature through photography 📷. I love to meet new people 👋🏻 and explore the beauty of this world 🗻 🌈.
I love to cook 🍽 🌮, and anything related to arts and crafts excites me a lot 🎨.

Awards

  • Dean’s Seattle Leaders & Scholars Award, Northeastern University, 2019
  • Ace Award (Best employee award), Vikas Publishing, 2018
  • Star Performer, Vikas Publishing, 2017
  • Steve Jobs (Best creative idea), Pearson Education, 2016
  • Pinnacle (Best Math series), Pearson Education, 2016
  • Bright Spot, Pearson Education, 2015
  • Sitaram Jindal Foundation Scholarship, 2009-11
  • Scholarship for Class X achievements, CBSE and DAV School, 2005
  • Laureate Certificate and Certificate of Merit (100% marks in Math), 2005
  • Pratibha Sammaan (More than 85% marks in Class X), 2005