Vincent Wang Sr. Data Scientist @ IBM | Northwestern M.S. in Analytics | AWS and Azure Certified Practitioner

About Me

Vincent is a senior data scientist in IBM Services' advanced analytics practice for the North America distribution industry, helping IBM's clients embrace AI and Big Data capabilities. He graduated from the M.S. in Analytics program at Northwestern University, with an emphasis on Predictive Analytics, Machine Learning, Deep Learning, Data Visualization, and Databases & Information Retrieval, among other areas. He is passionate about combining the power of data science and wild creativity to solve real-world business challenges.

Journey so far

Airport Authority Hong Kong

Data Science Intern - Mobile Application Analytics 2016

Audi

Analytics Intern - Audi Innovation Research (AIR) 2017

BP North America

Data Science Practicum Consultant 2017

KPMG US

Data Science Intern, Cognitive Lab - Lighthouse 2018

IBM Global Business Services

Sr. Data Scientist, Advanced Analytics - Distribution Industry 2019-Present

Programming

Python, R, SQL, Java, Hadoop, Spark, Hive, C++, JavaScript, HTML, D3.js, Docker

Coursework

Predictive Analytics I-II

Optimization and Heuristics

Data Mining

Deep Learning

Analytics for Big Data

Data Visualization

Data Warehousing

Text Analytics

Tools

AWS, Azure, Tableau, MySQL, Postgres, Hadoop, Spark, Hive, Apache, Linux

Certifications

AWS Certified Developer - Associate

Azure Certified Specialist

Associate Certified Analytics Professional (INFORMS)

Lean Six Sigma - Green Belt

Publication

A Team Based Player Versus Player Recommender Systems Framework For Player Improvement

in: NSW, Australia, ACSW, 2019

Education

BEng in Industrial Engineering, The Hong Kong Polytechnic University

3.78 GPA, 2013-2017

International Student Exchange Program, University of Pittsburgh

3.90 GPA, 2015-2016

Master of Science in Analytics, Northwestern University

3.86 GPA, 2017-2018

Featured Projects

Multinational Investment Bank

  • NLP & Document Digitization Pipeline

Launched end-to-end document digitization pipelines that automate data extraction from PDF documents into digital insights using OCR, text classification and annotation, and Kafka, reducing the cost of routine data collection tasks by 90%.

Global Consumer Goods Company

  • Customer Marketing Analytics

Launched a proof-of-concept insight platform for men's grooming brands, winning a $500K+ contract for implementation.

Demonstrated a 20% increase in product revenue by applying Gaussian mixture models (GMM) to segment customers and classification models to identify loyal customers from their psychometric, demographic, review-sentiment, and geographic data.

State Department of Labor

  • Unemployment Analytics

Built a data pipeline to automate the ETL process for unemployment claims data from Salesforce using Python, SOQL, and Apex.

Curated unemployment data alongside COVID-19 and economic data to extract insights and operationalize them for business processes.

Field Technician Virtual Agent

  • Chatbot Development and Web Scraping

Worked with a Fortune 10 telecom to deliver virtual agents to production by integrating Watson Assistant, a Service Orchestration Engine (Node.js), Cloudant DB, and client services (API and RPA), improving field technician service efficiency and accuracy.

Developed web crawlers in Python (Scrapy and Splash) to automate data collection processes and enhance Watson capabilities.

Azure Digital Big Data Foundation

  • Cloud Data Pipeline Development

Worked with a Fortune 5 oil and gas company to develop, orchestrate, and implement an Azure-based data ingestion platform that harmonized data from multiple sources and formats using Hadoop Data Platform in Azure Cloud (Hortonworks, Event Hub, Docker, Kubernetes, Logic App).

Developed a Neo4j graph database to optimize storage of multiple data sources. Applied NLP and AI capabilities to discover data relationships in the legacy data warehouse.

Autonomous Driving - Object Detection and Segmentation

  • Deep Learning

We used a Mask R-CNN model to detect and classify road objects in images captured by front-facing car cameras. We also developed our own script to evaluate the model based on IoU and mAP metrics. We focused on 6 classes of objects: car, motorcycle, bicycle, pedestrian, truck, and bus.
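
For readers unfamiliar with the IoU (intersection over union) metric mentioned above, here is a minimal sketch for axis-aligned bounding boxes. It is an illustration only, not the project's evaluation script: the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max); this layout
    is a choice made for the example, not taken from the project.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and mAP then averages precision over classes at one or more such thresholds.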

FIFA18 Soccer Player Transfer Value Prediction

  • Cloud Application Development

Developed a web app with the Flask framework to predict FIFA18 player transfer values and deployed it to an AWS Elastic Beanstalk environment.

Destiny II Gaming Analytics

  • Gaming Recommender System

Developed a hybrid content- and knowledge-based team-profile recommendation framework for the player-versus-player (PvP) online multiplayer first-person shooter Destiny II, with data provided by the game’s developer, Bungie.

Shoprunner Customer Survival Analysis

  • Repurchasing Analysis

Conducted discrete-time survival analysis with logistic regression to predict customer repurchase likelihood for the top 15 retailers from Shoprunner.
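
Discrete-time survival analysis is usually set up by expanding each customer into one row per period at risk, so that a logistic regression on the binary outcome models the per-period hazard. The sketch below shows that expansion and the empirical hazard per period. The field names (`customer_id`, `event_period`) and function names are hypothetical, chosen for this example rather than taken from the project.

```python
def to_person_period(customers, max_period):
    """Expand customers into person-period rows.

    `event_period` is the 1-based period of repurchase, or None if the
    customer had not repurchased by `max_period` (right-censored).
    """
    rows = []
    for cust in customers:
        event = cust["event_period"]
        last = max_period if event is None else min(event, max_period)
        for t in range(1, last + 1):
            rows.append({
                "customer_id": cust["customer_id"],
                "period": t,
                # 1 only in the period where the repurchase happened.
                "repurchased": int(t == event),
            })
    return rows


def hazard_by_period(rows):
    """Empirical hazard: repurchases in a period / customers still at risk."""
    at_risk, events = {}, {}
    for row in rows:
        t = row["period"]
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + row["repurchased"]
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


customers = [
    {"customer_id": "a", "event_period": 2},
    {"customer_id": "b", "event_period": None},
]
rows = to_person_period(customers, max_period=3)
hazards = hazard_by_period(rows)
```

Fitting a logistic regression to `repurchased` with period indicators and customer covariates as features would then give covariate-adjusted hazards; that modeling step is omitted here.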

Customer Segmentation

  • Clustering and CLV Modeling

Applied clustering methods to segment customers based on purchasing patterns identified in transaction data, along with geographical, competitor, and other external data sources; created an end-user Tableau report for the marketing team.

Chicago Botanic Garden - Visualization and Network Analysis

  • D3 and Tableau

Performed network analysis and visualization for the Chicago Botanic Garden Plants of Concern (POC) program, which has monitored 283 endangered, threatened, and rare species in 1,315 populations at over 300 sites throughout the Chicago Wilderness region since its founding in 2000.

Yelp Data Challenge

  • Text Analytics

Converted raw Yelp review data (3.46 GB) to a vector space for Natural Language Processing in Python (pandas, scikit-learn, TF-IDF); developed classification models (Logistic Regression, Naive Bayes, and Random Forest) based on restaurants’ ratings; clustered users and identified common user preferences by inspecting the cluster centroids.
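
To illustrate what converting reviews to a TF-IDF vector space involves, here is a small standard-library sketch. It uses the plain textbook weighting (term frequency times log of inverse document frequency) with whitespace tokenization, both assumptions for this example. The project itself used scikit-learn, whose `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers would differ from these.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) using unsmoothed TF-IDF.

    tf = term count / document length, idf = log(N / document frequency).
    Tokenization is a naive lowercase whitespace split (illustrative only).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [(counts[term] / total) * idf[term] if total else 0.0 for term in vocab]
        )
    return vocab, vectors


# Made-up example reviews, not taken from the Yelp data.
vocab, vectors = tfidf_vectors(["great food great service", "terrible food"])
```

Terms that appear in every document (here, "food") get a weight of zero, which is why TF-IDF emphasizes the words that distinguish one review from another.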

U.S. Labor Market Intelligence

  • Spatial Visualizations and Correlations

Performed EDA on the Greenwich.hr database to identify geospatial labor data for U.S. healthcare and IT labor markets. Created four interactive dashboards in Tableau to visualize job opportunities, salaries, regional affordability, and capital income.

Corporación Favorita Grocery Sales Prediction

  • Predictive Analytics

We analyzed grocery sales data from Corporación Favorita, a large Ecuador-based grocery retailer, and aimed to forecast products’ unit sales. After fitting 11 different supervised learning models, we found that the random forest model demonstrated the best predictive power for unit sales.

Zillow House Price Prediction

  • Kaggle Competition

Performed feature engineering and visualization (Shiny) in R on 2.9M raw housing records with 59 features. Optimized Zillow’s predictive models by fitting linear regression and tree-based models (Random Forest, XGBoost).
