Vincent is a senior data scientist in IBM's services and advanced analytics practice for the North America distribution industry, helping IBM's clients embrace AI and big data capabilities. He earned an M.S. in Analytics from Northwestern University, with emphasis on predictive analytics, machine learning, deep learning, data visualization, and databases and information retrieval. He is passionate about combining the power of data science and wild creativity to solve real-world business challenges.
Data Science Intern - Mobile Application Analytics 2016
Analytics Intern - Audi Innovation Research (AIR) 2017
Data Science Practicum Consultant 2017
Data Science Intern, Cognitive Lab - Lighthouse 2018
Sr. Data Scientist, Advanced Analytics - Distribution Industry 2019-Present
Python, R, SQL, Java, Hadoop, Spark, Hive, C++, Javascript, HTML, D3.js, Docker
Predictive Analytics I-II
Optimization and Heuristics
Data Mining
Deep Learning
Analytics for Big Data
Data Visualization
Data Warehousing
Text Analytics
AWS, Azure, Tableau, MySQL, Postgres, Hadoop, Spark, Hive, Apache, Linux
AWS Certified Developer - Associate
Azure Certified Specialist
Associate Certified Analytics Professional (INFORMS)
Lean Six Sigma - Green Belt
A Team Based Player Versus Player Recommender Systems Framework For Player Improvement
In: ACSW 2019, NSW, Australia
3.78 GPA, 2013-2017
3.90 GPA, 2015-2016
3.86 GPA, 2017-2018
Launched end-to-end document digitization pipelines that automate data extraction from PDF documents into digital insights using OCR, text classification and annotation, and Kafka, reducing the cost of routine data collection tasks by 90%.
Launched a proof-of-concept insight platform for men's grooming brands, winning a $500K+ contract for implementation.
Demonstrated a 20% increase in product revenue by applying GMM to segment customers into clusters and classification models to identify loyal customers from their psychometric, demographic, review sentiment, and geographic data.
Built a data pipeline to automate the ETL process for unemployment claims data from Salesforce using Python, SOQL, and Apex.
Combined unemployment data with COVID-19 and economic data to extract insights and operationalize them for business processes.
Worked with a Fortune 10 telecom to deliver virtual agents to production by integrating Watson Assistant, a Service Orchestration Engine (Node.js), Cloudant DB, and client services (API and RPA), improving field technician service efficiency and accuracy.
Developed web crawlers in Python (Scrapy and Splash) to automate data collection processes and enhance Watson capabilities.
Worked with a Fortune 5 oil and gas company to develop, orchestrate, and implement an Azure-based data ingestion platform that harmonized data from multiple sources in multiple formats using Hadoop Data Platform in Azure Cloud (Hortonworks, Event Hub, Docker, Kubernetes, Logic App).
Developed a Neo4j graph database that optimizes storage of multiple data sources. Applied NLP and AI capabilities to discover data relations in the legacy data warehouse.
We used a Mask R-CNN model to detect and classify road objects in images captured by front-facing car cameras. We also developed our own script to evaluate the model based on IoU and mAP metrics. We focused on six classes of objects: car, motorcycle, bicycle, pedestrian, truck, and bus.
Developed a web app using the Flask framework to predict FIFA 18 player transfer values. The web app was deployed in an AWS Elastic Beanstalk environment.
Developed a hybrid content- and knowledge-based team-profile recommendation framework for Destiny II, a player-versus-player (PvP) online multiplayer first-person shooter, with data provided by the game's developer, Bungie Studio.
Conducted discrete-time survival analysis with logistic regression to predict customer repurchase likelihood for the top 15 retailers from Shoprunner.
Applied clustering methods to segment customers based on purchasing patterns identified in transaction data along with geographical, competitor, and other external data sources; created an end-user Tableau report for the marketing team.
Performed network analysis and visualization on the Chicago Botanic Garden Plants of Concern (POC) program, which has monitored 283 endangered, threatened, and rare species at over 300 sites in 1,315 populations throughout the Chicago Wilderness region since its founding in 2000.
Converted raw Yelp review data (3.46 GB) to a vector space for natural language processing in Python (pandas, sklearn, tf-idf); developed classification models (Logistic Regression, Naive Bayes, and Random Forest) based on restaurants' ratings; clustered users and identified common user preferences by inspecting the cluster centroids.
Performed EDA on the Greenwich.hr database to identify geospatial labor data for the U.S. healthcare and IT labor markets. Created four interactive dashboards in Tableau to visualize job opportunities, salaries, regional affordability, and capital income.
We analyzed grocery sales data from Corporación Favorita, a large Ecuador-based grocery retailer, and aimed to forecast products' unit sales. After fitting 11 different supervised learning models, we found that the random forest model demonstrated the best predictive power for unit sales.
Performed feature engineering and visualization (Shiny) in R on 2.9M raw housing records with 59 features. Optimized Zillow's predictive models by fitting linear regression and tree-based models (Random Forest, XGBoost).