Data Science · Baseball Analytics · Computational Chemistry

Thomas Felton

Undergraduate data scientist at William & Mary working at the intersection of baseball analytics, machine learning, and computational science.

B.S. Data Science, AI Concentration · Class of 2027 · GPA 3.79 · Dean's List × 6

01

About

I'm a fourth-year student at the College of William & Mary studying data science with a concentration in artificial intelligence and a minor in chemistry. My technical work outside of class spans two worlds I genuinely care about: building predictive models for baseball and developing computational pipelines for physical chemistry research.

In Dr. Tyler Meldrum's Spin Lab, I've spent two summers developing end-to-end molecular dynamics pipelines to simulate single-sided NMR T₂ relaxation for binary mixtures, with a manuscript in preparation. On the baseball side, I serve as Student Manager for Offensive Analytics & Advanced Scouting for William & Mary Baseball. During the season I build scouting pipelines and produce reports on 10–20 opponents; in the offseason I build analytical models such as my Stuff+ project. I'm also pitching coach and president of W&M Club Baseball.

After graduation, my goal is an entry-level role as a data scientist. My dream is a position in an MLB front office, but I'm open to any opportunity. Outside of work, I enjoy fishing, hiking, exploring new places, and pitching whenever possible. I'm also a semi-professional blitzballer.

02

Experience

Laboratory Assistant

Meldrum Physical Chemistry Lab · William & Mary

  • Developed an end-to-end molecular dynamics pipeline using OpenMM and MDAnalysis to compute and validate T₂ relaxation times, orchestrating 1,000+ SLURM job submissions on W&M's HPC cluster via MPI and Bash.
  • Engineered a validation suite comparing simulated vs. experimental T₂ values and automated the generation of publication-quality figures of diffusion and T₂ profiles using Matplotlib, SciPy, and LaTeX.
  • Manuscript in preparation: Computational Modeling of T₂ Relaxation via Single-Sided NMR (target submission: August 2026).
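The first bullet above mentions orchestrating 1,000+ SLURM job submissions. One common way to manage a sweep of that size is a single SLURM job array instead of a thousand separate `sbatch` calls. The sketch below is illustrative only: the sweep dimensions, the `run_md.py` script, the resource requests, and the JSON manifest are hypothetical placeholders, not the lab's actual pipeline.

```python
import itertools
import json
from pathlib import Path

def build_sweep(mixtures, mole_fractions, replicas):
    """Enumerate every (mixture, mole fraction, replica) combination as one task."""
    return [
        {"mixture": m, "mole_fraction": x, "replica": r}
        for m, x, r in itertools.product(mixtures, mole_fractions, range(replicas))
    ]

def render_array_script(n_tasks, manifest_name="tasks.json", max_concurrent=50):
    """Return an sbatch script that runs all tasks as one SLURM job array.

    The '%max_concurrent' suffix caps how many array tasks run at once; each task
    reads its own parameters from the manifest via SLURM_ARRAY_TASK_ID.
    run_md.py and the resource requests are hypothetical placeholders.
    """
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH --job-name=t2-sweep",
        f"#SBATCH --array=0-{n_tasks - 1}%{max_concurrent}",
        "#SBATCH --ntasks=1",
        "#SBATCH --time=12:00:00",
        "#SBATCH --output=logs/%A_%a.out",  # %A = array job ID, %a = task index
        "",
        f"python run_md.py --manifest {manifest_name} --index $SLURM_ARRAY_TASK_ID",
        "",
    ])

def write_sweep(tasks, workdir, max_concurrent=50):
    """Write the task manifest, a log directory, and the sbatch script into workdir."""
    workdir = Path(workdir)
    (workdir / "logs").mkdir(parents=True, exist_ok=True)  # SLURM does not create missing log dirs
    (workdir / "tasks.json").write_text(json.dumps(tasks, indent=2))
    script = workdir / "sweep.sbatch"
    script.write_text(render_array_script(len(tasks), "tasks.json", max_concurrent))
    return script
```

Usage would look like `write_sweep(build_sweep(["water-ethanol"], [0.25, 0.5, 0.75], 4), "sweep")` followed by `sbatch sweep.sbatch` from inside the `sweep` directory (the mixture name is hypothetical). Keeping the parameters in a manifest means the scheduler sees one job, and a failed index can be resubmitted on its own.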

Student Manager — Offensive Analytics & Advanced Scouting

William & Mary Baseball

  • Built a semi-automated scouting pipeline in Python that scrapes and aggregates opposing-pitcher data from Synergy and TrackMan, eliminating manual data entry.
  • Produced integrated scouting reports for 10–20 opponents per season, using Python to combine Google Sheets analytics with video, and delivered them to coaching staff and players ahead of each series.
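The scouting bullets above describe aggregating opposing-pitcher data. A minimal sketch of one such aggregation step, turning pitch-level rows into a per-pitcher pitch-mix table, is below. It assumes TrackMan-style column names (`Pitcher`, `TaggedPitchType`, `RelSpeed`); the real export schema and the Synergy merge are not described here, so treat those names as assumptions.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

def pitch_mix(rows):
    """Per-pitcher pitch mix: count, usage %, and average velocity by pitch type.

    rows: dicts with the assumed keys "Pitcher", "TaggedPitchType", and
    "RelSpeed" (mph); a blank RelSpeed is treated as missing.
    """
    speeds = defaultdict(list)  # (pitcher, pitch type) -> velocities (None if missing)
    totals = defaultdict(int)   # pitcher -> total pitches seen
    for row in rows:
        key = (row["Pitcher"], row["TaggedPitchType"])
        totals[key[0]] += 1
        raw = (row.get("RelSpeed") or "").strip()
        speeds[key].append(float(raw) if raw else None)

    summary = {}
    for (pitcher, pitch), values in speeds.items():
        measured = [v for v in values if v is not None]
        summary.setdefault(pitcher, {})[pitch] = {
            "count": len(values),
            "usage_pct": round(100 * len(values) / totals[pitcher], 1),
            "avg_velo": round(mean(measured), 1) if measured else None,
        }
    return summary

# Tiny inline sample standing in for an exported CSV file.
sample = """Pitcher,TaggedPitchType,RelSpeed
Smith,Fastball,91.2
Smith,Fastball,92.0
Smith,Slider,83.5
Smith,ChangeUp,
"""
report = pitch_mix(csv.DictReader(io.StringIO(sample)))
print(report["Smith"]["Fastball"])
```

A table like this can then be written to a sheet per opponent; missing velocities are kept in the pitch count so usage percentages stay honest.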

Pitching Coach & President

William & Mary Club Baseball

  • Provide individualized mechanical and pitch-development coaching while directing in-game strategy based on batter–pitcher matchups and high-leverage situations.
  • Design structured practice programs driving team-wide development; recognized as NCBA 1st Team All-Region Pitcher, Mid-Atlantic North.

03

Projects

MLB Pitch Quality (Stuff+) Evaluation System

In progress

Modeling pure stuff quality — independent of location — across 700K+ Statcast pitches.

Independently replicated and extended the FanGraphs Stuff+ methodology across two end-to-end pipelines trained on 700K+ Statcast pitches (2015–2025): a cascaded ensemble (Random Forest → contact classifier → LightGBM regressor) and a PyTorch MLP (256→128→64, GELU, BatchNorm) that isolates pure stuff quality from 12 physical features with no location confounders.

Applied per-pitch-type normalization (average = 100 per type), GroupKFold cross-validation grouped by pitcher to prevent leakage, and a gradient-based stability regularizer penalizing year-over-year inconsistency. Validated via Spearman correlation and quartile persistence metrics. Deployed on W&M's HPC cluster (Sciclone) via SLURM with automated multi-year data ingestion, model serialization, and leaderboard generation.

  • Python
  • PyTorch
  • LightGBM
  • scikit-learn
  • pybaseball
  • Statcast
  • SLURM

Computational Modeling of T₂ Relaxation via Single-Sided NMR

Manuscript in prep

Simulating molecular T₂ relaxation for binary mixtures from first principles.

A research project in the Meldrum Spin Lab that predicts T₂ relaxation times of individual substances and molecular mixtures from physics-based first principles. It uses molecular dynamics to simulate molecular behavior in the presence of a single-sided NMR magnet, with manuscript submission targeted for August 2026.
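Single-sided NMR magnets have a strong, roughly constant static field gradient, so a measured CPMG decay mixes intrinsic relaxation with diffusion through that gradient. The standard Carr-Purcell result for a constant gradient is shown below as background only; whether the manuscript uses this exact form is an assumption. Here γ is the gyromagnetic ratio, G the gradient strength, t_E the echo time, and D the self-diffusion coefficient.

```latex
\frac{1}{T_{2,\mathrm{eff}}} = \frac{1}{T_{2}} + \frac{(\gamma G t_E)^{2} D}{12}
```

Because the diffusion term scales with t_E², diffusion weighting shrinks at short echo times, which is one reason a simulated diffusion coefficient is useful alongside a simulated T₂.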

Built an automated simulation pipeline combining OpenMM, MDAnalysis, SLURM, MPI, and tmux on W&M's HPC cluster, with a validation suite comparing simulated vs. experimental T₂ values across a panel of test mixtures.
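As a hedged illustration of two steps in a pipeline like this, the stdlib sketch below estimates a self-diffusion coefficient from a trajectory with the Einstein relation (MSD ≈ 6Dt in three dimensions) and computes relative errors of simulated vs. experimental T₂. The plain coordinate-list input and the function names are hypothetical; the project's own MDAnalysis-based code is not shown in the source, and MDAnalysis provides `MDAnalysis.analysis.msd.EinsteinMSD` for the MSD step.

```python
from statistics import mean

def msd_series(frames):
    """Mean squared displacement at each lag, averaged over molecules and time origins.

    frames: list of frames; each frame is a list of (x, y, z) tuples in the same
    molecule order. Coordinates must be unwrapped (no periodic-boundary jumps).
    """
    n = len(frames)
    result = []
    for lag in range(1, n):
        sq = [
            sum((b - a) ** 2 for a, b in zip(p0, p1))
            for t0 in range(n - lag)
            for p0, p1 in zip(frames[t0], frames[t0 + lag])
        ]
        result.append(mean(sq))
    return result

def diffusion_coefficient(msd, dt):
    """Einstein relation in 3D, MSD(t) ≈ 6 D t: least-squares slope divided by 6.

    msd[i] is the MSD at lag (i + 1) * dt; units follow the inputs (e.g. Å²/ps).
    """
    times = [(i + 1) * dt for i in range(len(msd))]
    t_bar, m_bar = mean(times), mean(msd)
    slope = sum((t - t_bar) * (m - m_bar) for t, m in zip(times, msd)) / sum(
        (t - t_bar) ** 2 for t in times
    )
    return slope / 6.0

def relative_errors(simulated, experimental):
    """Signed relative error of simulated vs. experimental T2 for each shared sample."""
    return {
        name: (simulated[name] - t2_exp) / t2_exp
        for name, t2_exp in experimental.items()
        if name in simulated
    }
```

The fit here uses every lag for simplicity; in practice only the linear (diffusive) part of the MSD curve should be fit, since short lags are ballistic and long lags are noisy. The nested loop is quadratic in the number of frames and is meant for clarity, not speed.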

  • OpenMM
  • MDAnalysis
  • SLURM
  • MPI
  • Bash
  • Python
  • SciPy

Automated Google Business Profile Pipeline

Shipped · Client work

Replacing daily manual social posting with a fully automated content pipeline for a Northern Virginia realtor.

Designed and deployed an end-to-end content automation system for Andrew Capuano, a realtor and certified appraiser serving the Gainesville and Bristow area. The pipeline replaces roughly 20 minutes of daily manual work with a fully hands-off system that publishes professional, localized Google Business posts every morning.

On a daily schedule, the workflow draws a randomized hook–topic pair from a curated content library, selects an image at random from a pool of 20 pre-sized assets, generates the post copy via a ChatGPT step keyed to the day's inputs, and publishes the formatted post directly to the Google Business Profile API through Zapier. The output is consistent, on-brand, and indistinguishable from manual posting.
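The daily selection step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the deployed Zapier workflow: the hooks, topics, and image file names are placeholders, and the ChatGPT and Google Business Profile publishing steps are deliberately left out rather than guessed at.

```python
import datetime as dt
import random

# Hypothetical placeholders: the real curated library and image files are not shown.
HOOKS = [
    "Thinking about selling this year?",
    "Wondering what your home is worth?",
    "Buying your first home?",
]
TOPICS = [
    "how an appraisal is determined",
    "preparing a home for listing",
    "reading local market trends",
]
IMAGES = [f"post_{i:02d}.jpg" for i in range(1, 21)]  # pool of 20 pre-sized assets

def plan_post(day, hooks=HOOKS, topics=TOPICS, images=IMAGES):
    """Pick the day's hook, topic, and image, and build the copy-generation prompt.

    The RNG is seeded with the date so a retried run on the same day reproduces
    the same choices (a design assumption, not described in the source).
    """
    rng = random.Random(day.isoformat())
    hook, topic, image = rng.choice(hooks), rng.choice(topics), rng.choice(images)
    prompt = (
        "Write a short Google Business post for a realtor and certified appraiser "
        "serving Gainesville and Bristow, Virginia. "
        f"Open with: '{hook}' Topic: {topic}. Keep it professional and local."
    )
    return {"date": day.isoformat(), "hook": hook, "topic": topic,
            "image": image, "prompt": prompt}

post = plan_post(dt.date.today())
print(post["image"], "|", post["hook"])
```

In the described system, the prompt would feed the ChatGPT step and Zapier would handle publishing. Seeding by date is one way to make a retried run idempotent, so a failed publish does not produce a second, different post that day.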

  • Zapier
  • ChatGPT API
  • Google Business Profile
  • Workflow Automation
  • Content Systems

04

Skills

Languages

Python · R · SQL · Bash · LaTeX

Machine Learning & Data Science

PyTorch · scikit-learn · LightGBM · CatBoost · statsmodels · SciPy · pandas · NumPy · Matplotlib · Seaborn

Statistics & Modeling

Regression · neural networks · random forests & gradient boosting · generative models (diffusion, VAE, transformers) · cross-validation with leakage controls

Baseball Analytics

Statcast (pitch- and event-level) · pybaseball · MLB Stats API · TrackMan · Synergy · Baseball Savant · pitch-quality modeling (Stuff+) · xwOBA & run-value frameworks · advanced scouting workflows

Simulation & HPC

OpenMM · CHARMM · MDAnalysis · SLURM · MPI · TCSH · tmux · W&M Sciclone HPC cluster

Data Engineering & Tools

JSON · CSV · NumPy binary (.npz) · molecular simulation formats (.dcd, .pdb, .mol2) · Git · Jupyter · Google Sheets · Excel · Cinema 4D

05

Coursework

Selected courses at William & Mary (completed, in progress, or planned), ordered by relevance.

DATA 301

Applied Machine Learning

MATH 351

Probability & Statistics for Scientists

MATH 352

Statistical Data Analysis

DATA 302

Databases

DATA 303

Data Visualization

DATA 201

Intro to Data Science

DATA 209

Applied Linear Algebra & Calculus

DATA 440

Supercomputing for Science

DATA 446

Generative AI · in progress

DATA 442

Neural Networks & Deep Learning · Fall 2026

DATA 448

Reinforcement Learning · Fall 2026

DATA 451

AI Systems · Fall 2026

MATH 212

Multivariable Calculus

CSCI 141

Computational Problem Solving

06

Contact

Open to internships, research collaborations, and conversations about baseball analytics or computational science. The fastest way to reach me is email.