Mathematics and economics graduate of Pomona College. Currently a PhD student at New York University (NYU). Interested in data science and data-intensive computing, with applications in sports, health and business.
I'm deeply interested in the design, development and evaluation of cost-effective computer architectures for data science workloads and in the development of innovative algorithms, techniques and software to make data science more impactful. I maintain applied interests in sports, public health and economics.
Experience in a research laboratory, professional sports and venture capital
I am working with my advisor, Professor Claudio Silva, as part of the Visualization Imaging and Data Analysis Center.
Enjoyed my time in Southern California as a double major in mathematics and economics at Pomona College. Graduated in May 2018. Wrote my mathematics thesis on an original decision tree induction algorithm and my economics senior exercise on the relationship between drug overdose death rates and economic conditions. My advisor was Professor Pierangelo De Pace. I took the following coursework:
I was also part of the student investment club, where I led the healthcare and technology group; Pomona Ventures, where I helped student-run startups expand and pitch to investors, and in particular advised Social Cipher, which won the 2018 Sage Tank competition; and the Pomona Sports Analytics Club, which I founded in my junior year to provide an outlet for sports statistics research at the Claremont Colleges.
Rather than the banking or consulting internships that students typically pursue in their junior year, I joined the Philadelphia Phillies baseball club as a quantitative analyst. On the baseball side, I wrote player analyses for the general manager, created player acquisition shortlists and assisted in the draft process. On the data science side, I extended a variety of classification, regression and clustering algorithms to do things I can't quite talk about here.
Selected for the Silicon Valley Program (SVP) at Claremont McKenna College, I worked full-time as an analyst at CrunchFund, a seed-stage venture capital firm in San Francisco, while also taking three courses. I evaluated new startup investment opportunities and conducted follow-on investment analysis. The work included talking to companies, meeting with founders, reviewing pitch decks and conducting due diligence.
Was part of the Advanced Data and Workflows Group within the Oak Ridge Leadership Computing Facility. Conducted research on computer architectures for big data analysis. The work led to two publications, one as first author.
J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, "On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences", 2016 IEEE International Conference on Big Data [December 2016]
P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, "Big Data Analytics on HPC Architectures: Performance and Cost", 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH), part of 2016 IEEE International Conference on Big Data [December 2016]
Spent my first summer of college working at The Company Lab, a startup and business accelerator in Chattanooga, Tennessee. The accelerator used Chattanooga's gigabit network and growing small-business community to run an intensive bootcamp that helped companies expand, pitch to investors and ultimately stay in the Chattanooga ecosystem. Learned quite a bit about small (particularly technology-enabled) businesses and how a small city grows an entrepreneurial community.
"Research is formalized curiosity. It is poking and prying with a purpose." - Zora Neale Hurston
On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences
J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, 2016 IEEE International Conference on Big Data
Big Data Analytics on HPC Architectures: Performance and Cost
P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH), 2016 IEEE International Conference on Big Data
Time Series Analysis of Resource Failure in HPC Environments - I want to use the same data set as "A Large-Scale Study of Failures in High-Performance Computing Systems" but take a time series approach to analyzing the data, as suggested in that paper's conclusion.
Large-Scale Analysis of Hard Drive Failures - In line with a growing interest in building reliable systems, I found gigabytes' worth of hard drive data from Backblaze's data centers and plan to analyze it.
Performance Variability in the Cloud and Effects on Data-Intensive Workloads - With this project I hope to quantify how performance variability in the cloud compares with that of traditional HPC infrastructures and to measure the performance effects of resource contention.
Class Imbalance and Decision Tree Induction Techniques - Based on my mathematics thesis
Drug Overdose Death Rates and Economic Conditions - Based on my senior economics work
Expected Points in Soccer - With this project I hope to create a metric similar to Expected Goals (xG) and use the Skellam distribution to estimate a team's expected points total.
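The Skellam distribution describes the difference of two independent Poisson variables, so if each team's goal count is modeled as Poisson with its xG as the mean, the goal difference is Skellam-distributed and expected points follow as 3·P(win) + 1·P(draw). The Python sketch below assumes exactly that (independent Poisson goals, xG used as the Poisson mean, 3/1/0 points). The function names and the `max_goals` truncation are hypothetical choices for illustration, not the project's actual method.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def skellam_pmf(k: int, mu1: float, mu2: float, max_goals: int = 30) -> float:
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Computed by summing P(X1 = k + j) * P(X2 = j) over j, with both goal
    counts truncated at max_goals (a hypothetical cutoff; the tail beyond
    it is negligible for realistic xG values).
    """
    total = 0.0
    for j in range(max_goals + 1):
        i = k + j
        if 0 <= i <= max_goals:
            total += poisson_pmf(i, mu1) * poisson_pmf(j, mu2)
    return total


def expected_points(xg_for: float, xg_against: float, max_goals: int = 30) -> float:
    """Expected league points, assuming xG is the mean of each team's Poisson goal count.

    A win (positive goal difference) is worth 3 points, a draw 1, a loss 0.
    """
    p_draw = skellam_pmf(0, xg_for, xg_against, max_goals)
    p_win = sum(
        skellam_pmf(k, xg_for, xg_against, max_goals)
        for k in range(1, max_goals + 1)
    )
    return 3 * p_win + p_draw


if __name__ == "__main__":
    # Hypothetical match: home side creates 1.8 xG, concedes 0.9 xG.
    print(round(expected_points(1.8, 0.9), 3))
```

SciPy also ships `scipy.stats.skellam`, which could replace the hand-rolled summation; the direct sum is used here only to keep the sketch dependency-free. Note that the independence assumption ignores game-state effects (for example, a leading team sitting back), which is one of the things such a model would need to examine.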
This is a collection of data sets I have amassed and cleaned over the years.