Peter Xenopoulos

Solving problems with data. Interested in data science, computing, sports and health.

I'm Peter Xenopoulos

Mathematics and economics graduate from Pomona College. Currently a PhD student at New York University (NYU). Interested in data science and data-intensive computing with applications in sports, health and business.

I'm deeply interested in the design, development and evaluation of cost-effective computer architectures for data science workloads and in the development of innovative algorithms, techniques and software to make data science more impactful. I maintain applied interests in sports, public health and economics.

My History

Experience in a research laboratory, professional sports and venture capital

Education
2018 - Current

I am a PhD student at New York University, working with my advisor, Professor Claudio Silva, as part of the Visualization, Imaging and Data Analysis Center.

2014 - 2018

Enjoyed my time in Southern California as a double major in mathematics and economics at Pomona College. Graduated in May 2018. Wrote my mathematics thesis on an original decision tree induction algorithm and my economics senior exercise on the relationship between drug overdose death rates and economic conditions. My advisor was Professor Pierangelo De Pace. I took the following coursework:

  • Parallel & High-Performance Computing
  • Big Data Platforms
  • Computational Statistics
  • Statistical Theory
  • Probability
  • Econometrics
  • Time Series Econometrics (Graduate)

I was also part of the student investment club, where I led the healthcare and technology group; Pomona Ventures, where I helped student-run startups expand and pitch to investors, and in particular advised Social Cipher, which won the 2018 Sage Tank competition; and the Pomona Sports Analytics Club, which I founded my junior year to provide an outlet for sports statistics research at the Claremont Colleges.

Experience
2017 - 2018

Rather than pursue the banking or consulting internships that students typically seek in their junior year, I joined the Philadelphia Phillies baseball club as a quantitative analyst. On the baseball side, I wrote player analysis for the general manager, created player acquisition shortlists and assisted in the draft process. On the data science side, I expanded upon a variety of classification, regression and clustering algorithms for projects I can't describe here.

2017, Jan-May

Selected as part of the Silicon Valley Program (SVP) from Claremont McKenna College, I worked full-time as an analyst at CrunchFund, a seed-stage venture capital firm in San Francisco, while also taking three courses. I analyzed startup investment opportunities and conducted follow-on investment analysis. The work included talking to the companies, meeting with founders, reviewing decks and conducting diligence.

2016, Summer

Was part of the Advanced Data and Workflows Group inside the Oak Ridge Leadership Computing Facility. Conducted research on computer architectures for big data analysis. The work led to two publications, one of them as first author.

J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, "On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences", 2016 IEEE International Conference on Big Data [December 2016]

P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, "Big Data Analytics on HPC Architectures: Performance and Cost", 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH), part of 2016 IEEE International Conference on Big Data [December 2016]

2015, Summer

Spent my first summer in college working at The Company Lab, a startup and business accelerator in Chattanooga, Tennessee. Utilizing Chattanooga's gigabit network and growing small business community, the accelerator was an intensive bootcamp to help companies expand, pitch to investors and ultimately stay in the Chattanooga ecosystem. Learned quite a bit about small (particularly technology-enabled) businesses and how a small city grows an entrepreneurial community.

Research Interests

"Research is formalized curiosity. It is poking and prying with a purpose." - Zora Neale Hurston

1. Data Intensive Computing

I believe that in the future, we need to develop a strong understanding not only of how we conduct data science but of where we conduct it as well. Computer architectures have massive implications for the run time of data science workloads, and there are multiple hardware layers that affect performance, such as the compute, memory and network layers. My research aim is to understand how these hardware layers affect the performance of data-intensive workloads.

Keywords: data intensive computing, high-performance computing, cloud computing, performance evaluation, benchmarking

2. Imbalanced Classification

Many machine learning examples fail to adequately mention class imbalance: the scenario in which the distribution of classes is severely non-uniform. This occurs in many real-world problems, such as detecting credit card fraud or rare diseases. Unfortunately, many algorithms we use are not immune to the effects of class imbalance. I'm primarily interested in resampling, ensemble and feature selection techniques and in providing open-source implementations of my work.

Keywords: imbalanced learning, resampling, ensemble methods, feature selection
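
To make the resampling idea concrete, here is a minimal sketch of random oversampling, one of the simplest resampling techniques: minority-class rows are duplicated at random until every class is as large as the biggest one. The function name random_oversample and the toy data are made up for this illustration and are not taken from any of my published implementations.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    # Start from the original data and append duplicates after it.
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: four majority-class rows and one minority-class row.
    X = [[0.1], [0.2], [0.3], [0.4], [9.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Oversampling only repeats existing rows, so it adds no new information and can encourage overfitting to the minority examples. That limitation is part of why ensemble and feature selection approaches are also worth studying.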

3. Large Scale Data Analytics

Much theory simply stays theory unless it is put into practice. Likewise, I think that the practical problems we encounter can heavily influence the directions we take to develop foundational theory. Thus, I maintain a variety of applied interests, especially in sports, health and business.

Keywords: sports analytics, sports visualization, public health, fraud detection, finance

Publications

Published

[Link], On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences
J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, 2016 IEEE International Conference on Big Data

[Link], Big Data Analytics on HPC Architectures: Performance and Cost
P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, 2016 IEEE International Conference on Big Data

[Link] [arXiv], Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance
P. Xenopoulos, 2017 IEEE International Conference on Big Data

Accepted

Coming soon

In Progress/Ideas

Time Series Analysis of Resource Failure in HPC Environments - I want to use the same data set as "A Large-Scale Study of Failures in High-Performance Computing Systems", but take a time series approach to analyzing the data, as suggested in that paper's conclusion.


Large Scale Analysis of Hard Drive Failures - In line with a growing interest in building reliable systems, I found gigabytes worth of hard drive data from Backblaze's data centers and plan to analyze the data set.


Performance Variability in the Cloud and Effects on Data Intensive Workloads - With this project I hope to quantify the variability in performance between the cloud and traditional HPC infrastructures and quantify the performance effects of resource contention.


Class Imbalance and Decision Tree Induction Techniques - Based on my mathematics thesis.


Drug Overdose Death Rates and Economic Conditions - Based on my economics senior exercise.


Expected Points in Soccer - With this project I hope to create a metric similar to Expected Goals (xG) and use the Skellam distribution to determine a team's expected points total.
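
As a rough illustration of the Expected Points idea, the sketch below assumes each team's goals follow independent Poisson distributions with means equal to their expected goals, so the goal difference follows a Skellam distribution. The function names, the truncation at 25 goals and the 3/1/0 points scheme are assumptions made for this example, not details of the project itself. The Skellam probabilities are computed by a truncated sum over Poisson terms rather than the closed form with Bessel functions, which keeps the example in the standard library.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def skellam_pmf(d, lam_home, lam_away, max_goals=25):
    """P(home goals - away goals = d), assuming independent Poisson
    goal counts. Sums over away goals a with home goals h = a + d,
    truncated at max_goals away goals."""
    total = 0.0
    for a in range(max_goals + 1):
        h = a + d
        if h < 0:
            continue
        total += poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
    return total


def outcome_probabilities(lam_home, lam_away, max_goals=25):
    """Return (P(win), P(draw), P(loss)) from the home team's view."""
    p_win = sum(skellam_pmf(d, lam_home, lam_away, max_goals)
                for d in range(1, max_goals + 1))
    p_draw = skellam_pmf(0, lam_home, lam_away, max_goals)
    p_loss = sum(skellam_pmf(-d, lam_home, lam_away, max_goals)
                 for d in range(1, max_goals + 1))
    return p_win, p_draw, p_loss


def expected_points(lam_home, lam_away, max_goals=25):
    """Expected points for the home team: 3 for a win, 1 for a draw."""
    p_win, p_draw, _ = outcome_probabilities(lam_home, lam_away, max_goals)
    return 3.0 * p_win + 1.0 * p_draw


if __name__ == "__main__":
    # Hypothetical match: home side expected to score 1.8, away side 1.1.
    print(round(expected_points(1.8, 1.1), 3))
```

Summing this quantity over every match in a season would give a team's expected points total under these assumptions.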


Data

This is a collection of data sets I have amassed and cleaned over the years.

Blog

Here are some recent posts from my Medium

Machine learning is a lot like teenage sex

To play off of Dan Ariely, machine learning is a lot like teenage sex. Everybody talks about it. Only some really know how to do it. Everyone thinks everyone else is doing it. So, everyone claims they’re doing it. What does it mean to train an algorithm? What steps does a neural network actually take to be able to predict something? Machine learning often leaves us with more questions than answers...

Why is machine learning happening now?

We recently learned that machine learning was a lot like teenage sex. There’s no denying that everybody is talking about it and is claiming they do it. But, even the hottest topic in machine learning today, deep learning, is almost as old as some teenagers. The secret’s been out for a while. Why is machine learning happening now? Why are people doing it? It all comes down to two things: possibility and...

5 must have R programming tools

R, along with Python, is one of the most popular tools for conducting data science. Propelled by a historically strong open-source developer community (R is about 25 years old, older than some data scientists), R is now strongly sought after by employers eyeing data scientists. Although R by itself is extremely powerful, there exist a few other (crucial) tools any R user should become familiar with. Now, in no particular order, we have...

Let's start a conversation

Reach out for research or professional opportunities

I'm interested in both research collaborations and professional opportunities in sports. You can find my email on my resume. You can also find me on Twitter or LinkedIn.