Peter Xenopoulos

Solving problems with data
Interested in data, visualization and sports

I'm Peter Xenopoulos

Computer Science PhD student at NYU researching machine learning and visualization techniques with applications to sports data. Previously a data scientist at a sports betting startup, a quantitative analyst with a professional baseball team, and an analyst at a Silicon Valley venture fund. Currently interested in mining esports data, exploring new methods of spatiotemporal visualization and applying collaborative filtering techniques to matchup prediction.

I'm currently looking for internships and consulting work in sports, tech and finance, with a particular focus on developing and deploying machine learning models.

My History

Experience with a sports betting startup, a professional sports team and a Silicon Valley venture fund

2018 - Present

I'm currently a second-year PhD student at NYU, working with Professor Claudio Silva as part of the VIDA group. My main research interests revolve around applying machine learning and visualization methods to sports data. In particular, I am researching new visualization and interaction methods, along with unique applications of machine learning to sports.

2014 - 2018

Before NYU, I was a double major in mathematics and economics at Pomona College, where I graduated in May 2018. I wrote my mathematics thesis on an original feature selection algorithm and my economics senior exercise on the relationship between drug overdose death rates and economic conditions. My advisor was Professor Pierangelo De Pace. These were some of the courses I took:

  • Parallel & High-Performance Computing
  • Big Data Platforms
  • Computational Statistics
  • Statistical Theory
  • Probability
  • Econometrics
  • Time Series Econometrics (Graduate)

I was also part of the student investment club, where I led the healthcare and technology group as a sophomore; Pomona Ventures, where I helped student-run startups expand and pitch to investors, and in particular advised Social Cipher, which won the 2018 Sage Tank competition; and the Pomona Sports Analytics Club, which I founded my junior year to provide an outlet for sports statistics research at the Claremont Colleges.

Jun-Aug, 2018

At Big League Advance, I was a data scientist intern, engineering and deploying complex pipelines to support sports betting operations. In particular, I worked on projection systems, data ingestion, data scraping and a full end-to-end modeling process for American football. To date, the American football model has been profitable for Big League Advance.

2017 - 2018

For two summers, I worked as a quantitative analyst associate in the baseball research & development group at the Philadelphia Phillies. I worked on a variety of projects, ranging from writing reports for front office staff and coaches to developing and integrating machine learning models into the Phillies' team operations. In particular, I took the initiative to lead more than five projects that pushed the boundaries of baseball analytics, using data sources such as text and audio.

Jan-May, 2017

For the spring semester of my junior year, I was selected to participate in Claremont McKenna College's Silicon Valley Program. As part of the program, I worked full time while taking a full course load on weekends. Specifically, I worked with CrunchFund, now known as Tuesday Capital, where I met with over 30 prospective startups alongside partners, principals and associates. I wrote over 20 reports that guided the investment of millions of dollars. Additional duties included meeting with founders, reviewing decks, conducting diligence and supporting portfolio company operations.

May-Aug, 2016

At Oak Ridge National Laboratory, I was part of the Advanced Data and Workflows Group, where, as a sophomore, I produced two research papers, one of which I first-authored:

J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, "On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences", 2016 IEEE International Conference on Big Data [December 2016]

P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, "Big Data Analytics on HPC Architectures: Performance and Cost", 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH), part of 2016 IEEE International Conference on Big Data [December 2016]

May-Jul, 2015

At the Company Lab, a non-profit startup accelerator in Chattanooga, TN, I worked closely with participating startups to refine their strategy and investment pitch decks. The startups that I worked with in the Summer 2015 class ultimately raised over a million dollars in local and national funding.

Research Interests

My current research interests revolve around machine learning interpretability, data visualization and sports analytics

    Data Visualization

After taking two graduate visualization courses and joining a visualization group, I have become deeply interested in the field. One of the highlights is my d3 project on visualizing the connections between computer science research communities over time. Visuals can be found in this Twitter thread. I've also started work on ggViz, a system for visualizing esports player trajectories.

    Sports Analytics

I have been working with sports data for almost five years, covering baseball, American football, soccer and esports. My focus has shifted towards mining low-level data sources, such as tracking data, particularly for American football, soccer and esports. Recently, I've been working on end-to-end systems for valuing CSGO players using low-level spatiotemporal data. Out of this project, I created the csgo Python package to parse CSGO data. I've also created the soccer-parse Python package for parsing soccer event data.

    ML Interpretability

One of my focuses this past year has been machine learning interpretability. In particular, I am interested in global interpretability, and I find the work on concept activation vectors (CAVs) to be especially exciting. I implemented this work in Keras as part of an easy-to-use Python package called cav-keras. Going forward, I plan to explore extending the CAV methodology to audio and text classification, as well as investigate the effect of imbalanced data on interpretability.



On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences
J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, 2016 IEEE International Conference on Big Data

Big Data Analytics on HPC Architectures: Performance and Cost
P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, 2016 IEEE International Conference on Big Data

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance
P. Xenopoulos, 2017 IEEE International Conference on Big Data


Win Probability Model for Counter-Strike: Global Offensive
P. Xenopoulos, H. Doraiswamy and C. Silva

In Progress

What's He Throwing? Deep Neural Networks for Baseball Pitch Classification
P. Xenopoulos and M. Mandic

Playing Matchmaker: Collaborative Filtering for Sports Matchup Prediction
P. Xenopoulos and M. Mandic


This is a collection of data sets I have amassed and cleaned over the years


Here are some recent posts from my Medium

Machine learning is a lot like teenage sex

To play off of Dan Ariely, machine learning is a lot like teenage sex. Everybody talks about it. Only some really know how to do it. Everyone thinks everyone else is doing it. So, everyone claims they’re doing it. What does it mean to train an algorithm? What steps does a neural network actually take to be able to predict something? Machine learning often leaves us with more questions than answers...

Why is machine learning happening now?

We recently learned that machine learning was a lot like teenage sex. There’s no denying that everybody is talking about it and is claiming they do it. But, even the hottest topic in machine learning today, deep learning, is almost as old as some teenagers. The secret’s been out for a while. Why is machine learning happening now? Why are people doing it? It all comes down to two things: possibility and...

5 must have R programming tools

R, along with Python, is one of the most popular tools for conducting data science. Propelled by a historically strong open-source developer community (R is about 25 years old — older than some data scientists), R is now strongly sought after by employers eyeing data scientists. Although R by itself is extremely powerful, there exist a few other (crucial) tools any R user should become familiar with. Now, in no particular order, we have...

Let's start a conversation

Reach out for research or professional opportunities

I'm interested in research, internship or consulting opportunities. You can find my email on my resume. You can also find me on LinkedIn or Twitter for less formal requests.