Peter Xenopoulos

Solving problems with data. Interested in machine learning for interpretability, visualization and sports.

I'm Peter Xenopoulos

Computer Science PhD student at NYU. Previously, mathematics and economics student at Pomona College. Experience with a professional sports team, sports betting, a Silicon Valley venture fund and a world-class research laboratory. Currently interested in machine learning, particularly for applications in interpretability, visualization and sports.

I'm currently looking for internships and consulting work in sports, tech and finance, with a particular focus on developing and deploying machine learning models.

My History

Experience with a professional sports team, a Silicon Valley venture fund and the world's largest supercomputer

Education
2018 - Present

I'm currently a second-year PhD student at NYU, working with Professor Claudio Silva as part of the VIDA group. My main research interests include interpretability of machine learning models, visualization and sports analytics.

2014 - 2018

Before NYU, I was a double major in mathematics and economics at Pomona College, where I graduated in May 2018. I wrote my mathematics thesis on an original feature selection algorithm and my economics senior exercise on the relationship between drug overdose death rates and economic conditions. My advisor was Professor Pierangelo De Pace. These were some of the courses I took:

  • Parallel & High-Performance Computing
  • Big Data Platforms
  • Computational Statistics
  • Statistical Theory
  • Probability
  • Econometrics
  • Time Series Econometrics (Graduate)

I was also part of the student investment club, where I led the healthcare and technology group as a sophomore; Pomona Ventures, where I helped student-run startups expand and pitch to investors, and in particular advised Social Cipher, which won the 2018 Sage Tank competition; and the Pomona Sports Analytics Club, which I founded my junior year to provide an outlet for sports statistics research at the Claremont Colleges.

Experience
Jun-Aug, 2018

At Big League Advance, I was a data scientist intern engineering and deploying complex pipelines to support sports betting operations. In particular, I worked on projection systems, data ingestion, data scraping and a full end-to-end modeling process for American football. To date, that American football model has been profitable for Big League Advance.

2017 - 2018

For two summers, I worked as a quantitative analyst associate in the baseball research & development group at the Philadelphia Phillies. I worked on a variety of projects, ranging from writing reports for front office staff and coaches to developing and integrating machine learning models into the Phillies' team operations. In particular, I took the initiative to lead more than five projects that pushed the boundaries of baseball analytics, using data sources such as text and audio.

Jan-May, 2017

For the spring semester of my junior year, I was selected to take part in Claremont McKenna College's Silicon Valley Program. As part of the program, I worked full time while taking a full course load on weekends. Specifically, I worked with CrunchFund, now known as Tuesday Capital, where I met with over 30 prospective startups alongside partners, principals and associates. I wrote over 20 reports that guided the investment of millions of dollars. Additional duties included meeting with founders, reviewing decks, conducting diligence and supporting portfolio company operations.

May-Aug, 2016

At Oak Ridge National Laboratory, I was part of the Advanced Data and Workflows Group, where, as a sophomore, I contributed to two research papers, one of them as first author:

J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, "On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences", 2016 IEEE International Conference on Big Data [December 2016]

P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, "Big Data Analytics on HPC Architectures: Performance and Cost", 3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH), part of 2016 IEEE International Conference on Big Data [December 2016]

May-Jul, 2015

At the Company Lab, a non-profit startup accelerator in Chattanooga, TN, I worked closely with participating startups to refine their strategy and investment pitch decks. The startups that I worked with in the Summer 2015 class ultimately raised over a million dollars in local and national funding.

Research Interests

My current research interests revolve around machine learning interpretability, data visualization and sports analytics

ML Interpretability

One of my focuses this past year has been machine learning interpretability. In particular, I am interested in global interpretability, and I find the work on concept activation vectors (CAVs) especially exciting. I implemented this work in Keras as part of an easy-to-use Python package called cav-keras. Going forward, I wish to extend the CAV methodology to audio and text classification, and to investigate the effect of imbalanced data on interpretability.

Data Visualization

After taking two graduate visualization courses and joining a visualization group, I have become deeply interested in the field. One of the highlights is my d3 project on visualizing the connections between computer science research communities over time. Visuals can be found in this Twitter thread. Going forward, I wish to build visualizations for machine learning interpretability applications as well as for sports data, such as soccer or esports.

Sports Analytics

I have been working with sports data for almost five years, covering baseball, American football, soccer and esports. In particular, my focus has shifted towards mining low-level data sources, such as tracking data, for sports like American football, soccer and Counter-Strike: Global Offensive (an esport). Recently, I've been developing win probability models for Counter-Strike, parsing massive amounts of soccer event data, engineering deep neural networks for baseball video classification and maintaining an open-source Python library for college football data.

Publications

Published

On-Demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences
J. Harney, S.H. Lim, S. Sukumar, D. Stansberry, P. Xenopoulos, 2016 IEEE International Conference on Big Data

Big Data Analytics on HPC Architectures: Performance and Cost
P. Xenopoulos, J. Daniel, M. Matheson, S. Sukumar, 2016 IEEE International Conference on Big Data

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance
P. Xenopoulos, 2017 IEEE International Conference on Big Data

Accepted

In Progress

Who Will Win? Win Probability Models for Counter-Strike: Global Offensive
N. Latshaw and P. Xenopoulos

What's He Throwing? Deep Neural Networks for Baseball Pitch Classification
P. Xenopoulos and M. Mandic

Data visualization and mining methods for: (1) soccer, using event data, (2) Counter-Strike: Global Offensive, developing an open-source and easy-to-use game parser, and (3) American football, using tracking data provided by the NFL

Developing TCAV-esque interpretability methods for text and audio classification tasks, and understanding the effect of class imbalance on interpretability methods

Data

This is a collection of data sets I have amassed and cleaned over the years

Blog

Here are some recent posts from my Medium

Machine learning is a lot like teenage sex

To play off of Dan Ariely, machine learning is a lot like teenage sex. Everybody talks about it. Only some really know how to do it. Everyone thinks everyone else is doing it. So, everyone claims they’re doing it. What does it mean to train an algorithm? What steps does a neural network actually take to be able to predict something? Machine learning often leaves us with more questions than answers...

Why is machine learning happening now?

We recently learned that machine learning was a lot like teenage sex. There’s no denying that everybody is talking about it and is claiming they do it. But, even the hottest topic in machine learning today, deep learning, is almost as old as some teenagers. The secret’s been out for a while. Why is machine learning happening now? Why are people doing it? It all comes down to two things: possibility and...

5 must have R programming tools

R, along with Python, is one of the most popular tools for conducting data science. Propelled by a historically strong open-source developer community (R is about 25 years old, older than some data scientists), R is now strongly sought after by employers eyeing data scientists. Although R by itself is extremely powerful, there exist a few other (crucial) tools any R user should become familiar with. Now, in no particular order, we have...

Let's start a conversation

Reach out for research or professional opportunities

I'm interested in research, internship or consulting opportunities. You can find my email on my resume. You can also find me on LinkedIn or Twitter for less formal requests.