Meet the data analytics team behind Northeastern baseball’s record-setting season
Author: Matty Wasserman
Date: 05.23.23
Ask Northeastern’s student baseball managers, and they can immediately rattle off which of their opponents are using advanced analytics and which are not — simply by looking at their starting lineups.
To the managers, utilizing data to maximize efficiency is a clear benefit. But while the sabermetrics revolution has transformed strategy and player evaluation in Major League Baseball over the past 20 years, the analytics movement is still relatively new in college baseball. The gap is particularly pronounced at the mid-major level at which Northeastern competes, where there are fewer cutting-edge resources.
But thanks to Khoury College students Justin Chen, Reece Calvin, and Tim Clay, Northeastern baseball is utilizing analytics to stay ahead of the competition — and reaping the rewards on the field. The Huskies have shattered their program record for wins this season, and while much of the success is due to their fresh hitting talent and elite starting pitching, the Khoury trio is also providing a competitive edge.
“A lot of teams are still behind the curve in college baseball, and there are a lot of challenges with getting analytics off the ground,” said Calvin, a second-year data science and economics major. “But as teams embrace it — and here at Northeastern, the staff has really been willing to listen to us — you’ll start to see it pay big dividends.”
Building a mini analytics department
The team’s six student managers spend their days feeding pitching machines, shagging balls during practice, scouting opponents, and tracking pitches with a radar gun during practices and games. But the three Khoury-based managers also spend time compiling pitching and hitting data using a new professional-grade system, Yakkertech, that the team installed at Friedman Diamond back in March. They search for trends in everything from pitch selection to hitters’ decision-making to bunt attempts, and share the findings with Head Coach Mike Glavine and his staff.
Developing the student-led operation has been a years-long undertaking, both to improve technical effectiveness and build trust with the coaching staff. When Chen joined as a freshman in 2020, student manager Jake Sauberman — now the Los Angeles Angels’ coordinator of baseball analytics — had begun using data and building smaller-scale models for the team, albeit with sparse resources.
As Sauberman got the operation off the ground, Chen built his technical skill through his classes and outside work. His first baseball-centric project came in his “Data Science Foundations” course, where he built a predictive model to grade batted-ball contact based on underlying metrics such as exit velocity and launch angle.
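For readers curious what grading contact from exit velocity and launch angle can look like, here is a minimal sketch. It is not Chen's model, which the article does not describe; it swaps in a simple binned average, where a batted ball is graded by the average total bases of past batted balls with similar exit velocity and launch angle. The function names, bin sizes, and sample data are all hypothetical.

```python
from collections import defaultdict

def bin_key(exit_velo_mph, launch_angle_deg, velo_step=5, angle_step=10):
    """Map a batted ball to a coarse (exit velocity, launch angle) bucket."""
    return (int(exit_velo_mph // velo_step), int(launch_angle_deg // angle_step))

def fit_contact_grades(batted_balls):
    """Average total bases per bucket from (exit_velo, launch_angle, total_bases) rows."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [total bases, count]
    for exit_velo, launch_angle, total_bases in batted_balls:
        cell = sums[bin_key(exit_velo, launch_angle)]
        cell[0] += total_bases
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def grade_contact(grades, exit_velo, launch_angle, default=None):
    """Look up the expected total bases for a new batted ball, if its bucket was seen."""
    return grades.get(bin_key(exit_velo, launch_angle), default)

# Made-up sample data: (exit velocity in mph, launch angle in degrees, total bases)
history = [
    (102, 27, 4), (101, 25, 4), (103, 22, 2),  # hard-hit fly balls and line drives
    (72, -12, 0), (70, -15, 0),                # weak ground balls
    (88, 14, 1), (86, 12, 0),                  # medium line drives
]

grades = fit_contact_grades(history)
print(grade_contact(grades, 100, 20))  # hard contact lands in the first group's bucket
print(grade_contact(grades, 60, 50))   # unseen bucket, so the default is returned
```

A real model would need far more data and would likely smooth across neighboring buckets or use a trained classifier, but the inputs and the idea of scoring contact quality independent of the actual result are the same.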
“My early stuff wasn’t all effective, but it established a base of knowledge, and I knew what I needed to learn in the coming semesters to build these tools,” Chen said. “By the spring of last year, once I took machine learning classes, I started getting much better at predictive modeling.”
In addition, Chen co-oped with the Baltimore Orioles’ analytics department in 2021, where he got a crash course in everything from data scraping and cleaning to model building to visualizations and presentation skills. The experience allowed him to bring back knowledge of how an MLB analytics department functioned and apply it to Northeastern’s operation.
Calvin joined the analytics team last fall without much experience in statistical modeling, but with a lifelong passion for baseball and an eagerness to learn. He peppered Chen with questions and observed Chen’s work while simultaneously improving his technical foundation through Khoury data classes.
“I learned how to separate data, find trends, and more technical stuff. It really unlocked everything else for me,” Calvin said. “I could analyze what people had already done at the major-league level and in publicly available projects, and learned how to apply it to the college level.”
Tim Clay chose to attend Northeastern largely to work as a student manager for a Division I team after reading about the opportunity online. He joined the analytics team after arriving on campus this past fall, has since leaned on Chen and Calvin for advice on everything from managing to class projects, and has eagerly contributed to the cause. For example, when given a simple data-entry task, he took the initiative to streamline the larger tagging system.
“I saw where the sheet could be improved, how we could make the flow of the chart a little bit better,” Clay said. “Over time, I just kept doing that and they wound up trusting me more and more.”
Putting the data to use
Data analytics skill is still only half the battle. Without buy-in from the coaching staff to actually value the managers’ insights, their work would be irrelevant.
“I’m not the guy who played baseball my whole life,” Calvin said. “It’s difficult to convince a coach to listen to you when they’ve been doing this longer than you’ve been alive. But I think you just have to prove your value and show that you can help the team win.”
The team’s commitment to analytics is evident in the recently purchased Yakkertech system. Before it, the managers’ sole data-gathering tool was a portable machine that could be set up only for practices, meaning they couldn’t access in-game data. Now, their state-of-the-art setup includes four permanent cameras and gives the managers access to detailed pitching data during practices and games.
While the system was a necessary step in Chen’s long-term vision, plenty of competing teams have installed similar setups, or could soon. The onus now falls on the managers to create an edge.
“Teams are now investing in it, and coaches are now using it as a selling point to recruit players,” Chen said. “Everyone gets a ton of data, but if you don’t have anyone actually going through and aggregating it, looking for trends and using it properly, it’s pretty useless.”
After compiling the Yakkertech data and tagging the pitches with attributes such as speed, location, pitch type, and spin rate, the trio feed the results into their trained models and tailor the observations to individual players.
“We can assess which guys are commanding their pitches the best, and which guys are getting value out of their pitches based on how they move compared to other pitches thrown in those locations,” Chen said. “So aggregating those two, you make sure that everyone is throwing the right pitches. Then if not, we’ll give suggestions, like, ‘Hey, we should have this guy throw a slider more instead of his fastball.’”
Likewise, the managers have worked to refine a “run value metric” tailored to college baseball, which averages historical scoring trends by isolating hitting results from situational context such as runners on base and the number of outs. But while run value and other advanced metrics can improve on-field strategy, the data-driven concepts can be complex and difficult to explain. Therefore, the managers often translate their findings into more common metrics such as on-base and slugging percentage, as well as highlight specific in-game scenarios.
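To make the run value idea concrete, here is a minimal sketch of one common public approach, which may differ from the managers' own metric. It first averages how many runs historically score from each base-out state through the end of the inning, then credits each play with the change in that expectation plus any runs scored on the play. The field names, the state encoding, and the single made-up inning are assumptions for illustration.

```python
from collections import defaultdict

def run_expectancy(plays):
    """Average runs scored from each base-out state to the end of the inning."""
    by_state = defaultdict(list)
    for play in plays:
        by_state[play["before"]].append(play["runs_rest_of_inning"])
    return {state: sum(runs) / len(runs) for state, runs in by_state.items()}

def run_values(plays, re_table):
    """Average run value per event: RE(after) - RE(before) + runs scored on the play."""
    by_event = defaultdict(list)
    for play in plays:
        # An inning-ending play has no "after" state, so its expectancy is zero.
        after = 0.0 if play["after"] is None else re_table.get(play["after"], 0.0)
        value = after - re_table[play["before"]] + play["runs_on_play"]
        by_event[play["event"]].append(value)
    return {event: sum(vals) / len(vals) for event, vals in by_event.items()}

# Made-up inning. States read "bases,outs": "1--,0" is a runner on first with no outs.
plays = [
    {"before": "---,0", "event": "single",   "after": "1--,0", "runs_on_play": 0, "runs_rest_of_inning": 2},
    {"before": "1--,0", "event": "home_run", "after": "---,0", "runs_on_play": 2, "runs_rest_of_inning": 2},
    {"before": "---,0", "event": "out",      "after": "---,1", "runs_on_play": 0, "runs_rest_of_inning": 0},
    {"before": "---,1", "event": "out",      "after": "---,2", "runs_on_play": 0, "runs_rest_of_inning": 0},
    {"before": "---,2", "event": "out",      "after": None,    "runs_on_play": 0, "runs_rest_of_inning": 0},
]

re_table = run_expectancy(plays)
values = run_values(plays, re_table)
print(re_table)
print(values)
```

With only five plays the numbers are meaningless; the point is the shape of the calculation. Fed a full season of college play-by-play data, the same two steps yield run values that reflect the college run-scoring environment.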
“When you explain a project in a data science class, you’re talking to people who already know exactly what you’re talking about,” Calvin said. “So trying to switch over to someone who isn’t taking the same classes as me and is very baseball focused, it’s harder to get your points across. You have to be creative.”
All those lessons are translating to success for Northeastern, which won 41 of 53 regular-season games this year — by far the most in the program’s century-old history. But the managers are breaking new ground for themselves as well. Chen will intern this summer as a quantitative analysis assistant for the Philadelphia Phillies, and he’s hoping to land a full-time role in an MLB analytics department after graduating in December. Additionally, Calvin will start a co-op in July for the Hiroshima Toyo Carp, a team in Japan’s top league.
“They don’t really have an analytics department right now, so they are starting one from scratch,” Calvin said. “Because of all the stuff we’re building here at Northeastern, I can just plug in the new data and hit the ground running.”
Clay will intern this summer at Sports Reference, the premier website and database for current and archived statistics in every major professional sport. He hopes to translate his growth into a larger role with the Huskies next season, and to lead the next generation of Northeastern’s baseball analytics department to continued success.
“Having this year under my belt to learn from Justin and Reece has been huge,” Clay said. “The foundation is all there, and we’re definitely building towards big things.”