Sportico is proud to partner with The Harvard Sports Analysis Collective, a student-run organization dedicated to the quantitative analysis of sports strategy and management, to bring our readers the excellent work coming from some of the brightest young minds in the country. To read the full analysis, go here.
Consider this Cooperstown curiosity: Among the 20 Hall of Famers inducted as members of the Red Sox or the Reds, only two have been pitchers (Pedro Martinez, 2015; Eppa Rixey, 1963). Meanwhile, since their 1958 move to Los Angeles, the only Dodgers players enshrined have been pitchers. Just a cursory look at a franchise’s very best suggests that even through changes in ownership, management and player personnel, some teams simply gravitate to an overarching identity over time.
Using FanGraphs data from 1969–2019, I applied k-means clustering, an unsupervised machine learning algorithm that groups similar observations, to broadly distinguish team types (offense-focused, pitching-reliant, neutral), then analyzed them by franchise to see if trends emerged over the course of an organization’s history.
Offensive vs. Pitching Clustering
I began with a very broad clustering of Hitting vs. Pitching for every team with a record above .500. As shown in the plot above, most franchises fall somewhere within 20% of the mean; that is, they have fielded winning teams built on pitching in some years and on hitting in others. But there are some fascinating outliers that appear to exhibit tendencies, perhaps inspired by organizational philosophies, to build successful teams in a particular way. Let’s take a look at three examples scattered across the spectrum.
Beginning with the Cubs, a franchise squarely categorized as neutral, their focus over time seems erratic, bouncing between short stretches of good hitting or good pitching in their quest to break the Billy Goat Curse. The Mets, on the other hand, have only had a handful of winning seasons behind a hitting-oriented team. From the 1969 Miracle Mets, led by Tom Seaver, to today’s team, anchored by dual aces Jacob deGrom and Noah Syndergaard, the Amazin’s have often been built around strong pitching. Finally, when you think about the Reds, the Big Red Machine of the 1970s should automatically come to mind. Looking at their history, the Reds have mostly seen success behind offensively dominant teams, particularly in the ’70s and the second half of the ’90s. The notable dearth of plot points in recent years is because Cincinnati has only had a few winning seasons since 2000. But with the huge investments they made in signing Nick Castellanos and Mike Moustakas (4 years, $64 million each) this offseason, the Reds appear to be returning to the high-powered offenses that once defined the franchise.
While the broad distinctions are illustrative, I wanted to drill down on specific identifying philosophies, so I broke down each category even further.
Considering three major measures of hitting—ability to get on base (OBP), ability to put the ball in play (BABIP) and hitting for power (HR/ISO)—my algorithm differentiated several types of offense. I plotted a 3-dimensional chart (above), which displays the six fairly distinguishable offensive clusters.
Below are highlighted team and player archetypes for each cluster defined by the algorithm.
|Cluster||Description||Team Example||Player Example|
|1||Contact Dependent||1982 Cardinals||Ichiro Suzuki|
|2||Pure Hitting||2018 Red Sox||Joey Votto|
|3||Below Average||2012 Astros||Jackie Bradley Jr.|
|4||Power Dependent||2010 Blue Jays||Khris Davis|
|5||Three True Outcomes||2018 Yankees||Joey Gallo|
|6||Average All Around||1981 Dodgers||Adam Eaton|
Some clusters are intuitive, like the Contact Dependent cluster, which includes the Whiteyball Cardinals of the 1980s, or the Power Dependent cluster, which features the 2010 Blue Jays, who led the league in home runs but were below average in most other offensive categories. A few of the others were harder to define and differentiate. The Pure Hitting cluster, which is marked by high values across the board, could best be personified by vintage Joey Votto. On a team level, that means a lineup that gets on base with its contact skills and plate discipline, with some pop mixed in. The Three True Outcomes cluster also had high values in power and OBP, but its teams tended to be average in BABIP. Also, though I did not include strikeouts as a variable, this cluster generally saw high strikeout rates, hence the label Three True Outcomes (HR, W, SO).
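For readers curious how k-means separates offenses like these, here is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. It is not the code behind this analysis: the six team-season rows are invented for illustration, the helper names `standardize` and `kmeans` are hypothetical, rescaling each stat before clustering is an assumption, and the toy example uses two clusters rather than six to stay readable.

```python
import random


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids


# Invented (OBP, BABIP, ISO) rows for illustration only; not FanGraphs data.
seasons = [
    (0.330, 0.315, 0.110),  # contact-leaning offenses
    (0.328, 0.312, 0.115),
    (0.332, 0.318, 0.105),
    (0.325, 0.285, 0.190),  # power-leaning offenses
    (0.322, 0.288, 0.185),
    (0.327, 0.283, 0.195),
]
labels, centroids = kmeans(standardize(seasons), k=2)
print(labels)
```

The cluster numbers that come out are arbitrary; descriptions such as Contact Dependent or Power Dependent have to be assigned afterward by inspecting each cluster's centroid.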
To simplify the data, I organized the six clusters onto a spectrum, ranging from contact dependent to power dependent.
|Cluster||Cluster Description||Power Index|
|5||Three True Outcomes||0.75|
|6||Average All Around||0.50|
(Note: This index has nothing to do with being successful as an offense, and the higher values do not signify better teams. The index only takes into account how important power was to the offense.)
In the graphic above, notice the trend lines and how they relate to 0.5, which represents neutral with regard to power vs. contact. Many organizations fluctuate or remain somewhere in the middle, signifying no distinct trend when it comes to a reliance on power; however, others stay well above or below the line for the majority of their histories. Organizations such as the Orioles, Athletics and Blue Jays seem to have power-driven offenses, while the Royals, Cardinals and Pirates are more prone to success through contact and getting on base.
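One way to picture how a franchise ends up above or below the 0.5 line is to map each season's offensive cluster to its power index and average the results. The sketch below assumes a simple mean, which may not match how the trend lines were actually drawn. It uses only the two index values listed in the power index table (clusters 5 and 6), and the function name and season sequence are hypothetical.

```python
from statistics import mean

# Only the two index values shown in the power index table; values for the
# other clusters are not listed there, so seasons in those clusters are skipped.
POWER_INDEX = {5: 0.75, 6: 0.50}


def franchise_power_index(season_clusters):
    """Average the power index over seasons whose cluster has a known value."""
    values = [POWER_INDEX[c] for c in season_clusters if c in POWER_INDEX]
    return mean(values) if values else None


# Hypothetical run of offensive cluster labels for one franchise.
example = [5, 6, 5, 5]
score = franchise_power_index(example)
print(score)
```

A score above 0.5 would read as a lean toward power and a score below 0.5 as a lean toward contact, with no implication about how good the offense was.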
Another fascinating result, given where they play, is that the Colorado Rockies have consistently sat at or below 0.5 on the power index for the majority of their existence. They have, perhaps surprisingly, never led the majors in home runs, and they have only finished in the Top 5 a handful of times. Instead of building lineups around pure power hitters, the Rockies tend to develop and acquire players who make consistent, hard contact, believing the power numbers will come as a built-in byproduct of playing in high-altitude Denver. Rockies legends Larry Walker and Todd Helton are excellent examples.
Distinguishing philosophies around pitching was more of a challenge. An entire staff will never be uniform, usually consisting of a healthy mix of pitching styles. Thus, at the risk of oversimplification, I differentiated teams by starting rotations vs. bullpens over their histories.
I found there is much more variability with pitching than with hitting. It was not uncommon to see the bullpen dominate one year, and then see the team carried by great starting pitching the next. That’s understandable given how challenging it is to fill the hole of losing an ace or a lights-out reliever to injury.
That said, there were still some teams that showed a penchant for one over the other. The Dodgers, Diamondbacks and Mets were more often starting-pitching oriented, while the Yankees, Mariners and Athletics seem to have built stronger bullpens over their respective histories. However, even in making these observations, I am a bit wary. The Yankees have had some all-time bullpens, including Hall of Famers Goose Gossage and Mariano Rivera, but they haven’t exactly lacked in the starters department either: Andy Pettitte, Roger Clemens, CC Sabathia and now Gerrit Cole.
Very interestingly, the Dodgers are the only franchise never to have had a season where its pitching staff classified as below average in both the rotation and bullpen by my clustering analysis. High-quality pitching seems to be in the Dodgers’ DNA. They boast a long list of some of the best starters in MLB history (Sandy Koufax, Don Drysdale, Don Sutton and Clayton Kershaw). They have also featured relievers who were unhittable in their heydays, like Eric Gagne and Kenley Jansen.
|Aggregate Franchise Focuses from 1969-2019|
|Franchise||Overall Focus||Overall %||Offensive Focus||Offensive %||Pitching Focus||Pitching %|
|Angels||Neutral (Pitching)||0.58||Neutral (Power)||0.55||Starters||0.46|
|Cubs||Neutral (Pitching)||0.52||Neutral (Power)||0.52||Starters||0.57|
|Diamondbacks||Neutral (Pitching)||0.54||Neutral (Power)||0.52||Starters||0.65|
|Indians||Neutral (Offense)||0.55||Neutral (Contact)||0.51||Starters||0.46|
|Rangers||Neutral (Offense)||0.57||Neutral (Power)||0.57||Starters||0.40|
|Red Sox||Offense||0.83||Neutral (Contact)||0.56||Starters||0.51|
|White Sox||Neutral (Offense)||0.52||Power||0.65||Starters||0.46|
|Neutral refers to falling within the 40-60% range where there is not enough evidence to distinguish between the categories.|
Aggregating the data, I created team profiles that broadly characterize organizational philosophies. Of course, there are ebbs and flows over the course of 50 years, in talent and strengths; however, it is interesting to find that the A’s and Orioles, for instance, can be characterized by power over contact in 80% of their seasons dating back to 1969.
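The 40-60% neutral band from the table note can be expressed as a small labeling rule. This is a sketch under stated assumptions, not the procedure used for the table: it assumes each season falls into exactly one of two categories, treats the band as inclusive at both ends, and reports the share of the more common category. The function name and season counts are hypothetical.

```python
def focus_label(count_a, count_b, name_a, name_b, low=0.40, high=0.60):
    """Label a franchise by the share of seasons in its more common category."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no classified seasons")
    share_a = count_a / total
    # Report whichever category covers at least half of the seasons.
    if share_a >= 0.5:
        lean, share = name_a, share_a
    else:
        lean, share = name_b, 1 - share_a
    if low <= share_a <= high:  # inside the neutral band (assumed inclusive)
        return f"Neutral ({lean})", round(share, 2)
    return lean, round(share, 2)


# Hypothetical season counts, for illustration only.
print(focus_label(10, 2, "Offense", "Pitching"))  # clear lean toward offense
print(focus_label(11, 9, "Offense", "Pitching"))  # inside the neutral band
```

A two-way split like this fits the overall and offensive columns; it would not reproduce a column where the leading category holds less than half of the seasons, which suggests more than two categories are in play there.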
Though the pitching analysis did not produce results as significant as the hitting analysis, it did show that most of the pitching-oriented franchises are in the National League; one would presume this has a lot to do with how the DH affects the American League.
Though this analysis highlighted certain tendencies, it can’t really explain why, say, the Orioles tend to be power reliant. Certainly, home ballparks must play a role, especially for teams like the Rockies at Coors Field or the Padres at Petco Park. But is it the park that leads teams to perform better in certain areas, or are decision-makers building their rosters with their home ballpark in mind?
The trends we see may be a function of coaching staffs at both the major and minor league levels, with strengths in developing certain types of players or skills. GMs and owners can understandably have a strong influence over the identity of a team, as can a franchise’s own history. It’s common for club legends to take on special advising roles in the front office or to find them at spring training, lending advice to current players.
With some organizations, history does seem to repeat itself. For some, that’s a recipe for success. For others, looking at their past might be a key to turning their fortunes around.