
Sportico is proud to partner with The Harvard Sports Analysis Collective, a student-run organization dedicated to the quantitative analysis of sports strategy and management, to bring our readers the excellent work coming from some of the brightest young minds in the country. To read the summary, go here.
—
Consider this Cooperstown curiosity: Among the 20 Hall of Famers inducted as members of the Red Sox or the Reds, only two have been pitchers (Pedro Martinez, 2015; Eppa Rixey, 1963). Meanwhile, since their 1958 move to Los Angeles, the only Dodgers players enshrined have been pitchers. Just a cursory look at a franchise’s very best suggests that even through changes in ownership, management and player personnel, some teams simply gravitate to an overarching identity over time.
Using FanGraphs data from 1969-2019, I applied k-means clustering, a machine learning algorithm, to broadly distinguish team types (offense-focused, pitching-reliant, neutral), then analyzed them by franchise to see if trends emerged over the course of an organization’s history.
Now, the run-scoring environment of 1969 and the early 1970s was drastically different from today’s. Taking this into account, I standardized the statistics on a per-year basis. For example, the Red Sox led the league in home runs with 197 in 1969, while the Twins led the league in 2019 with 307. In my cluster analysis, I look not at these nominal values but at their value relative to other teams during that particular season. Thus, when these two home run totals are standardized by year, both have high z-scores, and the two teams can be clustered similarly.
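The per-season standardization can be sketched as follows. This is a minimal illustration, not the original code: the function name `zscore_by_season`, the row layout and the use of the population standard deviation are assumptions, and only the two league-leading home run totals come from the article; the other rows are made-up placeholders.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_by_season(rows, stat):
    """Return {(season, team): z-score of `stat` within that season}."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[stat])

    z = {}
    for row in rows:
        values = by_season[row["season"]]
        mu, sigma = mean(values), pstdev(values)
        # A season in which every team is identical has no spread; score it 0.
        z[(row["season"], row["team"])] = (row[stat] - mu) / sigma if sigma else 0.0
    return z

# Only the 197 and 307 totals are from the article; the rest are placeholders.
rows = [
    {"season": 1969, "team": "Red Sox", "HR": 197},
    {"season": 1969, "team": "Team A", "HR": 150},
    {"season": 1969, "team": "Team B", "HR": 103},
    {"season": 2019, "team": "Twins", "HR": 307},
    {"season": 2019, "team": "Team C", "HR": 230},
    {"season": 2019, "team": "Team D", "HR": 153},
]
hr_z = zscore_by_season(rows, "HR")
print(hr_z[(1969, "Red Sox")], hr_z[(2019, "Twins")])
```

With these placeholder rows, the two league leaders end up with the same z-score even though their raw totals differ by 110 home runs, which is the point of standardizing within each season.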
Offensive vs. Pitching Clustering
I began with a very broad clustering of Hitting vs. Pitching for every team with a record above .500. I chose .500 as a benchmark, assuming teams with a record above .500 were relatively competitive; it did not make sense to label a team as a good hitting or pitching team if it had a losing record. In the plot above, we can almost draw the line y = x, with points above the line being better at pitching (red) and those below being better at hitting (blue).
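As a rough illustration of that y = x reading, the sketch below filters to winning teams and labels each one by which side of the line it falls on. It is a shortcut, not the k-means step itself, and `hit_z` and `pitch_z` are hypothetical composite z-scores for hitting and pitching; the article does not say exactly which statistics sit on each axis. All rows are made up.

```python
def winning_pct(wins, losses):
    return wins / (wins + losses)

def broad_type(hit_z, pitch_z):
    """Above the line y = x (pitching on the y-axis) reads as a pitching team."""
    return "pitching" if pitch_z > hit_z else "hitting"

def classify_winning_teams(team_seasons):
    """Label only team-seasons with a record above .500."""
    labels = {}
    for ts in team_seasons:
        if winning_pct(ts["wins"], ts["losses"]) > 0.5:
            labels[(ts["season"], ts["team"])] = broad_type(ts["hit_z"], ts["pitch_z"])
    return labels

# Hypothetical team-seasons for illustration only.
team_seasons = [
    {"season": 2019, "team": "Team A", "wins": 95, "losses": 67, "hit_z": 1.2, "pitch_z": 0.3},
    {"season": 2019, "team": "Team B", "wins": 90, "losses": 72, "hit_z": -0.1, "pitch_z": 1.0},
    {"season": 2019, "team": "Team C", "wins": 81, "losses": 81, "hit_z": 0.5, "pitch_z": 0.4},
]
types = classify_winning_teams(team_seasons)
print(types)
```

Team C sits at exactly .500 and is left unlabeled, matching the above-.500 cutoff.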
As shown in the plot above, most franchises fall within 20% of the mean, meaning they have fielded winning teams built on pitching in some years and on hitting in others. But there are some fascinating outliers that appear to exhibit tendencies, perhaps inspired by organizational philosophies, to build successful teams in a certain way. Let’s take a look at three examples scattered across the spectrum.
Beginning with the Cubs, a franchise squarely categorized as neutral, their focus over time seems erratic, bouncing between short stretches of good hitting or good pitching in their quest to break the Billy Goat Curse. The Mets, on the other hand, have only had a handful of winning seasons behind a hitting-oriented team. From the 1969 Miracle Mets, led by Tom Seaver, to today’s team, anchored by dual aces Jacob deGrom and Noah Syndergaard, the Amazin’s have often been built around strong pitching. Finally, when you think about the Reds, the Big Red Machine of the 1970s should automatically come to mind. Looking at their history, the Reds have mostly seen success behind offensively dominant teams, particularly in the 1970s and the second half of the 1990s. The notable dearth of plot points in recent years is because Cincinnati has only had a few winning seasons since 2000. But with the huge investments they made in signing Nick Castellanos and Mike Moustakas this offseason, the Reds appear to be returning to the high-powered offenses that once defined the franchise.
While the broad distinctions are illustrative, I wanted to drill down on specific identifying philosophies, so I broke down each category even further.
Offensive Clustering
Beginning with offense, I chose variables to highlight different aspects of hitting: the ability to get on base (OBP), the ability to make contact and put the ball in play (BABIP), and the ability to hit for power (HR, ISO). With k = 6, signifying six clusters, I was able to explain 70% of the variation in offenses from 1969-2019.
The k-means clustering yielded fairly distinguishable categories of offenses based on z-scored statistics. The algorithm defined six cluster centers, where each value represents the average z-score for that variable within the cluster. In the table below, high positive values correspond to being very good in that particular offensive category, while negative values correspond to being below average.
Cluster | Cluster Description | Team Example | Player Example | HR (z-score) | ISO (z-score) | BABIP (z-score) | OBP (z-score) |
1 | Contact Dependent | 1982 Cardinals | Tim Anderson | -0.94 | -0.70 | 0.93 | 0.47 |
2 | Pure Hitting | 2018 Red Sox | Joey Votto | 0.52 | 0.77 | 1.54 | 1.47 |
3 | Below Average | 2012 Astros | Jackie Bradley Jr. | -0.47 | -0.47 | -0.33 | -0.47 |
4 | Power Dependent | 2010 Blue Jays | Khris Davis | 0.93 | 0.70 | -0.96 | -0.42 |
5 | Three True Outcomes | 2018 Yankees | Joey Gallo | 1.59 | 1.48 | 0.09 | 0.97 |
6 | Avg All Around | 1981 Dodgers | Adam Eaton | 0.30 | 0.39 | 0.31 | 0.48 |
Some clusters are intuitive and easy to define, like the Contact Dependent cluster, which includes the Whiteyball Cardinals of the 1980s, or the Power Dependent cluster, which features the 2010 Blue Jays, who led the league in home runs but were below average in most other offensive categories. A few of the others were harder to define and differentiate. I did not include strikeouts in the k-means clustering, but very interestingly, the average z-score for strikeouts in Cluster 5 was positive, which is why I labeled the group as Three True Outcomes (HR, W, K). The Pure Hitting cluster, which is marked by high values across the board, could best be personified by vintage Joey Votto. On a team level, that means a lineup that gets on base with their contact skills and plate discipline, with some pop mixed in.
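For readers curious about the mechanics, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) together with one common reading of "variation explained": the between-cluster sum of squares divided by the total sum of squares. Whether the article's 70% figure was computed exactly this way is an assumption. The demo points are synthetic, scattered around the six cluster centers from the table above with made-up noise, so the printed share is illustrative only.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def explained_variation(points, centers, labels):
    """1 - within-cluster SS / total SS (equals between SS / total SS)."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1 - within / total

# Cluster centers (HR, ISO, BABIP, OBP) copied from the article's table.
published_centers = [
    (-0.94, -0.70, 0.93, 0.47),
    (0.52, 0.77, 1.54, 1.47),
    (-0.47, -0.47, -0.33, -0.47),
    (0.93, 0.70, -0.96, -0.42),
    (1.59, 1.48, 0.09, 0.97),
    (0.30, 0.39, 0.31, 0.48),
]
noise = random.Random(42)
points = [
    tuple(c + noise.gauss(0, 0.3) for c in center)  # made-up scatter
    for center in published_centers
    for _ in range(30)
]
centers, labels = kmeans(points, k=6)
share = explained_variation(points, centers, labels)
print(f"explained variation: {share:.1%}")
```

Because k-means depends on its starting centers, real analyses usually rerun it from several random starts and keep the best fit; the single fixed seed here is only for reproducibility.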
To simplify the data, I organized the six clusters onto a spectrum, ranging from contact dependent to power dependent.
Cluster | Cluster Description | Power Index |
4 | Power Dependent | 1.00 |
5 | Three True Outcomes | 0.75 |
6 | Avg All Around | 0.50 |
3 | Below Avg | 0.50 |
2 | Pure Hitting | 0.25 |
1 | Contact Dependent | 0.00 |
This index has nothing to do with being successful as an offense, and the higher values do not signify better teams. The index only takes into account how important power was to the offense. Even though Cluster 2 (Pure Hitting) includes teams with positive power ratings, the k-means clustering distinguished this group by its ability to get on base and get hits when they put the ball in play, whereas those in the Power Dependent and Three True Outcomes clusters are presumably more swing-for-the-fences.
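Mapping cluster labels onto the index is a simple lookup. The sketch below uses the index values from the table above; the function name `franchise_power_index`, the choice of a plain average as the summary, and the example cluster assignments are hypothetical, since the article does not state how the trend lines were smoothed.

```python
from statistics import mean

# Power index per offensive cluster, as listed in the article's table.
POWER_INDEX = {4: 1.00, 5: 0.75, 6: 0.50, 3: 0.50, 2: 0.25, 1: 0.00}

def franchise_power_index(cluster_by_season):
    """Average power index over a franchise's seasons (0 = contact, 1 = power)."""
    return mean(POWER_INDEX[c] for c in cluster_by_season.values())

# Hypothetical cluster assignments for one franchise.
example = {2016: 4, 2017: 5, 2018: 5, 2019: 6}
avg_index = franchise_power_index(example)
print(avg_index)  # (1.00 + 0.75 + 0.75 + 0.50) / 4 = 0.75
```

A value above 0.5 indicates a lean toward power and a value below 0.5 a lean toward contact, independent of how good the offense was.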
In the graphic above, notice the trend lines and how they relate to 0.5, which represents neutral in regard to power vs. contact. Many organizations fluctuate or remain somewhere in the middle, signifying no distinct trend when it comes to reliance on power; however, others stay well above or below the line for the majority of their histories. Is keeping a certain trend over history the key to success? Not necessarily. The White Sox had a huge spike toward power-hitting teams in the early 2000s leading to their 2005 World Series win, the franchise’s first since 1917. Where some teams change and adapt to compete, others have stayed true to offensive identities over their histories. Organizations such as the Orioles, Athletics and Blue Jays seem to always have power-driven offenses, while the Royals, Cardinals and Pirates are more prone to success through contact and getting on base.
Another fascinating result, given where they play, is the Colorado Rockies’ consistently being at or below 0.5 on the power index for the majority of their existence. They have, perhaps surprisingly, never led the majors in home runs, and they have only finished in the Top 5 a handful of times over their history. Instead of building lineups around pure power hitters, the Rockies tend to develop and acquire players who make consistent, hard contact, knowing the power numbers will come as a built-in byproduct of playing in high-altitude Denver. Rockies legends like Larry Walker and Todd Helton are excellent examples.
Pitching Clusters
Distinguishing philosophies around pitching was more of a challenge. An entire staff will never be uniform, usually consisting of a healthy mix of different types of pitchers. Thus, at the risk of oversimplification, I differentiated teams by starting rotations vs. bullpens over their histories. I used two variables to cluster teams’ pitching: Starter WAR and Reliever WAR. FanGraphs’ version of WAR accounts for park factors, includes a leverage adjustment for relievers and is based on FIP (Fielding Independent Pitching). Thus, it depends more on what a pitcher can control, independent of the defense playing behind him and the ballpark. With these two variables and five clusters, k-means clustering was able to explain 72.3% of the variation.
The clusters were defined as follows:
Cluster | Cluster Description | Focus | Starter WAR (z-score) | Reliever WAR (z-score) |
1 | + Starters, Below Avg Bullpen | Starters Focus | 0.18 | -0.89 |
2 | + Bullpen, Below Avg Starters | Bullpen Focus | -0.13 | 0.48 |
3 | Below Avg Pitching | Below Avg Pitching | -1.24 | -0.20 |
4 | + Starters, Avg Bullpen | Starters Focus | 1.40 | 0.26 |
5 | + Bullpen, Avg Starters | Bullpen Focus | 0.32 | 1.59 |
I found there is much more variability with pitching than with hitting. It was not uncommon to see the bullpen dominate one year, and then see the team carried by great starting pitching the next. That’s understandable given how challenging it is to fill the hole of losing an ace or a lights-out reliever to injury.
That said, there were still some teams that showed a penchant for one over the other. The Dodgers, Diamondbacks and Mets were more often starting-pitching oriented, while the Yankees, Mariners and Athletics seem to have built stronger bullpens over their respective histories. However, even in making these observations, I am a bit wary. The Yankees have had some all-time bullpens, including Hall of Famers Goose Gossage and Mariano Rivera, but they haven’t exactly lacked in the starters department either: Andy Pettitte, Roger Clemens, CC Sabathia and now Gerrit Cole.
Very interestingly, the Dodgers are the only franchise never to have had a pitching staff classified as below average in both the rotation and bullpen by my clustering analysis. High-quality pitching seems to be in the Dodgers’ DNA.
Combining Results
Aggregate Franchise Focuses from 1969-2019
Franchise | Overall Focus | Overall % | Offensive Focus | Offensive % | Pitching Focus | Pitching % |
Angels | Neutral (Pitching) | 0.58 | Neutral (Power) | 0.55 | Starters | 0.46 |
Astros | Pitching | 0.64 | Contact/On-Base | 0.63 | Starters | 0.52 |
Athletics | Neutral (Pitching) | 0.57 | Power | 0.85 | Bullpen | 0.49 |
Blue Jays | Offense | 0.70 | Power | 0.76 | Bullpen | 0.44 |
Braves | Pitching | 0.62 | Neutral (Power) | 0.56 | Starters | 0.53 |
Brewers | Offense | 0.75 | Neutral (Power) | 0.55 | Bullpen | 0.31 |
Cardinals | Offense | 0.68 | Contact/On-Base | 0.78 | Starters | 0.55 |
Cubs | Neutral (Pitching) | 0.52 | Neutral (Power) | 0.52 | Starters | 0.57 |
Diamondbacks | Neutral (Pitching) | 0.54 | Neutral (Power) | 0.52 | Starters | 0.65 |
Dodgers | Pitching | 0.74 | Contact/On-Base | 0.64 | Starters | 0.66 |
Giants | Offense | 0.68 | Contact/On-Base | 0.60 | Bullpen | 0.37 |
Indians | Neutral (Offense) | 0.55 | Neutral (Contact) | 0.51 | Starters | 0.46 |
Mariners | Pitching | 0.64 | Power | 0.66 | Bullpen | 0.54 |
Marlins | Pitching | 0.67 | Contact/On-Base | 0.77 | Starters | 0.46 |
Mets | Pitching | 0.77 | Neutral (Contact) | 0.51 | Starters | 0.61 |
Nationals/Expos | Pitching | 0.76 | Contact/On-Base | 0.61 | Starters | 0.46 |
Orioles | Offense | 0.64 | Power | 0.82 | Bullpen | 0.49 |
Padres | Pitching | 0.88 | Contact/On-Base | 0.67 | Bullpen | 0.38 |
Phillies | Offense | 0.67 | Neutral (Contact) | 0.58 | Starters | 0.52 |
Pirates | Neutral (Offense) | 0.57 | Contact/On-Base | 0.68 | Starters | 0.48 |
Rangers | Neutral (Offense) | 0.57 | Neutral (Power) | 0.57 | Starters | 0.40 |
Rays | Pitching | 0.75 | Neutral (Power) | 0.53 | Bullpen | 0.47 |
Red Sox | Offense | 0.83 | Neutral (Contact) | 0.56 | Starters | 0.51 |
Reds | Offense | 0.77 | Neutral (Contact) | 0.57 | Bullpen | 0.46 |
Rockies | Offense | 0.78 | Contact/On-Base | 0.70 | Bullpen | 0.35 |
Royals | Pitching | 0.68 | Contact/On-Base | 0.71 | Bullpen | 0.40 |
Tigers | Offense | 0.62 | Power | 0.65 | Starters | 0.53 |
Twins | Neutral (Pitching) | 0.52 | Contact/On-Base | 0.63 | Bullpen | 0.44 |
White Sox | Neutral (Offense) | 0.52 | Power | 0.65 | Starters | 0.46 |
Yankees | Neutral (Offense) | 0.55 | Power | 0.62 | Bullpen | 0.69 |
Neutral refers to falling within the 40-60% range where there is not enough evidence to distinguish between the categories.
Aggregating the data, I created team profiles that broadly characterize organizational philosophies. Of course, talent and strengths ebb and flow over the course of 50 years; however, it is interesting to find that the A’s and Orioles, for instance, can be characterized by power over contact in more than 80% of their seasons dating back to 1969. Furthermore, it is intriguing how different organizations have taken different paths to the same result. In the overall focus, the Blue Jays, Cardinals and Red Sox were all classified as offensively minded over the 1969-2019 period. However, their more specific offensive focuses differ: the Blue Jays are very power oriented, the Cardinals are contact oriented, and the Red Sox fall in the neutral range, having fielded both power-hitting and contact-oriented teams over their history.
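The labeling rule in the table's footnote can be sketched for a two-way split such as offense vs. pitching. The function name `label_focus` is hypothetical, and the handling of the boundary is an assumption inferred from the table rows (a share of exactly 0.50 reads as plain "Neutral", and a share of 0.60 is treated as outside the neutral band). The sketch does not cover the 1969-2019 Pitching % column, where shares below 0.50 still carry a label.

```python
def label_focus(share_a, name_a, name_b, neutral_upper=0.60):
    """Label a two-way split given the share of seasons classified as name_a.

    Returns (label, share of the leading category).
    Assumption: shares from 0.40 up to (but not including) 0.60 are neutral.
    """
    if share_a >= 0.5:
        lead, share = name_a, share_a
    else:
        lead, share = name_b, 1 - share_a
    share = round(share, 2)  # avoid floating-point noise such as 0.5800000000000001
    if share == 0.5:
        return "Neutral", share
    if share < neutral_upper:
        return f"Neutral ({lead})", share
    return lead, share

print(label_focus(0.58, "Pitching", "Offense"))
print(label_focus(0.17, "Pitching", "Offense"))
print(label_focus(0.50, "Pitching", "Offense"))
```

The first call mirrors a row like the Angels' overall focus, and the second mirrors the Red Sox, whose 0.83 offense share corresponds to a 0.17 pitching share.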
Though the pitching analysis did not produce results as significant, it did show that most of the pitching-oriented franchises are in the National League; one would presume this has a lot to do with how the DH affects the American League.
Shorter Period (2000-2019)
Finally, I performed the same k-means clustering analysis on the period of 2000-19. My question was whether the turn of the century and the changes brought about by the Moneyball Era have led to significant changes in franchises’ tendency to field similarly constructed teams.
Aggregate Franchise Focuses from 2000-2019
Extreme Values in Red Suggest Distinct Organizational Philosophies
Franchise | Overall Focus | Overall % | Offensive Focus | Offensive % | Pitching Focus | Pitching % |
Angels | Pitching | 0.70 | Neutral (Contact) | 0.52 | Neutral (Bullpen) | 0.55 |
Astros | Neutral | 0.50 | Neutral (Power) | 0.52 | Starters | 0.70 |
Athletics | Pitching | 0.80 | Power | 0.69 | Neutral | 0.50 |
Blue Jays | Offense | 0.70 | Power | 0.75 | Neutral (Bullpen) | 0.55 |
Braves | Offense | 0.70 | Contact/On-Base | 0.68 | Neutral (Bullpen) | 0.55 |
Brewers | Offense | 0.90 | Neutral (Power) | 0.52 | Bullpen | 0.70 |
Cardinals | Offense | 0.75 | Contact/On-Base | 0.61 | Neutral | 0.50 |
Cubs | Neutral (Offense) | 0.55 | Neutral (Power) | 0.54 | Starters | 0.65 |
Diamondbacks | Neutral (Offense) | 0.55 | Neutral | 0.50 | Starters | 0.70 |
Dodgers | Pitching | 0.65 | Contact/On-Base | 0.61 | Starters | 0.65 |
Giants | Neutral | 0.50 | Neutral (Contact) | 0.58 | Neutral | 0.50 |
Indians | Pitching | 0.65 | Neutral (Power) | 0.52 | Starters | 0.65 |
Mariners | Pitching | 0.70 | Neutral (Contact) | 0.55 | Bullpen | 0.70 |
Marlins | Neutral | 0.50 | Contact/On-Base | 0.74 | Neutral | 0.50 |
Mets | Pitching | 0.60 | Neutral (Power) | 0.56 | Starters | 0.65 |
Nationals/Expos | Neutral | 0.50 | Neutral (Contact) | 0.52 | Neutral (Starters) | 0.55 |
Orioles | Offense | 0.65 | Power | 0.62 | Bullpen | 0.75 |
Padres | Pitching | 0.75 | Neutral (Contact) | 0.58 | Bullpen | 0.85 |
Phillies | Neutral (Offense) | 0.55 | Power | 0.60 | Starters | 0.60 |
Pirates | Neutral (Offense) | 0.55 | Contact/On-Base | 0.66 | Bullpen | 0.70 |
Rangers | Offense | 0.80 | Power | 0.62 | Bullpen | 0.65 |
Rays | Pitching | 0.60 | Neutral (Power) | 0.57 | Bullpen | 0.65 |
Red Sox | Offense | 0.75 | Neutral (Contact) | 0.55 | Starters | 0.65 |
Reds | Offense | 0.80 | Power | 0.60 | Bullpen | 0.70 |
Rockies | Offense | 0.95 | Contact/On-Base | 0.71 | Bullpen | 0.70 |
Royals | Pitching | 0.65 | Contact/On-Base | 0.75 | Bullpen | 0.85 |
Tigers | Offense | 0.65 | Contact/On-Base | 0.64 | Starters | 0.60 |
Twins | Pitching | 0.65 | Contact/On-Base | 0.69 | Neutral | 0.50 |
White Sox | Pitching | 0.65 | Power | 0.68 | Starters | 0.70 |
Yankees | Offense | 0.60 | Power | 0.66 | Bullpen | 0.70 |
Neutral refers to falling within the 40-60% range where there is not enough evidence to distinguish between the categories.
With only 20 seasons of data for each franchise, there is a substantially smaller sample size. Thus, a string of a few years oriented one way or another can carry a lot of weight. Some franchises shift slightly between neutral and a particular focus relative to the 1969-2019 analysis, like the Yankees (who have moved toward offense) or the Phillies (who have moved toward neutral). Additionally, there are many teams whose identities hold true, such as the Brewers, Dodgers, Red Sox and Blue Jays.
However, there also are some major swings in philosophies between the two periods. The Athletics, for example, are defined as a substantially pitching-oriented franchise since 2000 (80%). Many people tend to forget that the Billy Beane A’s immortalized in Moneyball were anchored by a superb starting rotation in the early 2000s. Furthermore, the A’s pitching focus is neutral, as they have also been a factory of high-caliber relievers and a leader in innovative bullpens. Another interesting change is the Giants, who have dropped from being classified as an offensive organization. In fact, this lines up with their move in April 2000 to what is now Oracle Park, which has been known to be fairly pitcher-friendly. More generally, the 2000-19 period has more significant results in regard to pitching focuses. Where I struggled to find trends in the longer period from 1969-2019, the 2000-2019 period shows that franchises have had much higher propensities to choose to focus on building out either their rotations or their bullpens.
Conclusion
Though this analysis highlighted certain tendencies, it can’t really explain why, say, the Blue Jays tend to be power reliant. Certainly, home ballparks must play a role, especially for teams like the Rockies at Coors Field or the Padres at Petco Park. But is it the park that leads teams to perform better in certain areas, or are decision-makers building their rosters with their home ballpark in mind?
The trends we see may be a function of coaching staffs at both the major and minor league levels, with strengths in developing certain types of players or skills. GMs and owners can understandably have a strong influence over the identity of a team, as can a franchise’s own history. It’s common for club legends to take on special advising roles in the front office or to find them at spring training, lending advice to current players.
With some organizations, history does seem to repeat itself. For some, that’s a recipe for success. For others, looking at their past might be a key to turning their fortunes around.