September 5, 2025

If you’ve read any of my recent golf stories — like this one about Tiger Woods’ peak or this one on Scottie Scheffler’s dominance — then you’ve already seen the influence of Data Golf, whether you realized it or not.

Over the past decade, the site has become an essential part of golf analysis — not only for bettors, but for fans and journalists (like me) who want to make sense of the sport in a deeper, more statistical way. Brothers Matt and Will Courchene, the creators of Data Golf, have built an analytics hub with numbers that both predict future outcomes and shape how we think about players’ performances and legacies.

I recently emailed with Matt to talk about how Data Golf got started, the challenges of modeling such a noisy sport, the effect of LIV and legalized betting on their site, and why some of golf’s conventional wisdom doesn’t actually hold up under the numbers.

⛳Neil Paine: What’s the origin story of Data Golf? When you first started the site, did you imagine it becoming what it is now?

Matt Courchene: Growing up, Will and I were obsessed with golf. We were both pretty good juniors and we both played on the golf team at Queen's University (in Kingston, Ontario). In early 2016, while I was in grad school (economics) and Will was starting his career as a data analyst, we applied for academic access to the PGA Tour's data. Will wanted to use it to work on his data-analysis skill set and portfolio, and I think I was mainly involved to provide an academic email address. We were approved for the data, and eventually started a blog—that is still online—where we dug into random questions and topics in golf. Around this time Will also started a Twitter account, taking about 15 seconds to come up with the handle "DataGolf". Pretty early on (Feb 2017), we had the first version of our model for projecting golfer performance, which forms the backbone of a lot of what is on our site today.

We never could have imagined the path Data Golf has taken when we started that blog back in 2016. Having this be a full-time occupation was not on my radar at all, and we probably take for granted how cool it is that outlets like Golf Channel or ESPN cite our work.

⛳NP: Golf seems noisy compared with other sports. What’s been your biggest challenge in trying to predict it? And what’s the biggest misconception people have about golf models, or just forecasting models in general?

MC: Golf performance is noisy, meaning that most of the variation we see in scores on a given day is not predictable (at least not as far as we know), but that doesn't necessarily mean it's hard to model. Compared to other sports, the fact that an individual golfer's performance is (mostly) unaffected by other golfers greatly simplifies the modelling problem. When analyzing past performance, a lot of the potentially complex variables—weather, course conditions, the huge variability in the courses themselves—can be mostly removed by using a relative measure (like strokes better than the field average).

The most challenging aspect to predicting golf is actually finding things that are meaningfully predictive! If you have a baseline model for projecting golfer performance that just uses a time-weighted average of their past relative scores—adjusted for field strength—this will be surprisingly hard to improve upon. In our early days, this simple setup was pretty much our model. We've added refinements over the years—adjustments to account for player-course fit, using the SG categories to extract more predictive value from scores—but all of these were hard-earned additions to the model's predictive accuracy. This relates to the biggest misconception I think lay people have about golf models, which is that they are very complex and take into account every variable imaginable.
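To make the baseline concrete, here is a minimal Python sketch of a time-weighted average of field-adjusted relative scores. It is an illustration under stated assumptions, not Data Golf's code: each round is assumed to come with the field's average score and an estimate of the field's average skill, and the exponential decay controlled by `half_life` is a hypothetical weighting choice.

```python
from dataclasses import dataclass
from math import exp, log


@dataclass
class Round:
    days_ago: int          # how long ago the round was played
    score: float           # the golfer's raw score
    field_avg: float       # the field's average score that day
    field_strength: float  # hypothetical input: field's average skill, strokes/round vs. a reference level


def adjusted_sg(r: Round) -> float:
    # Strokes better than the field average strips out course and weather effects...
    raw_sg = r.field_avg - r.score
    # ...and adding the field's strength means beating a strong field counts for more.
    return raw_sg + r.field_strength


def baseline_skill(rounds: list[Round], half_life: float = 180.0) -> float:
    # Hypothetical exponential decay: a round half_life days old gets half the weight.
    decay = log(2) / half_life
    weights = [exp(-decay * r.days_ago) for r in rounds]
    weighted_sum = sum(w * adjusted_sg(r) for w, r in zip(weights, rounds))
    return weighted_sum / sum(weights)


rounds = [
    Round(days_ago=0, score=68, field_avg=71.0, field_strength=0.5),    # +3.5 adjusted, full weight
    Round(days_ago=180, score=72, field_avg=71.0, field_strength=0.0),  # -1.0 adjusted, half weight
]
print(round(baseline_skill(rounds), 3))  # → 2.0
```

The exponential decay is only one way to let recent form count more; any weighting that shrinks with a round's age would match the description in the interview.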

⛳NP: I would imagine one of the top use cases for your site is betting. How has the runaway growth of that industry during the Data Golf Era changed your process (or has it)?

MC: When Will and I started working on Data Golf, we had no experience with or interest in sports betting. But even before we started regularly publishing predictions, we noticed that a lot of the interest and feedback on our blog posts came from bettors and DFS players. When sports betting was legalized in the US in 2018, it presented us with a natural business model. In April 2019, we launched a paid subscription to our site—basically the model predictions plus a couple of odds screens—which set us on the path to doing this full-time.

This impacted our site and process in a few ways. We spent more time working on improvements to the model and betting-specific pages in the first few years of the subscription product. Constantly comparing our probabilities to the betting market highlighted model deficiencies, leading to improvements like incorporating weather forecasts by tee time, adjusting players' skill when they are leading, and adding nationality-by-host-country effects (think of Americans underperforming at the Open Championship). Having to maintain odds screens—ensuring that all our sportsbook scrapers are working properly—has been a constant source of pain over the years.

It has been an interesting experience to essentially become a betting company by accident. Having hundreds or thousands of people betting based on our model's predictions is not always the most pleasant experience. But we've tried to be transparent with our model and betting results, and that has served us well. Our overarching goal has always been to make the best golf-focused website possible. Because a lot of our site is built around pieces of the model, we didn't have to pivot that much to offer a good betting-focused product. Any work that we put into the model helps the betting product, but it also often helps other parts of the site too (for example, adding pressure adjustments to the model led to the creation of the pressure tool).

⛳NP: On a related note, it seems like the other huge change in this era of golf has been the creation of LIV and top players splitting off of the PGA Tour. Was that frustrating to have to deal with, or did you see it as an opportunity? (It seems like it's been a positive for DG? Your rankings are now better than the OWGR because they factor LIV in.)

MC: LIV was very frustrating to deal with as a golf fan. Also, because we are pretty plugged into golf media and Twitter, we couldn't escape the endless LIV-defection rumour mill. From a website standpoint, it has brought more eyeballs to our rankings, although we've been open about how we don't think score-based rankings like ours are suitable as an official system (see the second section of this newsletter). Still, there's no doubt we have become a valuable resource for people looking to evaluate LIV players—especially leading up to majors. Incorporating LIV coverage into the site created more work for us initially, but I've now become used to the increased content and betting opportunities during weeks when the three main tours are in action, so I'll miss that if LIV ever goes away.

⛳NP: Has working on this project changed the way you think about golf? Have there been any surprising lessons that made you rethink parts of the game's conventional wisdom?

MC: It has shaped my thinking on a lot of different issues in golf, but I can't think of a broader theme that ties them together. One example is a popular recent narrative that professional golf is less skillful than it used to be—that it has become a "bomb-and-gouge" game. The idea is that advances in technology—particularly more forgiving drivers and lower-spin golf balls—have decreased the penalty for a mishit and encouraged players to swing as hard as possible off the tee. According to this view, the result has been a narrowing of the skill distribution, with driving distance playing an outsized role in determining success in pro golf.

This makes sense as a theory, but the data doesn't really support it. For one, it's hard to argue that the best golfers can't separate anymore when Scottie Scheffler is posting historically dominant numbers—driven primarily by his ball-striking. Second, a variance decomposition of the four strokes-gained categories in every season of the SG era (2004–present) shows a remarkable consistency in where players gain strokes over time. If the pro game had fundamentally changed, these numbers should reflect it. In particular, if it has become more bomb-and-gouge, we should probably see off-the-tee's contribution to scoring separation increasing and approach play's contribution declining.
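The interview doesn't say exactly how the variance decomposition is computed. One standard approach, assumed here, uses the fact that total SG is the sum of the four categories, so the variance of total SG splits exactly into each category's covariance with the total. A minimal Python sketch, with hypothetical category keys (`ott`, `app`, `arg`, `putt`) and toy numbers:

```python
from statistics import mean

CATEGORIES = ("ott", "app", "arg", "putt")  # off-the-tee, approach, around-the-green, putting


def covariance(xs: list[float], ys: list[float]) -> float:
    # Population covariance; the 1/n factor cancels when shares are taken.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance_shares(season: dict[str, list[float]]) -> dict[str, float]:
    """season maps each SG category to per-player values, in the same player order."""
    n_players = len(season[CATEGORIES[0]])
    total = [sum(season[c][i] for c in CATEGORIES) for i in range(n_players)]
    var_total = covariance(total, total)
    # total = ott + app + arg + putt, so these covariances add up to var_total and
    # the shares sum to 1 (an individual share can be negative).
    return {c: covariance(season[c], total) / var_total for c in CATEGORIES}


toy_season = {  # hypothetical per-player SG values, not real data
    "ott": [1.0, 0.0, -1.0],
    "app": [1.0, 0.5, -1.5],
    "arg": [0.0, 0.0, 0.0],
    "putt": [0.0, -0.5, 0.5],
}
print({c: round(v, 3) for c, v in variance_shares(toy_season).items()})
```

Run once per season, a bomb-and-gouge shift would show up as the `ott` share trending up and the `app` share trending down.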

Finally, our past analysis of driving distance, accuracy, and performance (up to 2019) did show a recent uptick in distance’s importance while accuracy’s importance declined. But over the last five years, accuracy’s correlation with performance has rebounded and distance’s has dropped. One reason for this may be the lack of true bombers’ paradises on the PGA Tour. It's interesting to go through the 2025 schedule and note how many courses clearly favour bombers—there aren't that many anymore.
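A season-by-season version of this correlation check could be sketched in Python as follows. The row layout (season, driving distance, driving accuracy, total SG for each player-season) and the toy values are assumptions for illustration; the original analysis may have used different measures or controls.

```python
from collections import defaultdict
from collections.abc import Sequence
from math import sqrt


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlations_by_season(rows: list[tuple[int, float, float, float]]) -> dict[int, dict[str, float]]:
    """rows: (season, driving_distance, driving_accuracy, sg_total), one per player-season."""
    by_season: dict[int, list[tuple[float, float, float]]] = defaultdict(list)
    for season, distance, accuracy, sg_total in rows:
        by_season[season].append((distance, accuracy, sg_total))
    result = {}
    for season, values in sorted(by_season.items()):
        distances, accuracies, sg_totals = zip(*values)
        result[season] = {
            "distance": pearson(distances, sg_totals),
            "accuracy": pearson(accuracies, sg_totals),
        }
    return result


toy_rows = [  # hypothetical values, not real Tour data
    (2019, 300.0, 0.55, 1.0),
    (2019, 290.0, 0.65, 0.0),
    (2019, 280.0, 0.60, -1.0),
]
print(correlations_by_season(toy_rows))
```

Tracking the two coefficients across many seasons is the kind of comparison that would surface a rise in distance's importance followed by a rebound in accuracy's.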

⛳NP: You recently introduced DG Points as a way to quantify accomplishments across eras. What inspired that system, and how do you see it fitting into the ways we measure greatness across other sports with WAR, Approximate Value, etc.?

MC: The inspiration for Data Golf points was our dissatisfaction with using strokes-gained as a metric for evaluating seasons and especially careers. I would say SG is analogous to WAR in that it uses the fundamental unit of the sport (strokes; runs) but is not tied directly to results. In baseball this makes sense since individual players don't fully control a team's wins and losses, but in an individual sport like golf we have each player's result every week. We wanted a metric that used these results directly when evaluating the quality of a season or career.

Comparing the careers of Phil Mickelson and Jim Furyk provides an illustrative example. They've had very similar careers from a cumulative SG standpoint, but Phil has won a lot more tournaments and leads the major tally 6-1. As a result, Phil ranks much higher on any subjective all-time list. (On our career DG Points list since 1983, Phil is 2nd while Furyk is a distant 6th.)

The main idea behind the DG Point system is that points are assigned based on how "difficult" it is to achieve a given result—determined by the strength and size of the field. In practice, we calculate difficulty by estimating the probability that a "Top 5 player" would achieve that result. If we estimate that a Top 5 player is equally likely to finish 5th in a major as to win a LIV event, those two results earn the same points. Like most point systems, DG Points reward wins and high finishes disproportionately, and they set a floor for bad play (0 points). We've tried to make the system as objective as possible by grounding it in this difficulty-to-achieve framework.
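The difficulty idea can be sketched with a small Monte Carlo simulation: drop a golfer of "Top 5 player" quality into a field and count how often they reach a given finish. Everything numeric here is an assumption for illustration (normally distributed round scores, a 2.8-stroke round standard deviation, a +2.0 skill level standing in for a Top 5 player, and the toy fields), and the sketch stops at the probability because the interview doesn't specify how Data Golf converts it into points.

```python
import random


def tournament_total(rng: random.Random, skill: float, round_sd: float, n_rounds: int) -> float:
    # Hypothetical scoring model: each round is normal around -skill (lower is better).
    return sum(rng.gauss(-skill, round_sd) for _ in range(n_rounds))


def finish_probability(field_skills: list[float], target_skill: float, max_position: int,
                       n_sims: int = 2000, round_sd: float = 2.8, n_rounds: int = 4,
                       seed: int = 0) -> float:
    """Estimate P(a golfer with target_skill finishes in position max_position or better)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        target = tournament_total(rng, target_skill, round_sd, n_rounds)
        # Count field members who beat the target outright; ties go in the target's favor.
        beaten_by = sum(
            1 for s in field_skills if tournament_total(rng, s, round_sd, n_rounds) < target
        )
        if beaten_by < max_position:
            hits += 1
    return hits / n_sims


TOP5_SKILL = 2.0                   # hypothetical "Top 5 player" skill, strokes per round
small_weak_field = [0.0] * 48      # toy stand-in for a small, weaker field
large_strong_field = [1.0] * 150   # toy stand-in for a large, strong field

p_win_small = finish_probability(small_weak_field, TOP5_SKILL, max_position=1)
p_top5_large = finish_probability(large_strong_field, TOP5_SKILL, max_position=5)
print(f"win vs. small field: {p_win_small:.3f}, top 5 vs. large field: {p_top5_large:.3f}")
```

Under the difficulty-to-achieve framework, two results with the same estimated probability would earn the same points, and a lower probability (a harder result) would earn more; the mapping from probability to point values is not described in the interview, so it is left out here.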

⛳NP: What do you think about the current state of golf analytics overall? Where is it relative to some of the team sports (MLB, NBA) where it's gotten a ton of traction?

MC: Golf analytics took a huge step forward when Mark Broadie created the strokes-gained concept 20 years ago, and it seems like that has only recently been fully embraced by telecasts and golf media. I'd love to see everyone start using adjusted (for field strength) SG statistics—especially now that the PGA Tour essentially has a two-tiered schedule—but I think that is still a ways off.

In terms of an analytics community in golf, we don't have much of one compared to what exists for baseball and football. That may partly reflect the fact that golf is a smaller sport, but it's also likely because there aren't many jobs for analysts. Every MLB and NFL team now has an analytics department, whereas only a few companies and individuals analyze data for professional golfers. There are several golf-focused betting or media companies that use analytics, but they aren't generally pushing the research frontier. Another issue is that shot-level data from the PGA Tour is no longer easily accessible—the academic program I mentioned earlier was discontinued after the data started being licensed to sportsbooks.

I think there's still a ton of work to be done with shot-level data. For various reasons, most of the work on our site is at the round level—it's easier to present, the main stats are available across all major tours, and [shot-level analysis] is not that additive for predicting round-level outcomes. But for deeper questions around topics like player performance (e.g. refining SG to account for more granular data on each shot), optimal strategy (e.g. are players too aggressive to tucked pins?), and course fit (e.g. why did Oakmont's rough favour bombers?), there's huge potential for new insights from shot-level analysis.

⛳NP: Around the missing shot-level data from the PGA Tour — for things like optimal strategy or course fit, do you see any value in substituting other sources, like TrackMan data or the strokes-gained tracking apps that amateurs use? (If you could get a hold of that data?) Or is the gap between pros and regular players so large that anything gleaned from the latter just doesn’t translate to Tour-level insights, even at the same courses?

MC: I should clarify that the shot-level data is still scrapeable from the PGA Tour (or other sources like IMG Arena's shot tracker, depending on whether having just the play-by-play text is sufficient), it's just not as easy to access as it once was. Regarding the amateur data, I think it would be useful for developing a framework to understand topics but I doubt the specific insights would be too applicable to the pros.

Like, for course fit on a specific hole, if it favours longer amateurs I wouldn't assume it favours longer pros, just because of how differently they might play the hole (e.g. par 5s that are reachable for pros but not ams). Similarly with strategy: I can think of a lot of things amateurs do strategically that pros don't (e.g. ams chronically under-club on approach shots). I'm also personally a lot more interested in the pro insights—e.g. is Justin Thomas specifically a more aggressive iron player than average?—so the amateur data doesn't get me quite as excited.

⛳NP: Do you have a favorite stat or model output on the site — one that’s underrated or doesn’t get enough love? Relatedly, are there any "weirdo" players who break your models?

MC: I really enjoy the insights that come from our amateur data. An important concept on our site is adjusted or "true" SG, which is just strokes-gained—the number of strokes you beat the field by—adjusted for field strength. This relies on the connectivity of professional golf: as long as two players can be connected through a chain of opponents, then their performances can be fairly compared (in theory). Because top amateurs often compete in professional events, particularly on minor tours, we are able to extend the adjusted SG metric to our amateur data (which covers all WAGR-sanctioned events). Looking at Jon Rahm's data, for instance, we can see he was already performing at the level of a top-50 player when he turned professional. Our all-time amateur ranking is also a fascinating list for golf nerds.
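A common way to turn that connectivity into numbers, assumed here rather than taken from Data Golf, is a two-way fixed-effects model: each score equals a round-difficulty term minus the player's skill plus noise, fit by least squares. The Python sketch solves it by alternating averages (backfitting); the `(player, round_id, score)` input format and the toy data are hypothetical. Players A and C never meet, but both share a round with B, so their skills become comparable.

```python
from collections import defaultdict


def fit_skills(results: list[tuple[str, str, float]], n_iter: int = 200) -> dict[str, float]:
    """Estimate player skills from (player, round_id, score) records.

    Assumed model: score = difficulty[round_id] - skill[player] + noise, fit by
    least squares using alternating updates (backfitting).
    """
    by_round: dict[str, list[tuple[str, float]]] = defaultdict(list)
    by_player: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for player, round_id, score in results:
        by_round[round_id].append((player, score))
        by_player[player].append((round_id, score))

    skill = {player: 0.0 for player in by_player}
    difficulty: dict[str, float] = {}
    for _ in range(n_iter):
        # Round difficulty: what a zero-skill (average) player would be expected to shoot.
        for round_id, entries in by_round.items():
            difficulty[round_id] = sum(s + skill[p] for p, s in entries) / len(entries)
        # Skill: average strokes better than each round's difficulty.
        for player, entries in by_player.items():
            skill[player] = sum(difficulty[r] - s for r, s in entries) / len(entries)
        # Skills are only identified up to a constant, so centre them on zero.
        mean_skill = sum(skill.values()) / len(skill)
        skill = {p: v - mean_skill for p, v in skill.items()}
    return skill


results = [
    ("A", "r1", 68.0), ("B", "r1", 70.0),  # A beats B by 2 in round r1
    ("B", "r2", 71.0), ("C", "r2", 72.0),  # B beats C by 1 in round r2
]
skills = fit_skills(results)
print({p: round(v, 3) for p, v in skills.items()})
```

If a group of players had no chain of shared rounds linking them to the rest, their level relative to everyone else would not be identified, which is the "as long as two players can be connected" condition in practice.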

There are specific types of players that our model struggles with. One example we highlighted recently in our live blog at the Open Championship is Justin Rose. In recent seasons, Rose has tended to put together a few great weeks alongside a number of mediocre or poor ones. Because our model evaluates performance at the round level rather than the event level, it doesn't credit Rose for the fact that his good rounds came in bunches. This leads our model to give Rose a much lower probability of winning tournaments than the betting market does. Essentially the disagreement comes down to how much predictive value there is in Rose's within-tournament streakiness. One way to capture this would be modelling uncertainty in player skill at the event level—i.e. Rose might show up this week as a +0.5 SG player or a +1.5 player—rather than only at the round level.
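The event-level idea can be illustrated with a small simulation that compares a golfer who is always a +1.0 player with one who shows up for the whole week as either +0.5 or +1.5, mirroring Matt's example. Both have the same average skill; only the second has event-level uncertainty. The normal scoring model, the 2.8-stroke round standard deviation and the all-average toy field are assumptions for illustration, not Data Golf's model.

```python
import random


def win_probability(skill_mean: float, skill_spread: float, field_skills: list[float],
                    n_sims: int = 3000, round_sd: float = 2.8, n_rounds: int = 4,
                    seed: int = 1) -> float:
    """Estimate P(player's total beats every golfer in field_skills).

    skill_spread == 0: uncertainty only at the round level (same skill every round).
    skill_spread > 0: the player's skill for the whole event is drawn once, as
    skill_mean - skill_spread or skill_mean + skill_spread with equal probability.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # Draw this week's skill once; always consuming one random number keeps the
        # random streams aligned when comparing different skill_spread values.
        week_skill = skill_mean + (skill_spread if rng.random() < 0.5 else -skill_spread)
        player_total = sum(rng.gauss(-week_skill, round_sd) for _ in range(n_rounds))
        best_field_total = min(
            (sum(rng.gauss(-s, round_sd) for _ in range(n_rounds)) for s in field_skills),
            default=float("inf"),
        )
        if player_total < best_field_total:
            wins += 1
    return wins / n_sims


field = [0.0] * 60  # toy field of average golfers
steady = win_probability(1.0, 0.0, field)   # always a +1.0 player
streaky = win_probability(1.0, 0.5, field)  # a +0.5 or +1.5 player for the whole week
print(f"steady: {steady:.3f}, streaky: {streaky:.3f}")
```

Because winning depends on the tail of the scoring distribution, spreading a non-favorite's weekly skill tends to raise their win probability even with the same average, which is the direction of the model-versus-market gap on Rose. With a spread as small as the one in Matt's example the gain is modest and can be swamped by simulation noise, so a larger `n_sims` may be needed to see it reliably.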

⛳NP: Thank you for this; it was outstanding. There's so much here, and I really appreciate your thoughtful responses.

MC: Glad you found the answers interesting. I find these big-picture, introspective kinds of questions useful to think about, but also surprisingly hard to answer! Thanks for using our work in some of your articles, and hopefully we'll be in touch again soon.

