Vintage data analysis

Published

July 9, 2025


1 Acknowledgements

Thanks to Badaro for his data collection work and the archetype parser.
Thanks to Aliquanto for his initial analysis, which served as an inspiration for me.
Thanks to Jiliac for integrating Qonfused data and maintaining Format data.

2 Data include

This book contains Vintage format data since 26/08/2024. If the period in question contains ban announcements, decks containing non-legal cards have been excluded, as have the associated matchups. Decks containing more than 30 copies of a single card have also been excluded. For decklist analyses, only decks valid for the format have been included (number of main deck cards >= 60 and number of sideboard cards <= 15).
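
The inclusion rules above can be sketched as a small filter. This is only an illustration: the `Deck` structure, the function name `is_valid_deck`, the choice to count copies across main deck and sideboard together, and the minimum of 60 main deck cards are assumptions, not the code actually used for this book.

```python
from dataclasses import dataclass, field


@dataclass
class Deck:
    """Hypothetical deck record: card name -> number of copies."""
    mainboard: dict = field(default_factory=dict)
    sideboard: dict = field(default_factory=dict)


def is_valid_deck(deck, banned=frozenset(), max_copies=30,
                  min_main=60, max_side=15):
    """Return True if the deck passes the inclusion rules sketched here."""
    cards = set(deck.mainboard) | set(deck.sideboard)
    # Decks containing non-legal cards are excluded.
    if cards & set(banned):
        return False
    # Decks with more than `max_copies` copies of a single card are excluded
    # (copies are summed over main deck and sideboard: an assumption).
    for name in cards:
        copies = deck.mainboard.get(name, 0) + deck.sideboard.get(name, 0)
        if copies > max_copies:
            return False
    # Size constraints used for the decklist analyses.
    return (sum(deck.mainboard.values()) >= min_main
            and sum(deck.sideboard.values()) <= max_side)
```

The thresholds are keyword arguments so that a different copy limit or deck size can be tried without touching the function body.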

3 Principal chapter

3.1 Metagame

3.1.1 Presence archetype

This chapter shows how the presence of the different archetypes changes over time. Leagues are excluded from this analysis.

The file is in 3 parts:

The first shows the presence curves over time for each archetype or base archetype (weeks are expressed as the last 2 digits of the year, followed by the week of the year). Archetypes whose presence is below 0.25% are hidden by default but can be displayed again by clicking on the desired decks.

Leagues are included in this part.
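
As a reading aid for the week labels on the time axis, here is a minimal sketch of the "year.week" format. It assumes ISO week numbering and zero-padding, neither of which is stated in this book; `week_label` is a hypothetical helper name.

```python
from datetime import date


def week_label(day):
    """Format a date as 'YY.WW' (assumed: ISO year and ISO week number)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year % 100:02d}.{iso_week:02d}"


print(week_label(date(2025, 7, 9)))  # 25.28
```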

The second part shows a bar chart of the presence of the different archetypes and base archetypes (an archetype is labelled Other if its count is below the minimum of 50 decks or 1%) for different time intervals:

  • all data

  • one month

  • two weeks

  • one week

Additional information is available in the tooltip (for Archetype and base archetype):

  • Number of copies of the deck

  • The delta in percent compared to the next longer time interval

  • Deck rank and its evolution compared to the previous time interval

  • Win rates and confidence interval

    • The confidence interval graphs show the averages and 95% confidence intervals (calculated using the Agresti-Coull method).

    • The vertical red line represents the mean of the win rates, and the dotted blue lines represent the means of the upper and lower bounds of the confidence interval.

    • The average win rate and CI of the top 10 players (the 10 players with the highest lower bound of the win rate CI; a player needs at least 20 rounds for Archetype and 10 for base archetype) are shown above the error bar.

Because only the top 32 results are published for MTGO, win rates are overestimated; the win rates were therefore centred.
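
For readers unfamiliar with the Agresti-Coull method mentioned in the list above, the interval can be sketched as follows. The function name and the clamping to [0, 1] are illustrative choices, not taken from the code behind this book.

```python
import math


def agresti_coull(wins, total, z=1.959964):
    """Return (lower, upper) Agresti-Coull bounds for a win rate.

    z = 1.959964 corresponds to a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    n_adj = total + z ** 2                   # adjusted number of trials
    p_adj = (wins + z ** 2 / 2) / n_adj      # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1], since the raw interval can overshoot slightly.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

With 50 wins out of 100 matches, this gives an interval of roughly 40.4% to 59.6%, which illustrates how wide the bounds remain at typical archetype sample sizes.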

The last part presents:

  • The representation of each colour combination in the format.

  • The number of targets for 2 CMC black removal.

  • The presence of different cards in the format.

Leagues are included in this part.

3.1.2 Matrix WR

This chapter focuses on the data for which we know the result of each match and the Archetype of the opponent.

In order to be included, an archetype must be represented more than 50 times in the dataset.

  • The matrix considers each match as a whole (for example, a 2-1 score counts as 1 match won).
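
A minimal sketch of match-level counting, assuming each record holds both archetypes and the game score. The tuple layout, the function name and the decision to skip drawn matches are assumptions made for illustration.

```python
from collections import defaultdict


def win_rate_matrix(matches):
    """Build {archetype: {opponent: (wins, matches_played)}} at match level.

    `matches` is an iterable of (archetype_a, archetype_b, games_a, games_b).
    """
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for arch_a, arch_b, games_a, games_b in matches:
        if games_a == games_b:
            continue  # drawn matches are skipped in this sketch (assumption)
        a_won = games_a > games_b
        # A 2-1 and a 2-0 both count as exactly one match win.
        table[arch_a][arch_b][0] += int(a_won)
        table[arch_a][arch_b][1] += 1
        table[arch_b][arch_a][0] += int(not a_won)
        table[arch_b][arch_a][1] += 1
    return {a: {b: tuple(v) for b, v in row.items()}
            for a, row in table.items()}
```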

Part one focuses on Archetype (aggregated) and part two on base archetype (parser archetype).

They are built on the following model (additional information is available in tooltips), with one tab for all data and one tab for 1 month of data:

  • Summary of data.

  • A bar chart shows the presence of each archetype and base archetype, as well as their win rate and some additional information in tooltips.

  • The confidence interval graphs show the average win rates (without mirror matches) and 95% confidence intervals (calculated using the Agresti-Coull method). The vertical red line represents the mean of the win rates, and the dotted blue lines represent the means of the upper and lower bounds of the confidence interval.

  • The average win rate and CI of the top 10 players (the 10 players with the highest lower bound of the win rate CI; a player needs at least 20 rounds for Archetype and 10 for base archetype) are shown above the error bar.

  • A complete matrix with all the information.

  • Multiple tables with the win rate of cards per matchup (only valid decks and matchups with more than 10 games are presented): the first concentrates on the aggregated macro archetypes and the second presents the sub-archetypes. Each part is organised in the same way, repeated twice, once for the maindeck and once for the sideboard.

    • Base cards: cards present in decks almost exclusively in a given number of copies (fewer than 10 decks deviate from the most common count).

    • Side/Mainboard cards: cards present in variable numbers in the decks.

The third part explores the notion of the best deck for a given metagame, using the win rates obtained from the complete matches in the dataset and the presence of each archetype over time (weeks are expressed as the last 2 digits of the year, followed by the week of the year).

To determine an expected number of victories, 2 criteria are used: the average win rate and the lower bound of the confidence interval. Please note that this part is still under construction, as some decks with too few matchups are included.
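
One way to read the "best deck for a given metagame" idea is as a presence-weighted average of matchup win rates. The sketch below assumes exactly that; the function name, the data layout and the renormalisation over known matchups are illustrative assumptions, not the method confirmed by this book.

```python
def expected_win_rate(matchup_rates, presence):
    """Presence-weighted win rate of one deck against a metagame.

    matchup_rates: {opponent: win rate (or lower CI bound) in [0, 1]}
    presence:      {opponent: share of the metagame}
    Opponents without a known matchup are ignored and the remaining
    shares are renormalised (an assumption of this sketch).
    """
    known = [o for o in presence if o in matchup_rates]
    total_share = sum(presence[o] for o in known)
    if total_share == 0:
        return None
    return sum(presence[o] * matchup_rates[o] for o in known) / total_share
```

Passing the average win rates gives the first criterion, and passing the lower confidence bounds instead gives the more conservative second criterion.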

3.2 Deck winrate

3.2.1 Card win rate table

This section presents the win rate of each card in each archetype in the form of multiple tables.

The definition of each column is given in a tooltip, accessible by hovering the cursor over the column names.

This file is split into 2 parts: the first concentrates on the aggregated macro archetypes and the second presents the sub-archetypes.

Each part is organised in the same way, repeated twice, once for the maindeck and once for the sideboard.

  • Base cards: cards present in decks almost exclusively in a given number of copies (fewer than 10 decks deviate from the most common count).

  • Side/Mainboard cards: cards present in variable numbers in the decks.

3.2.2 Cards WR models

This analysis attempts to use regression to determine the best-performing cards within an archetype or base archetype.

A binomial regression is initially trained on a set of decks. To be included in this analysis, an archetype must be present at least 50 times in the dataset.

To be considered, a card must be included at least 50 times in either the main deck or the sideboard, each being considered separately. In models comparing the number of copies of each card, when a number of copies appears fewer than 50 times it is grouped with an adjacent number of copies. For example, a card that is present 32 times in 1 copy, 200 times in 2 copies, 15 times in 3 copies and 47 times in 4 copies would lead to the following result: 1/2: 232 and 3/4: 62. The notation 2-4 indicates that the numbers of copies 2, 3 and 4 have been grouped together.
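
The grouping of copy counts can be sketched with a simple greedy pass. The exact rule used in this book is not spelled out, so the left-to-right merging below is an assumption that happens to reproduce the example above; `group_copy_counts` is a hypothetical name.

```python
def group_copy_counts(counts, threshold=50):
    """Merge adjacent copy counts until each group reaches `threshold` decks.

    counts: {number_of_copies: number_of_decks}
    Returns a list of (label, number_of_decks).
    """
    groups = []              # finished groups: [list_of_copies, deck_total]
    current, total = [], 0
    for copies in sorted(counts):
        current.append(copies)
        total += counts[copies]
        if total >= threshold:
            groups.append([current, total])
            current, total = [], 0
    if current:              # leftover tail below the threshold
        if groups:
            groups[-1][0].extend(current)
            groups[-1][1] += total
        else:
            groups.append([current, total])

    def label(copies_list):
        if len(copies_list) == 1:
            return str(copies_list[0])
        if len(copies_list) == 2:
            return f"{copies_list[0]}/{copies_list[1]}"
        return f"{copies_list[0]}-{copies_list[-1]}"

    return [(label(c), n) for c, n in groups]


print(group_copy_counts({1: 32, 2: 200, 3: 15, 4: 47}))
# [('1/2', 232), ('3/4', 62)]
```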

Be careful: I am not really sure of the results in this part. The interpretation of the regression coefficients seems really questionable, particularly because of the collinearity problem and the very large number of variables, sometimes with small sample sizes. I would therefore encourage you to be very careful.

Models are created separately for the maindeck and the sideboard, and for the maindeck and sideboard pooled together (Total 75), according to the following scheme:

  • Base cards: cards systematically present in decks with an almost fixed number of copies (fewer than 50 decks do not have the most common number of copies; decks with zero copies are grouped with the majority class). For these cards, quasibinomial regression models are created using the wins and losses of each deck:

    • Comparing, for each card, presence (Most common count) vs absence (Other).

    • Comparing each card count with a sufficient sample size (Most common count vs 1 vs 3-4, for example).

  • Uncommon cards: cards that are not always included in decks. Quasibinomial regression models are created using the wins and losses of each deck:

    • Comparing, for each card, presence (+1) vs absence (0).

    • Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example).
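
As a reading aid, a quasibinomial model of this kind is usually written as follows. The notation is introduced here for illustration only, and the exact specification used in this book is not stated, so treat it as an assumption: $w_d$ is the number of wins of deck $d$ over $n_d$ matches, $x_{d,c}$ encodes the presence (or count group) of card $c$ in deck $d$, and $\phi$ is a dispersion parameter estimated from the data ($\phi = 1$ gives back the ordinary binomial model).

```latex
\begin{aligned}
\operatorname{logit}(p_d) &= \beta_0 + \sum_{c} \beta_c \, x_{d,c}, \\
\mathbb{E}[w_d] &= n_d \, p_d, \qquad
\operatorname{Var}(w_d) = \phi \, n_d \, p_d \, (1 - p_d).
\end{aligned}
```

Under this reading, a positive $\beta_c$ means that decks including card $c$ tend to win more often, all other cards in the model being equal, which is exactly where collinearity between cards makes interpretation fragile.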

3.3 Best performing deck

3.3.1 Best deck analysis

This analysis attempts to use regression to determine the best-performing decks within an archetype or base archetype.

A binomial regression is initially trained on a set of decks. To be included in this analysis, an archetype must be present at least 50 times in the dataset.

To be considered, a card must be included at least 25 times in either the main deck or the sideboard, each being considered separately. In models comparing the number of copies of each card, when a number of copies appears fewer than 25 times it is grouped with an adjacent number of copies. For example, a card that is present 32 times in 1 copy, 200 times in 2 copies, 15 times in 3 copies and 47 times in 4 copies would lead to the following result: 1/2: 232 and 3/4: 62. The notation 2-4 indicates that the numbers of copies 2, 3 and 4 have been grouped together.

Be careful: I am not really sure of the results in this part. The interpretation of the regression coefficients seems really questionable, particularly because of the collinearity problem and the very large number of variables, sometimes with small sample sizes. I would therefore encourage you to be very careful.

A total of 6 quasibinomial regression models are created using the wins and losses of each deck:

  • Two models using the deck as a whole (maindeck and sideboard):

    • Comparing, for each card, presence (+1) vs absence (0).

    • Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example).

  • Four separate models, 2 for the maindeck and 2 for the sideboard:

    • Comparing, for each card, presence (+1) vs absence (0).

    • Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example).

These different models are then used to determine the 7 complete decks (maindeck and sideboard) with the highest probability of victory for each archetype (weeks are expressed as the last 2 digits of the year, followed by the week of the year).

The 7 maindecks and 7 sideboards with the highest probability of victory are also presented for each archetype. Warning: this second part can lead to inconsistent combinations. It seemed useful for exploring the maindecks and sideboards separately.

The table shows the top 7 decks:

  • First, the base cards (present in all decklists).

  • Variable cards are shown as: card name, average number of copies [minimum; maximum number of copies], number of base copies* (if the card is also among the base cards).

3.3.2 Top 8 deck

This chapter is divided by week over the last 3 weeks (weeks are expressed as the last 2 digits of the year, followed by the week of the year). For each week, the tournaments with more than 64 players are presented. For each tournament:

  • A bar graph shows the presence of each archetype and base archetype, as well as their win rate and some additional information in tooltips.

  • A table shows the top 8 decks, their base archetype, their Archetype, the player (which is a link to the decklist), and the decklist itself.

3.4 New card

This chapter focuses on the cards that have recently entered the format (within the last 5 months). The aim is to present the number of times they have been included in decks and their win rates. The file is split into 3 parts:

  • A first part aggregating all the cards whether they are maindeck or sideboard and whatever the archetypes.

  • The second part is stratified by archetype and shows the presence and winrate of new cards when they are present in the main deck.

  • The third part is stratified by archetype and shows the presence and winrate of new cards when they are present in the sideboard.

For parts 2 and 3, the win rates of the cards are computed only from results with a known number of wins and losses (which excludes 5-0 leagues), whereas the presence of a card also counts 5-0 leagues.

4 Archetype aggregation

For the grouping of decks, the analyses are mainly centred around 2 concepts: archetype and base archetype. Base archetypes are very close to the archetypes returned by the XXX parser. The archetypes are a personal construction that tries to solve two problems:

  • Giving more flexibility to predict certain decks considered unknown by the parser.
  • Grouping together decks with a small number of players that are very close to a deck with a larger number of players.

Decks with banned cards or with 40 copies or more of a single card are excluded.

4.1 Predict model

5 models were trained on decks with a defined archetype over the last 6 months, or over the entire period of interest if it was longer than 6 months, with 5-fold cross-validation. The hyperparameters of each model were chosen by grid search.

  • C5 decision tree
  • Random forest
  • Elastic net regression
  • KNN
  • XGBoost

Then each model predicted the ‘unknown’ decks and the decks whose archetype has a low sample size, returning a probability that the deck belongs to each training archetype. The results were aggregated by averaging, across models, the probability that a deck belongs to each training archetype. Decks with an average probability greater than 0.3 were assigned to the archetype that is most likely on average according to the models.
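
The aggregation step described above can be sketched as follows. The input layout (one dictionary of probabilities per model) and the function name are assumptions for illustration; only the averaging and the 0.3 cut-off come from the text.

```python
def assign_archetype(model_probs, threshold=0.3):
    """Average per-model probabilities and pick the most likely archetype.

    model_probs: list of {archetype: probability}, one dict per model.
    Returns the archetype name, or None if the best average probability
    does not exceed `threshold`.
    """
    if not model_probs:
        return None
    archetypes = set().union(*model_probs)
    mean = {
        a: sum(p.get(a, 0.0) for p in model_probs) / len(model_probs)
        for a in archetypes
    }
    best = max(mean, key=mean.get)
    return best if mean[best] > threshold else None
```

A deck for which no archetype reaches the threshold stays unassigned (returned as `None` here), which matches the idea that low-confidence predictions are not integrated.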

Table summarising how the archetypes are aggregated.
Custom corresponds to my definition of archetypes, also shown as Base_archetype in the data.
Reference corresponds to Badaro's definition of archetypes, also shown as Reference_archetype in the data.

Archetype (n) | Parser: Custom | Parser: Reference | Custom: Percent Archetype | Custom: Percent Sub Archetype | Reference: Percent Archetype | Reference: Percent Sub Archetype
Initiative (n: 1238) | Initiative | Initiative | 1016/1238(82.1%) | 1016/1016(100%) | 983/1238(79.4%) | 983/983(100%)
Initiative (n: 1238) | Aggrovine | Aggrovine | 9/1238(0.7%) | 9/9(100%) | 9/1238(0.7%) | 9/9(100%)
Initiative (n: 1238) | Other Aggro | Other Aggro | 158/1238(12.8%) | 158/158(100%) | 158/1238(12.8%) | 158/158(100%)
Initiative (n: 1238) | Red Prison | Red Prison | 46/1238(3.7%) | 46/46(100%) | 46/1238(3.7%) | 46/46(100%)
Initiative (n: 1238) | Eldrazi | Eldrazi | 5/1238(0.4%) | 5/40(12.5%) | 5/1238(0.4%) | 5/40(12.5%)
Initiative (n: 1238) | Initiative | Unknown | 1016/1238(82.1%) | 1016/1016(100%) | 33/1238(2.7%) | 33/58(56.9%)
Initiative (n: 1238) | Other Combo | Other Combo | 4/1238(0.3%) | 4/6(66.7%) | 4/1238(0.3%) | 4/6(66.7%)
Jewel Shops (n: 1211) | Other Shops | Other Shops | 29/1211(2.4%) | 29/39(74.4%) | 29/1211(2.4%) | 29/39(74.4%)
Jewel Shops (n: 1211) | Jewel Shops | Jewel Shops | 719/1211(59.4%) | 719/719(100%) | 719/1211(59.4%) | 719/719(100%)
Jewel Shops (n: 1211) | Eldrazi | Eldrazi | 23/1211(1.9%) | 23/40(57.5%) | 23/1211(1.9%) | 23/40(57.5%)
Jewel Shops (n: 1211) | Raker Shops | Raker Shops | 440/1211(36.3%) | 440/440(100%) | 440/1211(36.3%) | 440/440(100%)
Esper Lurrus Control (n: 1097) | Esper Lurrus Control | Esper Lurrus Control | 814/1097(74.2%) | 814/814(100%) | 811/1097(73.9%) | 811/811(100%)
Esper Lurrus Control (n: 1097) | Other Lurrus | Other Lurrus | 258/1097(23.5%) | 258/258(100%) | 258/1097(23.5%) | 258/258(100%)
Esper Lurrus Control (n: 1097) | Merfolk | Merfolk | 23/1097(2.1%) | 23/25(92%) | 23/1097(2.1%) | 23/25(92%)
Esper Lurrus Control (n: 1097) | Other Shops | Other Shops | 1/1097(0.1%) | 1/39(2.6%) | 1/1097(0.1%) | 1/39(2.6%)
Esper Lurrus Control (n: 1097) | Esper Lurrus Control | Unknown | 814/1097(74.2%) | 814/814(100%) | 3/1097(0.3%) | 3/58(5.2%)
Esper Lurrus Control (n: 1097) | Other Combo | Other Combo | 1/1097(0.1%) | 1/6(16.7%) | 1/1097(0.1%) | 1/6(16.7%)
Dredge (n: 780) | Dredge | Dredge | 780/780(100%) | 780/780(100%) | 780/780(100%) | 780/780(100%)
Lurrus PO (n: 684) | PO | PO | 145/684(21.2%) | 145/145(100%) | 145/684(21.2%) | 145/145(100%)
Lurrus PO (n: 684) | Lurrus Vault Key | Lurrus Vault Key | 53/684(7.7%) | 53/53(100%) | 53/684(7.7%) | 53/53(100%)
Lurrus PO (n: 684) | Lurrus PO | Unknown | 486/684(71.1%) | 486/486(100%) | 3/684(0.4%) | 3/58(5.2%)
Lurrus PO (n: 684) | Lurrus PO | Lurrus PO | 486/684(71.1%) | 486/486(100%) | 483/684(70.6%) | 483/483(100%)
UB Lurrus Control (n: 662) | UB Lurrus Control | UB Lurrus Control | 662/662(100%) | 662/662(100%) | 662/662(100%) | 662/662(100%)
Breach (n: 564) | Lurrus Breach | Lurrus Breach | 199/564(35.3%) | 199/203(98%) | 199/564(35.3%) | 199/199(100%)
Breach (n: 564) | Tinker | Tinker | 134/564(23.8%) | 134/134(100%) | 134/564(23.8%) | 134/134(100%)
Breach (n: 564) | Breach | Breach | 230/564(40.8%) | 230/230(100%) | 230/564(40.8%) | 230/230(100%)
Breach (n: 564) | Other Combo | Other Combo | 1/564(0.2%) | 1/6(16.7%) | 1/564(0.2%) | 1/6(16.7%)
Oath (n: 428) | Oath | Oath | 428/428(100%) | 428/428(100%) | 428/428(100%) | 428/428(100%)
Doomsday (n: 407) | Doomsday | Doomsday | 407/407(100%) | 407/407(100%) | 407/407(100%) | 407/407(100%)
Sphere Shops (n: 378) | Sphere Shops | Sphere Shops | 369/378(97.6%) | 369/369(100%) | 369/378(97.6%) | 369/369(100%)
Sphere Shops (n: 378) | Other Shops | Other Shops | 9/378(2.4%) | 9/39(23.1%) | 9/378(2.4%) | 9/39(23.1%)
Sultai (n: 374) | Sultai Midrange | Sultai Midrange | 255/374(68.2%) | 255/255(100%) | 255/374(68.2%) | 255/255(100%)
Sultai (n: 374) | Lurrus DRS | Lurrus DRS | 119/374(31.8%) | 119/119(100%) | 119/374(31.8%) | 119/119(100%)
Lurrus Breach (n: 193) | Blue Control | Blue Control | 189/193(97.9%) | 189/189(100%) | 189/193(97.9%) | 189/189(100%)
Lurrus Breach (n: 193) | Lurrus Breach | Unknown | 4/193(2.1%) | 4/203(2%) | 4/193(2.1%) | 4/58(6.9%)
Counter Vine (n: 136) | Countervine | Countervine | 136/136(100%) | 136/136(100%) | 136/136(100%) | 136/136(100%)
PO (n: 119) | Beseech Storm | Beseech Storm | 119/119(100%) | 119/119(100%) | 119/119(100%) | 119/119(100%)
Scam (n: 74) | Scam | Scam | 74/74(100%) | 74/74(100%) | 73/74(98.6%) | 73/73(100%)
Scam (n: 74) | Scam | Unknown | 74/74(100%) | 74/74(100%) | 1/74(1.4%) | 1/58(1.7%)
Oops All Spells (n: 72) | Oops All Spells | Oops All Spells | 72/72(100%) | 72/72(100%) | 72/72(100%) | 72/72(100%)
Unknown (n: 14) | Unknown | Unknown | 13/14(92.9%) | 13/14(92.9%) | 13/14(92.9%) | 13/58(22.4%)
Other Aggro (n: 12) | Eldrazi | Eldrazi | 12/12(100%) | 12/40(30%) | 12/12(100%) | 12/40(30%)
Merfolk (n: 2) | Merfolk | Merfolk | 2/2(100%) | 2/25(8%) | 2/2(100%) | 2/25(8%)

Invalid deck (< 60 cards)

Archetype (n) | Parser: Custom | Parser: Reference | Custom: Percent Archetype | Custom: Percent Sub Archetype | Reference: Percent Archetype | Reference: Percent Sub Archetype
Unknown (n: 14) | Unknown | Unknown | 1/14(7.1%) | 1/14(7.1%) | 1/14(7.1%) | 1/58(1.7%)

4.2 Proximity aggregation

If the median Jaccard distance between 2 archetypes is smaller than the third quartile of the internal distances within the archetype, these 2 archetypes are grouped together. The table below shows the grouped archetypes:

Proximity aggregation

Total archetype name | Base archetype name group
Breach | Lurrus Breach
Breach | Tinker
Esper Lurrus Control | Other Lurrus
Initiative | Other Aggro
Jewel Shops | Other Shops
Jewel Shops | Raker Shops
Lurrus Breach | Blue Control
Lurrus PO | Lurrus Vault Key
Lurrus PO | PO
Other Aggro | Eldrazi
PO | Beseech Storm
Sultai | Lurrus DRS
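
The proximity rule described in this section could be sketched as follows. Several details are assumptions made for illustration: decks are treated as sets of card names (copy counts are ignored), the internal distances are taken within the first archetype only, and the function names are hypothetical.

```python
from itertools import combinations, product
from statistics import median, quantiles


def jaccard_distance(deck_a, deck_b):
    """1 - |A ∩ B| / |A ∪ B| on sets of card names."""
    a, b = set(deck_a), set(deck_b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def should_group(decks_x, decks_y):
    """True if median cross distance < third quartile of distances within X."""
    cross = [jaccard_distance(a, b) for a, b in product(decks_x, decks_y)]
    internal = [jaccard_distance(a, b) for a, b in combinations(decks_x, 2)]
    if not cross or len(internal) < 2:
        return False  # not enough decks to compare
    third_quartile = quantiles(internal, n=4)[2]
    return median(cross) < third_quartile
```

Intuitively, two archetypes are merged when a typical deck of one is no further from the other archetype than that archetype's own decks are from each other.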