Vintage data analysis

Published

August 12, 2025

Please note that this document is not optimised and is difficult to read on mobile.


1 Acknowledgements

Thanks to Badaro for his data collection work and the archetype parser.
Thanks to Aliquanto for his initial analysis, which served as an inspiration for me.
Thanks to Jiliac for integrating Qonfused data and maintaining Format data.

2 Data included

This book contains Vintage format data since 26/08/2024. If the period in question contains ban announcements, decks containing cards that are not legal have been excluded, as have the associated matchups. Decks containing more than 30 copies of a single card have also been excluded. For decklist analyses, only decks valid for the format have been included (number of main deck cards >= 60 and number of sideboard cards <= 15).
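As a rough illustration, the filters above could be sketched as follows. The deck representation (two dictionaries mapping card names to counts), the function name and every parameter are assumptions made for this example, not the code used for the book. The sketch also assumes that a main deck needs at least 60 cards, as the "Invalid deck (< 60 cards)" table later in this document suggests, and that copies are counted across main deck and sideboard together.

```python
def is_valid_deck(main, side, banned=frozenset(), max_copies=30,
                  min_main=60, max_side=15):
    """Return True if a deck passes the (assumed) filters described above."""
    # Exclude decks playing a card that is not legal for the period studied.
    if any(card in banned for card in list(main) + list(side)):
        return False
    # Exclude decks with an implausible number of copies of a single card
    # (counted over main deck and sideboard together: an assumption).
    totals = dict(main)
    for card, n in side.items():
        totals[card] = totals.get(card, 0) + n
    if any(n > max_copies for n in totals.values()):
        return False
    # Size constraints used for the decklist analyses.
    return sum(main.values()) >= min_main and sum(side.values()) <= max_side
```

The thresholds are kept as parameters so that they can be adjusted without touching the logic.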

3 Main chapters

3.1 Metagame

3.1.1 Archetype presence

This chapter shows how the representation of the different archetypes changes over time. Leagues are excluded from this analysis, except in the parts where they are explicitly included.

The file is split into 3 parts:

The first part shows the presence curves over time for each archetype or base archetype (weeks are written as the last 2 digits of the year, then the week of the year). Archetypes whose presence is below 0.25% are hidden by default but can be displayed by clicking on the desired decks.

Leagues are included in this part.
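To make the week notation concrete, here is a minimal sketch of how such a label could be built. It assumes ISO week numbering, which the text does not specify, and `week_label` is a name chosen for this example.

```python
from datetime import date

def week_label(day):
    """Format a date as 'YY.WW' using ISO week numbering (an assumption)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year % 100:02d}.{iso_week:02d}"

print(week_label(date(2025, 8, 12)))  # 25.33
```

With this convention, the publication date of this document falls in week 25.33.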

The second part shows a bar chart of the presence of the different archetypes (an archetype is classed as Other if its count is below the minimum of 50 or 1%) and base archetypes for different time intervals:

all data
one month
two weeks
one week

Additional information is available in the tooltip (for archetypes and base archetypes):

Number of copies of the deck
The delta in percent compared with the next longer time interval
Deck rank and its evolution compared with the previous time interval
Win rates and confidence intervals
- The confidence interval graphs show the averages and 95% confidence intervals (calculated using the Agresti-Coull method).
- The vertical red line represents the mean of the winrates and the dotted blue lines represent the mean of the upper and lower bounds of the confidence interval.
- The average win rate and CI of the top 10 players (the 10 players with the highest lower bound of the winrate CI; a player needs at least 20 rounds for an archetype and 10 for a base archetype) are shown above the error bar.
- Because only the top 32 is published for MTGO results, the winrates were overestimated; the winrates were therefore centred.
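For readers unfamiliar with the Agresti-Coull method, the interval can be sketched as below. This is the standard textbook formula, not necessarily the exact code used for the graphs; the function name is hypothetical and the centring of the winrates is not reproduced here.

```python
import math

def agresti_coull(wins, n, z=1.959963984540054):
    """Agresti-Coull interval for a win rate, clipped to [0, 1].

    wins: number of matches won, n: number of matches played,
    z: normal quantile (default: 95% two-sided interval).
    """
    n_adj = n + z ** 2                     # adjusted sample size
    p_adj = (wins + z ** 2 / 2) / n_adj    # adjusted win rate
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

low, high = agresti_coull(60, 100)  # roughly 0.50 and 0.69
```

The lower bound returned here is the kind of value used to rank the top 10 players.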

The last part presents:

The representation of each colour combination in the format
The number of targets for 2 cmc black removal
The presence of different cards in the format

Leagues are included in this part.

3.1.2 Matrix WR

This chapter focuses on the data for which we know the result of each match and the Archetype of the opponent.

In order to be included, an archetype must be represented more than 50 times in the dataset.

The matrix considers each match as a whole (for example, a 2-1 score counts as 1 match won).
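A minimal sketch of this match-level counting is shown below. The input layout (one tuple per match, seen from the first player's side) and the function name are assumptions for illustration, and drawn matches are simply skipped, which the text does not specify.

```python
from collections import defaultdict

def matchup_matrix(matches):
    """Count match wins and matches played per (archetype, opponent) pair.

    `matches` is an iterable of (archetype, opponent, games_won, games_lost).
    Returns {(archetype, opponent): (matches_won, matches_played)}.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for archetype, opponent, games_won, games_lost in matches:
        if games_won == games_lost:
            continue  # drawn matches are skipped in this sketch (an assumption)
        played[(archetype, opponent)] += 1
        if games_won > games_lost:
            wins[(archetype, opponent)] += 1  # a 2-1 score is 1 match won
    return {pair: (wins[pair], n) for pair, n in played.items()}
```

Dividing the first number by the second gives the win rate of one cell of the matrix.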

Part one focuses on archetypes (aggregated) and part two on base archetypes (parser archetypes).

They are built on the following model (additional information in tooltips), with one tab for all data and one tab for the last month of data:

Summary of data.
A bar chart shows the presence of each archetype and base archetype, as well as their win rate and some additional information in tooltips.
The confidence interval graphs show the average winrates (without mirror matches) and 95% confidence intervals (calculated using the Agresti-Coull method). The vertical red line represents the mean of the winrates and the dotted blue lines represent the mean of the upper and lower bounds of the confidence interval.
- The average win rate and CI of the top 10 players (the 10 players with the highest lower bound of the winrate CI; a player needs at least 20 rounds for an archetype and 10 for a base archetype) are shown above the error bar.
A complete matrix with all the information.
Multiple tables with the win rate of cards per matchup (only valid decks and matchups with more than 10 games are presented): the first concentrates on the aggregated macro archetypes and the second presents the sub-archetypes. Each part is organised in the same way, repeated twice, once for the maindeck and once for the sideboard.
Base cards: These are the cards present in decks almost exclusively in a given number of copies (fewer than 10 decks do not have the most common count).
Side/Mainboard cards: Cards present in variable numbers in the decks.

The third part explores the notion of the best deck according to a given metagame, using the winrates computed from the complete matches in the dataset and the presence of each archetype over time (weeks are written as the last 2 digits of the year, then the week of the year).

In order to determine an expected number of victories, 2 criteria are used: the average winrate and the lower bound of the confidence interval.

Please note that this part is still under construction, as some decks with too few matchups are included.
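One plausible reading of this "best deck for a given metagame" idea is a presence-weighted average of matchup winrates, sketched below. The weighting scheme, the handling of unknown matchups and the function name are all assumptions; the actual computation used in the book may differ.

```python
def expected_winrate(matchup_wr, presence):
    """Presence-weighted win rate of one deck against a metagame.

    matchup_wr: opponent archetype -> win rate of the deck against it
                (either the average or the lower CI bound).
    presence:   opponent archetype -> share of the metagame.
    Opponents without a known matchup are ignored and the remaining
    shares are renormalised (an assumption of this sketch).
    """
    known = [opp for opp in presence if opp in matchup_wr]
    total = sum(presence[opp] for opp in known)
    if total == 0:
        raise ValueError("no opponent with a known matchup")
    return sum(matchup_wr[opp] * presence[opp] for opp in known) / total
```

Multiplying the result by a number of rounds gives an expected number of victories; running it once with average winrates and once with lower CI bounds gives the 2 criteria mentioned above.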

3.2 Deck winrate

3.2.1 Card win rate table

Presents the win rate of each card in each archetype in the form of multiple tables.

The definition of each column is given in a tooltip accessible by passing the cursor over the column names.

This file is split into 2 parts: the first concentrates on the aggregated macro archetypes and the second presents the sub-archetypes.

Each part is organised in the same way, repeated twice, once for the maindeck and once for the sideboard.

Base cards: These are the cards present in decks almost exclusively in a given number of copies (fewer than 10 decks do not have the most common count).
Side/Mainboard cards: Cards present in variable numbers in the decks

3.2.2 Cards WR models

This analysis attempts to use regression to determine the cards with the best performance within each archetype or base archetype.

A binomial regression is initially trained on a set of decks. In order to be included in this analysis, an archetype must be present at least 50 times in the dataset.

In order to be considered, a card must be included at least 50 times in either the main deck or the sideboard, the two being considered separately. In models comparing the number of copies of each card, when a number of copies is present fewer than 50 times, it is grouped with an adjacent number of copies. For example, a card that is present 32 times in 1 copy, 200 times in 2 copies, 15 times in 3 copies and 47 times in 4 copies would lead to the following result: 1/2 : 232 and 3/4 : 62. The formulation 2-4 indicates that the numbers of copies 2, 3 and 4 have been grouped together.
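The grouping rule is not fully specified, so the sketch below shows only one possible strategy that reproduces the example: walk through the copy counts in increasing order and close a group as soon as it reaches the minimum size, merging any leftover tail into the previous group. The function name, the greedy strategy and the label format ("/" for two counts, "-" for a longer range) are assumptions.

```python
def group_counts(decks_per_count, min_size=50):
    """Merge adjacent copy counts until each group has at least `min_size` decks.

    decks_per_count maps a number of copies to the number of decks playing it.
    Returns a list of (label, total) pairs, e.g. ("1/2", 232).
    """
    groups = []                      # finished groups: (copy counts, total)
    current, total = [], 0
    for copies in sorted(decks_per_count):
        current.append(copies)
        total += decks_per_count[copies]
        if total >= min_size:        # group is large enough: close it
            groups.append((current, total))
            current, total = [], 0
    if current:                      # leftover tail below the threshold
        if groups:                   # merge it into the previous group
            prev, prev_total = groups.pop()
            groups.append((prev + current, prev_total + total))
        else:
            groups.append((current, total))

    def label(counts):
        if len(counts) == 1:
            return str(counts[0])
        sep = "/" if len(counts) == 2 else "-"
        return f"{counts[0]}{sep}{counts[-1]}"

    return [(label(counts), n) for counts, n in groups]

print(group_counts({1: 32, 2: 200, 3: 15, 4: 47}))
# [('1/2', 232), ('3/4', 62)]
```

Other strategies (for example merging the smallest group first) would give the same result on this example but could differ on others.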

Be careful, this part leads to results that I’m not really sure of. The interpretation of the regression coefficients seems really questionable, particularly in relation to the collinearity problem and the very large number of variables with sometimes small sample sizes. I would therefore encourage you to be very careful.

Models are created separately for the maindeck, for the sideboard, and for the maindeck and sideboard pooled together (Total 75), according to the following scheme:

Base cards, i.e. cards systematically present in decks with an almost fixed number of copies (fewer than 50 decks do not have the most common number of copies; decks with zero copies are grouped with the majority class). For the base cards whose number of copies varies, quasibinomial regression models are created using the wins and losses of each deck:
- Comparing, for each card, presence (Most common count) vs absence (Other)
- Comparing each card count with a sufficient sample size (Most common count vs 1 vs 3-4, for example)
Uncommon cards: these cards are not always included in decks. Quasibinomial regression models are created using the wins and losses of each deck:
- Comparing, for each card, presence (+1) vs absence (0)
- Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example)

3.3 Best performing deck

3.3.1 Best deck analysis

This analysis attempts to use regression to determine the decks with the best performance within each archetype or base archetype.

A binomial regression is initially trained on a set of decks. In order to be included in this analysis, an archetype must be present at least 50 times in the dataset.

In order to be considered, a card must be included at least 25 times in either the main deck or the sideboard, the two being considered separately. In models comparing the number of copies of each card, when a number of copies is present fewer than 25 times, it is grouped with an adjacent number of copies. For example, a card that is present 32 times in 1 copy, 200 times in 2 copies, 15 times in 3 copies and 47 times in 4 copies would lead to the following result: 1/2 : 232 and 3/4 : 62. The formulation 2-4 indicates that the numbers of copies 2, 3 and 4 have been grouped together.

Be careful, this part leads to results that I’m not really sure of. The interpretation of the regression coefficients seems really questionable, particularly in relation to the collinearity problem and the very large number of variables with sometimes small sample sizes. I would therefore encourage you to be very careful.

A total of 6 quasibinomial regression models are created using the wins and losses of each deck:

Two models using the deck as a whole (maindeck and sideboard)
- Comparing, for each card, presence (+1) vs absence (0)
- Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example)
Four separate models, 2 for the maindeck and 2 for the sideboard
- Comparing, for each card, presence (+1) vs absence (0)
- Comparing each card count with a sufficient sample size (0 vs 1 vs 3-4, for example)

These different models are then used to determine the 7 complete decks (maindeck and sideboard) with the highest probability of victory for each archetype (weeks are written as the last 2 digits of the year, then the week of the year).

The 7 maindecks and 7 sideboards with the highest probability of victory are also presented for each archetype. Warning: this second part can lead to inconsistent combinations. It seemed useful if you want to explore the maindecks and sideboards separately.

A table shows the top 7 decks:

First, the base cards (present in all decklists).
Variable cards are shown as: card name, average number of cards [minimum; maximum number of cards], number of base cards* (if this card is also in the base cards)

3.3.2 Top 8 deck

This chapter is divided by week over the last 3 weeks (weeks are written as the last 2 digits of the year, then the week of the year). For each week, the different tournaments with more than 64 players are presented.

For each tournament, a bar graph shows the presence of each archetype and base archetype, as well as their win rate and some additional information in tooltips.

A table shows the top 8 decks, their base archetype, their archetype, the player (which is a link to the decklist), and the decklist itself.

3.4 New card

This chapter focuses on the cards that have recently entered the format (in the last 5 months). The aim is to present the number of times they have been included in decks and their winrates. The file is split into 3 parts:

The first part aggregates all the cards, whether they are in the maindeck or the sideboard and whatever the archetype.
The second part is stratified by archetype and shows the presence and winrate of new cards when they are present in the main deck.
The third part is stratified by archetype and shows the presence and winrate of new cards when they are present in the sideboard.

For parts 2 and 3, the win rates of the cards are only computed for results with a known number of wins and losses (which excludes 5-0 leagues), but the presence of a card also includes 5-0 leagues.

4 Archetype aggregation

For the grouping of decks, the analyses are mainly centred around 2 concepts: archetype and base archetype. Base archetypes are very close to the archetypes returned by the XXX parser. The archetypes are a personal construction to try to solve two problems:

Giving more flexibility to predict certain decks considered unknown by the parser
Grouping together decks with a small number of players that would be very close to a deck with a larger number of players

Decks with banned cards or with 40 copies or more of a single card are excluded.

4.1 Predict model

5 models were trained on decks with a defined archetype over the last 6 months, or over the entire period of interest if it was longer than 6 months, with 5-fold cross-validation. The hyperparameters of each model were chosen by grid search.

C5 decision tree
Random forest
Elastic net regression
KNN
XGBoost

Then, for the ‘unknown’ decks and the decks whose archetype has a low sample size, each model returned a probability that the deck belongs to each training archetype. The results were aggregated by averaging, across models, the probability that a deck belonged to each training archetype. Decks with an average probability greater than 0.3 were assigned to the archetype that was most likely on average according to the models.
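The aggregation step can be sketched as follows. The model outputs are represented here as plain dictionaries and the function name is hypothetical; only the averaging and the 0.3 threshold come from the description above.

```python
def assign_archetype(model_probs, threshold=0.3):
    """Average per-model probabilities and pick the most likely archetype.

    model_probs: non-empty list with one dict per model, mapping
                 archetype -> probability for a single deck.
    Returns the chosen archetype, or None when the best average
    probability does not exceed `threshold`.
    """
    archetypes = set().union(*model_probs)
    mean = {
        a: sum(p.get(a, 0.0) for p in model_probs) / len(model_probs)
        for a in archetypes
    }
    best = max(mean, key=mean.get)
    return best if mean[best] > threshold else None
```

A deck for which no archetype reaches the threshold stays unassigned, which corresponds to the remaining Unknown rows in the table below.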

Table summarising how the archetypes are aggregated
Custom corresponds to my definition of archetypes, also shown as Base_archetype in the data. Reference corresponds to Badaro's definition of archetypes, also shown as Reference_archetype in the data.
Archetype (n)	Parser: Custom	Parser: Reference	Custom: Percent Archetype	Custom: Percent Sub Archetype	Reference: Percent Archetype	Reference: Percent Sub Archetype
Initiative (n :1414)	Initiative	Initiative	1145/1414(81%)	1145/1146(99.9%)	1100/1414(77.8%)	1100/1100(100%)
	Aggrovine	Aggrovine	9/1414(0.6%)	9/9(100%)	9/1414(0.6%)	9/9(100%)
	Other Aggro	Other Aggro	168/1414(11.9%)	168/169(99.4%)	168/1414(11.9%)	168/169(99.4%)
	Red Prison	Red Prison	47/1414(3.3%)	47/47(100%)	47/1414(3.3%)	47/47(100%)
	Eldrazi	Eldrazi	39/1414(2.8%)	39/45(86.7%)	39/1414(2.8%)	39/45(86.7%)
	Initiative	Unknown	1145/1414(81%)	1145/1146(99.9%)	45/1414(3.2%)	45/85(52.9%)
	Other Combo	Other Combo	4/1414(0.3%)	4/5(80%)	4/1414(0.3%)	4/5(80%)
Esper Lurrus Control (n :1173)	Esper Lurrus Control	Esper Lurrus Control	900/1173(76.7%)	900/901(99.9%)	895/1173(76.3%)	895/896(99.9%)
	Other Lurrus	Other Lurrus	249/1173(21.2%)	249/249(100%)	249/1173(21.2%)	249/249(100%)
	Merfolk	Merfolk	23/1173(2%)	23/25(92%)	23/1173(2%)	23/25(92%)
	Esper Lurrus Control	Unknown	900/1173(76.7%)	900/901(99.9%)	5/1173(0.4%)	5/85(5.9%)
Dredge (n :866)	Dredge	Dredge	866/866(100%)	866/866(100%)	866/866(100%)	866/866(100%)
Breach (n :865)	Lurrus Breach	Lurrus Breach	232/865(26.8%)	232/232(100%)	232/865(26.8%)	232/232(100%)
	Blue Control	Blue Control	190/865(22%)	190/190(100%)	190/865(22%)	190/190(100%)
	Tinker	Tinker	189/865(21.8%)	189/189(100%)	189/865(21.8%)	189/189(100%)
	Other Shops	Other Shops	4/865(0.5%)	4/41(9.8%)	4/865(0.5%)	4/41(9.8%)
	Breach	Breach	249/865(28.8%)	249/249(100%)	239/865(27.6%)	239/239(100%)
	Other Combo	Other Combo	1/865(0.1%)	1/5(20%)	1/865(0.1%)	1/5(20%)
	Breach	Unknown	249/865(28.8%)	249/249(100%)	10/865(1.2%)	10/85(11.8%)
Jewel Shops (n :831)	Jewel Shops	Jewel Shops	828/831(99.6%)	828/828(100%)	828/831(99.6%)	828/828(100%)
Jewel Shops (n :831)	Other Shops	Other Shops	3/831(0.4%)	3/41(7.3%)	3/831(0.4%)	3/41(7.3%)
UB Lurrus Control (n :821)	Scam	Scam	78/821(9.5%)	78/78(100%)	78/821(9.5%)	78/78(100%)
	UB Lurrus Control	UB Lurrus Control	743/821(90.5%)	743/743(100%)	741/821(90.3%)	741/741(100%)
	UB Lurrus Control	Unknown	743/821(90.5%)	743/743(100%)	2/821(0.2%)	2/85(2.4%)
Lurrus PO (n :777)	PO	PO	176/777(22.7%)	176/176(100%)	176/777(22.7%)	176/176(100%)
	Lurrus Vault Key	Lurrus Vault Key	53/777(6.8%)	53/53(100%)	53/777(6.8%)	53/53(100%)
	Lurrus PO	Unknown	548/777(70.5%)	548/548(100%)	2/777(0.3%)	2/85(2.4%)
	Lurrus PO	Lurrus PO	548/777(70.5%)	548/548(100%)	546/777(70.3%)	546/546(100%)
Raker Shops (n :545)	Other Shops	Other Shops	20/545(3.7%)	20/41(48.8%)	20/545(3.7%)	20/41(48.8%)
	Raker Shops	Raker Shops	521/545(95.6%)	521/521(100%)	521/545(95.6%)	521/521(100%)
	Eldrazi	Eldrazi	4/545(0.7%)	4/45(8.9%)	4/545(0.7%)	4/45(8.9%)
Doomsday (n :468)	Doomsday	Doomsday	468/468(100%)	468/468(100%)	468/468(100%)	468/468(100%)
Oath (n :454)	Oath	Oath	454/454(100%)	454/454(100%)	454/454(100%)	454/454(100%)
Sphere Shops (n :423)	Sphere Shops	Sphere Shops	409/423(96.7%)	409/409(100%)	409/423(96.7%)	409/409(100%)
Sphere Shops (n :423)	Other Shops	Other Shops	14/423(3.3%)	14/41(34.1%)	14/423(3.3%)	14/41(34.1%)
Sultai (n :396)	Sultai Midrange	Sultai Midrange	267/396(67.4%)	267/267(100%)	267/396(67.4%)	267/267(100%)
Sultai (n :396)	Lurrus DRS	Lurrus DRS	129/396(32.6%)	129/129(100%)	129/396(32.6%)	129/129(100%)
Counter Vine (n :148)	Countervine	Countervine	148/148(100%)	148/148(100%)	148/148(100%)	148/148(100%)
Tinker (n :129)	Beseech Storm	Beseech Storm	129/129(100%)	129/129(100%)	129/129(100%)	129/129(100%)
Oops All Spells (n :74)	Oops All Spells	Oops All Spells	74/74(100%)	74/74(100%)	74/74(100%)	74/74(100%)
Stiflenought (n :54)	Stiflenought	Stiflenought	54/54(100%)	54/54(100%)	54/54(100%)	54/54(100%)
Unknown (n :20)	Unknown	Unknown	12/20(60%)	12/20(60%)	12/20(60%)	12/85(14.1%)
Merfolk (n :2)	Merfolk	Merfolk	2/2(100%)	2/25(8%)	2/2(100%)	2/25(8%)
Eldrazi (n :2)	Eldrazi	Eldrazi	2/2(100%)	2/45(4.4%)	2/2(100%)	2/45(4.4%)

Invalid decks (< 60 cards): table summarising how the archetypes are aggregated
Archetype (n)	Parser: Custom	Parser: Reference	Custom: Percent Archetype	Custom: Percent Sub Archetype	Reference: Percent Archetype	Reference: Percent Sub Archetype
Initiative (n :1414)	Initiative	Unknown	1/1414(0.1%)	1/1146(0.1%)	1/1414(0.1%)	1/85(1.2%)
Initiative (n :1414)	Other Aggro	Other Aggro	1/1414(0.1%)	1/169(0.6%)	1/1414(0.1%)	1/169(0.6%)
Esper Lurrus Control (n :1173)	Esper Lurrus Control	Esper Lurrus Control	1/1173(0.1%)	1/901(0.1%)	1/1173(0.1%)	1/896(0.1%)
Unknown (n :20)	Unknown	Unknown	8/20(40%)	8/20(40%)	8/20(40%)	8/85(9.4%)

4.2 Proximity aggregation

If the median Jaccard distance between 2 archetypes is smaller than the third quartile of the internal distances within the archetype, these 2 archetypes are grouped together. The table below shows the grouped archetypes:

Proximity aggregation
Total archetype name	Base archetype name group
Breach	Blue Control
Breach	Lurrus Breach
Breach	Tinker
Esper Lurrus Control	Other Lurrus
Initiative	Other Aggro
Lurrus PO	Lurrus Vault Key
Lurrus PO	PO
Raker Shops	Other Shops
Sultai	Lurrus DRS
Tinker	Beseech Storm
UB Lurrus Control	Scam
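The proximity rule can be illustrated with the sketch below. It makes several assumptions that the text does not state: decks are reduced to sets of card names (copy counts are ignored), the internal distances are taken within the larger archetype, and quartiles use the default method of Python's `statistics.quantiles`. The function names are hypothetical.

```python
from itertools import combinations, product
from statistics import median, quantiles

def jaccard_distance(deck_a, deck_b):
    """1 - |A ∩ B| / |A ∪ B| for two decks given as sets of card names."""
    union = deck_a | deck_b
    return 1 - len(deck_a & deck_b) / len(union) if union else 0.0

def should_group(small, large):
    """Compare the median distance between two archetypes with the third
    quartile of the distances inside `large` (needs at least 3 decks)."""
    between = [jaccard_distance(a, b) for a, b in product(small, large)]
    within = [jaccard_distance(a, b) for a, b in combinations(large, 2)]
    q3 = quantiles(within, n=4)[2]   # third quartile of internal distances
    return median(between) < q3
```

With this rule, a small archetype is absorbed when its decks are, on the whole, no further from the large archetype than that archetype's own decks are from each other.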