Talk:AlphaZero

Dubious (AlphaZero performance in chess)

    "Only 1 GB RAM hash." - hash size is not a significant factor, this could have weakened Stockfish by only 10 Elo or so.

    "Non-standard time control" - the engines do not care.

    "No endgame tablebases." - just like AlphaZero. GregorB (talk) 15:56, 7 December 2017 (UTC)[reply]

    BTW, I don't even think that Kaufman's claim ("AlphaZero had effectively built its own opening book") is correct - it's not really a "book" in any meaningful sense of the word if one cannot extract its content, i.e. if there is no way to distinguish a book move from a non-book move. GregorB (talk) 16:00, 7 December 2017 (UTC)[reply]

    I don't know if -10 Elo is about right for the hash mis-sizing, but Figure 2 in the arXiv preprint suggests a 25-Elo advantage for AlphaZero over Stockfish at 1 s/move. A 10-Elo handicap sounds important if AlphaZero appears to be only 25 Elo better.

    Stockfish will normally take longer on moves which give rise to better or worse positions than expected; removing this ability will affect its play, since it expects to be able to do this. — Preceding unsigned comment added by 212.159.20.170 (talkcontribs) 17:01, 7 December 2017 (UTC)[reply]

    Removed the bullets - they are first and foremost WP:OR, dubious or not. Should not be restored without a supporting source. GregorB (talk) 18:56, 7 December 2017 (UTC)[reply]
    Any unsupported claims that the match was an "enormous victory" for AlphaZero should be considered dubious simply because the two engines were supported by asymmetric resources. Unless a match is specifically designed as a handicap match, chess games in general should be held with equal resources assigned to each player. Claims about the notability of this match should be in the article only if supported by references, not the opinion of (us) Wiki editors alone.—LithiumFlash (talk) 17:32, 13 December 2017 (UTC)[reply]
    The statement tagged is that 28 wins, no losses out of 100 games is "an enormous margin of victory". Nakamura in his criticism of the match stated that "I am pretty sure God himself could not beat Stockfish 75 percent of the time with White without certain handicaps", thus buttressing the point that it is an enormous margin of victory, so the statement seems both accurate and sourced.
    Nobody in the news coverage, to my knowledge, has complained that Stockfish was unfairly denied access to these TPUs, which it would in no meaningful way have been able to use, but if you can find a good source we can add this complaint to the article. Rolf H Nelson (talk) 03:25, 14 December 2017 (UTC)[reply]
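
    As a rough check on "enormous margin" (a sketch assuming the standard logistic Elo model, which none of the comments above spell out), 28 wins and 72 draws in 100 games implies a rating gap of about 100 points:

```python
import math

# 28 wins, 72 draws, 0 losses out of 100 games -> score fraction 0.64
score = (28 + 72 / 2) / 100
# invert the logistic Elo expectation to get the implied rating gap
elo_gap = -400 * math.log10(1 / score - 1)
print(round(elo_gap))  # ~100 Elo
```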
    The quotes in the reference certainly do not show that Nakamura regarded the AZ wins as an enormous victory. But any of his commentary can certainly be added to this article (as some of it is already in "Reaction").
    As for each engine's access to computing resources, there is already content in the article: "Stockfish was basically running on what would be my laptop. If you wanna have a match that's comparable you have to have Stockfish running on a super computer as well." (last paragraph in "Reaction"), and: "Stockfish...was playing with far more search threads than has ever received any significant amount of testing" (in Notes). The papers on this topic appear to indicate that games were played without the benefit of a tournament director, or a formal invitation for the developers of Stockfish to submit their preferred engine and settings.—LithiumFlash (talk) 15:52, 14 December 2017 (UTC)[reply]
    You're right, the "super computer" quote is decently sourced (to chess.com). I guess I would still personally advocate, though, per WP:DUE, an opinion that takes into account the possibility that porting, tuning, and testing Stockfish on a given supercomputer setup may be a large project, and that if all it does is allow Stockfish to look a couple more moves ahead, it won't make any significant difference if Stockfish still thinks it's looking at a draw in positions where even humans can see that it lost ten moves ago, when all its pieces got trapped in the corner. Rolf H Nelson (talk) 18:45, 16 December 2017 (UTC)[reply]
    Using a super-computer to pre-calculate various constants used in AlphaZero's neural network is not the same as using the super-computer at run-time during the tournament. JRSpriggs (talk) 03:37, 17 December 2017 (UTC)[reply]
    According to the preprint, end of page 4, AlphaZero used a single machine with 4 TPUs during actual play. A TPU is allegedly 15x to 30x faster than a modern GPU or CPU for certain machine-learning tasks, so calling a 4-TPU machine a supercomputer does not seem completely unreasonable, depending on your definition of supercomputer. Rolf H Nelson (talk) 23:28, 19 December 2017 (UTC)[reply]
    I believe the TPU Google used ran at 180 TFLOPS, as stated in the Wikipedia article on Google's TPUs. AlphaZero used 4 TPUs. I'd estimate that 4 TPUs are equivalent to roughly 4 * 3,000 = 12,000 modern high-end PCs. That's a bit of an advantage. It would be informative if that were added to the article. 108.93.181.106 (talk) 20:57, 27 February 2018 (UTC)[reply]
    The "180 TFLOPS" claim isn't as straightforward as it sounds, since we don't know how many bits of precision it refers to. The first version of the TPU was 8-bit integer only, not even floating point. Take a look at the last two paragraphs of the referenced article, which point out that the Nvidia Volta card provides 120 "deep learning TFLOPS", whatever that's supposed to mean. So for all we know it could be equivalent to a single PC with one or two Volta cards. 86.4.77.7 (talk) 22:26, 12 March 2018 (UTC)[reply]
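
    For clarity, here is the arithmetic behind the two readings in this thread, using the figures quoted above; the 0.06 TFLOPS desktop-CPU number is my own rough assumption, and the unspecified precisions are exactly why the estimates diverge:

```python
tpu_tflops = 180        # claimed for a first-generation TPU; precision unspecified
total = 4 * tpu_tflops  # 720 "TFLOPS" across AlphaZero's 4 TPUs

cpu_tflops = 0.06       # assumed modern high-end desktop CPU (rough ballpark)
volta_dl_tflops = 120   # Nvidia Volta's quoted "deep learning TFLOPS"

print(total / cpu_tflops)       # ~12,000 CPU-equivalents (the first estimate)
print(total / volta_dl_tflops)  # ~6 Volta-equivalents (closer to the second)
```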
    Stockfish is running on a general-purpose CPU. AlphaZero is running on specialized hardware (a TPU) that was specifically designed for this type of neural-network computation. — Preceding unsigned comment added by 23.240.1.247 (talk) 23:53, 23 November 2018 (UTC)[reply]

Training times

    I think the training times before the printed match results are 9h/12h/34h, not 4h/2h/8h.

    This edit appears correct; revisiting the preprint (pp. 4 and 15) makes me think that the matches used AlphaZero fully trained for 9h/12h/34h in chess/shogi/Go. Does that sound right? The Guardian states "It took just four hours to learn the rules to chess before beating the world champion chess program, Stockfish 8, in a 100-game match up", but maybe they were confused, or maybe they were talking about the 1-second-per-move Elo games? In any case, we should just go ahead and report the preprint values instead for the reported 100-game tournament results, right?

    From the preprint:

    p. 4:

    (Figure 1 shows at what point AlphaZero starts to outperform the other programs)

    In chess, AlphaZero outperformed Stockfish after just 4 hours (300k steps); in shogi, AlphaZero outperformed Elmo after less than 2 hours (110k steps); and in Go, AlphaZero outperformed AlphaGo Lee (29) after 8 hours (165k steps).

    We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively, playing 100 game matches at tournament time controls of one minute per move.

    p. 15:

    (Table S3 contains the 9h/12h/34h figure)

    Elo ratings were computed from the results of a 1 second per move tournament between iterations of AlphaZero during training, and also a baseline player: either Stockfish, Elmo or AlphaGo Lee respectively. The Elo rating of the baseline players was anchored to publicly available values.

    We also measured the head-to-head performance of AlphaZero against each baseline player. Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move...

    Rolf H Nelson (talk) 02:17, 2 January 2018 (UTC)[reply]

    I blind-emailed a source within DeepMind who graciously confirmed for me that they "used the full length of training (otherwise it would tautologically always be about the same performance as the benchmark)", and that the journal article would make it easier to follow the distinction between the two sets of times than the preprint did. So the 9h/12h/34h is correct for the reported 100-game match results. Rolf H Nelson (talk) 16:34, 6 January 2018 (UTC)[reply]


    Leela Zero

    @Rolf h nelson: please explain what kind of sources you are looking for with this edit. Are you looking for a non-Chessdom source that Leela Zero is based on Alpha Zero? That it was the first NN-based engine that competed? That Leela Zero went +1 =2 -25 in TCEC season 12 Div 4? Banedon (talk) 10:11, 26 April 2018 (UTC)[reply]

    Wikipedia is not an indiscriminate collection of trivia; instead the content is based on prominent reliable sources such as mainstream news articles and prominent peer-reviewed journal articles. You need to present us with a stronger source, in order to demonstrate to your fellow editors that the existence of Leela Zero provides substantial insight about AlphaZero, that it would be of interest to people looking to read about AlphaZero, or that it is otherwise considered important enough to note in this article about AlphaZero. If that can be established with a source of sufficient WP:WEIGHT, I'm fine with Chessdom filling in background information. Rolf H Nelson (talk) 03:10, 27 April 2018 (UTC)[reply]
    Why would the existence of Leela Zero not provide substantial insight about AlphaZero? AlphaZero is a private Google project, closed-source and already retired. Unless there's something AlphaZero is doing that Leela isn't, Leela should eventually reach the same strength. Leela is not only open source, it also generates tons of games for people to pore over. There's a reason why Leela attracts so much attention in the chess engine community. I find your argument very weak. If you continue to disagree with this I'll notify WP:Chess for more opinions. Banedon (talk) 04:07, 27 April 2018 (UTC)[reply]
    Please review WP:WEIGHT; we should defer to the lack of mainstream media coverage and hold off until we see more sources. Rolf H Nelson (talk) 20:56, 28 April 2018 (UTC)[reply]
    When your argument is "I don't personally see that the existence of Leela Zero provides substantial insight about AlphaZero", skipping past everything I wrote, there's nothing to discuss. Notifying WP:Chess. Banedon (talk) 21:31, 28 April 2018 (UTC)[reply]
    Sorry for coming late to this. It appears that LC0 is not by the same programmers as A0, and indeed, the programmers of LC0 relied exclusively on the published paper for their information about A0, which, as they acknowledged, gave them an incomplete picture. LC0 can give us insight about LC0, but not about A0. Bruce leverett (talk) 15:20, 24 May 2018 (UTC)[reply]
    Didn't know we had a Leela Chess Zero article. In that case yeah, I agree this information should be in that article and not this one. Banedon (talk) 04:02, 28 May 2018 (UTC)[reply]

    Huge misunderstanding

    The article says "AlphaZero searches just 80,000 positions per second in chess". AlphaZero is made of a tree search and a neural network. The tree search itself makes 80,000 calls per second to the neural network. DeepMind has no idea how many positions per second each neural-network call effectively looks at. For example, if on average each NN call were equivalent to looking at a million positions, then AlphaZero would be looking at 80,000 * 1,000,000 = 80 billion positions per second. The problem is that it's nearly impossible to know what's happening inside the NN, but it's guaranteed that, through its learning period, the NN has learned to analyze the board during each of the 80,000 calls per second. I hope someone can correct this error in the article. — Preceding unsigned comment added by 23.240.1.247 (talk) 23:39, 23 November 2018 (UTC)[reply]

    You sure about this? My understanding is that the neural network takes a position as input, so with 80k calls to the neural network, that's equivalent to 80k positions searched. Banedon (talk) 02:03, 9 December 2018 (UTC)[reply]
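
    To make the counting concrete, here is a minimal, self-contained sketch of an AlphaZero-style search loop (illustrative only, not DeepMind's code; the game and network are dummies). Each simulation selects a leaf, makes exactly one network call on that one position, expands it, and backpropagates the value, so "network calls per second" and "positions evaluated per second" are the same number:

```python
import math
import random
import time

class Node:
    """One game position in the search tree."""
    def __init__(self, parent=None, prior=1.0):
        self.parent = parent
        self.prior = prior
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

def evaluate(node):
    """Stand-in for AlphaZero's network: one forward pass evaluates
    ONE position, returning a value and move priors (two dummy moves)."""
    return random.uniform(-1.0, 1.0), [0.5, 0.5]

def puct(parent, child, c=1.5):
    """PUCT score used to walk down the tree during selection."""
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c * child.prior * math.sqrt(parent.visits + 1) / (1 + child.visits)
    return q + u

def search(seconds=0.5):
    root, nn_calls = Node(), 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        node = root
        while node.children:                    # selection
            node = max(node.children, key=lambda ch: puct(node, ch))
        value, priors = evaluate(node)          # one NN call = one position
        nn_calls += 1
        node.children = [Node(node, p) for p in priors]  # expansion
        while node is not None:                 # backpropagation
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return nn_calls

print(f"positions evaluated per second: {search() / 0.5:,.0f}")
```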

    On the new AlphaZero paper

    For the interested: based on what other engine developers told me, they're not particularly impressed by the results, because the score (+155 -6 =839) is a mere Elo difference of ~50. The current version of Stockfish, SF10, is ~100 Elo stronger than the version the AlphaZero team tested against, SF8 (see [1]). This isn't conclusive evidence that SF10 is stronger than AlphaZero, because Elo isn't transitive: if engine A beats engine B by 50 Elo, and engine C beats engine B by 100 Elo, it doesn't mean engine C beats engine A by 50 Elo. But it does mean that AlphaZero isn't some new godlike machine that beats all the conventional engines no questions asked. One developer even told me that this AlphaZero is just Leela, which is currently weaker than Stockfish on what is by consensus considered fair hardware. Some also pointed out that AlphaZero does lose some TCEC openings to Stockfish 8 as well.

    I'd write this into the article, but I don't have any reliable source on this (everything I wrote is personal communication). Banedon (talk) 02:03, 9 December 2018 (UTC)[reply]

    I haven't done all of the math, but it is around 50 Elo points, or maybe a little more. A difference of 100 Elo points means that the higher-rated player should score about 0.64. That score here is 0.5745 (a little more than half the way from 0.5 to 0.64, but the relationship isn't linear). Bubba73 You talkin' to me? 06:49, 22 January 2019 (UTC)[reply]
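
    For anyone who wants to verify these figures, a quick sketch assuming the standard logistic Elo model (the usual convention; the sources don't say which rating model they use):

```python
import math

def expected_score(elo_diff):
    """Expected score for a player rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    """Rating difference implied by a match score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)

score = (155 + 839 / 2) / 1000        # +155 -6 =839 -> 0.5745
print(round(elo_from_score(score)))   # 52, i.e. the ~50 Elo quoted above
print(round(expected_score(100), 2))  # 0.64, the score a 100-Elo gap predicts
```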

    games instead of hours

    You should put the number of games used in training rather than hours. The time is really a measure of hardware, whereas the number of games gives us a measure of how much data the algorithm requires to learn good weights. The number of games also gives the reader a way to compare what replication projects (like Leela for chess, AobaZero for shogi), which don't have the same hardware as DeepMind, are doing. – ishwar (speak) 09:13, 30 July 2019 (UTC)[reply]

    AlphaZero's score vs. Stockfish

    @Coastside: I'm using this venue since others are more likely to find it here. The chess.com article [2] says in the third paragraph:

    The updated AlphaZero crushed Stockfish 8 in a new 1,000-game match, scoring +155 -6 =839. (See below for three sample games from this match with analysis by Stockfish 10 and video analysis by GM Robert Hess.)

    The paragraph you are referring to is in turn referring to paragraph 5:

    In additional matches, the new AlphaZero beat the "latest development version" of Stockfish, with virtually identical results as the match vs Stockfish 8, according to DeepMind. The pre-release copy of journal article, which is dated Dec. 7, 2018, does not specify the exact development version used.

    This is the "additional matches", not the 1000-game match. That's natural: 1000-game matches at long time control takes time to run, and Stockfish is updated too frequently to keep using the latest development version.

    If you are unconvinced by this you can also look at the original paper: [3]. They write "The chess match was played against the 2016 TCEC (season 9) world champion Stockfish"; in TCEC season 9 the version of Stockfish playing was version 8. Shortly afterwards they write:

    We played additional matches against the most recent development version of Stockfish (27) and a variant of Stockfish that uses a strong opening book (28). AlphaZero won all matches by a large margin (Fig. 2).

    This is clearly indicated in the figure, where the matches against the latest version are marked "latest Stockfish". Most of the games were against Stockfish 8 (not 9), including the 1000-game match you are referring to. Banedon (talk) 03:46, 11 December 2019 (UTC)[reply]

    @Banedon: Thanks for clarifying this. I misunderstood the update in that article because it referred to "the match" when talking about the update, and I thought that meant the 1000-game match. I appreciate your going back to the sciencemag article to verify. Coastside (talk) 16:49, 11 December 2019 (UTC)[reply]