For a long time, music generation has been modeled as a “sequence prediction” problem. As in estimating the stock price, the task of music generation is to predict a music sequence based on some contexts, say, to predict the melody given the underlying chords, or to predict the upcoming notes based on existing ones. However, we observed an intrinsic defect of such generation via prediction approach – in any given contexts, there are many ways/directions to develop the music, so which one shall the model follow?