
NeurIPS 2022 Deep RL workshop: Three papers accepted!

Cyclophobic Reinforcement Learning

Authors: Stefan Sylvius Wagner Martinez, Peter Arndt, Jan Robine, Stefan Harmeling

Abstract: In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent’s success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first search versus breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e. it does not reward novelty but punishes redundancy by avoiding cycles. By augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent’s cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is vastly more efficient than other state-of-the-art methods.
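To make the idea of "punishing redundancy instead of rewarding novelty" concrete, here is a minimal sketch under a simplifying assumption: a cycle is counted whenever the agent re-enters a state it has already visited in the current episode. The class name `CycleTracker` and the fixed `penalty` of -1 are hypothetical illustrations, not the paper's exact formulation, and the hierarchical cropped representations are not modeled here.

```python
from typing import Hashable, Set


class CycleTracker:
    """Hypothetical, simplified cyclophobic intrinsic reward.

    Assumption (not the paper's exact formula): closing a cycle means
    re-entering a state already visited in the current episode.
    """

    def __init__(self, penalty: float = -1.0) -> None:
        self.penalty = penalty
        self.visited: Set[Hashable] = set()

    def reset(self, start_state: Hashable) -> None:
        # New episode: only the initial state has been seen so far.
        self.visited = {start_state}

    def intrinsic_reward(self, next_state: Hashable) -> float:
        # Returning to a known state means the trajectory contains a cycle.
        if next_state in self.visited:
            return self.penalty
        self.visited.add(next_state)
        # Novel states get no bonus: only redundancy is punished.
        return 0.0


if __name__ == "__main__":
    tracker = CycleTracker()
    tracker.reset((0, 0))
    path = [(0, 1), (1, 1), (0, 1), (0, 0)]
    print([tracker.intrinsic_reward(s) for s in path])  # [0.0, 0.0, -1.0, -1.0]
```

States must be hashable; grid positions are used here as tuples, and an image observation would first need to be converted to something like `bytes`. Note the asymmetry: the first two moves earn nothing, while the two moves that return to known cells are penalized.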


Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm 

Authors: Marc Höftmann, Jan Robine, Stefan Harmeling

Abstract: Very large state spaces with a sparse reward signal are difficult to explore. The lack of sophisticated guidance results in poor performance for numerous reinforcement learning algorithms. In these cases, the commonly used random exploration is often not helpful. The literature shows that such environments require enormous effort to systematically explore large chunks of the state space. Learned state representations can help here to improve the search by providing semantic context and building a structure on top of the raw observations. In this work, we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of using the hand-crafted representation heuristic. Our method offers an improved solution to the detachment problem, which remains an issue in the Go-Explore exploration phase. We provide evidence that our proposed method covers the entire state space with respect to all possible time trajectories, without causing disadvantageous conflict-overlaps in the cell archive. Analogous to native Go-Explore, our approach is evaluated on the hard-exploration Atari environments Montezuma’s Revenge, Gravitar and Frostbite in order to validate its capabilities on difficult tasks. Our experiments show that time-myopic Go-Explore is an effective alternative to the domain-engineered heuristic while also being more general. The source code of the method is available on GitHub.
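The abstract does not say exactly how novelty is decided from the learned representation. One plausible reading, sketched here purely as an illustration, is that a state becomes a new cell in the Go-Explore archive only if its predicted time distance to every existing cell exceeds a threshold. All names (`TimeArchive`, `predicted_time`, `min_steps`) are hypothetical, and the learned time-prediction head is replaced by a plain Euclidean distance between embeddings as a stand-in.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]


def predicted_time(a: Embedding, b: Embedding) -> float:
    """Hypothetical stand-in for a learned time predictor: the Euclidean
    distance between two embeddings, read as 'number of steps apart'."""
    return math.dist(a, b)


class TimeArchive:
    """Hypothetical Go-Explore-style cell archive that accepts a state as a
    new cell only if it is predicted to lie more than `min_steps` away (in
    time) from every stored cell, so that cells do not overlap."""

    def __init__(
        self,
        min_steps: float,
        time_fn: Callable[[Embedding, Embedding], float] = predicted_time,
    ) -> None:
        self.min_steps = min_steps
        self.time_fn = time_fn
        self.cells: List[Embedding] = []

    def is_novel(self, z: Embedding) -> bool:
        # Novel = temporally far from all known cells (vacuously true if empty).
        return all(self.time_fn(z, c) > self.min_steps for c in self.cells)

    def maybe_add(self, z: Embedding) -> bool:
        # Store the embedding as a new cell only if it is novel.
        if self.is_novel(z):
            self.cells.append(z)
            return True
        return False


if __name__ == "__main__":
    archive = TimeArchive(min_steps=2.0)
    states = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
    print([archive.maybe_add(z) for z in states])  # [True, False, True, True]
```

In this toy run, the second state is rejected because it is predicted to be only one "step" from an existing cell. With a real learned predictor, `time_fn` would be swapped for the model's output and `min_steps` tuned per environment.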


Transformer-based World Models Are Happy With 100k Interactions 

Authors: Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling

Abstract: Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners, they are overly data-hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state. By utilizing the Transformer-XL architecture, the world model is able to learn long-term dependencies while staying computationally efficient. Our transformer-based world model (TWM) generates meaningful new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark.
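The key input-side idea, feeding latent states, actions and rewards into one autoregressive sequence, can be illustrated with the minimal sketch below. It assumes a per-step ordering of state, action, reward and a standard causal attention mask. The names `Token`, `interleave` and `causal_mask` are hypothetical, and the actual networks, the Transformer-XL memory and the TWM token ordering are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Token:
    modality: str  # "state", "action" or "reward"
    t: int         # time step the token belongs to
    value: Any     # latent state vector, action id or scalar reward


def interleave(latents: Sequence[Any], actions: Sequence[int],
               rewards: Sequence[float]) -> List[Token]:
    """Flatten an episode segment into z_0, a_0, r_0, z_1, a_1, r_1, ...
    (assumed ordering) so one sequence model sees all three modalities."""
    if not (len(latents) == len(actions) == len(rewards)):
        raise ValueError("latents, actions and rewards must have equal length")
    tokens: List[Token] = []
    for t, (z, a, r) in enumerate(zip(latents, actions, rewards)):
        tokens += [Token("state", t, z), Token("action", t, a), Token("reward", t, r)]
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i),
    i.e. every token sees all earlier tokens directly, not via a recurrent state."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = interleave([[0.1, 0.2], [0.3, 0.4]], [2, 0], [0.0, 1.0])
    print([(tok.modality, tok.t) for tok in tokens])
    print(causal_mask(3))
```

The causal mask is what lets the model attend to any earlier state, action or reward directly. A recurrent world model would instead have to squeeze that history into a single hidden state.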


Location & approach

The campus of TU Dortmund University is located close to the interstate junction Dortmund West, where the Sauerlandlinie A 45 (Frankfurt-Dortmund) crosses the Ruhrschnellweg B 1 / A 40. The best interstate exit to take from the A 45 is “Dortmund-Eichlinghofen” (closer to South Campus), and from the B 1 / A 40 “Dortmund-Dorstfeld” (closer to North Campus). Signs for the university are located at both exits. There is also a new exit before you pass over the B 1 bridge leading into Dortmund.

To get from North Campus to South Campus by car, use the connection via Vogelpothsweg/Baroper Straße. We recommend that you leave your car in one of the parking lots at North Campus and use the H-Bahn (suspended monorail system), which conveniently connects the two campuses.

TU Dortmund University has its own train station (“Dortmund Universität”). From there, suburban trains (S-Bahn) leave for Dortmund main station (“Dortmund Hauptbahnhof”) and Düsseldorf main station via the “Düsseldorf Airport Train Station” (take S-Bahn number 1, which leaves every 15 or 30 minutes). The university is easily reached from Bochum, Essen, Mülheim an der Ruhr and Duisburg.

You can also take the bus or subway train from Dortmund city to the university: From Dortmund main station, you can take any train bound for the station “Stadtgarten”, usually lines U41, U45, U47 and U49. At “Stadtgarten”, switch trains and take line U42 towards “Hombruch”. Get off at the station “An der Palmweide”. From the bus stop just across the road, buses bound for TU Dortmund University leave every ten minutes (445, 447 and 462). Another option is to take the subway lines U41, U45, U47 and U49 from Dortmund main station to the stop “Dortmund Kampstraße”. From there, take U43 or U44 to the stop “Dortmund Wittener Straße”. Switch to bus line 447 and get off at “Dortmund Universität S”.

The AirportExpress is a fast and convenient means of transport from Dortmund Airport (DTM) to Dortmund Central Station, taking you there in little more than 20 minutes. From Dortmund Central Station, you can continue to the university campus by interurban railway (S-Bahn). A larger range of international flight connections is offered at Düsseldorf Airport (DUS), which is about 60 kilometres away and can be directly reached by S-Bahn from the university station.

The H-Bahn is one of the hallmarks of TU Dortmund University. There are two stations on North Campus. One (“Dortmund Universität S”) is located directly at the suburban train stop, which connects the university with the city of Dortmund and the rest of the Ruhr Area. This station also offers connections to the “Technologiepark” and (via South Campus) Eichlinghofen. The other station is located at the dining hall on North Campus and offers a direct connection to South Campus every five minutes.

The facilities of TU Dortmund University are spread over two campuses, the larger Campus North and the smaller Campus South. Additionally, some areas of the university are located in the adjacent “Technologiepark”.

Site Map of TU Dortmund University (Second Page in English).