Graph generation

Design environment

class rostok.graph_generators.environments.design_environment.DesignEnvironment(rule_vocabulary: ~rostok.graph_grammar.rule_vocabulary.RuleVocabulary, reward_calculator: ~rostok.trajectory_optimizer.control_optimizer.GraphRewardCalculator, initial_graph: ~rostok.graph_grammar.node.GraphGrammar = <rostok.graph_grammar.node.GraphGrammar object>, verbosity=0)

data2state(data: GraphGrammar) → int | str

Convert data to a state: the graph is converted to an integer built from the sorted IDs of its nodes.

Args:: data (GraphGrammar): graph to convert
Returns:: STATESTYPE: state of graph

get_available_actions(state: int | str) → ndarray

Get mask of available actions for state.

Args:: state (STATESTYPE): state to get mask of available actions
Returns:: np.ndarray: mask of available actions

get_info_state(state: int | str, verbosity=None) → str

Get info about state. If verbosity is None, use the value of self.verbosity.

Args:
state (STATESTYPE): state to get info about
verbosity (int, optional): Defines the verbosity of info and state_info. Maximum is 3. Defaults to None.
Returns:: str: String with information about state

get_nonterminal_actions() → ndarray

Get mask of nonterminal actions. Nonterminal actions are actions that apply nonterminal rules.

Returns:: np.ndarray: mask of nonterminal actions

get_terminal_actions() → ndarray

Get mask of terminal actions. Terminal actions are actions that apply terminal rules.

Returns:: np.ndarray: mask of terminal actions
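The masks returned by get_available_actions, get_terminal_actions and get_nonterminal_actions can be combined element-wise. The sketch below is minimal and assumes the masks are boolean NumPy arrays indexed by action (rule) number. The rule table and applicability array are hypothetical values, not rostok data:

```python
import numpy as np

# Hypothetical rule table: True marks a terminal rule, False a nonterminal one.
is_terminal_rule = np.array([False, False, True, True, True])
terminal_mask = is_terminal_rule.copy()   # plays the role of get_terminal_actions()
nonterminal_mask = ~is_terminal_rule      # plays the role of get_nonterminal_actions()

# Hypothetical applicability of each rule in some state,
# playing the role of get_available_actions(state).
available = np.array([True, False, True, False, True])

# Available actions restricted to terminal or nonterminal rules
available_terminal = available & terminal_mask
print(np.flatnonzero(available_terminal))            # [2 4]
print(np.flatnonzero(available & nonterminal_mask))  # [0]
```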

info(verbosity=None) → str

Get info about environment. If verbosity is None, use the value of self.verbosity.

Args:: verbosity (int, optional): Define verbosity of info and state_info. Maximum is 3. Defaults to None.
Returns:: str: String with information about environment

load_environment(path_to_folder)

Load the environment from a folder. The method does not clear the current state of the environment; it adds the saved data to the current state.

Args:: path_to_folder (str): Path to folder with saved environment

next_state(state: int | str, action: int) → tuple[int | str, float, bool, bool]

Get the next state reached by applying an action. If the next state is not in the state2graph dictionary, apply the rule to the graph of the current state and save the result in state2graph. If the next state is not in the transition_function dictionary, calculate its reward and save it in transition_function.

Args:
state (STATESTYPE): State from which to take the step
action (int): Action to apply
Returns:: StepType: tuple of next state, reward, is_terminal_state, and a bool indicating whether the state is in the terminal_states dictionary

possible_next_state(state: int | str, mask_actions=None) → list[int | str]

Return the list of possible next states, restricted to the actions allowed by a mask.

Args:
state (STATESTYPE): state whose possible next states are returned
mask_actions (np.ndarray, optional): Mask of desired actions. Defaults to None, which means all available actions.
Returns:: list[STATESTYPE]: List of possible next states
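Filtering successors by a mask can be sketched as below. This is a minimal version that assumes boolean masks and uses a hypothetical `step` function in place of applying a grammar rule:

```python
import numpy as np


def possible_next_states(state, available, step, mask_actions=None):
    """Hypothetical sketch: successors reachable through the allowed actions."""
    # None means "all available actions", as in possible_next_state
    allowed = available if mask_actions is None else available & mask_actions
    return [step(state, int(action)) for action in np.flatnonzero(allowed)]


def step(state, action):
    # Hypothetical transition: append the action index to the state string
    return f"{state}-{action}"


available = np.array([True, True, False, True])
print(possible_next_states("s", available, step))  # ['s-0', 's-1', 's-3']
print(possible_next_states("s", available, step, np.array([False, True, True, True])))  # ['s-1', 's-3']
```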

save_environment(prefix, path='./environments/', rewrite=False, use_date=True)

Save the environment to a folder. If the folder does not exist, it is created. If it already exists and rewrite is False, a new folder with a postfix is created instead.

Args:
prefix (str): Prefix of the folder name.
path (str, optional): Folder in which to save the environment. Defaults to “./environments/”.
rewrite (bool, optional): Rewrite the folder if it already exists. Defaults to False.
use_date (bool, optional): Use the date in the folder name. Defaults to True.
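The folder-naming behaviour can be sketched as follows. The date format, the `_1`, `_2` postfix scheme and the helper name `make_save_folder` are assumptions made for illustration. In this sketch, `rewrite=True` simply reuses the existing folder:

```python
import os
import tempfile
from datetime import datetime


def make_save_folder(prefix: str, path: str, rewrite: bool = False, use_date: bool = True) -> str:
    """Hypothetical sketch of choosing a save folder name."""
    name = prefix
    if use_date:
        name += "_" + datetime.now().strftime("%Y-%m-%d")  # assumed date format
    folder = os.path.join(path, name)
    if not rewrite:
        # Append _1, _2, ... until an unused folder name is found
        candidate, postfix = folder, 1
        while os.path.exists(candidate):
            candidate = f"{folder}_{postfix}"
            postfix += 1
        folder = candidate
    os.makedirs(folder, exist_ok=True)
    return folder


with tempfile.TemporaryDirectory() as tmp:
    first = make_save_folder("env", tmp, use_date=False)
    second = make_save_folder("env", tmp, use_date=False)
    print(os.path.basename(first), os.path.basename(second))  # env env_1
```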

update_environment(graph: GraphGrammar, action: int, next_graph: GraphGrammar)

Update the environment. If the next state is not in the state2graph dictionary, save it there. If the next state is not in the transition_function dictionary, calculate its reward and save it there. Save the reward and data of the state in the terminal_states dictionary.

Args:
graph (GraphGrammar): Previous state in the form of a graph
action (int): Action that was applied to the previous state
next_graph (GraphGrammar): Next state in the form of a graph
Returns:: tuple[float, bool]: reward and a bool indicating whether the state is in the terminal_states table

class rostok.graph_generators.environments.design_environment.EnvironmentTerminalReward(initial_state: int | str, actions: ndarray, verbosity=0)

abstract data2state(data) → int | str

Convert data to state.

Args:: data: data to convert (its type is defined by the subclass)
Returns:: STATESTYPE: state

abstract get_available_actions(state: int | str) → ndarray

Get mask of available actions for state.

Args:: state (STATESTYPE): state to get mask of available actions
Returns:: np.ndarray: mask of available actions

get_best_states(num=1) → list[int | str]

Get best states by reward.

Args:: num (int, optional): Number of best states. Defaults to 1.
Returns:: list[STATESTYPE]: List of best states
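Suppose terminal rewards are kept in a mapping from state to reward. This is an assumption, since the actual terminal_states table also stores state data. Under that assumption, selecting the best states reduces to a top-k query:

```python
import heapq


def get_best_states(terminal_rewards: dict[str, float], num: int = 1) -> list[str]:
    """Hypothetical sketch: the `num` states with the highest reward, best first."""
    return heapq.nlargest(num, terminal_rewards, key=terminal_rewards.__getitem__)


rewards = {"a": 0.2, "b": 0.9, "c": 0.5}
print(get_best_states(rewards))         # ['b']
print(get_best_states(rewards, num=2))  # ['b', 'c']
```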

get_info_state(state: int | str, verbosity=None) → str

Get info about state.

Args:
state (STATESTYPE): state to get info about
verbosity (int, optional): Defines the verbosity of the information. If it is None, the value of self.verbosity is used. Defaults to None.
Returns:: str: Information about state

get_reward(state: int | str) → tuple[float, bool]

Get the reward of a state. If the state is terminal, return its reward and True if the state is in the terminal_states table, otherwise False. For nonterminal states, return 0.0 and False.

Args:: state (STATESTYPE): state to get reward
Returns:: tuple[float, bool]: reward of state and condition of state in terminal_states table

info(verbosity=None) → str

Get info about environment.

Args:: verbosity (int, optional): Define verbosity of information. If it is None, use the value of self.verbosity. Defaults to None.
Returns:: str: Information about environment

is_terminal_state(state: int | str) → tuple[bool, bool]

Check whether a state is terminal. If the state is terminal, return True together with True if the state is in the terminal_states table, otherwise False.

Args:: state (STATESTYPE): state to check
Returns:: tuple[bool, bool]: condition of terminal state and condition of state in terminal_states table

abstract next_state(state: int | str, action: int) → tuple[int | str, float, bool, bool]

Get next state by action.

Args:
state (STATESTYPE): state from which to take the step
action (int): action to apply
Returns:: StepType: tuple of next state, reward, is_terminal_state, bool if state is in terminal_states table

class rostok.graph_generators.environments.design_environment.StringDesignEnvironment(rule_vocabulary: ~rostok.graph_grammar.rule_vocabulary.RuleVocabulary, reward_calculator: ~rostok.trajectory_optimizer.control_optimizer.GraphRewardCalculator, initial_graph: ~rostok.graph_grammar.node.GraphGrammar = <rostok.graph_grammar.node.GraphGrammar object>, verbosity=0)

data2state(data: GraphGrammar) → int | str

Convert data to a state: the graph is converted to a string built from the sorted IDs of its nodes.

Args:: data (GraphGrammar): graph to convert
Returns:: STATESTYPE: state of graph
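An order-independent string state built from node identifiers can be sketched as follows. Here `node_ids` is a hypothetical plain list standing in for the IDs taken from a GraphGrammar, and the comma separator is an assumption. The integer encoding used by DesignEnvironment.data2state is not documented here, so it is not reproduced:

```python
def graph_to_string_state(node_ids: list[int]) -> str:
    """Hypothetical sketch: an order-independent state key from node identifiers."""
    # Sorting makes the key independent of the order in which nodes were added
    return ",".join(str(node_id) for node_id in sorted(node_ids))


state_a = graph_to_string_state([3, 1, 12])
state_b = graph_to_string_state([12, 3, 1])
print(state_a)             # 1,3,12
print(state_a == state_b)  # True
```

Note that in this sketch edges are ignored, so two graphs with the same node identifiers map to the same state.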

class rostok.graph_generators.environments.design_environment.SubDesignEnvironment(rule_vocabulary: ~rostok.graph_grammar.rule_vocabulary.RuleVocabulary, reward_calculator: ~rostok.trajectory_optimizer.control_optimizer.GraphRewardCalculator, max_number_nonterminal_rules, initial_graph: ~rostok.graph_grammar.node.GraphGrammar = <rostok.graph_grammar.node.GraphGrammar object>, verbosity=0)

get_available_actions(state: int | str) → ndarray

Get the mask of available actions for a state. If the number of nonterminal rules of the state is greater than max_number_nonterminal_rules, return the mask of terminal actions.

Args:: state (STATESTYPE): state to get mask of available actions
Returns:: np.ndarray: mask of available actions
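The SubDesignEnvironment restriction can be sketched as below, assuming boolean masks and a hypothetical count of nonterminal rules for the state. Whether rostok also intersects the terminal mask with the rules applicable to the state is an assumption here:

```python
import numpy as np


def sub_available_actions(available, terminal_mask, n_nonterminal, max_number_nonterminal_rules):
    """Hypothetical sketch of the SubDesignEnvironment restriction."""
    if n_nonterminal > max_number_nonterminal_rules:
        # Budget exceeded: only terminal rules may still be applied
        return available & terminal_mask
    return available


available = np.array([True, True, True, False])
terminal_mask = np.array([False, False, True, True])
print(sub_available_actions(available, terminal_mask, 3, 2).tolist())  # [False, False, True, False]
print(sub_available_actions(available, terminal_mask, 1, 2).tolist())  # [True, True, True, False]
```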

get_info_state(state: int | str, verbosity=None) → str

Get info about state. If verbosity is None, use the value of self.verbosity.

Args:
state (STATESTYPE): state to get info about
verbosity (int, optional): Defines the verbosity of info and state_info. Maximum is 3. Defaults to None.
Returns:: str: String with information about state

info(verbosity=None) → str

Get info about environment. If verbosity is None, use the value of self.verbosity.

Args:: verbosity (int, optional): Define verbosity of info and state_info. Maximum is 3. Defaults to None.
Returns:: str: String with information about environment

load_environment(path_to_folder)

Load the environment from a folder. The method does not clear the current state of the environment; it adds the saved data to the current state.

Args:: path_to_folder (str): Path to folder with saved environment

save_environment(prefix, path='./environments/', rewrite=False, use_date=True)

Save the environment to a folder. If the folder does not exist, it is created. If it already exists and rewrite is False, a new folder with a postfix is created instead.

Args:
prefix (str): Prefix of the folder name.
path (str, optional): Folder in which to save the environment. Defaults to “./environments/”.
rewrite (bool, optional): Rewrite the folder if it already exists. Defaults to False.
use_date (bool, optional): Use the date in the folder name. Defaults to True.

update_environment(graph: GraphGrammar, action: int, next_graph: GraphGrammar)

Update environment. If next state is not in state2graph dictionary, save it in state2graph dictionary.

Args:
graph (GraphGrammar): Previous state in the form of a graph
action (int): Action that was applied to the previous state
next_graph (GraphGrammar): Next state in the form of a graph
Returns:: tuple[float, bool]: reward and bool if state is in terminal_states table

class rostok.graph_generators.environments.design_environment.SubStringDesignEnvironment(rule_vocabulary: ~rostok.graph_grammar.rule_vocabulary.RuleVocabulary, reward_calculator: ~rostok.trajectory_optimizer.control_optimizer.GraphRewardCalculator, max_number_nonterminal_rules, initial_graph: ~rostok.graph_grammar.node.GraphGrammar = <rostok.graph_grammar.node.GraphGrammar object>, verbosity=0)

data2state(data: GraphGrammar) → int | str

Convert data to a state: the graph is converted to a string built from the sorted IDs of its nodes.

Args:: data (GraphGrammar): graph to convert
Returns:: STATESTYPE: state of graph

rostok.graph_generators.environments.design_environment.prepare_state_for_optimal_simulation(state: int | str, env: DesignEnvironment) → tuple

Prepare state for simulation. Convert state to data and graph.

Args:
state (STATESTYPE): state to prepare
env (DesignEnvironment): environment
Returns:: tuple: data and graph of state

MCTS

class rostok.graph_generators.search_algorithms.mcts.MCTS(environment: DesignEnvironment, c=1.4)

default_policy(state, num_actions=0)

Default policy for unknown states: random actions are taken until a terminal state is reached. If num_actions = 0, all actions are explored; otherwise, num_actions randomly chosen actions are explored.

Args:
state: Root state from which the default policy is rolled out to a terminal state.
num_actions (int, optional): Number of actions to explore. Defaults to 0.
Returns:: float: Mean reward over the explored actions.
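The following is a minimal sketch of a random-rollout default policy on a toy problem. The callables `actions`, `step`, `is_terminal` and `reward` are hypothetical stand-ins for the environment. Reading num_actions as sampling a subset of the root state's actions is an assumption based on the description above:

```python
import random


def default_policy(state, actions, step, is_terminal, reward, rng, num_actions=0):
    """Hypothetical sketch: mean terminal reward of random rollouts from `state`."""
    if is_terminal(state):
        return reward(state)
    first_actions = actions(state)
    if num_actions > 0:
        # Explore only a random subset of the root actions
        first_actions = rng.sample(first_actions, min(num_actions, len(first_actions)))
    total = 0.0
    for action in first_actions:
        current = step(state, action)
        # Random playout until a terminal state is reached
        while not is_terminal(current):
            current = step(current, rng.choice(actions(current)))
        total += reward(current)
    return total / len(first_actions)


# Toy problem: start at 0, add 1 or 2 per step, stop at >= 4, reward = final value
value = default_policy(
    0,
    actions=lambda s: [1, 2],
    step=lambda s, a: s + a,
    is_terminal=lambda s: s >= 4,
    reward=lambda s: float(s),
    rng=random.Random(0),
)
print(value)  # a value between 4.0 and 5.0
```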

get_data_state(state: int | str)

Get data for state. Data is a dictionary with keys: “Qa” - Q function for each action, “pi” - policy for state, “V” - value of state, “N” - number of visits of state, “Na” - number of visits of each action.

Args:
state (STATESTYPE): State for which we want to get data.

Returns:
dict: Dictionary with data for state.

get_policy_by_N(state: int | str, weighted=False)

Get policy for state. Policy is a probability distribution over actions. Probability of action is proportional to number of visits of this action.

Args:
state (STATESTYPE): State for which we want to get policy.

Returns:
pi (np.ndarray): Policy for state.

get_policy_by_Q(state: int | str)

Get policy for state. Policy is a probability distribution over actions. Probability of action is proportional to Q function of this action.

Args:
state (STATESTYPE): State for which we want to get policy.

Returns:
pi (np.ndarray): Policy for state.
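Both policies normalise per-action weights into a probability distribution. get_policy_by_N uses visit counts ("Na") and get_policy_by_Q uses Q values ("Qa"). The sketch below assumes non-negative weights and falls back to a uniform policy when all weights are zero. That fallback is an assumption, not documented behaviour:

```python
import numpy as np


def normalise(weights) -> np.ndarray:
    """Hypothetical sketch: turn non-negative per-action weights into probabilities."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    total = w.sum()
    if total == 0:
        return np.full_like(w, 1.0 / len(w))  # no information yet: uniform policy
    return w / total


visits = [1, 3, 0, 4]            # e.g. "Na" for each action
q_values = [0.2, 0.6, 0.0, 0.2]  # e.g. "Qa" for each action
print(normalise(visits).tolist())  # [0.125, 0.375, 0.0, 0.5]
print(normalise(q_values))
```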

load(path)

Load MCTS data from path.

Args:: path (str): Path to the folder from which to load MCTS data.

save(prefix, path='./LearnedMCTS/', rewrite=False, use_date=True)

Save MCTS data in path. If path does not exist, then create it.

Args:
prefix (str): Prefix for the folder name.
path (str, optional): Path to the folder where MCTS data is saved. Defaults to “./LearnedMCTS/”.
rewrite (bool, optional): If True, rewrite data in path. Defaults to False.
use_date (bool, optional): If True, add the date to the folder name. Defaults to True.
Returns:: str: Path to folder where we save MCTS data.

search(state: int | str, num_actions=0)

Search for the best action for a state using recursive tree search. If the state is not in the tree, the default policy is used for it. If the state is in the tree, the tree policy is used. If the state is terminal, its reward is returned.

Args:
state (STATESTYPE): State for which to explore the tree of actions.

Returns:
float: Value reward of state.

tree_policy(state)

Tree policy for known states. We use UCT formula for choosing best action.

Args:
state: State for which we want to get best action.

Returns:: best_action: Best action for state.

uct_score(state)

UCT formula for choosing best action.

Args:: state: State for which we want to get UCT score.
Returns:: float: UCT score for each action.
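For reference, the standard UCB1-based UCT score is shown below. It is written with the quantities listed for get_data_state: Q(s, a) from "Qa", N(s) from "N", and N(s, a) from "Na". c is the exploration constant passed to the MCTS constructor (default 1.4). Whether uct_score uses exactly this form is an assumption. Unvisited actions (N(s, a) = 0) are usually given an infinite score so that every action is tried once.

```latex
\mathrm{UCT}(s, a) = Q(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}}
```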

update_Q_function(state, action, reward)

Update Q function for pair (state, action).

Args:
state: State for which to update the Q function.
action: Action for which to update the Q function.
reward: Reward for the pair (state, action) based on a Monte Carlo estimate.
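The methods documented above (default_policy, tree_policy, uct_score, update_Q_function and search) fit together as shown in the self-contained sketch below. It is a toy UCT search written for illustration, not rostok's implementation. The environment callables are hypothetical, and the dictionaries N, Na and Qa only mirror the keys listed for get_data_state:

```python
import math
import random
from collections import defaultdict


class TinyMCTS:
    """Hypothetical toy UCT search, for illustration only."""

    def __init__(self, actions, step, is_terminal, reward, c=1.4, seed=0):
        self.actions = actions          # state -> list of legal actions
        self.step = step                # (state, action) -> next state
        self.is_terminal = is_terminal  # state -> bool
        self.reward = reward            # terminal state -> float
        self.c = c                      # exploration constant
        self.rng = random.Random(seed)
        self.tree = set()               # expanded (known) states
        self.N = defaultdict(int)       # visits of each state
        self.Na = defaultdict(int)      # visits of each (state, action)
        self.Qa = defaultdict(float)    # mean return of each (state, action)

    def default_policy(self, state):
        # Random actions until a terminal state is reached
        while not self.is_terminal(state):
            state = self.step(state, self.rng.choice(self.actions(state)))
        return self.reward(state)

    def uct_score(self, state, action):
        n_sa = self.Na[(state, action)]
        if n_sa == 0:
            return math.inf  # try every action at least once
        bonus = self.c * math.sqrt(math.log(self.N[state]) / n_sa)
        return self.Qa[(state, action)] + bonus

    def tree_policy(self, state):
        return max(self.actions(state), key=lambda a: self.uct_score(state, a))

    def update_Q_function(self, state, action, reward):
        self.N[state] += 1
        self.Na[(state, action)] += 1
        # Incremental mean of the Monte Carlo returns
        q = self.Qa[(state, action)]
        self.Qa[(state, action)] = q + (reward - q) / self.Na[(state, action)]

    def search(self, state):
        if self.is_terminal(state):
            return self.reward(state)
        if state not in self.tree:
            self.tree.add(state)
            return self.default_policy(state)
        action = self.tree_policy(state)
        value = self.search(self.step(state, action))
        self.update_Q_function(state, action, value)
        return value


# Toy problem: start at 0, add 1 or 2, stop at >= 4, reward 1 only for landing on 4
mcts = TinyMCTS(
    actions=lambda s: [1, 2],
    step=lambda s, a: s + a,
    is_terminal=lambda s: s >= 4,
    reward=lambda s: 1.0 if s == 4 else 0.0,
)
for _ in range(50):
    mcts.search(0)
print(mcts.N[0])  # 49: the first search only expands the root
```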

MCTS Manager

class rostok.graph_generators.mcts_manager.MCTSManager(mcts_algorithm: MCTS, folder_name: str, verbosity=1, use_date: bool = True)

plot_test_mcts(save=False, name='test_mcts.svg')

Plot the mean and std of the reward for the test of the MCTS algorithm.

Args:
save (bool, optional): If True, the plot will be saved. Defaults to False.
name (str, optional): The name of the file. Defaults to “test_mcts.svg”.

plot_v_trajectory(trajectory, save=False, name='v_trajectory.svg')

Plot the V-function and Q-function for the trajectory.

Args:
trajectory (list, np.ndarray): The trajectory of the states and actions.
save (bool, optional): If True, the plot will be saved. Defaults to False.
name (str, optional): The name of the file. Defaults to “v_trajectory.svg”.

run_search(max_iteration_mcts: int, max_simulation: int, iteration_checkpoint: int = 0, num_test: int = 0, state: int | str | None = None)

Run the MCTS algorithm for a given number of iterations. Max simulation is the number of simulations in one iteration. The search stores the trajectory of the states and actions.

Args:
max_iteration_mcts (int): Max number of iterations of the MCTS algorithm.
max_simulation (int): Max number of simulations in one iteration.
iteration_checkpoint (int, optional): The number of iterations after which the checkpoint will be saved. Defaults to 0.
num_test (int, optional): The number of tests after which the mean and std of the reward will be calculated. Defaults to 0.
state (STATESTYPE, optional): The root state for the search algorithm. Defaults to None, in which case the initial state of the environment is used.
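One plausible reading of how iterations and simulations relate is sketched below. Each iteration runs max_simulation searches from the current root, then commits to an action and records it in the trajectory. The function `run_search_sketch` and its callables are hypothetical, and the exact checkpoint condition is an assumption:

```python
from collections.abc import Callable, Hashable


def run_search_sketch(
    state: Hashable,
    max_iteration_mcts: int,
    max_simulation: int,
    search: Callable[[Hashable], float],
    choose_action: Callable[[Hashable], int],
    step: Callable[[Hashable, int], Hashable],
    is_terminal: Callable[[Hashable], bool],
    iteration_checkpoint: int = 0,
    save_checkpoint: Callable[[int, Hashable], None] = lambda i, s: None,
) -> list[tuple[Hashable, int]]:
    trajectory = []
    for iteration in range(max_iteration_mcts):
        # Several simulations (tree searches) from the current root
        for _ in range(max_simulation):
            search(state)
        # Commit to one action and move the root forward
        action = choose_action(state)
        trajectory.append((state, action))
        state = step(state, action)
        if iteration_checkpoint > 0 and (iteration + 1) % iteration_checkpoint == 0:
            save_checkpoint(iteration, state)
        if is_terminal(state):
            break
    return trajectory
```

In this reading, the cost of a run grows with the product of the two limits, since every committed action is preceded by max_simulation searches.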

save_checkpoint(iteration: int, state, time_search)

Save the checkpoint of the MCTS algorithm. The checkpoint contains the state of the MCTS algorithm and the state of the environment. Write the log of the search to the file.

Args:
iteration (int): The number of iterations of the MCTS algorithm.
state: The state of the environment.
time_search: The time of the search.

save_information_about_search(hyperparameters, grasp_object: EnvironmentBodyBlueprint | list[EnvironmentBodyBlueprint])

Save the information about the search to the file.

Args:
hyperparameters: The hyperparameters of the MCTS algorithm.
grasp_object (EnvironmentBodyBlueprint | list[EnvironmentBodyBlueprint]): The object or objects to grasp.

save_results(save_plot=True): Save the trajectories of the states and actions to the file.

test_mcts(num_test)

Test the MCTS algorithm. The test is to run the MCTS algorithm for a given number of iterations and calculate the mean and std of the reward.

Args:: num_test (int): The number of tests.
Returns:: tuple[float, float]: The mean and std of the reward.

rostok.graph_generators.mcts_manager.load_last_state(path_checkpoint: str)

Load the last state of the MCTS algorithm from the checkpoint.

Args:: path_checkpoint (str): The path to the checkpoint.
Returns:: STATESTYPE: The last state of the MCTS algorithm.