grannules package
Subpackages
Submodules
grannules.neural_net module
- class grannules.neural_net.NNPredictor(model_name: str | None = None, train_data: DataFrame | None = None, test_data: DataFrame | None = None, features: list[str] = ['M', 'R', 'Teff', 'FeH', 'KepMag', 'phase'], e_features: list[str] | None = None, targets: list[str] = ['H', 'P', 'tau', 'alpha'], e_targets: list[str] | None = None, X_transformer=None, y_transformer=None, state: TrainState = None, trial: Trial = None, params: dict = None, random_state: int = None)
Bases: object
Initializes an NNPredictor.
Important
The user shouldn’t call this directly. See NNPredictor.get_default_predictor(), NNPredictor.train_new(), NNPredictor.from_study(), or NNPredictor.deserialize().
- Parameters:
train_data (pandas.DataFrame, optional) – The training data to train the model on in _train_net. Defaults to None.
test_data (pandas.DataFrame, optional) – The testing data to evaluate the model on in _train_net. Defaults to None.
features (list[str], optional) – The features to train the model on. Defaults to NNPredictor.DEFAULT_FEATURES.
e_features (list[str], optional) – The uncertainties of the features. Defaults to features with an 'e_' prefix on each element.
targets (list[str], optional) – The targets to train the model on. Defaults to NNPredictor.DEFAULT_TARGETS.
e_targets (list[str], optional) – The uncertainties of the targets. Defaults to targets with an 'e_' prefix on each element.
X_transformer (object, optional) – The transformer to use for the features. Its fit_transform method is called on the training data, and its transform method when using predictor.predict(). Defaults to neural_net.DefaultXTransformer.
y_transformer (object, optional) – The transformer to use for the targets. Its fit_transform method is called on the training data, transform when training, and inverse_transform when predicting. Defaults to neural_net.DefaultyTransformer.
nn_state (train_state.TrainState, optional) – The state of the neural network. Defaults to None.
random_state (int, optional) – The random state to use for training the net. Defaults to None.
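The transformer parameters above only name the methods that get called: fit_transform and transform for X_transformer, plus inverse_transform for y_transformer. A minimal sketch of an object with that interface is shown here. The class name StandardizeTransformer is hypothetical, and the use of plain lists of rows is an assumption made to keep the example dependency-free; the container type NNPredictor actually passes is not stated here, and this is not the library's DefaultXTransformer.

```python
from statistics import fmean, pstdev


class StandardizeTransformer:
    """Hypothetical transformer exposing the three methods named above.

    Operates on a list of rows (each row a list of floats). This input type
    is an assumption for illustration only.
    """

    def fit_transform(self, rows):
        # Learn the per-column mean and standard deviation, then scale.
        columns = list(zip(*rows))
        self.means = [fmean(col) for col in columns]
        # Fall back to 1.0 so constant columns do not divide by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self.transform(rows)

    def transform(self, rows):
        # Apply the scaling learned in fit_transform.
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]

    def inverse_transform(self, rows):
        # Undo the scaling, mapping values back to the original units.
        return [
            [v * s + m for v, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


train = [[1.0, 10.0], [3.0, 10.0]]
scaler = StandardizeTransformer()
scaled = scaler.fit_transform(train)
restored = scaler.inverse_transform(scaled)
```

Any object with these methods could presumably be passed as X_transformer or y_transformer, provided it accepts the data type the predictor supplies.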
- DEFAULT_FEATURES = ['M', 'R', 'Teff', 'FeH', 'KepMag', 'phase']
- DEFAULT_TARGETS = ['H', 'P', 'tau', 'alpha']
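The default uncertainty column names follow from these two lists by the 'e_' prefix rule given for e_features and e_targets. A small sketch of that rule, where the helper name with_error_prefix is hypothetical and not part of grannules:

```python
DEFAULT_FEATURES = ["M", "R", "Teff", "FeH", "KepMag", "phase"]
DEFAULT_TARGETS = ["H", "P", "tau", "alpha"]


def with_error_prefix(names):
    """Hypothetical helper: prepend 'e_' to each column name."""
    return ["e_" + name for name in names]


e_features = with_error_prefix(DEFAULT_FEATURES)
e_targets = with_error_prefix(DEFAULT_TARGETS)
```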
- classmethod deserialize(path: str | Path = None)
Deserialize a neural network from a directory.
This method reads the same format that NNPredictor.serialize() writes.
- Parameters:
path (str or Path) –
Path to the directory containing the serialized model files. The directory should include:
params.json: JSON file with model parameters.
state.pkl: Pickle file with the model’s state dictionary.
transform.npy: Numpy file with transformation parameters for input and output scaling.
- Returns:
An instance of NNPredictor initialized with the deserialized model, state, and transformers.
- Return type:
NNPredictor
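Before calling deserialize, it can be useful to confirm that a directory holds the three files listed above. The following is a sketch of such a check using only the standard library; the helper name missing_model_files is hypothetical, and the check says nothing about whether the file contents are valid.

```python
from pathlib import Path

# File names taken from the deserialize() description above.
EXPECTED_FILES = ("params.json", "state.pkl", "transform.npy")


def missing_model_files(path):
    """Return the expected file names that are absent from `path`."""
    directory = Path(path)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

An empty list means all three files are present. Note that state.pkl is a pickle file, so a serialized model should only be loaded from a trusted source.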
- classmethod from_study(study_or_path: Study | str, data: DataFrame | None = None, train_data: DataFrame | None = None, test_data: DataFrame | None = None, study_name: str | None = None, random_state: int | None = None, **kwargs) → NNPredictor
Creates an NNPredictor from the best trial in an Optuna study.
- Parameters:
study_or_path (optuna.study.Study | str) – An Optuna study or a path to an Optuna database. If a path is provided, the study is loaded from the database, and study_name must be specified.
data (pandas.DataFrame | None) – The complete dataset to train the model. It will be split using neural_net.split_data(). Required if train_data and test_data are not provided.
train_data (pandas.DataFrame | None) – The training dataset. Required if data is not provided.
test_data (pandas.DataFrame | None) – The testing dataset. Required if data is not provided.
study_name (str | None) – The name of the study to load from the database. Required if study_or_path is a path; ignored if study_or_path is an Optuna study.
random_state (int | None) – The random state to use for splitting the data and training the neural network.
kwargs (dict)
- Returns:
An instance of NNPredictor initialized with the best trial from the study.
- Return type:
NNPredictor
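The rule linking study_or_path and study_name can be sketched as a small validation step. This is only an illustration of the documented behaviour: the function check_study_args and its return values are hypothetical, and the real method may validate its arguments differently.

```python
def check_study_args(study_or_path, study_name=None):
    """Hypothetical check mirroring the documented study_name rule."""
    if isinstance(study_or_path, str):
        # A path means the study must be loaded, so a name is needed.
        if study_name is None:
            raise ValueError("study_name is required when study_or_path is a path")
        return ("load", study_or_path, study_name)
    # Anything else is assumed to be an Optuna study; study_name is ignored.
    return ("use", study_or_path, None)
```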
- classmethod get_default_predictor(*args, **kwargs)
Loads a pre-trained NNPredictor singleton.
- Returns:
A pre-trained NNPredictor
- Return type:
NNPredictor
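"Singleton" here suggests that repeated calls return the same object instead of reloading the model each time. How grannules implements this is not stated; the sketch below shows one common pattern, using a stand-in class named Predictor that is not the real NNPredictor.

```python
class Predictor:
    """Stand-in class for illustration; not the real NNPredictor."""

    _default = None  # cached shared instance

    @classmethod
    def get_default_predictor(cls):
        # Build the instance on first use, then reuse it on later calls.
        if cls._default is None:
            cls._default = cls()
        return cls._default


first = Predictor.get_default_predictor()
second = Predictor.get_default_predictor()
```

With this pattern, first and second refer to the same object, so any expensive loading happens only once.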
- predict(X: DataFrame, to_df=False) → ndarray
Predicts the parameters \(H,\, P,\, \tau,\) and \(\alpha\) for red giant stars using a pre-trained neural network.
- Parameters:
X (pandas.DataFrame) – A pandas DataFrame with columns ‘M’, ‘R’, ‘Teff’, ‘FeH’, ‘KepMag’, and ‘phase’.
‘M’: Mass of the star in solar masses.
‘R’: Radius of the star in solar radii.
‘Teff’: Effective temperature of the star in Kelvin.
‘FeH’: Metallicity of the star.
‘KepMag’: Apparent magnitude of the star in the Kepler band.
‘phase’: Phase of the star.
to_df (bool) – If True, returns the predictions as a pandas DataFrame. Otherwise, returns a NumPy array.
- Returns:
Predicted values for \(H,\, P,\, \tau,\) and \(\alpha\). If to_df is True, the result is a pandas DataFrame with columns [‘H’, ‘P’, ‘tau’, ‘alpha’]. Otherwise, it is a NumPy array.
- Return type:
pandas.DataFrame or numpy.ndarray
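Since predict expects six specific columns, a quick check of the input can catch a missing column before prediction. The sketch below uses a hypothetical helper, missing_columns, that accepts any iterable of column names (for example X.columns); it is not part of grannules.

```python
# Column names taken from the predict() parameter description.
REQUIRED_COLUMNS = ["M", "R", "Teff", "FeH", "KepMag", "phase"]


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


missing = missing_columns(["M", "R", "Teff", "KepMag"])
```

Once the check returns an empty list, the documented call is predictor.predict(X, to_df=True) on a predictor obtained from NNPredictor.get_default_predictor().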
- serialize(path: str | Path = None, overwrite: bool = False)
Serialize the neural network model, its parameters, and data transformations to the specified directory.
This method saves the model’s parameters, state, and data transformation details into a directory for later use. If the directory already exists, it is overwritten only when overwrite is True.
- Parameters:
path (str | pathlib.Path, optional) – The directory path where the model will be serialized. Defaults to a directory named “grannules-predictor” in the current working directory.
overwrite (bool) – Whether to overwrite the directory if it already exists. Defaults to False.
- Raises:
RuntimeError – If attempting to overwrite the current working directory, root, or home.
FileExistsError – If the directory already exists and overwrite is set to False.
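The two documented exceptions can be pictured as guard checks that run before anything is written. The sketch below is one way such checks might look, not the library's implementation: the helper name prepare_output_dir is hypothetical, and it assumes that overwriting means deleting the old directory first and that the default path is a "grannules-predictor" directory under the current working directory.

```python
import shutil
from pathlib import Path


def prepare_output_dir(path=None, overwrite=False):
    """Hypothetical helper mirroring the documented checks of serialize()."""
    # Assumed default: "grannules-predictor" in the current working directory.
    target = Path(path) if path is not None else Path.cwd() / "grannules-predictor"
    target = target.resolve()

    if target.exists():
        if not overwrite:
            raise FileExistsError(f"{target} already exists; pass overwrite=True")
        # Never delete the working directory, the filesystem root, or home.
        protected = {Path.cwd().resolve(), Path(target.anchor), Path.home().resolve()}
        if target in protected:
            raise RuntimeError(f"refusing to overwrite {target}")
        # Assumption: overwriting removes the old directory before recreating it.
        shutil.rmtree(target)

    target.mkdir(parents=True)
    return target
```

The protected-path check runs before any deletion, which is the property that matters when overwrite=True is passed by mistake.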
- classmethod train_new(study_or_path: Study | str, data: DataFrame | None = None, train_data: DataFrame | None = None, test_data: DataFrame | None = None, study_name: str | None = None, random_state: int | None = None, load_study_if_exists: bool = True, pruner: BasePruner = <optuna.pruners._nop.NopPruner object>, study_kwargs: dict = {}, n_trials: int = 100, optuna_kwargs: dict = {}, **kwargs) → tuple[NNPredictor, Study]
Train a new neural network model using Optuna for hyperparameter optimization. This method allows training a neural network model by either creating a new Optuna study or using an existing one. It supports splitting data into training and testing sets, and optimizing the model’s hyperparameters through Optuna’s study framework.
- Parameters:
study_or_path (optuna.study.Study | str) – Either an Optuna study object or a string path to the study’s storage.
data (pandas.DataFrame | None) – The complete dataset to be split into training and testing sets. If provided, train_data and test_data will be ignored.
train_data (pandas.DataFrame | None) – Pre-split training data. Used if data is not provided.
test_data (pandas.DataFrame | None) – Pre-split testing data. Used if data is not provided.
study_name (str | None) – Name of the Optuna study. Required if creating a new study.
random_state (int | None) – Random seed for reproducibility in data splitting and training.
load_study_if_exists (bool) – Whether to load an existing study if it already exists.
pruner (optuna.pruners.BasePruner) – Optuna pruner to use for early stopping during optimization.
study_kwargs (dict) – Additional keyword arguments for creating the Optuna study.
n_trials (int) – Number of trials to run for hyperparameter optimization.
optuna_kwargs (dict) – Additional keyword arguments for the study.optimize method.
kwargs (dict) – Additional keyword arguments for the neural network predictor initialization.
- Returns:
A tuple containing the trained neural network predictor and the Optuna study.
- Return type:
tuple[NNPredictor, optuna.study.Study]
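The precedence between data and the pre-split train_data and test_data arguments can be sketched as follows. This is an illustration under stated assumptions: resolve_splits and its test_fraction parameter are hypothetical, the inputs are plain sequences of rows instead of DataFrames, and the real split is done by neural_net.split_data(), whose behaviour is not described here.

```python
import random


def resolve_splits(data=None, train_data=None, test_data=None,
                   test_fraction=0.2, random_state=None):
    """Hypothetical helper: `data` takes precedence over pre-split inputs."""
    if data is not None:
        # Shuffle a copy with a seeded generator so the split is reproducible.
        rows = list(data)
        random.Random(random_state).shuffle(rows)
        n_test = int(len(rows) * test_fraction)
        return rows[n_test:], rows[:n_test]
    if train_data is None or test_data is None:
        raise ValueError("provide either data, or both train_data and test_data")
    return train_data, test_data


train, test = resolve_splits(data=range(10), random_state=0)
```

Passing the same random_state yields the same split, which is the reproducibility role the random_state parameter plays above.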
- grannules.neural_net.predict(X: DataFrame, to_df: bool = False, *args, **kwargs) → ndarray | DataFrame
Predicts the parameters \(H,\, P,\, \tau,\) and \(\alpha\) for red giant stars using a pre-trained neural network.
- Parameters:
X (pandas.DataFrame) – A pandas DataFrame with columns ‘M’, ‘R’, ‘Teff’, ‘FeH’, ‘KepMag’, and ‘phase’.
‘M’: Mass of the star in solar masses.
‘R’: Radius of the star in solar radii.
‘Teff’: Effective temperature of the star in Kelvin.
‘FeH’: Metallicity of the star.
‘KepMag’: Apparent magnitude of the star in the Kepler band.
‘phase’: Phase of the star.
to_df (bool) – If True, returns the predictions as a pandas DataFrame. Otherwise, returns a NumPy array.
- Returns:
Predicted values for \(H,\, P,\, \tau,\) and \(\alpha\). If to_df is True, the result is a pandas DataFrame with columns [‘H’, ‘P’, ‘tau’, ‘alpha’]. Otherwise, it is a NumPy array.
- Return type:
numpy.ndarray | pandas.DataFrame
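When to_df is False the result is an unlabelled array, so the caller has to attach the target names. The sketch below assumes the array columns follow the same order as the DataFrame columns, ['H', 'P', 'tau', 'alpha']; that ordering is an assumption, the helper label_predictions is hypothetical, and the numbers are placeholders, not real predictions.

```python
# Target names from the Returns description; array column order is assumed.
TARGETS = ["H", "P", "tau", "alpha"]


def label_predictions(rows):
    """Pair each value in a prediction row with its target name."""
    return [dict(zip(TARGETS, row)) for row in rows]


# Placeholder values standing in for one row of predict() output.
labelled = label_predictions([[0.1, 0.2, 0.3, 0.4]])
```

Passing to_df=True avoids this step, since the returned DataFrame already carries the column names.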