refactor(data preprocess): remove the cut off options from info.json #200

Merged
merged 14 commits into from
Aug 13, 2024

QG-phy
Collaborator

@QG-phy QG-phy commented Aug 1, 2024

refactor(data preprocess): remove the cutoff options from info.json and collect the values from input.json. When running a model, there is no longer any need to supply the AtomicData options.

Fix: #155

QG-phy added 7 commits August 1, 2024 11:25
Previously, the ASE data was converted into a text file and then loaded by _TrajData. I have now refactored this function.

Both text and ASE data are now treated equally; each works as a class function to initialize the _TrajData class.
…g96.

For powerlaw and varTang96, rs is not exactly the hard cutoff. So when extracting r_max for the data, we have to use rs + 5 * w; for the other methods we just use rs.
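The rule in this commit message can be sketched as follows. This is only an illustration, not the project's implementation: the function name `r_max_from_cutoff` and the set `SOFT_CUTOFF_METHODS` are hypothetical, and only the `rs + 5 * w` rule is taken from the message above.

```python
# Hypothetical helper; only the "rs + 5 * w" rule comes from the commit message.
SOFT_CUTOFF_METHODS = {"powerlaw", "varTang96"}


def r_max_from_cutoff(method: str, rs: float, w: float) -> float:
    """Return the r_max to use for the data, given a method's rs and w."""
    if method in SOFT_CUTOFF_METHODS:
        # rs is not a hard cutoff for these methods, so pad it by 5 * w.
        return rs + 5 * w
    # For every other method, rs is used directly.
    return rs
```

With `rs = 2.0` and `w = 0.5`, the padded value for `powerlaw` is `2.0 + 5 * 0.5`, while any other method would simply get `2.0`.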
@QG-phy QG-phy marked this pull request as draft August 5, 2024 07:27
QG-phy added 3 commits August 5, 2024 16:02
…s instance and add from_model class function.

Note: compared to the previous build_dataset, this one is more flexible.
Previously build_dataset was a function. Now I define a class DataBuilder and define its __call__ function, and build_dataset is an instance of the DataBuilder class. This lets me use build_dataset.from_model() to build a dataset from a model, while the previous way of using build_dataset, i.e. build_dataset(...), is still available.
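The callable-instance pattern described here might look roughly like the sketch below. Only the names `DataBuilder`, `build_dataset`, `__call__` and `from_model` come from the commit message; the arguments, the returned dict and `ToyModel` are placeholders assumed for illustration, and `from_model` is written as a plain method even though the real code may define it differently.

```python
class DataBuilder:
    """Callable builder: an instance can be used like the old function."""

    def __call__(self, root: str, r_max: float) -> dict:
        # Stand-in for the real dataset construction (assumed signature).
        return {"root": root, "r_max": r_max}

    def from_model(self, model, root: str) -> dict:
        # Read the cutoff from the model instead of asking the caller for it.
        return self(root=root, r_max=model.r_max)


class ToyModel:
    """Hypothetical model that carries its own cutoff."""
    r_max = 5.0


# The module-level name stays the same, so old call sites keep working.
build_dataset = DataBuilder()

old_style = build_dataset(root="./data", r_max=5.0)
new_style = build_dataset.from_model(ToyModel(), root="./data")
```

Because `build_dataset` is still a module-level callable, existing imports and call sites do not need to change; the instance simply gains extra entry points.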
@QG-phy QG-phy marked this pull request as ready for review August 5, 2024 10:02
@@ -500,7 +500,7 @@ def from_points(
def from_ase(
cls,
atoms,
- r_max,
+ r_max: Union[float, int, dict],
er_max: Optional[float] = None,
Member

Shouldn't er_max and oer_max be changed in the same way here?

dptb/data/build.py
# same cell size, then copy it to all frames.
cell = np.expand_dims(cell, axis=0)
data["cell"] = np.broadcast_to(cell, (info["nframes"], 3, 3))
elif cell.shape[0] == info["nframes"] * 3:
Member

Is nframes still kept in info now?

pos = np.loadtxt(os.path.join(root, "positions.dat"))
if len(pos.shape) == 1:
pos = pos.reshape(1,3)
natoms = info["natoms"]
Member

OK, it looks like the logic for nframes and natoms has not been touched.

Collaborator Author

Removing nframes and natoms would require changing the storage format of the text data, for example storing each frame's structure on a single line. Otherwise there is no way to extract this information from the data, so I cannot remove them.
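To illustrate the point about the text format: when all frames are stacked row by row, the row count alone does not say how the rows split into frames. The sketch below uses plain Python lists, and the helper `split_frames` is hypothetical, not part of the code base.

```python
def split_frames(rows, nframes, natoms):
    """Split a flat list of xyz rows into nframes frames of natoms rows each."""
    if len(rows) != nframes * natoms:
        raise ValueError("row count does not match nframes * natoms")
    return [rows[i * natoms:(i + 1) * natoms] for i in range(nframes)]


# Six stacked rows: could be 1 frame x 6 atoms, 2 x 3, 3 x 2 or 6 x 1.
rows = [[float(i), 0.0, 0.0] for i in range(6)]

as_two_frames = split_frames(rows, nframes=2, natoms=3)
as_three_frames = split_frames(rows, nframes=3, natoms=2)
```

Both calls succeed on the same rows, which is why the frame and atom counts have to be recorded somewhere outside the stacked data.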

@@ -170,7 +170,7 @@ def __init__(self, model:torch.nn.Module, results_path: str=None, use_gui: bool=
self.results_path = results_path
self.use_gui = use_gui

- def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, AtomicData_options: dict={}):
+ def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, pbc:Union[bool,list]=None, AtomicData_options:dict=None):
Member

Why is pbc added separately here? Also, AtomicData_options was replaced by info in the data part earlier, so why is it kept here?

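Collaborator Author

pbc is added separately because this information cannot always be extracted from the given structure file, so we need to support specifying it externally. AtomicData_options is supported here for compatibility with some older checkpoints. Going forward it does not need to be provided, but for some old checkpoints it must be supplied, otherwise the checkpoint cannot be used.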

Collaborator Author

It was removed from the data part because training tasks will no longer need it. For newly trained models, it can also be omitted when computing bands in postprocessing. This parameter is now optional.
All of this is a setting we had to make for the sake of software compatibility.
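The compatibility behaviour described in this thread could be sketched as below. The helper `resolve_atomicdata_options` and its `model_options` argument are hypothetical, and letting an explicit `AtomicData_options` take precedence is an assumption; the thread only states that new models can omit the argument while some old checkpoints require it.

```python
from typing import Optional


def resolve_atomicdata_options(model_options: Optional[dict],
                               AtomicData_options: Optional[dict] = None) -> dict:
    """Pick the options to use; an explicit argument wins (assumed precedence)."""
    if AtomicData_options is not None:
        # Old checkpoints: the caller has to supply the options.
        return dict(AtomicData_options)
    if model_options is not None:
        # New models: the options are taken from the model itself.
        return dict(model_options)
    raise ValueError("AtomicData_options must be provided for this checkpoint")


new_model = resolve_atomicdata_options({"r_max": 5.0})
old_checkpoint = resolve_atomicdata_options(None, {"r_max": 4.0})
```

Using `None` rather than `{}` as the default, as the new `get_bands` signature does, also avoids sharing one mutable dict across calls.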

@floatingCatty floatingCatty merged commit caa903d into deepmodeling:main Aug 13, 2024
2 checks passed
Development

Successfully merging this pull request may close these issues.

Allow run model without providing AtomicData_options
2 participants