As of cryoDRGN v3.3.3, here are the places where the old Starfile class was used:

grep -rn 'Starfile' cryodrgn/* --exclude="**/__pycache__/**":

cryodrgn/commands/parse_pose_star.py:55:    stardata = starfile.Starfile.load(args.input)
cryodrgn/commands/parse_ctf_star.py:53:    s = starfile.Starfile.load(args.star)
cryodrgn/commands_utils/filter_star.py:42:    s = starfile.Starfile.load(args.input)
cryodrgn/commands_utils/filter_star.py:60:            new_optics = starfile.Starfile(headers=None, df=new_optics_df)
cryodrgn/commands_utils/filter_star.py:87:                    micro_optics = starfile.Starfile(headers=None, df=micro_optics_df)
cryodrgn/commands_utils/filter_star.py:93:            micro_star = starfile.Starfile(
cryodrgn/commands_utils/filter_star.py:104:        s = starfile.Starfile(
cryodrgn/commands_utils/write_star.py:16:from cryodrgn.source import ImageSource, StarfileSource
cryodrgn/commands_utils/write_star.py:17:from cryodrgn.starfile import Starfile
cryodrgn/commands_utils/write_star.py:120:        assert isinstance(particles, StarfileSource)
cryodrgn/commands_utils/write_star.py:169:        optics = Starfile(df=optics_groups, relion31=False, headers=None)
cryodrgn/commands_utils/write_star.py:179:    s = Starfile(headers=None, df=df, relion31=not args.relion30, data_optics=optics)
cryodrgn/dataset.py:138:        s = starfile.Starfile.load(tiltstar)
cryodrgn/dataset.py:201:        s = starfile.Starfile.load(tiltstar)
cryodrgn/source.py:58:            return StarfileSource(
cryodrgn/source.py:373:class StarfileSource(_MRCDataFrameSource):
cryodrgn/source.py:382:        from cryodrgn.starfile import Starfile
cryodrgn/source.py:384:        df = Starfile.load(filepath).df
cryodrgn/starfile.py:11:class Starfile:

We can categorize these cases by how the data was loaded, and also list the Starfile attributes that were used after loading along with some general notes:

  1. Created from a file using Starfile.load(filename):

    commands.parse_pose_star

    stardata.df, stardata.data_optics, stardata.relion31, stardata.headers

    This command extracts information from the primary data table and the data optics table if present to get the poses.

    commands.parse_ctf_star

    s.data_optics, s.relion31, s.df

    As above, we extract CTF information using pandas operations on the data tables.

    commands_utils.filter_star

    .write

    Again, basic pandas operations on the data tables, but now we also write a star file (s.write(args.o)).

    cryodrgn.dataset

    .df

    Just uses the primary data table when loading data for a tilt series.

  2. Created directly from df and data_optics DataFrames using starfile.Starfile(headers=None, df=df, data_optics=data_optics):

    filter_star

    .data_optics, .df

    When splitting up output using --micrograph-files, we use Starfile to represent the new chunks.

    write_star

    .write

    We use Starfile only to be able to access the Starfile.write() method.

The refactor

We hence reorganize the class structures in cryodrgn.starfile and cryodrgn.source to make the code across these use cases cleaner:

Thus, in the new v3.4.0 code we can still support the more complicated logic for parsing RELION3.1 files through parse_star/write_star, while giving users the flexibility to reach for the Starfile and StarfileSource classes only when they need more advanced attributes or more advanced batch loading methods for training, respectively.

Examples

In filter_star, we can replace the previous use of Starfile with parse_star, which gives us stardf and data_optics, and write_star(stardf, data_optics), since all we are doing is performing simple filtering operations on these two DataFrames, which were previously called s.df and s.data_optics.
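
As a rough illustration of why two plain tables are enough for this kind of filtering, here is a minimal, self-contained sketch. It does not use cryoDRGN: parse_star_text and format_star_text are hypothetical stand-ins written for this example (they are not the library's parse_star/write_star), lists of dicts stand in for the pandas DataFrames, and the file contents are made up.

```python
# Illustrative stand-ins only: these helpers are NOT cryoDRGN's parse_star/write_star,
# and plain lists of dicts stand in for the pandas DataFrames.


def parse_star_text(text):
    """Parse loop_-style STAR text into {block name: list of row dicts}."""
    blocks, name, columns = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "loop_":
            continue
        if line.startswith("data_"):
            # A new table starts; reset the column list.
            name, columns = line, []
            blocks[name] = []
        elif line.startswith("_"):
            columns.append(line.split()[0])  # drop the trailing "#1" style index
        elif name is not None:
            blocks[name].append(dict(zip(columns, line.split())))
    return blocks


def format_star_text(blocks):
    """Write {block name: list of row dicts} back out as loop_-style STAR text."""
    out = []
    for name, rows in blocks.items():
        columns = list(rows[0]) if rows else []
        out += [name, "", "loop_"]
        out += [f"{col} #{i}" for i, col in enumerate(columns, start=1)]
        out += [" ".join(row[col] for col in columns) for row in rows]
        out.append("")
    return "\n".join(out)


EXAMPLE = """
data_optics

loop_
_rlnOpticsGroup #1
_rlnImagePixelSize #2
1 1.06
2 0.83

data_particles

loop_
_rlnImageName #1
_rlnMicrographName #2
_rlnOpticsGroup #3
000001@stack_a.mrcs mic_001.mrc 1
000002@stack_a.mrcs mic_001.mrc 1
000001@stack_b.mrcs mic_002.mrc 2
"""

tables = parse_star_text(EXAMPLE)
stardf, data_optics = tables["data_particles"], tables["data_optics"]

# The filtering itself only touches the two tables; no wrapper object is needed.
kept = [row for row in stardf if row["_rlnMicrographName"] == "mic_002.mrc"]
used_groups = {row["_rlnOpticsGroup"] for row in kept}
kept_optics = [row for row in data_optics if row["_rlnOpticsGroup"] in used_groups]

filtered_text = format_star_text({"data_optics": kept_optics, "data_particles": kept})
```

The point of the sketch is the shape of the workflow (parse, filter two tables, write), not the parsing details, which a real STAR reader handles far more carefully.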

Likewise in the write_star command, all we need at the very end is a call to write_star() where necessary, instead of going through the trouble of creating a Starfile object.

parse_pose_star and parse_ctf_star were also just using .data_optics and .df, but we would like to have the per-sample A/px and resolution values to support RELION3.1 features. For this we can now use the newly created Starfile methods apix() and resolution().
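
To make "per-sample" concrete: in a RELION3.1-style file, optics values live in the optics table and each particle points at a row through its optics group. The sketch below shows that lookup in plain Python. It is not the Starfile implementation; per_particle_optics is a hypothetical helper, the values are made up, and it assumes that A/px corresponds to the _rlnImagePixelSize column and resolution to the _rlnImageSize column.

```python
# Hypothetical helper, not part of cryoDRGN: resolves an optics-table column
# for every particle through the particle's _rlnOpticsGroup.


def per_particle_optics(particles, data_optics, label, cast=float):
    """Return one value of `label` per particle, looked up via the optics group."""
    by_group = {row["_rlnOpticsGroup"]: cast(row[label]) for row in data_optics}
    return [by_group[p["_rlnOpticsGroup"]] for p in particles]


# Made-up tables; lists of dicts stand in for the two DataFrames.
data_optics = [
    {"_rlnOpticsGroup": 1, "_rlnImagePixelSize": 1.06, "_rlnImageSize": 256},
    {"_rlnOpticsGroup": 2, "_rlnImagePixelSize": 0.83, "_rlnImageSize": 320},
]
particles = [
    {"_rlnImageName": "000001@stack_a.mrcs", "_rlnOpticsGroup": 1},
    {"_rlnImageName": "000002@stack_a.mrcs", "_rlnOpticsGroup": 1},
    {"_rlnImageName": "000001@stack_b.mrcs", "_rlnOpticsGroup": 2},
]

# Assumption: A/px maps to _rlnImagePixelSize and resolution to _rlnImageSize.
apix = per_particle_optics(particles, data_optics, "_rlnImagePixelSize")
resolution = per_particle_optics(particles, data_optics, "_rlnImageSize", cast=int)
```

Each returned list has one entry per particle, so particles from different optics groups can carry different pixel sizes and box sizes.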

Finally, we would like to have access to Starfile methods in StarfileSource; this may come in handy when designing methods for tilt data in dataset, which can now use self.src instead of having to create a new Starfile object. As mentioned above, we redesigned StarfileSource to inherit from the Stardata class, so that it is initialized with a .data_optics attribute and has methods such as .apix(). In this class, .df gives access to the Stardata .data attribute, which is further parsed according to the StarfileSource, MRCDataFrameSource, and ImageSource instantiation routines.
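
The inheritance idea can be sketched as follows. These are toy classes written for this example, not the cryoDRGN ones: StardataSketch and StarfileSourceSketch are hypothetical names, lists of dicts stand in for DataFrames, and the image-loading side of a real source is left out entirely.

```python
# Toy classes illustrating the inheritance pattern; not the cryoDRGN implementation.


class StardataSketch:
    """Hypothetical base class holding the particle table and the optics table."""

    def __init__(self, data, data_optics=None):
        self.data = data
        self.data_optics = data_optics

    def apix(self):
        """Per-particle pixel size via the optics group, or None without an optics table."""
        if self.data_optics is None:
            return None
        by_group = {
            row["_rlnOpticsGroup"]: row["_rlnImagePixelSize"]
            for row in self.data_optics
        }
        return [by_group[row["_rlnOpticsGroup"]] for row in self.data]


class StarfileSourceSketch(StardataSketch):
    """Hypothetical source that is itself a Stardata, so no second object is needed."""

    @property
    def df(self):
        # .df is just another name for the inherited .data table.
        return self.data

    def __len__(self):
        return len(self.df)


src = StarfileSourceSketch(
    data=[
        {"_rlnImageName": "000001@stack_a.mrcs", "_rlnOpticsGroup": 1},
        {"_rlnImageName": "000001@stack_b.mrcs", "_rlnOpticsGroup": 2},
    ],
    data_optics=[
        {"_rlnOpticsGroup": 1, "_rlnImagePixelSize": 1.06},
        {"_rlnOpticsGroup": 2, "_rlnImagePixelSize": 0.83},
    ],
)
pixel_sizes = src.apix()  # available directly on the source object
```

Code that already holds a source object (such as self.src in the document's dataset example) could then ask it for optics values directly, which is the convenience the paragraph above describes.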

Summary

There are now three different ways to load and operate on data stored in .star files, each of which incorporates the code used in the ways preceding it:

  1. parse_star() and write_star() for basic access to the data tables
  2. Starfile to do non-trivial operations on these data tables, such as getting sample-wise optics values
  3. StarfileSource for getting the images themselves