As of cryoDRGN v3.3.3, here are the places where the old Starfile class was used:
grep -rn 'Starfile' cryodrgn/* --exclude="**/__pycache__/**"
cryodrgn/commands/parse_pose_star.py:55: stardata = starfile.Starfile.load(args.input)
cryodrgn/commands/parse_ctf_star.py:53: s = starfile.Starfile.load(args.star)
cryodrgn/commands_utils/filter_star.py:42: s = starfile.Starfile.load(args.input)
cryodrgn/commands_utils/filter_star.py:60: new_optics = starfile.Starfile(headers=None, df=new_optics_df)
cryodrgn/commands_utils/filter_star.py:87: micro_optics = starfile.Starfile(headers=None, df=micro_optics_df)
cryodrgn/commands_utils/filter_star.py:93: micro_star = starfile.Starfile(
cryodrgn/commands_utils/filter_star.py:104: s = starfile.Starfile(
cryodrgn/commands_utils/write_star.py:16:from cryodrgn.source import ImageSource, StarfileSource
cryodrgn/commands_utils/write_star.py:17:from cryodrgn.starfile import Starfile
cryodrgn/commands_utils/write_star.py:120: assert isinstance(particles, StarfileSource)
cryodrgn/commands_utils/write_star.py:169: optics = Starfile(df=optics_groups, relion31=False, headers=None)
cryodrgn/commands_utils/write_star.py:179: s = Starfile(headers=None, df=df, relion31=not args.relion30, data_optics=optics)
cryodrgn/dataset.py:138: s = starfile.Starfile.load(tiltstar)
cryodrgn/dataset.py:201: s = starfile.Starfile.load(tiltstar)
cryodrgn/source.py:58: return StarfileSource(
cryodrgn/source.py:373:class StarfileSource(_MRCDataFrameSource):
cryodrgn/source.py:382: from cryodrgn.starfile import Starfile
cryodrgn/source.py:384: df = Starfile.load(filepath).df
cryodrgn/starfile.py:11:class Starfile:
We can categorize these cases by how the data was loaded, and also list the Starfile attributes that were used after loading along with some general notes:
Either created from file using Starfile.load(filename):
commands.parse_pose_star
stardata.df, stardata.data_optics, stardata.relion31, stardata.headers
This command extracts information from the primary data table and the data optics table if present to get the poses.
commands.parse_ctf_star
s.data_optics, s.relion31, s.df
As above, we extract CTF information using pandas operations on the data tables.
commands_utils.filter_star
.write
Again, basic pandas operations on the data tables, but now we also write a .star file with s.write(args.o).
cryodrgn.dataset
.df
Just uses the primary data table when loading data for a tilt series.
or created directly from df and data_optics DataFrames: starfile.Starfile(headers=None, df=df, data_optics=data_optics):
commands_utils.filter_star
.data_optics, .df
When splitting up output using --micrograph-files, we use Starfile to represent the new chunks.
commands_utils.write_star
.write
Starfile is used here only to gain access to the Starfile.write() method.
We hence reorganize the class structures in cryodrgn.starfile and cryodrgn.source to make the code across these use cases cleaner:
_parse_block and write, which were quasi-static methods in Starfile, are turned back into stand-alone functions that act on objects such as starfile, data, and data_optics, and that handle both RELION3.0 and RELION3.1 logic. The two key functions are now:
parse_star(starfile) now handles the logic of _parse_relion31, load, and _parse_block, returning data and data_optics, the latter being None unless the RELION3.1 format has been detected; it also has the new feature of handling data blocks in any order
write_star(starfile, data, data_optics) now handles the logic of write, using _write_star_block as necessary
Starfile is now used when something more than just access to data and data_optics is required, such as per-sample Apix and resolution values for parse_ctf_star when using .star files with multiple optics groups:
see Starfile.apix for how access to a particular attribute is handled, and Starfile.optics_values() for how to get any attribute on a per-sample basis, making it far easier to support RELION3.1 features
Starfile(starfile) is more intuitive than Starfile.load(starfile)
Starfile(data=data, data_optics=data_optics): we must use keyword arguments for everything other than starfile to make it easier to differentiate between these two signatures; see the use of a bare * in the signature of Starfile.__init__()
Starfile.load(starfile) for backwards compatibility
methods such as write, which in turn employ write_star
.relion31 (self.data_optics is not None) for convenience
Changes to the ImageSource class hierarchy, part of which was updating StarfileSource to be a child class of Starfile as well as MRCDataFrameSource; thus, for instance, we use Starfile.__init__(self, data=sdata, data_optics=data_optics) in place of df = Starfile.load(filepath).df in StarfileSource.__init__().
Thus, in the new v3.4.0 code, we can still support the more complicated logic of parsing RELION3.1 files through parse_star/write_star, while users only need the Starfile and StarfileSource classes when they require more advanced attributes or more advanced batch-loading methods for training, respectively.
In filter_star, we can replace the previous use of Starfile with parse_star → stardf, data_optics and write_star(stardf, data_optics), since all we are doing is performing simple filtering operations on these two DataFrames, which before were being called s.df and s.data_optics.
Likewise, in the write_star command all we need at the very end is to call write_star() where necessary, instead of going through the trouble of creating a Starfile object.
parse_pose_star and parse_ctf_star were also just using .data_optics and .df, but we would like to have the per-sample A/px and resolution values to support RELION3.1 features. For this we can now use the newly created methods in Starfile: apix() and resolution().
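The per-sample lookup this relies on can be sketched as follows. This only illustrates the idea and is not the code behind Starfile.apix or Starfile.optics_values(): the function name per_particle_values is hypothetical, tables are plain lists of dicts instead of DataFrames, and it assumes particles are linked to optics rows through the _rlnOpticsGroup column.

```python
from typing import Dict, List, Optional

Table = List[Dict[str, str]]  # one dict per row, keyed by column label


def per_particle_values(
    data: Table, data_optics: Optional[Table], field: str
) -> List[float]:
    """Return one value of `field` per particle, in particle order."""
    # RELION3.0 style (or a per-particle override): the field is in the main table.
    if data and field in data[0]:
        return [float(row[field]) for row in data]
    if data_optics is None:
        raise KeyError(f"{field} not found in the particle table")
    # RELION3.1 style: look the value up through each particle's optics group.
    by_group = {row["_rlnOpticsGroup"]: float(row[field]) for row in data_optics}
    return [by_group[row["_rlnOpticsGroup"]] for row in data]


particles = [
    {"_rlnImageName": "1@a.mrcs", "_rlnOpticsGroup": "1"},
    {"_rlnImageName": "2@a.mrcs", "_rlnOpticsGroup": "2"},
    {"_rlnImageName": "3@a.mrcs", "_rlnOpticsGroup": "1"},
]
optics = [
    {"_rlnOpticsGroup": "1", "_rlnImagePixelSize": "1.5"},
    {"_rlnOpticsGroup": "2", "_rlnImagePixelSize": "2.0"},
]
apix = per_particle_values(particles, optics, "_rlnImagePixelSize")
```

With multiple optics groups, a single global pixel size is no longer well defined, which is why a per-particle list is the natural return value here.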
Finally, we would like to have access to Starfile methods in StarfileSource; this may come in handy when designing methods to deal with tilt data in dataset, which can now use self.src instead of having to create a new Starfile object. As mentioned above, we redesigned StarfileSource to inherit from the Starfile class and thus to be initialized with a .data_optics attribute and methods such as .apix(). We use .df to access the .data attribute of Starfile in this class, which will also be further parsed according to the StarfileSource, MRCDataFrameSource, and ImageSource instantiation routines.
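The shape of this inheritance can be sketched as below. All class names here (StarfileSketch, FrameSourceSketch, StarfileSourceSketch) are hypothetical stand-ins rather than the cryoDRGN classes, tables are plain lists of dicts, and the file-path branch of the constructor is deliberately omitted so that only the keyword-only signature and the .df alias are shown.

```python
from typing import Dict, List, Optional

Table = List[Dict[str, str]]  # one dict per row, keyed by column label


class StarfileSketch:
    """Holds the particle table and the optional optics table."""

    # The bare * forces callers to pass the tables by keyword.
    def __init__(self, *, data: Table, data_optics: Optional[Table] = None):
        self.data = data
        self.data_optics = data_optics

    @property
    def relion31(self) -> bool:
        return self.data_optics is not None


class FrameSourceSketch:
    """Stand-in for a table-backed image source; it only records the image count."""

    def __init__(self, df: Table):
        self.n = len(df)


class StarfileSourceSketch(FrameSourceSketch, StarfileSketch):
    """Image source that also exposes the table-level attributes of its parent."""

    def __init__(self, data: Table, data_optics: Optional[Table] = None):
        # Initialize the table-holding parent explicitly, by keyword.
        StarfileSketch.__init__(self, data=data, data_optics=data_optics)
        FrameSourceSketch.__init__(self, df=self.df)

    @property
    def df(self) -> Table:
        # .df is an alias for the .data table held by the parent class.
        return self.data
```

A caller holding such a source can then ask it table-level questions (for example src.relion31) without building a second object from the same file.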
There are now three different ways to load and operate on data stored in .star files, each of which incorporates the code used by the ones preceding it:
parse_star() and write_star() for basic access to the data tables
Starfile to do non-trivial operations on these data tables, such as getting sample-wise optics values
StarfileSource for getting the images themselves