[chemfp] Open Babel Fastsearch to FPS conversion tool

Andrew Dalke dalke at dalkescientific.com
Tue Dec 19 21:17:13 EST 2017


Hi chemfp users,

  I've been working on a tool to convert an Open Babel "Fastsearch" index file (typically with a .fs extension) into FPS fingerprints.  I've included the current --help at the bottom of this email.


Is anyone here interested in helping me test it out?

I developed it so I could do a side-by-side comparison of Open Babel's and chemfp's similarity search. I wanted to make sure chemfp used the same fingerprints.

It might also be useful for those who want both OB and chemfp fingerprints, because the structures will only need to be processed once, to produce the .fs file, rather than processed again with ob2fps.

What I mostly need is feedback on the --help documentation and on the error messages and error handling. I also don't know whether my code works correctly when the original data file is larger than 2 GB.


If you are interested, let me know by private email. 

				Andrew
				dalke at dalkescientific.com

usage: fs2fps.py [-h] [-d FILENAME] [--in FORMAT]
                 [--delimiter {to-eol,tab,whitespace,space}]
                 [--type TYPE | --no-type] [--source SOURCE | --no-source]
                 [--date DATE | --no-date] [--output FILENAME] [--out FORMAT]
                 INDEX_FILE

Convert an Open Babel fastsearch index into FPS format.

positional arguments:
  INDEX_FILE            fastsearch index filename (usually ends with '.fs')

optional arguments:
  -h, --help            show this help message and exit
  -d FILENAME, --datafile FILENAME
                        Location of the structure data file. Used to extract
                        record identifiers. Only SD and SMILES files are
                        supported.
  --in FORMAT           Structure data file format (default: guess based on
                        the extension and default to 'smi')
  --delimiter {to-eol,tab,whitespace,space}
                        delimiter style for a SMILES file (default: 'to-eol')
  --type TYPE           specify the 'type' metadata value (default: use the
                        type from the index file)
  --no-type             do not include the type in the metadata
  --source SOURCE       specify the 'source' metadata value (default: use the
                        --datafile or the data filename in the index)
  --no-source           do not include the source in the metadata
  --date DATE           specify the 'date' metadata value (default: use the
                        creation date of the index). This must be in ISO
                        format in UTC, for example: '2017-12-20T02:12:18'
  --no-date             do not include the date in the metadata
  --output FILENAME, -o FILENAME
                        save the fingerprints to FILENAME (default=stdout)
  --out FORMAT          output fingerprint format. One of fps, fps.gz, or fpb.
                        (default guesses from output filename, or is 'fps')
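
The --date option above expects an ISO timestamp in UTC with no timezone suffix. As a rough sketch of how a tester might produce that value and assemble a command line, here is some Python. The helper names (utc_iso_date, build_fs2fps_args) and the example filenames are made up for illustration and are not part of fs2fps.py. Only the option names come from the --help text. It also assumes a naive datetime is already in UTC.

```python
from datetime import datetime, timezone


def utc_iso_date(when=None):
    """Format a datetime as 'YYYY-MM-DDTHH:MM:SS' in UTC (hypothetical helper)."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Timezone-aware values are converted to UTC.
    # Naive values are ASSUMED to already be in UTC.
    if when.tzinfo is not None:
        when = when.astimezone(timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%S")


def build_fs2fps_args(index_file, datafile=None, output=None, when=None):
    """Build an argument list using only options listed in the --help text.

    The leading "fs2fps.py" is a guess at how the script is invoked;
    adjust it (for example, prefix with the Python interpreter) as needed.
    """
    args = ["fs2fps.py"]
    if datafile is not None:
        args += ["--datafile", datafile]
    args += ["--date", utc_iso_date(when)]
    if output is not None:
        args += ["--output", output]
    args.append(index_file)
    return args


if __name__ == "__main__":
    # Hypothetical filenames, shown only to illustrate the shape of the call.
    print(" ".join(build_fs2fps_args("example.fs",
                                     datafile="example.smi",
                                     output="example.fps")))
```

The list form can be passed directly to subprocess.run() without shell quoting. Leaving out --date altogether should fall back to the documented default, the creation date of the index.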

