[chemfp] new FPC format

Andrew Dalke dalke at dalkescientific.com
Fri Aug 22 13:39:52 EDT 2025


Hi subscribers,

  Chemfp 5.0 will add support for sparse count fingerprints.

I've developed the FPC format as the count equivalent to the FPS format.

Here is an example, which should look familiar:

  #FPC1
  #type=RDKit-MorganCount/2 radius=3 useFeatures=0
  #software=RDKit/2024.09.5 chemfp/5.0
  #date=2025-08-15T11:17:47+00:00
  847433064,2245897107,2551683561 CHEMBL183419
  864674487,1215180924,1600340860:2,2215059400,2245384272:2,2246728737:2,3542456614:2,3994088662:2 CHEMBL16264

Each feature is written as a feature id, then a colon, then a count. If the count is 1, the colon and count can be omitted.

Empty fingerprints are represented with a "*".
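
As a rough illustration of the rules above, here is a minimal Python sketch that parses one FPC data line into an id and a feature-to-count dictionary. It is not part of chemfp (5.0 has no count support in its Python API), and the function name parse_fpc_line is hypothetical. It assumes the fingerprint field and the id are separated by whitespace, as in the example above, and that header lines starting with "#" have already been skipped. The format definition is the authority on the details.

```python
def parse_fpc_line(line):
    """Parse one FPC data line into (record_id, {feature_id: count}).

    Hypothetical helper, not a chemfp function.
    """
    # Assumption: the fingerprint field and the id are whitespace-separated.
    fp_field, record_id = line.rstrip("\n").split(None, 1)
    counts = {}
    if fp_field != "*":  # "*" marks an empty fingerprint
        for term in fp_field.split(","):
            feature_id, sep, count = term.partition(":")
            # A missing ":count" means the count is 1.
            counts[int(feature_id)] = int(count) if sep else 1
    return record_id, counts


record_id, counts = parse_fpc_line(
    "847433064,2245897107,2551683561 CHEMBL183419"
)
print(record_id, counts)
```

For the first record in the example, every feature has an implicit count of 1, so the dictionary maps each of the three feature ids to 1.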

The format definition is at https://chemfp.com/fpc_format/ .

The "chemfp" command in 5.0 supports several new subcommands:

  rdkit2fpc - generate count fingerprints from structures
  fpc2fps - convert count fingerprints to binary, using one of several methods
  fps2fpc - convert binary fingerprints to count

NOTE: chemfp 5.0 does not support direct search of count fingerprints, nor is count functionality available from the Python API. You will need to convert the count fingerprints to binary ones for search.
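
To give a feel for what a count-to-binary conversion can look like, here is a minimal Python sketch of one simple approach: ignore the counts and fold each feature id into a fixed-length bit vector by taking it modulo the number of bits. This is an assumption chosen for illustration. It is not necessarily one of the methods fpc2fps implements, and the function name counts_to_bits, the default of 2048 bits, and the byte layout are all hypothetical choices.

```python
def counts_to_bits(counts, num_bits=2048):
    """Fold a {feature_id: count} dict into a binary fingerprint.

    Hypothetical helper, not a chemfp function. Counts are ignored:
    a bit is set if any feature id folds onto it.
    """
    if num_bits <= 0 or num_bits % 8 != 0:
        raise ValueError("num_bits must be a positive multiple of 8")
    bits = bytearray(num_bits // 8)
    for feature_id in counts:
        bit = feature_id % num_bits       # fold the sparse id into range
        bits[bit // 8] |= 1 << (bit % 8)  # set that bit in its byte
    return bytes(bits)


fp = counts_to_bits({847433064: 1, 2245897107: 1, 2551683561: 1})
print(fp.hex())
```

Folding loses information in two ways: different feature ids can collide on the same bit, and the counts themselves are discarded.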

I have long-term goals to add better support for count fingerprints. This is just a first step. Contact me if you are interested in funding that work.

Best regards,

					Andrew
					dalke at dalkescientific.com




