From dalke at dalkescientific.com Fri Aug 22 13:39:52 2025
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 22 Aug 2025 19:39:52 +0200
Subject: [chemfp] new FPC format
Message-ID: <85B5937F-52AC-4CFF-AF30-FFCFCCBE76FB@dalkescientific.com>

Hi subscribers,

Chemfp 5.0 will add support for sparse count fingerprints. I've developed the FPC format as the count equivalent to the FPS format.

Here is an example, which should look familiar:

#FPC1
#type=RDKit-MorganCount/2 radius=3 useFeatures=0
#software=RDKit/2024.09.5 chemfp/5.0
#date=2025-08-15T11:17:47+00:00
847433064,2245897107,2551683561 CHEMBL183419
864674487,1215180924,1600340860:2,2215059400,2245384272:2,2246728737:2,3542456614:2,3994088662:2 CHEMBL16264

Each feature is written as a feature id, then a colon, then a count. If the count is 1 then the colon and count can be omitted. Empty fingerprints are represented with a "*".

The format definition is at https://chemfp.com/fpc_format/ .

The "chemfp" command in 5.0 supports several new subcommands:

  rdkit2fpc - generate count fingerprints from structures
  fpc2fps - convert count fingerprints to binary, using one of several methods
  fps2fpc - convert binary fingerprints to count

NOTE: chemfp 5.0 does not support direct search of count fingerprints, nor is count functionality available from the Python API. You will need to convert the count fingerprints to binary ones for search.

I have long-term goals to add better support for count fingerprints. This is just a first step. Contact me if you are interested in funding that work.

Best regards,

Andrew
dalke at dalkescientific.com
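
As an illustration of the feature syntax described in this message (a feature id, an optional colon and count, and "*" for an empty fingerprint), here is a minimal Python sketch of reading and writing the fingerprint field of an FPC record. It is not part of chemfp, and the function names are hypothetical. It assumes that the fingerprint field and the record id are separated by whitespace, that feature ids and counts are decimal integers, and that features are written in ascending id order as in the example records. The format definition at the URL given in the message is the authority on all of these points.

```python
# Hypothetical helpers for the FPC fingerprint field; not chemfp API.

def parse_fpc_fingerprint(field):
    """Convert an FPC fingerprint field into a {feature_id: count} dict."""
    if field == "*":
        # "*" marks an empty fingerprint.
        return {}
    counts = {}
    for term in field.split(","):
        feature_id, colon, count = term.partition(":")
        # A missing ":count" suffix means the count is 1.
        counts[int(feature_id)] = int(count) if colon else 1
    return counts


def format_fpc_fingerprint(counts):
    """Convert a {feature_id: count} dict back into an FPC fingerprint field."""
    if not counts:
        return "*"
    terms = []
    # Assumption: ascending feature id order, matching the example records.
    for feature_id in sorted(counts):
        count = counts[feature_id]
        terms.append(str(feature_id) if count == 1 else f"{feature_id}:{count}")
    return ",".join(terms)


def parse_fpc_record(line):
    """Split one record line into (record_id, counts).

    Assumption: the fingerprint field ends at the first run of whitespace.
    """
    field, record_id = line.rstrip("\r\n").split(None, 1)
    return record_id, parse_fpc_fingerprint(field)


if __name__ == "__main__":
    record = "864674487,1215180924,1600340860:2,2215059400 CHEMBL16264"
    record_id, counts = parse_fpc_record(record)
    print(record_id, counts)
    print(format_fpc_fingerprint(counts))
```

Header lines, which start with "#", are not handled here and would need to be skipped or parsed separately before the record lines are passed to parse_fpc_record.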