[chemfp] chemfp-1.1 is out!

Andrew Dalke dalke at dalkescientific.com
Tue Feb 5 18:18:53 EST 2013


Hi chemfp-ers!

I've released chemfp-1.1. This is mostly the same as the 1.1b4 release
I put out a year ago, but I've finally finished updating the documentation.

There's a new web site, chemfp.com, which is more of a marketing site.
The project page is still http://code.google.com/p/chem-fingerprints/ .

I'll break the following up into three parts: 1) new features, 2) API
changes, and 3) business model changes.

=== New features

These are the biggest new features:

- Added OpenMP support for chemfp_knearest_tanimoto_arena_symmetric.
(It turned out I had overlooked this one.)

- New methods to look up arena records given an identifier:
arena.get_by_id(id)
arena.get_index_by_id(id)
arena.get_fingerprint_by_id(id)

- New arena.copy(indices=None, reorder=None) method, which makes a new
arena from an old one. If you give it indices, as in arena.copy([1,5,9]),
then the new arena will contain only those three fingerprints. This makes
it simple and fast to create a new arena containing a random sample of
100 fingerprints from an existing arena:

    import random
    indices = random.sample(xrange(len(arena)), 100)
    random_subset = arena.copy(indices)

Or you can make an arena containing only the fingerprints which are
at least 0.8 similar to "ABC123" using:

    import chemfp
    from chemfp import search
    query_fp = arena.get_fingerprint_by_id("ABC123")
    hits = search.threshold_tanimoto_search_fp(query_fp, arena, 0.8)
    neighbors = arena.copy(hits.get_indices())


- New methods on the SearchResult and SearchResults classes for getting
the hit count and the cumulative raw score within a specified score range.

For example:

    result = search.threshold_tanimoto_search_fp(query_fp, arena, 0.2)
    for threshold in (0.2, 0.5, 0.8, 0.9, 0.95, 1.0):
        print "%.2f -> %d" % (threshold, result.count(min_score=threshold))
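To make the lookup and subset semantics above concrete, here is a small pure-Python sketch of an arena-like container. It is not chemfp's implementation (the real arena stores packed fingerprints and does its work in C); the class name ToyArena, the use of Python integers as fingerprints, and the choice to return None for an unknown id are assumptions made for illustration. It only mirrors the method names listed above, and the reorder option is left out entirely.

```python
import random


class ToyArena(object):
    """Hypothetical stand-in for an arena: parallel lists of ids and
    fingerprints (plain Python ints here) plus an id -> index dictionary."""

    def __init__(self, ids, fingerprints):
        if len(ids) != len(fingerprints):
            raise ValueError("ids and fingerprints must have the same length")
        self.ids = list(ids)
        self.fingerprints = list(fingerprints)
        # Build the lookup table once so each id lookup is a dict access.
        # If an id occurs more than once, the first occurrence wins.
        self._index_by_id = {}
        for i, id_ in enumerate(self.ids):
            self._index_by_id.setdefault(id_, i)

    def __len__(self):
        return len(self.ids)

    def get_index_by_id(self, id_):
        # This sketch returns None for an unknown id instead of raising.
        return self._index_by_id.get(id_)

    def get_fingerprint_by_id(self, id_):
        i = self.get_index_by_id(id_)
        return None if i is None else self.fingerprints[i]

    def get_by_id(self, id_):
        # Return an (id, fingerprint) pair, or None if the id is unknown.
        i = self.get_index_by_id(id_)
        return None if i is None else (self.ids[i], self.fingerprints[i])

    def copy(self, indices=None):
        # With no indices, copy everything; otherwise keep only the
        # records at the given positions, in the order given.
        if indices is None:
            indices = range(len(self))
        indices = list(indices)
        return ToyArena([self.ids[i] for i in indices],
                        [self.fingerprints[i] for i in indices])


# Usage: look up records by id, then take a random two-record subset.
arena = ToyArena(["ABC123", "DEF456", "GHI789"], [0b1011, 0b0110, 0b1111])
random_subset = arena.copy(random.sample(range(len(arena)), 2))
```

The id-to-index dictionary is built once up front, which is what makes repeated get_by_id() style lookups cheap compared with scanning the id list each time.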

The full details are in the CHANGELOG in the distribution.
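The threshold search and range counting in the last two examples can also be sketched without chemfp. This minimal version assumes fingerprints are Python integers and treats the min_score/max_score bounds as inclusive; the helper names tanimoto, threshold_search, count_hits and cumulative_score are hypothetical and only imitate the behavior described above, while chemfp's own search code works on packed byte strings in C.

```python
def popcount(x):
    # Number of set bits in a non-negative integer fingerprint.
    return bin(x).count("1")


def tanimoto(a, b):
    # Tanimoto similarity: |a AND b| / |a OR b|.
    # This sketch defines the similarity of two empty fingerprints as 0.0.
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / float(union)


def threshold_search(query_fp, ids, fingerprints, threshold):
    # Return (index, id, score) for every target with score >= threshold,
    # sorted from most to least similar.
    hits = []
    for i, (id_, fp) in enumerate(zip(ids, fingerprints)):
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            hits.append((i, id_, score))
    hits.sort(key=lambda hit: -hit[2])
    return hits


def _in_range(hits, min_score, max_score):
    # Yield the hits whose score lies in [min_score, max_score].
    for hit in hits:
        score = hit[2]
        if min_score is not None and score < min_score:
            continue
        if max_score is not None and score > max_score:
            continue
        yield hit


def count_hits(hits, min_score=None, max_score=None):
    return sum(1 for _ in _in_range(hits, min_score, max_score))


def cumulative_score(hits, min_score=None, max_score=None):
    # Sum of the raw Tanimoto scores of the hits in range.
    return sum(hit[2] for hit in _in_range(hits, min_score, max_score))


ids = ["ABC123", "DEF456", "GHI789"]
fps = [0b1011, 0b0011, 0b0100]
hits = threshold_search(0b1011, ids, fps, 0.2)
for threshold in (0.2, 0.5, 0.8, 1.0):
    print("%.2f -> %d" % (threshold, count_hits(hits, min_score=threshold)))
```

Running one search at the lowest threshold of interest and then counting within sub-ranges avoids repeating the search for each cutoff.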

=== API changes

There are several API changes. The biggest is a change in viewpoint. Up until
last week the documentation suggested that you search an arena using
a method call, like this:

    arena = chemfp.load_fingerprints(...)
    results = arena.knearest_tanimoto_search_fp(query_fp, k=10)

This was nice at the start, because it doesn't involve importing a
search-specific module. But as I started to add new functions, like
the symmetric variations, the number of methods was becoming
excessive.

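What I've done is make the "chemfp.search" module part of the public
API. You should use its functions instead of the arena methods, so the
search above becomes:

    from chemfp import search
    results = search.knearest_tanimoto_search_fp(query_fp, arena, k=10)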

The methods are still there; the only change is their docstrings, which
now say that they are deprecated. In the next release they will issue
a DeprecationWarning, and I'll remove them in the release after that.
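A common way to carry out that kind of deprecation schedule is to keep each old method as a thin wrapper that warns and then forwards to the module-level function, so there is only one implementation. The sketch below shows the "next release" stage; the names Arena and knearest_search, and the toy nearest-value scoring, are hypothetical placeholders rather than chemfp's code.

```python
import warnings


def knearest_search(query, arena, k=3):
    # Placeholder for a module-level search function: return the k values
    # in the arena closest to `query` by absolute difference.
    return sorted(arena.values, key=lambda v: abs(v - query))[:k]


class Arena(object):
    def __init__(self, values):
        self.values = list(values)

    def knearest_search(self, query, k=3):
        """Deprecated: use the module-level knearest_search(query, arena, k)."""
        warnings.warn(
            "Arena.knearest_search() is deprecated; "
            "use the module-level knearest_search() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        # Forward to the module-level function (the global name, not this
        # method) so both call styles share one implementation.
        return knearest_search(query, self, k=k)


arena = Arena([5, 1, 9, 4])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = arena.knearest_search(4, k=2)  # deprecated method call
new_style = knearest_search(4, arena, k=2)     # preferred module-level call
```

Python ignores DeprecationWarning by default in most contexts, so running a program with `python -W default` (or `-W error` in a test suite) is a simple way to find the remaining method calls.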


I've also renamed the 'chemfp.decoders' module to 'chemfp.encodings'.
This is because I'll be adding encoders to the module. The 'decoders'
module wasn't documented as part of the public API, but I've told a
few people about it. To limit compatibility problems, the 'decoders'
module is still present. It issues a DeprecationWarning and then
imports the actual decoders from the 'encodings' module.
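A compatibility module of the kind described above can be only a few lines long. To keep this sketch runnable on its own, it writes a throwaway package to a temporary directory; the package name demo_pkg and the from_hex function are hypothetical stand-ins, not the real contents of chemfp's 'decoders' or 'encodings' modules.

```python
import os
import sys
import tempfile
import textwrap
import warnings

# Build a throwaway package "demo_pkg" with the new module (encodings)
# and a compatibility shim under the old name (decoders).
# The temporary directory is not cleaned up, for brevity.
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_pkg")
os.mkdir(pkg_dir)

sources = {
    "__init__.py": "",
    "encodings.py": textwrap.dedent('''
        import binascii


        def from_hex(text):
            # Decode a hex string such as "ff00" into raw bytes.
            return binascii.unhexlify(text)
        '''),
    "decoders.py": textwrap.dedent('''
        # Old module name, kept only for backwards compatibility.
        import warnings

        warnings.warn("demo_pkg.decoders is deprecated; use demo_pkg.encodings",
                      DeprecationWarning)
        from demo_pkg.encodings import *
        '''),
}
for name, text in sources.items():
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write(text)

sys.path.insert(0, pkg_root)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_pkg import decoders  # old name: warns, then re-exports
```

Because the warning is raised while the module body runs, it fires once per process; later imports of the old name come from the module cache and stay silent.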

Finally, I've renamed the 'readers' module to 'fps_io'. My original
intent was for the FPS and FPB readers to live in the 'readers' module,
but since the FPB reader and writer code is closely coupled, I decided
it was best to have a separate module just for the binary format.

The 'readers' module wasn't documented and I don't think anyone
uses it, so I simply renamed it without leaving a compatibility module.


=== Business model changes

With chemfp-1.1, I've switched to an incentive model. People who
buy the commercial license will get access to new features first,
and the general public will get them at no cost at least one year
later.

For example, when chemfp-1.2 comes out, which should be by summer,
it will go to the current customers but it won't be available to
others until fall 2014.

The time delay will be at least one year, since that's how long
a support contract lasts, and no more than three years. Also,
commercial users will be able to renew a license for 1/3rd of the
current license fee.

To make it more interesting, and to fit in with my personal views,
both groups will get the software under the BSD license. So,
technically this is "distributing free software for a fee."

If I can make 10 sales per year, then I can fund chemfp
development. I think that's an achievable goal, though I don't
expect it to happen until 1.2 or later, once the commercial
package has gotten far enough ahead of the no-cost version.

Why do I still have the no-cost version? I started chemfp in
part because of the frustration of having to write new
fingerprint code for each project, including having to write
project-specific fingerprint parsers.

I want to promote the FPS format so I can stop worrying about
that, and focus instead on the more interesting tasks. But no
one will use a new format without good, useful tools.

Plus, I think the no-cost version is good marketing.

Cheers,


Andrew
dalke at dalkescientific.com



