This is a very simple little snippet to create a search query in Django:
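import re
from functools import reduce  # built in on Python 2, needs this import on Python 3
from django.db.models import Q
# Matches a double-quoted phrase, a single-quoted phrase, or a bare word.
group_exp = re.compile(r'(("[^"]+")|(\'[^\']+\')|(\S+))')
# Matches one leading or trailing quote so it can be stripped from a term.
quotes_exp = re.compile(r'(^[\'"])|([\'"]$)')
def get_query(query_string, fields):
    # OR together one icontains lookup per (term, field) pair.
    return reduce(Q.__or__, (
        Q(**{'%s__icontains' % field: quotes_exp.sub('', groups[0])})
        for groups in group_exp.findall(query_string)
        if groups[0] not in ['and', 'or']
        for field in fields), Q())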
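To see what the two regular expressions actually feed into the Q objects, here is a small sketch that runs only the splitting step, with no Django involved. The helper name search_terms is made up for this illustration and is not part of the snippet.

```python
import re

# The same two patterns as in the snippet, written with plain ASCII quotes.
group_exp = re.compile(r'(("[^"]+")|(\'[^\']+\')|(\S+))')
quotes_exp = re.compile(r'(^[\'"])|([\'"]$)')


def search_terms(query_string):
    """Return the terms get_query would search for (hypothetical helper)."""
    return [
        quotes_exp.sub('', groups[0])       # strip one leading/trailing quote
        for groups in group_exp.findall(query_string)
        if groups[0] not in ['and', 'or']   # drop the connective words
    ]


print(search_terms("'Comic Books' hellboy and batman"))
# ['Comic Books', 'hellboy', 'batman']
```

Each of those terms is then paired with every field name to build one icontains lookup.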
mattw has changed the get_query function to this, which only matches whole words:
def get_query(query_string, fields):
    # \b on both sides restricts each term to whole-word matches.
    return reduce(Q.__or__, (
        Q(**{'%s__iregex' % field: r"\b%s\b" % re.escape(quotes_exp.sub('', groups[0]))})
        for groups in group_exp.findall(query_string)
        if groups[0] not in ['and', 'or']
        for field in fields), Q())
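The whole-word pattern can be tried out in plain Python before it goes anywhere near the database. This is only a sketch using Python's own re module; the helper name whole_word_pattern is made up here. With iregex the pattern is evaluated by the database's regex engine, which may spell a word boundary differently (I believe PostgreSQL, for instance, uses \y), so treat the results as a guide rather than a guarantee.

```python
import re


def whole_word_pattern(term):
    """Build the \\b...\\b pattern used above (hypothetical helper)."""
    # re.escape keeps characters such as "." in the term literal.
    return r"\b%s\b" % re.escape(term)


pattern = whole_word_pattern("bat")
print(bool(re.search(pattern, "Bat signal", re.IGNORECASE)))  # True: whole word
print(bool(re.search(pattern, "batman", re.IGNORECASE)))      # False: only a substring

# Without re.escape, the "." in "3.5" would match any character, e.g. "385".
print(bool(re.search(whole_word_pattern("3.5"), "version 385")))  # False
```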
mattw also points out that I can remove a regexp or two by using django.utils.text.smart_split, which does quoted text splitting properly:
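from django.utils.text import smart_split
def get_query(query_string, fields):
    # smart_split keeps quoted phrases together; strip the quotes and escape
    # the term so regex metacharacters in it are matched literally.
    return reduce(Q.__or__, (
        Q(**{'%s__iregex' % field: r"\b%s\b" % re.escape(term.strip('\'"'))})
        for term in smart_split(query_string)
        if term not in ['and', 'or']
        for field in fields), Q())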
mattw is using this in a fairly large system that is about to go live, and it’s performing acceptably. It does, however, have certain limitations, in particular no way of ordering matches by relevance. It probably also needs a way to say ‘all these terms should be found at least once across all fields.’
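Both limitations can be sketched in plain Python over rows that have already been fetched. This is only an illustration under my own assumptions: the rows are plain dicts standing in for model instances, and the names matches_all_terms and rank_by_hits are made up. In Q terms, the first function has the same shape as OR-ing the per-field lookups for one term with | and then AND-ing the terms together with &.

```python
def matches_all_terms(item, terms, fields):
    """True if every term occurs (case-insensitively) in at least one field."""
    return all(
        any(term.lower() in str(item.get(field, "")).lower() for field in fields)
        for term in terms
    )


def rank_by_hits(items, terms, fields):
    """Order items by how many (term, field) pairs match, best first."""
    def hits(item):
        return sum(
            1
            for term in terms
            for field in fields
            if term.lower() in str(item.get(field, "")).lower()
        )
    return sorted(items, key=hits, reverse=True)


# Hypothetical rows standing in for BlogItem objects.
posts = [
    {"title": "Hellboy", "contents": "one of my favourite comic books"},
    {"title": "Batman", "contents": "a film about Batman"},
]
print(matches_all_terms(posts[0], ["hellboy", "comic books"], ["title", "contents"]))
print([p["title"] for p in rank_by_hits(posts, ["batman", "film"], ["title", "contents"])])
```

Counting hits like this is a crude relevance score, and doing it in Python means every candidate row has to be loaded first.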
It is straightforward to use, e.g.:
posts = BlogItem.objects.filter(get_query("'Comic Books' hellboy batman", ['title', 'contents']))
This code has been ungolfed from a Twitter message and extended to support the features of Julien Phalip’s implementation.
Following Matt’s suggestion, it ignores the words ‘and’ and ‘or’ if they occur in the query string.
Note that SmartyPants can be a bit too smart and smart-quote the quotes inside the expressions. Sorry! If you copy the code, make sure every quote in it is a plain ASCII quote.
Another straightforward application of this is:
from django.db.models.query import QuerySet
def query_over_models(query, models_to_fields):
    # OR together one filtered query set per model; items() works on Python 2 and 3.
    return reduce(QuerySet.__or__, (
        model.objects.filter(get_query(query, fields))
        for model, fields in models_to_fields.items()), QuerySet())
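This is meant to return a single query set of all the objects that match, e.g.:
query_over_models("'Comic Books' hellboy batman", {BlogItem: ['title', 'contents'], Bookmark: ['description', 'name']})
Be aware that recent Django versions refuse to combine query sets of different models with |, so with more than one model you may need something like itertools.chain over the per-model query sets instead.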
