Basics of Free-Text Searching


Boolean Search | Fielded Search | Wildcard Search | Fuzzy Search | Range Search | Boosting a Term | Grouping Fields

 

The functionality of the full-text search engine within the GCMD database has been improved with new features and more intuitive search functions that closely resemble the behavior of commercial search engines. The following was adapted from the Jakarta Lucene query syntax guide.


As with most modern full-text search engines, a query is divided into terms and operators. There are two types of terms: single (or multiple) terms and phrases.


Type in a single term such as ozone and click "Enter" to retrieve a list of relevant titles.


Type in multiple terms such as ozone TOMS and the search engine will interpret this as ozone AND TOMS and retrieve only those descriptions with the words ozone and TOMS somewhere in the description.


Type in a phrase as a group of words surrounded by double quotes, such as:

"sea ice"

and click "Enter" to retrieve a list of descriptions. The descriptions retrieved will contain those words together as a phrase somewhere in the description. In this example, the descriptions returned will contain the phrase "sea ice".

Note: The search engine is not case sensitive. Proper results will be returned if you type antarctica, ANTARCTICA, or Antarctica.



Boolean Search

Boolean operators are offered through the words:


  • AND or "+" - Two or more terms or phrases must be in the description.  "AND" is the default operator.

  • OR - Either one or the other of the multiple terms specified must be in the description.

  • NOT or "-" - A term or phrase specified is excluded from the search.

  • Note: The Boolean operators OR and NOT must be specified explicitly and must be in CAPITAL LETTERS. If you have multiple terms in the query without any operators or quotes, an AND operator is assumed.

Search Example: ozone TOMS polar antarctica

The search engine interprets this query as 'ozone AND TOMS AND polar AND antarctica' and will return only descriptions that contain all of those words.


  • AND Query Examples:

    sea winds
    sea AND winds

  • If two (or more) terms are specified in a query, descriptions containing both words will be retrieved. An AND Boolean operator is assumed between the terms (sea AND winds). This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

  • OR Query Example:

    sea OR topography

  • If two (or more) terms in a query are separated by the Boolean operator OR, descriptions containing either term will be retrieved. This query will retrieve descriptions with either sea or topography. Note that the Boolean operator MUST be capitalized; otherwise the search engine will assume it is a word to be searched. As a general rule, "OR" queries will return more hits than "AND" queries. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

  • NOT Query Examples:

    sea NOT ice
    sea -ice

  • In these queries, the Boolean NOT (or "-") separates the two terms. Each query will retrieve all descriptions with the word "sea" but not the word "ice". In other words, if the description contains both sea and ice, the description will not be retrieved. This is equivalent to a difference using sets. The symbol "!" can be used in place of the word "NOT".
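The set analogy above can be made concrete with a small sketch. The descriptions and the `matching` helper below are hypothetical illustrations, not part of the search engine; the sketch only shows how AND, OR, and NOT correspond to set intersection, union, and difference.

```python
# Toy collection: description id -> text (hypothetical data for illustration).
descriptions = {
    1: "sea ice extent in the Arctic",
    2: "sea surface winds from scatterometer",
    3: "ocean topography from altimetry",
    4: "ice core records",
}


def matching(term):
    """Return the ids of descriptions containing the term as a whole word (case-insensitive)."""
    term = term.lower()
    return {doc_id for doc_id, text in descriptions.items()
            if term in text.lower().split()}


sea = matching("sea")
winds = matching("winds")
topography = matching("topography")
ice = matching("ice")

and_hits = sea & winds        # sea AND winds       -> intersection
or_hits = sea | topography    # sea OR topography   -> union
not_hits = sea - ice          # sea NOT ice         -> difference
```

In this toy collection the OR query returns more hits than the AND query, in line with the general rule stated above.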

 



Fielded Searching

The search engine allows you to restrict your search to any DIF or SERF metadata field (see DIF and SERF user guides for the list of metadata fields).  The syntax is as follows:


DIF/dif_field_name: query
or
SERF/serf_field_name: query


For example, if you want to restrict your search to the DIF title field, specify the title field name followed by the term AVHRR, and only those descriptions with "AVHRR" in the title will be returned. In the same way, a title-field query for the term software returns only those descriptions with "software" in the title.

Fielded searching also allows you to drill through subfields. For example, you can specify the exact Science Keyword (Parameter) hierarchy or Personnel field to conduct your search:

  • A query on the Variable_Level_1 subfield for the phrase "carbon dioxide" will return all descriptions with the phrase "carbon dioxide" as a Variable_Level_1 keyword.

  • A query on the parameter field as a whole for the phrase "carbon dioxide" will return all descriptions with that phrase anywhere within the parameter field.

  • A query on the last name subfield of the Personnel field for Smith will return all descriptions whose "Personnel" entries include the last name "Smith".



Modified Queries

The search engine supports the following term modifiers for enhanced searching options:

  • Wildcard searches
  • Fuzzy searches
  • Proximity searches
  • Range searches
  • Term boosting


Wildcard Searches
The search engine supports single and multiple character wildcard searches.

  • To perform a single character wildcard search use the "?" symbol.
  • To perform a multiple character wildcard search use the "*" symbol.

The single character wildcard search looks for terms that match with a single character replaced. For example, to search for "text" or "test" you can use the search:

te?t

Multiple character wildcard searches look for 0 or more characters. For example, to search for wind, winds or windy, you can use the search:

wind*


You can also use the wildcard search in the middle of a term.


Note: You cannot use a "*" or "?" symbol as the first character of a search.
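The wildcard rules above can be tried out with a short sketch. It uses Python's standard `fnmatch` module as a stand-in, since its "?" and "*" behave the same way for plain terms; the `wildcard_match` helper and the term list are hypothetical, and this is not how the search engine itself is implemented.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, terms):
    """Return the terms matching a '?' / '*' pattern, ignoring case."""
    # Mirror the rule that a search cannot start with a wildcard.
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    lowered = pattern.lower()
    return [t for t in terms if fnmatchcase(t.lower(), lowered)]


# Hypothetical indexed terms.
terms = ["text", "test", "toast", "wind", "winds", "windy", "rewind"]

single = wildcard_match("te?t", terms)    # '?' stands for exactly one character
multiple = wildcard_match("wind*", terms)  # '*' stands for zero or more characters
```

Note that "rewind" is not matched by wind*, because the pattern must match the term from its first character.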

 




Fuzzy Searches
The search engine supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To do a fuzzy search, use the tilde symbol, "~", at the end of a single-word term. For example, to search for a term similar in spelling to "roam", use the fuzzy search: roam~

This search will identify terms like foam and roams.

An additional (optional) parameter between 0 and 1 may be used to specify the required similarity. With a value closer to 1, only terms with a higher similarity will be matched. For example: roam~0.8

Note: The default is 0.5.
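To show what edit distance measures, here is a minimal sketch of the Levenshtein distance using the usual dynamic-programming approach. The `similarity` function is an assumption made for illustration (one minus the distance divided by the length of the shorter term); this guide does not state the exact formula the search engine uses.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the shorter term."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return 1.0 - levenshtein(a, b) / shorter


foam_distance = levenshtein("roam", "foam")    # one substitution
roams_distance = levenshtein("roam", "roams")  # one insertion
```

Both foam and roams are a single edit away from roam, which is why a fuzzy search for roam~ can find them.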

Proximity Searches
The search engine supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde symbol, "~", followed by the distance, at the end of a phrase. For example, to search for "greenhouse" and "carbon" within 10 words of each other in a description, use the search:

"greenhouse carbon"~10
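A simplified way to picture a proximity search is to compare word positions. The sketch below is an illustration under that assumption: the `within_distance` helper is hypothetical, and the engine's own distance calculation may differ in detail.

```python
def within_distance(text, first, second, max_words):
    """True if the two words occur within max_words positions of each other."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_words
               for i in first_positions
               for j in second_positions)


# Hypothetical description text.
description = "greenhouse gases such as carbon dioxide"

near = within_distance(description, "greenhouse", "carbon", 10)
too_far = within_distance(description, "greenhouse", "carbon", 3)
```

Here the two words are four positions apart, so they satisfy a distance of 10 but not a distance of 3.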



Range Searches
Range queries match descriptions whose field values fall between the lower and upper bounds specified by the query. Range queries can be inclusive or exclusive of the upper and lower bounds. Inclusive range queries are denoted by square brackets [ ]. Exclusive range queries are denoted by curly brackets { }. Sorting is done lexicographically.

For example, an exclusive range on the title field with the bounds {Greenhouse TO IPCC} will identify all descriptions with titles between Greenhouse and IPCC, but not including Greenhouse and IPCC.
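The difference between inclusive and exclusive bounds, and the effect of lexicographic ordering, can be sketched with plain string comparison. The `in_range` helper and the titles are hypothetical, and the sketch assumes case is ignored and whole values are compared, which is a simplification of how an index stores terms.

```python
def in_range(value, lower, upper, inclusive):
    """Lexicographic range test, ignoring case."""
    v, lo, hi = value.lower(), lower.lower(), upper.lower()
    if inclusive:
        return lo <= v <= hi   # square brackets [ ]
    return lo < v < hi         # curly brackets { }


# Hypothetical title values.
titles = ["Aerosols", "Greenhouse", "Hydrology", "IPCC", "Jet Stream"]

exclusive_hits = [t for t in titles if in_range(t, "Greenhouse", "IPCC", inclusive=False)]
inclusive_hits = [t for t in titles if in_range(t, "Greenhouse", "IPCC", inclusive=True)]
```

Because the comparison is lexicographic rather than numeric, the string "10" falls between "1" and "9"; this matters when a range is applied to values that look like numbers.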



Boosting a Term
The search engine provides the relevance level of matching descriptions based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a description by boosting its term.  For example, if you are searching for: greenhouse carbon and you want the term "greenhouse" to be more relevant, you can boost it by using the ^ symbol, along with the boost factor next to the term.  You would type: greenhouse^4 carbon

This will make descriptions with the term "greenhouse" appear more relevant. You can also boost phrase terms by placing the caret and boost factor after the closing quote of the phrase.

Note: By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
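The effect of a boost factor on ranking can be illustrated with a deliberately simple scoring model: each occurrence of a term contributes its boost factor to the score. This is a toy assumption for illustration only; the engine's real relevance calculation takes more factors into account, and the `score` helper and sample texts are hypothetical.

```python
def score(text, boosts):
    """Toy relevance score: sum of boost factor times occurrences of each term."""
    words = text.lower().split()
    return sum(boost * words.count(term) for term, boost in boosts.items())


# Hypothetical descriptions.
doc_a = "greenhouse gas inventory"
doc_b = "carbon cycle and carbon sinks"

# Query: greenhouse carbon (default boost factor of 1 for both terms)
plain = {"greenhouse": 1, "carbon": 1}
# Query: greenhouse^4 carbon
boosted = {"greenhouse": 4, "carbon": 1}

plain_scores = (score(doc_a, plain), score(doc_b, plain))
boosted_scores = (score(doc_a, boosted), score(doc_b, boosted))
```

Under this toy model, the description mentioning carbon twice ranks first for the plain query, while boosting greenhouse moves the other description to the top.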

 



 



Grouping
The search engine supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the Boolean logic for a query.

To search for either "greenhouse" or "carbon" and "emissions" use the query:

(greenhouse OR carbon) AND emissions

This eliminates any confusion and ensures that emissions must be present and that at least one of the terms greenhouse or carbon must also be present.
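The following sketch shows why the parentheses matter by evaluating two different groupings of the same three terms against one piece of text. The helper functions and the sample text are hypothetical and only mimic the Boolean logic, not the search engine itself.

```python
def words_of(text):
    """Lower-cased set of words in the text."""
    return set(text.lower().split())


def grouped(text):
    """(greenhouse OR carbon) AND emissions"""
    w = words_of(text)
    return ("greenhouse" in w or "carbon" in w) and "emissions" in w


def other_grouping(text):
    """greenhouse OR (carbon AND emissions)"""
    w = words_of(text)
    return "greenhouse" in w or ("carbon" in w and "emissions" in w)


# Hypothetical description text that never mentions emissions.
sample = "greenhouse effect overview"

grouped_result = grouped(sample)
other_result = other_grouping(sample)
```

The sample text is rejected by the grouped query because emissions is missing, yet it would be accepted under the other grouping, which is the ambiguity the parentheses remove.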



Field Grouping
The search engine supports using parentheses to group multiple clauses to a single field.

To search for a title that contains both the word "emissions" and the phrase "global warming", follow the title field name with the grouped clause:

(+emissions +"global warming")

