CoffeeScript Application Development Cookbook

Working with strings

In this section, we will look at the various aspects of working with strings or text-based data.

String interpolation

In this section, we will demonstrate the CoffeeScript feature of string interpolation.

Getting ready

In JavaScript, creating strings that include variable values involves concatenating the various pieces together. Consider the following example:

var lineCount = countLinesInFile('application.log');
var message = "The file has a total of " + lineCount + " lines";
console.log(message);

This can quickly get messy. CoffeeScript provides an elegant solution called string interpolation.

How to do it...

CoffeeScript performs string interpolation on double-quoted strings that contain one or more #{} delimiters. Single-quoted strings are never interpolated.

The preceding example can be written as follows:

lineCount = countLinesInFile 'application.log'
message = "The file has a total of #{lineCount} lines"
console.log message

This not only requires less typing, but it can also be easier to read.
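As a quick check of which strings are interpolated, the following sketch compares a double-quoted and a single-quoted string. The name variable is purely illustrative. Only the double-quoted string is interpolated; the single-quoted one keeps the #{} characters literally.

```coffeescript
# Illustrative value only
name = 'Ada'

double = "Hello, #{name}!"   # double quotes: the expression is evaluated
single = 'Hello, #{name}!'   # single quotes: no interpolation takes place

console.log double   # Hello, Ada!
console.log single   # Hello, #{name}!
```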

How it works...

String interpolation evaluates the expression inside the delimiter and replaces the placeholder with the expression's result.

Consider the following simple expression:

console.log "Simple expressions are evaluated: 5 x 6 = #{ 5 * 6 }"

The output of the preceding expression will be as follows:

Simple expressions are evaluated: 5 x 6 = 30

String interpolation can also evaluate complex expressions as follows:

num = 23
console.log "num is #{ if num % 2 is 0 then 'even' else 'odd' }."

The output of the preceding expression will be as follows:

num is odd.

Tip

In the two previous examples, we evaluated expressions inside the string for demonstration purposes only. This practice is generally discouraged; it is almost always better to move that logic into its own function. When in doubt, pull it out.
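Following that advice, the conditional from the previous example can move into a small named function, leaving only a simple call inside the string. This is a minimal sketch; the parity() helper is a hypothetical name introduced here for illustration.

```coffeescript
# Hypothetical helper: the even/odd logic now lives outside the string
parity = (n) ->
  if n % 2 is 0 then 'even' else 'odd'

num = 23
# The interpolation now contains only a function call
console.log "num is #{parity num}."   # num is odd.
```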

There's more...

String interpolation works by evaluating the expression inside the #{} delimiter and having JavaScript coerce the resulting value into a string. We can control this coercion for our own objects by defining a toString() method, which the coercion mechanism calls. By default, an object is coerced to [object Object].

In the following example, we create an Employee class with a toString() function to override the default coercion value:

class Employee
  constructor: (@firstName, @lastName, @empNum) ->
  toString: ->
    return "#{@firstName} #{@lastName} (No: #{@empNum})"

We can now use an Employee instance with string interpolation and receive a more valuable result:

employee = new Employee('Tracy', 'Ouellette', 876)
console.log "Employee Info: #{employee}"

Its output will be:

Employee Info: Tracy Ouellette (No: 876)
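To see the difference toString() makes, the following sketch interpolates a plain object and an instance of a small class that defines toString(). The Point class is hypothetical and used only for this illustration. The same conversion is also applied by string concatenation and by String().

```coffeescript
# A plain object has no toString() of its own, so it uses Object.prototype.toString
plain = name: 'Tracy'
console.log "Plain: #{plain}"     # Plain: [object Object]

# Hypothetical class, used only for this illustration
class Point
  constructor: (@x, @y) ->
  # Called automatically whenever a Point is coerced to a string
  toString: -> "(#{@x}, #{@y})"

point = new Point 3, 4
console.log "Point: #{point}"     # Point: (3, 4)
# Concatenation and String() rely on the same coercion
console.log 'Point: ' + point     # Point: (3, 4)
console.log String(point)         # (3, 4)
```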

Wrapping text

When working with text, you may need to wrap a long piece of text over a number of lines so that no line exceeds a maximum width.

In this section, we will see how to accomplish this using a regular expression.

Tip

Regular expressions are patterns that are matched against strings; they can be used for pattern matching, string manipulation, and validation. JavaScript engines implement them efficiently, so they are usually a concise and well-performing alternative to hand-written string manipulation.

How to do it...

In the following steps, we create a wrapText() function that uses a regular expression to split a piece of text at a specified maximum length:

  1. Define the function as follows:
    wrapText = (text, maxLineWidth = 80, lineEnding = '\n') ->
  2. Create a regular expression instance:
      regex = RegExp ".{1,#{maxLineWidth}}(\\s|$)|\\S+?(\\s|$)", 'g'
  3. Extract matching segments in text, join them with lineEnding, and return the result:
      text.match(regex).join lineEnding

How it works...

The wrapText() function takes a text parameter that represents the text data to be processed and an optional maxLineWidth parameter representing the desired maximum width, which defaults to 80 characters if no value is passed. A third optional parameter, lineEnding, specifies the line ending and defaults to a newline character.

We create a regular expression instance by calling the RegExp() constructor with an interpolated string containing our pattern and the g modifier. We use the constructor here because the pattern is built at runtime from the value of maxLineWidth.

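If we break the regular expression down into its basic blocks, the first alternative, .{1,maxLineWidth}(\s|$), matches a segment of 1 to maxLineWidth characters followed by a whitespace character or the end of the string. Because the {1,maxLineWidth} quantifier is greedy, each segment is as long as possible, and the whitespace that ends it stays attached to the segment. The second alternative, \S+?(\s|$), handles words longer than maxLineWidth: it matches a run of non-whitespace characters up to the next whitespace character or the end of the string, so an overlong word gets a line of its own instead of being split. (In the source, the backslashes are doubled because the pattern is written inside a string literal.)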

We use the String match() method, which takes a regular expression and returns the matching segments. Without a modifier, it returns only the first match (along with its captured groups), which is not what we want in this case. Because we created our RegExp instance with the g (global) modifier, match() returns all matching segments as an array.

Our function ends by calling the Array.join() function, which will join all of the array elements and separate each one with lineEnding.
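Assembled from the three steps above, the complete function reads as follows. This sketch adds one assumption that is not in the original recipe: because match() returns null when nothing matches (for example, for an empty string), it falls back to an empty array so that join() still succeeds.

```coffeescript
# Wrap text at word boundaries: each line holds at most maxLineWidth characters
# plus its trailing whitespace; a single longer word gets a line of its own
wrapText = (text, maxLineWidth = 80, lineEnding = '\n') ->
  # First alternative: 1..maxLineWidth characters ending at whitespace or end of string
  # Second alternative: a single word longer than maxLineWidth
  regex = RegExp ".{1,#{maxLineWidth}}(\\s|$)|\\S+?(\\s|$)", 'g'
  # Assumption: treat a null result (no match) as an empty list of lines
  (text.match(regex) ? []).join lineEnding

console.log JSON.stringify wrapText('aaa bbb ccc', 7)     # "aaa bbb \nccc"
console.log JSON.stringify wrapText('abcdefghij xy', 4)   # "abcdefghij \nxy"
```

JSON.stringify is used only to make the trailing space and the newline visible in the console.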

To demonstrate the method in action, we call the wrapText() method with some sample text from Homer's Odyssey:

homersOdyssey = "He counted his goodly coppers and cauldrons, his
  gold and all his clothes, but there was nothing missing; still
  he kept grieving about not being in his own country, and 
  wandered up and down by the shore of the sounding sea bewailing
  his hard fate. Then Minerva came up to him disguised as a young
  shepherd of delicate and princely mien, with a good cloak folded
  double about her shoulders; she had sandals on her comely feet
  and held a javelin in her hand. Ulysses was glad when he saw
  her, and went straight up to her."

console.log wrapText(homersOdyssey, 40, '<br />\n')

Tip

Notice that we used CoffeeScript's ability to declare a string that spans multiple lines. With ordinary double quotes, the lines of a multiline string are joined by a single space and their indentation is ignored. If we wish to preserve formatting, including line breaks and indentation, we can use a block string delimited by triple double quotes ("""). Consider the following example:

title = """
<title>
    CoffeeScript Strings
</title>
"""

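This code produces the string <title>\n    CoffeeScript Strings\n</title>. Indentation relative to the least-indented line is preserved, and the newlines directly after the opening """ and before the closing """ are dropped.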

For the wrapText() call shown earlier, the output is as follows:

He counted his goodly coppers and <br />
cauldrons, his gold and all his clothes, <br />
but there was nothing missing; still he <br />
kept grieving about not being in his own <br />
country, and wandered up and down by the <br />
shore of the sounding sea bewailing his <br />
hard fate. Then Minerva came up to him <br />
disguised as a young shepherd of <br />
delicate and princely mien, with a good <br />
cloak folded double about her shoulders; <br />
she had sandals on her comely feet and <br />
held a javelin in her hand. Ulysses was <br />
glad when he saw her, and went straight <br />
up to her.

See also

Our wrapText() function made use of a simple regular expression to split text into segments of whole words. See the Using regular expressions recipe for more information on using this powerful JavaScript feature.

Truncating text

In this section, we will see how to truncate text to a desired length without cutting words in half.

How to do it...

Truncating text can be handled in much the same way as we handled word wrapping:

  1. Define your function:
    truncateText = (text, maxLineWidth = 80, ellipsis = '...') ->
  2. Reduce the maximum line width by the length of the ellipsis:
      maxLineWidth -= ellipsis.length
  3. Create your regular expression:
      regex = RegExp \
        ".{1,#{maxLineWidth}}(\\s|$)|\\S+?(\\s|$)"
  4. Trim the first element of the match() result, append the ellipsis, and return the result:
      "#{text.match(regex)[0].trim()}#{ellipsis}"

How it works...

Our truncateText() function takes a text parameter representing the text data to be truncated and two optional parameters: maxLineWidth, representing the maximum width of the result, and ellipsis, the string appended to the truncated text.

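We use the same regular expression as we did in the previous Wrapping text recipe. In this case, however, we first reduce the maximum line length by the length of the ellipsis, so that the truncated text plus the ellipsis does not exceed the maximum line length. The one exception is when the first word alone is longer than the limit; the second alternative of the regular expression then returns the whole word.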

Because we do not use the g modifier this time, match() returns only the first match, which we read from index 0 of the result.

Consider this example:

homersOdyssey = 'He counted his goodly coppers and cauldrons, his gold and all his clothes, but there was nothing missing;'

console.log truncateText homersOdyssey, 30

The output for this code will be:

He counted his goodly...
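The recipe's truncateText() always appends the ellipsis, even when the text already fits. The following sketch is a variation, an assumption rather than the recipe's behavior, that returns short text unchanged and otherwise applies the same regular expression.

```coffeescript
# Variation on truncateText(): text that already fits is returned as-is
truncateText = (text, maxLineWidth = 80, ellipsis = '...') ->
  # Assumption: no truncation (and no ellipsis) when the text already fits
  return text if text.length <= maxLineWidth
  # Leave room for the ellipsis
  limit = maxLineWidth - ellipsis.length
  # No g modifier: match() returns only the first (leftmost) match
  regex = RegExp ".{1,#{limit}}(\\s|$)|\\S+?(\\s|$)"
  "#{text.match(regex)[0].trim()}#{ellipsis}"

console.log truncateText 'A short line', 30
# A short line
console.log truncateText 'He counted his goodly coppers and cauldrons', 30
# He counted his goodly...
```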

Converting character casing

In this recipe, we will demonstrate how to convert text from one casing scheme to another:

  • Sentence case, for example, This is an example of sentence case
  • Title case, for example, This Is an Example of Title Case
  • Pascal case, for example, PascalCase
  • Camel case, for example, camelCase
  • Snake case, for example, snake_case

How to do it...

We will define our case conversion methods as a utility module that we can use for any application:

  1. Create a constant array with the list of those words that are not capitalized within titles:
    WORD_EXCEPTIONS_FOR_TITLECASE = \
      ['a','an','and','but','for','nor','or','the']
  2. Create helper functions: one to capitalize the first letter of a word, and two to split text into words on spaces, underscores, or capital letters:
    capitalizeWord = (word) ->
      word[0].toUpperCase() + word[1..].toLowerCase()
    
    upperSplit = (item) ->
      words = []
      word = ''
    
      for char in item.split ''
        if /[A-Z]/.test char
          words.push word if word.length
          word = char
        else
          word += char
    
      words.push word if word.length
    
      return words
    
    splitStringIntoTokens = (text) ->
      results = []
    
      for token in text.split /[ _]+/
        token = token.trim()
        words = upperSplit token
        for word in words
          results.push word.toLowerCase()
    
      results
  3. Create a function to return a string in title case:
    toTitleCase = (text, wordsToIgnore = WORD_EXCEPTIONS_FOR_TITLECASE) ->
      words = splitStringIntoTokens text
      words[0] = capitalizeWord words[0]
      for word, index in words[1..]
        unless word in wordsToIgnore
          words[index+1] = capitalizeWord word
    
      words.join ' '
  4. Create a function to return a string in sentence case:
    toSentenceCase = (text) ->
      words = splitStringIntoTokens text
      words[0] = capitalizeWord words[0]
      words.join ' '
  5. Create a function to return a string in snake case:
    toSnakeCase = (text) ->
      splitStringIntoTokens(text).join '_'
  6. Create a function to return a string in Pascal case:
    toPascalCase = (text) ->
      (capitalizeWord word for word in splitStringIntoTokens(text)).join ''
  7. Create a function to return a string in camel case:
    toCamelCase = (text) ->
      text = toPascalCase text
      text[0].toLowerCase() + text[1..]
  8. Assign your functions to the module.exports object so they are made available to your applications:
    module.exports =
      toSentenceCase: toSentenceCase
      toTitleCase: toTitleCase
      toPascalCase: toPascalCase
      toCamelCase: toCamelCase
      toSnakeCase: toSnakeCase

How it works...

The module starts with a capitalizeWord() method that takes a single word as a parameter and returns it with its first letter in uppercase and the remaining letters in lowercase. For example, capitalizeWord 'hello' returns Hello, and so does capitalizeWord 'HELLO'.

The splitStringIntoTokens() method is the workhorse of our module and is responsible for breaking up a string of text into individual words. For sentences, this is easily accomplished by splitting the string on spaces; we also split on underscores so that snake case text can be parsed. We also want to be able to parse text that contains Pascal and camel case words, which allows us to convert from Pascal case to snake case, camel case, and so on. We accomplish this by passing each token to the upperSplit() helper, which examines the letters of the token and treats each uppercase letter as the start of a new word.

For example, splitStringIntoTokens 'Hello world' returns an array containing two words, ['hello', 'world'], and splitStringIntoTokens 'HelloWorld' returns the same array. Notice that the words are all lowercase. This helps to normalize the tokens for later processing.
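For comparison, a single global regular expression can extract the same kinds of words. This is an alternative technique, not the recipe's upperSplit() approach, and the tokenize() and snakeCase() names are hypothetical. One visible difference: this pattern keeps a run of capitals such as XML together, whereas upperSplit() starts a new word at every capital letter.

```coffeescript
# Alternative tokenizer (not the recipe's upperSplit approach):
#   [A-Z]?[a-z0-9]+   a word with an optional leading capital, e.g. "Hello", "world"
#   [A-Z]+(?![a-z])   capitals not followed by a lowercase letter, e.g. "XML" in "XMLHttp"
# Spaces, underscores, and other punctuation are simply skipped.
tokenize = (text) ->
  (text.match(/[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])/g) ? []).map (word) -> word.toLowerCase()

snakeCase = (text) -> tokenize(text).join '_'

console.log tokenize 'HelloWorld'            # [ 'hello', 'world' ]
console.log snakeCase 'XMLHttpRequest'       # xml_http_request
console.log snakeCase 'this is_mixed Input'  # this_is_mixed_input
```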

The following methods are responsible for using the individual words that have been split from the text provided and returning the text in the various casing formats. Each takes a single parameter representing the text to be parsed. The toTitleCase() function takes an optional array of words to ignore when performing title case conversion. If no array is provided, the default WORD_EXCEPTIONS_FOR_TITLECASE array is used.

We finish by exporting toTitleCase(), toSentenceCase(), toPascalCase(), toCamelCase(), and toSnakeCase() as the public API for our casing utility module.

The following code is a small application to demonstrate our casing module:

caseUtils = require './casing_utils'

console.log 'Title:', caseUtils.toTitleCase 'an author and his book'
console.log 'Sentence:', caseUtils.toSentenceCase 'this should be in sentence case'
console.log 'Pascal:', caseUtils.toPascalCase 'this should be in pascal case'
console.log 'Camel:', caseUtils.toCamelCase 'this should be in camel case'
console.log 'Snake:', caseUtils.toSnakeCase 'this should be in snake case'

The output for this code is as follows:

Title: An Author and His Book
Sentence: This should be in sentence case
Pascal: ThisShouldBeInPascalCase
Camel: thisShouldBeInCamelCase
Snake: this_should_be_in_snake_case

Using regular expressions

Regular expressions provide a powerful tool for processing text data. You can use them by passing them to the various string methods that accept regular expressions as parameters, or by calling methods on the regular expression itself, such as test().

We have already seen regular expressions used to split strings and to test a value. They can be passed as parameters to string methods such as split(), replace(), and match(); in these cases, the regular expression acts as a matcher.

How to do it...

Let's look at how we can utilize regular expressions using split(), replace(), and test():

# SPLIT() USING A REGULAR EXPRESSION
whiteSpaceRegex = /[\s]/

words = "A happy\tday\nis here"
console.log "Value:", words
console.log (words.split whiteSpaceRegex)

# REPLACE() USING A REGULAR EXPRESSION
phrase = 'The blue balloon is bright'
console.log "Red balloon:", (phrase.replace /blue/, 'red')

# TEST() USING A REGULAR EXPRESSION
validIpAddress = '192.168.10.24'
invalidIpAddress = '192.168-10.24'
# ^ and $ anchor the pattern so that the whole string must match
testRegex = /^\d+\.\d+\.\d+\.\d+$/
console.log "#{validIpAddress} valid?", (testRegex.test validIpAddress)
console.log "#{invalidIpAddress} valid?", (testRegex.test invalidIpAddress)

How it works...

The first example uses a regular expression to split a string on whitespace (\s), which matches spaces, tabs, newlines, and other whitespace characters. Note that a regular expression literal is enclosed in forward slashes (/).

The output for the preceding example is:

Value: A happy  day
is here
[ 'A', 'happy', 'day', 'is', 'here' ]
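One subtlety: \s matches a single whitespace character, so consecutive whitespace produces empty strings in the result. The following sketch, using an illustrative string with doubled spaces and tabs, shows how the + quantifier avoids this.

```coffeescript
text = "A  happy\t\tday"

# Each single whitespace character is a separator, so runs leave empty strings
singles = text.split /\s/
# \s+ treats a whole run of whitespace as one separator
runs = text.split /\s+/

console.log singles   # [ 'A', '', 'happy', '', 'day' ]
console.log runs      # [ 'A', 'happy', 'day' ]
```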

Tip

Note that regular expressions can also be created using the RegExp constructor. Because the pattern is then supplied as a string, the backslash must itself be escaped. In our example, the whiteSpaceRegex expression could also have been written as follows:

whiteSpaceRegex = new RegExp '\\s'

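In the replace() example, we replace the first occurrence of blue with red, which returns the new string The red balloon is bright; the original phrase is unchanged, because strings are immutable. Without a modifier, replace() changes only the first match; to replace every occurrence, add the g (global) modifier, as in /blue/g.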

By default, regular expressions are case sensitive. You can make the matching pattern case insensitive by adding the i modifier. For example, "It's a Wonderful Life".replace /life/i, "Book" returns It's a Wonderful Book.
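The following sketch shows how the g and i modifiers combine in replace(). The sample phrase is illustrative only.

```coffeescript
phrase = 'blue balloon, blue kite, Blue sky'

# No modifier: only the first match is replaced
firstOnly = phrase.replace /blue/, 'red'
# g: every match is replaced, but matching is still case sensitive
everyMatch = phrase.replace /blue/g, 'red'
# g and i together: every match, regardless of case
everyMatchAnyCase = phrase.replace /blue/gi, 'red'

console.log firstOnly           # red balloon, blue kite, Blue sky
console.log everyMatch          # red balloon, red kite, Blue sky
console.log everyMatchAnyCase   # red balloon, red kite, red sky
```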

You can use the RegExp test() method to see whether a string matches the regular expression pattern. In our example, we have two IP addresses, one that is valid and one that is not. We have a pattern that represents a sequence of four numbers separated by periods. Our invalid IP address uses a hyphen.

Running the example, we have:

192.168.10.24 valid? true
192.168-10.24 valid? false

Tip

Note that our test only checks that the IP address consists of four groups of digits separated by periods, so a value such as 999.1.1.1 also passes. To validate that each segment is between 0 and 255, we can use the following regular expression, anchored with ^ and $ so that the entire string must match:

/^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
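Because the same octet pattern repeats four times, one option is to build the expression from a string using CoffeeScript interpolation and the RegExp constructor, as in the following sketch. The octet, ipRegex, and isValidIpAddress names are hypothetical. Note the doubled backslash needed to produce \. inside a string.

```coffeescript
# One octet: 250-255, 200-249, or a one- to three-digit value from 0 to 199
# (leading zeros allowed)
octet = '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# Anchored with ^ and $ so the entire string must be an IPv4 address
ipRegex = RegExp "^#{octet}\\.#{octet}\\.#{octet}\\.#{octet}$"

isValidIpAddress = (address) -> ipRegex.test address

console.log isValidIpAddress '192.168.10.24'   # true
console.log isValidIpAddress '192.168-10.24'   # false
console.log isValidIpAddress '256.1.1.1'       # false
```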

There's more...

There are many great online resources to learn more about regular expressions including the following: