Class Word<I>

  • Type Parameters:
    I - symbol type
    All Implemented Interfaces:
    Iterable<I>, ArrayWritable<I>, Printable

    public abstract class Word<I>
    extends AbstractPrintable
    implements ArrayWritable<I>, Iterable<I>
    A word is an ordered sequence of symbols. Words are generally immutable, i.e., a single Word object will never change (unless symbol objects are modified, which is however highly discouraged).

    This class provides static methods for creating words in the most common scenarios, including fromLetter(Object), fromSymbols(Object...), fromArray(Object[], int, int), and fromList(List).

    Modification operations like append(Object) or concat(Word...) create new objects. Repeatedly invoking these operations on the returned objects is therefore highly inefficient. If words need to be created dynamically, a WordBuilder should be used.

    This is an abstract base class for word representations. Implementing classes only need to implement the two abstract methods getSymbol(int) and length().

    However, for the sake of efficiency, it is highly encouraged to override the other methods as well, providing specialized realizations.
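The split between required and derived methods can be illustrated with a small stand-in. The sketch below is not AutomataLib code: `MiniWord` and `ArrayMiniWord` are hypothetical names, and only the idea of deriving further operations from `length()` and `getSymbol(int)` is taken from the description above.

```java
import java.util.Arrays;

// Hypothetical stand-in for an abstract word class (not the library's Word).
abstract class MiniWord<I> {
    // The only two methods a subclass must provide.
    abstract int length();
    abstract I getSymbol(int index);

    // Derived operations, implemented purely on top of the two primitives.
    boolean isEmpty() { return length() == 0; }
    I firstSymbol() { return getSymbol(0); }
    I lastSymbol() { return getSymbol(length() - 1); }

    public static void main(String[] args) {
        MiniWord<String> w = new ArrayMiniWord<>("a", "b", "c");
        System.out.println(w.firstSymbol() + " " + w.lastSymbol() + " " + w.length());
    }
}

// Minimal array-backed subclass; it copies its input so later changes to the
// caller's array cannot leak into the word.
final class ArrayMiniWord<I> extends MiniWord<I> {
    private final Object[] symbols;

    @SafeVarargs
    ArrayMiniWord(I... symbols) {
        this.symbols = Arrays.copyOf(symbols, symbols.length, Object[].class);
    }

    @Override
    int length() { return symbols.length; }

    @SuppressWarnings("unchecked")
    @Override
    I getSymbol(int index) { return (I) symbols[index]; }
}
```

A subclass with a cheaper way to answer, say, `lastSymbol()` would override it, which is the kind of specialization the paragraph above recommends.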

    • Constructor Detail

      • Word

        public Word()
    • Method Detail

      • canonicalComparator

        public static <I> Comparator<Word<I>> canonicalComparator​(Comparator<? super I> symComparator)
      • fromSymbols

        @SafeVarargs
        public static <I> Word<I> fromSymbols​(I... symbols)
        Creates a word from an array of symbols.
        Parameters:
        symbols - the symbol array
        Returns:
        a word containing the symbols in the specified array
      • fromLetter

        public static <I> Word<I> fromLetter​(I letter)
        Constructs a word from a single letter.
        Parameters:
        letter - the letter
        Returns:
        a word consisting of only this letter
      • fromArray

        public static <I> Word<I> fromArray​(I[] symbols,
                                            int offset,
                                            int length)
        Creates a word from a subrange of an array of symbols. Note that to ensure immutability, internally a copy of the array is made.
        Parameters:
        symbols - the symbols array
        offset - the starting index in the array
        length - the length of the resulting word (from the starting index on)
        Returns:
        the word consisting of the symbols in the range
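The defensive copy mentioned above can be sketched with plain JDK collections. This is an illustration only: `FromArraySketch` is a hypothetical name, a read-only `List` stands in for a word, and the library's own range checking may differ from that of `Arrays.copyOfRange`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FromArraySketch {
    // Copies symbols[offset .. offset + length) and exposes the copy read-only.
    static <I> List<I> fromArray(I[] symbols, int offset, int length) {
        I[] copy = Arrays.copyOfRange(symbols, offset, offset + length);
        return Collections.unmodifiableList(Arrays.asList(copy));
    }

    public static void main(String[] args) {
        String[] source = {"a", "b", "c", "d"};
        List<String> word = fromArray(source, 1, 2);
        source[1] = "X"; // mutate the caller's array afterwards
        System.out.println(word); // the copy is unaffected by the mutation
    }
}
```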
      • fromList

        public static <I> Word<I> fromList​(List<? extends I> symbolList)
        Creates a word from a list of symbols.
        Parameters:
        symbolList - the list of symbols
        Returns:
        the resulting word
      • fromWords

        public static <I> Word<I> fromWords​(Collection<? extends Word<? extends I>> words)
      • length

        public abstract int length()
        Retrieves the length of this word.
        Returns:
        the length of this word.
      • upcast

        public static <I> Word<I> upcast​(Word<? extends I> word)
        Performs an upcast of the generic type parameter of the word. Since words are immutable, the type parameter <I> is covariant (even though it is not possible to express this in Java), making this a safe operation.
        Parameters:
        word - the word to upcast
        Returns:
        the upcast word (reference-identical to word)
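Why such a cast is safe for a read-only container can be shown with an unmodifiable `List` as a stand-in for a word. The class name `UpcastSketch` is hypothetical, and the sketch makes no claim about how the library implements upcast internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UpcastSketch {
    // Unchecked, but harmless as long as the argument is read-only:
    // nothing of type I can ever be inserted through the returned reference.
    @SuppressWarnings("unchecked")
    static <I> List<I> upcast(List<? extends I> readOnly) {
        return (List<I>) readOnly;
    }

    public static void main(String[] args) {
        List<Integer> ints = Collections.unmodifiableList(Arrays.asList(1, 2, 3));
        List<Number> nums = upcast(ints); // same object, wider static type
        System.out.println((Object) nums == (Object) ints);
    }
}
```

With a mutable list the same cast would allow a `Double` to be added to a `List<Integer>`, which is why the operation is tied to immutability.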
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
      • isEmpty

        public boolean isEmpty()
        Checks if this word is empty, i.e., contains no symbols.
        Returns:
        true if this word is empty, false otherwise.
      • stream

        public Stream<I> stream()
      • subWord

        public final Word<I> subWord​(int fromIndex)
        Retrieves the subword of this word starting at the given index and extending until the end of this word. Calling this method is equivalent to calling
        w.subWord(fromIndex, w.length())
        Parameters:
        fromIndex - the first index, inclusive
        Returns:
        the word representing the specified subrange
      • subWord

        public final Word<I> subWord​(int fromIndex,
                                     int toIndex)
        Retrieves a word representing the specified subrange of this word. As words are immutable, this function can usually be realized quite efficiently (implementing classes should take care of this).
        Parameters:
        fromIndex - the first index, inclusive.
        toIndex - the last index, exclusive.
        Returns:
        the word representing the specified subrange.
      • subWordInternal

        protected Word<I> subWordInternal​(int fromIndex,
                                          int toIndex)
        Internal subword operation implementation. In contrast to subWord(int, int), no range checks need to be performed. As this method is flagged as protected, implementations may rely on the specified indices being valid.
        Parameters:
        fromIndex - the first index, inclusive (guaranteed to be valid)
        toIndex - the last index, exclusive (guaranteed to be valid)
        Returns:
        the word representing the specified subrange
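The division of labor between the final public methods and the protected internal one can be sketched as follows. `RangeCheckedSeq` is a hypothetical class built on `java.util.List`. The exact checks and the exception type are assumptions, not taken from the library.

```java
import java.util.Arrays;
import java.util.List;

class RangeCheckedSeq<I> {
    private final List<I> symbols;

    RangeCheckedSeq(List<I> symbols) { this.symbols = symbols; }

    int length() { return symbols.size(); }

    // Public entry point: validates once, then delegates. Declared final so
    // subclasses cannot bypass the checks.
    final List<I> subWord(int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex < fromIndex || toIndex > length()) {
            throw new IndexOutOfBoundsException("Invalid range [" + fromIndex + ", " + toIndex + ")");
        }
        return subWordInternal(fromIndex, toIndex);
    }

    // One-argument variant: extends to the end of the sequence.
    final List<I> subWord(int fromIndex) { return subWord(fromIndex, length()); }

    // Extension point: may assume both indices have already been validated.
    protected List<I> subWordInternal(int fromIndex, int toIndex) {
        return symbols.subList(fromIndex, toIndex);
    }

    public static void main(String[] args) {
        RangeCheckedSeq<String> seq = new RangeCheckedSeq<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(seq.subWord(1, 3));
        System.out.println(seq.subWord(2));
    }
}
```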
      • writeToArray

        public void writeToArray​(int offset,
                                 @Nullable Object[] array,
                                 int tgtOffset,
                                 int length)
        Description copied from interface: ArrayWritable
        Writes the contents of this container to an array. The behavior of calling this method should be equivalent to System.arraycopy(this.toArray(), offset, array, tgtOffset, length);
        Specified by:
        writeToArray in interface ArrayWritable<I>
        Parameters:
        offset - how many elements of this container to skip.
        array - the array in which to store the elements.
        tgtOffset - the starting offset in the target array.
        length - the maximum number of elements to copy.
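The stated equivalence with `System.arraycopy` can be checked against a simple element-wise copy. In this sketch a `List` stands in for the container and `WriteToArraySketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class WriteToArraySketch {
    // Element-wise copy of `length` elements, starting at `offset` in the
    // container, into `array` beginning at `tgtOffset`.
    static <I> void writeToArray(List<I> contents, int offset, Object[] array, int tgtOffset, int length) {
        for (int i = 0; i < length; i++) {
            array[tgtOffset + i] = contents.get(offset + i);
        }
    }

    public static void main(String[] args) {
        List<String> contents = Arrays.asList("a", "b", "c", "d");

        Object[] viaLoop = new Object[5];
        writeToArray(contents, 1, viaLoop, 2, 3);

        Object[] viaArraycopy = new Object[5];
        System.arraycopy(contents.toArray(), 1, viaArraycopy, 2, 3);

        System.out.println(Arrays.equals(viaLoop, viaArraycopy));
    }
}
```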
      • getSymbol

        public abstract I getSymbol​(int index)
        Returns the symbol at the specified position.
        Parameters:
        index - the position
        Returns:
        the symbol at position index.
        Throws:
        IndexOutOfBoundsException - if there is no symbol with this index.
      • size

        public final int size()
        Description copied from interface: ArrayWritable
        The size of this container.
        Specified by:
        size in interface ArrayWritable<I>
      • asList

        public List<I> asList()
        Retrieves a List view on the contents of this word.
        Returns:
        an unmodifiable list of the contained symbols.
      • asIntSeq

        public IntSeq asIntSeq​(ToIntFunction<I> indexFunction)
        Retrieves an IntSeq view on the contents of this word for a given indexing function (e.g. an Alphabet).
        Returns:
        an IntSeq view of the contained symbols.
      • prefixes

        public List<Word<I>> prefixes​(boolean longestFirst)
        Retrieves the list of all prefixes of this word. In the default implementation, the prefixes are lazily instantiated upon the respective calls of List.get(int) or Word.Iterator.next().
        Parameters:
        longestFirst - whether to start with the longest prefix (otherwise, the first prefix in the list will be the shortest).
        Returns:
        a (non-materialized) list containing all prefixes
      • suffixes

        public List<Word<I>> suffixes​(boolean longestFirst)
        Retrieves the list of all suffixes of this word. In the default implementation, the suffixes are lazily instantiated upon the respective calls of List.get(int) or Word.Iterator.next().
        Parameters:
        longestFirst - whether to start with the longest suffix (otherwise, the first suffix in the list will be the shortest).
        Returns:
        a (non-materialized) list containing all suffixes
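A non-materialized list of this kind can be sketched with `java.util.AbstractList`. The example covers prefixes only. Suffixes would work symmetrically. `LazyPrefixes` is a hypothetical name, a `List` stands in for a word, and including both the empty prefix and the full word (so that the list has length + 1 entries) is an assumption of the sketch.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

final class LazyPrefixes {
    // Returns a list view; each prefix is created only when get(int) is called.
    static <I> List<List<I>> prefixes(List<I> word, boolean longestFirst) {
        final int n = word.size();
        return new AbstractList<List<I>>() {
            @Override
            public List<I> get(int index) {
                if (index < 0 || index > n) {
                    throw new IndexOutOfBoundsException("index: " + index);
                }
                int len = longestFirst ? n - index : index;
                return word.subList(0, len);
            }

            @Override
            public int size() { return n + 1; }
        };
    }

    public static void main(String[] args) {
        List<String> word = Arrays.asList("a", "b", "c");
        System.out.println(prefixes(word, false));
        System.out.println(prefixes(word, true));
    }
}
```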
      • canonicalNext

        public Word<I> canonicalNext​(Alphabet<I> sigma)
        Retrieves the next word after this in canonical order. Figuratively speaking, if there are k alphabet symbols, one can think of a word of length n as an n-digit radix-k representation of a number. The next word in canonical order is the representation of the number represented by this word plus one.
        Parameters:
        sigma - the alphabet
        Returns:
        the next word in canonical order
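The radix-k analogy can be made concrete with a small counter over symbol indices. This is a sketch under several assumptions, none of which is confirmed by the text above: words are modeled as `int[]` arrays of indices in [0, k), the last position is treated as the least significant digit (the library may order digits differently), and a word whose digits all overflow is followed by the first word of length n + 1. `CanonicalNextSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class CanonicalNextSketch {
    // `word` holds symbol indices in [0, k); returns the successor as a new array.
    static int[] canonicalNext(int[] word, int k) {
        int[] next = word.clone();
        for (int i = next.length - 1; i >= 0; i--) {
            next[i]++;
            if (next[i] < k) {
                return next; // no carry needed
            }
            next[i] = 0; // overflow: reset this digit and carry to the left
        }
        // Every digit overflowed: continue with the first word of length n + 1.
        return new int[word.length + 1];
    }

    public static void main(String[] args) {
        int[] w = new int[0];
        for (int i = 0; i < 7; i++) {
            System.out.println(Arrays.toString(w));
            w = canonicalNext(w, 2);
        }
    }
}
```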
      • lastSymbol

        public I lastSymbol()
        Retrieves the last symbol of this word.
        Returns:
        the last symbol of this word.
      • firstSymbol

        public I firstSymbol()
        Retrieves the first symbol of this word.
        Returns:
        the first symbol of this word
      • append

        public Word<I> append​(I symbol)
        Appends a symbol to this word and returns the result as a new word.
        Parameters:
        symbol - the symbol to append
        Returns:
        the word plus the given symbol
      • prepend

        public Word<I> prepend​(I symbol)
        Prepends a symbol to this word and returns the result as a new word.
        Parameters:
        symbol - the symbol to prepend
        Returns:
        the given symbol plus this word.
      • concat

        @SafeVarargs
        public final Word<I> concat​(Word<? extends I>... words)
        Concatenates this word with several other words and returns the result as a new word.

        Note that this method cannot be overridden. Implementing classes need to override the concatInternal(Word...) method instead.

        Parameters:
        words - the words to concatenate with this word
        Returns:
        the result of the concatenation
        See Also:
        concatInternal(Word...)
      • concatInternal

        protected Word<I> concatInternal​(Word<? extends I>... words)
        Realizes the concatenation of this word with several other words.
        Parameters:
        words - the words to concatenate
        Returns:
        the result of the concatenation
      • isPrefixOf

        public boolean isPrefixOf​(Word<?> other)
        Checks if this word is a prefix of another word.
        Parameters:
        other - the other word
        Returns:
        true if this word is a prefix of the other word, false otherwise.
      • longestCommonPrefix

        public Word<I> longestCommonPrefix​(Word<?> other)
        Determines the longest common prefix of this word and another word.
        Parameters:
        other - the other word
        Returns:
        the longest common prefix of this word and the other word
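The prefix check and the longest common prefix are closely related, as the following sketch shows. It uses `List` as a stand-in for a word and compares symbols with `Objects.equals`. Both choices, as well as the name `CommonPrefixSketch`, are assumptions of this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class CommonPrefixSketch {
    // Walks both sequences from the front until the first mismatch.
    static <I> List<I> longestCommonPrefix(List<I> word, List<?> other) {
        int max = Math.min(word.size(), other.size());
        int i = 0;
        while (i < max && Objects.equals(word.get(i), other.get(i))) {
            i++;
        }
        return word.subList(0, i);
    }

    // A word is a prefix of another iff their common prefix is the whole word.
    static boolean isPrefixOf(List<?> word, List<?> other) {
        return longestCommonPrefix(word, other).size() == word.size();
    }

    public static void main(String[] args) {
        List<String> abc = Arrays.asList("a", "b", "c");
        List<String> abd = Arrays.asList("a", "b", "d");
        System.out.println(longestCommonPrefix(abc, abd));
        System.out.println(isPrefixOf(Arrays.asList("a", "b"), abc));
    }
}
```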
      • prefix

        public final Word<I> prefix​(int prefixLen)
        Retrieves a prefix of the given length. If prefixLen is negative, then a prefix consisting of all but the last -prefixLen symbols is returned.
        Parameters:
        prefixLen - the length of the prefix (may be negative, see above).
        Returns:
        the prefix of the given length.
      • isSuffixOf

        public boolean isSuffixOf​(Word<?> other)
        Checks if this word is a suffix of another word.
        Parameters:
        other - the other word
        Returns:
        true if this word is a suffix of the other word, false otherwise.
      • longestCommonSuffix

        public Word<I> longestCommonSuffix​(Word<?> other)
        Determines the longest common suffix of this word and another word.
        Parameters:
        other - the other word
        Returns:
        the longest common suffix
      • suffix

        public final Word<I> suffix​(int suffixLen)
        Retrieves a suffix of the given length. If suffixLen is negative, then a suffix consisting of all but the first -suffixLen symbols is returned.
        Parameters:
        suffixLen - the length of the suffix (may be negative, see above).
        Returns:
        the suffix of the given length.
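The handling of negative lengths in prefix(int) and suffix(int) can be sketched in a few lines. A `List` stands in for a word, `PrefixSuffixSketch` is a hypothetical name, and out-of-range lengths are simply left to `subList` to reject.

```java
import java.util.Arrays;
import java.util.List;

final class PrefixSuffixSketch {
    // A negative length means "all but the last -prefixLen symbols".
    static <I> List<I> prefix(List<I> word, int prefixLen) {
        int len = prefixLen < 0 ? word.size() + prefixLen : prefixLen;
        return word.subList(0, len);
    }

    // A negative length means "all but the first -suffixLen symbols".
    static <I> List<I> suffix(List<I> word, int suffixLen) {
        int len = suffixLen < 0 ? word.size() + suffixLen : suffixLen;
        return word.subList(word.size() - len, word.size());
    }

    public static void main(String[] args) {
        List<String> word = Arrays.asList("a", "b", "c", "d");
        System.out.println(prefix(word, 2));
        System.out.println(prefix(word, -1));
        System.out.println(suffix(word, 2));
        System.out.println(suffix(word, -1));
    }
}
```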
      • flatten

        public Word<I> flatten()
        Retrieves a "flattened" version of this word, i.e., without any hierarchical structure attached. This can be helpful if Word is subclassed to allow representing, e.g., a concatenation dynamically, but due to performance concerns not too many levels of indirection should be introduced.
        Returns:
        a flattened version of this word.
      • trimmed

        public Word<I> trimmed()
      • toIntArray

        public int[] toIntArray​(ToIntFunction<? super I> toInt)
        Transforms this word into an array of integers, using the specified function for translating an individual symbol to an integer.
        Parameters:
        toInt - the function for translating symbols to integers
        Returns:
        an integer-array representation of the word, according to the specified translation function
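The symbol-to-integer translation can be sketched as follows. A `List` stands in for a word, `ToIntArraySketch` is a hypothetical name, and the position of a symbol in a list called `alphabet` serves as the (assumed) translation function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class ToIntArraySketch {
    // Applies `toInt` to every symbol, preserving the order of the word.
    static <I> int[] toIntArray(List<I> word, ToIntFunction<? super I> toInt) {
        int[] result = new int[word.size()];
        int i = 0;
        for (I symbol : word) {
            result[i++] = toInt.applyAsInt(symbol);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> alphabet = Arrays.asList("a", "b", "c");
        List<String> word = Arrays.asList("b", "a", "c");
        // Translate each symbol to its position in the alphabet list.
        System.out.println(Arrays.toString(toIntArray(word, alphabet::indexOf)));
    }
}
```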
      • transform

        public <T> Word<T> transform​(Function<? super I,​? extends T> transformer)
        Transforms a word symbol-by-symbol, using the specified transformation function.
        Parameters:
        transformer - the transformation function
        Returns:
        the transformed word
      • collector

        public static <I> Collector<I,​?,​Word<I>> collector()
        Returns a Collector that collects individual symbols (in order) and aggregates them in a Word.
        Type Parameters:
        I - input symbol type
        Returns:
        a Collector that collects individual symbols in order and aggregates them in a Word
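How an order-preserving collector of this kind might be assembled can be sketched with `Collector.of`. The result here is an unmodifiable `List` rather than a Word, and `SequenceCollectorSketch` is a hypothetical name. The library's own collector may be built differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class SequenceCollectorSketch {
    // Accumulates symbols in encounter order, then freezes the result.
    static <I> Collector<I, ?, List<I>> collector() {
        return Collector.<I, ArrayList<I>, List<I>>of(
                ArrayList::new,                                        // fresh buffer
                ArrayList::add,                                        // append one symbol
                (left, right) -> { left.addAll(right); return left; }, // merge partial buffers
                Collections::unmodifiableList);                        // read-only result
    }

    public static void main(String[] args) {
        List<String> word = Stream.of("a", "b", "c")
                .collect(SequenceCollectorSketch.<String>collector());
        System.out.println(word);
    }
}
```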