I disagree with the characterization that they are a nuisance. Cyclic terms can be effectively used to model graphs and graphs are a fundamental data structure in mathematics and computer science. Real world examples of things that can be modelled by graphs include transportation, communication and power networks, electrical circuits, finite state machines, and computer programs.
Pretty much any programming language supports the equivalent of cyclic terms, usually by some form of reference that permit the construction referential loops in data structures. And any algorithm that attempts to convert them to a linear format, e.g., a write procedure producing text, must take that into account. (Note that you can’t actually output a graph; at best you can output a “program” that produces that graph if executed.)
Graphs can modelled without cyclic terms by breaking them down into acyclic pieces that can be labelled, and references can be replaced by the corresponding labels. (And sometimes that’s a natural way to build them in the first place.) The pieces get put back together by adding something akin to a symbol table that map labels back to the pieces. That’s fine but adds complexity (the symbol table) and degrades performance by having to “resolve” the labels whenever a reference is accessed. And now you have multiple pieces so it requires a custom output “function” in any case to produce a textual representation. So representing graphs as cyclic terms has some obvious advantages IMO.
As for examples, I have two in recent experience:

Grammars (e.g., a BNF or PEG) consist of one or more rules and rule bodies can reference other rules so loops are not uncommon on grammars of any complexity, e.g., a grammar for parenthetical arithmetic expressions, e.g., 2*(3+5). So a grammar can be represented by a cyclic term where “calls” to a rule are represented by variables which then get bound to the term corresponding to that rule.
Now grammars can be viewed as declarative programs  given a suitable “virtual machine”, a grammar term can be executed to parse an input string in that grammar. Since parsing efficiency is usually an issue, the ability to represent a grammar as a cyclic term is an advantage.

The interval constraint network in pack clpBNR
is one large cyclic term (incrementally constructed). Intervals are attributed variables whose attributes include a list of constraints, e.g., {X*Y=<1}
. Each constraint is an expression between that interval and others and so cycles are impossible to avoid. These constraint networks can be quite large  hundreds of intervals and constraints  but that’s all hidden behind the interval instances.
In both examples, performance is an important success criteria. Fortunately the cyclic terms can be largely hidden behind a module API. For example, the need to write them out in some text form is rarely, if ever, necessary, although debugging can sometimes be a challenge.
My 2¢.