LOSS: A Simple Text Sanitization Tool
LOSS is a command-line tool written in Go that cleans and normalizes text. It is designed primarily for text produced by large language models, but it works just as well on ordinary prose.
Rather than rewriting content from scratch, LOSS focuses on removing stylistic artifacts that make text feel artificial or templated.
- Why LOSS Exists
- How LOSS Works
- Command Line Usage
- Determinism and Seeds
- Optional Local Model Support
- What LOSS Is Not
Why LOSS Exists
Modern language models tend to produce recognizable patterns. These include boilerplate phrases, overly balanced sentence rhythm, excessive polish, markdown formatting, and emojis. While none of these are inherently wrong, they often make text feel less human.
LOSS exists to remove those patterns while preserving meaning. It does not attempt to disguise authorship or guarantee detection avoidance. Its goal is normalization, not obfuscation.
How LOSS Works
LOSS processes text through a sequence of simple, ordered stages. Each stage makes small changes and passes the result to the next stage. No single step performs aggressive rewriting.
- Structural normalization and markdown removal
- Punctuation normalization
- Emoji removal
- Sentence rhythm variation
- Vocabulary flattening
- Partial sentence rephrasing
- LLM phrase suppression
- Final cleanup and capitalization
The cumulative effect is text that feels less mechanical and more naturally written.
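The ordered-stage design described above can be sketched in Go roughly as follows. This is an illustration only, not LOSS's actual source: the `Stage` type, the function names, and the emoji ranges are assumptions, and only four of the eight stages are shown (rhythm variation, vocabulary flattening, rephrasing, and phrase suppression are left out).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Stage is one small text transformation; stages run in a fixed order.
// (Hypothetical type, introduced for this sketch.)
type Stage func(string) string

var (
	headingMarks = regexp.MustCompile(`(?m)^#+[ \t]*`) // leading "#" heading markers
	spaceRuns    = regexp.MustCompile(`[ \t]+`)        // runs of spaces or tabs
)

// stripMarkdown removes heading markers, bold markers, and backticks.
func stripMarkdown(s string) string {
	s = headingMarks.ReplaceAllString(s, "")
	return strings.NewReplacer("**", "", "`", "").Replace(s)
}

// normalizePunctuation maps curly quotes to their ASCII equivalents.
func normalizePunctuation(s string) string {
	return strings.NewReplacer(
		"\u201c", `"`, "\u201d", `"`,
		"\u2018", "'", "\u2019", "'",
	).Replace(s)
}

// removeEmoji drops runes in a few common emoji ranges. The ranges are
// not exhaustive and also cover some non-emoji symbols.
func removeEmoji(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0x1F300 && r <= 0x1FAFF, // pictographs, emoticons, transport
			r >= 0x2600 && r <= 0x27BF, // misc symbols and dingbats
			r == 0xFE0F: // emoji variation selector
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

// finalCleanup collapses whitespace runs and trims both ends.
func finalCleanup(s string) string {
	return strings.TrimSpace(spaceRuns.ReplaceAllString(s, " "))
}

// Run applies each stage in order, feeding one stage's output to the next.
func Run(text string, stages ...Stage) string {
	for _, stage := range stages {
		text = stage(text)
	}
	return text
}

// defaultStages follows the order of the stage list for the stages sketched here.
var defaultStages = []Stage{stripMarkdown, normalizePunctuation, removeEmoji, finalCleanup}

func main() {
	in := "## Hello **world** \U0001F680  \u201cok\u201d"
	fmt.Println(Run(in, defaultStages...)) // prints: Hello world "ok"
}
```

Because every stage has the same signature, adding, removing, or reordering a stage only means editing the slice, which is one plausible reason to keep each step small.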
Command Line Usage
LOSS reads from standard input and writes to standard output, so it is easy to integrate into scripts, pipelines, and build tools. The examples here use the Windows type command; on Unix-like systems, use cat instead.
type input.txt | loss
Behavior can be tuned using flags. You can control how much vocabulary flattening or rephrasing occurs, or provide a seed for reproducible output.
type input.txt | loss --flatten-vocab=high --rephrase-ratio=0.3 --seed=42
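For readers curious how flags like these are typically handled in Go, here is a minimal sketch using the standard `flag` package. The flag names come from the command above; the `Options` struct, the default values, and the 0 to 1 range check on the ratio are assumptions made for illustration and may not match what LOSS does.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options holds the tuning knobs shown in the example command.
// (Hypothetical struct, introduced for this sketch.)
type Options struct {
	FlattenVocab  string  // e.g. "high"
	RephraseRatio float64 // fraction of sentences to rephrase
	Seed          int64
	SeedSet       bool // true only when --seed was passed explicitly
}

// parseArgs parses the flags. The defaults are placeholders,
// not LOSS's real defaults.
func parseArgs(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("loss", flag.ContinueOnError)
	fs.StringVar(&o.FlattenVocab, "flatten-vocab", "medium", "vocabulary flattening level")
	fs.Float64Var(&o.RephraseRatio, "rephrase-ratio", 0.1, "fraction of sentences to rephrase")
	fs.Int64Var(&o.Seed, "seed", 0, "seed for reproducible output")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumed validation: a ratio only makes sense between 0 and 1.
	if o.RephraseRatio < 0 || o.RephraseRatio > 1 {
		return o, fmt.Errorf("rephrase-ratio must be between 0 and 1, got %v", o.RephraseRatio)
	}
	// Visit only walks flags that were actually set on the command line,
	// which distinguishes "--seed=0" from "no seed given".
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "seed" {
			o.SeedSet = true
		}
	})
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Go's `flag` package treats one and two leading dashes the same way, so `--seed=42` and `-seed=42` parse identically in this sketch.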
Determinism and Seeds
LOSS uses randomness to avoid uniform output. This randomness affects punctuation choices, sentence rhythm, and rephrasing selection.
When a seed is provided, all randomness becomes deterministic. The same input, run with the same flags and the same seed, will always produce the same output.
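The idea of seeded randomness can be illustrated with Go's `math/rand`. This is a sketch under assumptions, not the tool's real code: the helper names are made up, and selecting each sentence independently with probability equal to the rephrase ratio is only one plausible way to implement rephrasing selection.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// newRNG returns a generator seeded with the given value when seedSet is
// true, and a time-seeded generator otherwise.
func newRNG(seed int64, seedSet bool) *rand.Rand {
	if !seedSet {
		seed = time.Now().UnixNano()
	}
	return rand.New(rand.NewSource(seed))
}

// pickForRephrase returns the indices of the sentences chosen for
// rephrasing. Each of the n sentences is chosen independently with
// probability ratio.
func pickForRephrase(n int, ratio float64, rng *rand.Rand) []int {
	var picked []int
	for i := 0; i < n; i++ {
		if rng.Float64() < ratio {
			picked = append(picked, i)
		}
	}
	return picked
}

// pickWithSeed is a convenience wrapper for the seeded case.
func pickWithSeed(n int, ratio float64, seed int64) []int {
	return pickForRephrase(n, ratio, newRNG(seed, true))
}

func main() {
	a := pickWithSeed(10, 0.3, 42)
	b := pickWithSeed(10, 0.3, 42)
	// The two slices are identical because both runs used the same seed.
	fmt.Println(a, b)
}
```

Passing one `*rand.Rand` through every stage, instead of using the package-level generator, keeps all random choices tied to a single seed.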
Optional Local Model Support
By default, LOSS is fully rule-based and does not require any language model.
Optionally, LOSS can call a local OpenAI-compatible API to improve sentence rephrasing and vocabulary simplification. When enabled, all processing remains local to your environment.
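To give a rough idea of what talking to an OpenAI-compatible server involves, the sketch below builds (but does not send) a chat completion request in Go. Everything specific here is an assumption for illustration: the base URL, the model name, the prompt wording, and the function name are hypothetical, and how LOSS itself configures or calls the API is not described in this post.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chatMessage and chatRequest follow the JSON shape commonly accepted by
// OpenAI-compatible chat completion endpoints.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildRephraseRequest prepares a POST request asking the model to
// rephrase one sentence. It does not send the request.
func buildRephraseRequest(baseURL, model, sentence string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model: model,
		Messages: []chatMessage{
			{Role: "system", Content: "Rephrase the sentence plainly. Keep its meaning."},
			{Role: "user", Content: sentence},
		},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical local endpoint and model name.
	req, err := buildRephraseRequest("http://localhost:8080/v1", "local-model", "The results were leveraged.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Pointing the base URL at a server on localhost is what keeps the text on your own machine; the request format is the same one a hosted service would accept.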
What LOSS Is Not
- LOSS is not a detection evasion tool
- LOSS does not guarantee perfect grammar
- LOSS does not perform full paraphrasing
- LOSS does not alter meaning
LOSS is intentionally conservative. It cleans text; it does not reinvent it.
Conclusion
LOSS is a small, focused utility for people who want cleaner, more natural text without aggressive rewriting. It favors transparency, control, and simplicity over heavy automation.
If you want text that feels less templated and more human, LOSS is worth a look.