FileLister — Command-Line File Inventory & Reporting
FileLister is a compact command-line utility designed to quickly create inventories of files and generate flexible reports for system administrators, developers, data stewards, and power users. It focuses on speed, low resource usage, and rich output options (CSV, JSON, plain text, and simple HTML). This article explains FileLister’s purpose, core features, typical workflows, configuration and example commands, data formats, performance considerations, integration points, and best practices for real-world use.
Why use a command-line file inventory tool?
A command-line tool like FileLister excels in situations where GUI tools are too slow, unavailable, or impractical—remote servers, automated scripts, cron jobs, and environments with limited resources. Key advantages include:
- Automation-friendly: easily invoked from scripts, CI pipelines, and configuration management tools.
- Low overhead: minimal memory and CPU usage compared with full-featured file managers.
- Repeatability: deterministic, reproducible output for auditing and compliance.
- Flexibility: filter, format, and aggregate results in many ways for different audiences.
Core features
- Fast recursive scanning of directory trees with optional depth limits.
- File attribute collection: name, path, size (bytes), human-readable size, owner, group, permissions (symbolic and octal), modification/access/change times (ISO 8601), MIME type detection, and checksum (MD5/SHA1/SHA256) options.
- Advanced filtering: by name/glob, regex, size ranges, file age, owner/group, permissions, and MIME type.
- Output formats: CSV, JSON, plain text, and simple HTML.
- Sorting and aggregation: by size, date, MIME type, owner, extension, or directory.
- Incremental/changed-file reporting mode to list only files added/changed since a previous inventory.
- Exclusion rules via patterns, .filelisterignore, or CLI flags.
- Extensible via plugin hooks or output templates for custom reporting.
- Lightweight: single binary (or script) with few external dependencies.
Typical workflows
- Ad-hoc inventory
  - Quick scan for an overview of a directory:
    filelister --path /var/www --depth 2 --format text
- Generate CSV for spreadsheet analysis
  - Export file list with sizes and modification times:
    filelister -p /home/user/projects -o projects.csv --format csv --fields path,size,mtime
- Produce JSON for ingestion by another tool
  - Useful for dashboards or further processing:
    filelister -p /data -f json --recursive --checksum sha256 > data-index.json
- Scheduled incremental reporting
  - Run daily in cron to capture changed files (a baseline/incremental sketch follows this list):
    filelister -p /srv -o /var/log/filelister/daily-$(date +%F).csv --since /var/log/filelister/last-scan.json
- Compliance and audit
  - Capture permissions and ownership for audit trails:
    filelister -p /etc -f csv --fields path,owner,group,mode,mtime > etc-permissions.csv
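A minimal sketch of the baseline/incremental flow referenced above, assuming --since accepts the path of a previous JSON inventory (as documented in the options below) and that the baseline is refreshed simply by re-running a full scan; paths are illustrative:
# One-time baseline inventory (JSON, with checksums)
filelister -p /srv -f json --checksum sha256 -o /var/log/filelister/last-scan.json
# Daily run: report files added or changed since the baseline, then refresh the baseline
filelister -p /srv -f csv --since /var/log/filelister/last-scan.json -o /var/log/filelister/changed-$(date +%F).csv
filelister -p /srv -f json --checksum sha256 -o /var/log/filelister/last-scan.json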
Command-line options (example CLI)
Common options you’ll find in FileLister:
- --path, -p: directory to scan (default: current directory)
- --recursive, -r: scan recursively (default: true)
- --depth: limit recursion depth
- --format, -f: output format (csv, json, text, html)
- --output, -o: output file (default: stdout)
- --fields: comma-separated list of fields to include (e.g., path,name,size,mtime,mode,owner,checksum)
- --sort: field to sort by (prefix with “-” for descending)
- --filter, --exclude: filter expression or exclude patterns
- --since: path to previous inventory or timestamp for change reporting
- --checksum: none, md5, sha1, sha256
- --follow-symlinks: follow symbolic links (use with caution)
- --threads: number of worker threads for scanning/checksums
- --ignore-file: path to ignore patterns file (similar to .gitignore)
- --mime: detect MIME types (may use libmagic)
- --human: show human-readable sizes
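Several of these options compose naturally. A hedged example using only the flags listed above (the path and exclude pattern are illustrative):
# Scan three levels deep under /opt, largest files first, skipping vendored modules
filelister -p /opt --depth 3 --exclude 'node_modules/*' --sort -size --fields path,size,mtime --human -f text -o opt-largest.txt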
Output formats and examples
CSV (for spreadsheets):
path,size,mtime,owner,group,mode
/home/user/file.txt,1024,2025-09-09T12:34:56Z,user,users,0644
JSON (for programmatic use):
[ {"path":"/var/log/syslog","size":12345,"mtime":"2025-09-09T12:00:00Z","owner":"root","group":"adm","mode":"0640"} ]
Plain text (human-friendly):
/home/user/file.txt 1.0K 2025-09-09T12:34:56Z user:users 0644
Simple HTML (for sharing):
<table>
  <tr><th>Path</th><th>Size</th><th>Modified</th></tr>
  <tr><td>/home/user/file.txt</td><td>1.0K</td><td>2025-09-09T12:34:56Z</td></tr>
</table>
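Because the JSON output is an array of objects, it pairs well with jq. A sketch assuming the field names shown in the JSON example above:
# Print paths of files larger than 10 MiB from a JSON inventory
filelister -p /var/log -f json --fields path,size,mtime | jq -r '.[] | select(.size > 10485760) | .path'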
Performance considerations
- Disk I/O is typically the bottleneck; using more threads improves throughput for networked filesystems or when computing checksums, but offers diminishing returns on local SSDs.
- Avoid checksum calculation for large inventories unless needed — it’s CPU and I/O intensive.
- Use depth limits and excludes to reduce scan surface.
- Prefer incremental mode with a saved inventory for frequent runs.
- For very large datasets, stream output (newline-delimited JSON or CSV) to avoid large memory usage.
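As a sketch of the streaming approach, a CSV inventory can be piped straight into awk so that nothing is buffered in memory; this assumes the header row shown in the CSV example above:
# Total bytes per owner, computed from the stream (NR > 1 skips the header row)
filelister -p /data -f csv --fields owner,size | awk -F, 'NR > 1 { total[$1] += $2 } END { for (o in total) printf "%s\t%d\n", o, total[o] }'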
Integration points
- CI/CD pipelines: run as part of build/test steps to record artifacts.
- Backup systems: pre-check and report files that match backup policies or exceed size thresholds.
- Asset management: produce inventories for media libraries, research data, or server file stores.
- Security and compliance: export permissions and ownership for audits.
- Monitoring: integrate with log collectors or dashboards by outputting JSON to stdout.
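For the monitoring case, the JSON written to stdout can be handed to whatever collector you use. A sketch posting it over HTTP (the endpoint URL is a placeholder, not a real service):
# Ship a JSON inventory to a collector endpoint (placeholder URL)
filelister -p /srv -f json --fields path,size,mtime,owner | curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- https://collector.example.com/api/file-index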
Best practices
- Store periodic inventories (timestamped) so you can detect trends or regressions.
- Use exclusion files (.filelisterignore) to omit cache directories, temporary files, or vendor dependencies (a sample file appears after this list).
- Run as a non-root user when possible to avoid exposing sensitive paths or changing ownership of files accidentally.
- Combine FileLister output with tools like jq, csvkit, or spreadsheet software for analysis.
- When sharing reports, strip checksums and sensitive metadata unless necessary.
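For reference, a minimal .filelisterignore might look like this; the patterns are illustrative and assume the .gitignore-style syntax mentioned under --ignore-file:
# .filelisterignore: skip caches, temporary files, and vendored dependencies
.cache/
tmp/
*.tmp
*.swp
node_modules/
vendor/
And a quick csvkit pass over the projects.csv produced in the workflow example above (csvsort is part of csvkit):
# Header plus the 20 largest files
csvsort -c size -r projects.csv | head -n 21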
Example: full pipeline
- Daily cron job creates a gzipped CSV index and uploads it to a secure S3 bucket (note the escaped % signs: cron treats an unescaped % as a newline):
0 2 * * * /usr/local/bin/filelister -p /data -f csv --fields path,size,mtime,owner,group --exclude '*.tmp' | gzip > /var/backups/file-index-$(date +\%F).csv.gz && aws s3 cp /var/backups/file-index-$(date +\%F).csv.gz s3://my-bucket/file-index/
Extensibility and advanced usage
- Plugins can add metadata collection (e.g., EXIF data for images, audio tags, or database row counts).
- Template-based output lets teams produce branded HTML or Markdown reports.
- Use a backend database (SQLite or PostgreSQL) for storing large inventories and running complex queries (e.g., find top 100 largest files across all hosts).
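As a sketch of the database-backed approach, a CSV inventory can be loaded into SQLite and queried directly. The database, file, and table names below are illustrative, and the query assumes the CSV includes at least path and size fields; .import treats every column as text, hence the CAST:
# Import a CSV inventory and list the 100 largest files
sqlite3 inventory.db <<'SQL'
.mode csv
.import file-index.csv files
SELECT path, CAST(size AS INTEGER) AS bytes
FROM files
ORDER BY bytes DESC
LIMIT 100;
SQL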
FileLister offers a practical, scriptable way to inventory files and produce reports suited for automation, auditing, and data management. Its command-line nature makes it adaptable to many environments while keeping resource usage low.