Usage Guide

This guide explains how to use evid to manage PDF documents through its PyQt6-based GUI or command-line interface (CLI).

Configuration

You can configure the default database location by creating a ~/.evidrc file in YAML format. Example:

default_dir: ~/my_custom_evid_db

If no .evidrc file is found, the default database location is ~/Documents/evid.
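The lookup described above can be sketched as follows. This is an illustration, not evid's own code: `resolve_db_dir` is a hypothetical helper, and it reads only a flat `default_dir: <path>` line rather than full YAML.

```python
from pathlib import Path

# Fallback location stated in this guide.
FALLBACK_DIR = Path("~/Documents/evid")


def resolve_db_dir(rc_path: Path = Path("~/.evidrc")) -> Path:
    """Return the directory named in the rc file, else the fallback.

    Hypothetical helper: it only understands a flat `default_dir: <path>`
    line, which is enough for the example shown in this guide.
    """
    rc_file = rc_path.expanduser()
    if rc_file.is_file():
        for line in rc_file.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition(":")
            value = value.strip().strip("'\"")
            if sep and key.strip() == "default_dir" and value:
                return Path(value).expanduser()
    return FALLBACK_DIR.expanduser()
```

Calling `resolve_db_dir()` with no arguments checks ~/.evidrc and falls back to ~/Documents/evid when the file or the key is missing.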

Launching the Application

Launch the GUI with:

poetry run evid gui

This opens the GUI with two tabs: Add and Browse.

Alternatively, use the CLI to view available commands:

poetry run evid --help

Listing Datasets

To see all available datasets, use the CLI:

poetry run evid list

This displays a numbered list of existing datasets in the default database location.
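Since datasets are folders inside the database directory, a listing like this one can be approximated in a few lines. The sketch below is an assumption about the behavior, not evid's implementation: `list_datasets` and `format_listing` are hypothetical names, and skipping hidden folders and the exact output format are guesses.

```python
from pathlib import Path


def list_datasets(db_dir: Path) -> list[str]:
    """Return the names of the dataset folders in db_dir, sorted by name."""
    if not db_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in db_dir.iterdir()
        # Assumption: hidden folders (such as .git) are not datasets.
        if entry.is_dir() and not entry.name.startswith(".")
    )


def format_listing(names: list[str]) -> str:
    """Render the names as a numbered list, one dataset per line."""
    return "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))
```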

Managing Datasets

Creating Datasets

Via GUI

In the Add tab, enter a new dataset name in the "New Dataset" field and click Create. If the dataset already exists, a warning will appear.

Via CLI

Create a new dataset with:

poetry run evid set create <dataset_name>

This creates a new dataset directory in the default database location. If the dataset already exists, the command will fail with an error message.

Tracking Datasets with Git

To enable Git version control for a dataset, use:

poetry run evid set track [<dataset_name>]

If no dataset name is provided, the CLI will prompt you to select an existing dataset. This command initializes a Git repository in the dataset's top-level directory with a .gitignore file that tracks only label.csv, label_table.bib, *.tex, info.yml, and *.pdf files, ignoring others (e.g., LaTeX byproducts like label.pdf). If the dataset is already a Git repository, the command will fail with an error message.
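A .gitignore that tracks only a fixed set of files is usually written as an allowlist: ignore everything, re-include directories so Git can descend into them, then re-include the wanted patterns. The sketch below builds such a file as text. The exact contents evid writes are not shown in this guide, so the layout, the `build_gitignore` name, and the trailing `label.pdf` rule (added so that this byproduct stays ignored despite `*.pdf`) are assumptions.

```python
# Patterns this guide says are tracked.
TRACKED_PATTERNS = ["label.csv", "label_table.bib", "*.tex", "info.yml", "*.pdf"]


def build_gitignore(tracked=TRACKED_PATTERNS, still_ignored=("label.pdf",)) -> str:
    """Return allowlist-style .gitignore text (hypothetical layout)."""
    lines = [
        "*",    # ignore everything by default
        "!*/",  # but keep directories, so nested files can be matched
    ]
    lines += [f"!{pattern}" for pattern in tracked]
    # Later rules win in .gitignore, so these re-ignore specific byproducts.
    lines += list(still_ignored)
    return "\n".join(lines) + "\n"
```

The order matters because the last matching rule decides: `label.pdf` comes after `!*.pdf`, so it stays ignored while other PDFs are tracked.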

Adding Documents

Via GUI

Use the Add tab to log PDFs with metadata.

  1. Select or Create a Dataset:
     • Choose an existing dataset from the dropdown or enter a new dataset name and click Create.
     • Datasets are folders in the default database directory where documents are stored.

  2. Add a PDF:
     • Click Browse to select a local PDF or enter a URL and click Quick Add URL.
     • The GUI auto-fills metadata (title, authors, dates) from the PDF if possible.

  3. Fill Metadata:
     • Edit fields like Title, Authors, Tags, Dates, Label, and URL.
     • Preview the metadata in the preview pane.
     • Note: Metadata fields like title and authors are stored as plain text in info.yml for readability, with Danish characters (æ, ø, å) preserved.

  4. Save Document:
     • Click Add to save the PDF and metadata to a unique folder in the selected dataset.
     • Metadata is stored in an info.yml file alongside the PDF.

Via CLI

Add a PDF from a URL or a local file with:

poetry run evid add <url_or_path> [--dataset <dataset>]

If --dataset is not provided, the CLI prompts you to select an existing dataset. If the specified dataset does not exist, the command fails with an error. The add command automatically detects whether the input is a URL (starting with http:// or https://) or a local file path. After adding the document, it prints the metadata to stdout and asks whether to open the info.yml file in Visual Studio Code.
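The URL check described above amounts to a prefix test. A minimal sketch, where `is_url` and `classify_source` are hypothetical names rather than evid functions:

```python
def is_url(source: str) -> bool:
    """True if source starts with http:// or https://, as this guide describes."""
    return source.startswith(("http://", "https://"))


def classify_source(source: str) -> str:
    """Return "url" for web addresses and "file" for everything else."""
    return "url" if is_url(source) else "file"
```

Anything that does not match either prefix, including other schemes such as ftp://, is treated as a local path in this sketch.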

Browsing Documents

Use the Browse tab in the GUI to view and manage existing documents.

  1. Load a Dataset:
     • Select a dataset from the dropdown and click Load.
     • The table displays metadata (Author, Title, Date, File Name, UUID) for each document entry.

  2. View Details:
     • Select a row and click Open Dir to open the document folder in Visual Studio Code.

  3. Create Labels:
     • Select one or more entries (hold Ctrl or Shift to select multiple) and click Label Selected to generate LaTeX documents (label.tex) for each selected PDF.
     • Each LaTeX file opens in a separate Visual Studio Code instance, allowing parallel editing without freezing the main application.
     • Edit each LaTeX file in Visual Studio Code (Ctrl+L inserts a \lb snippet) to add labels.
     • Save the file to generate a label.csv, which is then converted to label_table.bib.

  4. Generate BibTeX:
     • Select one or more entries and click Generate BibTeX to convert existing label.csv files to label_table.bib for each selected PDF.
     • This is useful for updating BibTeX files after manual edits to label.csv or LaTeX files.

  5. Generate Responses:
     • Select an entry and click Rebut to create a response document (rebut.tex) using the BibTeX file.
     • The response lists citations with notes, formatted in LaTeX, suitable for LLM integration.

Labelling

  • When you select one or more documents and click "Label Selected", evid generates a LaTeX document for each PDF containing the extracted text. Each LaTeX document is saved in the same folder as its PDF and opened in Visual Studio Code for editing.

  • You add labels directly in the LaTeX document using your text editor. For VS Code, the following keybinding lets you label text by selecting it and pressing Ctrl+L:

    [
        {
            "key": "ctrl+l",
            "command": "editor.action.insertSnippet",
            "when": "editorTextFocus && editorLangId == 'latex'",
            "args": {
                "snippet": "\\lb{$1}{${TM_SELECTED_TEXT}}{$2}"
            }
        }
    ]
    
    The first field is the label attached (generally a short descriptive string), the second field is the text that was highlighted, and the third field is a comment about the label (for possible use by an LLM).

  • The header in each LaTeX document causes LaTeX compilation to write the labels to label.csv.

  • The label.csv file can be converted to label_table.bib by clicking "Generate BibTeX" in the Browse tab or upon exiting the label editor (i.e., closing VS Code after editing label.tex).
  • The label_table.bib files for each PDF can be concatenated and used to formulate a rebuttal.
  • Note that the first 4 characters of the PDF's UUID are used as a prefix for the BibTeX label, ensuring labels only need to be unique within the same PDF, not across all PDFs in the dataset.
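The UUID prefix rule can be illustrated with a short sketch. The `bibtex_key` helper and the ":" separator are assumptions made for this example; the guide states only that the first 4 characters of the UUID are used as a prefix.

```python
def bibtex_key(doc_uuid: str, label: str, sep: str = ":") -> str:
    """Prefix a label with the first 4 characters of the document's UUID.

    The separator is a guess; evid's actual key format may differ.
    """
    return f"{doc_uuid[:4]}{sep}{label}"


# The same label used in two different PDFs yields two distinct keys.
key_a = bibtex_key("3f2a9c1e-0000-4000-8000-000000000001", "deadline")
key_b = bibtex_key("b71d44e2-0000-4000-8000-000000000002", "deadline")
```

Here `key_a` is "3f2a:deadline" and `key_b` is "b71d:deadline", so concatenating the two label_table.bib files produces no clash unless two UUIDs happen to share their first 4 characters.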

Tips

  • Date Extraction: evid automatically extracts dates from PDFs in various formats (e.g., "12/01/2023", "15. januar 2024").
  • LaTeX Setup: Ensure a LaTeX distribution is installed for label and response generation.
  • VS Code Integration: Use the provided .vscode/keybindings.json for a Ctrl+L shortcut in LaTeX files.
  • Git Tracking: After tracking a dataset with evid set track, use standard Git commands (git add, git commit, etc.) to manage changes to tracked files (label.csv, label_table.bib, *.tex, info.yml, *.pdf).
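To illustrate the date formats mentioned in the Date Extraction tip, here is a minimal sketch that recognizes the two example formats. It is not evid's extraction code: the patterns, the `extract_dates` name, and the day-first reading of "12/01/2023" are assumptions.

```python
import re
from datetime import date

# Danish month names, as in the example "15. januar 2024".
DANISH_MONTHS = {
    "januar": 1, "februar": 2, "marts": 3, "april": 4, "maj": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10, "november": 11,
    "december": 12,
}

NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
TEXTUAL = re.compile(
    r"\b(\d{1,2})\.\s*(" + "|".join(DANISH_MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_dates(text: str) -> list[date]:
    """Return the dates found in text, in order of appearance."""
    hits = []  # (position, year, month, day)
    for m in NUMERIC.finditer(text):
        day, month, year = (int(g) for g in m.groups())  # assumed day-first
        hits.append((m.start(), year, month, day))
    for m in TEXTUAL.finditer(text):
        month = DANISH_MONTHS[m.group(2).lower()]
        hits.append((m.start(), int(m.group(3)), month, int(m.group(1))))

    dates = []
    for _, year, month, day in sorted(hits):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            continue  # skip impossible dates such as 31/02/2023
    return dates
```

With the day-first assumption, "12/01/2023" is read as 12 January 2023. A month-first reading would need the two numeric groups swapped.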

For development details, see the Development section.