Hi everyone, we are looking for annotators to estimate the utility of individual steps in LLM responses to reasoning problems, which will also help us carefully measure the human effort involved.
Who are we looking for?
- People with a STEM background who can identify the correctness of reasoning steps generated by large language models (LLMs). People from diverse backgrounds are also encouraged to participate.
What do you need to do?
- Step 1: Join the team and log in to the website with the API key you received.
- Step 2: Choose the dataset you need to annotate.
- Step 3: Read the instructions carefully.
- Step 4: Read the demonstration problem carefully to get familiar with the labeling rubric.
- You can use the discussion panel to start a discussion about any unclear points.

- Step 5: Finish the Examination; you will need full marks on the Examination to start the real annotation.
- You can always revisit the demonstration problem using the demonstration button.
- If you have previously passed the Examination, simply click the “Submit Examination” button when you revisit.

- Step 6: Start the real annotation (same format as the Examination).
Notice
- You can use the “apply” button to quickly extend the label of the current step to all following steps.

- You can always check the annotation progress on the dataset page.

- If there is a formatting issue that impacts the annotation (e.g., uninformative step splitting, null steps), feel free to use the red reporting button.