Commit 4785dfe0 authored by Alexey Spiridonov, committed by Facebook Github Bot

DynamicParser to reliably and reversibly convert JSON to structs

Summary: We have a bunch of code that manually parses `folly::dynamic`s into program structures. It can be quite hard to get this parsing to be good, user-friendly, and concise. This diff was primarily motivated by the mass of JSON-parsing done by Bistro, but this pattern recurs in other pieces of internal code that parse dynamics.

This diff is **not** meant to replace using Thrift structs with Thrift's JSON serialization / deserialization. When all you have to deal with is correct, structured plain-old-data objects produced by another program -- **not** manually entered user input -- Thrift + JSON is perfect. Go use that.

However, sometimes you need to parse human-edited configuration. The input JSON might have complex semantics, and require validation beyond type-checking. The UI for editing your configs can easily enforce correct JSON syntax. Perhaps you can use `folly/experimental/JSONSchema.h` to have your edit UI provide type correctness. Despite all this, people can still make semantic errors, and those can be impossible to detect until you interpret the config at runtime. Also, as your system evolves, sometimes you need to break semantic backwards-compatibility for the sake of moving forward -- thus making previously valid configurations invalid, and requiring them to be fixed up manually.

So, people end up needing to write manual parsers for `dynamic`s. These all have very similar recurring issues:

 - Verbose: to get an int field out of an object, typical code: (i) tests if the field is present, (ii) checks if the field is an integer, (iii) extracts the integer. Sometimes, you also want to handle exceptions, and compose helpful error messages. This makes the code far longer than its intent, and encourages people to write bad parsers.

 - Unsystematic: sometimes, we use `if (const auto* p = dyn_obj.get_ptr("key")) { ... }`, other times we use `dyn_obj.getDefault()` or `if (dyn_obj.count())`, and so on. The patterns differ subtly in meaning. Exceptions sometimes get thrown, leading to error messages that cannot be understood by the user.

 - Imperative parses: a typical parse proceeds step by step, and throws at the earliest error. This is bad because (i) errors have to be fixed one-by-one, instead of getting a full list upfront, (ii) even if 99% of the config is parseable, the imperative code has no way of recording the information it would have parsed after the first error.
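To make the contrast concrete, here is a minimal sketch of the "verbose" three-step extraction and of fail-fast versus error-collecting parses. It deliberately uses a toy `std::variant`-based `Value` type rather than `folly::dynamic`, and the names `Limits`, `requireInt`, `parseFailFast`, and `parseCollecting` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Toy stand-in for a parsed JSON object; this is NOT folly::dynamic.
using Value = std::variant<long, std::string>;
using Object = std::map<std::string, Value>;

// The three manual steps: presence check, type check, extraction.
long requireInt(const Object& obj, const std::string& key) {
  auto it = obj.find(key);
  if (it == obj.end()) {
    throw std::runtime_error("missing key '" + key + "'");
  }
  const long* p = std::get_if<long>(&it->second);
  if (p == nullptr) {
    throw std::runtime_error("key '" + key + "' is not an integer");
  }
  return *p;
}

// Hypothetical program struct with defaults.
struct Limits {
  long maxJobs = 1;
  long timeoutSec = 60;
};

// Imperative style: the first bad field aborts the whole parse.
Limits parseFailFast(const Object& obj) {
  Limits out;
  out.maxJobs = requireInt(obj, "max_jobs");
  out.timeoutSec = requireInt(obj, "timeout_sec");
  return out;
}

// Collecting style: every field is attempted and every failure is kept,
// so the fields that did parse are not lost.
Limits parseCollecting(const Object& obj, std::vector<std::string>* errors) {
  assert(errors != nullptr);
  Limits out;
  auto tryField = [&](const std::string& key, long* dest) {
    try {
      *dest = requireInt(obj, key);
    } catch (const std::runtime_error& ex) {
      errors->push_back(ex.what());
    }
  };
  tryField("max_jobs", &out.maxJobs);
  tryField("timeout_sec", &out.timeoutSec);
  return out;
}
```

With an input whose `max_jobs` is a string and whose `timeout_sec` is a valid integer, `parseFailFast` throws before it ever looks at `timeout_sec`, while `parseCollecting` keeps the default for `max_jobs`, applies `timeout_sec`, and reports one error.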

`DynamicParser` fixes all of the above, and makes your parsing so clean that you might not even bother with `JSONSchema` as your first line of defense -- type-coercing, type-enforcing, friendly-error-generating C++ ends up being more concise. Besides all the sweet syntax sugar, `DynamicParser` lets you parse **all** the valid data in your config, while recording **all** the errors in a way that does not lose the original, buggy config. This means your code can parse a config that has errors, and still be able to meaningfully export it back to JSON. As a result, stateless clients (think REST APIs) can provide a far better user experience than just discarding the user's input, and returning a cryptic error message.

For the details, read the docs (and see the example) in `DynamicParser.h`. Here are the principles of `DynamicParser::RECORD` mode in a nutshell:
 - Pre-populate your program struct with meaningful defaults **before** you parse.
 - Any config part that fails to parse will keep the default.
 - Any config part that parses successfully will get to update the program struct.
 - Any errors will be recorded with a helpful error message, the portion of the dynamic that caused the error, and the path through the dynamic to that portion.
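The four principles above can be sketched in a few lines. This is a toy model of the idea only, not the `DynamicParser` API: `RecordingParser`, `RecordedError`, `ServerConfig`, and `optionalInt` are hypothetical names, and the input is a flat `std::variant`-based map rather than a `folly::dynamic`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Toy stand-in for a parsed JSON object; this is NOT folly::dynamic.
using Value = std::variant<long, std::string>;
using Object = std::map<std::string, Value>;

// What gets recorded per error: path, offending value, message.
struct RecordedError {
  std::string path;
  std::string value;
  std::string message;
};

// Printable copy of a value, so the bad input survives in the error.
std::string toText(const Value& v) {
  if (const long* n = std::get_if<long>(&v)) {
    return std::to_string(*n);
  }
  return "\"" + std::get<std::string>(v) + "\"";
}

class RecordingParser {
 public:
  explicit RecordingParser(const Object& obj) : obj_(obj) {}

  // Calls `assign` only when `key` holds an integer. A missing key is
  // not an error; a wrong type is recorded. Either way, the caller's
  // default stays untouched unless the parse succeeds.
  void optionalInt(
      const std::string& key, const std::function<void(long)>& assign) {
    auto it = obj_.find(key);
    if (it == obj_.end()) {
      return;
    }
    if (const long* n = std::get_if<long>(&it->second)) {
      assign(*n);
    } else {
      errors_.push_back(
          RecordedError{"/" + key, toText(it->second), "expected an integer"});
    }
  }

  const std::vector<RecordedError>& errors() const { return errors_; }

 private:
  const Object& obj_;
  std::vector<RecordedError> errors_;
};

// Hypothetical program struct: defaults are set before parsing.
struct ServerConfig {
  long port = 8080;
  long workers = 4;
};

ServerConfig parseServerConfig(
    const Object& obj, std::vector<RecordedError>* errors) {
  assert(errors != nullptr);
  ServerConfig cfg;
  RecordingParser p(obj);
  p.optionalInt("port", [&](long v) { cfg.port = v; });
  p.optionalInt("workers", [&](long v) { cfg.workers = v; });
  *errors = p.errors();
  return cfg;
}
```

Because each recorded error carries the original value alongside its path, a caller could in principle rebuild the user's input, mistakes included, instead of discarding it.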

 I ported Bistro to use this in D3136954. I looked at using this for JSONSchema's parsing of schemas, but it seemed like too much trouble for the gain, since it would require major surgery on the code.

Reviewed By: yfeldblum

Differential Revision: D2906819

fb-gh-sync-id: aa997b0399b17725f38712111715191ffe7f27aa
fbshipit-source-id: aa997b0399b17725f38712111715191ffe7f27aa
parent 2a196d5a
......@@ -86,6 +86,8 @@ nobase_follyinclude_HEADERS = \
experimental/AutoTimer.h \
experimental/Bits.h \
experimental/BitVectorCoding.h \
experimental/DynamicParser.h \
experimental/DynamicParser-inl.h \
experimental/ExecutionObserver.h \
experimental/EliasFanoCoding.h \
experimental/EventCount.h \
......@@ -448,6 +450,7 @@ libfolly_la_SOURCES = \
Version.cpp \
experimental/bser/Dump.cpp \
experimental/bser/Load.cpp \
experimental/DynamicParser.cpp \
experimental/fibers/Baton.cpp \
experimental/fibers/Fiber.cpp \
experimental/fibers/FiberManager.cpp \
......
/*
* Copyright 2016 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (c) 2015, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree. An additional grant
* of patent rights can be found in the PATENTS file in the same directory.
*
*/
#include <folly/experimental/DynamicParser.h>
#include <folly/CppAttributes.h>
#include <folly/Optional.h>
namespace folly {
namespace {
folly::dynamic& insertAtKey(
    folly::dynamic* d, bool allow_non_string_keys, const folly::dynamic& key) {
  if (key.isString()) {
    return (*d)[key];
  // folly::dynamic allows non-null scalars for keys.
  } else if (key.isNumber() || key.isBool()) {
    return allow_non_string_keys ? (*d)[key] : (*d)[key.asString()];
  }
  // One cause might be oddness like p.optional(dynamic::array(...), ...);
  throw DynamicParserLogicError(
    "Unsupported key type ", key.typeName(), " of ", detail::toPseudoJson(key)
  );
}
} // anonymous namespace
void DynamicParser::reportError(
    const folly::dynamic* lookup_k,
    const std::exception& ex) {
  // If descendants of this item, or other keys on it, already reported an
  // error, the error object would already exist.
  auto& e = stack_.errors(allowNonStringKeyErrors_);
  // Save the original, unparseable value of the item causing the error.
  //
  // value() can throw here, but if it does, it is due to programmer error,
  // so we don't want to report it as a parse error anyway.
  if (auto* e_val_ptr = e.get_ptr("value")) {
    // Failing to access distinct keys on the same value can generate
    // multiple errors, but the value should remain the same.
    if (*e_val_ptr != value()) {
      throw DynamicParserLogicError(
        "Overwriting value: ", detail::toPseudoJson(*e_val_ptr), " with ",
        detail::toPseudoJson(value()), " for error ", ex.what()
      );
    }
  } else {
    // The e["value"].isNull() trick cannot be used because value().type()
    // *can* be folly::dynamic::Type::NULLT, so we must hash again.
    e["value"] = value();
  }
  // Differentiate between "parsing value" and "looking up key" errors.
  auto& e_msg = [&]() -> folly::dynamic& {
    if (lookup_k == nullptr) { // {object,array}Items, or post-key-lookup
      return e["error"];
    }
    // Multiple key lookups can report errors on the same collection.
    auto& key_errors = e["key_errors"];
    if (key_errors.isNull()) {
      // Treat arrays as integer-keyed objects.
      key_errors = folly::dynamic::object();
    }
    return insertAtKey(&key_errors, allowNonStringKeyErrors_, *lookup_k);
  }();
  if (!e_msg.isNull()) {
    throw DynamicParserLogicError(
      "Overwriting error: ", detail::toPseudoJson(e_msg), " with: ",
      ex.what()
    );
  }
  e_msg = ex.what();
  switch (onError_) {
    case OnError::RECORD:
      break; // Continue parsing
    case OnError::THROW:
      stack_.throwErrors(); // Package releaseErrors() into an exception.
      LOG(FATAL) << "Not reached"; // silence lint false positive
    default:
      LOG(FATAL) << "Bad onError_: " << static_cast<int>(onError_);
  }
}
void DynamicParser::ParserStack::Pop::operator()() noexcept {
  stackPtr_->key_ = key_;
  stackPtr_->value_ = value_;
  if (stackPtr_->unmaterializedSubErrorKeys_.empty()) {
    // There should be the current error, and the root.
    CHECK_GE(stackPtr_->subErrors_.size(), 2)
      << "Internal bug: out of suberrors";
    stackPtr_->subErrors_.pop_back();
  } else {
    // Errors were never materialized for this subtree, so errors_ only has
    // ancestors of the item being processed.
    stackPtr_->unmaterializedSubErrorKeys_.pop_back();
    CHECK(!stackPtr_->subErrors_.empty()) << "Internal bug: out of suberrors";
  }
}
folly::ScopeGuardImpl<DynamicParser::ParserStack::Pop>
DynamicParser::ParserStack::push(
    const folly::dynamic& k,
    const folly::dynamic& v) noexcept {
  // Save the previous state of the parser.
  folly::ScopeGuardImpl<DynamicParser::ParserStack::Pop> guard(
    DynamicParser::ParserStack::Pop(this)
  );
  key_ = &k;
  value_ = &v;
  // We create errors_ sub-objects lazily to keep the result small.
  unmaterializedSubErrorKeys_.emplace_back(key_);
  return guard;
}
// `noexcept` because if the materialization loop threw, we'd end up with
// more suberrors than we started with.
folly::dynamic& DynamicParser::ParserStack::errors(
    bool allow_non_string_keys) noexcept {
  // Materialize the lazy "key + parent's type" error objects we'll need.
  CHECK(!subErrors_.empty()) << "Internal bug: out of suberrors";
  for (const auto& suberror_key : unmaterializedSubErrorKeys_) {
    auto& nested = (*subErrors_.back())["nested"];
    if (nested.isNull()) {
      nested = folly::dynamic::object();
    }
    // Find, or insert a dummy entry for the current key
    auto& my_errors =
      insertAtKey(&nested, allow_non_string_keys, *suberror_key);
    if (my_errors.isNull()) {
      my_errors = folly::dynamic::object();
    }
    subErrors_.emplace_back(&my_errors);
  }
  unmaterializedSubErrorKeys_.clear();
  return *subErrors_.back();
}
folly::dynamic DynamicParser::ParserStack::releaseErrors() {
  if (
    key_ || unmaterializedSubErrorKeys_.size() != 0 || subErrors_.size() != 1
  ) {
    throw DynamicParserLogicError(
      "Do not releaseErrors() while parsing: ", key_ != nullptr, " / ",
      unmaterializedSubErrorKeys_.size(), " / ", subErrors_.size()
    );
  }
  return releaseErrorsImpl();
}
void DynamicParser::ParserStack::throwErrors() {
  throw DynamicParserParseError(releaseErrorsImpl());
}
folly::dynamic DynamicParser::ParserStack::releaseErrorsImpl() {
  if (errors_.isNull()) {
    throw DynamicParserLogicError("Do not releaseErrors() twice");
  }
  auto errors = std::move(errors_);
  errors_ = nullptr; // Prevent a second release.
  value_ = nullptr; // Break attempts to parse again.
  return errors;
}
namespace detail {
std::string toPseudoJson(const folly::dynamic& d) {
  std::stringstream ss;
  ss << d;
  return ss.str();
}
} // namespace detail
} // namespace folly
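
The lazy materialization done by `ParserStack::push`, `Pop`, and `errors()` in the listing above can be hard to follow, so here is a simplified, self-contained model of the same bookkeeping. It is only a sketch: `LazyErrorStack` and `ErrorNode` are hypothetical names, keys are plain strings, and the error tree is a struct rather than a `folly::dynamic`. The point it illustrates is that descending into a subtree only queues a key, and tree nodes are created solely when an error is actually reported.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical error-tree node, loosely mirroring the "error" and
// "nested" keys used in the listing above.
struct ErrorNode {
  std::string error;  // empty when the node exists only as an ancestor
  std::map<std::string, std::unique_ptr<ErrorNode>> nested;
};

class LazyErrorStack {
 public:
  LazyErrorStack() { materialized_.push_back(&root_); }
  // Holds pointers into itself, so copying would be unsafe.
  LazyErrorStack(const LazyErrorStack&) = delete;
  LazyErrorStack& operator=(const LazyErrorStack&) = delete;

  // Entering a sub-item only queues its key; no node is created yet.
  void push(const std::string& key) { pending_.push_back(key); }

  // Leaving a sub-item undoes exactly one push.
  void pop() {
    if (!pending_.empty()) {
      // Never materialized: there is nothing to unwind in the tree.
      pending_.pop_back();
    } else {
      // The current node and the root must both be present.
      assert(materialized_.size() >= 2);
      materialized_.pop_back();
    }
  }

  // Called only when an error occurs: creates nodes for all queued keys.
  ErrorNode& errors() {
    for (const auto& key : pending_) {
      auto& slot = materialized_.back()->nested[key];
      if (!slot) {
        slot = std::make_unique<ErrorNode>();
      }
      materialized_.push_back(slot.get());
    }
    pending_.clear();
    return *materialized_.back();
  }

  const ErrorNode& root() const { return root_; }

 private:
  ErrorNode root_;
  std::vector<ErrorNode*> materialized_;  // path of nodes that exist
  std::vector<std::string> pending_;      // keys entered but not created
};
```

In this model, visiting `servers` then `0` and leaving `0` without an error creates nothing, whereas reporting an error under `servers` then `1` creates both the `servers` and `1` nodes, so the resulting tree contains only paths that lead to errors.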